Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
361 views
in Technique[技术] by (71.8m points)

r - Plot probability heatmap/hexbin with different sized bins

This is related to another question: Plot weighted frequency matrix.

I have this graphic (produced by the code below in R): multisample

#Set the number of bets and number of trials and % lines
numbet <- 36 
numtri <- 1000 
#Fill a matrix where the rows are the cumulative bets and the columns are the trials
xcum <- matrix(NA, nrow=numbet, ncol=numtri)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(5/6,1/6), replace = TRUE)
xcum[,i] <- cumsum(x)/(1:numbet)
}
#Plot the trials as transparent lines so you can see the build up
matplot(xcum, type="l", xlab="Number of Trials", ylab="Relative Frequency", main="", col=rgb(0.01, 0.01, 0.01, 0.02), las=1)

I very much like the way that this plot is built up and shows the more frequent paths as darker than the rarer paths (but it is not clear enough for a print presentation). What I would like to do is to produce some kind of hexbin or heatmap for the numbers. On thinking about it, it seems that the plot will have to incorporate different sized bins (see my back of the envelope sketch):

binsketch

My question then: If I simulate a million runs using the code above, how can I present it as a heatmap or hexbin, with the different sized bins as shown in the sketch?

To clarify: I do not want to rely on transparency to show the rarity of a trial passing through a part of the plot. Instead I would like to denote rarity with heat and show a common pathway as hot (red) and a rare pathway as cold (blue). Also, I do not think the bins should be the same size because the first trial has only two places where the path can be, but the last has many more. Hence the fact I chose a changing bin scale, based on that fact. Essentially I am counting the number of times a path passes through the cell (2 in col 1, 3 in col 2 etc) and then colouring the cell based on how many times it has been passed through.

UPDATE: I already had a plot similar to @Andrie, but I am not sure it is much clearer than the top plot. It is the discontinuous nature of this graph, that I do not like (and why I want some kind of heatmap). I think that because the first column has only two possible values, that there should not be a huge visual gap between them etc etc. Hence why I envisaged the different sized bins. I still feel that the binning version would show large number of samples better.

plot2

Update: This website outlines a procedure to plot a heatmap:

To create a density (heatmap) plot version of this we have to effectively enumerate the occurrence of these points at each discrete location in the image. This is done by setting a up a grid and counting the number of times a point coordinate "falls" into each of the individual pixel "bins" at every location in that grid.

Perhaps some of the information on that website can be combined with what we have already?

Update: I took some of what Andrie wrote with some of this question, to arrive at this, which is quite close to what I was conceiving: heatmap

numbet <- 20
numtri <- 100
prob=1/6
#Fill a matrix 
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
  x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
  xcum[i, ] <- c(i, cumsum(x)/cumsum(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))

mxcum <- reshape(data.frame(xcum), varying=1+1:numbet, 
  idvar="trial", v.names="outcome", direction="long", timevar="bet")

 #from the other question
 require(MASS)
dens <- kde2d(mxcum$bet, mxcum$outcome)
filled.contour(dens)

I don't quite understand what's going on, but this seems to be more like what I wanted to produce (obviously without the different sized bins).

Update: This is similar to the other plots here. It is not quite right:

hexbin

plot(hexbin(x=mxcum$bet, y=mxcum$outcome))

Last try. As above: enter image description here

image(mxcum$bet, mxcum$outcome)

This is pretty good. I would just like it to look like my hand-drawn sketch.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Edit

I think the following solution does what you ask for.

(Note that this is slow, especially the reshape step)

numbet <- 32
numtri <- 1e5
prob=5/6
#Fill a matrix 
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
  x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
  xcum[i, ] <- c(i, cumsum(x)/cumsum(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))

mxcum <- reshape(data.frame(xcum), varying=1+1:numbet, 
  idvar="trial", v.names="outcome", direction="long", timevar="bet")


library(plyr)
mxcum2 <- ddply(mxcum, .(bet, outcome), nrow)
mxcum3 <- ddply(mxcum2, .(bet), summarize, 
                ymin=c(0, head(seq_along(V1)/length(V1), -1)), 
                ymax=seq_along(V1)/length(V1),
                fill=(V1/sum(V1)))
head(mxcum3)

library(ggplot2)

p <- ggplot(mxcum3, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) + 
    geom_rect(aes(fill=fill), colour="grey80") + 
    scale_fill_gradient("Outcome", formatter="percent", low="red", high="blue") +
    scale_y_continuous(formatter="percent") +
    xlab("Bet")

print(p)

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...