Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
368 views
in Technique[技术] by (71.8m points)

r - How to assign your color scale on raw data in heatmap.2()

I have data that looks like this:

                         Name    h1    h2    h3    h4    h5
1            1420468_at_Asb17 0.000 2.328 0.000 0.000 0.000
2    1430261_at_1700024J04Rik 1.236 2.050 0.000 0.000 0.000
3           1431788_at_Fabp12 0.000 2.150 0.000 0.000 0.587
4    1433187_at_B230112I24Rik 0.000 2.240 1.343 0.000 1.383
5        1434430_s_at_Adora2b 0.000 2.006 1.459 0.000 1.272
6           1435217_at_Gm7969 0.727 2.350 1.494 0.976 0.000
7          1436717_x_at_Hbb-y 0.000 2.712 0.000 0.000 0.000
8            1440859_at_Akap6 0.000 2.053 0.000 0.000 1.840
9              1442625_at_--- 0.000 2.064 1.173 0.000 1.035
10           1443715_at_Rbm24 0.969 2.219 0.000 0.000 0.000
11             1445520_at_--- 0.000 2.497 0.000 0.000 0.000
12          1446035_at_Gm7173 0.000 3.869 0.000 0.000 0.000
13   1446597_at_6820445E23Rik 1.000 2.000 0.000 0.000 0.000
14          1448925_at_Twist2 0.000 2.089 0.938 0.000 0.000
15        1449711_at_Atp6v1e1 0.605 2.363 2.350 1.094 0.976
16          1455931_at_Chrna3 0.000 2.354 0.000 0.000 0.000
17 1457647_x_at_1600023N17Rik 0.000 2.734 0.000 0.000 1.812
18             1458975_at_--- 0.000 2.079 0.000 0.000 0.000
19             1459862_at_--- 0.727 2.606 0.000 0.000 1.151

Note in this data (and the actual one) there is no negative values and the positive values can be as large as 100 or so.

What I want to do is to plot heat map with my own assigned color scale and scheme:

  1. When the value is 0 set it into white.
  2. When the value is == 1 set it into black.
  3. When the value is > 1 set it into shade of reds.
  4. When the value is < 1 and > 0 set it into shade of greens.

also without using any data scaling or built-in z-score transformation. How can I achieve that?

My current code is this:

library(gplots)

# Read data
dat <- read.table("http://dpaste.com/1501148/plain/",sep="",header=T);
rownames(dat) <- dat$Name
dat <- dat[,!names(dat) %in% c("Name")]

# Clustering and distance measure functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

#  Define colours
hmcols <- rev(redgreen(2750));

# Plot 
pdf("~/Desktop/tmp.pdf",height=10)
heatmap.2(as.matrix(dat),Colv=FALSE,dendrogram="row",scale="row",col=hmcols,trace="none", margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=1);
dev.off()

Which produces the following plot, where it uses the default z-score row scaling.

enter image description here

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The key here is understanding that heatmap.2 uses the col argument in combination with the breaks argument.

Take a look at the code and figure below to see what I mean.

library(gplots)
set.seed(100)
dat = matrix( rexp(25,1/2), ncol=5 )
breaks = 0:5
col = c("green","blue","red","yellow","brown")
heatmap.2( dat, breaks=breaks, col=col )

enter image description here

As you can see, there must be n-1 colors for n breaks. For your particular question, the problem is to map the correct colors to the breaks. I'm using the scale="none" option as @josilber pointed out.

breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "white", "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)

heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,
          Colv=FALSE,dendrogram="row",trace="none", 
          margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

enter image description here

Another alternative would be to have two gradients: green->black and black->red. Then, you could manually set the zero values to white by making them NA and setting na.color="white".

breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)

dat[dat==0] = NA
heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,na.color="white",
          Colv=FALSE,dendrogram="row",trace="none", 
          margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

And finally, you could just manually edit the gradient for the zero values.

breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)
hm.colors[1] = col2hex("white")

heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,na.color="white",
          Colv=FALSE,dendrogram="row",trace="none", 
          margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

enter image description here

Log fold changes

On another note, it appears that you might be looking at fold changes or some type of ratio. It is fairly common to plot the log fold changes when making a heat map. I "greyed" out the zero values.

dat[dat==0] = NA
heatmap.2( as.matrix(log2(dat)), col=greenred(100), 
           scale="none", na.color="grey",symbreaks=TRUE,
           Colv=FALSE,dendrogram="row",trace="none", 
           margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

enter image description here

For an explanation of @josilber's nice solution:

This code hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1))) makes a character vector of length 774 (seen by length(hmcols)). Thus, this means that there should be 775 breaks defined. The heatmap.2 function by default makes n+1 breaks where n is the length of the vector used in the col argument. So the number of breaks and colors is worked out, but how does hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1))) map the colors to the breaks correctly? The trick is in clever way that the hmcols vector was created. The number of colors in the first gradient is 200. Since breaks was not explicitly defined, we know that the breaks will be evenly spaced. Since the first gradient goes from 0 to 1 and there are 200 breaks, the width of each break should be 0.005 (or 1/200). Since the second gradient goes from 1 to 3.869 (max(dat)), there should be 2.869/0.005=573.8 breaks (574 breaks when rounding up). Note that the 200*(max(dat) - 1)) does this calculation; it outputs 573.8. Thus, there are then 200+574 colors mapped to the correct breaks and everything works!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...