Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Calculating correlation between columns of R data frame

I have a large data table containing 2 sets of 4 paired observations, the first few lines of which are as below:

   a1  a2  a3  a4  b1  b2  b3  b4
1 480 770 601 953 469 750 588 944
2   0   0   0   0   0   0   0   0
3   3  13   9  12   3  12   9  12
4   0   2   4   3   0  14   3   2
5   0   0  11   0   0   0  11   0
6 165 292 162 313 180 368 116 368

These are gene-expression counts from two different RNA-seq analysis pipelines 'a' and 'b': columns a1 and b1 are the results of analyzing the same sample (1) by the two different pipelines, same with a2 and b2, etc. Each row (1-6) is a different gene. I want to find if there are specific genes that show particularly poor pairwise correlation, i.e. overall correlation between column 1 & 5, 2 & 6, 3 & 7, 4 & 8. I can do this manually using the cor.test function, e.g. for the data in the first row:

cor.test(c(480,770,601,953), c(469,750,588,944))$estimate
      cor 
0.9997302

But for the life of me, I can't figure out how to do this in an automated fashion across the data table (i.e. returning a vector of correlation coefficients, one per row). I could probably do some sort of for loop, but that seems like an ugly solution and not the "R way."

1 Reply

You can use apply to compute a row-wise correlation. Set MARGIN to 1 so that your function is applied to each row. Because cor.test returns a test object for every row, apply gives back a list here, and lapply then extracts just the correlation estimate from each element.

Here is the code for your example:

l <- apply(X = df, MARGIN = 1, FUN = function(x) cor.test(x[1:4], x[5:8]))
lapply(X = l, FUN = function(x) x$estimate)
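
Since the question asks for a vector of coefficients rather than a list of test objects, a minimal sketch that calls cor directly inside apply might look like the following. The data frame df is rebuilt here from the six rows shown in the question. The name row_cor and the suppressWarnings wrapper are assumptions for illustration: a constant row such as gene 2 has an undefined correlation, which cor reports as NA with a warning.

```r
# Rebuild the six example rows from the question (assumed to be stored as `df`)
df <- data.frame(
  a1 = c(480, 0, 3, 0, 0, 165),
  a2 = c(770, 0, 13, 2, 0, 292),
  a3 = c(601, 0, 9, 4, 11, 162),
  a4 = c(953, 0, 12, 3, 0, 313),
  b1 = c(469, 0, 3, 0, 0, 180),
  b2 = c(750, 0, 12, 14, 0, 368),
  b3 = c(588, 0, 9, 3, 11, 116),
  b4 = c(944, 0, 12, 2, 0, 368)
)

# One Pearson coefficient per row (gene): columns 1-4 against columns 5-8.
# cor() warns and returns NA when a row has zero variance (e.g. all zeros),
# so the warning is silenced here and the NA is kept in the result.
row_cor <- apply(
  X = as.matrix(df), MARGIN = 1,
  FUN = function(x) suppressWarnings(cor(x[1:4], x[5:8]))
)
round(row_cor, 4)
```

The first element should reproduce the 0.9997302 computed by hand in the question, and genes with a low or NA value are the ones worth inspecting.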

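To correlate columns instead (a1 with b1, a2 with b2, and so on, across all genes), apply with MARGIN = 2 is not enough, because it passes your function only one column at a time. Use mapply to walk over the paired columns together:

l <- mapply(FUN = function(x, y) cor.test(x, y), df[1:4], df[5:8], SIMPLIFY = FALSE)
lapply(X = l, FUN = function(x) x$estimate)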
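
For pairing whole columns (a1 with b1, and so on) across all genes, one option is to use mapply over the two halves of the data frame and return a plain named vector. This is a hedged sketch: it assumes the first four columns are pipeline 'a' and the last four are pipeline 'b' in matching order, and the name col_cor and the "_vs_" labels are made up for illustration.

```r
# Parse the example table from the question into a data frame
df <- read.table(header = TRUE, text = "
a1  a2  a3  a4  b1  b2  b3  b4
480 770 601 953 469 750 588 944
0   0   0   0   0   0   0   0
3   13  9   12  3   12  9   12
0   2   4   3   0   14  3   2
0   0   11  0   0   0   11  0
165 292 162 313 180 368 116 368
")

# Pair the i-th 'a' column with the i-th 'b' column and correlate across genes
col_cor <- mapply(FUN = function(x, y) cor(x, y), df[1:4], df[5:8])

# Label each coefficient with the pair of columns it came from
names(col_cor) <- paste(names(df)[1:4], names(df)[5:8], sep = "_vs_")
col_cor
```

This gives one coefficient per sample rather than one per gene, so it answers a different question from the row-wise version: how well the two pipelines agree on each sample overall.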
