The OP hasn't mentioned what form they want in their output, but I'm entirely updating this answer with a possible solution.
First, some reproducible sample data to work with (that will actually work with t.test
).
set.seed(1)
mymat <- matrix(sample(100, 40, replace = TRUE),
ncol = 8, dimnames = list(
paste("gene", 1:5, sep = ""),
c("A", "A", "A", "B", "B", "B", "C", "C")))
mymat
# A A A B B B C C
# gene1 27 90 21 50 94 39 49 67
# gene2 38 95 18 72 22 2 60 80
# gene3 58 67 69 100 66 39 50 11
# gene4 91 63 39 39 13 87 19 73
# gene5 21 7 77 78 27 35 83 42
I've left all the hard work to the combn
function. Within the combn
function, I've made use of the FUN
argument to add a function that creates a vector of the t.test
"statistic" by each row (I'm assuming one gene per row). I've also added an attribute
to the resulting vector to remind us which columns were used in calculating the statistic.
temp <- combn(unique(colnames(mymat)), 2, FUN = function(x) {
out <- vector(length = nrow(mymat))
for (i in sequence(nrow(mymat))) {
out[i] <- t.test(mymat[i, colnames(mymat) %in% x[1]],
mymat[i, colnames(mymat) %in% x[2]])$statistic
}
attr(out, "NAME") <- paste(x, collapse = "")
out
}, simplify = FALSE)
The output of the above is a list
of vectors
. It might be more convenient to convert this into a matrix
. Since we know that each value in a vector represents one row, and each vector overall represents one column value combination (AB, AC, or BC), we can use that for the dimnames
of the resulting matrix
.
DimNames <- list(rownames(mymat), sapply(temp, attr, "NAME"))
final <- do.call(cbind, temp)
dimnames(final) <- DimNames
final
# AB AC BC
# gene1 -0.5407966 -0.5035088 0.157386919
# gene2 0.5900350 -0.7822292 -1.645448267
# gene3 -0.2040539 1.7263502 1.438525163
# gene4 0.6825062 0.5933218 0.009627409
# gene5 -0.4384258 -0.9283003 -0.611226402
Some manual verification:
## Should be the same as final[1, "AC"]
t.test(mymat[1, colnames(mymat) %in% "A"],
mymat[1, colnames(mymat) %in% "C"])$statistic
# t
# -0.5035088
## Should be the same as final[5, "BC"]
t.test(mymat[5, colnames(mymat) %in% "B"],
mymat[5, colnames(mymat) %in% "C"])$statistic
# t
# -0.6112264
## Should be the same as final[3, "AB"]
t.test(mymat[3, colnames(mymat) %in% "A"],
mymat[3, colnames(mymat) %in% "B"])$statistic
# t
# -0.2040539
Update
Building on @EDi's answer, here's another approach. It makes use of melt
from "reshape2" to convert the data into a "long" format. From there, as before, it's pretty straightforward subsetting work to get what you want. The output there is transposed in relation to the approach taken with the pure combn
approach, but the values are the same.
library(reshape2)
mymatL <- melt(mymat)
byGene <- split(mymatL, mymatL$Var1)
RowNames <- combn(unique(as.character(mymatL$Var2)), 2,
FUN = paste, collapse = "")
out <- sapply(byGene, function(combos) {
combn(unique(as.character(mymatL$Var2)), 2, FUN = function(x) {
t.test(value ~ Var2, combos[combos[, "Var2"] %in% x, ])$statistic
}, simplify = TRUE)
})
rownames(out) <- RowNames
out
# gene1 gene2 gene3 gene4 gene5
# AB -0.5407966 0.5900350 -0.2040539 0.682506188 -0.4384258
# AC -0.5035088 -0.7822292 1.7263502 0.593321770 -0.9283003
# BC 0.1573869 -1.6454483 1.4385252 0.009627409 -0.6112264
The first option is considerably faster, at least on this smaller dataset:
microbenchmark(fun1(), fun2())
# Unit: milliseconds
# expr min lq median uq max neval
# fun1() 8.812391 9.012188 9.116896 9.20795 17.55585 100
# fun2() 42.754296 43.388652 44.263760 45.47216 67.10531 100