Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
424 views
in Technique[技术] by (71.8m points)

plyr - R ddply with multiple variables

Here is a simple data frame for my real data set:

df <- data.frame(ID=rep(101:102,each=9),phase=rep(1:3,6),variable=rep(LETTERS[1:3],each=3,times=2),mm1=c(1:18),mm2=c(19:36),mm3=c(37:54))

I would like to first group by ID and variable, then for values(mm1, mm2, mm3), phase 3 is subtracted from all phases(phase1 to phase3), which would make mm(1-3) in phase 1 all -2, in phase 2 all -1, and phase 3 all 0.

R throws an error of "Error in Ops.data.frame(x, x[3, ]) : - only defined for equally-sized data frames" as I tried:

df1 <- ddply(df, .(ID, variable), function(x) (x - x[3,]))   

Any advice would be greatly appreciated. The output should be look like this:

ID phase variable mm1 mm2 mm3
101  1      A     -2  -2  -2
101  2      A     -1  -1  -1
101  3      A      0   0   0
101  1      B     -2  -2  -2
101  2      B     -1  -1  -1
101  3      B      0   0   0
101  1      C     -2  -2  -2
101  2      C     -1  -1  -1
101  3      C      0   0   0
102  1      A     -2  -2  -2
102  2      A     -1  -1  -1
102  3      A      0   0   0
102  1      B     -2  -2  -2
102  2      B     -1  -1  -1
102  3      B      0   0   0
102  1      C     -2  -2  -2
102  2      C     -1  -1  -1
102  3      C      0   0   0
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Okay, took me a little bit to figure out what you want, but here is a solution:

cols.to.sub <- paste0("mm", 1:3)
df1 <- ddply(
  df, .(ID, variable), 
  function(x) {
    x[cols.to.sub] <- t(t(as.matrix(x[cols.to.sub])) - unlist(x[x$phase == 3, cols.to.sub]))
    x
} ) 

This produces (first 6 rows):

    ID phase variable mm1 mm2 mm3
1  101     1        A  -2  -2  -2
2  101     2        A  -1  -1  -1
3  101     3        A   0   0   0
4  101     1        B  -2  -2  -2
5  101     2        B  -1  -1  -1
6  101     3        B   0   0   0

Generally speaking the best way to debug this type of issue is to put a browser() statement inside the function you are passing to ddply, so you can examine the objects at your leisure. Doing so would have revealed that:

  1. The data frame passed to your function includes the ID columns, as well as the phase columns, so your mm columns are not the first three (hence the need to define cols.to.sub)
  2. Even if you address that, you can't operate on data frames that have unequal dimensions, so what I do here is convert to matrix, and then take advantage of vector recycling to subtract the one row from the rest of the matrix. I need to t (transpose) because vector recycling is column-wise.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...