Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

dataframe - R: Combine multiple columns as pairs of column cells in same row

I'd like to combine multiple columns of a data frame into pairs of cells: for every row, each pair of cells in that row should become one row of the output. As an example, df1 should be transformed into df2.

df1

col1 col2 col3
1    2    3   
0    0    1

df2

c1  c2
1    2
1    3
2    3
0    0
0    1
0    1

The solution should scale to data frames with (many) more than three columns.

I considered melt/reshape/dcast but haven't found a solution yet. There are no NAs in the data frame. Thank you!

EDIT: Reshaping just produced errors, so I thought about

comb1 <- combn(df1[1,], 2)
comb2 <- t(comb1)

and then looping over all rows and appending the results. This is inefficient, considering there are 2 million rows.
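One way to avoid the per-row loop, sketched in base R under the assumption that all columns are numeric and there are at least two of them: compute the column-index pairs once with combn(), pull both cells of every pair for all rows at once by matrix indexing, then flatten row by row. The function name pair_cells is hypothetical and does not come from the thread.

```r
# Hypothetical helper (not from the thread): all within-row pairs, no row loop
pair_cells <- function(df) {
  m <- as.matrix(df)                      # assumes all columns are numeric
  idx <- combn(ncol(m), 2)                # 2 x k matrix of column-index pairs
  first  <- m[, idx[1, ], drop = FALSE]   # n x k: first cell of each pair
  second <- m[, idx[2, ], drop = FALSE]   # n x k: second cell of each pair
  # t() then as.vector() reads row by row, so each input row's pairs stay together
  data.frame(c1 = as.vector(t(first)), c2 = as.vector(t(second)))
}

df1 <- data.frame(col1 = c(1, 0), col2 = c(2, 0), col3 = c(3, 1))
pair_cells(df1)
#   c1 c2
# 1  1  2
# 2  1  3
# 3  2  3
# 4  0  0
# 5  0  1
# 6  0  1
```

The intermediate matrices are as large as the final result, so with 2 million rows and many columns memory, rather than speed, is likely to be the limiting factor.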

1 Reply

Here's the approach I would take.

Create a function that uses rbindlist from "data.table" and combn from base R. The function looks like this:

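library(data.table)

lengthener <- function(indf) {
  # One two-column data frame per pair of columns, in combn() order
  temp <- rbindlist(
    combn(names(indf), 2, FUN = function(x) indf[x], simplify = FALSE),
    # Stack by position; .id records which column pair each row came from
    use.names = FALSE, idcol = TRUE)
  # Within each pair, replace .id with the original row number, then sort
  # by it so all pairs from the same input row end up next to each other
  setorder(temp[, .id := sequence(.N), by = .id], .id)[, .id := NULL][]
}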

Here's the sample data from the other answer, and the application of the function on it:

df1 = as.data.frame(matrix(c(1,2,3,4,0,0,1,1), byrow = TRUE, nrow = 2))

lengthener(df1)
#     V1 V2
#  1:  1  2
#  2:  1  3
#  3:  1  4
#  4:  2  3
#  5:  2  4
#  6:  3  4
#  7:  0  0
#  8:  0  1
#  9:  0  1
# 10:  0  1
# 11:  0  1
# 12:  1  1
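To make the stack-then-reorder idea behind lengthener() explicit, here is a base-R sketch of the same steps, assuming data.table is not available. The name lengthener_base and the helper column row are hypothetical; the output columns are renamed to V1 and V2 so that rbind() can stack the pairs.

```r
# Hypothetical base-R counterpart of lengthener(): stack every column pair,
# then stable-sort by original row number so each row's pairs stay together
lengthener_base <- function(indf) {
  pairs <- combn(names(indf), 2, FUN = function(x) {
    out <- indf[x]                     # the two columns of one pair
    names(out) <- c("V1", "V2")        # common names so rbind() lines up
    out$row <- seq_len(nrow(indf))     # which input row each cell came from
    out
  }, simplify = FALSE)
  stacked <- do.call(rbind, pairs)
  # order() leaves ties in their original order, so pairs keep combn() order
  stacked <- stacked[order(stacked$row), c("V1", "V2")]
  rownames(stacked) <- NULL
  stacked
}

df1 <- as.data.frame(matrix(c(1,2,3,4,0,0,1,1), byrow = TRUE, nrow = 2))
lengthener_base(df1)
```

On this df1 it returns the same 12 pairs, in the same order, as lengthener(df1). The data.table version does the same work with rbindlist() and setorder(), which is where its speed comes from.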

Test it out on some larger data too:

set.seed(1)
M <- as.data.frame(matrix(sample(100, 100*100, TRUE), 100))
system.time(out <- lengthener(M))
#    user  system elapsed 
#    0.19    0.00    0.19 
out
#         V1 V2
#      1: 27 66
#      2: 27 27
#      3: 27 68
#      4: 27 66
#      5: 27 56
#     ---      
# 494996: 33 13
# 494997: 33 66
# 494998: 80 13
# 494999: 80 66
# 495000: 13 66
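The size of the result is predictable: each input row contributes one output row per pair of columns, i.e. n * choose(p, 2) rows for an n-row, p-column input. The helper below is hypothetical; the 10-column case is only an assumed width for the 2-million-row data mentioned in the question.

```r
# Hypothetical helper: rows produced for an n-row, p-column input
expected_rows <- function(n, p) n * choose(p, 2)

expected_rows(2, 4)      # 12, as in the small example
expected_rows(100, 100)  # 495000, the last row index printed for `out`
expected_rows(2e6, 10)   # 9e+07: 90 million rows if the real data had 10 columns
```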

For comparison, the timing of the other answer's row-by-row approach on the same data:

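funAMK <- function(indf) {
  # Number of column pairs per row, computed from the first row
  nrow_combn = nrow(t(combn(indf[1,], m = 2)))
  # Preallocate the full result: one row per (input row, column pair)
  nrow_df = nrow(indf) * nrow_combn
  df2 = data.frame(V1 = rep(0, nrow_df), V2 = rep(0, nrow_df))
  # Fill one block of nrow_combn rows per input row
  for(i in 1:nrow(indf)){
    df2[(((i-1)*nrow_combn)+1):(i*(nrow_combn)), ] = data.frame(t(combn(indf[i,], m = 2)))
  }
  df2
}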

> system.time(funAMK(M))
   user  system elapsed 
  16.03    0.16   16.37 

...