Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - data.table: string concatenation of .SD columns for each by-group value

I have a big data set with many variables that looks similar to this:

 > data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
     a b ID
  1: a A  1
  2: b B  1
  3: c C  1
  4: d D  2
  5: e E  2
  6: f F  2
  7: g G  2
  8: h H  3
  9: i I  3
 10: j J  3

I want to concatenate (with a newline character between them) all column values except ID for each value of ID, so the result should look like this:

     a b ID
  1: a A  1
     b B   
     c C   
  2: d D  2
     e E   
     f F   
     g G   
  3: h H  3
     i I   
     j J   

I found a link, "R Dataframe: aggregating strings within column, across rows, by group", which talks about how to do it for one column. How can I extend this to all columns in .SD?

To make it clear, I changed the separator from "\n" to ",", so the result should look like:

   a       b       ID
1: a,b,c   A,B,C   1
2: d,e,f,g D,E,F,G 2
3: h,i,j   H,I,J   3

1 Reply


You can concatenate all columns in .SD using lapply.

dt[, lapply(.SD, paste0, collapse=" "), by = ID]
##    ID       a       b
## 1:  1   a b c   A B C
## 2:  2 d e f g D E F G
## 3:  3   h i j   H I J
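The call above uses a table named dt that is never assigned in the thread. A minimal self-contained sketch, assuming dt is simply the question's sample data, might look like this; the .SDcols line is an optional extra, added only to illustrate how .SD can be limited to some columns:

```r
library(data.table)

# Assumption: `dt` is the toy table from the question, assigned to a name.
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# .SD holds every column except the grouping column ID;
# lapply() applies paste0(..., collapse = " ") to each of them per group.
res <- dt[, lapply(.SD, paste0, collapse = " "), by = ID]

# .SDcols restricts .SD to a chosen subset of columns
# (only `a` here; the choice is purely illustrative).
res_a <- dt[, lapply(.SD, paste0, collapse = " "), by = ID, .SDcols = "a"]
```

With many variables, .SDcols is one way to keep numeric or otherwise unsuitable columns out of the paste step.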

Using a newline character as the collapse argument instead of " " does work, but it does not print the way you seem to expect in your desired output.

dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
##    ID          a          b
## 1:  1    a\nb\nc    A\nB\nC
## 2:  2 d\ne\nf\ng D\nE\nF\nG
## 3:  3    h\ni\nj    H\nI\nJ
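To see that the newlines really are stored in the result even though the table printout does not break lines, one option is to write a cell out with cat(). This is only a sketch, again assuming dt is the sample data from the question:

```r
library(data.table)

# Assumption: `dt` is the toy table from the question.
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

res <- dt[, lapply(.SD, paste0, collapse = "\n"), by = ID]

# The stored string contains real newline characters:
# "a", newline, "b", newline, "c" is five characters long.
nchar(res$a[1])

# cat() writes the string out raw, so each value lands on its own line.
cat(res$a[1], "\n", sep = "")
```

So the data is already in the shape asked for; only the way a data.table is printed differs from the desired output.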

As pointed out in the comments by @Frank, the question has been changed to use "," as the separator instead of "\n". Of course you can just change the collapse argument to ",". If you want a space after each comma as well (", "), then the solution by @DavidArenburg using toString is preferable.

dt[, lapply(.SD, paste0, collapse=","), by = ID]
dt[, lapply(.SD, toString), by = ID]
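For comparison, here is a small sketch of how the two calls differ, assuming dt is the question's sample data. The third call is not from the thread; it is added only to show that toString() gives the same result as paste() with collapse = ", ":

```r
library(data.table)

# Assumption: `dt` is the toy table from the question.
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

with_comma <- dt[, lapply(.SD, paste0, collapse = ","), by = ID]
with_space <- dt[, lapply(.SD, toString), by = ID]

# Illustrative only: spelling out the separator that toString() uses.
via_paste <- dt[, lapply(.SD, paste, collapse = ", "), by = ID]

with_comma$a[1]                            # "a,b,c"
with_space$a[1]                            # "a, b, c"
identical(with_space$a, via_paste$a)       # TRUE
```

toString() is the shorter spelling when ", " is the separator you want; for any other separator, paste0 with an explicit collapse argument is needed.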
