Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
944 views
in Technique[技术] by (71.8m points)

r - data.table - group by all except one column

Can I group by all columns except one using data.table? I have a lot of columns, so I'd rather avoid writing out all the colnames.

The reason being I'd like to collapse duplicates in a table, where I know one column has no relevance.

library(data.table)

DT <- structure(list(N = c(1, 2, 2), val = c(50, 60, 60), collapse = c("A", 
"B", "C")), .Names = c("N", "val", "collapse"), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"))

> DT
   N val collapse
1: 1  50        A
2: 2  60        B
3: 2  60        C

That is, given DT, is there something like like DT[, print(.SD), by = !collapse] which gives:

> DT[, print(.SD), .(N, val)]
   collapse
1:        A
   collapse
1:        B
2:        C

without actually having to specify .(N, val)? I realise I can do this by copy and pasting the column names, but I thought there might be some elegant way to do this too.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

To group by all columns except one, you can use:

by = setdiff(names(DT), "collapse")

Explanation: setdiff takes the general form of setdiff(x, y) which returns all values of x that are not in y. In this case it means that all columnnames are returned except the collapse-column.


Two alternatives:

# with '%in%'
names(dt1)[!names(dt1) %in% 'colB']

# with 'is.element'
names(dt1)[!is.element(names(dt1), 'colB')]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...