Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Reshape long structured data.table into a wide structure using data.table functionality?

> library(data.table)
> A <- data.table(x = c(1,1,2,2), y = c(1,2,1,2), v = c(0.1,0.2,0.3,0.4))
> A
   x y   v
1: 1 1 0.1
2: 1 2 0.2
3: 2 1 0.3
4: 2 2 0.4
> B <- dcast(A, x~y)
Using v as value column: use value.var to override.
> B
  x   1   2
1 1 0.1 0.2
2 2 0.3 0.4

Apparently I can reshape a data.table from long to wide using, for example, dcast from the reshape2 package. But data.table comes with an overloaded bracket operator that offers parameters such as 'by' for grouping, which makes me wonder whether the same reshape can be achieved with this data.table-specific functionality.

Just one random example from the manual:

DT[,lapply(.SD,sum),by=x]

That looks awesome - but I don't fully understand the usage yet.
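
To make the manual's one-liner concrete, here is a minimal sketch on a made-up table. The names DT, a and b and the sample values are assumptions chosen for illustration; only the expression itself comes from the manual.

```r
library(data.table)

# Hypothetical toy table, not taken from the manual.
DT <- data.table(x = c("p", "p", "q"), a = c(1, 2, 3), b = c(10, 20, 30))

# .SD is the subset of DT for the current group and holds every column
# except the grouping column x; lapply() applies sum() to each of them.
res <- DT[, lapply(.SD, sum), by = x]
print(res)
```

The result has one row per distinct value of x and one summed column for each remaining column.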

I found neither a way nor an example for this, so maybe it is simply not possible, or not even meant to be. A definite "no, it is not possible because ..." would therefore also be a valid answer.


1 Reply

I'll pick an example with unequal groups so that it's easier to illustrate for the general case:

A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
A
   x y   v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0

The first step is to give every group of "x" the same number of entries. Here, x=1 has 3 values of y, but x=2 has only 2. So we first fill the missing combination x=2, y=3 with NA by joining A to the cross join of all unique x and y values:

setkey(A, x, y)
A[CJ(unique(x), unique(y))]
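
The join result is not shown above, so here is a small sketch of what this step should produce. It builds the cross join as a separate object first (the names grid and filled are assumptions for illustration), which is meant to be equivalent to the one-liner.

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)
setkey(A, x, y)

# CJ() builds every (x, y) combination, sorted by x and then y.
grid <- CJ(x = unique(A$x), y = unique(A$y))

# Joining the keyed table A to the grid keeps the matching rows and adds
# a row with v = NA for each combination that is missing from A.
filled <- A[grid]
print(filled)
```

After this step every value of x has the same three values of y, and the only NA in v sits at x=2, y=3.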

Now, to get it to wide format, we group by "x" and call as.list on v, so that each value within a group becomes its own column:

out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
   x  V1  V2  V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA

Now, you can set the names of the reshaped columns by reference with setnames as follows:

setnames(out, c("x", as.character(unique(A$y))))

   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
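
For reference, the steps above can be chained into one short script. This is a sketch rather than a definitive recipe: the names ys, grid, wide and wide2 are assumptions, and it uses sort(unique(A$y)) for the column names so that they follow the sorted order in which CJ lays out the y values. As a cross-check it also calls dcast on the data.table directly, which recent data.table versions provide as their own method.

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)
setkey(A, x, y)

# Step 1: pad every x group to the full set of y values (NA where missing).
ys   <- sort(unique(A$y))
grid <- CJ(x = unique(A$x), y = ys)

# Step 2: one row per x, one column per y value.
wide <- A[grid][, as.list(v), by = x]

# Step 3: rename V1, V2, ... by reference to the y values.
setnames(wide, c("x", as.character(ys)))
print(wide)

# Cross-check with data.table's own dcast method.
wide2 <- dcast(A, x ~ y, value.var = "v")
print(wide2)
```

Both tables should hold the same values; the dcast variant is shorter, while the manual version shows what happens at each step.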
