Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
451 views
in Technique[技术] by (71.8m points)

reshape - Convert R dataframe from long to wide format, but with unequal group sizes, for use with qcc

I would like to convert a dataframe from long format to a wide format, but with unequal group sizes.

The eventual use will be in 'qcc', which requires a data frame or a matrix with each row consisting of one group, using NA's in groups which have fewer samples.

The following code will create an example dataset, as well as show manual conversion to the desired format.

# This is an example of the initial data that I have
# * 10 sample measurements, over 3 groups with 3, 2, and 5 elements respectively
x <- rnorm(10)
x_df <- data.frame( time = c( rep('2001 Q1',3), rep('2001 Q2',2), rep('2001 Q3',5) ), measure = x )
x_df

# This is a manual conversion into the desired format
x_pad <- c( x[1:3], NA, NA, x[4:5], NA, NA, NA, x[6:10] )
x_matrix <- matrix( x_pad, nrow = 3, ncol = 5, byrow = TRUE, dimnames = list(c('2001 Q1','2001 Q2','2001 Q3')) )
x_matrix # desired format

# An example of how it will be used
library(qcc)
plot(qcc(x_matrix, type = 'xbar', plot = FALSE))

So, I'd like to convert this:

      time     measure
1  2001 Q1  0.14680685
2  2001 Q1  0.53593193
3  2001 Q1  0.56097974
4  2001 Q2 -1.48102689
5  2001 Q2  0.18150972
6  2001 Q3  1.72018147
7  2001 Q3 -0.08480855
8  2001 Q3 -2.23208877
9  2001 Q3 -1.15269107
10 2001 Q3  0.57975023

... to this ...

              [,1]        [,2]       [,3]      [,4]      [,5]
2001 Q1  0.1468068  0.53593193  0.5609797        NA        NA
2001 Q2 -1.4810269  0.18150972         NA        NA        NA
2001 Q3  1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502

There is probably an easy way (perhaps some usage of reshape or reshape2 casting that I'm not familiar with?), but a bunch of searching hasn't helped me so far.

Thanks for any help!

==========

From one of the solutions below, the following will generate the final qcc xbar plot, including group labels:

library(splitstackshape)
out_df <- dcast( getanID( x_df, 'time' ), time~.id, value.var='measure' )
qcc( out_df[,-1], type = 'xbar', labels = out_df[,1] )
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You'll need an intermediate variable that gives a "within-time" id. You can create it and reshape like this

library(tidyr)
library(dplyr)

group_by(X, time) %>%
  mutate(seq = 1:n()) %>%
  ungroup() %>%
  spread(seq, measure)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...