Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Converting a column of type 'list' to multiple columns in a data frame

I have a data frame with one column which is a list, like so:

> head(movies$genre_list)
[[1]]
[1] "drama"   "action"  "romance"
[[2]]
[1] "crime" "drama"
[[3]]
[1] "crime"   "drama"   "mystery"
[[4]]
[1] "thriller" "indie"  
[[5]]
[1] "thriller"
[[6]]
[1] "drama"  "family"

I want to convert this one column into multiple binary columns, one for each unique element across the lists (in this case, genres). I'm looking for an elegant solution that doesn't involve first finding out how many genres there are, then creating a column for each, and then checking each list element to populate the genre columns. I tried unlist, but it doesn't work on a column of lists in the way I want.

Thanks!



1 Reply


Here are a few approaches, all using this sample data:

movies <- data.frame(genre_list = I(list(
   c("drama",   "action",  "romance"),
   c("crime", "drama"),
   c("crime",   "drama",   "mystery"),
   c("thriller", "indie"),  
   c("thriller"),
   c("drama",  "family"))))

Update, years later....

You can use the mtabulate function from "qdapTools" or the unexported charMat function from my "splitstackshape" package.

Syntax would be:

library(qdapTools)
mtabulate(movies$genre_list)
#   action crime drama family indie mystery romance thriller
# 1      1     0     1      0     0       0       1        0
# 2      0     1     1      0     0       0       0        0
# 3      0     1     1      0     0       1       0        0
# 4      0     0     0      0     1       0       0        1
# 5      0     0     0      0     0       0       0        1
# 6      0     0     1      1     0       0       0        0

or

splitstackshape:::charMat(movies$genre_list, fill = 0)
#      action crime drama family indie mystery romance thriller
# [1,]      1     0     1      0     0       0       1        0
# [2,]      0     1     1      0     0       0       0        0
# [3,]      0     1     1      0     0       1       0        0
# [4,]      0     0     0      0     1       0       0        1
# [5,]      0     0     0      0     0       0       0        1
# [6,]      0     0     1      1     0       0       0        0

Update: A couple of more direct approaches

Improved option 1: Use table somewhat directly:

table(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),
      unlist(movies$genre_list, use.names = FALSE))
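
One caveat with the table approach: a record with no genres contributes nothing to the row index, so its row would be missing from the result, and a genre repeated within one record would be counted twice. The following is only a sketch of one way around both issues, assuming the row index is wrapped in a factor with explicit levels and each record is de-duplicated first. The sample data here (including the empty and the duplicated record) is hypothetical and not from the question.

```r
# Hypothetical sample data: record 3 has no genres, record 4 repeats one.
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  character(0),
  c("thriller", "thriller"))))

# De-duplicate within each record so every cell ends up as 0 or 1.
gl <- lapply(movies$genre_list, unique)
n  <- length(gl)

# A factor with explicit levels keeps rows that have no genres at all.
row_id <- factor(rep(seq_len(n), lengths(gl)), levels = seq_len(n))
genre  <- unlist(gl, use.names = FALSE)

tab <- table(row_id, genre)

# Convert the 2-D table to a data frame and bind it to the original data.
genre_cols <- as.data.frame.matrix(tab)
result <- cbind(movies, genre_cols)
```

Here genre_cols has one row per record, in the original order, so it lines up with movies when the two are bound together.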

Improved option 2: Use a for loop.

x <- unique(unlist(movies$genre_list, use.names=FALSE))
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))
for (i in seq_len(nrow(m))) {
  m[i, movies$genre_list[[i]]] <- 1
}
m
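
If the explicit loop is a concern for larger data, the same matrix can probably be filled in a single step with matrix indexing. This is an alternative sketch rather than part of the original answer: it builds a two-column (row, column) index with rep and match, and it sorts the genre names, which the loop version does not do.

```r
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

genres <- movies$genre_list
flat   <- unlist(genres, use.names = FALSE)

# Sorted unique genres become the column names (sorting is a choice made here).
x <- sort(unique(flat))
m <- matrix(0L, nrow = length(genres), ncol = length(x),
            dimnames = list(NULL, x))

# Each row of idx is a (record, genre column) pair to set to 1.
idx <- cbind(rep(seq_along(genres), lengths(genres)),
             match(flat, x))
m[idx] <- 1L
m
```

Indexing a matrix by a two-column integer matrix addresses individual cells, so every (record, genre) pair is set in one assignment.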

Below is the OLD answer

Convert the list to a list of tables (in turn converted to data.frames):

tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})

Use Reduce to merge the resulting list. If I understand your end goal correctly, this results in the transposed form of the result you are interested in.

merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
#      Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1   action        1       NA       NA       NA       NA       NA
# 2    drama        1        1        1       NA       NA        1
# 3  romance        1       NA       NA       NA       NA       NA
# 4    crime       NA        1        1       NA       NA       NA
# 5  mystery       NA       NA        1       NA       NA       NA
# 6    indie       NA       NA       NA        1       NA       NA
# 7 thriller       NA       NA       NA        1        1       NA
# 8   family       NA       NA       NA       NA       NA        1

Transposing and converting NA to 0 is pretty straightforward. Just drop the first column and reuse it as the column names for the new data.frame:

movie_genres <- setNames(data.frame(t(merged_tables[-1])), as.character(merged_tables[[1]]))
movie_genres[is.na(movie_genres)] <- 0
movie_genres
