Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.9k views
in Technique[技术] by (71.8m points)

r - Unnest daframe within tibble

I'm not sure how to create a reproducible dataset based on this situation. But below is the output from glimpse, where glimpse identifies 2 columns with item column is a data.frame object.

What I want is to have a rectangular tibble, which if item was just another tibble, I can simply unnest.

> glimpse(df)
Columns: 2
$ ID   <chr> "A", "B", "C", "D", 
$ item <df[,2]> <data.frame[25 x 2]>

> df
# A tibble: 149,826 x 2
   ID        item$link                                    $value             
   <chr>     <chr>                                        <chr>              
 1 A~ https://ex.com/api/now/                          941b821e6f9121004d
 2 B~ https://ex.com/api/now/                          6c5bdbd46fa9610080
 3 C~ https://ex.com/api/now/                          941b821e6f9121004d

The current workaround I use is by converting the data.frame column as tibble and bind it back to the rest of the table. But this is not ideal if I have many columns with this data.frame type. Notice how the important column identifier which is item is now lost as it only captures link and value. If there're multiple columns with data.frame type, this will generate an error since these columns will all be called link and value.

df <- bind_cols(df %>% dplyr::select(ID), as_tibble(dataset$item))

> df
# A tibble: 149,826 x 3
   number    link                                         value              
   <chr>     <chr>                                        <chr>              
 1 A~ https://ex.com/api/now/                          941b821e6f9121004d
 2 B~ https://ex.com/api/now/                          6c5bdbd46fa9610080
 3 C~ https://ex.com/api/now/                          941b821e6f9121004d
 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I think the approach below can solve your problem. Please see the inline comments.

library(dplyr)
library(tidyr)
library(stringr)

# a data.frame
dat1 <- data.frame(link = rep("https://ex.com/api/now/", 3),
                   value = c("941b821e6f9121004d",
                             "6c5bdbd46fa9610080",
                             "941b821e6f9121004d"))
# another data.frame
dat2 <- data.frame(link = rep("https://another/link", 3),
                   value = c("843rq4nrfhvr843938",
                             "35rth165bft1tsv15d",
                             "435grg53th1ht51gte"))

# the example data containing two columns which are data.frames
dat <- tibble(id = c("A","B", "C"),
              item1 = dat1,
              item2 = dat2)

# lets have a look
dat %>% glimpse
#> Rows: 3
#> Columns: 3
#> $ id    <chr> "A", "B", "C"
#> $ item1 <df[,2]> <data.frame[3 x 2]>
#> $ item2 <df[,2]> <data.frame[3 x 2]>

# use chop to bring the data.frame columns in a nested format
dat_chop <- dat %>% chop(c(item1, item2)) 

# if you need one specific colum just unnest it:
dat_chop %>% unnest(item1)
#> # A tibble: 3 x 4
#>   id    link                    value                       item2
#>   <chr> <chr>                   <chr>              <list<df[,2]>>
#> 1 A     https://ex.com/api/now/ 941b821e6f9121004d        [1 × 2]
#> 2 B     https://ex.com/api/now/ 6c5bdbd46fa9610080        [1 × 2]
#> 3 C     https://ex.com/api/now/ 941b821e6f9121004d        [1 × 2]

# if you need all columns, unnest it and specify a `names_repair` strategy:
dat_chop %>%
  unnest(c(item1, item2), names_repair = "universal") %>%
  # the above will suffice if you don't mind:
  # (1) the dots `...` in the column names
  # (2) the consecutive numbering of the variables
  # both can be changed with the following call to `rename_with`
  rename_with(~ str_replace_all(.x, "\.\.\.", "") %>%
                str_replace_all(., "\d$",
                                function(x) ((as.numeric(x)%/% 2))),
              .cols = -id)
# Now each number refers to the item number it originated from!

#> New names:
#> * link -> link...2
#> * value -> value...3
#> * link -> link...4
#> * value -> value...5
#> # A tibble: 3 x 5
#>   id    link1                value1           link2              value2         
#>   <chr> <chr>                <chr>            <chr>              <chr>          
#> 1 A     https://ex.com/api/… 941b821e6f91210… https://another/l… 843rq4nrfhvr84…
#> 2 B     https://ex.com/api/… 6c5bdbd46fa9610… https://another/l… 35rth165bft1ts…
#> 3 C     https://ex.com/api/… 941b821e6f91210… https://another/l… 435grg53th1ht5…

Created on 2021-01-13 by the reprex package (v0.3.0)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...