Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - How to decide which dataframes will be bundled into a list, based on dataframe-specific conditions

I have several dataframes that I want to merge. I'm looking for a scalable solution, and I've found a nice one that folds left_join() over a list with purrr::reduce(). So I do:

library(purrr)
library(dplyr)

df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)

dfs_to_merge <- list(df_a, df_b, df_c, df_d)

dfs_to_merge %>%
  reduce(left_join, by = "id")
#>   id
#> 1  1
#> 2  2
#> 3  3
#> 4  4
#> 5  5
#> 6  6
#> 7  7
#> 8  8

Created on 2021-01-25 by the reprex package (v0.3.0)
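As a side note on how reduce() behaves here: it folds the list from left to right, so the pipeline above should amount to nesting the joins by hand. A small sketch that checks this equivalence on the same four frames (the names via_reduce and by_hand are only for illustration):

```r
library(purrr)
library(dplyr)

df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)

# reduce() applies left_join pairwise: ((df_a + df_b) + df_c) + df_d
via_reduce <- reduce(list(df_a, df_b, df_c, df_d), left_join, by = "id")

# The same joins written out explicitly
by_hand <- left_join(df_a, df_b, by = "id") %>%
  left_join(df_c, by = "id") %>%
  left_join(df_d, by = "id")

identical(via_reduce, by_hand)
#> [1] TRUE
```

Because left_join() keeps every row of its left-hand table, the first frame in the list (here df_a) decides which ids appear in the result.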

But what if, for example, I wanted the inclusion of df_c in dfs_to_merge to depend on the value of a variable my_condition_df_c?

Example: not a scalable solution

If my_condition_df_c > 5, then include df_c in dfs_to_merge:

my_condition_df_c <- sample(1:10, 1)

if (my_condition_df_c > 5) {
  dfs_to_merge <- list(df_a, df_b, df_c, df_d)
} else {
  dfs_to_merge <- list(df_a, df_b, df_d)
}

dfs_to_merge %>%
  reduce(left_join, by = "id")

My Problem

Consider that I may have several dataframes to merge, and that each one of them may have its own condition that determines whether it should be passed forward for merging.

my_condition_df_a <- sample(1:100, 1) ## include df_a if my_condition_df_a > 65                      
my_condition_df_b <- sample(c("foo", "blah"), 1) ## include df_b if my_condition_df_b == "foo"       
my_condition_df_d <- sample(c(NA, 1, 2, 3, NA, 19), 1) ## include df_d if my_condition_df_d is not NA

How could I elegantly control which data frames get in and which do not? Using if-else blocks as I did above does not scale: it quickly turns into messy, unreadable code.


UPDATE: I made some progress


What I do is build a character vector of object names: the names of the dataframes that will (or will not) be included in the list later. Whether a name is added to this vector depends on a condition specific to each data frame.

dfs_to_merge_names <- c()

if (my_condition_df_a > 65) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_a")
} 

if (my_condition_df_b == "foo") {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_b")
} 

if (my_condition_df_c > 5) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_c")
} 

if (!is.na(my_condition_df_d)) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_d")
} 

mget(dfs_to_merge_names) %>% ## https://stackoverflow.com/a/45963957/6105259
  reduce(left_join, by = "id")

I would still be glad to hear ideas on whether this code could be made shorter and more concise.
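One possible way to shorten this, sketched here with fixed condition values in place of sample() so the result is reproducible: put every condition into a single named logical vector whose names match the data frame objects, then hand the selected names to mget(). The vector name include is an illustrative choice, not something from the question.

```r
library(purrr)
library(dplyr)

df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)

# Fixed example values instead of sample()
my_condition_df_a <- 80     # > 65 -> include
my_condition_df_b <- "foo"  # == "foo" -> include
my_condition_df_c <- 3      # not > 5 -> exclude
my_condition_df_d <- 19     # not NA -> include

# One named logical per data frame; names must match the object names
include <- c(
  df_a = my_condition_df_a > 65,
  df_b = my_condition_df_b == "foo",
  df_c = my_condition_df_c > 5,
  df_d = !is.na(my_condition_df_d)
)

# which() returns only the TRUE positions (NA conditions are dropped)
dfs_to_merge_names <- names(which(include))

# Look the objects up in the current environment and join them
merged <- mget(dfs_to_merge_names, envir = environment()) %>%
  reduce(left_join, by = "id")
```

If no condition is TRUE, reduce() receives an empty list and stops with an error, so a length check on dfs_to_merge_names may be worth adding in real code.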

Question from: https://stackoverflow.com/questions/65886231/how-to-decide-which-dataframes-will-be-bundled-into-a-list-based-on-dataframe-s


1 Reply


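Here's a stab that avoids mget. It relies on dfs_to_merge holding all four frames (df_a, df_b, df_c, df_d) in the same order as the conditions: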

conditions <- c(
  (my_condition_df_a > 65),
  (my_condition_df_b == "foo"),
  (my_condition_df_c > 5),
  (!is.na(my_condition_df_d))
)
reduce(dfs_to_merge[conditions], left_join, by = "id")
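A self-contained version of the positional approach above, with fixed condition values substituted for sample() so the outcome is predictable. The stopifnot() length check is an extra safeguard, not part of the original answer: R silently recycles a logical index that is shorter than the list, so a mismatch between conditions and frames would otherwise go unnoticed.

```r
library(purrr)
library(dplyr)

df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)
dfs_to_merge <- list(df_a, df_b, df_c, df_d)

# Fixed example values instead of sample()
my_condition_df_a <- 80      # > 65 -> keep df_a
my_condition_df_b <- "blah"  # != "foo" -> drop df_b
my_condition_df_c <- 7       # > 5 -> keep df_c
my_condition_df_d <- NA      # NA -> drop df_d

# One logical per frame, in the same order as dfs_to_merge
conditions <- c(
  my_condition_df_a > 65,
  my_condition_df_b == "foo",
  my_condition_df_c > 5,
  !is.na(my_condition_df_d)
)

# Guard against recycling of a too-short logical index
stopifnot(length(conditions) == length(dfs_to_merge))

kept <- dfs_to_merge[conditions]  # list(df_a, df_c)
result <- reduce(kept, left_join, by = "id")
```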

An alternative, if you can't be sure that the conditions are in the same order as the frames in the list: build a named list of frames, either manually or with lst():

dfs_to_merge <- list(df_a = df_a, df_b = df_b, df_c = df_c, df_d = df_d) # names can be anything
dfs_to_merge <- tibble::lst(df_a, df_b, df_c, df_d)                      # names are taken from the object names

Then, reusing your condition-building code:

dfs_to_merge_names <- c()

if (my_condition_df_a > 65) {
  dfs_to_merge_names <- c(dfs_to_merge_names, "df_a")
}
# ...
reduce(dfs_to_merge[dfs_to_merge_names], left_join, by = "id")
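A sketch that combines the named list with a named logical vector, so no if blocks are needed and the order in which the conditions are written no longer matters. It assumes the condition names match the list names exactly; the vector name include is illustrative. Selecting with %in% keeps the list's own order, which matters because the first frame passed to left_join() determines which rows survive.

```r
library(purrr)
library(dplyr)

df_a <- data.frame(id = 1:8)
df_b <- data.frame(id = 5:10)
df_c <- data.frame(id = 2:6)
df_d <- data.frame(id = 3:6)

# Named list of candidate frames
dfs_to_merge <- list(df_a = df_a, df_b = df_b, df_c = df_c, df_d = df_d)

# Fixed example values instead of sample()
my_condition_df_a <- 80
my_condition_df_b <- "blah"
my_condition_df_c <- 7
my_condition_df_d <- NA

# Conditions keyed by frame name, deliberately in a different order
include <- c(
  df_d = !is.na(my_condition_df_d),
  df_c = my_condition_df_c > 5,
  df_b = my_condition_df_b == "foo",
  df_a = my_condition_df_a > 65
)

# Keep the list's own order (df_a before df_c), whatever order include uses
selected <- dfs_to_merge[names(dfs_to_merge) %in% names(which(include))]

# reduce() cannot handle an empty list without .init, so guard for it
merged <- if (length(selected) > 0) {
  reduce(selected, left_join, by = "id")
} else {
  NULL
}
```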


...