Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


dataframe - Merge multiple variables in R

I have a dataset in which the same variable is stored in different columns for each subject. I want to merge them into a single column.

E.g., in this data frame there are two DVs, but each one is stored in a different column (A, B, or C) depending on the subject:

data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"))

How can I merge them into just two columns, so that the result is:

data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"), DV_1 = c(1,4,5), DV_2 = c(3,3,5))

You can use coalesce from dplyr:

library(dplyr)

df %>%
  mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C),
         DV_2 = coalesce(DV2_A, DV2_B, DV2_C))
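If dplyr is not available, the same idea (take the first non-NA value across the candidate columns, row by row) can be sketched in base R. This is an alternative to coalesce, not part of the original answer, and the helper name coalesce_base is hypothetical. Because ifelse() drops attributes, it is only meant for plain numeric or character vectors.

```r
# Example data from the question
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"))

# Hypothetical helper: fill the NAs in the running result from each later
# vector in turn, so every element ends up as the first non-NA value
coalesce_base <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

df$DV_1 <- coalesce_base(df$DV1_A, df$DV1_B, df$DV1_C)
df$DV_2 <- coalesce_base(df$DV2_A, df$DV2_B, df$DV2_C)
df$DV_1  # 1 4 5
df$DV_2  # 3 3 5
```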

If you have a lot of DV columns to combine, you might not want to type out every column name. In that case, you can first grep the column names for each DV, convert the names to symbols with rlang::syms, and then splice (!!!) the symbols into coalesce (advice from @hadley):

library(rlang)
# anchor the patterns so that "DV1" does not also pick up e.g. DV10_A
var_quo1 = syms(grep("^DV1_", names(df), value = TRUE))
var_quo2 = syms(grep("^DV2_", names(df), value = TRUE))

df %>%
  mutate(DV_1 = coalesce(!!! var_quo1),
         DV_2 = coalesce(!!! var_quo2))
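If the tidy-evaluation step feels heavy, a similar result can probably be had by passing the selected columns straight to coalesce with do.call. This is a sketch, not the approach from the answer; the helper coalesce_cols is hypothetical, and the anchored patterns "^DV1_" and "^DV2_" assume the DV<number>_<suffix> naming used in the question.

```r
library(dplyr)

df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"))

# Hypothetical helper: select the source columns for one DV by name pattern
# and coalesce them; unname() drops the column names so the columns are
# passed to coalesce() positionally
coalesce_cols <- function(data, pattern) {
  cols <- grep(pattern, names(data), value = TRUE)
  do.call(coalesce, unname(as.list(data[cols])))
}

df$DV_1 <- coalesce_cols(df, "^DV1_")
df$DV_2 <- coalesce_cols(df, "^DV2_")
```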

If you have a ton of DVs, you might not even want to type out all the coalesce lines. In that case, you can write a function that returns one combined DV column for a given DV number, then lapply it over the DV numbers and bind_cols the results together:

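DV_combine = function(num_DVs){
  # name of the combined output column, e.g. DV_1
  DV_name = sym(paste0("DV_", num_DVs))
  # all source columns for this DV, e.g. DV1_A, DV1_B, DV1_C
  # (anchored so that DV1 does not also match DV10_A, etc.)
  DV_syms = syms(grep(paste0("^DV", num_DVs, "_"), names(df), value = TRUE))

  # return a one-column data frame with the first non-NA value per row
  df %>%
    transmute(!!DV_name := coalesce(!!! DV_syms))
}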

bind_cols(df, lapply(1:2, DV_combine))

Result:

  ID DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C FACT DV_1 DV_2
1  1     1    NA    NA     3    NA    NA    A    1    3
2  2    NA     4    NA    NA     3    NA    B    4    3
3  3    NA    NA     5    NA    NA     5    C    5    5
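The lapply call above hard-codes 1:2. If the number of DVs is not known in advance, one option is to derive the DV numbers from the column names first. This is a sketch that assumes every source column is named DV<number>_<suffix>; the number extraction with sub() and the helper combine_dv (a variant of DV_combine that takes the data as an argument) are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(rlang)

df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"))

# Assumed naming scheme: DV<number>_<suffix>, e.g. DV1_A, DV2_C
dv_cols <- grep("^DV[0-9]+_", names(df), value = TRUE)
# Extract the distinct DV numbers from the names (here 1 and 2)
dv_nums <- sort(unique(as.integer(sub("^DV([0-9]+)_.*$", "\\1", dv_cols))))

# Hypothetical variant of DV_combine that reads from `data` instead of a global df
combine_dv <- function(data, num) {
  out_name <- sym(paste0("DV_", num))
  src_syms <- syms(grep(paste0("^DV", num, "_"), names(data), value = TRUE))
  data %>% transmute(!!out_name := coalesce(!!!src_syms))
}

# lapply passes each DV number as `num`, since `data` is supplied by name
result <- bind_cols(df, lapply(dv_nums, combine_dv, data = df))
```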

Note:

This method works for both numeric and character columns, but not for factors. Convert any factor columns to character before using it.
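One way to do that conversion for every factor column at once is mutate with across and where (available in dplyr 1.0.0 and later). The factor-valued DV columns below are made up for illustration; the DV columns in the question are numeric.

```r
library(dplyr)

# Illustrative data: DV columns stored as factors
df_f <- data.frame(ID = 1:3,
                   DV1_A = factor(c("x", NA, NA)),
                   DV1_B = factor(c(NA, "y", NA)),
                   DV1_C = factor(c(NA, NA, "z")))

df_f <- df_f %>%
  # Convert every factor column to character first
  mutate(across(where(is.factor), as.character)) %>%
  # Then coalesce as usual
  mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C))

df_f$DV_1  # "x" "y" "z"
```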

Data:

df = structure(list(ID = c(1, 2, 3), DV1_A = c(1, NA, NA), DV1_B = c(NA, 
4, NA), DV1_C = c(NA, NA, 5), DV2_A = c(3, NA, NA), DV2_B = c(NA, 
3, NA), DV2_C = c(NA, NA, 5), FACT = structure(1:3, .Label = c("A", 
"B", "C"), class = "factor")), .Names = c("ID", "DV1_A", "DV1_B", 
"DV1_C", "DV2_A", "DV2_B", "DV2_C", "FACT"), row.names = c(NA, 
-3L), class = "data.frame")
