Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

conditional statements - Delete duplicated rows in R with conditions in other columns

Here is a small subset of the data I have:

Id var1 var2
1   POS NA
1   NA  NEG
2   NEG NA
2   NA  NEG
3   POS NA
3   NA  NEG
4   POS POS
5   POS NA

My desired output:

Id var1 var2
1   POS  NEG
2   NEG  NEG
3   POS  NEG
4   POS  POS
5   POS  NA

I would simply like to remove the duplicated Ids and keep one row per unique Id, with the non-missing values of var1 and var2 combined into that row. Does anyone know how to do this? Help would be greatly appreciated. Thank you!

question from:https://stackoverflow.com/questions/65940513/delete-duplicated-rows-in-r-with-conditions-in-other-columns

1 Reply

You could try a solution with na.omit, which removes the NA values within each group. The examples below assume your data frame is named df.

In base R:

aggregate(. ~ Id,
          data = df,
          FUN = function(x) {
            y <- na.omit(x)           # drop NAs within the group
            y[length(y) == 0] <- NA   # nothing left: return NA instead
            y
          },
          na.action = "na.pass")      # keep rows that contain NA

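Note that y[length(y) == 0] <- NA ensures that a group with no non-missing values (such as var2 for Id 5) returns NA rather than character(0). Setting na.action = "na.pass" keeps the rows that contain NA; with the default, na.omit, aggregate would drop those rows before FUN is applied.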
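
To try the base R approach on the sample data from the question, a reproducible sketch might look like this. It assumes var1 and var2 are character columns (the question does not say whether they are characters or factors), and the name res is only for illustration.

```r
# Sample data from the question; character columns are an assumption.
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)

# Same call as above: one row per Id, NAs removed within each group.
res <- aggregate(. ~ Id,
                 data = df,
                 FUN = function(x) {
                   y <- na.omit(x)           # drop NAs within the group
                   y[length(y) == 0] <- NA   # all-NA group: return NA
                   y
                 },
                 na.action = "na.pass")      # keep rows that contain NA

print(res)
```

The result should have one row per Id, with var2 left as NA for Id 5 because that group has no non-missing value to keep.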


With dplyr:

library(dplyr)

df %>% 
  group_by(Id) %>%
  summarise(across(everything(), ~ first(na.omit(.))))

first() returns the first value left in each group after the NAs are removed, or NA if none remain. across(everything()) applies this to every non-grouping column.
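
Taking the first non-missing value silently discards any further values, so it relies on each Id having at most one distinct non-missing value per column. A rough way to check that assumption is sketched below in base R. The data here is hypothetical: it is a cut-down version of the question's sample with one conflicting row added for Id 1, and the names n_values and conflicts are made up for illustration.

```r
# Hypothetical data: Id 1 has two different non-missing values in var1.
df_check <- data.frame(
  Id   = c(1, 1, 1, 2, 2),
  var1 = c("POS", NA, "NEG", "NEG", NA),
  var2 = c(NA, "NEG", NA, NA, "NEG"),
  stringsAsFactors = FALSE
)

# Count the distinct non-missing values per Id and column.
n_values <- aggregate(. ~ Id,
                      data = df_check,
                      FUN = function(x) length(unique(x[!is.na(x)])),
                      na.action = na.pass)

# Ids where at least one column holds more than one distinct value.
conflicts <- n_values$Id[apply(n_values[-1] > 1, 1, any)]
print(conflicts)
```

If conflicts is empty, collapsing each Id to its first non-missing value loses no information.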


With data.table:

library(data.table)

setDT(df)[, lapply(.SD, na.omit), by = Id]
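
For a group in which a column is entirely NA (var2 for Id 5), na.omit returns a zero-length vector; depending on the data.table version, that may be padded with NA along with a warning. A variant that avoids relying on this is to take the first non-missing value explicitly. This is only a sketch: first_non_na is a hypothetical helper, not part of data.table, and the character columns are an assumption about the data.

```r
library(data.table)

# Sample data from the question; character columns are an assumption.
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value; indexing an empty
# vector with [1] yields NA, so all-NA groups return NA.
first_non_na <- function(x) x[!is.na(x)][1]

dt  <- as.data.table(df)   # copy, so df itself is left unchanged
res <- dt[, lapply(.SD, first_non_na), by = Id]
print(res)
```

Because the helper always returns exactly one value, every group contributes exactly one row.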
