Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
490 views
in Technique[技术] by (71.8m points)

sql - R equivalent of SELECT DISTINCT on two or more fields/variables

Say I have a dataframe df with two or more columns, is there an easy way to use unique() or other R function to create a subset of unique combinations of two or more columns?

I know I can use sqldf() and write an easy "SELECT DISTINCT var1, var2, ... varN" query, but I am looking for an R way of doing this.

It occurred to me to try ftable coerced to a dataframe and use the field names, but I also get the cross tabulations of combinations that don't exist in the dataset:

uniques <- as.data.frame(ftable(df$var1, df$var2))
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

unique works on data.frame so unique(df[c("var1","var2")]) should be what you want.

Another option is distinct from dplyr package:

df %>% distinct(var1, var2) # or distinct(df, var1, var2)

Note:

For older versions of dplyr (< 0.5.0, 2016-06-24) distinct required additional step

df %>% select(var1, var2) %>% distinct

(or oldish way distinct(select(df, var1, var2))).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...