Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

dplyr - How do I select all unique combinations of two columns in an R data frame?

I have a correlation matrix that I put into a data frame like this:

row | var1 | var2 | cor
1   | A    | B    | 0.6
2   | B    | A    | 0.6
3   | A    | C    | 0.4
4   | C    | A    | 0.4

Each result appears twice, once for each ordering of "var1" and "var2". I only need one row per pair, preferably the one with the alphabetically lower variable first (e.g. rows 1 and 3).

I've been experimenting with dplyr for two hours and reading old threads, but haven't found what I need.

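library(dplyr)   # select()
library(tibble)  # rownames_to_column()
library(tidyr)   # gather()

# get correlation of every concept versus every concept, in long format
data.cor <- data.jobs %>%
  select(-y, -X) %>%             # drop the non-concept columns
  as.matrix() %>%
  cor() %>%                      # square, symmetric correlation matrix
  as.data.frame() %>%
  rownames_to_column(var = "var1") %>%
  gather(var2, cor, -var1)       # one row per (var1, var2) ordering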
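Because a correlation matrix is symmetric, one way to avoid the duplicates is to blank out the diagonal and the lower triangle before reshaping, then let `gather()` drop the `NA`s via its `na.rm` argument. This is a minimal sketch; the three `mtcars` columns are a hypothetical stand-in for `data.jobs`:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical stand-in for data.jobs: three numeric columns from mtcars
m <- cor(mtcars[, c("mpg", "hp", "wt")])

# Blank out the diagonal and lower triangle so each pair survives exactly once
m[lower.tri(m, diag = TRUE)] <- NA

data.cor <- m %>%
  as.data.frame() %>%
  rownames_to_column(var = "var1") %>%
  gather(var2, cor, -var1, na.rm = TRUE)  # drop the blanked-out cells
```

This keeps each pair in the matrix's column order rather than alphabetical order; sorting the rows and columns of `m` by name first would give alphabetical order.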

I would like the output to look like this:

row | var1 | var2 | cor
1   | A    | B    | 0.6
3   | A    | C    | 0.4

I am trying to do this without resorting to a loop.
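Since a full correlation matrix lists every pair in both orders, a vectorised filter that keeps only the row where `var1` sorts before `var2` may be enough, with no loop. It also drops the diagonal (`A`/`A`) rows that `cor()` produces. A sketch on the example data above, assuming every pair really does appear in both orders (`dat_unique` is an illustrative name):

```r
library(dplyr)

# The example rows from the question
dat <- tibble(
  var1 = c("A", "B", "A", "C"),
  var2 = c("B", "A", "C", "A"),
  cor  = c(0.6, 0.6, 0.4, 0.4)
)

# Keep the ordering where var1 sorts before var2; this also removes
# self-pairs, since var1 < var2 is FALSE when the names are equal
dat_unique <- dat %>% filter(var1 < var2)
```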


1 Reply


Here's one way with the tidyverse:

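library(dplyr)
dat2 <- dat %>%
  # order-independent key per pair (smaller name first); keep the first row for each key
  filter(!duplicated(data.frame(pmin(var1, var2), pmax(var1, var2))))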


# A tibble: 2 x 3
  var1  var2    cor
  <chr> <chr> <dbl>
1 A     B     0.600
2 A     C     0.400

Data:

dat <- tibble::tibble(   # data_frame() is deprecated in favor of tibble()
  var1 = LETTERS[c(1,2,1,3)],
  var2 = LETTERS[c(2,1,3,1)],
  cor = c(0.6,0.6,0.4,0.4))

Note: cleaned up the logic thanks to @tmfmnk
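If the first row seen for a pair is the reversed one (e.g. `B`, `A`), filtering on `!duplicated(...)` keeps that reversed row. A variant that always writes the smaller name into `var1` builds the sorted pair explicitly and deduplicates with `distinct()`. This is a sketch; the helper columns `lo` and `hi` and the data `dat_rev` are illustrative names:

```r
library(dplyr)

# Same pairs as the answer's data, but with the reversed row (B, A) first
dat_rev <- tibble(
  var1 = c("B", "A", "A", "C"),
  var2 = c("A", "B", "C", "A"),
  cor  = c(0.6, 0.6, 0.4, 0.4)
)

dat_sorted <- dat_rev %>%
  # helper columns: the pair's names in sorted order
  mutate(lo = pmin(var1, var2), hi = pmax(var1, var2)) %>%
  # keep the first row for each unordered pair, retaining all columns
  distinct(lo, hi, .keep_all = TRUE) %>%
  # write the sorted names back into var1/var2 and drop the helpers
  mutate(var1 = lo, var2 = hi) %>%
  select(var1, var2, cor)
```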

...