Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Subset a data frame based on value pairs stored in independent ordered vectors

I have an R data frame that I need to subset based on the values in two of its columns. For example:

A <- c(1,2,3,3,5,1)
B <- c(6,7,8,9,8,8)
Value <- c(9,5,2,1,2,2)
DATA <- data.frame(A,B,Value)

This is how DATA looks:

A B Value
1 6     9
2 7     5
3 8     2
3 9     1
5 8     2
1 8     2

I want the rows whose (A,B) combination is either (1,6) or (3,8). These pairs are stored as two separate ordered vectors, one for A and one for B, matched by position:

AList <- c(1,3)
BList <- c(6,8)

I am trying to subset the data by checking whether the A column is in AList AND the B column is in BList:

DATA[(DATA$A %in% AList & DATA$B %in% BList),]

The subsetted result is shown below. In addition to the value pairs (1,6) and (3,8), I am also getting (1,8). The filter keeps every row whose A is anywhere in AList and whose B is anywhere in BList, so it matches all combinations of the two vectors rather than the position-wise pairs. How do I restrict it to just (1,6) and (3,8)?

A B Value
1 6     9
3 8     2
1 8     2
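
To make the cause of the extra row concrete, here is a minimal sketch of which pairs the two independent %in% tests accept. ACCEPTED is a hypothetical name introduced only for this illustration; expand.grid is base R.

```r
AList <- c(1, 3)
BList <- c(6, 8)

# Two independent %in% tests accept any A from AList paired with any B
# from BList, i.e. the full cross product of the two vectors.
# ACCEPTED is a hypothetical name used only for this illustration.
ACCEPTED <- expand.grid(A = AList, B = BList)
print(ACCEPTED)
```

The cross product has four pairs, including (1,8) and (3,6). Only (1,8) shows up in the result because (3,6) does not occur in DATA.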

This is my desired result:

A B Value
1 6     9
3 8     2

1 Reply

This is a job for merge:

KEYS <- data.frame(A = AList, B = BList)
merge(DATA, KEYS)

#   A B Value
# 1 1 6     9
# 2 3 8     2
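
By default, merge joins on every column name the two data frames share, which is why no by argument is needed here. A small sketch that spells the join columns out, which may be safer if DATA and KEYS could ever share a column that is not meant to be a key. RESULT is a hypothetical name; the data is the same as in the question.

```r
DATA <- data.frame(A = c(1, 2, 3, 3, 5, 1),
                   B = c(6, 7, 8, 9, 8, 8),
                   Value = c(9, 5, 2, 1, 2, 2))
KEYS <- data.frame(A = c(1, 3), B = c(6, 8))

# Name the join columns explicitly instead of relying on the default,
# which is the intersection of the two sets of column names.
# RESULT is a hypothetical name used only for this illustration.
RESULT <- merge(DATA, KEYS, by = c("A", "B"))
print(RESULT)
```

Note that merge sorts the result on the key columns by default and renumbers the rows, so the original row names of DATA are not kept.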

Edit: since the OP expressed a preference for a logical vector in the comments, I would suggest one of the following.

Use merge:

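df.in.df <- function(x, y) {
  # compare only on the columns the two data frames share
  common.names <- intersect(names(x), names(y))
  idx <- seq_len(nrow(x))
  x <- x[common.names]
  y <- y[common.names]
  # tag each row of x with its position so it survives the merge
  x <- transform(x, .row.idx = idx)
  # TRUE for every row of x whose key combination also occurs in y
  idx %in% merge(x, y)$.row.idx
}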

or interaction:

df.in.df <- function(x, y) {
  common.names <- intersect(names(x), names(y))
  # collapse each row's key columns into a single factor value, then compare
  interaction(x[common.names]) %in% interaction(y[common.names])
}

In both cases:

df.in.df(DATA, KEYS)
# [1] TRUE FALSE  TRUE FALSE FALSE FALSE
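
As a usage sketch, the logical vector can index DATA directly, which keeps the original row order and row names and also makes it easy to select the complement. The function below is the interaction version from above; KEEP, MATCHED and UNMATCHED are hypothetical names used only for this example.

```r
DATA <- data.frame(A = c(1, 2, 3, 3, 5, 1),
                   B = c(6, 7, 8, 9, 8, 8),
                   Value = c(9, 5, 2, 1, 2, 2))
KEYS <- data.frame(A = c(1, 3), B = c(6, 8))

# interaction-based version of df.in.df, repeated so the block runs on its own
df.in.df <- function(x, y) {
  common.names <- intersect(names(x), names(y))
  interaction(x[common.names]) %in% interaction(y[common.names])
}

# KEEP, MATCHED and UNMATCHED are hypothetical names for this illustration
KEEP <- df.in.df(DATA, KEYS)
MATCHED <- DATA[KEEP, ]      # rows whose (A, B) pair appears in KEYS
UNMATCHED <- DATA[!KEEP, ]   # all remaining rows
print(MATCHED)
```

Here MATCHED holds rows 1 and 3 of DATA, and UNMATCHED holds the other four.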
