Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Using R, how to filter a column to keep only the items contained in another vector?

I have a data frame like this one:

columna <- c(1,2,3)
columnb <- c("a b e", "c d", "a c d")
columnc <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
alldata <- data.frame(columna,columnb,columnc)
tokeep <- c("c", "e")

I would like to get the same alldata, with columnb modified so that it keeps only the strings found in tokeep.

Ideally, alldata$columnb would become

[ "e", "c", "c" ]

I first thought I could use something like

filter(alldata, alldata$columnb %in% tokeep)
alldata[which(alldata$columnb %in% tokeep), ]

but I can't find a solution.

Can someone guide me on this?



1 Reply


We can try using gsub to replace the characters we don't want with an empty string:

alldata$columnb <- gsub(paste0("[^", paste0(tokeep, collapse = "|"), "]"), "", alldata$columnb)

alldata
#  columna columnb    columnc
#1       1       e 2010-11-01
#2       2       c 2008-03-25
#3       3       c 2007-03-14

The regular expression which we are creating is

paste0("[^",paste0(tokeep, collapse = "|"), "]")

#[1] "[^c|e]"

which matches any single character other than c, | or e, so every other character is removed.
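
To see what this character class actually matches, here is a small check. The input string x is a made-up example, not taken from the question. Inside square brackets, | is treated as a literal character rather than as "or", so a pipe in the data would survive the substitution.

```r
# Hypothetical input that contains a literal pipe
x <- "a|c e"

# "|" inside [...] is a literal, so it is kept along with c and e
with_pipe <- gsub("[^c|e]", "", x)
with_pipe
# [1] "|ce"

# Without the "|" in the class, only c and e are kept
without_pipe <- gsub("[^ce]", "", x)
without_pipe
# [1] "ce"
```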

EDIT

As per Wiktor's comment, the | is not needed inside the brackets, so the regex should probably be built as

paste0("[^",paste0(tokeep,collapse = ""),"]")
#[1] "[^ce]"
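
Note that a character class works one character at a time, so the gsub approach only suits the case where every entry of tokeep is a single character. If the items could be longer, one possible alternative is to split each string into tokens and keep the matching ones. This is only a sketch: it assumes the items in columnb are separated by single spaces, and keep_tokens is a hypothetical helper name, not part of the original answer.

```r
columnb <- c("a b e", "c d", "a c d")
tokeep <- c("c", "e")

# Split each string on spaces, keep the tokens found in `keep`,
# and paste the survivors back together with a space
keep_tokens <- function(x, keep) {
  vapply(
    strsplit(x, " ", fixed = TRUE),
    function(tokens) paste(tokens[tokens %in% keep], collapse = " "),
    character(1)
  )
}

keep_tokens(columnb, tokeep)
# [1] "e" "c" "c"
```

With the data from the question this could be applied as alldata$columnb <- keep_tokens(as.character(alldata$columnb), tokeep); the as.character call guards against columnb having been stored as a factor.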
