Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - R dplyr filter based on matching search term with first letters of any word in select columns

I'm trying to filter rows on selected columns, keeping those in which a word starts with a given keyword or matches a particular regular expression. Here, I want every row with a word starting with "bio" or "15". The problem is that the search terms also occur in the middle of some words, such as "symbiotic" in the Name column and "161540" in the Code column.

**Name**                     **Code**
Biofuel is good          159403
Bioecological is good    161540
Probiotics is good       159883
Good is symbiotic        1877447

I tried the code below

Innov_filter <- Innov_Data %>% 
  select(everything()) %>% 
  filter(str_detect(str_to_lower(Name), "bio") | str_detect(str_to_lower(Code), "bio"))

This does not work, however, because it also keeps the last row, which does not meet either condition. I would appreciate help with a strict search that matches the search term only at the start of a word, not at any position within it.

Thanks


1 Reply


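We could use filter_all with any_vars, anchoring the pattern with ^ so that it matches only at the start of each value:

library(dplyr)
library(stringr)

# keep rows where any column starts with "bio" or "15" (case-insensitive)
df %>% 
   filter_all(any_vars(str_detect(str_to_lower(.), "^(bio|15)")))
#                  Name   Code
#1       Biofuel is good 159403
#2 Bioecological is good 161540
#3    Probiotics is good 159883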
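
To see why the anchor matters, here is a minimal sketch that runs both patterns on the Name values from the question. The vector name and the variable names are only illustrative.

```r
library(stringr)

# Name values from the question's sample data
name <- c("Biofuel is good", "Bioecological is good",
          "Probiotics is good", "Good is symbiotic")

# Unanchored: "bio" matches anywhere in the string
anywhere <- str_detect(str_to_lower(name), "bio")

# Anchored: "^bio" matches only at the very start of the string
at_start <- str_detect(str_to_lower(name), "^bio")

print(anywhere)  # TRUE TRUE TRUE TRUE
print(at_start)  # TRUE TRUE FALSE FALSE
```

The third row ("Probiotics is good") still appears in the filtered output above because its Code, 159883, starts with "15".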

NOTE: If the condition should apply only to a subset of columns, use filter_at
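
A rough sketch of the subset case, assuming a hypothetical extra column Note that should not be searched (it is not part of the original data). The second call uses if_any(), which recent dplyr versions (1.0.4 and later) offer as the replacement for the scoped filter_at/filter_all verbs. Treat it as one possible spelling, not the only one.

```r
library(dplyr)
library(stringr)

# Sample data from the answer, plus a hypothetical Note column
df2 <- data.frame(
  Name = c("Biofuel is good", "Bioecological is good",
           "Probiotics is good", "Good is symbiotic"),
  Code = c(159403, 161540, 159883, 1877447),
  Note = c("n/a", "n/a", "n/a", "biology lab"),
  stringsAsFactors = FALSE
)

# Test only Name and Code; Note is ignored even though row 4 starts with "bio"
res_at <- df2 %>%
  filter_at(vars(Name, Code),
            any_vars(str_detect(str_to_lower(.), "^(bio|15)")))

# Same filter written with if_any() (assumes dplyr >= 1.0.4)
res_if <- df2 %>%
  filter(if_any(c(Name, Code),
                ~ str_detect(str_to_lower(.x), "^(bio|15)")))

print(res_at$Name)
print(identical(res_at$Name, res_if$Name))
```

Both calls keep the first three rows. The fourth row is dropped because neither its Name nor its Code starts with a search term.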

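If we need to pick any word that starts with 'bio' anywhere in a sentence, use a word boundary (\b) instead of ^. Inside an R string the backslash must be doubled, so the pattern is written as "\\bbio":

df %>% 
   filter_all(any_vars(str_detect(str_to_lower(.), "\\bbio|^15")))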
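
As a small illustration of the word boundary, the sketch below uses three made-up sentences (not from the question's data) in which "bio" starts a later word, sits in the middle of a word, or starts the string.

```r
library(stringr)

# Hypothetical sentences, for illustration only
text <- c("Good is biofuel", "Good is symbiotic", "Biofuel is good")

# "\\b" in R source is the regex token \b (word boundary);
# a single "\b" in an R string is a backspace character instead
starts_word <- str_detect(str_to_lower(text), "\\bbio")

print(starts_word)  # TRUE FALSE TRUE
```

"symbiotic" is rejected because there is no word boundary between "m" and "b", which is the strict behaviour the question asks for.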

data

df <- structure(
  list(
    Name = structure(c(2L, 1L, 4L, 3L),
                     .Label = c("Bioecological is good", "Biofuel is good",
                                "Good is symbiotic", "Probiotics is good"),
                     class = "factor"),
    Code = c(159403, 161540, 159883, 1877447)
  ),
  class = "data.frame",
  row.names = c(NA, -4L)
)
