Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - match values in 2 columns with the corresponding position in another character column

An example dataframe:

example_df = data.frame(Gene.names = c("A", "B"),
                         Score = c("3.69,2.97,2.57,3.09,2.94",
                                   "3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83"),
                         ResidueAA = c("S", "Y"),
                         ResidueNo = c(3, 3),
                         Sequence = c("MSSYT", "MSSYTRAP") )

I want to check whether the character in the ResidueAA column matches the character found at position ResidueNo of the 'Sequence' column. The output should be another column, say 'Check', with a Yes or No.

This is working code:

example_df$Check <- sapply(seq_len(nrow(example_df)), function(i) {
  d <- example_df[i, ]
  substr(d$Sequence, d$ResidueNo, d$ResidueNo) == d$ResidueAA
})

Is there an easier or more elegant way to do this? Ideally, I want something that works within a dplyr pipe. Relatedly, how can I extract the corresponding value from the 'Score' column into a new column, say 'Score_1'?

Thanks

question from:https://stackoverflow.com/questions/65908899/match-values-in-2-columns-with-the-corresponding-position-in-another-character-c


1 Reply


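We can use substr directly. It is vectorized over the string and over the start and stop positions, so each row is compared at its own 'ResidueNo' without an explicit loop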

library(dplyr)
example_df  %>%
   mutate(Check = substr(Sequence, ResidueNo, ResidueNo) == ResidueAA)

-output

#  Gene.names                                   Score ResidueAA ResidueNo Sequence Check
#1          A                3.69,2.97,2.57,3.09,2.94         S         3    MSSYT  TRUE
#2          B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83         Y         3 MSSYTRAP FALSE
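
The question asks for a Yes/No column rather than TRUE/FALSE. A minimal sketch of one way to get that, assuming dplyr's if_else and a trimmed copy of the example data (the name checked is only for illustration):

```r
library(dplyr)

# Trimmed copy of the example data; stringsAsFactors = FALSE keeps the
# columns as character on R versions older than 4.0.
example_df <- data.frame(
  Gene.names = c("A", "B"),
  ResidueAA  = c("S", "Y"),
  ResidueNo  = c(3, 3),
  Sequence   = c("MSSYT", "MSSYTRAP"),
  stringsAsFactors = FALSE
)

# substr() compares each row at its own ResidueNo;
# if_else() then maps TRUE/FALSE to the requested labels.
checked <- example_df %>%
  mutate(Check = if_else(substr(Sequence, ResidueNo, ResidueNo) == ResidueAA,
                         "Yes", "No"))
```

Keeping the logical column is usually more convenient for later filtering; the labels are mainly useful for display.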

To create a new column with the matching 'Score', use match to get an index instead of == (which does an elementwise comparison) and use that index to extract the 'Score' element. Note that match looks each 'ResidueAA' up in the whole vector of substrings rather than row by row, so it returns the 'Score' of the first row whose substring equals it

example_df  %>%
    mutate(Score2 = Score[match(ResidueAA,
         substr(Sequence, ResidueNo, ResidueNo))])

-output

#Gene.names                                   Score ResidueAA ResidueNo Sequence
#1          A                3.69,2.97,2.57,3.09,2.94         S         3    MSSYT
#2          B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83         Y         3 MSSYTRAP
#                    Score2
#1 3.69,2.97,2.57,3.09,2.94
#2                     <NA>
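
The difference between match and == matters once a value appears in more than one row. A small base R sketch with made-up vectors (residue, found and scores are hypothetical stand-ins for 'ResidueAA', the extracted substrings and 'Score'):

```r
residue <- c("S", "S")              # hypothetical ResidueAA values
found   <- c("T", "S")              # hypothetical characters at ResidueNo
scores  <- c("1.0,2.0", "3.0,4.0")  # hypothetical Score strings

# match() searches the whole 'found' vector, so both rows point at row 2.
idx <- match(residue, found)
by_match <- scores[idx]

# == compares row by row, so only row 2 counts as a match.
same <- residue == found
by_row <- ifelse(same, scores, NA)
```

Here by_match gives row 1 the score of row 2, while by_row leaves row 1 as NA, which is usually what a per-row check is meant to do.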

Update

Based on the comments, we need to extract the element of 'Score' at position 'ResidueNo', but only when the character of 'Sequence' at that position equals 'ResidueAA'. This can be done row by row: after rowwise, split 'Score' on the comma with strsplit, take the first list element ([[1]]), and index it with 'ResidueNo'; rows that do not match get NA

example_df %>%
  rowwise() %>%
  mutate(Score2 = if (substr(Sequence, ResidueNo, ResidueNo) == ResidueAA)
    strsplit(Score, ",")[[1]][ResidueNo] else NA_character_) %>%
  ungroup()

-output

# A tibble: 2 x 6
#  Gene.names Score                                   ResidueAA ResidueNo Sequence Score2
#  <chr>      <chr>                                   <chr>         <dbl> <chr>    <chr> 
#1 A          3.69,2.97,2.57,3.09,2.94                S                 3 MSSYT    2.57  
#2 B          3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83 Y                 3 MSSYTRAP <NA>  
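
The same result can also be obtained without rowwise. A minimal sketch, assuming a small helper (nth_field is a hypothetical name, not part of dplyr) that picks the n-th comma-separated field of each string, and storing the value as numeric in 'Score_1' as the question suggests:

```r
library(dplyr)

example_df <- data.frame(
  Gene.names = c("A", "B"),
  Score      = c("3.69,2.97,2.57,3.09,2.94",
                 "3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83"),
  ResidueAA  = c("S", "Y"),
  ResidueNo  = c(3, 3),
  Sequence   = c("MSSYT", "MSSYTRAP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: n-th comma-separated field of each string in x.
# Returns NA where n is larger than the number of fields.
nth_field <- function(x, n) {
  mapply(function(parts, i) parts[i], strsplit(x, ",", fixed = TRUE), n)
}

# Keep the score only where the residue matches the sequence position.
scored <- example_df %>%
  mutate(Score_1 = if_else(substr(Sequence, ResidueNo, ResidueNo) == ResidueAA,
                           as.numeric(nth_field(Score, ResidueNo)),
                           NA_real_))
```

Because if_else evaluates both branches for every row, the field is extracted for all rows and then masked; on large data this tends to be faster than a rowwise pipeline, though that is not measured here.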

Another option is to use separate_rows to expand the data so that each score gets its own row, then group by 'Gene.names', summarise to get the corresponding 'Score2' element (similar to the previous solution), and join the result back to the original dataset

library(tidyr)
example_df %>%
    separate_rows(Score, sep = ",") %>%
    group_by(Gene.names) %>%
    summarise(Score2 = if (substr(first(Sequence), first(ResidueNo), first(ResidueNo)) ==
       first(ResidueAA)) Score[first(ResidueNo)] else
         NA_character_, .groups = 'drop') %>%
    right_join(example_df, by = "Gene.names")
