Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

tidyverse - R - Applying the Same Code to Multiple Columns

While cleaning my data, I have several dimension columns, each containing names, and I need to aggregate several metric columns by each of them. The same code has to be applied to every dimension column. I could copy and paste the same chunk of code ten times and change the column reference each time, but surely there is a simpler solution.

My research is leading me to believe I am missing something obvious with the sapply() function that I can't put my finger on.

Very basic reprex:

library(tidyverse)

player_1 <- c("Smith", "Adams", "Washington")
player_2 <- c("Johnson", "Jefferson", "Fuller")
player_3 <- c("Forman", "Hyde", "Kelso")
metric_1 <- 1:3
metric_2 <- 2:4
metric_3 <- 3:5

df <- data.frame(player_1, player_2, player_3, metric_1, metric_2, metric_3)

p1 <- df %>% 
  group_by(player_1) %>% 
  summarize_at(c("metric_1", "metric_2", "metric_3"), sum)

Is there a way to only have to type this "p1" code once but have R loop through my columns player_1, player_2, and player_3?

If I can provide more detail, please let me know.

question from:https://stackoverflow.com/questions/66056229/r-applying-the-same-code-to-multiple-columns


1 Reply


A couple of options:

  1. Use map() to iterate over each player column individually.
library(tidyverse)

cols <- paste0('player_', 1:3)

map(cols, ~ df %>%
  group_by(.data[[.x]]) %>%
  summarise(across(starts_with('metric'), sum)))

#[[1]]
# A tibble: 3 x 4
#  player_1   metric_1 metric_2 metric_3
#* <chr>         <int>    <int>    <int>
#1 Adams             2        3        4
#2 Smith             1        2        3
#3 Washington        3        4        5

#[[2]]
# A tibble: 3 x 4
#  player_2  metric_1 metric_2 metric_3
#* <chr>        <int>    <int>    <int>
#1 Fuller           3        4        5
#2 Jefferson        2        3        4
#3 Johnson          1        2        3

#[[3]]
# A tibble: 3 x 4
#  player_3 metric_1 metric_2 metric_3
#* <chr>       <int>    <int>    <int>
#1 Forman          1        2        3
#2 Hyde            2        3        4
#3 Kelso           3        4        5
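
If you want to pull out a single result by name (like the "p1" object in the question), one possible variation is to name the vector of columns before mapping over it. This is only a sketch: the object name res is made up for illustration, and the data is the toy data frame from the question.

```r
library(dplyr)
library(purrr)

# Same toy data as in the question
df <- data.frame(
  player_1 = c("Smith", "Adams", "Washington"),
  player_2 = c("Johnson", "Jefferson", "Fuller"),
  player_3 = c("Forman", "Hyde", "Kelso"),
  metric_1 = 1:3,
  metric_2 = 2:4,
  metric_3 = 3:5
)

cols <- paste0("player_", 1:3)

# set_names() with no second argument names the vector after its own values,
# so map() returns a list whose elements are named player_1, player_2, player_3
res <- cols %>%
  set_names() %>%
  map(~ df %>%
        group_by(.data[[.x]]) %>%
        summarise(across(starts_with("metric"), sum)))

# Summary grouped by the second player column
res$player_2
```

Each element of res is the same tibble that the unnamed map() call returns, so res$player_1 plays the role of p1.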

  2. Reshape the data to long format, then group by the player column name and its value.
df %>%
  pivot_longer(cols = starts_with('player')) %>%
  group_by(name, value) %>%
  summarise(across(starts_with('metric'), sum))

#  name     value      metric_1 metric_2 metric_3
#  <chr>    <chr>         <int>    <int>    <int>
#1 player_1 Adams             2        3        4
#2 player_1 Smith             1        2        3
#3 player_1 Washington        3        4        5
#4 player_2 Fuller            3        4        5
#5 player_2 Jefferson         2        3        4
#6 player_2 Johnson           1        2        3
#7 player_3 Forman            1        2        3
#8 player_3 Hyde              2        3        4
#9 player_3 Kelso             3        4        5
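
Since the question mentions sapply(), here is a rough base R equivalent of the first option, in case you would rather not depend on the tidyverse. It uses lapply() rather than sapply(), because sapply() tries to simplify its result, which is usually not what you want when each element is a data frame. The names player_cols, metric_cols and res_base are made up for this sketch.

```r
# Same toy data as in the question
df <- data.frame(
  player_1 = c("Smith", "Adams", "Washington"),
  player_2 = c("Johnson", "Jefferson", "Fuller"),
  player_3 = c("Forman", "Hyde", "Kelso"),
  metric_1 = 1:3,
  metric_2 = 2:4,
  metric_3 = 3:5
)

# Pick the grouping and metric columns by name pattern
player_cols <- grep("^player_", names(df), value = TRUE)
metric_cols <- grep("^metric_", names(df), value = TRUE)

# For each player column, sum every metric column within each player
res_base <- lapply(player_cols, function(col) {
  aggregate(df[metric_cols], by = df[col], FUN = sum)
})
names(res_base) <- player_cols

# Summary grouped by the first player column
res_base$player_1
```

The result is a named list of plain data frames, one per player column.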
