Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - Use of mutate in Summarise function using R

I have a data frame like the one shown below:

identifier date       from       to         type  shift_back_max shift_forward_max
   <chr>      <date>     <date>     <date>     <chr>          <dbl>             <dbl>
   11         2011-12-31 2011-01-01 2011-12-31 last             364                 0
   11         2009-07-11 2009-01-01 2009-12-31 last             191               173
   11         NA         NA         NA         last              NA                NA
   11         2013-05-21 2013-01-01 2013-12-31 last             140               224
   11         2017-06-06 2017-01-01 2017-12-31 last             156               208
   12         2014-04-03 2014-01-01 2014-12-31 NA                92               272
   12         2016-08-04 2016-01-01 2016-12-31 NA               216               149
   12         2014-03-05 2014-01-01 2014-12-31 NA                63               301
   13         2011-02-07 2011-01-01 2011-12-31 NA                37               327
   14         2014-04-04 2014-01-01 2014-12-31 first             93               271
   14         2011-01-01 2011-01-01 2011-12-31 first              0               364
   14         2016-06-21 2016-01-01 2016-12-31 first            172               193
   16         NA         NA         NA         NA                NA                NA
   17         NA         NA         NA         NA                NA                NA
   18         NA         NA         NA         NA                NA                NA
   19         NA         NA         NA         NA                NA                NA

I am trying the two scenarios below.

Scenario - 1 (across() wrapped in mutate(), inside summarize())

data %>%
   group_by(identifier) %>%
   summarize(shift_back_max = - min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE),
            mutate(across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x))))

Scenario - 2 (across() used directly inside summarize(), without mutate())

data %>%
   group_by(identifier) %>%
   summarize(shift_back_max = - min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE),
            across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))

Both scenarios produce the same output, as shown below. So what is the use of the mutate() call around across()? Is it just bad programming practice, or can it produce incorrect output in some specific case? I use the across() call to replace -Inf with -30 and Inf with 30. I have already applied Scenario 2 to my data of several million records. Do I have to rerun it because the output might be incorrect, or is this only a matter of bad programming practice?

Which of the two scenarios is the correct one? Does it mean the other scenario can produce incorrect output? Can you help me, please?

[Image: output of the two scenarios; not included]


1 Reply

I find the use of mutate() inside summarize() very confusing, and I don't really know what to expect from it (honestly, I'm surprised it even works). If I understand correctly, what you want to do is best expressed as Scenario - 3:

data %>%
   group_by(identifier) %>%
   # step 1: collapse to one row per identifier
   summarize(shift_back_max = - min(shift_back_max, na.rm = TRUE),
             shift_forward_max = min(shift_forward_max, na.rm = TRUE)) %>%
   ungroup() %>%
   # step 2: on the summarized table, replace -Inf / Inf with -30 / 30
   mutate(across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))

(That is, you first summarize by identifier, then apply the replacement to the whole summarized result.)
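
To see why the replacement step is needed at all, here is a minimal base R sketch on plain vectors. The names all_na, back, forward, x and capped are made up for illustration. It assumes a group whose shift values are all NA, like identifiers 16 to 19 in the question: with na.rm = TRUE, min() has nothing left to compare and returns Inf (with a warning).

```r
# A group where every value is NA, as for identifiers 16 to 19
all_na <- c(NA_real_, NA_real_)

# min() over nothing returns Inf and warns; the warning is silenced here
back    <- suppressWarnings(-min(all_na, na.rm = TRUE))  # -Inf
forward <- suppressWarnings(min(all_na, na.rm = TRUE))   #  Inf

# The same expression used inside across(), applied to a plain vector:
# infinite values become 30 times their sign, finite values are kept
x <- c(back, forward, -191, 0)
capped <- ifelse(is.infinite(x), 30 * sign(x), x)
print(capped)  # -30, 30, -191, 0
```

The replacement only looks at each value on its own, so it does not matter whether it runs per group or once on the summarized table.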

You can compare the results of the different approaches with all.equal(). I'd expect all of them to give the same result, but Scenarios 1 and 2 are not as clear to the reader.
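
As a rough sketch of such a check, the code below runs Scenario 3 on a small made-up table and compares it with a hand-written expected result. The objects toy, cap_inf and expected are hypothetical and not part of the question's data, and the sketch assumes dplyr 1.0.0 or later (for across()). To compare two scenarios on real data, put the other scenario's result in place of expected.

```r
library(dplyr)

# Hypothetical toy data: identifier 16 has only NA values
toy <- tibble(
  identifier        = c("11", "11", "11", "16"),
  shift_back_max    = c(364, 191, NA, NA),
  shift_forward_max = c(0, 173, NA, NA)
)

# Replace -Inf / Inf with -30 / 30, leave finite values untouched
cap_inf <- function(x) ifelse(is.infinite(x), 30 * sign(x), x)

# Scenario 3: summarize per identifier, then post-process the result.
# suppressWarnings() hides the warning min() emits for the all-NA group.
scenario_3 <- suppressWarnings(
  toy %>%
    group_by(identifier) %>%
    summarize(shift_back_max    = -min(shift_back_max, na.rm = TRUE),
              shift_forward_max = min(shift_forward_max, na.rm = TRUE)) %>%
    ungroup() %>%
    mutate(across(starts_with("shift"), cap_inf))
)

# Hand-computed result for the toy data
expected <- tibble(
  identifier        = c("11", "16"),
  shift_back_max    = c(-191, -30),
  shift_forward_max = c(0, 30)
)

print(all.equal(scenario_3, expected))
```

all.equal() returns TRUE when the two tables match and a description of the differences otherwise, so wrapping it in isTRUE() is the usual way to use it in a condition.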

