Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dplyr - Tried code in R with mutate_at and max() functions with own data. Warning messages come up: no non-missing arguments to max

I'm currently learning R from a book and was trying out the mutate_at function from dplyr. In this example I want to standardize the survey items to a scale from 0 to 1. To do this, we can divide each value by the (theoretical) maximum value of the scale.

The book example stats_test from the package "pradadata" works perfectly fine:

library(dplyr)   # mutate_at(), select(), %>%
library(tidyr)   # drop_na()

data(stats_test, package = "pradadata")
stats_test %>%
  drop_na() %>%
  mutate_at(.vars = vars(study_time, self_eval, interest),
            .funs = funs(prop = ./max(.))) %>%
  select(contains("_prop"))

Output:

   study_time_prop self_eval_prop interest_prop
             <dbl>          <dbl>         <dbl>
 1             0.6            0.7         0.667
 2             0.8            0.8         0.833
 3             0.6            0.4         0.167
 4             0.8            0.7         0.833
 5             0.4            0.6         0.5  
 6             0.4            0.6         0.667
 7             0.8            0.6         0.5  
 8             0.2            0.7         0.667
 9             0.6            0.8         0.833
10             0.6            0.7         0.833
# ... with 1,617 more rows

I tried the same code with my own data, but it doesn't work and I can't figure out why. The variable RG04 in my data ranges from 1 to 5. I converted it from numeric to integer, because the variables in the stats_test data are integers too:

df_literacy_2 <- transform(df_literacy, RG04 = as.integer(RG04))
df_literacy_2 <- tibble(df_literacy_2)


df_literacy_2 %>% 
  drop_na() %>% 
  mutate_at(.vars = vars(RG04),
            .funs = funs(prop = ./max(.))) %>% 
  select(contains("_prop"))

Output:

# A tibble: 0 x 0
Warning messages:
1: Problem with `mutate()` input `prop`.
i no non-missing arguments to max; returning -Inf
i Input `prop` is `RG04/max(RG04)`. 
2: In base::max(x, ..., na.rm = na.rm) :
  no non-missing arguments to max; returning -Inf


str(df_literacy_2$RG04)
int [1:630] 2 4 2 1 2 2 1 3 1 3 ...

Why doesn't it work on my data?

Thank you for your help.

Edit with sample of df_literacy:

> dput(head(df_literacy,20))
structure(list(CASE = c(40, 41, 44, 45, 48, 49, 54, 55, 56, 57, 
58, 61, 62, 63, 64, 65, 66, 67, 68, 69), SERIAL = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), REF = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), QUESTNNR = c("base", "base", 
"base", "base", "base", "base", "base", "base", "base", "base", 
"base", "base", "base", "base", "base", "base", "base", "base", 
"base", "base"), MODE = c("interview", "interview", "interview", 
"interview", "interview", "interview", "interview", "interview", 
"interview", "interview", "interview", "interview", "interview", 
"interview", "interview", "interview", "interview", "interview", 
"interview", "interview"), STARTED = structure(c(1607290462, 
1607290608, 1607291086, 1607291118, 1607291265, 1607291793, 1607294071, 
1607294336, 1607294337, 1607294419, 1607294814, 1607296474, 1607301809, 
1607329348, 1607333933, 1607335996, 1607336207, 1607336378, 1607343194, 
1607343414), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    EI01 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("Ja", 
    "Nein", "Nicht beantwortet"), class = "factor"), EI02 = c(2, 
    2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3), 
    RF01 = c(4, 2, 4, 3, 4, 4, 1, 3, 2, 3, 4, 3, 2, 3, 2, 2, 
    4, 2, 5, 3), RF02 = c(1, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 
    1, 1, 2, 2, 2, 2, 2, 2), RF03 = c(1, 2, 2, 2, 1, 2, 1, 1, 
    1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2), RG01 = c(2, 2, 2, 2, 
    2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2), RG02 = c(3, 
    3, 3, 3, 4, 3, 4, 2, 4, 2, 3, 4, 4, 2, 4, 3, 4, 3, 4, 4), 
    RG03 = c(3, 2, 2, 3, 3, 3, 1, 3, 1, 2, 3, 1, 2, 2, 1, 3, 
    2, 3, 2, 2), RG04 = c(2, 4, 2, 1, 2, 2, 1, 3, 1, 3, 2, 4, 
    1, 1, 1, 1, 1, 2, 4, 1), RG05 = c(1, 1, 1, 1, 1, 1, 1, 2, 
    1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1), SD01 = structure(c(2L, 
    1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 1L, 1L), .Label = c("weiblich", "männlich", "divers", 
    "nicht beantwortet"), class = "factor"), SD03 = c(4, 3, 2, 
    2, 1, 2, 4, 4, 1, 4, 3, 1, 2, 3, 2, 4, 2, 3, 1, 3), SD05_01 = c(23, 
    22, 22, 21, 18, 22, 21, 27, 17, 22, 17, 21, 21, 22, 50, 25, 
    23, 20, 23, 23), TIME001 = c(2, 3, 23, 73, 29, 2, 3, 3, 29, 7, 
    50, 55, 3, 2, 10, 2, 1, 5, 7, 35), TIME002 = c(2, 2, 16, 
    34, 12, 14, 2, 2, 21, 2, 30, 24, 21, 3, 3, 2, 3, 2, 3, 22
    ), TIME003 = c(34, 8, 12, 15, 13, 12, 12, 7, 13, 11, 16, 
    10, 11, 16, 8, 8, 7, 8, 11, 14), TIME004 = c(60, 33, 25, 
    31, 45, 25, 14, 13, 38, 35, 50, 50, 37, 32, 32, 25, 72, 55, 
    28, 29), TIME005 = c(84, 21, 29, 41, 54, 33, 30, 22, 32, 
    42, 44, 23, 65, 30, 28, 32, 51, 31, 27, 44), TIME006 = c(14, 
    9, 27, 11, 24, 8, 8, 9, 18, 12, 35, 33, 27, 46, 11, 15, 8, 
    14, 12, 14), TIME007 = c(3, 18, 3, 5, 6, 2, 9, 2, 3, 3, 6, 
    7, 3, 13, 4, 4, 378, 3, 4, 10), TIME_SUM = c(199, 94, 135, 
    142, 183, 96, 78, 58, 154, 112, 186, 152, 167, 142, 96, 88, 
    146, 118, 92, 168), MAILSENT = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    LASTDATA = structure(c(1607290661, 1607290702, 1607291221, 
    1607291328, 1607291448, 1607291889, 1607294149, 1607294394, 
    1607294491, 1607294531, 1607295045, 1607296676, 1607301976, 
    1607329490, 1607334030, 1607336084, 1607336727, 1607336496, 
    1607343286, 1607343582), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), FINISHED = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1), Q_VIEWER = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), LASTPAGE = c(7, 
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), 
    MAXPAGE = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
    7, 7, 7, 7, 7), MISSING = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
    7, 7, 7, 7, 7, 7, 0, 7, 7, 7), MISSREL = c(1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1), TIME_RSI = c("46023", 
    "14246", "0.75", "0.63", "0.54", "12055", "17533", "30682", 
    "0.7", "44197", "0.45", "0.58", "0.83", "44378", "44501", 
    "18629", "46753", "46388", "44197", "0.57"), DEG_TIME = c(27, 
    27, 3, 1, 0, 23, 30, 42, 2, 17, 0, 2, 7, 18, 10, 27, 43, 
    18, 8, 0)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Edit with counts of non-missing (FALSE) and missing (TRUE) values per column:

> sapply(df_literacy, function(a) table(c(T,F,is.na(a)))-1)
      CASE SERIAL REF QUESTNNR MODE STARTED EI01 EI02 RF01 RF02 RF03 RG01 RG02 RG03 RG04 RG05 SD01 SD03 SD05_01 TE03_01 TIME001 TIME002 TIME003
FALSE  630      0   0      630  630     630  630  630  630  630  630  630  630  630  630  630  629  629     615      99     630     630     630
TRUE     0    630 630        0    0       0    0    0    0    0    0    0    0    0    0    0    1    1      15     531       0       0       0
      TIME004 TIME005 TIME006 TIME007 TIME_SUM MAILSENT LASTDATA FINISHED Q_VIEWER LASTPAGE MAXPAGE MISSING MISSREL TIME_RSI DEG_TIME
FALSE     630     630     629     625      630        0      630      630      630      630     630     630     630      630      630
TRUE        0       0       1       5        0      630        0        0        0        0       0       0       0        0        0
Question from: https://stackoverflow.com/questions/65921711/tried-code-in-r-with-mutate-at-and-max-functions-with-own-data-warning-messag


1 Reply


There are a few things to correct here.

  1. drop_na() is removing all of your data.

    drop_na(df_literacy)
    # # A tibble: 0 x 37
    # # ... with 37 variables: CASE <dbl>, SERIAL <lgl>, REF <lgl>, QUESTNNR <chr>,
    # #   MODE <chr>, STARTED <dttm>, EI01 <fct>, EI02 <dbl>, RF01 <dbl>, RF02 <dbl>,
    # #   RF03 <dbl>, RG01 <dbl>, RG02 <dbl>, RG03 <dbl>, RG04 <dbl>, RG05 <dbl>,
    # #   SD01 <fct>, SD03 <dbl>, SD05_01 <dbl>, TIME001 <dbl>, TIME002 <dbl>,
    # #   TIME003 <dbl>, TIME004 <dbl>, TIME005 <dbl>, TIME006 <dbl>, TIME007 <dbl>,
    # #   TIME_SUM <dbl>, MAILSENT <lgl>, LASTDATA <dttm>, FINISHED <dbl>,
    # #   Q_VIEWER <dbl>, LASTPAGE <dbl>, MAXPAGE <dbl>, MISSING <dbl>,
    # #   MISSREL <dbl>, TIME_RSI <chr>, DEG_TIME <dbl>
    

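    The problem is that several columns are entirely NA, namely SERIAL, REF, and MAILSENT. With no arguments, drop_na() drops every row that contains an NA in any column, so no rows survive. max() is then called on an empty vector, which is what triggers the "no non-missing arguments to max" warning.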

    sapply(df_literacy, function(a) table(c(T,F,is.na(a)))-1)
    #       CASE SERIAL REF QUESTNNR MODE STARTED EI01 EI02 RF01 RF02 RF03 RG01 RG02
    # FALSE   20      0   0       20   20      20   20   20   20   20   20   20   20
    # TRUE     0     20  20        0    0       0    0    0    0    0    0    0    0
    #       RG03 RG04 RG05 SD01 SD03 SD05_01 TIME001 TIME002 TIME003 TIME004 TIME005
    # FALSE   20   20   20   20   20      20      20      20      20      20      20
    # TRUE     0    0    0    0    0       0       0       0       0       0       0
    #       TIME006 TIME007 TIME_SUM MAILSENT LASTDATA FINISHED Q_VIEWER LASTPAGE
    # FALSE      20      20       20        0       20       20       20       20
    # TRUE        0       0        0       20        0        0        0        0
    #       MAXPAGE MISSING MISSREL TIME_RSI DEG_TIME
    # FALSE      20      20      20       20       20
    # TRUE        0       0       0        0        0
    

    Either remove the drop_na() call, or at least exclude those columns from the check with drop_na(-SERIAL, -REF, -MAILSENT).

  2. Your code uses funs(), which has been deprecated since dplyr 0.8.0.

    # Warning: `funs()` is deprecated as of dplyr 0.8.0.
    # Please use a list of either functions or lambdas: 
    #   # Simple named list: 
    #   list(mean = mean, median = median)
    #   # Auto named with `tibble::lst()`: 
    #   tibble::lst(mean, median)
    #   # Using lambdas
    #   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
    

    While this isn't causing an error, it is causing a warning (and will likely stop working at some point). Change your mutate_at call to:

      mutate_at(.vars = vars(RG04, RF02),
                .funs = list(prop = ~ . / max(.)))
    
  3. You are using a single variable within .vars and a single function within .funs, so the variable name and the function name are not concatenated and you will not see a _prop column (the warning in the question refers to the new input simply as `prop`). From ?mutate_at:

         The names of the new columns are derived from the names of the
         input variables and the names of the functions.
    
            • if there is only one unnamed function (i.e. if '.funs' is an
              unnamed list of length one), the names of the input variables
              are used to name the new columns;
    
            • for _at functions, if there is only one unnamed variable
              (i.e., if '.vars' is of the form 'vars(a_single_column)') and
              '.funs' has length greater than one, the names of the
              functions are used to name the new columns;
    
            • otherwise, the new names are created by concatenating the
              names of the input variables and the names of the functions,
              separated with an underscore '"_"'.
    

    If you aren't going to add more variables and functions, then you need to self-name it in the call, as in mutate_at(.vars = vars(RG04 = RG04), ...). Oddly enough, this causes it to produce RG04_prop.
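
Point 1 can be reproduced on a small made-up tibble. This is only a sketch: the object `toy` and its columns `score` and `serial` are hypothetical stand-ins for df_literacy, RG04 and SERIAL, not part of the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: `score` is complete, `serial` is entirely NA
toy <- tibble(score = c(2, 4, 1), serial = c(NA, NA, NA))

# With no arguments, drop_na() removes every row that has an NA in any column,
# so the all-NA column wipes out the whole table
empty <- toy %>% drop_na()
nrow(empty)

# max() on the resulting zero-length vector returns -Inf and warns
# "no non-missing arguments to max; returning -Inf"
m <- suppressWarnings(max(empty$score))
m

# Excluding the all-NA column from the check keeps the rows
kept <- toy %>% drop_na(-serial)
nrow(kept)
```

Running `nrow()` after each `drop_na()` step is a quick way to spot this kind of silent data loss before any summary function is applied.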

If we fix all of those, then it works.

df_literacy %>%
  drop_na(-SERIAL, -REF, -MAILSENT) %>%
  mutate_at(.vars = vars(RG04 = RG04),
            .funs = list(prop = ~ ./max(.))) %>%
  select(contains("_prop")) %>%
  head(3)
# A tibble: 3 x 1
#   RG04_prop
#       <dbl>
# 1       0.5
# 2       1  
# 3       0.5
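
As an alternative sketch (not what the book or the code above uses), dplyr 1.0.0 and later offer across(), whose .names argument sets the output column names explicitly, so the naming rules from point 3 do not come into play. The tibble `toy` below is a hypothetical stand-in for df_literacy with made-up values.

```r
library(dplyr)

# Hypothetical stand-in for the survey data
toy <- tibble(RG04 = c(2, 4, 2, 1), RF02 = c(1, 1, 2, 2))

scaled <- toy %>%
  mutate(across(c(RG04, RF02),
                ~ .x / max(.x, na.rm = TRUE),   # divide by the observed maximum
                .names = "{.col}_prop")) %>%    # always appends "_prop"
  select(ends_with("_prop"))

names(scaled)
```

Note that max() is the observed maximum of the column. If the theoretical maximum of the scale is wanted, as described in the question (RG04 ranges from 1 to 5), the lambda could divide by that constant instead, for example `~ .x / 5`.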
