Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

dataframe - Subset matrix in mutate() function in R

Problem

I have a simple matrix:

library(tidyverse)

m <- matrix(seq(1,25), nrow = 5, ncol =5)
m
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   11   16   21
#> [2,]    2    7   12   17   22
#> [3,]    3    8   13   18   23
#> [4,]    4    9   14   19   24
#> [5,]    5   10   15   20   25

This matrix contains values that I would like to store as a third column in a data frame containing all combinations of indices:

library(tidyverse)
df <- expand_grid(V1 = 1:5, V2 = 1:5)

Attempt

df <- df %>%
    mutate(value = m[V1, V2])

This stores an entire matrix in the new column instead of just the corresponding value for each row.

Expected output

#> # A tibble: 25 x 3
#>       V1    V2 value
#>    <int> <int> <int>
#>  1     1     1     1
#>  2     1     2     6
#>  3     1     3    11
#>  4     1     4    16
#> and so on...

Question

How do I do this with mutate in R?

Note

I know that in this case I could just reshape the data. However, I would like to apply the same approach to a multidimensional array (e.g., mutate(value = m[V1, V2, V3])) with a data frame that contains only a subset of the index combinations. Hopefully the question is clear; otherwise, let me know!



1 Reply


Use cbind() to build an index matrix. Indexing with a matrix picks one element per row of indices:

df %>%
  mutate(value = m[cbind(V1, V2)])
# # A tibble: 25 x 3
#       V1    V2 value
#    <int> <int> <int>
#  1     1     1     1
#  2     1     2     6
#  3     1     3    11
#  4     1     4    16
#  5     1     5    21
#  6     2     1     2
#  7     2     2     7
#  8     2     3    12
#  9     2     4    17
# 10     2     5    22
# # ... with 15 more rows
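
To see why the two indexing forms behave differently, here is a minimal base R sketch. The index vectors i and j are illustrative values chosen for this example and are not taken from the question.

```r
# Same 5 x 5 matrix as in the question.
m <- matrix(1:25, nrow = 5, ncol = 5)

# Two index vectors of length 3 (illustrative values only).
i <- c(1, 2, 5)
j <- c(2, 3, 1)

# Separate vectors select every row in i crossed with every column in j,
# so the result is a 3 x 3 sub-matrix.
cross <- m[i, j]
dim(cross)
#> [1] 3 3

# A two-column matrix is read row by row as (row, column) pairs,
# so the result is a plain vector with one value per pair.
pairs <- m[cbind(i, j)]
pairs
#> [1]  6 12  5
```

Inside mutate(), V1 and V2 are full columns, so m[V1, V2] produces the crossed sub-matrix, while m[cbind(V1, V2)] produces one value per row.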

This also works with n-dimensional arrays (a matrix is just a 2-dimensional array, so this is a natural extension):

m <- array(seq_len(5^3), dim=c(5, 5, 5))
expand_grid(V1 = 1:2, V2 = 2:3, V3 = 3:4) %>%
  mutate(value = m[cbind(V1, V2, V3)])
# # A tibble: 8 x 4
#      V1    V2    V3 value
#   <int> <int> <int> <int>
# 1     1     2     3    56
# 2     1     2     4    81
# 3     1     3     3    61
# 4     1     3     4    86
# 5     2     2     3    57
# 6     2     2     4    82
# 7     2     3     3    62
# 8     2     3     4    87
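
The index combinations do not have to come from a grid. As a sketch, assuming a hand-picked set of combinations (the tibble idx and the column check are hypothetical names used only for this example), the same call works unchanged:

```r
library(tibble)
library(dplyr)

# 5 x 5 x 5 array as in the answer above.
m <- array(seq_len(5^3), dim = c(5, 5, 5))

# Hypothetical, hand-picked index combinations (not a full grid).
idx <- tibble(
  V1 = c(1L, 4L, 5L),
  V2 = c(1L, 2L, 5L),
  V3 = c(1L, 3L, 5L)
)

res <- idx %>%
  mutate(
    value = m[cbind(V1, V2, V3)],
    # Cross-check: the array holds 1..125 in column-major order,
    # so element [i, j, k] equals i + 5 * (j - 1) + 25 * (k - 1).
    check = V1 + 5L * (V2 - 1L) + 25L * (V3 - 1L)
  )

res$value
#> [1]   1  59 125
```

The check column is only there to confirm the lookup against the known layout of this particular array; it would not apply to arbitrary data.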

A note on the suggestion to use rowwise:

rowwise is useful in very specific situations, and it carries a not-insignificant performance penalty. Its utility is when the functions you need are not vectorized and instead need one value at a time from each of their inputs (zero or more columns). Often, I find it better to do that type of calculation explicitly using sapply/lapply/vapply/mapply (base R) or the purrr::map* variants. While the effect is essentially the same (the calculations are done element-wise), to me it is a little clearer, it allows non-row-wise calculations in the same mutate, and it avoids accidentally forgetting to ungroup the row-wise frame.
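
A small sketch of that comparison, assuming a non-vectorized helper. The function lookup_one and the column total are hypothetical and exist only to illustrate the point; for the actual lookup in this question the vectorized cbind form is still preferable.

```r
library(dplyr)
library(tidyr)
library(purrr)

m <- matrix(1:25, nrow = 5, ncol = 5)
df <- expand_grid(V1 = 1:5, V2 = 1:5)

# Hypothetical non-vectorized helper: handles one (row, column) pair per call.
lookup_one <- function(i, j) {
  stopifnot(length(i) == 1, length(j) == 1)
  m[i, j]
}

# Row-wise version: every row becomes its own group, so ungroup() afterwards.
via_rowwise <- df %>%
  rowwise() %>%
  mutate(value = lookup_one(V1, V2)) %>%
  ungroup()

# Explicit element-wise version: no grouping, so an ordinary
# whole-column calculation can sit in the same mutate().
via_map <- df %>%
  mutate(
    value = map2_int(V1, V2, lookup_one),
    total = sum(value)
  )
```

Both versions call lookup_one once per row and give the same value column. In the second, total is computed over the whole column, which would not be the case inside a row-wise frame.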

