Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to preserve column names when dynamically passing data frame columns to `aggregate`

Given a data frame like the one below:

df1 <- data.frame(a=seq(1.1,9.9,1.1), b=seq(0.1,0.9,0.1),
                  c=rev(seq(10.1, 99.9, 11.1)))

I want to aggregate columns b and c by a, so I would do something like this:

aggregate(cbind(b,c) ~ a, data = df1, mean)

This gets it done. However, I want to generalize it into a function that has no hard-coded column names:

myAggFunction <- function (df, col_main, col_1, col_2){
    return (aggregate(cbind(df[,col_1], df[,col_2]) ~ df[,col_main], df, mean))
    }
myAggFunction(df1, 1, 2, 3)

The issue I have is that the column names of the returned data frame are as below:

 df[, col_main]  V1   V2

How do I keep the column names of the original data frame in the returned data frame?


1 Reply

I will assume the general case, where you have multiple LHS (left-hand side) variables as well as multiple RHS (right-hand side) variables.


Using "data.frame" method

## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

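If you pass the data as named lists, the names are preserved. A data frame is a named list, so subset it with `df[j]`, which returns a data frame that keeps its column names, rather than with `df[, j]`, which drops a single column to an unnamed vector. You may construct your function as: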

## `LHS` and `RHS` are vectors of column names or numbers giving column positions
fun1 <- function (df, LHS, RHS){
  ## call `aggregate.data.frame`
  aggregate.data.frame(df[LHS], df[RHS], mean)
  }
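
To make the difference between the two subsetting styles concrete, here is a minimal sketch that applies `fun1` to the `df1` from the question. The definition of `fun1` is repeated only so that the block runs on its own; the variable `res` is just an illustrative name.

```r
## data frame from the question
df1 <- data.frame(a = seq(1.1, 9.9, 1.1), b = seq(0.1, 0.9, 0.1),
                  c = rev(seq(10.1, 99.9, 11.1)))

names(df1[, 2])   # NULL: `[, j]` with one column drops to a plain vector
names(df1[2])     # "b":  `[j]` keeps a one-column data frame

## same definition as `fun1` above, repeated so this block is self-contained
fun1 <- function (df, LHS, RHS){
  aggregate.data.frame(df[LHS], df[RHS], mean)
  }

## aggregate columns 2 and 3 (b, c) by column 1 (a)
res <- fun1(df1, LHS = 2:3, RHS = 1)
names(res)        # "a" "b" "c"
```

Because both arguments reach `aggregate.data.frame` as data frames, the grouping column and the aggregated columns keep the names they had in `df1`.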

Still using "formula" method?

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)

It is slightly tedious, but we want to construct a nice formula via:

as.formula( paste(paste0("cbind(", toString(LHS), ")"),
                  paste(RHS, collapse = " + "), sep = " ~ ") )

For example:

LHS <- c("y1", "y2", "y3")
RHS <- c("x1", "x2")
as.formula( paste(paste0("cbind(", toString(LHS), ")"),
                  paste(RHS, collapse = " + "), sep = "~") )
# cbind(y1, y2, y3) ~ x1 + x2

If you feed this formula to aggregate, the original column names are preserved in the result.
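
One caveat when pasting names into a formula: a column name that is not a syntactic R name (for example, one containing a space) will fail to parse. A possible workaround, sketched below, is to wrap every name in backticks before pasting. The helper `make_formula` and the column names in the second call are hypothetical and not part of the original answer; the sketch also assumes that no column name itself contains a backtick.

```r
## hypothetical helper: same construction as above, but each name is backquoted
make_formula <- function (LHS, RHS){
  bq <- function (x) paste0("`", x, "`")   # wrap each name in backticks
  as.formula( paste(paste0("cbind(", toString(bq(LHS)), ")"),
                    paste(bq(RHS), collapse = " + "), sep = " ~ ") )
  }

## ordinary names behave as before
f1 <- make_formula(c("y1", "y2", "y3"), c("x1", "x2"))
all.vars(f1)    # "y1" "y2" "y3" "x1" "x2"

## names with spaces would not parse without the backticks
f2 <- make_formula("total sales", "sales region")
all.vars(f2)    # "total sales" "sales region"
```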

So you can construct your function like this:

fun2 <- function (df, LHS, RHS){
  ## ideally, `LHS` and `RHS` are vectors of column names,
  ## but vectors of numeric column positions are also allowed
  if (is.numeric(LHS)) LHS <- names(df)[LHS]
  if (is.numeric(RHS)) RHS <- names(df)[RHS]
  ## make a formula
  form <- as.formula( paste(paste0("cbind(", toString(LHS), ")"),
                      paste(RHS, collapse = " + "), sep = " ~ ") )
  ## the generic dispatches on the formula to `aggregate.formula`
  aggregate(form, data = df, FUN = mean)
  }

Remark

aggregate.data.frame is the more direct choice. aggregate.formula is a wrapper: it calls model.frame internally to construct a data frame first, and then passes the result on to aggregate.data.frame.
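
One practical consequence of the model.frame step is that the two methods can treat missing values differently: the "formula" method defaults to `na.action = na.omit` (see its usage above), so a row with an NA in any variable is dropped before aggregation, while the "data.frame" method passes NA values in the aggregated columns straight to the function. A small sketch with made-up data (the data frame `d` and its columns are purely illustrative):

```r
## illustrative data: one NA in `y2`, in group "a"
d <- data.frame(g  = c("a", "a", "b", "b"),
                y1 = c(1, 3, 5, 7),
                y2 = c(10, NA, 30, 50))

## "data.frame" method: the NA reaches `mean`, so `y2` is NA for group "a"
by_df <- aggregate(d[c("y1", "y2")], d["g"], mean)
by_df$y1        # 2 6
by_df$y2        # NA 40

## "formula" method: row 2 is dropped entirely, which also changes `y1`
by_formula <- aggregate(cbind(y1, y2) ~ g, data = d, FUN = mean)
by_formula$y1   # 1 6
by_formula$y2   # 10 40
```

If the data may contain missing values, it is worth deciding explicitly which behaviour you want, for example by passing `na.rm = TRUE` through to `mean` or by setting `na.action`.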

I give the "formula" method as an option because this way of constructing a formula is also useful for lm, etc.
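
As a rough illustration of that last point, the same paste-based construction can be handed to lm. The sketch below uses the built-in `mtcars` data set, and the choice of response and predictor columns is arbitrary, picked only for the example.

```r
## arbitrary columns from the built-in `mtcars` data set
LHS <- c("mpg", "disp")
RHS <- c("wt", "hp")

## same construction as for `aggregate`
form <- as.formula( paste(paste0("cbind(", toString(LHS), ")"),
                          paste(RHS, collapse = " + "), sep = " ~ ") )

## a matrix response gives a multivariate linear model
fit <- lm(form, data = mtcars)
class(fit)             # "mlm" "lm"
colnames(coef(fit))    # "mpg" "disp"
```

As with aggregate, the response names survive because `cbind` is given bare column names inside the formula.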


Simple, reproducible example

set.seed(0)
dat <- data.frame(y1 = rnorm(10), y2 = rnorm(10),
                  x1 = gl(2,5, labels = letters[1:2]))

## "data.frame" method with `fun1`
fun1(dat, 1:2, 3)
#  x1          y1         y2
#1  a  0.79071819 -0.3543499
#2  b -0.07287026 -0.3706127

## "formula" method with `fun2`
fun2(dat, 1:2, 3)
#  x1          y1         y2
#1  a  0.79071819 -0.3543499
#2  b -0.07287026 -0.3706127

fun2(dat, c("y1", "y2"), "x1")
#  x1          y1         y2
#1  a  0.79071819 -0.3543499
#2  b -0.07287026 -0.3706127
