r - Can't get aggregate() work for regression by group

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I want to use aggregate with this custom function:

# Fit value ~ date and return the spread (max - min) of the fitted values
CalculateLinRegrDiff = function (sample){
  fit <- lm(value~ date, data = sample)
  diff(range(fit$fitted))
}

dataset2 = aggregate(value ~ id + col, dataset, CalculateLinRegrDiff(dataset))

I receive the error:

Error in get(as.character(FUN), mode = "function", envir = envir) : 
  object 'FUN' of mode 'function' was not found

What is wrong?

深蓝 · Answer 1 · 2021-10-23T21:31:44+0000

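First, your aggregate syntax is wrong. Pass the function CalculateLinRegrDiff itself to the FUN argument, not the evaluated call CalculateLinRegrDiff(dataset). That call is evaluated before aggregate runs and returns a single number, so aggregate receives a number where it expects a function, which is what the error message is complaining about.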
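
A minimal sketch of the difference, using a small hypothetical data frame df (not the asker's dataset) and the built-in mean in place of the custom function:

```r
# Hypothetical toy data: one numeric response and one grouping variable
df <- data.frame(value = c(1, 2, 3, 4, 5, 6),
                 g     = c("a", "a", "a", "b", "b", "b"))

# Correct: FUN receives the function object itself
ok <- aggregate(value ~ g, data = df, FUN = mean)
print(ok)  # ok$value is c(2, 5): the mean of each group

# Wrong: mean(df$value) is evaluated first and yields the number 3.5,
# so aggregate() gets a number instead of a function and stops with an error
bad <- tryCatch(aggregate(value ~ g, data = df, FUN = mean(df$value)),
                error = function(e) conditionMessage(e))
print(bad)
```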

Second, you've chosen the wrong tool: aggregate can't fit a regression by group. It splits the vector on the LHS of ~ according to the combinations of the variables on the RHS, and then applies FUN to each piece. In other words, FUN must be a function that works on an atomic vector, not on a data frame; mean, sd and quantile, for example, all take an atomic vector as input. CalculateLinRegrDiff expects a data frame, so it cannot work with aggregate.
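
To see what aggregate actually hands to FUN, this sketch (again on a hypothetical toy data frame df) uses an anonymous function that simply pastes its input together. Each call receives only the value entries of one group as a plain vector; the date column never reaches FUN, which is why a function that needs a whole data frame cannot be used here.

```r
# Hypothetical toy data with a response, a predictor and a grouping variable
df <- data.frame(value = c(1, 2, 3, 4, 5, 6),
                 date  = c(10, 20, 30, 40, 50, 60),
                 g     = c("a", "a", "a", "b", "b", "b"))

# FUN is called once per group with that group's LHS values only
res <- aggregate(value ~ g, data = df,
                 FUN = function(v) paste(v, collapse = " "))
print(res)  # res$value is c("1 2 3", "4 5 6"); no dates anywhere
```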

Note that we sometimes use cbind on the LHS, as in cbind(x, y) ~ f. This applies FUN separately to x ~ f and y ~ f: the LHS variables are processed independently and are never passed to FUN together.
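
A short sketch of the cbind form on hypothetical toy data: mean is applied to value and to date in separate calls, giving one output column per LHS variable. There is no call in which both columns are available at once, as lm() would need.

```r
# Hypothetical toy data
df <- data.frame(value = c(1, 2, 3, 4, 5, 6),
                 date  = c(10, 20, 30, 40, 50, 60),
                 g     = c("a", "a", "a", "b", "b", "b"))

# mean() runs on value ~ g and, independently, on date ~ g
res <- aggregate(cbind(value, date) ~ g, data = df, FUN = mean)
print(res)  # res$value is c(2, 5) and res$date is c(20, 50)
```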

The right tool here is the by function. It splits a data frame into sub-data-frames and applies FUN to each of them, which makes it ideal for regression by group.

by(dataset[c("value", "date")], dataset[c("id", "col")], CalculateLinRegrDiff)

A simple reproducible example:

set.seed(0)
dataset <- data.frame(value = runif(20), date = runif(20),
                      f = sample(gl(2, 10)), g = sample(gl(4, 5)))
oo <- by(dataset[c("value", "date")], dataset[c("f", "g")], CalculateLinRegrDiff)
str(oo)
# by [1:2, 1:4] 0.307 0.251 0.109 0.201 0.472 ...
# - attr(*, "dimnames")=List of 2
#  ..$ f: chr [1:2] "1" "2"
#  ..$ g: chr [1:4] "1" "2" "3" "4"

Because CalculateLinRegrDiff returns a single number for each group, by simplifies the result oo to an array rather than a list. This array looks like a contingency table, so we can use the "table" method of as.data.frame to reshape it into a data frame:

oo <- as.data.frame.table(oo)
oo
#  f g      Freq
#1 1 1 0.3069877
#2 2 1 0.2508591
#3 1 2 0.1087895
#4 2 2 0.2007295
#5 1 3 0.4715680
#6 2 3 0.4942069
#7 1 4 0.3223174
#8 2 4 0.4687340

The column name "Freq" may be unwanted, but it is easy to change, e.g. names(oo)[3] <- "foo".
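
As an alternative to renaming afterwards, as.data.frame.table() accepts a responseName argument. The sketch below runs the whole by() pipeline on a tiny hypothetical data set (toy) in which value is exactly linear in date within each group, so the expected differences (4 and 6) can be checked by hand. It uses fitted(fit), which extracts the same values as fit$fitted in the question.

```r
# Same computation as the question's helper
CalculateLinRegrDiff <- function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fitted(fit)))  # max - min of the fitted values
}

# Hypothetical toy data: value = 2 * date for "x" and 3 * date for "y"
toy <- data.frame(date  = c(1, 2, 3, 1, 2, 3),
                  value = c(2, 4, 6, 3, 6, 9),
                  f     = c("x", "x", "x", "y", "y", "y"))

oo <- by(toy[c("value", "date")], toy["f"], CalculateLinRegrDiff)
res <- as.data.frame.table(oo, responseName = "diff")
print(res)  # one row per group; res$diff is (numerically) c(4, 6)
```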

As I said in my comments under your question, we can also use split and lapply. But then there is no trivial way to convert the result into a good-looking data frame.

datlist <- split(dataset[c("value", "date")], dataset[c("f", "g")], drop = TRUE)
rr <- lapply(datlist, CalculateLinRegrDiff)
stack(rr)
#     values ind
#1 0.3069877 1.1
#2 0.2508591 2.1
#3 0.1087895 1.2
#4 0.2007295 2.2
#5 0.4715680 1.3
#6 0.4942069 2.3
#7 0.3223174 1.4
#8 0.4687340 2.4

I suggest you read "Linear Regression and group by in R" for thorough demonstrations of regression by group.
