Q1: Why is the behavior so strange?
This is actually a documented behavior at ?aggregate
(though it may still be unexpected). The relevant argument to look at would be simplify
.
If simplify
is set to FALSE
, aggregate
would produce a list
instead in a case like this.
res2 <- aggregate(airquality$Wind, list(airquality$Month), function (x)
quantile(x, c(0.9, 0.95, 0.975)), simplify = FALSE)
str(res2)
# 'data.frame': 5 obs. of 2 variables:
# $ Group.1: int 5 6 7 8 9
# $ x :List of 5
# ..$ 1 : Named num 16.6 17.5 18.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 32 : Named num 14.9 15.6 17.4
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 62 : Named num 14.3 14.6 14.9
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 93 : Named num 12.6 14.1 14.6
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 124: Named num 15 15.5 15.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
Now, both a matrix
and a list
as columns may seem to be strange behavior, but I presume it's more of a case of "status by design" rather than a "bug" or a "flaw".
For instance, consider the following: We want to aggregate both the "Wind" and the "Temp" columns from the "airquality" dataset, and we know that each aggregation would result in multiple columns (like we would expect with quantile
).
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
function (x) quantile(x, c(0.9, 0.95, 0.975)))
res3
# Month Wind.90% Wind.95% Wind.97.5% Temp.90% Temp.95% Temp.97.5%
# 1 5 16.6000 17.5000 18.8250 74.000 77.500 79.500
# 2 6 14.9000 15.5600 17.3650 87.300 91.100 92.275
# 3 7 14.3000 14.6000 14.9000 89.000 91.500 92.000
# 4 8 12.6000 14.0500 14.6000 94.000 95.000 96.250
# 5 9 14.9600 15.5000 15.8025 91.100 92.550 93.000
In some ways, keeping these values as matrix
-columns might make sense--the data aggregated data are easily accessible by their original column names:
res3$Temp
# 90% 95% 97.5%
# [1,] 74.0 77.50 79.500
# [2,] 87.3 91.10 92.275
# [3,] 89.0 91.50 92.000
# [4,] 94.0 95.00 96.250
# [5,] 91.1 92.55 93.000
Q2: How do you get the results as separate columns in a data.frame
?
But a list
as a column is just as awkward to deal with as a matrix
as a column in many cases. If you want to "flatten" your matrix
into columns, use do.call(data.frame, ...)
:
do.call(data.frame, res1)
# Group.1 x.90. x.95. x.97.5.
# 1 5 16.60 17.50 18.8250
# 2 6 14.90 15.56 17.3650
# 3 7 14.30 14.60 14.9000
# 4 8 12.60 14.05 14.6000
# 5 9 14.96 15.50 15.8025
str(.Last.value)
# 'data.frame': 5 obs. of 4 variables:
# $ Group.1: int 5 6 7 8 9
# $ x.90. : num 16.6 14.9 14.3 12.6 15
# $ x.95. : num 17.5 15.6 14.6 14.1 15.5
# $ x.97.5.: num 18.8 17.4 14.9 14.6 15.8a
Q3: Are there other alternatives?
As with most things R, yes of course. My preferred alternative would be to use the "data.table" package, with which you can do:
library(data.table)
as.data.table(airquality)[, as.list(quantile(Wind, c(.9, .95, .975))),
by = Month]
# Month 90% 95% 97.5%
# 1: 5 16.60 17.50 18.8250
# 2: 6 14.90 15.56 17.3650
# 3: 7 14.30 14.60 14.9000
# 4: 8 12.60 14.05 14.6000
# 5: 9 14.96 15.50 15.8025
str(.Last.value)
# Classes ‘data.table’ and 'data.frame': 5 obs. of 4 variables:
# $ Month: int 5 6 7 8 9
# $ 90% : num 16.6 14.9 14.3 12.6 15
# $ 95% : num 17.5 15.6 14.6 14.1 15.5
# $ 97.5%: num 18.8 17.4 14.9 14.6 15.8
# - attr(*, ".internal.selfref")=<externalptr>