There are a number of ways to do this, but I think all of them involve a sub-query. That's because you are comparing an aggregate (avg) to a per-row value (high) and then summing the result of that comparison.
I'd go with a sub-query where you compute an avg() window function partitioned by symbol. This gives you the group average on every row, and then you can run the query much as you have it. Kinda like this:
I am currently working in SQL Workbench/J against Amazon Redshift.
I am working on a query with the intent to identify the number of outliers within a data set.
My source data contains one record per day per symbol, and I am using 30 days of trailing data. In short, there are ten symbols with 30 records each.
I am then using the following query to calculate the mean, standard deviation, and upper/lower control limits for each unique symbol based upon the 30-day data set:
select symbol,
       avg(high) as MEAN,
       cast(stddev_samp(high) as dec(14,2)) STDV,
       (MEAN + STDV*3) as UCL,
       (MEAN - STDV*3) as LCL
from historical
group by symbol;
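The control-limit arithmetic (mean plus or minus three sample standard deviations) can be sanity-checked outside the database. A minimal Python sketch, using hypothetical 'high' values as stand-ins for the historical rows; note that statistics.stdev is the sample standard deviation, matching stddev_samp:

```python
from statistics import mean, stdev

# Hypothetical sample: a few 'high' values for two symbols
# (stand-ins for the 30 daily rows per symbol).
highs = {
    "AAA": [10.0, 11.0, 9.0, 10.5, 9.5],
    "BBB": [20.0, 22.0, 18.0, 21.0, 19.0],
}

for symbol, values in highs.items():
    m = mean(values)       # avg(high)
    s = stdev(values)      # stddev_samp(high)
    ucl = m + 3 * s        # upper control limit
    lcl = m - 3 * s        # lower control limit
    print(symbol, round(m, 2), round(s, 2), round(ucl, 2), round(lcl, 2))
```

For the "AAA" sample this prints a mean of 10.0 and a sample standard deviation of about 0.79, so the UCL lands near 12.37.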
My next step will be calculating how many individual values from the 'high' column exceed the upper control limit calculated value. I have tried to add the following count(case...) statement, but it is failing:
select symbol,
       avg(high) as MEAN,
       cast(stddev_samp(high) as dec(14,2)) STDV,
       (MEAN + STDV*3) as UCL,
       (MEAN - STDV*3) as LCL,
       count(case when high > group_avg then 1 end) as outlier
from (
    select *, avg(high) over (partition by symbol) as group_avg
    from historical
) h
group by symbol;

Note two details: the case expression has no else branch, because count() counts non-null values, and with "else 0" every row would be counted instead of just the rows above the average; and the sub-query needs an alias (h here), which Redshift requires.
(You could also replace "avg(high) as MEAN" with "min(group_avg) as MEAN" since you already computed the average in the window function. Just a possible slight optimization.)
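The two-step shape of that query, first attach the group average to every row (the window function), then count the rows exceeding it, can be mirrored in plain Python. A minimal sketch with hypothetical (symbol, high) rows standing in for the historical table:

```python
from statistics import mean

# Hypothetical rows: (symbol, high) pairs standing in for the historical table.
rows = [
    ("AAA", 10.0), ("AAA", 11.0), ("AAA", 9.0), ("AAA", 30.0),
    ("BBB", 20.0), ("BBB", 22.0), ("BBB", 18.0),
]

# Step 1: window-function equivalent -- the per-symbol average,
# conceptually attached to every row.
by_symbol = {}
for symbol, high in rows:
    by_symbol.setdefault(symbol, []).append(high)
group_avg = {symbol: mean(values) for symbol, values in by_symbol.items()}

# Step 2: count rows whose high exceeds the group average,
# i.e. count(case when high > group_avg then 1 end).
outliers = {}
for symbol, high in rows:
    outliers[symbol] = outliers.get(symbol, 0) + (high > group_avg[symbol])

print(outliers)  # → {'AAA': 1, 'BBB': 1}
```

Here "AAA" averages 15.0, so only the 30.0 row counts; "BBB" averages 20.0, so only the 22.0 row counts.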