Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
172 views
in Technique[技术] by (71.8m points)

sql - Calculating Outliers - Nested Aggregate Error

I am currently working SQL Workbench/J and Amazon Redshift.

I am working on a query with the intent to identify the number of outliers within a data set.

My source data contains one record per day for multiple symbols. I am utilizing 30 days of trailing data. In short, for 30 days there are ten symbols with 30 records each.

I am then utilizing the following query to calculate the mean, standard deviation, and upper/lower control limits for each unique symbol based upon the 30 day data set.

select
symbol,
avg(high) as MEAN,
cast(stddev_samp(high) as dec(14,2)) STDV,
(MEAN+STDV*3) as UCL,
(MEAN-STDV*3) as LCL
from historical
group by symbol
;

My next step will be calculating how many individual values from the 'high' column exceed the upper control limit calculated value. I have tried to add the following count(case...) statement, but it is failing:

select
symbol,
avg(high) as MEAN,
cast(stddev_samp(high) as dec(14,2)) STDV,
(MEAN+STDV*3) as UCL,
(MEAN-STDV*3) as LCL,
count(case when high>avg(high) then 1 else 0 end) as outlier
from historical
group by symbol
;

The specific error is

Amazon Invalid operation: aggregate function calls may not have nested aggregate or window function

Is a count(case..) statement the right method to utilize here, or what would the recommended approach or example be?

question from:https://stackoverflow.com/questions/65546489/calculating-outliers-nested-aggregate-error

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There are a number of ways to do this but I think all of them involve a sub-query. This is because you have an aggregate (avg) compared to a per-row value (high) and then summing the the comparison.

I'd go with a sub-query where you perform an avg() window function partitioned by symbol. This will give you the average of the group on every row then just do the query as you have it. Kinda like this:

I am currently working SQL Workbench/J and Amazon Redshift.

I am working on a query with the intent to identify the number of outliers within a data set.

My source data contains one record per day for multiple symbols. I am utilizing 30 days of trailing data. In short, for 30 days there are ten symbols with 30 records each.

I am then utilizing the following query to calculate the mean, standard deviation, and upper/lower control limits for each unique symbol based upon the 30 day data set.

select symbol, avg(high) as MEAN, cast(stddev_samp(high) as dec(14,2)) STDV, (MEAN+STDV3) as UCL, (MEAN-STDV3) as LCL from historical group by symbol ;

My next step will be calculating how many individual values from the 'high' column exceed the upper control limit calculated value. I have tried to add the following count(case...) statement, but it is failing:

select symbol, avg(high) as MEAN, cast(stddev_samp(high) as dec(14,2)) STDV, (MEAN+STDV3) as UCL, 
  (MEAN-STDV3) as LCL, count(case when high>group_avg then 1 else 0 end) as outlier
from (
  select *, avg(high) over (partition by symbol) as group_avg
  from historical ) 
group by symbol ;

(You could also replace "avg(high) as MEAN" with "min(group_avg) as MEAN" since you already computed the average in the window function. Just a possible slight optimization.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...