Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
346 views
in Technique[技术] by (71.8m points)

mysql - Difference between using SUM(column = "xxx") and COUNT() with WHERE

I am a bit confused about why I got a different answer by using SUM(column = "xxx") and COUNT() with WHERE. Their logic looks the same to me...

In case you guys want some background of the problem:

enter image description here

My query that got the wrong answer:

WITH clicked AS(
SELECT ad_id, COUNT(action) AS c_cnt
FROM Ads
WHERE action = "clicked"
GROUP BY ad_id),

viewed AS(
SELECT ad_id, COUNT(action) AS v_cnt
FROM Ads
WHERE action = "Viewed"
GROUP BY ad_id),

total AS(
SELECT ad_id, c_cnt/(c_cnt+v_cnt)*100 AS ctr
FROM clicked
JOIN viewed
USING(ad_id)
GROUP BY ad_id)

SELECT DISTINCT ad_id, IFNULL(ROUND(ctr,2),0) AS ctr
FROM Ads
LEFT JOIN total
USING(ad_id)
ORDER BY 2 DESC, 1 ASC

My query that got the right answer:

WITH clicked AS(
SELECT ad_id, SUM(action = "Clicked") AS c_cnt
FROM Ads
GROUP BY ad_id),

viewed AS(
SELECT ad_id, SUM(action = "Viewed") AS v_cnt
FROM Ads
GROUP BY ad_id),

total AS(
SELECT ad_id, c_cnt/(c_cnt+v_cnt)*100 AS ctr
FROM clicked
JOIN viewed
USING(ad_id)
GROUP BY ad_id)

SELECT DISTINCT ad_id, IFNULL(ROUND(ctr,2),0) AS ctr
FROM Ads
LEFT JOIN total
USING(ad_id)
ORDER BY 2 DESC, 1 ASC

The only different part is the part I highlighted in yellow: enter image description here

enter image description here

The picture is the test case that my first query did not pass.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The important difference between those two options is:

  • if you use count() ... where ... , only ad_ids that have a row with that property will be part of the resultset (e.g. all results will have a x_cnt-value > 0).
  • if you use SUM(action = "xxx"), all ad_ids will be part of the resultset, but some will have the x_cnt-value 0.

Your total-resultset now does a join, e.g. the where-option will remove specifically those ad_ids that are in clicked but not in viewed (e.g. which would have v_cnt = 0). Those are the ones with a 100% conversion rate, which in turn will give you the wrong result in your final step, where those will show as 0% (as they are not in the total-resultset).

You can fix your solution by using a left join:

total AS( 
  SELECT ad_id, c_cnt/(c_cnt+IFNULL(v_cnt,0))*100 AS ctr
  FROM clicked
  LEFT JOIN viewed
  USING(ad_id)
  GROUP BY ad_id, c_cnt, v_cnt)

Now the cases "v_cnt = 0, c_cnt > 0" are part of the total-resultset.

Note that this only works as expected as the correct result for "v_cnt > 0, c_cnt = 0" is 0. Those cases won't be part of the total-resultset, but your final select handles those since you use left join total and IFNULL(ctr,0). If you wanted (or in a situation where 0 isn't the right correction), you could include them to your total-resultset by using a full outer join (e.g. ... clicked left join viewed ... union ... viewed left join clicked ...).

Speaking of edge cases, the SUM-option has one too: for "v_cnt = 0, c_cnt = 0" (e.g. ad_id 5), your division /(c_cnt+v_cnt) will fail. Although MySQL will handle this (by setting it null), and you in turn handle MySQLs handling of it (by making it 0 in IFNULL(ctr,0)), you may want to be aware of it.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...