Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
278 views
in Technique[技术] by (71.8m points)

sql - Using group by and having clause

Using the following schema:

Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid**, quantity)
  1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.

  2. Get supplier numbers and names for suppliers of the same part to at least two different projects.

These were my answers:

1.

SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2 

2.

SELECT s.sid, s.name
FROM Suppliers s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2

Can anyone confirm if I wrote this correctly? I'm a little confused as to how the Group By and Having clause works

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The semantics of Having

To better understand having, you need to see it from a theoretical point of view.

A group by is a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.

The Having is simply equivalent to a WHERE clause after the group by has executed and before the select part of the query is computed.

Lets say your query is:

select a, b, count(*) 
from Table 
where c > 100 
group by a, b 
having count(*) > 10;

The evaluation of this query can be seen as the following steps:

  1. Perform the WHERE, eliminating rows that do not satisfy it.
  2. Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
  3. Eliminate subsets that do not satisfy the HAVING condition
  4. Process each subset outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.

You can extend this to any complex query there Table can be any complex query that return a table (a cross product, a join, a UNION, etc).

In fact, having is syntactic sugar and does not extend the power of SQL. Any given query:

SELECT list 
FROM table
GROUP BY attrList
HAVING condition;

can be rewritten as:

SELECT list from (
   SELECT listatt 
   FROM table 
   GROUP BY attrList) as Name
WHERE condition;

The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:

select a, b, count 
from (select a, b, count(*) as count
      from Table 
      where c > 100
      group by a, b) as someName
where count > 10;

The solution you need

Your solution seems to be correct:

SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2 

You join the three tables, then using sid as a grouping attribute (sname is functionally dependent on it, so it does not have an impact on the number of groups, but you must include it, otherwise it cannot be part of the select part of the statement). Then you are removing those that do not satisfy your condition: the satisfy pr.jid is >= 2, which is that you wanted originally.

Best solution to your problem

I personally prefer a simpler cleaner solution:

  1. You need to only group by Supplies (sid, pid, jid**, quantity) to find the sid of those that supply at least to two projects.
  2. Then join it to the Suppliers table to get the supplier same.

 SELECT sid, sname from
    (SELECT sid from supplies 
    GROUP BY sid, pid 
    HAVING count(DISTINCT jid) >= 2
    ) AS T1
NATURAL JOIN 
Supliers;

It will also be faster to execute, because the join is only done when needed, not all the times.

--dmg


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...