Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

database - crosstab: Counting the same string in a field and display it as field name

I would like to count how many times a specific string occurs in a field, in this example the PAYROLL_PAYMONTH field, and show each count as its own column. For example, I would count the number of 'HELLO' values in the field and display the total per group.

-- DATA 
EMP_SURNAME   PAYROLL_PAYYEAR    PAYROLL_PAYMONTH
    X              1999                JAN
    X              1999                JAN
    X              1999                FEB

-- OUTPUT 
EMP_SURNAME   PAYROLL_PAYYEAR       JAN   FEB   MAR
    X              1999              2     1     0

To count identical strings in a field and display the totals, I wrote a selectable stored procedure with GROUP BY in Firebird 3, using SQL Manager for Firebird:

CREATE PROCEDURE PAID_LISTING(
  SORT_PAYROLL_YEAR VARCHAR(50) CHARACTER SET ISO8859_1 COLLATE ISO8859_1)
RETURNS(
  EMP_SURNAME VARCHAR(50) CHARACTER SET ISO8859_1 COLLATE ISO8859_1,
  PAYROLL_PAYYEAR VARCHAR(50) CHARACTER SET ISO8859_1 COLLATE ISO8859_1,
  PAYROLL_MON_JAN VARCHAR(50) CHARACTER SET ISO8859_1 COLLATE ISO8859_1)
AS
BEGIN
  FOR
    SELECT
      B.EMP_SURNAME,
      A.PAYROLL_PAYYEAR,
      COUNT (A.PAYROLL_PAYMONTH)

    FROM PAYROLL A, EMP B
    WHERE A.EMP_PK = B.EMP_PK AND A.PAYROLL_YEAR = :SORT_PAYROLL_YEAR

    GROUP BY
      B.EMP_SURNAME,
      A.PAYROLL_PAYYEAR

    ORDER BY B.EMP_SURNAME ASC
    INTO
      :EMP_SURNAME,
      :PAYROLL_PAYYEAR,
      :PAYROLL_MON_JAN
  DO
    BEGIN
      SUSPEND;
    END
END;

But it does not give the result I want: it returns a single count per employee and year instead of one column per month. What should I do next?

1 Reply

What you want is called a "cross-table report" (crosstab) - https://en.wikipedia.org/wiki/Crosstab

The normal way to generate it is split into two steps:

  1. You run an ordinary one-dimensional query in the database, with a fixed number of columns, each column semantically different from the others rather than repeating one another. So years and months - JAN, FEB, etc. - go into different rows, not into adjacent columns.

  2. Then your client application turns the results of that 1D query into the desired two-dimensional table. Whatever language and libraries you build your client application with, they should provide the means to make cross-tables out of regular 1D queries.
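The second step could look roughly like the sketch below. It is only an illustration in Python (the question does not say which client language is used), and the names `crosstab`, `MONTHS` and `grouped` are hypothetical. It assumes the grouped query returns rows of (surname, year, month, count).

```python
from collections import defaultdict

# Columns to show in the report; extend to all twelve months as needed.
MONTHS = ["JAN", "FEB", "MAR"]

def crosstab(grouped_rows, months=MONTHS):
    """Pivot (surname, year, month, qty) rows into one row per (surname, year)."""
    # Every (surname, year) pair starts with a zero count for each month column.
    table = defaultdict(lambda: dict.fromkeys(months, 0))
    for surname, year, month, qty in grouped_rows:
        counts = table[(surname, year)]  # creates the row even if the month is not listed
        if month in counts:
            counts[month] += qty
    return [(surname, year, *(counts[m] for m in months))
            for (surname, year), counts in sorted(table.items())]

# Hypothetical result of the grouped (1D) query for the sample data in the question.
grouped = [("X", 1999, "FEB", 1), ("X", 1999, "JAN", 2)]
result = crosstab(grouped)
print(result)  # [('X', 1999, 2, 1, 0)]
```

Months with no rows simply stay at zero, so the database never has to produce empty cells. Libraries such as pandas offer ready-made pivoting as well, if one is already part of your application.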

The thing is, the database is a tool to store and extract data, not to make it eye-candy. Your client application is the tool to render the data in ways that are easy and nice to look at. "Divide et impera": use every tool for the task it was created and optimized for. Forcing an SQL server to do visual presentation, while possible, would be a "quest for glory", both unnatural and relatively slow.

However, if you intend to implement it in pure SQL regardless of its inefficiency, you can use CTEs (common table expressions) for it.

Again, "divide et impera": split your complex task into smaller, simpler ones. I will work with the sample data you put in your question.

CREATE TABLE DATA (
  EMP_SURNAME VARCHAR(10) NOT NULL,
  PAYROLL_PAYYEAR SMALLINT NOT NULL,
  PAYROLL_PAYMONTH CHAR(3) NOT NULL);


/*
EMP_SURNAME   PAYROLL_PAYYEAR    PAYROLL_PAYMONTH
  X              1999                JAN
  X              1999                FEB
  X              1999                JAN
*/

You have to take three steps.

  1. Fold the data - count the rows per month. This is the usual GROUP BY query, and normally it would be the only one, as the cross-tabbing would be done by your application from its results.

  2. Make the "skeleton" list of the rows your result table will contain. Here that means all PERSON+YEAR pairs for which there is any data. This skips any year that has no data for any month.

  3. Join the results of those queries together and align them horizontally, column by column, instead of the row-under-row structure that is normal for SQL.

Here we go.

Step 1:

 select EMP_SURNAME, PAYROLL_PAYYEAR, PAYROLL_PAYMONTH, Count(*) as QTY
 from DATA 
 group by EMP_SURNAME, PAYROLL_PAYYEAR, PAYROLL_PAYMONTH


EMP_SURNAME   PAYROLL_PAYYEAR   PAYROLL_PAYMONTH   QTY
X             1999              FEB                1
X             1999              JAN                2

Step 2:

select distinct EMP_SURNAME, PAYROLL_PAYYEAR from DATA

EMP_SURNAME   PAYROLL_PAYYEAR
X             1999

Step 3:

with EMP_YEAR as ( select distinct EMP_SURNAME, PAYROLL_PAYYEAR from DATA )
,GROUPED as
(
  select EMP_SURNAME, PAYROLL_PAYYEAR, PAYROLL_PAYMONTH, Count(*) as QTY
  from DATA group by EMP_SURNAME, PAYROLL_PAYYEAR, PAYROLL_PAYMONTH
)

select EMP_YEAR.EMP_SURNAME, EMP_YEAR.PAYROLL_PAYYEAR
  ,coalesce( emp_jan.qty, 0) as JAN
  ,coalesce( emp_feb.qty, 0) as FEB
  ,coalesce( emp_mar.qty, 0) as MAR
from EMP_YEAR
left join GROUPED as EMP_JAN on
     EMP_YEAR.EMP_SURNAME = EMP_JAN.EMP_SURNAME and
     EMP_YEAR.PAYROLL_PAYYEAR = EMP_JAN.PAYROLL_PAYYEAR and
     EMP_JAN.PAYROLL_PAYMONTH = 'JAN'
left join GROUPED as EMP_FEB on
     EMP_YEAR.EMP_SURNAME = EMP_FEB.EMP_SURNAME and
     EMP_YEAR.PAYROLL_PAYYEAR = EMP_FEB.PAYROLL_PAYYEAR and
     EMP_FEB.PAYROLL_PAYMONTH = 'FEB'
left join GROUPED as EMP_MAR on
     EMP_YEAR.EMP_SURNAME = EMP_MAR.EMP_SURNAME and
     EMP_YEAR.PAYROLL_PAYYEAR = EMP_MAR.PAYROLL_PAYYEAR and
     EMP_MAR.PAYROLL_PAYMONTH = 'MAR'

...and here is what you wanted to get:

EMP_SURNAME PAYROLL_PAYYEAR JAN FEB MAR
X           1999            2   1   0
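If you have no Firebird instance at hand, the shape of that result can be checked with a quick sketch like the following. Note the assumption: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in, not Firebird, so it only confirms the query logic on the sample data and says nothing about Firebird's query plan or performance.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Firebird (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DATA (
      EMP_SURNAME VARCHAR(10) NOT NULL,
      PAYROLL_PAYYEAR SMALLINT NOT NULL,
      PAYROLL_PAYMONTH CHAR(3) NOT NULL)
""")
conn.executemany(
    "INSERT INTO DATA VALUES (?, ?, ?)",
    [("X", 1999, "JAN"), ("X", 1999, "FEB"), ("X", 1999, "JAN")],
)

# The Step 3 query: skeleton rows left-joined to the grouped counts, once per month.
QUERY = """
with EMP_YEAR as ( select distinct EMP_SURNAME, PAYROLL_PAYYEAR from DATA )
,GROUPED as
(
  select EMP_SURNAME, PAYROLL_PAYYEAR, PAYROLL_PAYMONTH, Count(*) as QTY
  from DATA group by EMP_SURNAME, PAYROLL_PAYYEAR, PAYROLL_PAYMONTH
)
select EMP_YEAR.EMP_SURNAME, EMP_YEAR.PAYROLL_PAYYEAR
  ,coalesce( EMP_JAN.QTY, 0) as JAN
  ,coalesce( EMP_FEB.QTY, 0) as FEB
  ,coalesce( EMP_MAR.QTY, 0) as MAR
from EMP_YEAR
left join GROUPED as EMP_JAN on
     EMP_YEAR.EMP_SURNAME = EMP_JAN.EMP_SURNAME and
     EMP_YEAR.PAYROLL_PAYYEAR = EMP_JAN.PAYROLL_PAYYEAR and
     EMP_JAN.PAYROLL_PAYMONTH = 'JAN'
left join GROUPED as EMP_FEB on
     EMP_YEAR.EMP_SURNAME = EMP_FEB.EMP_SURNAME and
     EMP_YEAR.PAYROLL_PAYYEAR = EMP_FEB.PAYROLL_PAYYEAR and
     EMP_FEB.PAYROLL_PAYMONTH = 'FEB'
left join GROUPED as EMP_MAR on
     EMP_YEAR.EMP_SURNAME = EMP_MAR.EMP_SURNAME and
     EMP_YEAR.PAYROLL_PAYYEAR = EMP_MAR.PAYROLL_PAYYEAR and
     EMP_MAR.PAYROLL_PAYMONTH = 'MAR'
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('X', 1999, 2, 1, 0)]
```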

Now, this query is ugly, it is fragile (a lot of copy-paste, where you can easily make a mistake that is then hard to spot), and it is slow. Just look at the query plan for this request - you join the table with itself again and again, once for every column!

PLAN JOIN (JOIN (JOIN (SORT (EMP_YEAR DATA NATURAL), SORT (EMP_JAN DATA NATURAL)), SORT (EMP_FEB DATA NATURAL)), SORT (EMP_MAR DATA NATURAL))

So... that is how you can do it on the SQL server, but think again and try to distribute the tasks between the proper tools: run only group-by query #1 on the server, and have your client application regroup its results into the cross-table report.

PS. Wrapping this query in a stored procedure would not be a good idea in Firebird. Procedures and functions are for programming. If you just want to keep a complex query as a named SQL object, that is what SQL VIEWs are for.

create view CTE_CROSSTAB (EMP_SURNAME,PAYROLL_PAYYEAR,JAN,FEB,MAR) as   
with EMP_YEAR as ( select distinct EMP_SURNAME, PAYROLL_PAYYEAR from data )
......etc
