Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
117 views
in Technique[技术] by (71.8m points)

mysql - How to make multiple LEFT JOINs with OR fully use a composite index?

I am working on a timekeeping system and trying to determine that for each person, are they early or late for their scheduled shift. The "TB_Scan" table originally had the person code and scan time which is a datetime field. However, due to querying issues I was having, decided to add the scYear, scMonth and scDay fields thinking it might help.

It is a system that calculates how the users scan their fingerprints when they enter/leave the workplace. I don't know how it is called in English. I need to determine if the user is late in the morning, and if the user leaves work early.

This tb_scan table contains date and time a user scans a fingerprint.

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scyear` varchar(4) DEFAULT NULL,
  `scmonth` varchar(2) DEFAULT NULL,
  `scday` varchar(2) DEFAULT NULL,
  `scscantime` datetime,
  KEY `all` (`scyear`,`scmonth`,`scday`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

It has 100,000+ rows, something like this

scpercode scyear scmonth scday     scdateandtime
000001    2010      10     10      2016-01-10 08:02:00
000001    2010      10     10      2016-01-02 17:33:00
000001    2010      10     11      2016-01-11 07:48:00
000001    2010      10     11      2016-01-11 17:29:00
000002    2010      10     10      2016-01-10 17:31:00
000002    2010      10     10      2016-01-02 17:28:00
000002    2010      10     11      2016-01-11 05:35:00
000002    2010      10     11      2016-01-11 05:29:00

And this tb_workday table contains each date

CREATE TABLE `tb_workday` (
  `wdpercode` varchar(6) DEFAULT NULL,
  `wdshift` varchar(1) DEFAULT NULL,
  `wddate` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

It has rows with date sequence like this:

wdpercode  wdshift wddate
000001     1       2010-10-10
000001     1       2010-10-11
000001     1       2010-10-12
000001     1       2010-10-13
000002     2       2010-10-10
000002     2       2010-10-11
000002     2       2010-10-12
000002     2       2010-10-13

There is another tb_shift table containing shift time

CREATE TABLE `tb_shift` (
  `shiftcode` varchar(1) DEFAULT NULL,
  `shiftbegin2` varchar(4) DEFAULT NULL,
  `shiftbegin` varchar(4) DEFAULT NULL,
  `shiftmid` varchar(4) DEFAULT NULL,
  `shiftend` varchar(4) DEFAULT NULL,
  `shiftend2` varchar(4) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        1     04:00:00     08:00:00  12:00:00  17:30:00  21:30:00 
        2     12:00:00     17:30:00  21:00:00  05:30:00  09:30:00

I want to determine that in each day, is the employee comes to work late or leaves work early, and at what time.

SELECT wdpercode,wddate,shiftbegin,shiftend,time(tlate.scscantime) wdlate,time(tearly.scscantime) wdearly
FROM tb_workday
LEFT JOIN tb_shift
  ON wdshift=shiftcode
LEFT JOIN tb_scan tlate 
  ON wdpercode=tlate.scpercode
  AND tlate.scyear=year(wddate)
  AND tlate.scmonth=month(wddate)
  AND (tlate.scday=day(wddate)
    OR tlate.scday=day(wddate)+1)
  AND tlate.scscantime>=ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY)
  AND tlate.scscantime<=ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
LEFT JOIN tb_scan tearly 
  ON wdpercode=tearly.scpercode
  AND tearly.scyear=year(wddate)
  AND tearly.scmonth=month(wddate)
  AND (tearly.scday=day(wddate)
    OR tearly.scday=day(wddate)+1)
  AND tearly.scscantime>ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
  AND tearly.scscantime<ADDDATE(CONCAT(wddate,' ',shiftend),INTERVAL IF(shiftbegin2>shiftend,1,0) DAY)

Here is the example of an output:

wdpercode wddate      shiftbegin  shiftend  wdlate    wdearly
000001    2016-01-10  08:00:00    17:30:00  08:02:00  (null)
000001    2016-01-11  08:00:00    17:30:00  (null)    17:29:00
000002    2016-01-11  17:30:00    05:30:00  17:31:00  (null)
000002    2016-01-11  17:30:00    05:30:00  (null)    05:29:00

this ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY) is for employees who work on night shift, so it has to add 1 day into the shift time

The problem is if I create an index for scscantime, MySQL refuses to use it for comparison (>=,<=,>,<). Please see this thread Why does MySQL not use an index for a greater than comparison?

Because of this I created the scyear, scmonth, and scday fields and combine them in an index along with scpercode. And I have to make sure it calculates for workers working in night shift too so I have to add it with OR scday=day(wddate)+1 condition.

Before I added the OR condition, the EXPLAIN result was 52 rows. But when I added the OR scday=day(wddate)+1 condition, the EXPLAIN result became 364 rows, that means MySQL did not use scday part of the index. Is there any way to use the whole index, so the EXPLAIN result becomes 52 rows? I also tried removing the +1 part and the result is also 52.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

From your query (readability formatted)

SELECT 
      table1.*, 
      tb21.year, 
      tb21.month, 
      tb21.day, 
      tb22.year, 
      tb22.month, 
      tb22.day 
   FROM 
      table1
         LEFT JOIN table2 tb21 
            ON table1.year = tb21.year
            AND table1.month = tb21.month
            AND (tb21.day = table1.day 
              OR tb21.day = table1.day+1)
         LEFT JOIN table2 tb22 
            ON table1.year = tb22.year
            AND table1.month = tb22.month
            AND (tb22.day = table1.day+2 
              OR tb22.day = table1.day+3)

Aside from your restrictive content provided, lets look at you trying to compare the data from same day to day + 1, +2 and +3. Lets also assume just for this example you have only 10 days in the table represented as all June 1 - June 10, 2016 in both your Table1 AND Table2 tables.

Again, this is an assumption that each table has all 10 dates in question just for simplistic purposes of why so many records. So, for Table 1 date of June 1, 2016, it will qualify with table 2 (tb21 version) and return TWO records. One for June 1 and another for June 2. So now you have TWO records in your result. Now, you do that again left-join to table 2 (tb22 version). This time you are looking for 2 and 3 days out, of which you have June 3 and 4 in the table. So you are getting a Cartesian result. So, for the June 1 record in table 1, you now have 4 records as follows.

T1Year  T1Month  T1Day  T21Day T22Day
2016    6        1      1      3
2016    6        1      1      4
2016    6        1      2      3
2016    6        1      2      4

Now, lets say your table 2 has 3 entries on June 2 and 3 entries on June 3 and your data is going to super bloat. This is why you need to provide more clarification on what you are trying to do.

So, not having true context on what you are looking for, ignore the fact it is not perfectly utilizing your index. You have an OR based on the date via the day comparisons. It should still utilize for the query anyhow.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.9k users

...