Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
433 views
in Technique[技术] by (71.8m points)

mysql - Higher cardinality column first in an index when involving a range?

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

There are a million rows in the table. The filetimes are mostly distinct. There are a finite number of ext values. So, filetimehas a high cardinality and ext has a much lower cardinality.

The query involves both ext and filetime:

WHERE ext = '...'
  AND filetime BETWEEN ... AND ...

Which of those two indexes is better? And why?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First, let's try FORCE INDEX to pick either ef or fe. The timings are too short to get a clear picture of which is faster, but `EXPLAIN shows a difference:

Forcing the range on filetime first. (Note: The order in WHERE has no impact.)

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
|  1 | SIMPLE      | files | range | fe            | fe   | 14      | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

Forcing the low-cardinality ext first:

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | files | range | ef            | ef   | 14      | NULL |  538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

Clearly, the rows says ef is better. But let's check with the Optimizer trace. The output is rather bulky; I'll show only the interesting parts. No FORCE is needed; the trace will show both options then pick the better.

             ...
             "potential_range_indices": [
                ...
                {
                  "index": "fe",
                  "usable": true,
                  "key_parts": [
                    "filetime",
                    "ext",
                    "did",
                    "filename"
                  ]
                },
                {
                  "index": "ef",
                  "usable": true,
                  "key_parts": [
                    "ext",
                    "filetime",
                    "did",
                    "filename"
                  ]
                }
              ],

...

              "analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "fe",
                    "ranges": [
                      "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 16684,
                    "cost": 20022,               <-- Here's the critical number
                    "chosen": true
                  },
                  {
                    "index": "ef",
                    "ranges": [
                      "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 538,
                    "cost": 646.61,               <-- Here's the critical number
                    "chosen": true
                  }
                ],

...

          "attached_conditions_computation": [
            {
              "access_type_changed": {
                "table": "`files`",
                "index": "ef",
                "old_type": "ref",
                "new_type": "range",
                "cause": "uses_more_keyparts"   <-- Also interesting
              }
            }

With fe (range column first), the range could be used, but it estimated scanning through 16684 rows fishing for ext='gif'.

With ef (low cardinality ext first), it could use both columns of the index and drill down more efficiently in the BTree. Then it found an estimated 538 rows, all of which are useful for the query -- no further filtering needed.

Conclusions:

  • INDEX(filetime, ext) used only the first column.
  • INDEX(ext, filetime) used both columns.
  • Put columns involved in = tests first in the index regardless of cardinality.
  • The query plan won't go beyond the first 'range' column.
  • "Cardinality" is irrelevant for composite indexes and this type of query.

("Using index condition" means that the Storage Engine (InnoDB) will use columns of the index beyond the one used for filtering.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...