Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

How do I get MySQL to use an INDEX for a view query?

I'm working on a web project with a MySQL database on Java EE. We needed a view to summarize data from 3 tables with over 3M rows overall. Each table was created with indexes, but I haven't found a way to take advantage of those indexes in a conditional SELECT against the view, which we created with GROUP BY.

I've been getting suggestions from people that using views in MySQL is not a good idea, because you can't create an index on a view in MySQL like you can in Oracle. But in some tests I ran, indexes could be used in a SELECT statement against a view. Maybe I've created those views the wrong way.

I'll use an example to describe my problem.

We have a table that records high scores in NBA games, with an index on the column happened_in:

CREATE  TABLE `highscores` (
   `tbl_id` int(11) NOT NULL auto_increment,
   `happened_in` int(4) default NULL,
   `player` int(3) default NULL,
   `score` int(3) default NULL,
   PRIMARY KEY  (`tbl_id`),
   KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert data (8 rows):

INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);

Then I created a view to see the highest score that Kobe Bryant (player 24) got in each year:

CREATE OR REPLACE VIEW v_kobe_highscores
AS
   SELECT player, max(score) AS highest_score, happened_in
   FROM highscores
   WHERE player = 24
   GROUP BY happened_in;

I wrote a conditional statement to see the highest score that Kobe got in 2006:

select * from v_kobe_highscores where happened_in = 2006;

When I ran EXPLAIN in Toad for MySQL, I found that MySQL scans all rows to build the view and only then applies the condition, without using the index on happened_in.

explain select * from v_kobe_highscores where happened_in = 2006;

[Screenshot of the EXPLAIN result not included]

The view that we use in our project is built on tables with millions of rows. Scanning all the rows of those tables every time data is retrieved from the view is unacceptable. Please help! Thanks!

@zerkms Here are the results of a real-life test. I don't see much difference between the two. I think @spencer7593 has the right point: the MySQL optimizer doesn't "push" that predicate down into the view query. [Screenshot of the real-life test not included]


1 Reply


How do you get MySQL to use an index for a view query? The short answer: provide an index that MySQL can use.

In this case, the optimum index is likely a "covering" index:

... ON highscores (player, happened_in, score)

It's likely that MySQL will use that index, and EXPLAIN will show "Using index", due to the WHERE player = 24 (an equality predicate on the leading column in the index). The GROUP BY happened_in (the second column in the index) may allow MySQL to use the index to avoid a sort operation. Including the score column in the index allows the query to be satisfied entirely from the index, without having to visit (look up) the data pages referenced by the index.
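The index definition above leaves out the index name. A complete statement might look like the sketch below; the name ix_highscores_player_year_score is a hypothetical choice, and building the index on a multi-million-row table can take a while.

```sql
-- Hypothetical index name; what matters is the column order:
--   player      : equality predicate in the view (WHERE player = 24)
--   happened_in : the view's GROUP BY column
--   score       : the only other column the view reads, so the index is "covering"
CREATE INDEX ix_highscores_player_year_score
    ON highscores (player, happened_in, score);
```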

That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with a leading column of happened_in for the view query.


Why the view causes a performance issue

One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.

Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider that predicate when it runs the inner "view query". The query for the view gets executed separately, before the outer query. The resultset from that query gets "materialized"; that is, the results are stored in an intermediate temporary table. (MySQL calls it a "derived table", and that name makes sense once you understand the operations that MySQL performs.)

The bottom line is that the index you have defined on happened_in is not used by MySQL when it runs the query that forms the view definition.

After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a rowsource. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.

Note that all of the rows from the view query are stored, which (in your case) means a row for EVERY value of happened_in, not just the one you specify in the equality predicate of the outer query.
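To see what ends up in that derived table, you can run the view's defining query on its own. This sketch uses the 8 sample rows from the question; the ORDER BY is added only to make the output order deterministic and is not part of the view.

```sql
-- The view's defining query, run standalone: this is the set of rows MySQL
-- materializes before the outer WHERE happened_in = 2006 is applied.
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
 GROUP BY happened_in
 ORDER BY happened_in;

-- With the sample data this returns one row per year for player 24:
-- player | highest_score | happened_in
--     24 |            81 |        2006
--     24 |            34 |        2007
--     24 |            37 |        2008
```

Only the 2006 row survives the outer query's predicate; the other rows are computed and stored anyway. With millions of rows in the base table, that wasted work is what makes the view slow.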

The way view queries are processed may be "unexpected" to some, and it is one reason that using views in MySQL can lead to performance problems, compared with the way view queries are processed by other relational databases.


Improving performance of the view query with a suitable covering index

Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.

... ON highscores (player, happened_in, score).

That's likely to be the most beneficial index (performance wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy that query directly from index pages, without a need to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".
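One way to check that plan is to EXPLAIN the same query against the view. This sketch assumes a covering index on (player, happened_in, score) already exists under some name. EXPLAIN output differs between MySQL versions, so the comments describe what to look for rather than literal output.

```sql
-- Assumes a covering index on highscores (player, happened_in, score) exists.
EXPLAIN SELECT * FROM v_kobe_highscores WHERE happened_in = 2006;

-- What to look for (details vary by MySQL version):
--   * a row with select_type DERIVED on table highscores: the view query being
--     materialized. Its key column should name the covering index, and its Extra
--     column should include "Using index" (possibly "Using index for group-by")
--     and should NOT include "Using filesort".
--   * a row for a table named like <derived2>: the outer query reading the
--     materialized rows, which is where happened_in = 2006 is applied.
```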


Compare performance to standalone query with no derived table

You could compare the execution plan for your query against the view vs. an equivalent standalone query:

SELECT player
     , MAX(score) AS highest_score
     , happened_in
 FROM highscores
WHERE player = 24
  AND happened_in = 2006
GROUP
   BY player
    , happened_in

The standalone query can also make use of a covering index e.g.

... ON highscores (player, happened_in, score)

but without the need to materialize an intermediate temporary table.
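Beyond comparing EXPLAIN plans, one way to compare the work actually done is to use the session handler counters. In this sketch, FLUSH STATUS resets the session status counters, and the Handler_read% variables count row and index reads made by the storage engine. The numbers depend on your data and MySQL version, so none are shown here. The SHOW statement itself may add a small overhead to some counters in some versions, so compare the two runs relative to each other.

```sql
-- Run 1: query through the view.
FLUSH STATUS;
SELECT * FROM v_kobe_highscores WHERE happened_in = 2006;
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Run 2: equivalent standalone query (no derived table).
FLUSH STATUS;
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
   AND happened_in = 2006
 GROUP BY player, happened_in;
SHOW SESSION STATUS LIKE 'Handler_read%';
```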


I am not sure that any of the above provides a direct answer to the question you were asking.

Q: How do I get MySQL to use an INDEX for view query?

A: Define a suitable INDEX that the view query can use.

The short answer is: provide a "covering index" (an index that includes all columns referenced in the view query). The leading columns in that index should be the columns referenced in equality predicates (in your case, player would be the leading column, because you have a player = 24 predicate in the query). The columns referenced in the GROUP BY should come next in the index, which allows MySQL to optimize the GROUP BY operation by making use of the index rather than a sort operation.

The key point here is that the view query is basically a standalone query; the results from that query get stored in an intermediate "derived" table (a temporary table that gets created when a query against the view is run).

Using views in MySQL is not necessarily a "bad idea", but I would strongly caution those who choose to use views within MySQL to be AWARE of how MySQL processes queries that reference those views. And the way MySQL processes view queries differs (significantly) from the way view queries are handled by other databases (e.g. Oracle, SQL Server).
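One way to see that difference is MySQL's view ALGORITHM clause. A view with no GROUP BY or aggregate function can be processed with ALGORITHM = MERGE, where the outer WHERE is merged into the view's query, so an index such as index_happened_in can be considered. A view with GROUP BY and MAX() cannot be merged: MySQL issues a warning, sets the algorithm to UNDEFINED, and materializes the view. The view names v_kobe_scores and v_kobe_highscores_try_merge are hypothetical and used only for illustration.

```sql
-- Hypothetical view with no GROUP BY or aggregate: eligible for MERGE.
CREATE OR REPLACE ALGORITHM = MERGE VIEW v_kobe_scores AS
  SELECT player, score, happened_in
    FROM highscores
   WHERE player = 24;

-- Processed roughly as: ... FROM highscores WHERE player = 24 AND happened_in = 2006,
-- so indexes on happened_in are candidates for the optimizer.
EXPLAIN SELECT * FROM v_kobe_scores WHERE happened_in = 2006;

-- Hypothetical copy of the grouped view, asking for MERGE: MySQL cannot merge it,
-- so it raises a warning and falls back to UNDEFINED (materialized as TEMPTABLE).
CREATE OR REPLACE ALGORITHM = MERGE VIEW v_kobe_highscores_try_merge AS
  SELECT player, MAX(score) AS highest_score, happened_in
    FROM highscores
   WHERE player = 24
   GROUP BY happened_in;
SHOW WARNINGS;
```

This is consistent with the question's observation that indexes can be used for some view queries: it depends on whether the view can be merged or has to be materialized.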



...