How do I get MySQL to use an INDEX for a view query?
I'm working on a web project with a MySQL database on Java EE. We needed a view to summarize data from 3 tables with over 3M rows overall. Each table was created with an index. But I haven't found a way to take advantage of the indexes in a conditional SELECT against the view, which we created with [group by].
I've been getting suggestions from people that using views in MySQL is not a good idea, because you can't create an index on a view in MySQL like you can in Oracle. But in some tests that I ran, indexes could be used in a SELECT statement against a view. Maybe I've created those views the wrong way.
I'll use an example to describe my problem.
We have a table that records high scores in NBA games, with an index on column [happened_in]:
CREATE TABLE `highscores` ( `tbl_id` int(11) NOT NULL auto_increment, `happened_in` int(4) default NULL, `player` int(3) default NULL, `score` int(3) default NULL, PRIMARY KEY (`tbl_id`), KEY `index_happened_in` (`happened_in`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Insert the data (8 rows):
INSERT INTO highscores(happened_in, player, score) VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81), (1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);
Then I create a view to see the highest score that Kobe Bryant got in each year:
CREATE OR REPLACE VIEW v_kobe_highscores AS SELECT player, max(score) AS highest_score, happened_in FROM highscores WHERE player = 24 GROUP BY happened_in;
I wrote a conditional statement to see the highest score that Kobe got in 2006:
select * from v_kobe_highscores where happened_in = 2006;
When I run EXPLAIN on it in Toad for MySQL, I found that MySQL scans all rows to build the view and only then applies the condition to it, without using the index on [happened_in].
explain select * from v_kobe_highscores where happened_in = 2006;
The view that we use in our project is built on tables with millions of rows. Scanning all the rows of a table on every retrieval from the view is unacceptable. Please help! Thanks!
@zerkms Here are the results from my tests on real-life data. I don't see much difference between the two. I think @spencer7593 has the right point: the MySQL optimizer doesn't "push" that predicate down into the view query.
How do you get MySQL to use an index for a view query? The short answer: provide an index that MySQL can use.
In this case, the optimum index is likely a "covering" index:
... ON highscores (player, happened_in, score)
It's likely that MySQL will use that index, and the EXPLAIN will show "Using index", because:
- WHERE player = 24 is an equality predicate on the leading column in the index;
- GROUP BY happened_in (the second column in the index) may allow MySQL to use the index to avoid a sort operation;
- including the score column in the index allows the query to be satisfied entirely from the index, without having to visit (look up) the data pages referenced by the index.
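For reference, a complete version of that statement might look like the sketch below. It assumes the highscores table from the question, and the index name ix_highscores_player_year_score is hypothetical; use whatever naming convention you prefer. What matters is the column order.

```sql
-- Hypothetical index name; the column order is the important part:
--   player      : equality predicate in the view (WHERE player = 24)
--   happened_in : the GROUP BY column
--   score       : the only other referenced column, so the index is "covering"
CREATE INDEX ix_highscores_player_year_score
    ON highscores (player, happened_in, score);
```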
That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with a leading column of happened_in for the view query.
Why the view causes a performance issue
One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.
Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider that predicate when it runs the inner "view query". The query for the view gets executed separately, before the outer query. The resultset from that query gets "materialized"; that is, the results are stored in an intermediate MyISAM table. (MySQL calls it a "derived table", and the name makes sense once you understand the operations that MySQL performs.)
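One way to see this for yourself is to ask for the MERGE algorithm explicitly. A view whose definition contains GROUP BY or an aggregate such as MAX() cannot be merged into the outer query, so MySQL should accept the statement but raise a warning and fall back to ALGORITHM=UNDEFINED, which for this view means the temporary-table ("derived table") path. This is a sketch against the question's view; it re-creates the view with the same definition apart from the ALGORITHM clause, and the exact warning text varies between MySQL versions.

```sql
-- Ask for MERGE explicitly; GROUP BY and MAX() make this view non-mergeable.
CREATE OR REPLACE ALGORITHM = MERGE VIEW v_kobe_highscores AS
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
 GROUP BY happened_in;

-- Expect a warning saying the MERGE algorithm can't be used here.
SHOW WARNINGS;

-- The stored definition should then report ALGORITHM=UNDEFINED, not MERGE.
SHOW CREATE VIEW v_kobe_highscores;
```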
The bottom line is that the index you have defined on happened_in is not used by MySQL when it runs the query that forms the view definition.
After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a rowsource. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.
Note that all of the rows from the view query are stored, which (in your case) means a row for EVERY value of happened_in, not just the one you specify in the equality predicate of the outer query.
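The eight sample rows from the question make this concrete. The first statement below is the view's defining query run on its own, which is roughly what MySQL materializes into the derived table; the ORDER BY is added only so the output order is predictable. The second statement is the query from the question, which then filters that intermediate result. The results shown in the comments are worked out by hand from the sample INSERT.

```sql
-- Step 1: what gets materialized: one row per year for player 24.
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
 GROUP BY happened_in
 ORDER BY happened_in;
-- player | highest_score | happened_in
--     24 |            81 |        2006
--     24 |            34 |        2007
--     24 |            37 |        2008

-- Step 2: the outer query then keeps just one of those rows.
SELECT * FROM v_kobe_highscores WHERE happened_in = 2006;
-- player | highest_score | happened_in
--     24 |            81 |        2006
```

With 8 rows this is harmless; with millions of rows, step 1 still has to aggregate every year for the player before the outer filter discards all but one.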
The way that view queries are processed may be "unexpected" by some, and this is one reason that using views in MySQL can lead to performance problems, compared with the way view queries are processed by other relational databases.
Improving performance of the view query with a suitable covering index
Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.
... ON highscores (player, happened_in, score).
That's likely to be the most beneficial index (performance-wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy the query directly from index pages, without needing to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".
Compare performance to standalone query with no derived table
You could compare the execution plan for your query against the view with the plan for an equivalent standalone query:
SELECT player , MAX(score) AS highest_score , happened_in FROM highscores WHERE player = 24 AND happened_in = 2006 GROUP BY player , happened_in
The standalone query can also make use of a covering index e.g.
... ON highscores (player, happened_in, score)
but without a need to materialize an intermediate MyISAM table.
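A sketch of that comparison, assuming the (player, happened_in, score) covering index has been created. Run both EXPLAINs side by side: the query against the view should include a row with select_type DERIVED for the materialized view query, while the standalone query should not, because both equality predicates reach highscores directly and match the first two columns of the index. EXPLAIN layout varies between MySQL versions, so the comments describe what to look for rather than literal output.

```sql
-- 1) Query against the view: expect a DERIVED row reading highscores
--    (the materialized view query) plus a row reading <derivedN>.
EXPLAIN
SELECT * FROM v_kobe_highscores WHERE happened_in = 2006;

-- 2) Equivalent standalone query: expect no <derivedN> table, the covering
--    index in the key column, and no "Using filesort" in Extra.
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
   AND happened_in = 2006
 GROUP BY player, happened_in;
```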
I am not sure that any of the above provides a direct answer to the question you were asking.
Q: How do I get MySQL to use an INDEX for view query?
A: Define a suitable INDEX that the view query can use.
The short answer is: provide a "covering index" (an index that includes all columns referenced in the view query). The leading columns in that index should be the columns that are referenced with equality predicates (in your case, player would be a leading column because you have a player = 24 predicate in the query). The columns referenced in the GROUP BY should come next in the index, which allows MySQL to optimize the GROUP BY operation by making use of the index rather than a sort operation.
The key point here is that the view query is basically a standalone query; its results get stored in an intermediate "derived" table (a MyISAM table that gets created when a query against the view is run).
Using views in MySQL is not necessarily a "bad idea", but I would strongly caution anyone who chooses to use views in MySQL to be AWARE of how MySQL processes queries that reference those views. The way MySQL processes view queries differs significantly from the way view queries are handled by other databases (e.g. Oracle, SQL Server).