Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
453 views
in Technique[技术] by (71.8m points)

query optimization - MySQL: Can I do a left join and pull only one row from the join table?

I wrote a custom help desk for work and it's been running great... until recently. One query has really slowed down. It takes about 14 seconds now! Here are the relevant tables:

CREATE TABLE `tickets` (
  `id` int(11) unsigned NOT NULL DEFAULT '0',
  `date_submitted` datetime DEFAULT NULL,
  `date_closed` datetime DEFAULT NULL,
  `first_name` varchar(50) DEFAULT NULL,
  `last_name` varchar(50) DEFAULT NULL,
  `email` varchar(50) DEFAULT NULL,
  `description` text,
  `agent_id` smallint(5) unsigned NOT NULL DEFAULT '1',
  `status` smallint(5) unsigned NOT NULL DEFAULT '1',
  `priority` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `date_closed` (`date_closed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `solutions` (
  `id` int(10) unsigned NOT NULL,
  `ticket_id` mediumint(8) unsigned DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `hours_spent` float DEFAULT NULL,
  `agent_id` smallint(5) unsigned DEFAULT NULL,
  `body` text,
  PRIMARY KEY (`id`),
  KEY `ticket_id` (`ticket_id`),
  KEY `date` (`date`),
  KEY `hours_spent` (`hours_spent`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

When a user submits a ticket, it goes into the "tickets" table. Then, as the agents work through the problem, they record the actions they took. Each entry goes into the "solutions" table. In other words, tickets have many solutions.

The goal of the query that has slowed down is to pull all the fields from the "tickets" table and also the latest entry from the "solutions" table. This is the query I've been using:

SELECT tickets.*,
    (SELECT CONCAT_WS(" * ", DATE_FORMAT(solutions.date, "%c/%e/%y"), solutions.hours_spent, CONCAT_WS(": ", solutions.agent_id, solutions.body))
    FROM solutions
    WHERE solutions.ticket_id = tickets.id
    ORDER BY solutions.date DESC, solutions.id DESC
    LIMIT 1
) AS latest_solution_entry
FROM tickets
WHERE tickets.date_closed IS NULL
OR tickets.date_closed >= '2012-06-20 00:00:00'
ORDER BY tickets.id DESC

Here is an example of what the "latest_solution_entry" field looks like:

6/20/12 * 1337 * 1: I restarted the computer and that fixed the problem. Yes, I took an hour to do this.

In PHP, I split up the "latest_solution_entry" field and format it correctly.

When I noticed that the page that runs the query had slowed way down, I ran the query without the subquery and it was super fast. I then ran an EXPLAIN on the original query and got this:

+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+
| id | select_type        | table     | type  | possible_keys | key       | key_len | ref                 | rows  | Extra                       |
+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+
|  1 | PRIMARY            | tickets   | index | date_closed   | PRIMARY   | 4       | NULL                | 35804 | Using where                 |
|  2 | DEPENDENT SUBQUERY | solutions | ref   | ticket_id     | ticket_id | 4       | helpdesk.tickets.id |     1 | Using where; Using filesort |
+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+

So I'm looking for a way to make my query more efficient, yet still achieve the same goal. Any ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Let me sum up what I understood: you'd like to select each ticket and its last solution.

I like using the following pattern for this kind of question as it avoids the subquery pattern and is therefore rather good where performance is needed. The drawback is that it is a bit tricky to understand:

SELECT
  t.*,
  s1.*
FROM tickets t
INNER JOIN solutions s1 ON t.id = s1.ticket_id
LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id AND s2.id > s1.id
WHERE s2.id IS NULL;

I wrote only the heart of the pattern for a better understanding.

The keys are:

  • the LEFT JOIN of the solutions table with itself with the s1.ticket_id = s2.ticket_id condition: it emulates the GROUP BY ticket_id.

  • the condition s2.id > s1.id : it is the SQL for "I only want the last solution", it emulates the MAX(). I assumed that in your model, the last means with the greatest id but you could use here a condition on the date. Note that s2.id < s1.id would give you the first solution.

  • the WHERE clause s2.id IS NULL: the weirdest one but absolutely necessary... keeps only the records you want.

Have a try and let me know :)

Edit 1: I just realised that the second point assumption was oversimplifying the problem. That makes it even more interesting :p I'm trying to see how this pattern may work with your date, id ordering.

Edit 2: Ok, it works great with a little twist. The condition on the LEFT JOIN becomes:

LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id
  AND (s2.date > s1.date OR (s2.date = s1.date AND s2.id > s1.id))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...