Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
220 views
in Technique[技术] by (71.8m points)

sql - MySQL optimizing INSERT speed being slowed down because of indices

MySQL Docs say :

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

Does this mean that for insertion of each new row, the insertion speed will be slowed down by a factor of log N where N, I assume is number of rows? even if I insert all rows in just one query? i.e. :

INSERT INTO mytable VALUES (1,1,1), (2,2,2),  (3,3,3), .... ,(n,n,n)

Where n is ~70,000

I currently have ~1.47 million rows in a table with the following structure :

CREATE TABLE mytable (
   `id` INT,
   `value` MEDIUMINT(5),
   `date` DATE,
   PRIMARY_KEY(`id`,`date`)
) ENGINE = InnoDB

When I insert in the above mentioned fashion in a transaction, the commit time taken is ~275 seconds. How can I optimize this, since new data is to be added everyday and the insert time will just keep on slowing down.

Also, is there anything apart from just queries that might help? maybe some configuration settings?

Possible Method 1 - Removing Indices

I read that removing indices just before insert might help insert speed. And after inserts, I add the index again. But here the only index is primary key, and dropping it won't help much in my opinion. Also, while the primary key is dropped , all the select queries will be crippling slow.

I do not know of any other possible methods.

Edit : Here are a few tests on inserting ~60,000 rows in the table with ~1.47 mil rows:

Using the plain query described above : 146 seconds

Using MySQL's LOAD DATA infile : 145 seconds

Using MySQL's LOAD DATA infile and splitting the csv files as suggested by David Jashi in his answer: 136 seconds for 60 files with 1000 rows each, 136 seconds for 6 files with 10,000 rows each

Removing and re-adding primary key : key removal took 11 seconds, 0.8 seconds for inserting data BUT 153 seconds for re-adding primary key, totally taking ~165 seconds

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you want fast inserts, first thing you need is proper hardware. That assumes sufficient amount of RAM, an SSD instead of mechanical drives and rather powerful CPU.

Since you use InnoDB, what you want is to optimize it since default config is designed for slow and old machines.

Here's a great read about configuring InnoDB

After that, you need to know one thing - and that's how databases do their stuff internally, how hard drives work and so on. I'll simplify the mechanism in the following description:

A transaction is MySQL waiting for the hard drive to confirm that it wrote the data. That's why transactions are slow on mechanical drives, they can do 200-400 input-output operations per second. Translated, that means you can get 200ish insert queries per second using InnoDB on a mechanical drive. Naturally, this is simplified explanation, just to outline what's happening, it's not the full mechanism behind transaction.

Since a query, especially the one corresponding to size of your table, is relatively small in terms of bytes - you're effectively wasting precious IOPS on a single query.

If you wrap multiple queries (100 or 200 or more, there's no exact number, you have to test) in a single transaction and then commit it - you'll instantly achieve more writes per second.

Percona guys are achieving 15k inserts a second on a relatively cheap hardware. Even 5k inserts a second isn't bad. The table such as yours is small, I've done tests on a similar table (3 columns more) and I managed to get to 1 billion records without noticeable issues, using 16gb ram machine with a 240GB SSD (1 drive, no RAID, used for testing purposes).

TL;DR: - follow the link above, configure your server, get an SSD, wrap multiple inserts in 1 transactions and profit. And don't turn indexing off and then on, it's not applicable always, because at some point you will spend processing and IO time to build them.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...