Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
435 views
in Technique[技术] by (71.8m points)

loading enwiki-latest-categorylinks.sql into mysql

I want to load the wikipedia category-links table into the MySql database. I used the following command in the terminal:

mysql -u root -proot categorylinks < enwiki-latest-categorylinks.sql

The file is 11.6 GB in size and running since 3 days. Is there any better way to load it faster or some settings i need to set to make it run faster ???

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

1) Separate the SQL dump file to two parts: creation and data

  1. grep -n "Dumping data" enwiki-latest-redirect.sql

This will return the line number separating the creation and data of the table parts of the file. Let it be 46.

  1. tail -n +46 enwiki-latest-redirect.sql > redirect -data.sql

The number we get in the last command is used as the input to tail to get the data part of the file.

  1. head -46 enwiki-latest-redirect.sql > redirect-creation.sql

The number we get in the first command is used as the input to head to get the table creation part of the file.

2) Remove indexes from the redirect-creation.sql

The following is the original CREATE TABLE statement in the dump:

CREATE TABLE `redirect` ( 
 `rd_from` int(8) unsigned NOT NULL DEFAULT '0', 
 `rd_namespace` int(11) NOT NULL DEFAULT '0', 
 `rd_title` varbinary(255) NOT NULL DEFAULT '', 
 `rd_interwiki` varbinary(32) DEFAULT NULL, 
 `rd_fragment` varbinary(255) DEFAULT NULL,   
  PRIMARY KEY (`rd_from`),   
  KEY `rd_ns_title` (`rd_namespace`,`rd_title`,`rd_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary; 

The following is the desired version after removing the indexes:

CREATE TABLE `redirect` ( 
 `rd_from` int(8) unsigned NOT NULL DEFAULT '0', 
 `rd_namespace` int(11) NOT NULL DEFAULT '0', 
 `rd_title` varbinary(255) NOT NULL DEFAULT '', 
 `rd_interwiki` varbinary(32) DEFAULT NULL, 
 `rd_fragment` varbinary(255) DEFAULT NULL, 
) ENGINE=InnoDB DEFAULT CHARSET=binary; 

Edit the page-creation.sql to remove indexes as shown above.

3) Load the created files

mysql -u root -p wikipedia < redirect-creation.sql
mysql -u root -p wikipedia < redirect-data.sql

4) Create indexes on fields you want after loading the data

create index idx_redirect_rd_from on redirect(rd_from)

I am able to load the categorylinks table in under an hour on my Macbook Pro by following the steps above.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...