Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
408 views
in Technique[技术] by (71.8m points)

sql - MySQL Convert latin1 data to UTF8

I imported some data using LOAD DATA INFILE into a MySQL Database. The table itself and the columns are using the UTF8 character set, but the default character set of the database is latin 1. Because the default character type of the database is latin1, and I used LOAD DATA INFILE without specifying a character set, it interpreted the file as latin1, even though the data in the file was UTF8. Now I have a bunch of badly encoded data in my UTF8 colum. I found this article which seems to address a similar problem, which is "UTF8 inserted in cp1251", but my problem is "Latin1 inserted in UTF8". I've tried editing the queries there to convert the latin1 data to UTF8, but can't get it to work. Either the data comes out the same, or even more mangled than before. Just as an example, the word Québec is showing as Qu??bec.

[ADDITIONAL INFO]

When Selecting the data wrapped in HEX(), Qu??bec has the value 5175C383C2A9626563.

The Create Table (shortened) of this table is.

CREATE TABLE MyDBName.`MyTableName`
(
`ID` INT NOT NULL AUTO_INCREMENT, 
.......
`City` CHAR(32) NULL, 
.......
`)) ENGINE InnoDB CHARACTER SET utf8;
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I've had cases like this in old wordpress installations with the problem being that the data itself was already in UTF-8 within a Latin1 database (due to WP default charset). This means there was no real need for conversion of the data but the ddbb and table formats. In my experience things get messed up when doing the dump as I understand MySQL will use the client's default character set which in many cases is now UTF-8. Therefore making sure that exporting with the same coding of the data is very important. In case of Latin1 DDBB with UTF-8 coding:

$ mysqldump –default-character-set=latin1 –databases wordpress > m.sql

Then replace the Latin1 references within the exported dump before reimporting to a new database in UTF-8. Sort of:

$ replace "CHARSET=latin1" "CHARSET=utf8" 
    "SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql

In my case this link was of great help. Commented here in spanish.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...