Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

php - Stored non-English characters, got '?????' - MySQL Character Set issue

The site I am working on is in Farsi, and all the text is displayed as ????? (question marks). I changed the collation of my DB tables to utf8_general_ci, but it still shows ???.

I ran the following script to convert all the tables, but this did not work either.

I want to know what I am doing wrong.

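<?php
// your connection
mysql_connect("mysql.ord1-1.websitesettings.com","user_name","pass");
mysql_select_db("895923_masihiat");

// convert every table in the database to utf8
$res = mysql_query("SHOW TABLES");
// mysql_fetch_row() yields one numerically indexed value per row, so each table is
// converted once (mysql_fetch_array() returns every value twice by default)
while ($row = mysql_fetch_row($res))
{
    $table = $row[0];
    // backticks protect table names that contain unusual characters
    mysql_query("ALTER TABLE `" . $table . "` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
    echo $table . " CONVERTED<br />";
}
?>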

1 Reply


Bad news. But first, double check:

SELECT col, HEX(col)...

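to see what is in the table. If the hex shows 3F (the ASCII code for a question mark), then the data is gone: the original characters were replaced when the row was stored and cannot be recovered. Correctly stored, the dal character (د) should be hex D8AF; hah (ح) is hex D8AD.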
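To make the hex check concrete, here is a minimal PHP sketch (no database needed) of what the bytes of a correctly encoded value look like next to a lost one. The helper name utf8_hex is made up for illustration, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper: upper-case hex of a string's raw bytes,
// in the same shape that MySQL's HEX(col) prints.
function utf8_hex(string $s): string
{
    return strtoupper(bin2hex($s));
}

// "\u{...}" produces the UTF-8 encoding of the given code point.
$dal = "\u{062F}";   // ARABIC LETTER DAL
$hah = "\u{062D}";   // ARABIC LETTER HAH

echo utf8_hex($dal), "\n";   // D8AF
echo utf8_hex($hah), "\n";   // D8AD
echo utf8_hex("?"), "\n";    // 3F, what a lost character looks like
```

If HEX(col) returns runs of 3F where Farsi text should be, the rows hold literal question marks rather than mis-displayed text.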

What happened:

  • you had utf8-encoded data (good)
  • SET NAMES latin1 was in effect (default, but wrong)
  • the column was declared CHARACTER SET latin1 (default, but wrong)

As you INSERTed the data, it was converted to latin1, which does not have values for Farsi characters, so question marks replaced them.
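The same kind of loss can be reproduced outside MySQL. The sketch below is only an analogy, not what the server does internally: it uses PHP's mbstring extension (assumed to be installed) and ISO-8859-1 as a stand-in for MySQL's latin1, with the substitute character set explicitly to '?'.

```php
<?php
// Make the replacement for unconvertible characters explicit: 0x3F is '?'.
mb_substitute_character(0x3F);

// dal + hah, four bytes of UTF-8.
$farsi = "\u{062F}\u{062D}";

// ISO-8859-1 has no slot for either letter, so each one is substituted.
$latin = mb_convert_encoding($farsi, 'ISO-8859-1', 'UTF-8');

echo $latin, "\n";                       // ??
echo strtoupper(bin2hex($latin)), "\n";  // 3F3F
```

Once the bytes are 3F, nothing in them records which letter used to be there, which is why converting the table afterwards does not bring the text back.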

The cure (for future INSERTs):

  • Recode your application to use the mysqli_* interface instead of the deprecated mysql_* interface.
  • Keep sending utf8-encoded data (good)
  • Call mysqli_set_charset($link, 'utf8') right after connecting (the connection handle is the first argument)
  • check that the column(s) and/or table default are CHARACTER SET utf8
  • If you are displaying on a web page, <meta...utf8> should be near the top.
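The connection-side steps in this list could look roughly like the following. This is a sketch under assumptions, not a drop-in fix: the function names, the articles table and its body column are all hypothetical, and the column is assumed to be declared CHARACTER SET utf8 already.

```php
<?php
// Hypothetical helper: open a connection whose client charset is utf8.
function open_utf8_connection(string $host, string $user, string $pass, string $dbname): mysqli
{
    $db = mysqli_connect($host, $user, $pass, $dbname);
    if ($db === false) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // Tells both server and client library that the bytes we send are utf8.
    if (!mysqli_set_charset($db, 'utf8')) {
        throw new RuntimeException('Cannot set charset: ' . mysqli_error($db));
    }
    return $db;
}

// Hypothetical table `articles` with a utf8 column `body`.
function insert_body(mysqli $db, string $utf8Text): bool
{
    $stmt = mysqli_prepare($db, 'INSERT INTO articles (body) VALUES (?)');
    if ($stmt === false) {
        return false;
    }
    // 's' binds the text as a string; the bytes are sent unchanged.
    mysqli_stmt_bind_param($stmt, 's', $utf8Text);
    $ok = mysqli_stmt_execute($stmt);
    mysqli_stmt_close($stmt);
    return $ok;
}
```

A caller would do something like $db = open_utf8_connection(...) once per request and then insert_body($db, $text) for each value. Rows inserted this way can be verified afterwards with the HEX() check described earlier.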

The discussion above is about CHARACTER SET, the encoding of characters. Now for a tip on COLLATION, which is used for comparing and sorting.

If you want these to be treated equal: 'بِسْمِ' = 'بسم' (the same word with and without vowel marks), then use utf8_unicode_ci (instead of utf8_general_ci) for the COLLATION.

