Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
757 views
in Technique[技术] by (71.8m points)

php - mb_detect_encoding detects ASCII as UTF-8?

I'm trying to automatically convert imported IPTC metadata from images to UTF-8 for storage in a database based on the PHP mb_ functions.

Currently it looks like this:

$val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));

However, when mb_detect_encoding() is supplied an ASCII string (special characters in the Latin1-fields from 192-255) it detects it as UTF-8, hence in the following attempt to convert everything to proper UTF-8 all special characters are removed.

I tried writing my own method by looking for Latin1 values and if none occured I would go on to letting mb_detect_encoding decide what it is. But I stopped midway when I realized that I can't be sure that other encoding don't use the same byte values for other things.

So, is there a way to properly detect ASCII to feed to mb_convert_encoding as the source encoding?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Specifying a custom order, where ASCII is detected first, works.

mb_detect_encoding($val, 'ASCII,UTF-8,ISO-8859-15');

For completeness, the list of available encodings is at http://www.php.net/manual/en/mbstring.supported-encodings.php


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...