Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

How to open a file in PHP that has Unicode characters in its name?

For example, I have a file named проба.xml and I am unable to open it from a PHP script.

If I save the PHP script in UTF-8, then all the text in the script is UTF-8, so when I pass this to file_get_contents:

$fname = "проба.xml";
file_get_contents($fname);

I get an error that the file does not exist. The reason is that on Windows (XP), file names containing non-Latin characters are stored as Unicode (UTF-16). OK, so I tried this:

$fname = "проба.xml";
$res = mb_convert_encoding($fname,'UTF-8','UTF-16');
file_get_contents($res);

But the error persists, since file_get_contents cannot accept Unicode strings...

Any suggestions?

1 Reply

UPDATE (July 13 '17)

Although the docs do not seem to mention it, PHP 7.1 and above finally support Unicode filenames on Windows out of the box. PHP's filesystem APIs accept and return filenames according to default_charset, which is UTF-8 by default.

See the relevant commit here: https://github.com/php/php-src/commit/3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f
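A minimal sketch of how a script might check whether it can pass UTF-8 names straight through. The helper name `supports_utf8_paths` is hypothetical, Windows is detected via `DIRECTORY_SEPARATOR`, and the 7.1 cut-off assumes that is the release in which Windows UTF-8 path support shipped; verify both against your own build. The code avoids newer syntax so it still parses on the old PHP versions it is meant to detect.

```php
<?php
// Hypothetical helper: returns true when plain UTF-8 strings can be passed
// to file_get_contents(), fopen(), scandir() and similar functions.
function supports_utf8_paths()
{
    if (DIRECTORY_SEPARATOR !== '\\') {
        // Not Windows: file names are byte strings, so UTF-8 names work as
        // long as the files were created with UTF-8 names (the usual case).
        return true;
    }
    // Windows: assumes native UTF-8 path support starts at PHP 7.1 and
    // follows default_charset, which is "UTF-8" unless overridden in php.ini.
    return PHP_VERSION_ID >= 70100
        && strcasecmp((string) ini_get('default_charset'), 'UTF-8') === 0;
}

// Usage: round-trip a file whose name contains Cyrillic characters.
// This script itself must be saved as UTF-8.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'проба.xml';
if (supports_utf8_paths()) {
    file_put_contents($path, '<root/>');
    echo file_get_contents($path), PHP_EOL;
    unlink($path);
}
```

On older PHP versions running on Windows the helper returns false, which is where the workarounds that follow come in.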


UPDATE (Jan 29 '15)

If you have access to the PHP extensions directory, you can try installing php-wfio.dll from https://github.com/kenjiuno/php-wfio and then refer to files via the wfio:// protocol.

file_get_contents("wfio://你好.xml");
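If the same script must also run where php-wfio is not installed, one option is to add the prefix only when the wrapper is actually registered. This is a sketch: the helper name `wfio_path` is hypothetical, and it assumes the extension registers its stream wrapper under the name `wfio`, as the `wfio://` prefix suggests. `stream_get_wrappers()` is a standard PHP function that lists registered wrapper names.

```php
<?php
// Hypothetical helper: route a UTF-8 filename through the wfio:// stream
// wrapper when it is registered, otherwise return the name unchanged.
function wfio_path($utf8Name)
{
    // stream_get_wrappers() returns names without "://", e.g. "file", "php".
    if (in_array('wfio', stream_get_wrappers(), true)) {
        return 'wfio://' . $utf8Name;
    }
    return $utf8Name;
}

// Usage (script saved as UTF-8):
// $xml = file_get_contents(wfio_path('你好.xml'));
```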

Original Answer

PHP on Windows uses the legacy "ANSI APIs" exclusively for local file access, which means PHP interprets filenames in the code page of the system locale instead of Unicode.

To access files whose names contain Unicode characters, you must convert the filename to the encoding of the current system locale. If the filename contains characters that cannot be represented in that encoding, you're out of luck (Update: see the sections above for a solution): scandir will return gibberish for these files, and passing those strings back to fopen and similar functions will fail.

To find the right encoding, get the system locale by calling <?=setlocale(LC_CTYPE,0)?> and look up the Code Page Identifier (the number after the dot) in the MSDN article https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx.
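The lookup step can be partly automated by pulling the number after the last dot out of the locale string. This is a minimal sketch; the helper name `ansi_code_page` is hypothetical, and it assumes Windows locale strings of the form `Language_Region.CodePage`. On non-Windows systems `setlocale` typically returns values like `C` or `en_US.UTF-8`, for which the helper returns null.

```php
<?php
// Hypothetical helper: extract the ANSI code page number from a Windows
// locale string such as "Chinese (Traditional)_HKG.950". Without an
// argument it queries the current LC_CTYPE locale via setlocale(LC_CTYPE, 0).
function ansi_code_page($locale = null)
{
    if ($locale === null) {
        $locale = setlocale(LC_CTYPE, 0);
    }
    if (!is_string($locale)) {
        return null; // setlocale() failed
    }
    // The code page is the run of digits after the last dot.
    if (preg_match('/\.(\d+)$/', $locale, $m) === 1) {
        return $m[1];
    }
    return null; // e.g. "C" or "English_United States.utf8"
}

echo ansi_code_page('Chinese (Traditional)_HKG.950'), PHP_EOL; // prints 950
```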

For example, if the function returns Chinese (Traditional)_HKG.950, code page 950 is in use and the filename should be converted to the Big5 encoding. In that case, if your script is saved in UTF-8 (preferably without a BOM), your code will have to look like this:

$fname = iconv('UTF-8','big-5',"你好.xml");
file_get_contents($fname);

or like this, if you save the script file itself in Big5:

$fname = "你好.xml";
file_get_contents($fname);
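The conversion can also be written so that it reports the "out of luck" case instead of handing a mangled name to the ANSI APIs. This is a sketch under stated assumptions: the helper name `to_ansi_filename` is hypothetical, and it builds the target charset as `CP` plus the code page number (so 950 becomes `CP950`, Microsoft's variant of Big5), which assumes your iconv implementation recognizes those names; GNU libiconv does for the common Windows code pages. Leaving off `//TRANSLIT` and `//IGNORE` makes `iconv()` return false on unrepresentable characters.

```php
<?php
// Hypothetical helper: convert a UTF-8 filename to a Windows ANSI code page
// (e.g. "950" or "1251"), as the legacy ANSI file APIs expect. Returns null
// when a character cannot be represented in that code page.
function to_ansi_filename($utf8Name, $codePage)
{
    // No //TRANSLIT or //IGNORE suffix: iconv() fails instead of silently
    // altering the name. The @ suppresses the notice it raises on failure.
    $converted = @iconv('UTF-8', 'CP' . $codePage, $utf8Name);
    return $converted === false ? null : $converted;
}

// The question's Cyrillic name in the Cyrillic code page 1251:
echo bin2hex(to_ansi_filename('проба.xml', '1251')), PHP_EOL; // eff0eee1e02e786d6c
// Chinese characters do not exist in code page 1251:
var_dump(to_ansi_filename('你好.xml', '1251')); // NULL
```

The code page argument would normally come from the locale lookup described above.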

...