Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


apache - Can a PHP file name (or a dir in its full path) have UTF-8 characters?

I would like to access a PHP file whose name has UTF-8 characters in it.

The file does not have a BOM in it. It just contains an echo statement that displays a few Unicode characters.

Accessing the PHP page from the browser (Firefox 3.0.8, IE7) results in HTTP error 500.

There are two entries in the Apache log (the file is /க.php; the letter க is a composite one and corresponds to the bytes \xe0\xae\x95 in the log below):

[Sat Apr 04 09:30:25 2009] [error] [client 127.0.0.1] PHP Warning: Unknown: failed to open stream: No such file or directory in Unknown on line 0

[Sat Apr 04 09:30:25 2009] [error] [client 127.0.0.1] PHP Fatal error: Unknown: Failed opening required 'D:/va/ROOT/\xe0\xae\x95.php' (include_path='.;C:\php5\pear') in Unknown on line 0
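
As a quick check of the file name in the log, the three bytes (e0 ae 95 in hex) can be decoded as UTF-8. This is a minimal Python sketch and not part of the original setup; the variable names are illustrative only.

```python
import unicodedata

# The three bytes from the logged file name, written as hex.
logged_bytes = bytes.fromhex("e0ae95")

# Decode them as UTF-8 to see how many characters they represent.
char = logged_bytes.decode("utf-8")

print(len(char))               # number of code points
print(f"U+{ord(char):04X}")    # code point in hexadecimal
print(unicodedata.name(char))  # Unicode character name
```

The bytes decode to a single code point, so the file name is one character that takes three bytes in UTF-8. That is why the log shows three escaped bytes in front of ".php".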

The same page works when file and dir names are in English. In the same setup, there is no problem using SSI for these pages.

EDIT

Removed info on url rewriting since it does not seem to be a factor.

When mod_rewrite is removed, the PHP file still does not work. It works if the file is renamed to an ASCII-only name. However, .shtml files work even with UTF-8 characters in the file and/or path name.



1 Reply


I came across the same problem, did some research, and concluded the following. This applies to PHP 5 on Windows; it is probably true on other platforms, but I haven't checked.

  1. All PHP file system functions (dir, is_dir, is_file, file, filemtime, filesize, file_exists, etc.) accept and return file names only in ISO-8859-1, irrespective of the default_charset set in the program or ini files.

  2. Where a file name contains a Unicode character, dir->read returns it as the corresponding ISO-8859-1 character if there is one; otherwise it substitutes a question mark.

  3. When referencing a file, e.g. in is_file or file, if you pass in a UTF-8 file name, the file will not be found when the name contains any character that takes two or more bytes in UTF-8. However, is_file(utf8_decode($filename)) etc. will work provided every character in the name is representable in ISO-8859-1.

In other words, PHP 5 on Windows cannot address a file at all if its name contains characters that have no ISO-8859-1 equivalent.
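
The conversion described in points 2 and 3 can be modelled outside PHP. The sketch below uses Python purely to illustrate the encoding behaviour. `to_latin1` and `is_latin1_representable` are hypothetical helpers, and it is an assumption that they mirror what utf8_decode does (map to ISO-8859-1, substituting "?" where no equivalent exists).

```python
def to_latin1(name: str) -> bytes:
    """Encode a file name as ISO-8859-1, writing '?' for characters it cannot represent."""
    return name.encode("latin-1", errors="replace")


def is_latin1_representable(name: str) -> bool:
    """Return True if every character of the name exists in ISO-8859-1."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# "é" (U+00E9) has an ISO-8859-1 equivalent; U+0B95 does not.
for candidate in ("caf\u00e9.php", "\u0b95.php"):
    print(candidate, is_latin1_representable(candidate), to_latin1(candidate))
```

Under this model a name such as "café.php" survives the conversion, while a name containing U+0B95 collapses to "?.php" and can no longer identify the original file.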

If a UTF-8 URL with multibyte characters is requested and this corresponds directly to a file, PHP won't be able to open the file because it cannot address it.

If you simply want pretty URLs in your language the suggestion of using mod_rewrite seems like a good one.

But if you are storing and retrieving files uploaded and downloaded by users, this problem has to be resolved. There are several ways to do it:

  1. Use an arbitrary ASCII file name, such as an incrementing number, on the server and index the files in a database, an XML file or some such.

  2. Store the files in the database itself as a BLOB.

  3. Encode the file names yourself. This makes it easier to see what is going on and is not subject to problems if your index gets corrupted. A good technique is to urlencode all incoming file names when storing them on the server disk and urldecode them before setting the file name in the MIME header for the download. Every character other than ASCII letters, digits, "-", "_" and "." is then encoded as %nn (urlencode writes a space as "+"), so problems with spaces in file names, cross-platform support and pattern matching are largely avoided.
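
The encode-on-store, decode-on-download idea can be sketched as follows. This is a minimal illustration in Python rather than PHP. `disk_name` and `download_name` are hypothetical helpers, and Python's `quote` writes a space as %20 instead of "+", so the output differs slightly from PHP's urlencode.

```python
from urllib.parse import quote, unquote


def disk_name(original: str) -> str:
    """Percent-encode an uploaded file name so the name on disk is plain ASCII."""
    # safe="" also encodes "/", so the stored name cannot contain a path separator.
    return quote(original, safe="")


def download_name(stored: str) -> str:
    """Recover the original file name before sending it in the download header."""
    return unquote(stored)


uploaded = "\u0b95 report.php"  # example name with a multi-byte character and a space
stored = disk_name(uploaded)
print(stored)
print(download_name(stored) == uploaded)
```

Because the stored name is pure ASCII, the file system functions never see a multi-byte character, and the original name is recovered only when the download header is built.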

