Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
452 views
in Technique[技术] by (71.8m points)

iphone - NSString initWithData returns null

I am pulling data from a website via NSURLConnection and stashing the received data away in an instance of NSMutableData. In the connectionDidFinishLoading delegate method the data is convert into a string with a call to NSString's appropriate method:

NSString *result = [[NSString alloc] initWithData:data 
                                     encoding:NSUTF8StringEncoding]

The resulting string turns out to be a null. If I use the NSASCIIStringEncoding, however, I do obtain the appropriate string, albeit with unicode characters garbled up as expected. The server's Content-Type header does not specify the UTF-8 encoding, but I have attempted a number of different websites with a similar scenario, and there string conversion happens just fine. It seems like the problem only pertains to the given web service but I have no clue why.

On a side note, is pulling web pages and data from an API good practice, i.e. buffering the data, converting into a string, and manipulating the string afterwards?

Much appreciated!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You say that it “is definitely UTF-8”, but without a Content-Type header, you don't really know that. (And even if you did have a header saying that, it could still be wrong.)

My guess is that your data is usually ASCII, which always parses correctly as UTF-8, but you sometimes are trying to parse data that's actually encoded in ISO?8859-1 or Windows codepage 1252. Such data will generally be mostly ASCII, but with some bytes outside the 0–127 range ASCII defines. UTF-8 would expect such bytes to form a sequence of code units within a specified sequence of ranges, but in other encodings, any byte, regardless of value, is a complete character on its own. Trying to interpret non-ASCII non-UTF-8 data as UTF-8 will almost always get you either wrong results (wrong characters) or no results at all (cannot decode; decoder returns nil), because the data was never encoded in UTF-8 in the first place.

You should try UTF-8 first, and if it fails, use ISO 8859-1. If you're letting the user retrieve any web page, you should let them change the encoding you use to decode the data, in case they discover that it was actually 8859-9 or codepage-1252 or some other 8-bit encoding.

If you're downloading the data from a specific server, and especially if you have influence on what runs on that server, you should make it serve up an accurate Content-Type header and/or fix whatever bug is causing it to serve up text that isn't in UTF-8.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...