Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

xml - PHP - SimpleXML parse error

SEE EDITS AT BOTTOM TO SHOW MORE ACCURATE ERROR OUTPUT

I'm parsing somewhat large (~15MB) XML files with PHP for the first time using SimpleXML. The files are flight search results, so they have long attributes (links back to Kayak). Example:
"/book/flightcode=1238917408.NxJI6G.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&sid=26-Vu01v7ilzhSAjPVLZ3Ul"

SimpleXML throws this error when parsing:

"Entity: line 10: parser error : EntityRef: expecting ';' in" and then;

"38917408.NxJI6G.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&sid in" and then;

"simplexml_load_string() [function.simplexml-load-string]: ^ in,"

and so forth for each line where there are these urls.

I found a mention of SimpleXML not liking long attributes on php.net with no solution. I would rather just use and learn SimpleXML for now and work past this error if there is a non-janky, somewhat easy workaround.

Does anyone have a solution? Thanks in advance!

I tried pasting the first 13 lines of the XML here, but the site only shows the text content without the tags, so I've left them out; I can find another way to share them if it will help. I'm not sure whether using another parser/extension would reduce the functionality or ease of use, but please feel free to suggest one if there's no workaround (I'm thinking of DOM or XMLReader, perhaps).

EDITS BELOW TO INCLUDE LESS ADULTERATED ERROR OUTPUT:

http://dl.dropbox.com/u/10206237/stack_overflow_xml.xml

ERROR 1:

simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Entity: line 10: parser error : EntityRef: expecting ';' in 

ERROR 2: (I think the XML is fine because it works with a Python script using DOM; I'm translating it to PHP because I don't know Python. I didn't know that the output in the browser would be different. Thanks for being patient.)

<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: 38917408.Pt8rW8.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&amp;_sid_ in 

ERROR 3:

function.simplexml-load-string</a>]:                                                                                ^ in     

(all of those spaces are in there)



1 Reply


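As mentioned in other answers and comments, your source XML is broken: the "&" characters in those URLs are not escaped as "&amp;", and XML parsers are supposed to reject input that is not well-formed. The length of the attributes is not the problem. libxml has a "recover" mode that would let you load this broken XML, but you would lose the "&sid" part, so it wouldn't help.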
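To see what recover mode does with this kind of input, here is a minimal sketch. It assumes a made-up one-line sample (the element and attribute names "results", "flight" and "link" are hypothetical, not taken from your file) and uses DOMDocument's recover property, then hands the result to SimpleXML. Print the recovered attribute and compare it with the original value rather than trusting it.

```php
<?php
// Hypothetical sample: an unescaped '&' inside an attribute value.
$broken = '<results><flight link="/book/flightcode=123&sid=26-Vu01"/></results>';

// Collect libxml errors in memory instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->recover = true; // ask libxml to continue past well-formedness errors
$loaded = $dom->loadXML($broken);

// The parser still reports the problem, even though it carries on.
$errors = libxml_get_errors();
libxml_clear_errors();
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

if ($loaded && $dom->documentElement !== null) {
    $sxe = simplexml_import_dom($dom);
    // Inspect what survived recovery and compare it with the original URL.
    var_dump((string) $sxe->flight['link']);
}
```

If the dumped value no longer matches the original link, recover mode has silently changed your data, which is why it is not a real fix here.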

If you're lucky and you like taking chances, you can try to make it work by kind-of-fixing the input: use a string replacement to escape the ampersands that look like they belong to the query part of a URL.

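$xml = file_get_contents('broken.xml');
// Escape any '&' that is followed by a run of letters, digits or
// underscores and then an equals sign, i.e. one that looks like the
// start of a URL query parameter. Existing entities such as '&amp;'
// end in ';' rather than '=', so they are left untouched.
// The 'i' flag makes the match case-insensitive.
$xml = preg_replace('#&(?=[a-z_0-9]+=)#i', '&amp;', $xml);
$sxe = simplexml_load_string($xml);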

This is, of course, nothing but a hack and the only good way to fix your situation is to ask your XML provider to fix their generator. Because if it generates broken XML, who knows what other errors slip by unnoticed?
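If you do go with the hack, it is worth checking that the patched string really parses and seeing which errors remain. The following is a rough sketch, assuming PHP 7.4 or later; the function name load_xml_with_escaped_ampersands and the sample element names are hypothetical, only the libxml and SimpleXML calls are real.

```php
<?php
// Hypothetical helper: applies the ampersand hack, then parses and
// reports any remaining libxml errors instead of emitting warnings.
function load_xml_with_escaped_ampersands(string $xml): SimpleXMLElement
{
    // Escape '&' only when it looks like the start of a query parameter
    // (a name followed by '='); '&amp;' and friends end in ';' and are skipped.
    $fixed = preg_replace('#&(?=[a-z_0-9]+=)#i', '&amp;', $xml);
    if ($fixed === null) {
        throw new RuntimeException('preg_replace failed');
    }

    $previous = libxml_use_internal_errors(true);
    $sxe = simplexml_load_string($fixed);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($sxe === false) {
        $messages = array_map(
            fn($e) => trim($e->message) . ' on line ' . $e->line,
            $errors
        );
        throw new RuntimeException(
            "XML is still not well-formed:\n" . implode("\n", $messages)
        );
    }
    return $sxe;
}

// Made-up sample with one unescaped and one correctly escaped ampersand.
$sample = '<results><flight link="/book/flightcode=123&sid=26-Vu01&amp;x=1"/></results>';
$sxe = load_xml_with_escaped_ampersands($sample);
echo $sxe->flight['link'], "\n";
```

Throwing on failure, rather than returning false, makes it harder to miss the case where the provider's output is broken in some other way that the regex does not cover.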
