Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
348 views
in Technique[技术] by (71.8m points)

python - XML Unicode strings with encoding declaration are not supported

Trying to do the following...

from lxml import etree
from lxml.etree import fromstring

if request.POST:
    parser = etree.XMLParser(ns_clean=True, recover=True)
    h = fromstring(request.POST['xml'], parser=parser)
    return HttpResponse(h.cssselect('itagg_delivery_receipt status').text_content())

but it give this error:

[Fri Apr 05 10:27:54 2013] [error] Internal Server Error: /sms/status_postback/
[Fri Apr 05 10:27:54 2013] [error] Traceback (most recent call last):
[Fri Apr 05 10:27:54 2013] [error]   File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
[Fri Apr 05 10:27:54 2013] [error]     response = callback(request, *callback_args, **callback_kwargs)
[Fri Apr 05 10:27:54 2013] [error]   File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
[Fri Apr 05 10:27:54 2013] [error]     return view_func(*args, **kwargs)
[Fri Apr 05 10:27:54 2013] [error]   File "/srv/project/livewireSMS/sms/views.py", line 42, in update_delivery_status
[Fri Apr 05 10:27:54 2013] [error]     h = fromstring(request.POST['xml'], parser=parser)
[Fri Apr 05 10:27:54 2013] [error]   File "lxml.etree.pyx", line 2754, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54631)
[Fri Apr 05 10:27:54 2013] [error]   File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)
[Fri Apr 05 10:27:54 2013] [error] ValueError: Unicode strings with encoding declaration are not supported.

this is the XML

 <?xml version="1.1" encoding="ISO-8859-1"?>
<itagg_delivery_receipt>
<version>1.0</version>
<msisdn>447889000000</msisdn>
<submission_ref>
845tgrgsehg394g3hdfhhh56445y7ts6</
submission_ref>
<status>Delivered</status>
<reason>4</reason>
<timestamp>20050709120945</timestamp>
<retry>0</retry>
</itagg_delivery_receipt> 

I don't have control over the xml document this comes from the SMS company.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You'll have to encode it and then force the same encoding in the parser:

from lxml import etree
from lxml.etree import fromstring

if request.POST:
    xml = request.POST['xml'].encode('utf-8')
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    h = fromstring(xml, parser=parser)

    return HttpResponse(h.cssselect('delivery_reciept status').text_content())

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...