Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Replace a character entity in an HTML page with a space in Python lxml

I have an unclean XML file that I process with the Python lxml module. Before any other processing, I want to replace every occurrence of this entity in the content with a space. How can I do this for the text of all elements?

Edit: here is my XML example:

<root>
    <a> dsdfs
 dsf
 sdf
</a>
    <bds> 
        <d>sdf





</d>
        <d>sdf


sdf
sdf

</d>
    </bds>
    ....
    ....
    ....
    ....
</root>

And I want to get this output when I print the result of itertext():

from lxml import etree

root = etree.parse('some.xml').getroot()  # get the root element
for text in root.itertext():
    print(text)

dsdfs  dsf  sdf
dsdfs  dsf  sdf
sdf  nsdf sdf  

1 Reply

The code below parses the XML, serializes it to a string, replaces the entity with a space, and then writes the result to a new XML file. You can do other processing in between, depending on exactly what you want to do.

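from lxml import etree

tree = etree.parse('some.xml')
root = tree.getroot()

# Get the whole XML content as a string (bytes in Python 3)
xml_in_str = etree.tostring(root)

# Replace every entity with a space. The exact entity was lost when this
# page was rendered; &#13; (carriage return) is assumed here, adjust if needed.
new_xml_data = xml_in_str.replace(b'&#13;', b' ')

# Do the processing with the new_xml_data string, which is now cleaned

# Maybe also write it to a new XML file, without the entities
# ('wb' because new_xml_data is bytes)
with open('newxml.xml', 'wb') as f:
    f.write(new_xml_data)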
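For a quick test without files, here is a minimal in-memory sketch of the same serialize-and-replace approach. It assumes the entity is the carriage-return reference &#13; (the exact entity was lost when this page was rendered), and the sample string, the `clean_serialized` helper and the `CR_REF` pattern are hypothetical names used only for illustration. The pattern also accepts the hex spelling &#xD; in case your lxml/libxml2 build writes carriage returns that way, and the cleaned bytes are parsed again because further lxml processing needs a tree rather than a string.

```python
import re

from lxml import etree

# Hypothetical sample; &#13; is assumed to be the entity from the question.
SAMPLE = b"<root><a> dsdfs&#13; dsf&#13; sdf&#13;</a><d>sdf&#13;&#13;sdf</d></root>"

# Matches the decimal (&#13;) and hex (&#xD; / &#xd;) carriage-return references.
CR_REF = re.compile(rb"&#(?:13|x[dD]);")


def clean_serialized(xml_bytes):
    """Parse, serialize, replace carriage-return references with spaces, re-parse."""
    root = etree.fromstring(xml_bytes)
    # tostring() returns bytes in Python 3; a "\r" inside text is written
    # back out as a character reference, so it can be matched as plain bytes.
    serialized = etree.tostring(root)
    cleaned = CR_REF.sub(b" ", serialized)
    # Parse again so the cleaned data can be processed as a tree.
    return etree.fromstring(cleaned)


root = clean_serialized(SAMPLE)
for text in root.itertext():
    print(repr(text))
```

Each reference becomes exactly one space, so runs of entities turn into runs of spaces, matching the newxml.xml output shown below.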

some.xml looks like:

<root>
    <a> dsdfs
 dsf
 sdf
</a>
    <bds> 
        <d>sdf





</d>
        <d>sdf


sdf
sdf

</d>
    </bds>
    <bds> 
        <d>sdf





</d>
        <d>sdf


sdf
sdf

</d>
    </bds>
    <bds> 
        <d>sdf





</d>
        <d>sdf


sdf
sdf

</d>
    </bds>
</root>

newxml.xml looks like:

<root>
    <a> dsdfs  dsf  sdf </a>
    <bds> 
        <d>sdf      </d>
        <d>sdf   sdf sdf  </d>
    </bds>
    <bds> 
        <d>sdf      </d>
        <d>sdf   sdf sdf  </d>
    </bds>
    <bds> 
        <d>sdf      </d>
        <d>sdf   sdf sdf  </d>
    </bds>
</root>
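An alternative sketch, not part of the original answer: once the file is parsed, lxml has already decoded each entity into a character (a carriage return, "\r", assuming the entity is &#13;), so you can walk every node with `iter()` and replace that character in each `.text` and `.tail`. That covers the text of all elements, as the question asks, and `itertext()` then yields the cleaned strings. The `replace_in_tree` helper and the sample are hypothetical.

```python
from lxml import etree

# Assumption: the entity is &#13;, which the parser decodes to "\r".
CR = "\r"


def replace_in_tree(root, old=CR, new=" "):
    """Replace `old` with `new` in the text and tail of every node under root."""
    for node in root.iter():
        # .text is the content before the first child; .tail follows the node.
        # Only assign when something changes, to avoid touching read-only nodes.
        if node.text and old in node.text:
            node.text = node.text.replace(old, new)
        if node.tail and old in node.tail:
            node.tail = node.tail.replace(old, new)
    return root


sample = b"<root><a> dsdfs&#13; dsf&#13; sdf&#13;</a><d>sdf&#13;&#13;sdf</d></root>"
root = replace_in_tree(etree.fromstring(sample))
for text in root.itertext():
    print(repr(text))
```

Because this works on the parsed tree, it does not depend on how lxml serializes the character, and you can keep processing `root` directly instead of re-parsing a string.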


...