python - How can one replace an element with text in lxml?

Question

Welcome To Ask or Share your Answers For Others

python - How can one replace an element with text in lxml?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How can one replace an element with text in lxml?

It's easy to completely remove a given element from an XML document with lxml's implementation of the ElementTree API, but I can't see an easy way of consistently replacing an element with some text. For example, given the following input:

input = '''<everything>
<m>Some text before <r/></m>
<m><r/> and some text after.</m>
<m><r/></m>
<m>Text before <r/> and after</m>
<m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
</everything>
'''

... you could easily remove every <r> element with:

from lxml import etree
f = etree.fromstring(data)
for r in f.xpath('//r'):
    r.getparent().remove(r)
print etree.tostring(f, pretty_print=True)

However, how would you go about replacing each element with text, to get the output:

<everything>
<m>Some text before DELETED</m>
<m>DELETED and some text after.</m>
<m>DELETED</m>
<m>Text before DELETED and after</m>
<m><b/>Text after a sibling DELETED Text before a sibling<b/></m>
</everything>

It seems to me that because the ElementTree API deals with text via the .text and .tail attributes of each element rather than nodes in the tree, this means you have to deal with a lot of different cases depending on whether the element has sibling elements or not, whether the existing element had a .tail attribute, and so on. Have I missed some easy way of doing this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:19:19+0000

I think that unutbu's XSLT solution is probably the correct way to achieve your goal.

However, here's a somewhat hacky way to achieve it, by modifying the tails of <r/> tags and then using etree.strip_elements.

from lxml import etree

data = '''<everything>
<m>Some text before <r/></m>
<m><r/> and some text after.</m>
<m><r/></m>
<m>Text before <r/> and after</m>
<m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
</everything>
'''

f = etree.fromstring(data)
for r in f.xpath('//r'):
  r.tail = 'DELETED' + r.tail if r.tail else 'DELETED'

etree.strip_elements(f,'r',with_tail=False)

print etree.tostring(f,pretty_print=True)

Gives you:

<everything>
<m>Some text before DELETED</m>
<m>DELETED and some text after.</m>
<m>DELETED</m>
<m>Text before DELETED and after</m>
<m><b/> Text after a sibling DELETED Text before a sibling<b/></m>
</everything>

Categories

python - How can one replace an element with text in lxml?

python - How can one replace an element with text in lxml?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags