python - How to find/replace text in html while preserving html tags/structure

Question

深蓝 · Answer 1 · 2021-10-17T01:13:28+0000

Use a DOM library, not regular expressions, when manipulating HTML:

lxml: a parser, document model, and HTML serializer. It can also use BeautifulSoup and html5lib for parsing.
BeautifulSoup: a parser, document model, and HTML serializer.
html5lib: a parser that also includes a serializer.
ElementTree: a document object and XML serializer.
cElementTree: a document object implemented as a C extension (deprecated since Python 3.3 and removed in Python 3.9, where ElementTree uses the C accelerator automatically).
HTMLParser: a parser (the module is named html.parser in Python 3).
Genshi: includes a parser, document model, and HTML serializer.
xml.dom.minidom: a document model built into the standard library, which html5lib can parse to.

Out of these I would recommend lxml, html5lib, and BeautifulSoup.
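
To illustrate why a parser is safer than a regular expression here, below is a minimal sketch that uses only the standard-library html.parser (the HTMLParser entry above), so it runs without installing anything. The names TextReplacer and replace_text are made up for this example and are not part of any library. It re-emits tags, entities, comments, and declarations as they were and applies the replacement only to text nodes, skipping the contents of script and style elements. With lxml or BeautifulSoup the same idea applies: walk the text nodes of the parsed tree and replace within them.

```python
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Hypothetical helper: re-emit markup, replacing text only in text nodes."""

    SKIP = {"script", "style"}  # raw-text elements whose content is left alone

    def __init__(self, old, new):
        # convert_charrefs=False reports entities such as &amp; as separate
        # events, so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written.
        self.out.append(self.get_starttag_text())
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so end tags come out lowercased.
        self.out.append(f"</{tag}>")
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only text nodes outside script/style are searched and replaced.
        if not self.skip_depth:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def replace_text(html, old, new):
    """Return html with old replaced by new in text nodes only."""
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<p class="cat">The cat &amp; <b>cat</b> sat.</p>'
print(replace_text(sample, "cat", "dog"))
```

Note that the class="cat" attribute is left untouched, which a naive str.replace or regex over the raw markup would not guarantee. One limitation of this sketch: a match that is split across elements or entities (for example c<b>at</b>) is not found, because each text node is searched on its own.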