Let's say I have an HTML fragment like this:
<p> <span> foo </span> <em> bar <a> foobar </a> baz </em> </p>
What I want to extract from that is:
foo bar foobar baz
So my question is: how can I strip all the wrapping tags from an HTML fragment and get only the text, in the same order as it appears in the HTML?
As the title says, I want to use jsoup for the parsing.
Example of HTML with accented characters (note the 'á' character):
<p><strong>Tarthatatlan biztonsági viszonyok</strong></p>
<p><strong>Tarthatatlan biztonsági viszonyok</strong></p>
What I want:
Tarthatatlan biztonsági viszonyok
Tarthatatlan biztonsági viszonyok
This HTML is not static. In general, I just want all the text of an arbitrary HTML fragment in decoded, human-readable form, with line breaks.
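To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes jsoup is on the classpath and uses `Jsoup.parseBodyFragment` together with `Element.text()`, which returns the combined, whitespace-normalized text of an element and its descendants with entities decoded. The class name `HtmlTextExtractor` and both helper methods are hypothetical names chosen for illustration. The line-break variant also assumes that every top-level element of the fragment (such as each `<p>`) should become its own line.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class HtmlTextExtractor {

    // Returns all text of the fragment on a single line, in document order,
    // with tags stripped, entities decoded and whitespace normalized.
    static String plainText(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        return doc.body().text();
    }

    // Returns one line per top-level element of the fragment.
    // Assumption: each top-level element (e.g. <p>) maps to one output line;
    // text that sits directly in the body, outside any element, is skipped.
    static String textWithLineBreaks(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> lines = new ArrayList<>();
        for (Element block : doc.body().children()) {
            String text = block.text();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String nested = "<p> <span> foo </span> <em> bar <a> foobar </a> baz </em> </p>";
        System.out.println(plainText(nested));

        String accented =
                "<p><strong>Tarthatatlan biztons&aacute;gi viszonyok</strong></p>\n"
              + "<p><strong>Tarthatatlan biztons&aacute;gi viszonyok</strong></p>";
        System.out.println(textWithLineBreaks(accented));
    }
}
```

For fragments where line breaks should follow `<br>` tags or nested block elements rather than only the top-level elements, this sketch would not be enough, and a node-by-node traversal would probably be needed instead.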