Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - Jsoup parser changes tag names to lower case

I did some research, and it seems that Jsoup lower-cases tag names by default. Is there a way to configure this behaviour? If not, is there another parser whose output can be converted to a Jsoup document, or some other way to fix this?


Fight the dragon for too long and you become the dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Reply

Unfortunately not: the constructor of the Tag class converts the name to lower case:

private Tag(String tagName) {
    this.tagName = tagName.toLowerCase();
}

But there are two ways to change this behaviour:

  1. If you want a clean solution, you can clone or download the Jsoup Git repository and change this line.
  2. If you want a dirty solution, you can use reflection to overwrite the field after parsing.

Example for #2:

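// Required imports:
// import java.lang.reflect.Field;
// import org.jsoup.nodes.Element;
// import org.jsoup.parser.Tag;

// getDeclaredField / get / set throw checked exceptions (NoSuchFieldException,
// IllegalAccessException), so the enclosing method must declare or catch them.
Field tagName = Tag.class.getDeclaredField("tagName"); // Get the private field which holds the tag name
tagName.setAccessible(true); // Allow access to the private field

for( Element element : doc.select("*") ) // Iterate over all elements of the parsed document
{
    Tag tag = element.tag(); // Get the tag of the element
    String value = (String) tagName.get(tag); // Read the current (lower-case) name

    if( !value.startsWith("#") ) // Skip names starting with '#'
    {
        tagName.set(tag, value.toUpperCase()); // Overwrite the name with its upper-case form
    }
}

tagName.setAccessible(false); // Revert to false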
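
To try the reflection trick without Jsoup on the classpath, here is a minimal, self-contained sketch. FakeTag, TagCaseDemo and forceUpperCase are hypothetical names made up for this illustration; FakeTag only imitates the constructor shown above and is not Jsoup's Tag class. The sketch also assumes Locale.ROOT for the case conversion, which the snippet above does not do.

```java
import java.lang.reflect.Field;
import java.util.Locale;

// Illustration only: FakeTag is a hypothetical stand-in, NOT Jsoup's Tag class.
class TagCaseDemo {

    static final class FakeTag {
        private String tagName;

        // Imitates the constructor from the answer: the name is lower-cased on creation.
        // Locale.ROOT is an assumption made here to keep the result locale-independent.
        private FakeTag(String tagName) {
            this.tagName = tagName.toLowerCase(Locale.ROOT);
        }

        static FakeTag valueOf(String tagName) {
            return new FakeTag(tagName);
        }

        String getName() {
            return tagName;
        }
    }

    // Overwrites the private field via reflection, skipping names that start with '#'.
    static void forceUpperCase(FakeTag tag) {
        try {
            Field field = FakeTag.class.getDeclaredField("tagName");
            field.setAccessible(true); // bypass the private modifier
            String value = (String) field.get(tag);
            if (!value.startsWith("#")) {
                field.set(tag, value.toUpperCase(Locale.ROOT));
            }
        } catch (ReflectiveOperationException e) {
            // Wrap the checked reflection exceptions so callers stay simple.
            throw new IllegalStateException("Could not rewrite tagName", e);
        }
    }

    public static void main(String[] args) {
        FakeTag tag = FakeTag.valueOf("DIV");
        System.out.println(tag.getName()); // prints "div"
        forceUpperCase(tag);
        System.out.println(tag.getName()); // prints "DIV"
    }
}
```

The same caveat applies as for the real thing: the code depends on a private field name, so it breaks silently if the library renames that field in a later version.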


...