U+FFFD (decimal 65533) is the "replacement character". When a decoder encounters an invalid sequence of bytes, it may (depending on its configuration) substitute � for the corrupt sequence and continue.
One common reason for a "corrupt" sequence is that the wrong decoder has been applied. For example, the decoder might be UTF-8, but the page is actually encoded with ISO-8859-1 (the default if another is not specified in the content-type header or equivalent).
So, before you even pass the string to escapeHtml, the "é" has already been replaced with "�"; the method encodes this correctly.
The page in question uses ISO-8859-1 encoding. Make sure that you are using that decoder when converting the fetched resource to a String.
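As a rough illustration of the point above, here is a minimal sketch that decodes the same bytes with both charsets. The class name DecodeDemo, the helper decode, and the sample bytes are made up for this example; in your code the bytes would come from the fetched resource.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode raw bytes with an explicitly chosen charset
    // instead of relying on a default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Sample data (an assumption for this sketch): "café" as ISO-8859-1 bytes.
        // 0xE9 is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};

        // Wrong decoder: the invalid byte is replaced with U+FFFD.
        String wrong = decode(raw, StandardCharsets.UTF_8);
        // Right decoder: the byte maps to U+00E9.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println((int) wrong.charAt(3)); // 65533
        System.out.println((int) right.charAt(3)); // 233
    }
}
```

Once the string is decoded with the right charset, escapeHtml receives "é" rather than "�" and can encode it as expected.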