I have the following method to write an XML DOM document to a stream:
public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception {
    fDoc.setXmlStandalone(true);
    DOMSource docSource = new DOMSource(fDoc);
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.INDENT, "no");
    transformer.transform(docSource, new StreamResult(out));
}
I am testing some other XML functionality, and this is just the method that I use to write to a file. My test program generates 33 test cases where files are written out. 28 of them have the following header:
<?xml version="1.0" encoding="UTF-8"?>...
But for some reason, one of the test cases now produces:
<?xml version="1.0" encoding="ISO-8859-1"?>...
And four more produce:
<?xml version="1.0" encoding="Windows-1252"?>...
As you can clearly see, I am setting the ENCODING output key to UTF-8. These tests used to work on an earlier version of Java. I have not run the tests in a while (more than a year), but running them today on "Java(TM) SE Runtime Environment (build 1.6.0_22-b04)" I get this odd behavior.
I have verified that the documents causing the problem were read from files that originally had those encodings. It seems that the new versions of the libraries are attempting to preserve the encoding of the source file that was read. But that is not what I want ... I really do want the output to be in UTF-8.
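One way to double-check what a parsed document remembers about its source is to look at the DOM Level 3 accessors getXmlEncoding() and getInputEncoding() on Document. The following is only a minimal sketch: the class name EncodingProbe and the sample XML are made up for illustration, and the document is parsed from memory rather than from one of the real test files.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original test suite.
public class EncodingProbe {

    // Parse raw XML bytes into a DOM document using the default JAXP parser.
    static Document parse(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xmlBytes));
    }

    public static void main(String[] args) throws Exception {
        // Sample input that declares ISO-8859-1 and is really encoded that way.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>";
        Document doc = parse(xml.getBytes("ISO-8859-1"));

        // Encoding named in the XML declaration of the source, if any.
        System.out.println("xmlEncoding   = " + doc.getXmlEncoding());
        // Encoding the parser actually used while reading the bytes.
        System.out.println("inputEncoding = " + doc.getInputEncoding());
    }
}
```

If the problem documents report ISO-8859-1 or Windows-1252 here while the others report UTF-8 or null, that would be consistent with the serializer picking up the encoding from the document instead of from the output property.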
Does anyone know of any other factor that might cause the transformer to ignore the UTF-8 encoding setting? Is there anything else that has to be set on the document to say to forget the encoding of the file that was originally read?
UPDATE:
I checked the same project out on another machine, then built and ran the tests there. On that machine all the tests pass! All the files have "UTF-8" in their header. That machine has "Java(TM) SE Runtime Environment (build 1.6.0_29-b11)". Both machines are running Windows 7. On the new machine, which works correctly, jdk1.5.0_11 is used to make the build, but on the old machine jdk1.6.0_26 is used. The libraries used for both builds are exactly the same. Could it be a JDK 1.6 incompatibility with 1.5 at build time?
UPDATE:
After 4.5 years, the Java library is still broken, but due to the suggestion by Vyrx below, I finally have a proper solution!
public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception {
    fDoc.setXmlStandalone(true);
    DOMSource docSource = new DOMSource(fDoc);
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    // Suppress the generated declaration; the correct one is written by hand below.
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "no");
    out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>".getBytes("UTF-8"));
    transformer.transform(docSource, new StreamResult(out));
}
The solution is to disable the writing of the header, and to write the correct header just before serializing the XML to the output stream. Lame, but it produces the correct results. Tests that broke over 4 years ago are now running again!
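For anyone who wants to try the workaround in isolation, here is a small self-contained check. It is a sketch under assumptions: the class name Utf8HeaderCheck, the roundTrip helper, and the sample document are invented for illustration, and an in-memory ISO-8859-1 document stands in for the real test files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical harness for the workaround; names are illustrative only.
public class Utf8HeaderCheck {

    static final String HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    // Same approach as the method above: omit the generated declaration
    // and write a fixed UTF-8 declaration before serializing the body.
    static void writeToOutputStream(Document doc, OutputStream out) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        out.write(HEADER.getBytes("UTF-8"));
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Parse the given bytes, serialize them with the workaround,
    // and return the result decoded as UTF-8 text.
    static String roundTrip(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeToOutputStream(doc, out);
        return new String(out.toByteArray(), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Source document that declares (and uses) ISO-8859-1.
        String source = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>";
        String result = roundTrip(source.getBytes("ISO-8859-1"));
        System.out.println(result.startsWith(HEADER) ? "header OK" : "header WRONG: " + result);
    }
}
```

The check only looks at the declaration. Whether the body bytes are really UTF-8 still depends on the serializer honoring OutputKeys.ENCODING, so documents with non-ASCII content are worth inspecting separately.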