Thread View: gmane.linux.suse.general
1 message
Started by "Henri Sivonen (
Wed, 02 Jan 2008 03:08
[jira] Created: (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
Author: "Henri Sivonen (
Date: Wed, 02 Jan 2008 03:08
38 lines
1961 bytes
Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
---------------------------------------------------------------------------------------------

                 Key: XALANJ-2419
                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
             Project: XalanJ2
          Issue Type: Bug
          Components: Serialization
    Affects Versions: 2.7.1
            Reporter: Henri Sivonen

org.apache.xml.serializer.ToStream contains the following code:

    else if (m_encodingInfo.isInEncoding(ch)) {
        // If the character is in the encoding, and
        // not in the normal ASCII range, we also
        // just leave it get added on to the clean characters
    }
    else {
        // This is a fallback plan, we should never get here
        // but if the character wasn't previously handled
        // (i.e. isn't in the encoding, etc.) then what
        // should we do?  We choose to write out an entity
        writeOutCleanChars(chars, i, lastDirtyCharProcessed);
        writer.write("&#");
        writer.write(Integer.toString(ch));
        writer.write(';');
        lastDirtyCharProcessed = i;
    }

This leads to the wrong (latter) if branch running for surrogates, because isInEncoding() for UTF-8 returns false for surrogates. It is always wrong (regardless of encoding) to escape a surrogate as an NCR.

The practical effect of this bug is that any document with astral characters in it ends up in an ill-formed serialization and does not parse back using an XML parser.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
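For illustration, here is a minimal sketch of surrogate-aware escaping. It is not the Xalan code or its eventual fix: the class SurrogateAwareEscaper, the method writeEscaped, and the inEncoding predicate are hypothetical names standing in for ToStream's logic and m_encodingInfo.isInEncoding(). The idea is to combine a high/low surrogate pair into one code point before deciding whether to write it raw or as an NCR, so that U+1D11E is escaped as &#119070; rather than as &#55348;&#56606;.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.function.IntPredicate;

// Hypothetical helper; names are illustrative and not part of Xalan.
final class SurrogateAwareEscaper {

    // Writes chars to writer, escaping code points the encoding cannot represent.
    // inEncoding is an assumed stand-in for an encoding check on whole code points.
    static void writeEscaped(char[] chars, Writer writer, IntPredicate inEncoding)
            throws IOException {
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            int codePoint = ch;

            if (Character.isHighSurrogate(ch)) {
                // An astral character occupies two chars; merge them into one code point.
                if (i + 1 < chars.length && Character.isLowSurrogate(chars[i + 1])) {
                    codePoint = Character.toCodePoint(ch, chars[i + 1]);
                    i++; // consume the low surrogate as well
                } else {
                    throw new IOException("Unpaired high surrogate at index " + i);
                }
            } else if (Character.isLowSurrogate(ch)) {
                throw new IOException("Unpaired low surrogate at index " + i);
            }

            if (inEncoding.test(codePoint)) {
                // Representable: write the character itself (one or two chars).
                writer.write(Character.toChars(codePoint));
            } else {
                // Not representable: one NCR for the whole code point,
                // never one NCR per surrogate.
                writer.write("&#");
                writer.write(Integer.toString(codePoint));
                writer.write(';');
            }
        }
    }
}
```

With a predicate that accepts every code point (as would be appropriate for UTF-8), the pair is passed through unchanged; with an ASCII-only predicate, it becomes a single NCR. Rejecting unpaired surrogates with an exception is a design choice of this sketch, since a lone surrogate cannot be serialized as well-formed XML either way.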