Thread View: gmane.linux.suse.general
1 message in thread, started by "Henri Sivonen ( on Wed, 02 Jan 2008 03:08
[jira] Created: (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
#89696
Author: "Henri Sivonen (
Date: Wed, 02 Jan 2008 03:08
38 lines
1961 bytes
Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
---------------------------------------------------------------------------------------------

                 Key: XALANJ-2419
                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
             Project: XalanJ2
          Issue Type: Bug
          Components: Serialization
    Affects Versions: 2.7.1
            Reporter: Henri Sivonen


org.apache.xml.serializer.ToStream contains the following code:
    else if (m_encodingInfo.isInEncoding(ch)) {
        // If the character is in the encoding, and
        // not in the normal ASCII range, we also
        // just leave it get added on to the clean characters

    }
    else {
        // This is a fallback plan, we should never get here
        // but if the character wasn't previously handled
        // (i.e. isn't in the encoding, etc.) then what
        // should we do?  We choose to write out an entity
        writeOutCleanChars(chars, i, lastDirtyCharProcessed);
        writer.write("&#");
        writer.write(Integer.toString(ch));
        writer.write(';');
        lastDirtyCharProcessed = i;
    }

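For surrogates, this causes the wrong (latter) branch to run, because isInEncoding() returns false for surrogates when the encoding is UTF-8. Each half of a surrogate pair is then written as its own NCR. Escaping a surrogate as an NCR is always wrong, regardless of encoding: an XML character reference must resolve to a legal XML character, and the surrogate range U+D800 to U+DFFF is excluded.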
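
One possible shape for surrogate-aware escaping is sketched below. It is not the Xalan code and not a proposed patch: the class SurrogateAwareEscaper and its methods are hypothetical names, and for simplicity the sketch escapes every non-ASCII character (as if the output encoding were US-ASCII) rather than consulting m_encodingInfo. The point it illustrates is that a high/low surrogate pair should be combined into one code point and written as a single NCR.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper for illustration only; not part of Xalan.
class SurrogateAwareEscaper {

    // Escapes every non-ASCII character in chars as a single decimal NCR.
    // Markup escaping ('<', '&') is deliberately left out of this sketch.
    static void writeEscaped(char[] chars, Writer writer) throws IOException {
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            if (ch < 0x80) {
                writer.write(ch);
            } else if (Character.isHighSurrogate(ch)
                    && i + 1 < chars.length
                    && Character.isLowSurrogate(chars[i + 1])) {
                // Combine the pair into one code point and emit one NCR.
                writeNcr(writer, Character.toCodePoint(ch, chars[i + 1]));
                i++; // the low surrogate has been consumed
            } else if (Character.isSurrogate(ch)) {
                // A lone surrogate has no legal XML representation.
                throw new IOException("Unpaired surrogate at index " + i);
            } else {
                writeNcr(writer, ch);
            }
        }
    }

    private static void writeNcr(Writer writer, int codePoint) throws IOException {
        writer.write("&#");
        writer.write(Integer.toString(codePoint));
        writer.write(';');
    }

    // Convenience wrapper that returns the escaped text as a String.
    static String escape(String text) {
        StringWriter out = new StringWriter();
        try {
            writeEscaped(text.toCharArray(), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is stored in a Java String as the pair D83D DE00.
        System.out.println(escape("a\uD83D\uDE00b")); // prints a&#128512;b
    }
}
```

With the code quoted above, the same input would instead come out as a&#55357;&#56832;b, one NCR per surrogate.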

The practical effect of this bug is that any document containing astral characters is serialized as ill-formed XML, which an XML parser then refuses to parse back.
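
The round-trip failure can be checked with the JAXP parser that ships with the JDK. The following is a minimal sketch, assuming a stock JDK; the class NcrRoundTripCheck and the method parses are hypothetical names introduced here. It feeds the parser one document that uses a pair of surrogate NCRs and one that uses a single NCR for the same character, U+1F600.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical check for illustration only; not part of Xalan.
class NcrRoundTripCheck {

    // Returns true if the JDK's default XML parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors surface as SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One NCR per surrogate, as produced by the code quoted above.
        System.out.println(parses("<a>&#55357;&#56832;</a>"));
        // One NCR for the whole code point.
        System.out.println(parses("<a>&#128512;</a>"));
    }
}
```

The first document should be rejected, because a character reference to a surrogate violates the well-formedness rules; the second should parse. The default error handler may also print a fatal-error line to standard error for the rejected document.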

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.