Class XMLScanner

java.lang.Object
org.apache.xerces.impl.XMLScanner
All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent
Direct Known Subclasses:
XMLDocumentFragmentScannerImpl, XMLDTDScannerImpl

public abstract class XMLScanner extends Object implements org.apache.xerces.xni.parser.XMLComponent
This class is responsible for holding scanning methods common to scanning the XML document structure and content as well as the DTD structure and content. Both XMLDocumentScanner and XMLDTDScanner inherit from this base class.

This component requires the following features and properties from the component manager that uses it:

  • http://xml.org/sax/features/validation
  • http://xml.org/sax/features/namespaces
  • http://apache.org/xml/features/scanner/notify-char-refs
  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter
  • http://apache.org/xml/properties/internal/entity-manager

INTERNAL:

Usage of this class is not supported. It may be altered or removed at any time.
Version:
$Id: XMLScanner.java 1499506 2013-07-03 18:29:43Z mrglavas $
Author:
Andy Clark, IBM, Arnaud Le Hors, IBM, Eric Ye, IBM
  • Field Details

    • VALIDATION

      protected static final String VALIDATION
      Feature identifier: validation.
    • NAMESPACES

      protected static final String NAMESPACES
      Feature identifier: namespaces.
    • NOTIFY_CHAR_REFS

      protected static final String NOTIFY_CHAR_REFS
      Feature identifier: notify character references.
    • PARSER_SETTINGS

      protected static final String PARSER_SETTINGS
    • SYMBOL_TABLE

      protected static final String SYMBOL_TABLE
      Property identifier: symbol table.
    • ERROR_REPORTER

      protected static final String ERROR_REPORTER
      Property identifier: error reporter.
    • ENTITY_MANAGER

      protected static final String ENTITY_MANAGER
      Property identifier: entity manager.
    • DEBUG_ATTR_NORMALIZATION

      protected static final boolean DEBUG_ATTR_NORMALIZATION
      Debug attribute normalization.
    • fValidation

      protected boolean fValidation
      Validation. This feature identifier is: http://xml.org/sax/features/validation
    • fNamespaces

      protected boolean fNamespaces
      Namespaces.
    • fNotifyCharRefs

      protected boolean fNotifyCharRefs
      Character references notification.
    • fParserSettings

      protected boolean fParserSettings
      Internal parser-settings feature.
    • fSymbolTable

      protected SymbolTable fSymbolTable
      Symbol table.
    • fErrorReporter

      protected XMLErrorReporter fErrorReporter
      Error reporter.
    • fEntityManager

      protected XMLEntityManager fEntityManager
      Entity manager.
    • fEntityScanner

      protected XMLEntityScanner fEntityScanner
      Entity scanner.
    • fEntityDepth

      protected int fEntityDepth
      Entity depth.
    • fCharRefLiteral

      protected String fCharRefLiteral
      Literal value of the last character reference scanned.
    • fScanningAttribute

      protected boolean fScanningAttribute
      Scanning attribute.
    • fReportEntity

      protected boolean fReportEntity
      Report entity boundary.
    • fVersionSymbol

      protected static final String fVersionSymbol
      Symbol: "version".
    • fEncodingSymbol

      protected static final String fEncodingSymbol
      Symbol: "encoding".
    • fStandaloneSymbol

      protected static final String fStandaloneSymbol
      Symbol: "standalone".
    • fAmpSymbol

      protected static final String fAmpSymbol
      Symbol: "amp".
    • fLtSymbol

      protected static final String fLtSymbol
      Symbol: "lt".
    • fGtSymbol

      protected static final String fGtSymbol
      Symbol: "gt".
    • fQuotSymbol

      protected static final String fQuotSymbol
      Symbol: "quot".
    • fAposSymbol

      protected static final String fAposSymbol
      Symbol: "apos".
    • fResourceIdentifier

      protected final XMLResourceIdentifierImpl fResourceIdentifier
  • Constructor Details

    • XMLScanner

      public XMLScanner()
  • Method Details

    • reset

      public void reset(org.apache.xerces.xni.parser.XMLComponentManager componentManager) throws org.apache.xerces.xni.parser.XMLConfigurationException
      Description copied from interface: org.apache.xerces.xni.parser.XMLComponent
      Resets the component. The component can query the component manager about any features and properties that affect the operation of the component.
      Specified by:
      reset in interface org.apache.xerces.xni.parser.XMLComponent
      Parameters:
      componentManager - The component manager.
    • setProperty

      public void setProperty(String propertyId, Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
      Sets the value of a property during parsing.
      Specified by:
      setProperty in interface org.apache.xerces.xni.parser.XMLComponent
      Parameters:
      propertyId -
      value -
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
    • setFeature

      public void setFeature(String featureId, boolean value) throws org.apache.xerces.xni.parser.XMLConfigurationException
      Description copied from interface: org.apache.xerces.xni.parser.XMLComponent
      Sets the state of a feature. This method is called by the component manager any time after reset when a feature changes state.

      Note: Components should silently ignore features that do not affect the operation of the component.

      Specified by:
      setFeature in interface org.apache.xerces.xni.parser.XMLComponent
      Parameters:
      featureId - The feature identifier.
      value - The state of the feature.
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
    • getFeature

      public boolean getFeature(String featureId) throws org.apache.xerces.xni.parser.XMLConfigurationException
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException
    • reset

      protected void reset()
    • scanXMLDeclOrTextDecl

      protected void scanXMLDeclOrTextDecl(boolean scanningTextDecl, String[] pseudoAttributeValues) throws IOException, org.apache.xerces.xni.XNIException
      Scans an XML or text declaration.

       [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
       [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
       [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' |  "'" EncName "'" )
       [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
       [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'")
                       | ('"' ('yes' | 'no') '"'))
      
       [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
       
      Parameters:
      scanningTextDecl - True if a text declaration is to be scanned instead of an XML declaration.
      pseudoAttributeValues - An array of size 3 to return the version, encoding and standalone pseudo attribute values (in that order). Note: This method uses fString, anything in it at the time of calling is lost.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
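
The three-slot result array described above can be illustrated with a small standalone sketch. It is not the Xerces implementation: the class XmlDeclSketch and its regular expression are hypothetical, and the sketch does not enforce attribute order or validate the values the way productions [23] to [32] and [77] require.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlDeclSketch {
    // One pseudo-attribute: S name Eq quoted value (either quote style).
    private static final Pattern PSEUDO_ATTR = Pattern.compile(
        "\\s+(version|encoding|standalone)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    /** Returns {version, encoding, standalone}; absent entries stay null. */
    static String[] parse(String decl) {
        if (!decl.startsWith("<?xml") || !decl.endsWith("?>")) {
            throw new IllegalArgumentException("not an XML or text declaration: " + decl);
        }
        String[] values = new String[3];
        // Only look at the text between "<?xml" and "?>".
        Matcher m = PSEUDO_ATTR.matcher(decl.substring(5, decl.length() - 2));
        while (m.find()) {
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            switch (m.group(1)) {
                case "version":  values[0] = value; break;
                case "encoding": values[1] = value; break;
                default:         values[2] = value; break; // standalone
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] v = parse("<?xml version=\"1.0\" encoding='UTF-8'?>");
        System.out.println(v[0] + ", " + v[1] + ", " + v[2]);
    }
}
```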
    • scanPseudoAttribute

      public String scanPseudoAttribute(boolean scanningTextDecl, org.apache.xerces.xni.XMLString value) throws IOException, org.apache.xerces.xni.XNIException
      Scans a pseudo attribute.
      Parameters:
      scanningTextDecl - True if scanning this pseudo-attribute for a TextDecl; false if scanning XMLDecl. This flag is needed to report the correct type of error.
      value - The string to fill in with the attribute value.
      Returns:
      The name of the attribute. Note: This method uses fStringBuffer2, anything in it at the time of calling is lost.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
    • scanPI

      protected void scanPI() throws IOException, org.apache.xerces.xni.XNIException
      Scans a processing instruction.

       [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
       [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
       
      Note: This method uses fString, anything in it at the time of calling is lost.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
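
Productions [16] and [17] can be sketched on a plain string as follows. This is only an illustration, assuming the text between '<?' and '?>' has already been isolated; PiSketch and its methods are hypothetical names, and the sketch does not check that the target is a valid Name.

```java
final class PiSketch {
    /** XML whitespace (production S): space, tab, line feed, carriage return. */
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Production [17] excludes every case variant of "xml" as a target. */
    static boolean isReservedTarget(String target) {
        return target.equalsIgnoreCase("xml");
    }

    /** Splits the text between '<?' and '?>' into {target, data}. */
    static String[] split(String body) {
        int i = 0;
        while (i < body.length() && !isXmlSpace(body.charAt(i))) {
            i++;
        }
        String target = body.substring(0, i);
        if (target.isEmpty() || isReservedTarget(target)) {
            throw new IllegalArgumentException("illegal PI target: '" + target + "'");
        }
        while (i < body.length() && isXmlSpace(body.charAt(i))) {
            i++; // skip the separator between target and data
        }
        return new String[] { target, body.substring(i) };
    }

    public static void main(String[] args) {
        String[] pi = split("xml-stylesheet href=\"a.css\"");
        System.out.println(pi[0] + " | " + pi[1]);
    }
}
```

A target such as xmlfoo is accepted because only the exact three-letter name is reserved.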
    • scanPIData

      protected void scanPIData(String target, org.apache.xerces.xni.XMLString data) throws IOException, org.apache.xerces.xni.XNIException
      Scans the data of a processing instruction. This is needed to handle the situation where a document starts with a processing instruction whose target name starts with "xml" (e.g. xmlfoo). Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
      Parameters:
      target - The PI target
      data - The string to fill in with the data
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
    • scanComment

      protected void scanComment(XMLStringBuffer text) throws IOException, org.apache.xerces.xni.XNIException
      Scans a comment.

       [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
       

      Note: Called after scanning past '<!--'. Note: This method uses fString, anything in it at the time of calling is lost.

      Parameters:
      text - The buffer to fill in with the text.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
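
Production [15] boils down to two constraints on the text between the comment delimiters: it must not contain "--" and it must not end with "-". A minimal sketch of that check follows; CommentSketch is a hypothetical name, and the sketch does not verify that every character matches the Char production.

```java
final class CommentSketch {
    /** True if text may appear between the comment delimiters under production [15]. */
    static boolean isValidCommentText(String text) {
        // A '-' must always be followed by a character other than '-',
        // which rules out both "--" anywhere and a trailing '-'.
        return !text.contains("--") && !text.endsWith("-");
    }

    public static void main(String[] args) {
        System.out.println(isValidCommentText(" a - b "));
        System.out.println(isValidCommentText(" a -- b "));
    }
}
```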
    • scanAttributeValue

      protected boolean scanAttributeValue(org.apache.xerces.xni.XMLString value, org.apache.xerces.xni.XMLString nonNormalizedValue, String atName, boolean checkEntities, String eleName) throws IOException, org.apache.xerces.xni.XNIException
      Scans an attribute value and normalizes whitespace converting all whitespace characters to space characters.

       [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
       
      Parameters:
      value - The XMLString to fill in with the value.
      nonNormalizedValue - The XMLString to fill in with the non-normalized value.
      atName - The name of the attribute being parsed (for error msgs).
      checkEntities - true if undeclared entities should be reported as VC violation, false if undeclared entities should be reported as WFC violation.
      eleName - The name of element to which this attribute belongs.
      Returns:
      true if the non-normalized and normalized values are the same. Note: This method uses fStringBuffer2, anything in it at the time of calling is lost.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
    • scanExternalID

      protected void scanExternalID(String[] identifiers, boolean optionalSystemId) throws IOException, org.apache.xerces.xni.XNIException
      Scans an external ID and returns the public and system IDs.
      Parameters:
      identifiers - An array of size 2 to return the system id, and public id (in that order).
      optionalSystemId - Specifies whether the system id is optional. Note: This method uses fString and fStringBuffer, anything in them at the time of calling is lost.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
    • scanPubidLiteral

      protected boolean scanPubidLiteral(org.apache.xerces.xni.XMLString literal) throws IOException, org.apache.xerces.xni.XNIException
      Scans a public ID literal.

       [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
       [13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
       
      The returned string is normalized according to the following rule, from http://www.w3.org/TR/REC-xml#dt-pubid: Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
      Parameters:
      literal - The string to fill in with the public ID literal.
      Returns:
      True on success. Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
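
The normalization rule quoted above can be sketched as a single pass over the literal. This is an illustration only, operating on a String rather than an XMLString; PubidSketch is a hypothetical name, and the sketch assumes the input contains only PubidChar characters.

```java
final class PubidSketch {
    /** Collapses runs of #x20, #xD and #xA to one space and trims both ends. */
    static String normalize(String literal) {
        StringBuilder out = new StringBuilder(literal.length());
        boolean pendingSpace = false;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == ' ' || c == '\r' || c == '\n') {
                // Remember the run, but never emit leading white space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is trailing white space and is dropped.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize(" -//W3C//DTD  XHTML\n1.0//EN ") + "]");
    }
}
```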
    • normalizeWhitespace

      protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value)
      Normalize whitespace in an XMLString converting all whitespace characters to space characters.
    • normalizeWhitespace

      protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value, int fromIndex)
      Normalize whitespace in an XMLString converting all whitespace characters to space characters.
    • isUnchangedByNormalization

      protected int isUnchangedByNormalization(org.apache.xerces.xni.XMLString value)
      Checks whether this string would be unchanged by normalization.
      Returns:
      -1 if the value would be unchanged by normalization, otherwise the index of the first whitespace character which would be transformed.
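
The two normalizeWhitespace overloads and isUnchangedByNormalization fit together as a check followed by an in-place rewrite. The sketch below works on a raw character range instead of an XMLString; WhitespaceSketch and its method names are hypothetical, and returning the index relative to the start of the range is an assumption of this sketch.

```java
final class WhitespaceSketch {
    /** XML whitespace (production S): space, tab, line feed, carriage return. */
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Index (relative to offset) of the first char that normalization would change, or -1. */
    static int firstChangedIndex(char[] ch, int offset, int length) {
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            if (ch[i] != ' ' && isXmlSpace(ch[i])) {
                return i - offset;
            }
        }
        return -1;
    }

    /** Replaces every whitespace char from fromIndex (relative to offset) onward with a space. */
    static void normalize(char[] ch, int offset, int length, int fromIndex) {
        int end = offset + length;
        for (int i = offset + fromIndex; i < end; i++) {
            if (isXmlSpace(ch[i])) {
                ch[i] = ' ';
            }
        }
    }

    public static void main(String[] args) {
        char[] value = "a\tb\nc".toCharArray();
        int first = firstChangedIndex(value, 0, value.length);
        if (first != -1) {
            normalize(value, 0, value.length, first); // skip the untouched prefix
        }
        System.out.println(new String(value));
    }
}
```

Starting the rewrite at the index returned by the check avoids rescanning the prefix that is already known to be unchanged.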
    • startEntity

      public void startEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
      This method notifies of the start of an entity. The document entity has the pseudo-name of "[xml]"; the DTD has the pseudo-name of "[dtd]"; parameter entity names start with '%'; and general entities are just specified by their name.
      Parameters:
      name - The name of the entity.
      identifier - The resource identifier.
      encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
      augs - Additional information that may include infoset augmentations
      Throws:
      org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
    • endEntity

      public void endEntity(String name, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
      This method notifies of the end of an entity. The document entity has the pseudo-name of "[xml]"; the DTD has the pseudo-name of "[dtd]"; parameter entity names start with '%'; and general entities are just specified by their name.
      Parameters:
      name - The name of the entity.
      augs - Additional information that may include infoset augmentations
      Throws:
      org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
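
The naming convention shared by startEntity and endEntity can be expressed as a small classifier. This is a sketch for illustration; EntityNameSketch and its Kind enum are hypothetical and not part of XNI.

```java
final class EntityNameSketch {
    enum Kind { DOCUMENT, DTD, PARAMETER, GENERAL }

    /** Maps an entity name, as passed to startEntity/endEntity, to its kind. */
    static Kind classify(String name) {
        if ("[xml]".equals(name)) {
            return Kind.DOCUMENT;
        }
        if ("[dtd]".equals(name)) {
            return Kind.DTD;
        }
        if (name.startsWith("%")) {
            return Kind.PARAMETER;
        }
        return Kind.GENERAL;
    }

    public static void main(String[] args) {
        for (String name : new String[] { "[xml]", "[dtd]", "%params", "copyright" }) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```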
    • scanCharReferenceValue

      protected int scanCharReferenceValue(XMLStringBuffer buf, XMLStringBuffer buf2) throws IOException, org.apache.xerces.xni.XNIException
      Scans a character reference and appends the corresponding chars to the specified buffer.

       [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
       
      Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
      Parameters:
      buf - the character buffer to append chars to
      buf2 - the character buffer to append non-normalized chars to
      Returns:
      the character value or (-1) on conversion failure
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
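
The contract described above (append the referenced characters, return the character value or -1) can be sketched on a complete reference string. This is not the Xerces code: CharRefSketch is a hypothetical name, it uses StringBuilder in place of XMLStringBuffer, and it applies the XML 1.0 Char ranges as its validity check.

```java
final class CharRefSketch {
    /** XML 1.0 production [2] Char. */
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Parses "&#NNN;" or "&#xHHH;", appends the chars to buf, returns the value or -1. */
    static int parse(String ref, StringBuilder buf) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        boolean hex = ref.startsWith("&#x");
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (!digits.matches(hex ? "[0-9a-fA-F]+" : "[0-9]+")) {
            return -1;
        }
        int value;
        try {
            value = Integer.parseInt(digits, hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return -1; // too many digits to fit in an int
        }
        if (!isXmlChar(value)) {
            return -1;
        }
        buf.appendCodePoint(value); // emits a surrogate pair above U+FFFF
        return value;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder();
        System.out.println(parse("&#65;", buf) + " " + parse("&#x42;", buf) + " " + buf);
    }
}
```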
    • isInvalid

      protected boolean isInvalid(int value)
    • isInvalidLiteral

      protected boolean isInvalidLiteral(int value)
    • isValidNameChar

      protected boolean isValidNameChar(int value)
    • isValidNameStartChar

      protected boolean isValidNameStartChar(int value)
    • isValidNCName

      protected boolean isValidNCName(int value)
    • isValidNameStartHighSurrogate

      protected boolean isValidNameStartHighSurrogate(int value)
    • versionSupported

      protected boolean versionSupported(String version)
    • getVersionNotSupportedKey

      protected String getVersionNotSupportedKey()
    • scanSurrogates

      protected boolean scanSurrogates(XMLStringBuffer buf) throws IOException, org.apache.xerces.xni.XNIException
      Scans surrogates and appends them to the specified buffer.

      Note: This assumes the current char has already been identified as a high surrogate.

      Parameters:
      buf - The StringBuffer to append the read surrogates to.
      Returns:
      True if it succeeded.
      Throws:
      IOException
      org.apache.xerces.xni.XNIException
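
The surrogate handling described above can be illustrated with the standard java.lang.Character helpers. The sketch reads from a CharSequence rather than the entity scanner; SurrogateSketch and appendPair are hypothetical names, and unlike the documented method it checks the high surrogate itself instead of assuming it.

```java
final class SurrogateSketch {
    /** Appends the pair starting at index to buf; returns false if it is not a valid pair. */
    static boolean appendPair(CharSequence text, int index, StringBuilder buf) {
        char high = text.charAt(index);
        if (!Character.isHighSurrogate(high) || index + 1 >= text.length()) {
            return false;
        }
        char low = text.charAt(index + 1);
        if (!Character.isLowSurrogate(low)) {
            return false;
        }
        buf.append(high).append(low);
        return true;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder();
        boolean ok = appendPair("\uD83D\uDE00", 0, buf);
        System.out.println(ok + " U+" + Integer.toHexString(buf.codePointAt(0)).toUpperCase());
    }
}
```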
    • reportFatalError

      protected void reportFatalError(String msgId, Object[] args) throws org.apache.xerces.xni.XNIException
      Convenience function used in all XML scanners.
      Throws:
      org.apache.xerces.xni.XNIException