Last time, we learned about JAXP, Xerces, DOM and the javax.xml.parsers Java package. How about getting a little taste of the SAX interfaces? In this part, we look at the available classes and interfaces, and learn how to use SAX for XML processing. Given SAX's power, perhaps we can look forward to the day when we'll be translating not just XML, but maybe even Klingon! Maybe not. Before you get started, you'll want to download the support files for this tutorial.
With the DOM parser, we hand the XML document over to the parser and get a DOM Document object back. With SAX, the approach is the opposite: we call the parse method and pass it a handler object, and this handler receives notifications about the parsing progress, any errors encountered, and so on.
NOTE: At the moment, SAX has reached version 2 (referred to as SAX2). Applications based on SAX1 used the HandlerBase class, while SAX2 introduced the DefaultHandler class. We target SAX2 specifically, and therefore use the DefaultHandler class.
Now let's have a look at a simple parsing process using SAX (SimpleSAXParser.java); you will see that, at first glance, not much is different. Just as in the case of the DocumentBuilder, we instantiate a factory, which we then use to create the actual SAX parser:
The only major difference is in the call to the parse function: first, the parse function doesn't return a Document object and, second, we need to pass it an instance of a DefaultHandler-derived class. This means, as you might have guessed already, that it is up to the handler class to build up the DOM internally, should it need one. (In fact, the DOM parser we used in part one of this series internally uses a SAX parser to build up the DOM.)
.parse( isXML, new SimpleSAXParser() );
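For readers who do not have the support files at hand, the three steps described above (factory, parser, parse call with a handler) can be sketched as follows. This is only a minimal sketch, not the article's SimpleSAXParser.java: the class name MinimalSaxSketch and the helper parsesCleanly are hypothetical, and the XML is read from a string rather than from a file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; this is not the article's SimpleSAXParser.java.
class MinimalSaxSketch {

    // Hypothetical helper: returns true if the XML parses without an exception.
    static boolean parsesCleanly(String xml) {
        try {
            // Step 1: the factory, analogous to DocumentBuilderFactory for DOM.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Step 2: the factory creates the actual SAX parser.
            SAXParser parser = factory.newSAXParser();
            // Step 3: parse() returns nothing; it reports to the handler instead.
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            parser.parse(in, new DefaultHandler());
            return true;
        } catch (Exception e) {
            System.out.println("Parsing error: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesCleanly("<movie><title>Example</title></movie>"));
    }
}
```

A plain DefaultHandler ignores every notification except fatal errors, which it rethrows, so a successful run of this sketch produces no output from the parser itself. The article's own SimpleSAXParser follows the same pattern, but takes the name of the XML file from the command line.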
If we try this code against the file simple1.xml, we get the following:
Hmmm… In the previous article, when using DOM, we at least got a Document object back. In this case, can we be sure that any parsing is actually being done? One way of making sure is to feed simple2.xml into this simple SAX parser:
-classpath "%CLASSPATH%;." SimpleSAXParser simple2.xml
Parsing error: org.xml.sax.SAXParseException: Expected "</duration>" to terminate element starting on line 5.
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3182)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3176)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1513)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:500)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:143)
    at SimpleSAXParser.main(SimpleSAXParser.java:98)
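As a side note, the location of such an error does not have to be read out of the message text: the SAXParseException itself carries it. The following is a minimal sketch under stated assumptions. The class SaxErrorSketch, the helper firstFatalError and the sample document are hypothetical and are not taken from the support files; only the JAXP and SAX calls are real.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used for illustration only.
class SaxErrorSketch {

    // Hypothetical helper: returns the parse error, or null if the XML is well formed.
    static SAXParseException firstFatalError(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Thrown when the document is not well formed.
            return e;
        } catch (Exception e) {
            // Configuration or I/O problems are not what this sketch is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up document with an unterminated <duration> element.
        String broken = "<movie>\n<duration>120\n</movie>";
        SAXParseException e = firstFatalError(broken);
        if (e != null) {
            System.out.println("line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }
}
```

The exact wording of the message depends on the parser implementation in use, so it will not necessarily match the Crimson output shown above.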
As expected, the parser did signal the problem with the <duration> tag. Still, can we really be sure that the parsing went OK? We can, of course, by overriding functions in the DefaultHandler class: these are notification functions that the parser calls every time an event occurs during the parsing process.
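To make the idea of notification functions concrete, here is a minimal sketch of a handler that overrides a few of the DefaultHandler callbacks and records each event in a list. The class name EventLoggingHandler, the events field and the trace helper are hypothetical names chosen for this illustration; the overridden methods (startDocument, endDocument, startElement, endElement) are the standard SAX2 callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every notification it receives.
class EventLoggingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // qName is the element name as written in the document.
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Hypothetical helper: parses the XML and returns the recorded events.
    static List<String> trace(String xml) {
        EventLoggingHandler handler = new EventLoggingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            handler.events.add("error: " + e.getMessage());
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : trace("<a><b/></a>")) {
            System.out.println(event);
        }
    }
}
```

Text content would arrive through the characters callback, which is left out here because a parser is allowed to deliver one run of text in several chunks.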