Last time, we learned about JAXP, Xerces, DOM and the javax.xml.parsers Java package. How about getting a little taste of the SAX interfaces? We look at the available classes and interfaces, and learn how to use SAX for XML processing. Given SAX's power, perhaps we can look forward to the day when we'll be translating not just XML, but maybe even Klingon! Maybe not. Before you get started, you'll want to download the support files for this tutorial.
Java and XML Basics, Part 2 - Parser Reports (Page 3 of 6)
We will get back to “listening” to document errors in a minute; for now, let’s take a look at what our parser reports as it works through the document's content. We will override the following methods declared in the ContentHandler interface:
startDocument
endDocument
startElement
endElement
characters
All we do in our code at this stage (SimpleSAXParser3.java) is print out a message for each notification we receive:
...

public void startDocument() {
    System.out.println( "Document started." );
}

public void endDocument() {
    System.out.println( "Document ended." );
}

public void startElement( String namespaceURI, String localName,
                          String qName, Attributes atts ) {
    System.out.println( "Started element " + qName );
}

public void endElement( String namespaceURI, String localName, String qName ) {
    System.out.println( "Ended element " + qName );
}

public void characters( char[] ch, int start, int length ) {
    System.out.println( "Encountered characters:" + new String(ch, start, length) );
}
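For readers who want to try these callbacks without the support files, here is a minimal, self-contained sketch. It makes several assumptions that are not taken from the tutorial's code: the class name EventTraceSketch and its trace helper are hypothetical, the handler extends DefaultHandler rather than implementing ContentHandler directly, the parser is obtained through JAXP's SAXParserFactory, and the document is a short inline string instead of simple1.xml. Messages are collected in a buffer rather than printed directly so that the result is easy to inspect.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; this is not one of the tutorial's support files.
class EventTraceSketch extends DefaultHandler {
    // Collects one line per notification received from the parser.
    private final StringBuilder log = new StringBuilder();

    @Override
    public void startDocument() {
        log.append("Document started.\n");
    }

    @Override
    public void endDocument() {
        log.append("Document ended.\n");
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        log.append("Started element ").append(qName).append('\n');
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) {
        log.append("Ended element ").append(qName).append('\n');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of the buffer belongs to this event.
        log.append("Encountered characters:").append(ch, start, length).append('\n');
    }

    // Parses an XML string and returns the collected messages (assumed helper).
    static String trace(String xml) {
        EventTraceSketch handler = new EventTraceSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
        return handler.log.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace("<session><files>7</files></session>"));
    }
}
```

The inline document has no whitespace between its tags, so the sketch reports a single characters notification. The tutorial's own SimpleSAXParser3 class, run against the indented simple1.xml, reports many more.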
Running this class against our simple1.xml document will produce something like this:
java -classpath "%CLASSPATH%;." SimpleSAXParser3 simple1.xml
Document started.
Started element applog
Encountered characters:
Encountered characters:
Encountered characters:
Started element session
Encountered characters:
Encountered characters:
Encountered characters:
Started element duration
Encountered characters:01:00:00
Ended element duration
Encountered characters:
Encountered characters:
Encountered characters:
Started element files
Encountered characters:7
Ended element files
Encountered characters:
Encountered characters:
Encountered characters:
Started element application
Encountered characters:notepad.exe
Ended element application
Encountered characters:
Encountered characters:
Encountered characters:
Started element comments
Encountered characters:Started by the administrator to edit some config files.
Ended element comments
Encountered characters:
Encountered characters:
Encountered characters:
Ended element session
Encountered characters:
Encountered characters:
Encountered characters:
Encountered characters:
Encountered characters:
Started element session
Encountered characters:
Encountered characters:
Encountered characters:
Started element duration
Encountered characters:00:10:00
Ended element duration
Encountered characters:
Encountered characters:
Encountered characters:
Started element distance
Encountered characters:37
Ended element distance
Encountered characters:
Encountered characters:
Encountered characters:
Started element application
Encountered characters:grep.exe
Ended element application
Encountered characters:
Encountered characters:
Encountered characters:
Started element comments
Encountered characters:Probably part of one of the maintenance scripts.
Ended element comments
Encountered characters:
Encountered characters:
Encountered characters:
Ended element session
Encountered characters:
Encountered characters:
Ended element applog
Document ended.
Parsing successfull!
As you can see, our parser is doing its job properly! Now suppose we want to pretty-print this output, indenting each line according to the “depth” of the element in the document tree. "Easy," some might say, but then again, some said 640k of RAM would be enough for any possible programmer out there! The problem is that from this code we simply can’t tell how deep we are in the tree when we receive one of the content notifications. With the DOM interface, we could figure out exactly where we were by calling getParentNode or getChildNodes on the current node; SAX doesn’t do that for us. Instead, it is up to each program using SAX to build its own state machine based on the notifications received.
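To make the idea concrete, the simplest possible state machine is a single counter that goes up on every startElement and down on every endElement. The sketch below is one way this could look and is not the tutorial's own code: the class name IndentingHandlerSketch and its format helper are hypothetical, the parser is obtained through JAXP's SAXParserFactory, and, as a further assumption, whitespace-only character chunks are skipped and text is trimmed so that the indentation stays readable.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; a sketch of depth tracking, not the tutorial's solution.
class IndentingHandlerSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    // The entire "state machine": how many elements are currently open.
    private int depth = 0;

    // Appends one message, indented two spaces per open element.
    private void line(String text) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(text).append('\n');
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        line("Started element " + qName);
        depth++;   // everything reported from now on is inside this element
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) {
        depth--;   // step back out before printing, so start and end line up
        line("Ended element " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Assumption: drop whitespace-only chunks and trim the rest.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            line("Encountered characters:" + text);
        }
    }

    // Parses an XML string and returns the indented trace (assumed helper).
    static String format(String xml) {
        IndentingHandlerSketch handler = new IndentingHandlerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(
                "<applog><session><files>7</files></session></applog>"));
    }
}
```

A counter is enough for indentation. A program that also needs to know which elements enclose the current one would typically keep a stack of element names instead, pushing in startElement and popping in endElement.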