Last time, we learned about JAXP, Xerces, DOM and the javax.xml.parsers Java package. How about getting a little taste of the SAX interfaces? We look at the available classes and interfaces, and learn how to use SAX for XML processing. Given SAX's power, perhaps we can look forward to the day when we'll be translating not just XML, but maybe even Klingon! Maybe not. Before you get started, you'll want to download the support files for this tutorial.
Java and XML Basics, Part 2 - Simple State Machine (Page 4 of 6)
In the next example (SimpleSAXParser4.java) we will build a simple “state” machine that tracks only one thing: how deeply nested we currently are in the document's element tree. The level increases every time startElement is called and decreases every time endElement is called, and we indent each output line according to its value:
...
/**
 * Our "state" machine -- the current level we are on
 * (0 means root level)
 */
private int m_Level = 0;

// Prints one indentation unit per nesting level
private void spaceForLevel() {
    if( m_Level <= 0 )
        return;
    for( int i = 0; i < m_Level; i++ )
        System.out.print( " " );
}
...
public void startDocument() {
    m_Level = 0;
    spaceForLevel();
    System.out.println( "Document started." );
}

public void endDocument() {
    spaceForLevel();
    System.out.println( "Document ended." );
    m_Level = 0;
}

public void startElement( String namespaceURI, String localName,
                          String qName, Attributes atts ) {
    // Print at the parent's level, then step one level deeper
    spaceForLevel();
    System.out.println( "Started element " + qName );
    m_Level++;
}

public void endElement( String namespaceURI, String localName,
                        String qName ) {
    // Step back out first so the end tag lines up with its start tag
    m_Level--;
    spaceForLevel();
    System.out.println( "Ended element " + qName );
}

public void characters( char[] ch, int start, int length ) {
    spaceForLevel();
    System.out.println( "Encountered characters:'"
        + new String(ch, start, length) + "'" );
}
...
Run this code against simple1.xml and it now begins to make sense!
Document started.
Started element applog
 Encountered characters:''
 Encountered characters:' '
 Encountered characters:' '
 Started element session
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element duration
   Encountered characters:'01:00:00'
  Ended element duration
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element files
   Encountered characters:'7'
  Ended element files
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element application
   Encountered characters:'notepad.exe'
  Ended element application
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element comments
   Encountered characters:'Started by the administrator to edit some config files.'
  Ended element comments
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
 Ended element session
 Encountered characters:''
 Encountered characters:' '
 Encountered characters:''
 Encountered characters:' '
 Encountered characters:' '
 Started element session
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element duration
   Encountered characters:'00:10:00'
  Ended element duration
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element distance
   Encountered characters:'37'
  Ended element distance
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element application
   Encountered characters:'grep.exe'
  Ended element application
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
  Started element comments
   Encountered characters:'Probably part of one of the maintenance scripts.'
  Ended element comments
  Encountered characters:''
  Encountered characters:' '
  Encountered characters:' '
 Ended element session
 Encountered characters:''
 Encountered characters:' '
Ended element applog
Document ended.
Parsing successfull!
NOTE: We’ve added the apostrophes so we can clearly see whether a call actually delivered any visible characters or whether it carried nothing but whitespace.
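The whitespace-only calls in the output above come from the line breaks and indentation between tags. If they are just noise for your purposes, one possible approach is to skip any chunk that is empty after trimming. The following is a minimal sketch, not part of the tutorial's support files: the class name SkipWhitespaceSketch, the helper trace() and the inline XML are assumptions made up for illustration, and the sketch collects its output in a StringBuilder rather than printing directly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the level-tracking handler that ignores
// character chunks consisting only of whitespace.
public class SkipWhitespaceSketch extends DefaultHandler {
    private int level = 0;                          // current nesting level
    private final StringBuilder out = new StringBuilder();

    // Appends one line, indented two spaces per nesting level
    private void line(String text) {
        for (int i = 0; i < level; i++) out.append("  ");
        out.append(text).append('\n');
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        line("Started element " + qName);
        level++;
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) {
        level--;
        line("Ended element " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        if (text.trim().isEmpty()) return;          // whitespace only: skip it
        line("Encountered characters:'" + text + "'");
    }

    // Parses the given XML string and returns the indented event trace
    static String trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SkipWhitespaceSketch handler = new SkipWhitespaceSketch();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, loosely modelled on simple1.xml
        System.out.print(trace(
            "<applog>\n  <session>\n    <files>7</files>\n  </session>\n</applog>"));
    }
}
```

Keep in mind that a SAX parser is allowed to deliver one run of text in several characters calls, so a whitespace-only chunk could in principle belong to a longer piece of text. For documents like ours, where whitespace between tags is only formatting, that should not matter.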
It is now clear that the parser (1) starts with the applog tag (which we were expecting, as this is the root element), then (2) comes across the whitespace between the end of the applog tag and the beginning of the session tag. (3) Next, the parser reaches the session tag and sends a notification. (4) It then reports all the whitespace up to the beginning of the duration tag, at which point it (5) sends us a notification again. (6) The parser then finds the characters enclosed within the duration tag, so it (7) notifies our class again, and so on. Now, based on this code and the little state machine, you can probably figure out how a DocumentBuilder class (the one we used in the previous article) builds up the whole DOM little by little...
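To make that idea concrete, here is a rough sketch of how a tree could be assembled from the same three callbacks by replacing the level counter with a stack of open elements. It is not how any real DocumentBuilder is implemented; the TreeBuilderSketch class, its nested Node type and the build() helper are all hypothetical names invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TreeBuilderSketch extends DefaultHandler {

    /** Hypothetical, minimal stand-in for a DOM element node. */
    static class Node {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Elements that have been started but not yet ended;
    // the stack depth plays the role of the level counter.
    private final Deque<Node> open = new ArrayDeque<>();
    private Node root;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        Node node = new Node(qName);
        if (open.isEmpty()) {
            root = node;                        // first element is the root
        } else {
            open.peek().children.add(node);     // attach to the current parent
        }
        open.push(node);                        // it becomes the current element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();                             // back to the parent
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text (including whitespace) is collected on the innermost open element
        if (!open.isEmpty()) open.peek().text.append(ch, start, length);
    }

    // Parses the XML string and returns the root of the assembled tree
    static Node build(String xml) throws Exception {
        TreeBuilderSketch handler = new TreeBuilderSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.root;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, loosely modelled on simple1.xml
        Node root = build("<applog><session><files>7</files></session></applog>");
        Node files = root.children.get(0).children.get(0);
        System.out.println(files.name + " = " + files.text);   // prints "files = 7"
    }
}
```

Every startElement pushes a node and every endElement pops one, so the element on top of the stack is always the parent of whatever the parser reports next. A real DOM implementation also has to handle attributes, namespaces, comments and more, which this sketch leaves out.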