Java and XML Basics, Part 2 - Simple State Machine
In the next example (SimpleSAXParser4.java) we will build a simple “state” machine that tracks only one thing: the level we are currently at in the document tree. The level increases every time startElement is called and decreases every time endElement is called, and we indent each output line according to its value:
...
/**
 * Our "state" machine -- the current level we are on
 * (0 means root level)
 */
private int m_Level = 0;

/** Prints one space per level so the next message appears indented. */
private void spaceForLevel()
{
    if( m_Level <= 0 )
        return;
    for( int i = 0; i < m_Level; i++ )
        System.out.print( " " );
}
...
public void startDocument()
{
    m_Level = 0;
    spaceForLevel();
    System.out.println( "Document started." );
}

public void endDocument()
{
    spaceForLevel();
    System.out.println( "Document ended." );
    m_Level = 0;
}

public void startElement( String namespaceURI, String localName, String qName, Attributes atts )
{
    // Print at the current level, then go one level deeper
    spaceForLevel();
    System.out.println( "Started element " + qName );
    m_Level++;
}

public void endElement( String namespaceURI, String localName, String qName )
{
    // Come back up one level before printing
    m_Level--;
    spaceForLevel();
    System.out.println( "Ended element " + qName );
}
public void characters( char[] ch, int start, int length )
{
    spaceForLevel();
    // The apostrophes mark exactly where the reported text starts and ends
    System.out.println( "Encountered characters:'" + new String( ch, start, length ) + "'" );
}
...
Run this code against simple1.xml and it now begins to make sense!
Document started.
Started element applog
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element session
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element duration
Encountered characters:'01:00:00'
Ended element duration
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element files
Encountered characters:'7'
Ended element files
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element application
Encountered characters:'notepad.exe'
Ended element application
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element comments
Encountered characters:'Started by the administrator to edit some config files.'
Ended element comments
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Ended element session
Encountered characters:''
Encountered characters:'
'
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element session
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element duration
Encountered characters:'00:10:00'
Ended element duration
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element distance
Encountered characters:'37'
Ended element distance
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element application
Encountered characters:'grep.exe'
Ended element application
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Started element comments
Encountered characters:'Probably part of one of the maintenance scripts.'
Ended element comments
Encountered characters:''
Encountered characters:'
'
Encountered characters:' '
Ended element session
Encountered characters:''
Encountered characters:'
'
Ended element applog
Document ended.
Parsing successfull!
NOTE: We’ve added the apostrophes so we can clearly see whether a call actually carries any characters or whether it is just a dummy call we are receiving.
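If the whitespace-only notifications get in the way, one option is to filter them out before printing. The following is only a minimal sketch under stated assumptions: the class name WhitespaceFilterDemo, the m_Log buffer and the parse helper are made up for this illustration and are not part of SimpleSAXParser4.java. It also assumes we never care about text made up purely of spaces, tabs or line breaks.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceFilterDemo extends DefaultHandler
{
    // Hypothetical buffer: one line per characters() call that carried real text
    private final StringBuilder m_Log = new StringBuilder();

    @Override
    public void characters( char[] ch, int start, int length )
    {
        String text = new String( ch, start, length );
        // Ignore notifications that contain nothing but whitespace
        if( text.trim().isEmpty() )
            return;
        m_Log.append( "Encountered characters:'" ).append( text ).append( "'\n" );
    }

    /** Hypothetical helper: parses an XML string and returns what was logged. */
    public static String parse( String xml ) throws Exception
    {
        WhitespaceFilterDemo handler = new WhitespaceFilterDemo();
        SAXParserFactory.newInstance().newSAXParser()
            .parse( new InputSource( new StringReader( xml ) ), handler );
        return handler.m_Log.toString();
    }

    public static void main( String[] args ) throws Exception
    {
        String xml = "<applog>\n  <session>\n    <files>7</files>\n  </session>\n</applog>";
        System.out.print( parse( xml ) );
    }
}
```

Keep in mind that a parser is allowed to deliver one run of text in several characters() calls, so a filter like this works on each chunk separately rather than on the whole text of an element.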
It is now clear that the parser (1) starts with the applog tag (which we were expecting, as this is the root element), then (2) comes across the whitespace between the end of the applog tag and the beginning of the session tag. (3) Next, the parser reaches the session tag and sends a notification. (4) It then reports all the whitespace up to the beginning of the duration tag, at which point it (5) sends us a notification again. (6) The parser then finds the characters enclosed within the duration tag, so it (7) notifies our class again, and so on. Now, based on this code and the little state machine, you can probably figure out how a DocumentBuilder class (the one we used in the previous article) builds up the whole DOM little by little...
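To make that last idea concrete, here is a rough sketch of how a tree could be assembled from the same three notifications. It is not how any real DocumentBuilder is implemented; the TreeBuilderSketch class, its Node type and the build helper are hypothetical names invented for this illustration. Instead of a plain counter, it keeps a stack of the elements that are currently open, so the depth of the stack plays the role of m_Level.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TreeBuilderSketch extends DefaultHandler
{
    /** Hypothetical node type, far simpler than a real DOM element. */
    public static class Node
    {
        public final String name;
        public final StringBuilder text = new StringBuilder();
        public final List<Node> children = new ArrayList<Node>();

        Node( String name ) { this.name = name; }
    }

    // Elements that have been started but not yet ended; the top is the current one
    private final Deque<Node> m_Open = new ArrayDeque<Node>();
    private Node m_Root;

    @Override
    public void startElement( String namespaceURI, String localName, String qName, Attributes atts )
    {
        Node node = new Node( qName );
        if( m_Open.isEmpty() )
            m_Root = node;                       // first element seen is the root
        else
            m_Open.peek().children.add( node );  // attach to the element we are inside
        m_Open.push( node );                     // same effect as m_Level++
    }

    @Override
    public void endElement( String namespaceURI, String localName, String qName )
    {
        m_Open.pop();                            // same effect as m_Level--
    }

    @Override
    public void characters( char[] ch, int start, int length )
    {
        // Text (including whitespace) belongs to the innermost open element
        if( !m_Open.isEmpty() )
            m_Open.peek().text.append( ch, start, length );
    }

    /** Hypothetical helper: parses an XML string and returns the root node. */
    public static Node build( String xml ) throws Exception
    {
        TreeBuilderSketch handler = new TreeBuilderSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse( new InputSource( new StringReader( xml ) ), handler );
        return handler.m_Root;
    }

    public static void main( String[] args ) throws Exception
    {
        Node root = build( "<applog><session><duration>01:00:00</duration><files>7</files></session></applog>" );
        Node session = root.children.get( 0 );
        for( Node child : session.children )
            System.out.println( child.name + " = '" + child.text + "'" );
    }
}
```

The sketch ignores attributes, namespaces, comments and everything else a real DOM has to represent; it only shows that knowing "where we are" at each notification is enough to hang every new element on the right parent.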
More By Liviu Tudor