
Java and XML Basics, Part 1


In a previous article (XML Basics, Part One), we had a brief look at XML. However, as stated in that article, XML itself is worth nothing without the set of APIs built around it: it would simply be just another fancy form of CSV, that is, a proprietary data format! My aim in this set of articles is not to reveal XML in its every detail, but rather to look at the implications of XML in today's technologies. That's why the previous article sheds only a little light on the insides of XML, covering some terms and technologies you will be confronted with the moment you step into the XML arena.

Author Info:
By: Liviu Tudor
March 08, 2004
TABLE OF CONTENTS:
  1. Java and XML Basics, Part 1
  2. The javax.xml.parsers Java Package
  3. Parsing Using JAXP and the DocumentBuilder
  4. Traversing the DOM
  5. Appendix: Installing Xerces-J 2.0.0 under JDK 1.4


Java and XML Basics, Part 1 - Traversing the DOM
(Page 4 of 5 )

Now, let's imagine an application that is only interested in automatically-started processes and, even more, only needs to know which application was started. We can modify the source we've just written so that once the parsing is complete and we get our hands on the Document object, we traverse the DOM structure and look only for the data we need using the DOM API (SimpleDOMParser2.java).

Looking through the code, the following lines are actually doing the information retrieval:


/***********************/
/* Now traverse the DOM*/
/***********************/
Element elDoc = doc.getDocumentElement();
//get a list of all the <session> nodes
NodeList lstSessions = elDoc.getElementsByTagName( "session" );
if( lstSessions.getLength() == 0 )
{
 System.out.println( "No sessions found!" );
 System.exit( 0 );
}
for( int i = 0; i < lstSessions.getLength(); i++ )
{
 //for each session, only keep those with type="automatic"
 Element elSession = (Element)lstSessions.item( i );
 String type = elSession.getAttribute( "type" );
 if( type.equals("automatic") )
 {
  /**
   * We have an automatic session, now find the application name
   */
  NodeList lstApp = elSession.getElementsByTagName( "application" );
  Element elApp = (Element)lstApp.item( 0 );
  //now find the first TEXT node, this will contain the app name
  NodeList lstChildren = elApp.getChildNodes();
  for( int j = 0; j < lstChildren.getLength(); j++ )
  {
   Node n = lstChildren.item( j );
   if( n.getNodeType() == Node.TEXT_NODE )
     System.out.println( "Found application:" + n.getNodeValue() );
  }
 }
}

As per the W3C DOM API, each XML document is structured as a tree of nodes, each node in turn being of a certain type: element, CDATA node, text node and so on. The basic function we are using here is getElementsByTagName: applied to an Element, this function goes through the whole tree structure underneath this Element and finds all the Elements which have the given tag name. As our structure assumes that a session won't include another session tag, searching underneath the root level for this tag simply returns all the session tags.
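To make the idea of typed nodes more concrete, here is a minimal sketch that lists the node types found directly underneath an element. The class name NodeTypesDemo, its helper methods and the sample XML string are assumptions made up for this illustration; they are not part of the article's sources. Only standard JAXP and org.w3c.dom calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not one of the article's SimpleDOMParser sources.
class NodeTypesDemo {

    // Made-up sample: one <session> with an indented <application> child.
    static final String SAMPLE =
        "<session type=\"automatic\">\n"
      + "  <application>grep.exe</application>\n"
      + "</session>";

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Lists the node type of every direct child of the root element.
    static List<String> childTypes(String xml) {
        List<String> types = new ArrayList<>();
        NodeList children = parseRoot(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            switch (n.getNodeType()) {
                case Node.ELEMENT_NODE:
                    types.add("ELEMENT:" + n.getNodeName());
                    break;
                case Node.TEXT_NODE:
                    types.add("TEXT");
                    break;
                case Node.CDATA_SECTION_NODE:
                    types.add("CDATA");
                    break;
                default:
                    types.add("OTHER");
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(childTypes(SAMPLE));
    }
}
```

With a default, non-validating parser the line breaks and indentation around the application tag show up as TEXT nodes of their own, next to the ELEMENT node. This is one reason the code in this article checks getNodeType before reading a node's value.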

NOTE  If the XML structure you are parsing allows a tag to contain itself as a child, you will have to make certain checks when using getElementsByTagName if you only want to retrieve the nodes at a certain level.

Example (simple3.xml):


<session type="manual" date="12/12/2003">
  <duration>01:00:00</duration>
  <files>7</files>
  <application>notepad.exe</application>
  <comments>Started by the administrator to edit some config files.</comments>
</session>
<session type="automatic" date="12/12/2003">
  <duration>01:00:00</duration>
  <files>7</files>
  <application>grep.exe</application>
  <comments/>
  <session type="automatic" date="12/12/2003">
    <duration>01:00:00</duration>
    <files>7</files>
    <application>rgrep.exe</application>
    <comments>forked by grep.exe.</comments>
  </session>
  <session type="automatic" date="12/12/2003">
    <duration>01:00:00</duration>
    <files>7</files>
    <application>find.exe</application>
    <comments>forked by grep.exe.</comments>
  </session>
</session>

In this example, the above code will produce the following result:


java -classpath "%CLASSPATH%;." SimpleDOMParser2 simple3.xml
Parsing successfull!
Found application:grep.exe
Found application:rgrep.exe
Found application:find.exe

That's because getElementsByTagName searches everything underneath the root level (applog) and finds all the tags named session, however deeply they are nested. If we are only interested in the top-level session tags, we should check the parent of each node and make sure it is the root element (SimpleDOMParser3.java):


//make sure we are right underneath root level!
if( elSession.getParentNode() == elDoc )
{
  String type = elSession.getAttribute( "type" );
  ...
}

Such a source will correctly identify only the top level session tags:


java -classpath "%CLASSPATH%;." SimpleDOMParser3 simple3.xml
Parsing successfull!
Found application:grep.exe
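For readers who want to try the parent check in isolation, the following is a self-contained sketch of the same idea. It is not the article's SimpleDOMParser3.java: the class name TopLevelSessionsDemo, the method names and the shortened inline sample (an applog root with one manual session and one automatic session that contains a nested session) are all assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class illustrating the parent check described above.
class TopLevelSessionsDemo {

    // Made-up, shortened sample in the spirit of simple3.xml.
    static final String SAMPLE =
        "<applog>"
      + "<session type=\"manual\"><application>notepad.exe</application></session>"
      + "<session type=\"automatic\"><application>grep.exe</application>"
      + "<session type=\"automatic\"><application>rgrep.exe</application></session>"
      + "</session>"
      + "</applog>";

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the application names of automatic sessions that sit
    // directly underneath the root element.
    static List<String> topLevelAutomaticApps(String xml) {
        List<String> apps = new ArrayList<>();
        Element root = parseRoot(xml);
        NodeList sessions = root.getElementsByTagName("session");
        for (int i = 0; i < sessions.getLength(); i++) {
            Element session = (Element) sessions.item(i);
            // skip nested sessions: their parent is another <session>
            if (session.getParentNode() != root) {
                continue;
            }
            if (!"automatic".equals(session.getAttribute("type"))) {
                continue;
            }
            // first <application> in document order; in this sample the
            // session's own <application> precedes any nested session
            NodeList appList = session.getElementsByTagName("application");
            if (appList.getLength() == 0) {
                continue;
            }
            Node first = appList.item(0).getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                apps.add(first.getNodeValue());
            }
        }
        return apps;
    }

    public static void main(String[] args) {
        System.out.println(topLevelAutomaticApps(SAMPLE));
    }
}
```

Note that the sketch compares node references with != in the same way the article compares them with ==, which relies on the parser handing back the same object for the same node. It also skips sessions that have no application tag instead of casting a null item.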

Retrieving attributes of an Element is about as easy as it can be, as these sources show (just pass the attribute name as a parameter to the getAttribute function); however, getting the text enclosed within an XML tag is a bit trickier. Let's have a closer look at the code that does this:


Element elApp = (Element)lstApp.item( 0 );
//now find the first TEXT node, this will contain the app name
NodeList lstChildren = elApp.getChildNodes();
for( int j = 0; j < lstChildren.getLength(); j++ )
{
  Node n = lstChildren.item( j );
  if( n.getNodeType() == Node.TEXT_NODE )
  {
    System.out.println( "Found application:" + n.getNodeValue() );
    break;
  }
}

As you can see, we get a list of all the child nodes (getChildNodes), traverse this list looking for the first Node of type TEXT (this is where our text is stored), and read the string by calling the getNodeValue function, which in the case of TEXT nodes returns the text contained within.

The conclusion is that the DOM API is very intuitive and easy to use; however, as pointed out in the previous articles, it makes heavy use of memory, because the whole DOM structure is kept in memory, where we can inspect it at any time and retrieve the data we want. In the case of the above application, holding the entire XML structure in memory is hard to justify when we are only interested in a very thin slice of the data (the application name of those sessions started automatically). In the next article we will have a look at how to use SAX together with JAXP to implement a better way of doing so.
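The text-retrieval loop discussed above can also be wrapped in a small reusable helper. The sketch below is one possible way of doing that; the class name ElementTextDemo and the methods firstText and applicationName are hypothetical names invented for this example, and returning null when no TEXT node exists is a design choice of the sketch, not something the article prescribes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's sources.
class ElementTextDemo {

    // Returns the value of the first TEXT child of el,
    // or null when the element has no TEXT child at all.
    static String firstText(Element el) {
        NodeList children = el.getChildNodes();
        for (int j = 0; j < children.getLength(); j++) {
            Node n = children.item(j);
            if (n.getNodeType() == Node.TEXT_NODE) {
                return n.getNodeValue();
            }
        }
        return null;
    }

    // Parses the XML string and returns the text of its first
    // <application> element, or null if there is none.
    static String applicationName(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NodeList apps = root.getElementsByTagName("application");
            if (apps.getLength() == 0) {
                return null;
            }
            return firstText((Element) apps.item(0));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<session type=\"automatic\">"
                   + "<application>grep.exe</application></session>";
        System.out.println("Found application:" + applicationName(xml));
    }
}
```

An empty tag such as <application/> has no child nodes, so the helper returns null for it rather than an empty string; callers that prefer an empty string can substitute one.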

