Java and XML Basics, Part 1
By: Liviu Tudor
    2004-03-08

    Table of Contents:
  • Java and XML Basics, Part 1
  • The javax.xml.parsers Java Package
  • Parsing Using JAXP and the DocumentBuilder
  • Traversing the DOM
  • Appendix: Installing Xerces-J 2.0.0 under JDK 1.4

    Java and XML Basics, Part 1 - Traversing the DOM


    (Page 4 of 5 )

    Now, let’s imagine an application that is interested only in automatically started processes and, more specifically, only needs the name of the application that was started. We can modify the source we’ve just written so that, once parsing is complete and we get our hands on the Document object, we traverse the DOM structure and look only for the data we need using the DOM API (SimpleDOMParser2.java).

    Looking through the code, the following lines are actually doing the information retrieval:


    /***********************/
    /* Now traverse the DOM*/
    /***********************/
    Element elDoc = doc.getDocumentElement();
    //get a list of all the <session> nodes
    NodeList lstSessions = elDoc.getElementsByTagName( "session" );
    if( lstSessions.getLength() == 0 )
    {
     System.out.println( "No sessions found!" );
     System.exit( 0 );
    }
    for( int i = 0; i < lstSessions.getLength(); i++ )
    {
     //for each session, only keep those with type="automatic"
     Element elSession = (Element)lstSessions.item( i );
     String type = elSession.getAttribute( "type" );
     if( type.equals("automatic") )
     {
      /**
       * We have an automatic session, now find the application name
       */
      NodeList lstApp = elSession.getElementsByTagName( "application" );
      Element elApp = (Element)lstApp.item( 0 );
      //now find the first TEXT node, this will contain the app name
      NodeList lstChildren = elApp.getChildNodes();
      for( int j = 0; j < lstChildren.getLength(); j++ )
      {
       Node n = lstChildren.item( j );
       if( n.getNodeType() == Node.TEXT_NODE )
         System.out.println( "Found application:" + n.getNodeValue() );
      }
     }
    }

    As per the W3C DOM API, each XML document is structured as a tree of nodes, each node being of a certain type: element, CDATA node, text node and so on. The basic function we are using here is getElementsByTagName: applied to an Element, it searches the whole subtree underneath that Element and returns all the Elements that have the given tag name. Because our structure assumes that a session won’t contain another session tag, searching underneath the root element for this tag returns exactly the top-level session tags.
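
    To make the idea of node types concrete, here is a minimal sketch that lists the types of the direct children of a root element. The class name NodeTypeDemo, the method childTypes and the sample string are illustrative assumptions, not part of the article's sources, and the code uses generics, so it assumes a newer JDK than 1.4.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTypeDemo {
    // Hypothetical sample: a line break, an element and a CDATA section under the root.
    static final String SAMPLE =
        "<session>\n<application>notepad.exe</application><![CDATA[raw]]></session>";

    /** Returns a short label for each direct child of the document element. */
    static List<String> childTypes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> types = new ArrayList<String>();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            switch (n.getNodeType()) {
                case Node.ELEMENT_NODE:
                    types.add("ELEMENT " + n.getNodeName());
                    break;
                case Node.TEXT_NODE:
                    types.add("TEXT");   // includes whitespace between tags
                    break;
                case Node.CDATA_SECTION_NODE:
                    types.add("CDATA");
                    break;
                default:
                    types.add("OTHER");
            }
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childTypes(SAMPLE));
    }
}
```

    Note that the line break after the opening tag shows up as a TEXT node of its own, which is why code that walks child nodes has to check the node type before casting to Element.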

    NOTE  If the XML structure you are parsing allows a tag to contain itself as a child, you will have to make certain checks when using getElementsByTagName if you only want to retrieve the nodes at a certain level.

    Example (simple3.xml):


    <session type="manual" date="12/12/2003">
      <duration>01:00:00</duration>
      <files>7</files>
      <application>notepad.exe</application>
      <comments>Started by the administrator to edit some config files.</comments>
    </session>
    <session type="automatic" date="12/12/2003">
      <duration>01:00:00</duration>
      <files>7</files>
      <application>grep.exe</application>
      <comments/>
      <session type="automatic" date="12/12/2003">
        <duration>01:00:00</duration>
        <files>7</files>
        <application>rgrep.exe</application>
        <comments>forked by grep.exe.</comments>
      </session>
      <session type="automatic" date="12/12/2003">
        <duration>01:00:00</duration>
        <files>7</files>
        <application>find.exe</application>
        <comments>forked by grep.exe.</comments>
      </session>
    </session>

    In this example, the above code will produce the following result:


    java -classpath "%CLASSPATH%;." SimpleDOMParser2 simple3.xml
    Parsing successfull!
    Found application:grep.exe
    Found application:rgrep.exe
    Found application:find.exe

    That’s because getElementsByTagName searches everything underneath the root level (applog) and finds all the tags named session, at any depth. If we are only interested in the top-level session tags, we should check the parent of each node and make sure it is the root element (SimpleDOMParser3.java):


    //make sure we are right underneath root level!
    if( elSession.getParentNode() == elDoc )
    {
      String type = elSession.getAttribute( "type" );
      ...
    }

    With this check in place, the program correctly identifies only the top-level session tags:


    java -classpath "%CLASSPATH%;." SimpleDOMParser3 simple3.xml
    Parsing successfull!
    Found application:grep.exe
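
    An alternative to filtering the getElementsByTagName result by parent is to walk only the direct children of the root. The sketch below is one possible way to do that, not the code of SimpleDOMParser3.java; the class name TopLevelSessions, its helper methods and the condensed sample document are assumptions made for illustration, and the generics assume a JDK newer than 1.4.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopLevelSessions {
    // Condensed, hypothetical version of simple3.xml: grep.exe has two nested sessions.
    static final String SAMPLE =
        "<applog>"
      + "<session type=\"manual\"><application>notepad.exe</application></session>"
      + "<session type=\"automatic\"><application>grep.exe</application>"
      +   "<session type=\"automatic\"><application>rgrep.exe</application></session>"
      +   "<session type=\"automatic\"><application>find.exe</application></session>"
      + "</session>"
      + "</applog>";

    /** Returns only the direct child elements of parent that have the given tag name. */
    static List<Element> directChildren(Element parent, String tag) {
        List<Element> result = new ArrayList<Element>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Returns the value of the first TEXT child of el, or "" if there is none. */
    static String firstText(Element el) {
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                return n.getNodeValue();
            }
        }
        return "";
    }

    /** Application names of the top-level sessions with type="automatic". */
    static List<String> topLevelAutomaticApps(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> apps = new ArrayList<String>();
        for (Element session : directChildren(doc.getDocumentElement(), "session")) {
            if (!"automatic".equals(session.getAttribute("type"))) {
                continue;
            }
            // Look only at the session's own <application>, not those of nested sessions.
            for (Element app : directChildren(session, "application")) {
                apps.add(firstText(app));
            }
        }
        return apps;
    }

    public static void main(String[] args) throws Exception {
        for (String app : topLevelAutomaticApps(SAMPLE)) {
            System.out.println("Found application:" + app);
        }
    }
}
```

    Because the walk never descends below the first level, nested sessions such as rgrep.exe and find.exe are not visited at all, so no parent comparison is needed.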

    Retrieving attributes on an Element is as easy as it can be, as these sources show (just pass the attribute name to the getAttribute function); however, getting the text enclosed within an XML tag is a bit trickier. Let’s have a closer look at the code that does this:


    Element elApp = (Element)lstApp.item( 0 );
    //now find the first TEXT node, this will contain the app name
    NodeList lstChildren = elApp.getChildNodes();
    for( int j = 0; j < lstChildren.getLength(); j++ )
    {
      Node n = lstChildren.item( j );
      if( n.getNodeType() == Node.TEXT_NODE )
      {
        System.out.println( "Found application:" + n.getNodeValue() );
        break;
      }
    }

    As you can see, we get a list of all the child nodes (getChildNodes), then traverse this list looking for the first Node of type TEXT (this is where our text is stored). We obtain the string by calling the getNodeValue function, which for TEXT nodes returns the text they contain.

    The conclusion is that the DOM API is very intuitive and easy to use; however, as pointed out in the previous articles, it makes heavy use of memory, because the whole DOM structure is held in memory, where we can inspect it at any time and retrieve the data we want. In the case of the above application, holding the entire XML structure in memory is hard to justify when we are only interested in a very thin slice of data (the application name of those sessions started automatically). In the next article we will look at how to use SAX together with JAXP to implement a better way of doing so.
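
    The text-retrieval loop discussed above can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: the class ElementText and the methods parseRoot and firstText are hypothetical names, not part of the article's sources. It also calls getTextContent, a DOM Level 3 method that is only available from Java 5 onwards, so it would not compile on the JDK 1.4 setup this article targets.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementText {
    /** Parses an XML string and returns its document element. */
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns the value of the first TEXT child of el, or "" if there is none. */
    static String firstText(Element el) {
        NodeList children = el.getChildNodes();
        for (int j = 0; j < children.getLength(); j++) {
            Node n = children.item(j);
            if (n.getNodeType() == Node.TEXT_NODE) {
                return n.getNodeValue();
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        Element plain = parseRoot("<application>grep.exe</application>");
        System.out.println(firstText(plain));

        // Text split across a TEXT node and a CDATA section:
        // firstText sees only the first TEXT node, getTextContent joins both.
        Element mixed = parseRoot("<application>grep<![CDATA[.exe]]></application>");
        System.out.println(firstText(mixed));
        System.out.println(mixed.getTextContent());

        // An empty element has no TEXT child at all.
        Element empty = parseRoot("<comments/>");
        System.out.println("[" + firstText(empty) + "]");
    }
}
```

    The helper returns an empty string rather than null for elements such as <comments/>, which keeps callers free of null checks; whether that is the right choice depends on whether the application needs to tell "empty" from "missing".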

    © 2003-2009 by Developer Shed. All rights reserved.