
Java and XML Basics, Part 3

So far in this series of articles (part 1, part 2) we've looked at DOM and SAX, and I suppose most of you are wondering which of the two approaches is preferable. Well, there is no general rule of thumb, but this article might help you make the right decision when you have to.

Author Info:
By: Liviu Tudor
April 20, 2004
  1. · Java and XML Basics, Part 3
  2. · Which One is the Better One to Use?
  3. · Running the Parser
  4. · Problems with Big XML Files
  5. · Validating Parsers - DOM
  6. · Where do We Get a Validating Parser?
  7. · ErrorHandler
  8. · Validating Parsers - SAX
  9. · Conclusion

Java and XML Basics, Part 3 - Problems with Big XML Files
(Page 4 of 9 )

The problem, of course, occurs with big XML files. Based on simple4.xml and good old copy/paste, I've created simple5.xml, which is now about 2 MB in size. Have a look at the results:

java -classpath "%CLASSPATH%;." SimpleDOMParser4 simple5.xml
Parsing took : 2484 msec
Memory occupied : -9637976 bytes
Parsing successful!
Traversing the DOM took 805551 msec
Total processing time 808035 msec

java -classpath "%CLASSPATH%;." SimpleSAXParser7 simple5.xml
Parsing took 561 msec
Memory occupied 1496 bytes
Parsing successful!

As you can see, SAX is still ticking away in about 1.5 KB and well under a second (actually slightly over half a second), while DOM seems to behave like a mammoth. (The "-" sign in front of the memory figure indicates that the JVM had to ask the operating system for a bigger heap because it needed more than it had; the correct figure, which can be found using a typical profiler, is around 14 MB!)
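To see how a negative figure can come about, here is a minimal sketch. It assumes the figure is computed as free memory before minus free memory after, which is only a guess, since the measuring code is not shown on this page. The class name HeapUsage and the 16 MB stand-in allocation are made up for the illustration. If the heap grows in the meantime, the "free" difference can go negative, while the "used" difference (total minus free) stays meaningful.

```java
public class HeapUsage {

    /** Heap currently in use: what the JVM has claimed minus the free part of it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long freeBefore = rt.freeMemory();
        long usedBefore = usedBytes();

        // Stand-in for building a DOM tree: allocate about 16 MB and keep it reachable.
        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }

        // Naive figure (an assumption about how the article's number was taken):
        // it goes negative when the heap grew and more memory is free than before.
        long naiveDelta = freeBefore - rt.freeMemory();

        // Sturdier figure: compare used memory, which accounts for heap growth.
        long usedDelta = usedBytes() - usedBefore;

        System.out.println("Naive (free before - free after) : " + naiveDelta + " bytes");
        System.out.println("Used after - used before         : " + usedDelta + " bytes");
        System.out.println("Blocks still held                : " + blocks.length);
    }
}
```

The exact numbers depend on the JVM, its heap settings and when the garbage collector runs, so treat both figures as rough indications; a profiler remains the reliable way to measure.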

But don't rush to a conclusion yet! If you look carefully, the parsing itself took DOM only about 2.5 seconds, and it created a document tree that we can inspect at any time later in our program! Also, bear in mind that the program we have written steps through all the <session> tags (all situated on the same level) and checks each one, which is quite thorough; in most real applications you will have several levels in the document tree, and you probably won't have to step through every element, only a few of them (which you can probably find very quickly using getElementsByTagName!).
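As a rough illustration of the getElementsByTagName shortcut mentioned above, the sketch below parses a tiny document from a string and picks out the <session> elements without walking the tree by hand. The class name SessionLookup, the <conference> root and the id attribute are invented for the example; only the <session> tag name comes from the sample files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SessionLookup {

    /** Parses the XML text into a DOM tree and returns all <session> elements. */
    static NodeList findSessions(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Searches the whole tree, whatever level the elements sit on.
            return doc.getElementsByTagName("session");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static int countSessions(String xml) {
        return findSessions(xml).getLength();
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not the real simple5.xml.
        String xml = "<conference>"
                + "<session id=\"a\"/>"
                + "<day><session id=\"b\"/></day>"
                + "</conference>";

        NodeList sessions = findSessions(xml);
        for (int i = 0; i < sessions.getLength(); i++) {
            Element session = (Element) sessions.item(i);
            System.out.println("session id = " + session.getAttribute("id"));
        }
    }
}
```

The tree is still built in full, so the memory cost stays; what you save is the hand-written traversal.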

So, to cut to the conclusion: the general rule is that SAX is faster and lighter, but it might not be the most practical solution!
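For comparison, here is a minimal SAX sketch that does one simple job, counting the <session> elements, while keeping nothing but a counter in memory. The class name SessionCounter and the sample document are invented for the example, and the default, non-namespace-aware parser configuration is assumed, so the tag name arrives in qName.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SessionCounter extends DefaultHandler {

    private int count;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // Called once per opening tag; no tree is kept, only the running count.
        if ("session".equals(qName)) {
            count++;
        }
    }

    /** Streams through the XML text and returns the number of <session> elements. */
    static int countSessions(String xml) {
        SessionCounter handler = new SessionCounter();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not the real simple5.xml.
        String xml = "<conference><session/><session/><session/></conference>";
        System.out.println("Sessions found: " + countSessions(xml));
    }
}
```

This is where the practicality trade-off shows: once parsing ends, the document is gone, so any later question about it means parsing the file again or keeping your own data structures.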

For those of you who are really fanatical about speed/performance testing, I happen to have the results of testing against 10, 20 and 50 MB worth of files in the appendix. How I managed to find the patience to sit in front of the computer and wait for SimpleDOMParser4 to rattle through all that data and finally print the figures deserves an article all its own!
