So far in this series of articles (part 1, part 2) we've looked at DOM and SAX, and I suppose most of you are wondering which of the two approaches is preferable. Well, there is no general rule of thumb, but this article might help you make the right decision when you have to.
Java and XML Basics, Part 3 - Problems with Big XML Files (Page 4 of 9 )
The problem, of course, occurs with big XML files. Based on simple4.xml and good old copy/paste, I've created simple5.xml, which is now about 2 MB in size. Have a look at the results:
java -classpath "%CLASSPATH%;." SimpleDOMParser4 simple5.xml
Parsing took : 2484 msec
Memory occupied : -9637976 bytes
Parsing successful!
Traversing the DOM took 805551 msec
Total processing time 808035 msec
As you can see, SAX is still ticking away in 1.5 KB and well under a second (actually slightly over half a second), while DOM behaves like a mammoth. (The "-" sign in front of the memory figure indicates that the JVM had to ask the operating system to enlarge its heap because it needed more than it had. The correct figure, which can be found using a typical profiler, is around 14 MB!)
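To see how a negative memory figure can come about, here is a minimal sketch. It assumes the measurement was taken by comparing Runtime.freeMemory() before and after the work, which is only a guess at what SimpleDOMParser4 does. The class name MemoryProbe and the 32 MB stand-in allocation are made up for illustration. If the heap grows in between, free memory can end up larger than it was at the start, so the naive difference turns negative; comparing used memory (total minus free) avoids that particular trap, although both figures remain rough because the garbage collector can run at any time.

```java
public class MemoryProbe {

    /** Heap currently in use: heap claimed by the JVM minus the free part of it. */
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long freeBefore = rt.freeMemory();
        long usedBefore = usedMemory();

        // Stand-in for a large DOM tree: allocate about 32 MB and keep it reachable.
        byte[][] blocks = new byte[32][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }

        // Naive figure: may be negative if the JVM enlarged the heap meanwhile.
        long naive = freeBefore - rt.freeMemory();
        // Rough but more robust figure: growth in used heap.
        long better = usedMemory() - usedBefore;

        System.out.println("Naive (free before - free after): " + naive + " bytes");
        System.out.println("Used after - used before        : " + better + " bytes");
        System.out.println("Blocks still held               : " + blocks.length);
    }
}
```

For trustworthy numbers a profiler is still the better tool, as noted above.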
But don't rush to a conclusion yet! If you look carefully, the parsing step in the case of DOM took only about 2.5 seconds, and it created a document tree that we can inspect at any time later on in our program. Also, bear in mind that the program we have written steps through all the <session> tags (all situated on the same level) and checks each one, which is quite thorough. In real applications you will usually have quite a few levels in the document tree, and you probably won't have to step through every element, but only a few of them (which you can probably find very quickly using getElementsByTagName!).
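As a rough illustration of that last point, the sketch below parses a tiny in-memory document and asks the tree for its <session> elements directly instead of walking every node by hand. The XML string, the id attribute and the class name SessionLookup are assumptions made up for this example; they are not the contents of simple5.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SessionLookup {

    /** Parses an XML string into a DOM tree that stays available afterwards. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Counts <session> elements anywhere in the document. */
    static int countSessions(Document doc) {
        return doc.getElementsByTagName("session").getLength();
    }

    /** Returns the (hypothetical) id attribute of the first <session>, or null if there is none. */
    static String firstSessionId(Document doc) {
        NodeList sessions = doc.getElementsByTagName("session");
        if (sessions.getLength() == 0) {
            return null;
        }
        return ((Element) sessions.item(0)).getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sessions><session id=\"a1\"/><session id=\"b2\"/></sessions>";
        Document doc = parse(xml);
        System.out.println("Sessions found: " + countSessions(doc));
        System.out.println("First id      : " + firstSessionId(doc));
    }
}
```

The tree is built once, and both lookups reuse it; that reuse is what the parsing time buys you.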
So, to cut to the conclusion: as a rule, SAX is faster and lighter, but it might not be the most practical solution!
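For comparison, here is a minimal SAX sketch of the same kind of task: counting <session> elements. Again, the class name SessionCounter and the sample XML are invented for illustration. The handler only keeps a counter, which is why memory use stays tiny, but nothing of the document is left once parsing ends; any state you want to keep, you have to track yourself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SessionCounter extends DefaultHandler {

    private int sessions = 0;

    /** Called by the parser for every opening tag; we only react to <session>. */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("session".equals(qName)) {
            sessions++;
        }
    }

    /** Streams through the XML once and returns the number of <session> elements seen. */
    static int count(String xml) throws Exception {
        SessionCounter handler = new SessionCounter();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.sessions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sessions><session/><session/><session/></sessions>";
        System.out.println("Sessions found: " + count(xml));
    }
}
```

If you later need a second question answered about the same file, SAX means parsing it again, whereas the DOM tree would already be in memory.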
For those of you who are really fanatical about speed and performance testing, I happen to have the results of testing against 10, 20 and 50 MB files in the appendix. How I managed to find the patience to sit in front of the computer and wait for SimpleDOMParser4 to rattle through all that data and finally print the figures deserves an article all its own!