So far in this series of articles (part 1, part 2) we've looked at DOM and SAX, and I suppose most of you are wondering which of the two approaches is preferable. Well, there is no general rule of thumb, but this article might help you make the right decision when you have to.
Running SimpleDOMParser4 against the same file produces:
java -classpath "%CLASSPATH%;." SimpleDOMParser4 simple1.xml
Parsing took : 50 msec
Memory occupied : 5680 bytes
Parsing successful!
Traversing the DOM took 10 msec
Total processing time 60 msec
So overall, that is a delay of about 10 msec (remember that the figures are approximate) and a memory consumption that is bigger by about 4 KB when using DOM as opposed to SAX. Now, that isn't too much, most of you will agree (OK, apart from those of you who are still fanatical about the 48k that the good ole' Sinclair Spectrum used to have), but let's put this in perspective: the simple1.xml file is just slightly over 0.5 KB, so what would happen if we were to process a large XML file? To test this, I copied and pasted the data in the simple1.xml file quite a few times and ran the tests against this new file (simple4.xml, which is now about 65 KB worth of XML). The results speak for themselves:
Parsing took 191 msec
Memory occupied 1496 bytes
Parsing successful!

java -classpath "%CLASSPATH%;." SimpleDOMParser4 simple4.xml
Parsing took : 300 msec
Memory occupied : 438856 bytes
Parsing successful!
Traversing the DOM took 681 msec
Total processing time 991 msec
Now this is significant! While the SAX approach takes less than 0.2 seconds, the DOM implementation takes nearly a whole second; also, the memory taken by the DOM approach is huge compared to the (nearly) 1.5 KB in the case of SAX!
The explanation for this is quite simple:
The DOM parser has to build a document tree as the parsing goes on, and this takes both time and memory. In the case of SAX, we process the data as it "arrives", without necessarily storing it.
Once the document tree is created (in the case of DOM), we then have to step through this tree and find and retrieve the information that is relevant to us. In other words, we actually go through the document twice for every parse: once to build the tree and once to traverse it.
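To make the two points above more concrete, here is a minimal sketch that counts the elements of a small document with both approaches. It is not the SimpleDOMParser4 code from this series; the class name, the method names and the sample XML are made up for illustration, and only standard JAXP calls are used. The SAX version counts each element as it arrives and stores nothing, while the DOM version first builds the tree and then walks it in a second pass.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of the original article's code.
class SaxVsDomSketch {

    // SAX: a single pass; each element is counted as it "arrives" and nothing is kept.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    // DOM: the first pass builds the whole tree in memory...
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // ...and the second pass walks that tree to find what we need.
        return countElements(doc.getDocumentElement());
    }

    // Recursively counts element nodes below (and including) the given node.
    private static int countElements(Node node) {
        int total = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            total += countElements(child);
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        String xml = "<catalog><item>a</item><item>b</item></catalog>";
        System.out.println("SAX counted " + countWithSax(xml) + " elements");
        System.out.println("DOM counted " + countWithDom(xml) + " elements");
    }
}
```

Both methods return the same count; the difference is only in how much work and memory it takes to get there.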
Assuming that we only make use of one single item in the whole XML document, the (nearly) 0.5 MB taken by DOM becomes a waste. However, if we need to come back to the parsed information very often and we are going to make use of most of the data, then using SAX might be pointless: we would have to gather all this data in a tree/list/stack/array ourselves, and that might be too much of an overhead in terms of programming when it's easier to just use the DOM API to traverse the tree and retrieve individual elements. Also, the DOM API is easier to use from a programmer's point of view, and as the differences are nearly unnoticeable in the case of small files, you could use the DOM approach there without too much of an overhead.
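As a rough illustration of this trade-off, the sketch below shows both situations side by side. The element name title, the sample XML and all class and method names are assumptions made up for this example. With SAX we keep only the one item we care about; with DOM we parse once and can then come back to the tree as often as we like.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; "title" is an assumed element name.
class SingleItemSketch {

    // SAX: keep only the text of the first <title>; everything else is ignored.
    static String firstTitleWithSax(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle = false;
            private boolean done = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (!done && "title".equals(qName)) {
                    inTitle = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inTitle && "title".equals(qName)) {
                    inTitle = false;
                    done = true; // later <title> elements are skipped
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return text.toString();
    }

    // DOM: pay the cost of building the tree once...
    static Document parseWithDom(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // ...then look up any <title> by position, as many times as needed.
    static String titleAt(Document doc, int index) {
        return doc.getElementsByTagName("title").item(index).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        String xml = "<books><book><title>First</title></book>"
                   + "<book><title>Second</title></book></books>";
        System.out.println(firstTitleWithSax(xml));
        Document doc = parseWithDom(xml);
        System.out.println(titleAt(doc, 1));
        System.out.println(titleAt(doc, 0));
    }
}
```

Note how much bookkeeping the SAX handler needs just to pick out one value, whereas the DOM lookup is a single line once the tree exists. That convenience is what the extra memory buys.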