Advanced SAX (Page 1 of 4)
This article, the first of three parts, looks at features of the Simple API for XML (SAX) that go beyond basic parsing and content handling. It is excerpted from chapter four of the book
Java and XML, Third Edition, written by Brett McLaughlin and Justin Edelson (O'Reilly, 2006; ISBN: 059610149X). Copyright © 2006 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.
What you’ve seen regarding SAX so far is essentially the simplest way to process and parse XML. And while SAX is indeed named the Simple API for XML, it offers programmers much more than basic parsing and content handling. There is an array of settings that affect parser behavior, as well as several additional handlers for edge-case scenarios; if you need to specify exactly how strings should be interned, or what should happen when a DTD declares a notation, or even differentiate between CDATA sections and regular text sections, SAX provides a way to do it. In fact, you can even modify and write out XML using SAX (along with a few additional packages); SAX is a full-featured API, and this chapter will give you the lowdown on features that go beyond simple parsing.
Properties and Features
I glossed over validation in the last chapter, and probably left you with a fair number of questions. When I cover JAXP in Chapter 7, you’ll see that you can use either a method (setValidating()) or a set of classes (javax.xml.validation) to handle validation; you might expect to call a similar method, setValidation() or something like it, to initiate validation in SAX. But then there’s also namespace awareness, dealt with quite a bit in Chapter 2 (and in Chapter 3, with respect to Q names and local names); maybe setNamespaceAwareness()? But what about schema validation? And setting the location of a schema to validate against, if the document doesn’t specify one? There’s also low-level behavior, like telling the parser what to do with entities (parse them? don’t parse them?), how to handle strings, and a lot more. As you can imagine, dealing with each of these could cause real API bloat, adding 20 or 30 methods to SAX’s XMLReader class. Even worse, each time a new setting was needed (perhaps for the next type of constraint model supported; how about setRelaxNGSchema()?), the SAX API would have to add a method or two, and a new version would have to be released. Clearly, this isn’t a very effective approach to API design.
If this isn’t clear to you, check out Head First Design Patterns, by Elisabeth and Eric Freeman (O’Reilly). In particular, read up on Chapter 1 (pages 8 and 9), which details why it’s critical to encapsulate what varies.
To address the ever-changing need to affect parser behavior, without causing constant API change, SAX 2 defines a standard mechanism for setting parser behavior: through the use of properties and features.
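As a rough sketch of what that mechanism looks like in practice (this is not the book's own listing), the fragment below configures an XMLReader through the generic setFeature() and setProperty() calls rather than through one dedicated method per setting. The three URIs are standard SAX 2 feature and property identifiers; the class name FeatureSketch, the choice of JAXP's SAXParserFactory to obtain the reader, and the example.com URI are assumptions made purely for illustration.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical class name; only the SAX and JAXP calls themselves are real API.
class FeatureSketch {
    // Standard SAX 2 identifiers: features are boolean switches,
    // properties carry arbitrary objects.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    static XMLReader configure() throws Exception {
        // Assumption: obtain the reader through JAXP; any SAX 2 XMLReader would do.
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Features: turn behavior on or off by URI instead of by dedicated method.
        reader.setFeature(NAMESPACES, true);
        reader.setFeature(VALIDATION, true);

        // Properties: hand the parser an object. DefaultHandler2 implements
        // LexicalHandler with no-op callbacks (CDATA sections, comments, ...).
        reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = configure();
        System.out.println("validation on: " + reader.getFeature(VALIDATION));

        try {
            // Made-up URI: a parser rejects identifiers it does not know or
            // cannot honor, so new settings never require new API methods.
            reader.setFeature("http://example.com/features/hypothetical", true);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

The point of the design is visible in the signatures: a new setting only needs a new URI, so the XMLReader interface itself never has to grow. Which features and properties a given parser actually honors varies by implementation, so treat the ones shown here as examples rather than guarantees.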