Advanced SAX - Resolving Entities
You’ve already seen how to interact with content in the XML document you’re parsing (using ContentHandler), and how to deal with error conditions (ErrorHandler). Both of these are concerned specifically with the data in an XML document. What I haven’t talked about is the process by which the parser goes outside of the document and gets data. For example, consider a simple entity reference in an XML document:
<FM>
<P>Text placed in the public domain by Moby Lexical Tools, 1992.</P>
<P>SGML markup by Jon Bosak, 1992-1994.</P>
<P>XML version by Jon Bosak, 1996-1998.</P>
<P>&usage-terms;</P>
</FM>
Your schema (here, a DTD) then tells the parser how to resolve that entity:
<!ENTITY usage-terms
SYSTEM "http://www.newInstance.com/entities/usage-terms.xml">
At parse time, the usage-terms entity reference will be expanded (in this case, to “This work may be freely copied and distributed worldwide.”, as seen in Figure 4-1).

Figure 4-1. The usage-terms entity was resolved to a URI, which was then parsed and inserted into the document
However, there are several cases where you might not want this “default” behavior:
- You don’t have network access, so you want the entity to resolve to a local copy of the referenced document (perhaps a version you’ve downloaded yourself).
- You want to substitute your own content for the content specified in the schema.
You can short-circuit normal entity resolution using org.xml.sax.EntityResolver. This interface does exactly what it says: resolves entities. More important, it allows you to get involved in the entity resolution process. The interface defines only a single method, as shown in Figure 4-2.
Figure 4-2. There's not much to the EntityResolver interface; just a single, albeit useful, method
To insert your own logic into the resolution process, create an implementation of this interface, and register it with your XMLReader instance through setEntityResolver(). Once that’s done, every time the reader comes across a reference to an external entity, it passes the public ID and system ID for that entity to the resolveEntity() method of your EntityResolver implementation.
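As a minimal sketch of that registration step, the following parses a small in-memory document and records each system ID handed to resolveEntity(). The class name ResolverRegistrationSketch, the helper systemIdsSeen(), and the inline DOCTYPE are assumptions made for illustration, and the reader is obtained through JAXP's SAXParserFactory, which is only one of several ways to get an XMLReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class; not part of the original article's code.
class ResolverRegistrationSketch {

    // Parses a tiny document and returns every system ID passed to the resolver.
    static List<String> systemIdsSeen() throws Exception {
        final List<String> seen = new ArrayList<>();

        EntityResolver resolver = new EntityResolver() {
            public InputSource resolveEntity(String publicID, String systemID) {
                seen.add(systemID);
                // Return an empty replacement so this sketch never touches the network.
                return new InputSource(new StringReader(""));
            }
        };

        // The entity declaration mirrors the one shown earlier in the article.
        String xml =
            "<!DOCTYPE FM [\n"
            + "<!ENTITY usage-terms\n"
            + "  SYSTEM \"http://www.newInstance.com/entities/usage-terms.xml\">\n"
            + "]>\n"
            + "<FM><P>&usage-terms;</P></FM>";

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);  // register before calling parse()
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(systemIdsSeen());
    }
}
```

The resolver has to be registered before parse() is called; a resolver set afterward has no effect on a parse that has already finished.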
Typically, the XML reader resolves the entity through the specified public or system ID. If you want to accept this default behavior in your own EntityResolver implementation, just return null from your version of resolveEntity(). In fact, whatever code you add to your resolveEntity() implementation, always make sure that it returns null in the default case. In other words, start with an implementation class that looks like Example 4-1.
Example 4-1. Before coding in special entity resolution, always ensure that any unhandled cases result in a null return value (and therefore normal entity resolution)
package javaxml3;

import java.io.IOException;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SimpleEntityResolver implements EntityResolver {

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        // In the default case, return null
        return null;
    }
}
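Building on that skeleton, here is one way a special case might be added, sketched under some assumptions: the resolver matches the usage-terms system ID from the earlier declaration and supplies the replacement text from an in-memory string (standing in for a local copy of the file), while every other entity still falls through to null. The class names LocalUsageTermsResolver and LocalUsageTermsDemo, the constant USAGE_TERMS_ID, and the inline test document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: handles one known system ID, defers on everything else.
class LocalUsageTermsResolver implements EntityResolver {

    static final String USAGE_TERMS_ID =
        "http://www.newInstance.com/entities/usage-terms.xml";

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        if (USAGE_TERMS_ID.equals(systemID)) {
            // Assumption: an in-memory string stands in for a downloaded local copy.
            return new InputSource(new StringReader(
                "This work may be freely copied and distributed worldwide."));
        }

        // In the default case, return null (normal entity resolution)
        return null;
    }
}

// Hypothetical driver that shows the substituted text reaching the content handler.
class LocalUsageTermsDemo {

    static String parseText() throws Exception {
        final StringBuilder text = new StringBuilder();

        String xml =
            "<!DOCTYPE FM [\n"
            + "<!ENTITY usage-terms SYSTEM \"" + LocalUsageTermsResolver.USAGE_TERMS_ID + "\">\n"
            + "]>\n"
            + "<FM><P>&usage-terms;</P></FM>";

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(new LocalUsageTermsResolver());
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);  // collect expanded character data
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseText());
    }
}
```

Matching on the system ID keeps the special case narrow, so any entity the resolver does not recognize is still resolved by the parser in the usual way.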
This article is excerpted from chapter four of the book Java and XML, Third Edition, written by Brett McLaughlin and Justin Edelson (O'Reilly, 2006; ISBN: 059610149X).