
Parsing XML with SAX and Python

In this article Nadia explains how to parse an XML document using the SAX API implementation available for Python.

Author Info:
By: Nadia Poulou
November 09, 2004
  1. Parsing XML with SAX and Python
  2. The xml.sax Package
  3. Our SAX Parser
  4. The Heart of the Code
  5. Element Content
  6. The Main Code
  7. Homework


Parsing XML with SAX and Python - The xml.sax Package

SAX is the Simple API for XML. The xml.sax package and its subpackages provide a Python implementation of the SAX interface.

A SAX application is built from one or more input sources, a parser, and one or more handler objects. The idea is as follows: the parser reads the bytes or characters from the input source and fires a sequence of events on the handler. In this article, as in the Python documentation, the term ‘reader’ is preferred over ‘parser’.

The SAX API defines four basic interfaces. Since Python does not support interfaces, these SAX interfaces are implemented in the xml.sax.handler module as the following Python classes:

  1. ContentHandler: this implements the main SAX interface for handling document events. It is also the class we will use in the example in the next section.

  2. DTDHandler: class for handling DTD events

  3. EntityResolver: class for resolving external entities

  4. ErrorHandler: as the name suggests, this class is used for reporting all errors and warnings.

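It is also worth mentioning the DefaultHandler class from the xml.sax.saxutils module, which inherits from all four classes above. (This class is not present in xml.sax.saxutils in current Python 3 releases, where subclassing the handler classes directly serves the same purpose.) An application has to implement only the interfaces it needs, as will be shown by the following example.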
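To make the "implement only what you need" idea concrete, here is a minimal sketch. It assumes Python 3 and the standard-library xml.sax; the class name ElementCounter and the helper count_elements are made up for illustration and do not come from the article's own example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts start tags and nothing else.

    Only startElement() is overridden; every other ContentHandler
    callback keeps its inherited do-nothing implementation.
    """

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the reader once for every opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse an in-memory document and return {element name: count}."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts
```

Because ContentHandler supplies empty defaults for every callback, the subclass stays this small even though the full interface has many more methods.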

Basic Methods

Now that we have looked at the interfaces, it’s time to see the basic functions of the xml.sax package. These are:

make_parser() - This creates and returns a SAX XMLReader object. Note that the xml.sax readers are non-validating.

parse(filename, handler) - This creates a parser and parses the given document, which can be passed either as a filename or as a file object (stream). The handler must be a ContentHandler instance.
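As a rough illustration of the two ways of passing a document to parse(), the sketch below feeds the same bytes once as a file object and once as a filename. It assumes Python 3; the NameRecorder class, the SAMPLE document and both helper functions are hypothetical names chosen for this example.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample document used only for this sketch.
SAMPLE = b"<library><book/><book/></library>"


class NameRecorder(ContentHandler):
    """Hypothetical handler: records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_from_stream(data):
    """Pass the document to xml.sax.parse() as a file object."""
    handler = NameRecorder()
    xml.sax.parse(io.BytesIO(data), handler)
    return handler.names


def parse_from_filename(data):
    """Write the document to a temporary file and pass its name instead."""
    with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        handler = NameRecorder()
        xml.sax.parse(path, handler)
        return handler.names
    finally:
        os.remove(path)
```

Both helpers should report the same element names, since parse() only differs in how it obtains the bytes.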

A reader and a handler are connected with the appropriate setter method (for example setContentHandler() for a ContentHandler object). Once this is done, the reader reports parsing events by calling the methods of the handler. In the following example, the methods startElement(), endElement() and characters() of the ContentHandler illustrate this procedure.

We will not go into the details of error handling in this article, but xml.sax provides exception classes such as SAXException and SAXParseException for this purpose. You can find more details in the Python reference documentation.
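The connection between reader and handler can be sketched as follows. This is not the parser developed later in the article, only a small stand-alone illustration that assumes Python 3; EventLogger and log_events are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Hypothetical handler: stores each parsing event as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is a SAX attributes object; copy it into a plain dict.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The reader may deliver one text node in several chunks,
        # so each chunk is logged separately; blank chunks are skipped.
        if content.strip():
            self.events.append(("text", content))


def log_events(xml_bytes):
    """Create a reader, attach the handler, parse, and return the events."""
    reader = xml.sax.make_parser()
    handler = EventLogger()
    reader.setContentHandler(handler)
    reader.parse(io.BytesIO(xml_bytes))
    return handler.events
```

Calling log_events(b'<greeting lang="en">Hello</greeting>') yields a start event carrying the attribute dictionary, one or more text events, and a closing end event, in that order.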
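For readers who want a first taste of error handling anyway, the sketch below catches SAXParseException for a document that is not well-formed. It assumes Python 3 and relies on the default error handler, which raises on fatal errors; check_well_formed is a hypothetical helper name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(xml_bytes):
    """Return (True, None) if the document parses, else (False, message)."""
    try:
        # A plain ContentHandler ignores every event; only errors matter here.
        xml.sax.parseString(xml_bytes, ContentHandler())
    except xml.sax.SAXParseException as exc:
        message = "line %d, column %d: %s" % (
            exc.getLineNumber(),
            exc.getColumnNumber(),
            exc.getMessage(),
        )
        return False, message
    return True, None
```

The exception object carries the position of the problem, which is usually enough to produce a useful message for the user.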

Enough theory, let’s move on to a hands-on example.
