
Parsing XML with SAX and Python


In this article Nadia explains how to parse an XML document using the SAX API implementation available for Python.

Author Info:
By: Nadia Poulou
November 09, 2004
TABLE OF CONTENTS:
  1. Parsing XML with SAX and Python
  2. The xml.sax Package
  3. Our SAX Parser
  4. The Heart of the Code
  5. Element Content
  6. The Main Code
  7. Homework


Parsing XML with SAX and Python - Element Content
(Page 5 of 7 )

The ‘points’ and ‘rebounds’ elements in our XML document are a little different: their values are not stored in element attributes but in the element content. Parsing that content is the job of the characters() method, which loads our variables playerPoints and playerRebounds. For this reason, whenever a ‘points’ or ‘rebounds’ element starts, we set the corresponding flag to 1 and clear the matching variable. Here is how our startElement() method looks:

def startElement(self, name, attrs):
    if name == 'player':
        # Player details are stored as attributes of the element.
        self.playerName = attrs.get('name', "")
        self.playerAge = attrs.get('age', "")
        self.playerHeight = attrs.get('height', "")
    elif name == 'points':
        # Raise the flag and start with an empty buffer for characters().
        self.isPointsElement = 1
        self.playerPoints = ""
    elif name == 'rebounds':
        self.isReboundsElement = 1
        self.playerRebounds = ""
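
To make the difference between attributes and element content concrete, here is a minimal sketch written for Python 3. The sample document is an assumption: it reuses the element and attribute names from this article, but the values are invented. The AttributeDemo class and its seen list are hypothetical names used only for this illustration.

```python
import xml.sax

# Hypothetical sample document; the element and attribute names follow the
# article, but the values are invented for illustration.
SAMPLE = b"""<players>
  <player name="Alice" age="27" height="1.98">
    <points>17.1</points>
    <rebounds>6.4</rebounds>
  </player>
</players>"""


class AttributeDemo(xml.sax.ContentHandler):
    """Records what startElement() receives for each element."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        self.seen.append((name, dict(attrs.items())))


handler = AttributeDemo()
xml.sax.parseString(SAMPLE, handler)
for name, attributes in handler.seen:
    print(name, attributes)
```

Only the ‘player’ element arrives with a non-empty attribute mapping. The ‘points’ and ‘rebounds’ elements arrive with no attributes at all, so their values have to be collected elsewhere.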

In the endElement() method we finally compare our search term with the value of the ‘name’ attribute. If they match, we print our output; you can format this output any way you like. This is also the proper place to reset our flags, before the parser moves on to the next element.

Here is how our endElement() method looks:

def endElement(self, name):
    # Lower each flag as soon as its element closes.
    if name == 'points':
        self.isPointsElement = 0
    if name == 'rebounds':
        self.isReboundsElement = 0
    if name == 'player' and self.searchTerm == self.playerName:
        # A single formatted string prints the same way on Python 2 and 3.
        print('<h2>Statistics for player: %s</h2><br>(age: %s height %s)<br>'
              % (self.playerName, self.playerAge, self.playerHeight))
        print('Match average: %s points, %s rebounds'
              % (self.playerPoints, self.playerRebounds))
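
The following Python 3 sketch illustrates why the flags should be reset in endElement(). It is not the article's handler: PointsHandler, collect_points and the reset_flag switch are hypothetical names, and the sample data is invented. The handler appends text to playerPoints while the flag is set, and the switch lets us compare the result with and without the reset.

```python
import xml.sax

# Hypothetical sample data, invented for this illustration.
SAMPLE = b"""<player name="Alice">
  <points>17.1</points>
  <rebounds>6.4</rebounds>
</player>"""


class PointsHandler(xml.sax.ContentHandler):
    def __init__(self, reset_flag):
        super().__init__()
        self.reset_flag = reset_flag   # hypothetical switch, for the demo only
        self.isPointsElement = 0
        self.playerPoints = ""

    def startElement(self, name, attrs):
        if name == 'points':
            self.isPointsElement = 1
            self.playerPoints = ""

    def endElement(self, name):
        # Lower the flag when </points> is reached (unless the demo disables it).
        if name == 'points' and self.reset_flag:
            self.isPointsElement = 0

    def characters(self, ch):
        # Append text only while the flag is raised.
        if self.isPointsElement == 1:
            self.playerPoints += ch


def collect_points(reset_flag):
    handler = PointsHandler(reset_flag)
    xml.sax.parseString(SAMPLE, handler)
    return handler.playerPoints


print(repr(collect_points(True)))    # only the text inside <points>
print(repr(collect_points(False)))   # also picks up text that follows </points>
```

Without the reset, the flag stays raised after the closing tag, so the whitespace between elements and the content of ‘rebounds’ end up in playerPoints as well.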

The characters() method is invoked whenever a chunk of character data is found. This is where we use the flags set in our startElement() method: when a flag has the value 1, we append the data to the corresponding variable. Please note that the character data of an element is not necessarily returned in a single call; the parser may split it into several chunks, which is why we append with += instead of assigning.

Here is how our characters() method looks:

def characters(self, ch):
    if self.isPointsElement == 1:
        self.playerPoints += ch
    if self.isReboundsElement == 1:
        self.playerRebounds += ch
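
To see the chunking for yourself, here is a small Python 3 sketch that feeds a document to the parser in two pieces, cutting through the text of a ‘points’ element. ChunkRecorder and its chunks list are hypothetical names, and the data is invented. How many pieces characters() receives depends on the parser and its version, so the sketch only relies on the pieces adding up to the full value.

```python
import xml.sax


class ChunkRecorder(xml.sax.ContentHandler):
    """Keeps every chunk passed to characters() while inside <points>."""

    def __init__(self):
        super().__init__()
        self.isPointsElement = 0
        self.chunks = []

    def startElement(self, name, attrs):
        if name == 'points':
            self.isPointsElement = 1

    def endElement(self, name):
        if name == 'points':
            self.isPointsElement = 0

    def characters(self, ch):
        if self.isPointsElement == 1:
            self.chunks.append(ch)


handler = ChunkRecorder()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Feed the document in two pieces, cutting through the text "17.1".
parser.feed(b"<player><points>17")
parser.feed(b".1</points></player>")
parser.close()

print(handler.chunks)            # one or more pieces, depending on the parser
print("".join(handler.chunks))   # the complete value
```

Joining the recorded pieces has the same effect as the += in our characters() method: the complete value is available once the element has closed, however the parser chose to split it.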

That completes the basic structure of our application. Remember that this script is called from a Web form, and that our search term comes from the ‘playerName’ field of that form. The next part is the ‘main’ code, which reads that field and uses the methods defined above.


© 2003-2017 by Developer Shed. All rights reserved.