Extensible Markup Language (XML) has made its mark just about everywhere: in B2B and B2C systems, as a replacement for .INI files, as an aid to data abstraction and transportation, and more. In this article Mitchell shows us how to use the DOMXML library to parse and extract data from both local and remote XML files with PHP.
Parsing XML With DOMXML And PHP - What is the DOMXML parsing method?
DOMXML applies the Document Object Model (DOM) to Extensible Markup Language (XML) documents, and the DOM method is one of two common ways to parse an XML document. The DOM method treats an XML file as a complete entity, which contains related items in a hierarchical structure. Using the DOM method, we can reference XML nodes using parent/child relationships, loops and associative arrays of related elements.
The DOMXML method loads the entire XML document into memory and organizes each and every element so that they are easily accessible using a DOM-compliant XML parser. If you've ever worked with XML on a Windows system, then you will be familiar with the MSXML library, which is a great example of a DOM-compliant XML parser.
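Suppose we have an XML document whose root element, "document", contains a "child" element, which in turn contains an "another_child" element.
A DOM-compliant XML parser would represent this in memory as a tree with the same nesting: "document" at the top, "child" beneath it, and "another_child" beneath that.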
As you can see, the document is hierarchically structured, and it's very easy to differentiate between parent and child nodes. In our example, the "child" node is both a parent (to the "another_child" element) and a child (of the "document" element).
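To make the tree model concrete, here is a minimal sketch that loads a small document with the same three element names and prints each element indented by its depth. Two assumptions are worth flagging: the sample XML string is hypothetical, and the sketch uses the DOMDocument class from PHP's later built-in DOM extension rather than the DOMXML functions covered in this article, so it only illustrates the parent/child structure. The describe_node() helper is a name invented for this example.

```php
<?php
// Hypothetical sample document, shaped after the element names used in the text.
$xml = '<?xml version="1.0"?>'
     . '<document><child><another_child>Some text</another_child></child></document>';

// Assumption: PHP's built-in DOM extension (DOMDocument) is used here,
// not the DOMXML functions this article covers.
$doc = new DOMDocument();
$doc->loadXML($xml);

// Recursively collect one line per element, indented by its depth in the tree.
function describe_node($node, $depth = 0)
{
    $lines = array();
    if ($node->nodeType == XML_ELEMENT_NODE) {
        $lines[] = str_repeat('  ', $depth) . $node->nodeName;
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describe_node($child, $depth + 1));
        }
    }
    return $lines;
}

// Start at the root element and walk down through its children.
$tree = describe_node($doc->documentElement);
echo implode("\n", $tree), "\n";
```

Text nodes are skipped on purpose, so the output shows only the element hierarchy: "document" first, then "child" indented beneath it, then "another_child" indented one level further.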
Another commonly used XML parsing method is SAX (Simple API for XML). SAX-compliant parsers are event-based and have a data-centric view of XML documents, meaning that they focus on the data parts of the document, and not its structure. SAX parsers process an XML document from top to bottom and fire an event whenever a specific condition is encountered, such as the start of an element, the end of an element, or the start of character data. For SAX parsers to work, your application/script must implement callback functions to "catch" the events fired by the parser and handle them accordingly.
Because SAX-compliant XML parsers don't load the complete XML document into memory beforehand, they are fast and use very little memory. This makes them ideal for huge XML files or streams of continuous XML data, because the parser itself places no limit on the document's size.
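The event-and-callback style described above can be sketched with PHP's Expat-based XML parser functions (xml_parser_create() and its relatives). This is only an illustration: the sample document is hypothetical, and the handler names start_element(), end_element() and character_data() are invented for the example.

```php
<?php
// Hypothetical sample document, used only for illustration.
$xml = '<document><child><another_child>Some text</another_child></child></document>';

// Every event the parser fires is recorded here, in the order received.
$events = array();

// Called by the parser each time an opening tag is encountered.
function start_element($parser, $name, $attrs)
{
    global $events;
    $events[] = "start: $name";
}

// Called by the parser each time a closing tag is encountered.
function end_element($parser, $name)
{
    global $events;
    $events[] = "end: $name";
}

// Called by the parser for character data; whitespace-only chunks are ignored.
function character_data($parser, $data)
{
    global $events;
    if (trim($data) !== '') {
        $events[] = "text: $data";
    }
}

$parser = xml_parser_create();
// Keep element names in their original case (they are upper-cased by default).
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_set_character_data_handler($parser, 'character_data');

// The third argument marks this as the final (here, the only) chunk of data.
if (!xml_parse($parser, $xml, true)) {
    die('XML error: ' . xml_error_string(xml_get_error_code($parser)));
}

echo implode("\n", $events), "\n";
```

Each handler simply records the event it received, so the script never holds a tree of the document. It sees the elements one at a time, in document order, and must keep track of any context it needs by itself.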
Processing an XML document with a DOM-compliant XML parser is much easier than using a SAX-compliant parser, so DOM is the method used throughout this article. If you want more information on how to use Expat (a PHP-compatible, SAX-compliant XML parsing engine written in C), then take a look at http://www.zend.com/zend/art/parsing.php.
Making sure DOMXML is installed
Before we can continue with this article, we must make sure that we have the DOMXML parser installed and configured on our Apache web server.
For Unix/Linux Users:
DOMXML is bundled with the latest version of PHP; however, it isn't enabled by default. To enable it, you'll need to change into the directory where PHP is installed and re-configure PHP using the following commands:
Remember to change "../apache_1.3.12" to the directory where you installed Apache (You can find this directory with the "where apache" command).
For Windows NT/2000 Users:
You'll need to install the DOMXML functionality add-on for Apache, which is available here. The zip file contains three files: libxml2.dll and iconv.dll need to be extracted to the \winnt\system32 directory, and php_domxml.dll needs to be extracted to your extensions directory, which is c:\php\extensions by default.
Next, you'll need to modify your PHP.INI file so that it loads the extension DLL automatically. You'll find your PHP.INI file in the \winnt directory by default. First, make sure your extensions directory is set correctly: look for the line starting "extension_dir = " and make sure it's set to "c:\php\extensions". The value should be surrounded by double quotes.
Next, look for the line starting ";extension=php_domxml.dll" and take away the semi-colon from the front; the semi-colon acts as a comment in the INI file. Once you restart Apache, the DOMXML DLL will load into memory automatically, giving you programmatic access to it from within PHP.
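One quick way to confirm that the change took effect is to ask PHP whether the extension is loaded. The following is a minimal sketch, assuming the extension registers itself under the name "domxml" and provides a domxml_open_file() function; describe_support() is a hypothetical helper written for this example.

```php
<?php
// Report whether an extension is loaded and whether a function it
// is expected to provide actually exists.
function describe_support($extension, $function)
{
    return sprintf(
        '%s extension: %s, %s(): %s',
        $extension,
        extension_loaded($extension) ? 'loaded' : 'not loaded',
        $function,
        function_exists($function) ? 'available' : 'missing'
    );
}

// Assumption: 'domxml' and domxml_open_file() are the names to look for.
$report = describe_support('domxml', 'domxml_open_file');
echo $report, "\n";
```

If the report says "not loaded", re-check the extension_dir setting and the extension line in PHP.INI, then restart Apache and run the script again.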
Now that we know what DOMXML is and have actually configured/installed it, let's get to work and begin with some simple XML parsing!