
XML Basics - Part One


Do you cringe when you hear the word "XML"? Are you just not sure what the heck this acronym is all about? Not to worry! In this article, Liviu introduces us to XML and the various ways it can be created and parsed, along with a brief history of how it came to be.

Author Info:
By: Liviu Tudor
December 15, 2003
TABLE OF CONTENTS:
  1. XML Basics - Part One
  2. Brief History of Data Exchange
  3. The Need for XML
  4. DTD, Schema and Valid XML
  5. DTD


XML Basics - Part One - DTD
(Page 5 of 5 )

DTDs, or Document Type Definitions, are a means of establishing an XML “grammar”. They rely on a specialized syntax for describing the structure of a set of XML documents. (This is actually one of the XML community’s biggest complaints about the DTD: it doesn’t make much sense to learn another syntax for describing the contents of an XML document when XML itself can be used for this, as we will see in the next section about schemas.) Although the DTD syntax is compact, the way it describes XML contents is rather cryptic, and it has its limitations and drawbacks. Still, it’s not hard to learn the DTD syntax and create a DTD for simple XML documents (as we will see shortly, XML Schemas are recommended for documents that contain complex data structures).

Let’s now try to describe the contents of an XML document containing data about a company’s employees. We assume each employee’s data will be stored in a separate XML document, and that an employee has (at least) the following properties:

  • Name
  • Surname
  • Date of birth
  • Email
  • Phone
  • Home address

If we take the approach of using attributes to describe an employee, then our DTD will look like this:

<!ELEMENT Employee EMPTY>
<!ATTLIST Employee
          name  CDATA #REQUIRED
          surname CDATA #REQUIRED
          dob  CDATA #REQUIRED
          email  CDATA #IMPLIED
          phone  CDATA #IMPLIED
          address CDATA #IMPLIED>

This defines a structure that has three mandatory attributes (marked with “#REQUIRED”) and three optional ones (marked with “#IMPLIED”); we have to give our employees the right to keep their home number secret in order to avoid being called into the office late at night ;). All attributes are declared as “character data” (CDATA), so we expect strings in these fields.
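Python’s standard library does not validate documents against a DTD, but the rules this ATTLIST expresses are simple enough to sketch by hand. The following is a minimal illustration, not a validating parser: the EMPLOYEE_ATTRS table is a hypothetical hand-copied mirror of the declaration above (True for #REQUIRED, False for #IMPLIED), and attribute_errors is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written mirror of <!ATTLIST Employee ...>:
# True means #REQUIRED, False means #IMPLIED (optional).
EMPLOYEE_ATTRS = {
    "name": True,
    "surname": True,
    "dob": True,
    "email": False,
    "phone": False,
    "address": False,
}


def attribute_errors(xml_text):
    """Return DTD-style problems found on the root <Employee> element."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "Employee":
        errors.append(f"unexpected root element <{root.tag}>")
    # Every #REQUIRED attribute must be present.
    for attr, required in EMPLOYEE_ATTRS.items():
        if required and attr not in root.attrib:
            errors.append(f"missing #REQUIRED attribute '{attr}'")
    # Attributes that the ATTLIST does not declare are not allowed.
    for attr in root.attrib:
        if attr not in EMPLOYEE_ATTRS:
            errors.append(f"undeclared attribute '{attr}'")
    # <!ELEMENT Employee EMPTY>: no child elements and no text at all.
    if len(root) > 0 or root.text is not None:
        errors.append("Employee is declared EMPTY but has content")
    return errors


print(attribute_errors('<Employee name="Liviu" surname="Tudor" dob="14/02/1975"/>'))
# []
print(attribute_errors('<Employee name="Liviu" phone="555"/>'))
# ["missing #REQUIRED attribute 'surname'", "missing #REQUIRED attribute 'dob'"]
```

Leaving out an #IMPLIED attribute such as phone is never reported; only missing #REQUIRED ones are.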

A short explanation is needed here: XML distinguishes between parsed and unparsed character data. Parsed character data (PCDATA, written #PCDATA in element declarations) is scanned by the parser, so escape sequences (entity references) are translated into the corresponding characters before being passed back to the application. For those of you who have done some HTML coding, sequences such as &amp; and &quot; will look familiar; they correspond to & (ampersand), " (quote) and so on. So for the element content “CDATA&amp;NOT CDATA”, the string passed to the application is “CDATA&NOT CDATA”. Unparsed character data is written inside a CDATA section (<![CDATA[ ... ]]>); in that case the data arrives exactly as it is stored in the XML document, so <![CDATA[CDATA&amp;NOT CDATA]]> is passed on as “CDATA&amp;NOT CDATA”. Note that the CDATA attribute type used in the DTD above is a different thing: it simply means the attribute value is an arbitrary string. Attribute values are always parsed, so an attribute value of “CDATA&amp;NOT CDATA” also reaches the application as “CDATA&NOT CDATA”.
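The difference is easy to observe with Python’s built-in xml.etree.ElementTree module (chosen here only for illustration; any conforming XML parser behaves the same way). This sketch parses the same escaped string as an attribute value, as ordinary element text, and inside a CDATA section:

```python
import xml.etree.ElementTree as ET

# Attribute values are always parsed, even when the DTD declares them as CDATA.
attr_value = ET.fromstring('<Employee address="CDATA&amp;NOT CDATA"/>').get("address")

# Ordinary element text is parsed character data (#PCDATA).
pcdata = ET.fromstring("<note>CDATA&amp;NOT CDATA</note>").text

# A CDATA section is not parsed: its text is passed on verbatim.
cdata = ET.fromstring("<note><![CDATA[CDATA&amp;NOT CDATA]]></note>").text

print(attr_value)  # CDATA&NOT CDATA
print(pcdata)      # CDATA&NOT CDATA
print(cdata)       # CDATA&amp;NOT CDATA
```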

Let’s have a look at what an XML document conforming to the above DTD would look like:

<?xml version="1.0" encoding="UTF-8"?>
<Employee name="Liviu" surname="Tudor" dob="14/02/1975" />

This is allowed because only these three attributes are mandatory. However, the following document, which also supplies two of the optional attributes, is valid and conforms to the DTD as well:

<?xml version="1.0" encoding="UTF-8"?>
<Employee name="Liviu" surname="Tudor" dob="14/02/1975"
email="user@domain.com" address="Coocooland"/>

However, in neither of these examples have we specified which DTD the XML conforms to! So how would an application know where to look for the DTD? Well, it doesn’t! We have to specify in the XML document itself where the DTD can be found. To do this, we have two options:

  • Embedding the actual DTD in the XML body
  • Creating an external DTD and referencing it in our XML document

The first option is easy and suits small DTDs and XML documents that are rarely generated and used. The XML above will be transformed as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Employee [
<!ELEMENT Employee EMPTY>
<!ATTLIST Employee
          name  CDATA #REQUIRED
          surname CDATA #REQUIRED
          dob  CDATA #REQUIRED
          email  CDATA #IMPLIED
          phone  CDATA #IMPLIED
          address CDATA #IMPLIED>
]>
<Employee name="Liviu" surname="Tudor" dob="14/02/1975"
email="user@domain.com" address="Coocooland"/>
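A parser reads an internal subset as part of the document itself. As a minimal sketch using Python’s low-level xml.parsers.expat module (an illustrative choice; expat is non-validating, so it reports the declarations but does not enforce them), the hypothetical helper declared_attributes collects every attribute declared in the embedded DTD. The sample document embeds the DTD from the start of this section, with name, surname and dob marked #REQUIRED.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Employee [
<!ELEMENT Employee EMPTY>
<!ATTLIST Employee
          name    CDATA #REQUIRED
          surname CDATA #REQUIRED
          dob     CDATA #REQUIRED
          email   CDATA #IMPLIED
          phone   CDATA #IMPLIED
          address CDATA #IMPLIED>
]>
<Employee name="Liviu" surname="Tudor" dob="14/02/1975"
          email="user@domain.com" address="Coocooland"/>
"""


def declared_attributes(xml_text):
    """Map (element, attribute) -> True if #REQUIRED, read from the internal DTD subset."""
    declared = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, att_type, default, required):
        # Expat calls this once for every attribute in each <!ATTLIST ...>.
        declared[(elname, attname)] = bool(required)

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text.encode("utf-8"), True)
    return declared


for (element, attr), required in declared_attributes(DOC).items():
    print(element, attr, "#REQUIRED" if required else "#IMPLIED")
```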

The second option is just as easy, and it is recommended when the DTD is large or complex, when more than one document will be parsed using this DTD (so parsers can cache it), or both. (Of course, these are just a few of the considerations to keep in mind when deciding whether to keep the DTD separate from the XML document or embed it. There are cases where practice dictates otherwise; however, in my experience they work as a general rule of thumb.)

XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Employee SYSTEM "employee.dtd">
<Employee name="Liviu" surname="Tudor" dob="14/02/1975"
email="user@domain.com" address="Coocooland"/>

DTD File (employee.dtd):
<!ELEMENT Employee EMPTY>
<!ATTLIST Employee
          name  CDATA #REQUIRED
          surname CDATA #REQUIRED
          dob  CDATA #REQUIRED
          email  CDATA #IMPLIED
          phone  CDATA #IMPLIED
          address CDATA #IMPLIED>

As you may have noticed in the example above, the DOCTYPE declaration refers to the DTD by its file name. The good news is that this doesn’t have to be a file-system path: it can also be a URL pointing to a well-known location on the Internet (or an intranet), for example to a DTD that has been defined by an international body and to which your document has to adhere.
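An application discovers the DTD location by reading the DOCTYPE declaration. As a minimal sketch, again assuming Python’s xml.parsers.expat module, the hypothetical helper doctype_info returns the root element name and the system and public identifiers; the second sample swaps in a hypothetical URL on example.com to show that the identifier can be a web address. Python’s expat binding does not fetch the external DTD on its own, so this only reports where the DTD lives.

```python
import xml.parsers.expat


def doctype_info(xml_text):
    """Return the DOCTYPE root name, system id and public id (None if absent)."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once, when the parser reaches <!DOCTYPE ...>.
        info["name"] = name
        info["system_id"] = system_id
        info["public_id"] = public_id

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text.encode("utf-8"), True)
    return info


local = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Employee SYSTEM "employee.dtd">
<Employee name="Liviu" surname="Tudor" dob="14/02/1975"/>"""

# Hypothetical URL, used only to show that a system identifier may be a web address.
remote = local.replace("employee.dtd", "http://www.example.com/dtds/employee.dtd")

print(doctype_info(local)["system_id"])   # employee.dtd
print(doctype_info(remote)["system_id"])  # http://www.example.com/dtds/employee.dtd
```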

In fact, once the XML/DTD paradigm came out, many companies in the same field teamed up to build DTDs for their industries, in order to allow integration between the different computer systems used in that industry. Once established, these DTDs were (and still are) published at a “well-known” location on the web, so your document only had to reference the DTD’s URL rather than distributing a copy of the DTD with your XML file every time.

Now, knowing how to build a DTD and how to reference it in our XML document doesn’t automatically make our document perfect. An XML document can be well formed (it respects the XML rules about constructing tags and so on) but still not be valid (that is, it might not adhere to the rules laid out in its DTD). In order for an XML document to be “process-able” by another application that relies on the DTD, it has to be both well formed and valid.
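Python’s standard-library parsers check well-formedness only; they do not validate against a DTD. The sketch below, with is_well_formed as a hypothetical helper name, shows the difference: a document missing the #REQUIRED surname and dob attributes still parses, while documents that break XML syntax are rejected. Checking validity as well requires a validating parser (for example, the third-party lxml library provides an etree.DTD class for this).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text obeys XML syntax rules (says nothing about DTD validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Well formed, but not valid against the Employee DTD (surname and dob are #REQUIRED).
print(is_well_formed('<Employee name="Liviu"/>'))            # True

# Not well formed: typographic quotes (U+201D) cannot delimit attribute values.
print(is_well_formed('<Employee name=\u201dLiviu\u201d/>'))  # False

# Not well formed: start and end tags do not match (XML names are case-sensitive).
print(is_well_formed('<Employee></employee>'))               # False
```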


© 2003-2017 by Developer Shed. All rights reserved.