
XML Basics - Part One

Do you cringe when you hear the word "XML"?  Are you just not sure what the heck this acronym is all about?  Not to worry!  In this article, Liviu introduces us to XML, the various ways it can be created and parsed, and the way it came to be.

Author Info:
By: Liviu Tudor
December 15, 2003
  1. XML Basics - Part One
  2. Brief History of Data Exchange
  3. The Need for XML
  4. DTD, Schema and Valid XML
  5. DTD


XML Basics - Part One - DTD, Schema and Valid XML
(Page 4 of 5)

Remember when I mentioned the term “grammar” earlier on while talking about XML? Well, this is where it comes in.  Just as each human language has a set of rules defining how words are grouped to form a phrase, XML allows us to define rules for how an XML document should be constructed.

“Why rules?” the sceptics ask.  “Isn’t XML meant to be a meta language where we define our own tags and make our own rules?”  True: XML allows you to define your own tags, your own “rules” and your own “language”, and it doesn’t force you to assign a “grammar” to it at all. As stated before, both pieces of XML presented above are perfectly valid, and what is more, you can find all sorts of other variations to represent the same logical entity. However, when you want your XML document to be “read” and “understood” by a third party, this third party will need to know about your XML “language” in order to make sense of it.

Going back to the example with human languages: for somebody who doesn’t speak English, the phrase “the quick brown fox jumps over the lazy dog” will be absolutely meaningless, and as far as s/he is concerned, it might not even be English! Put an English dictionary and an English grammar book next to it (err, ok, and a few years of practice) and it is all perfectly understandable. Better still, put a Windows logo next to it and the person will actually know you’re talking about the Fonts applet in the Control Panel!  The same thing happens with XML: feed some XML tags into an application, and if you do the magic dance around it at midnight, it might actually understand the logical contents of your document; describe the contents of this XML document, however, and the dance will more than likely not be needed (could this be the reason why so few programmers go to disco clubs nowadays? :O)

There are two ways to describe an XML “grammar”: through a DTD (Document Type Definition) and through an XML Schema. We’ll look briefly at each of them shortly. Before that, though, let’s look at one simple thing: take the previous example and wrap it all up into a single XML document like this:

<?xml version="1.0" encoding="UTF-8"?>

This represents a valid XML document.  If we have a look at this now:

<?xml version="1.0" encoding="UTF-8"?>

It looks “nearly” the same as the previous one; however, this second form is not a valid XML document. Such XML will therefore be rejected by any XML-enabled application as “invalid data” or similar.

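It is then about time we defined the term “valid XML” as used so far: a document that follows the basic XML conventions. (Strictly speaking, the XML specification calls such a document “well-formed” and reserves “valid” for a well-formed document that also conforms to its DTD or schema.) A document follows these conventions when:

  • it has one single root element
  • tags are not interlaced (elements are properly nested)
  • every opening tag has a matching closing tag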
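
The three rules above can be tried out with any XML parser. What follows is only a minimal sketch in Python, using the standard library's xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration and are not the examples from this article. The parser checks what the XML specification calls well-formedness, and raises ParseError when a rule is broken.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for any syntax problem: extra root elements,
        # badly nested tags, missing closing tags and so on.
        return False


# Hypothetical sample documents, one per rule in the list above.
samples = {
    "single root, properly nested": "<Employee><Name>Liviu</Name></Employee>",
    "two root elements": "<Employee/><Employee/>",
    "interlaced tags": "<Employee><Name><Surname></Name></Surname></Employee>",
    "missing closing tag": "<Employee><Name>Liviu</Employee>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; each of the other three breaks exactly one of the rules and is rejected by the parser before the application ever sees any data.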

A document can “look” like an XML document but not be a valid XML document. Consider HTML, which is full of tags that do not need an ending tag (for example <p>): even though it is a markup language built on tags, an HTML page is not necessarily a valid XML document. (There is nowadays something called XHTML, which reformulates HTML as XML.  From my experience, websites are now starting to transition over to XHTML.) What is more, a document can have pairs of open/close tags and still not be considered a valid XML document, for example:

<?xml version="1.0" encoding="UTF-8"?>

Now, assuming we have a valid XML structure, how do we know that the actual data represented by the XML document is logically correct? For example, how do we know that this structure:

<?xml version="1.0" encoding="UTF-8"?>

is logically correct, and that the application that produced the XML didn’t intend to “say” something like this instead:

<?xml version="1.0" encoding="UTF-8"?>
<Employee Name="Liviu" Surname="Tudor">
 <Date Birthday="14/02/1975" />
</Employee>

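What is more, as pointed out earlier, how do we know that the Birthday is a correct date? We can only tell this based on the XML “grammar”, be it a DTD or an XML Schema (bearing in mind that a DTD can only describe structure, while an XML Schema can also constrain data types such as dates).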
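
To see why a “grammar” is needed, here is a rough sketch in Python of what an application has to do by hand when there is none. It reuses the Employee, Date and Birthday names from the example above; the function name birthday_is_real_date and the day/month/year format are my own assumptions. Note that this is an application-level check, not DTD or Schema validation, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def birthday_is_real_date(employee_xml: str) -> bool:
    """Parse an Employee document and check its Birthday attribute.

    Assumes the date is written as day/month/year, e.g. 14/02/1975.
    """
    root = ET.fromstring(employee_xml)  # syntax check only
    date_element = root.find("Date")
    if date_element is None:
        return False
    raw = date_element.get("Birthday", "")
    try:
        datetime.strptime(raw, "%d/%m/%Y")
        return True
    except ValueError:
        # Wrong format, or a date that does not exist.
        return False


good = ('<Employee Name="Liviu" Surname="Tudor">'
        '<Date Birthday="14/02/1975" /></Employee>')
bad = ('<Employee Name="Liviu" Surname="Tudor">'
       '<Date Birthday="31/02/1975" /></Employee>')

print(birthday_is_real_date(good))
print(birthday_is_real_date(bad))
```

Both documents get past the parser without complaint, yet the second one carries a 31st of February. Every application reading the document would have to repeat a check like this one, which is exactly the kind of rule a shared “grammar” lets us write down once.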

© 2003-2019 by Developer Shed. All rights reserved.