XML Basics - Part One - DTD, Schema and Valid XML
(Page 4 of 5)
Remember earlier on when I mentioned the term “grammar” while talking about XML? This is where it comes in. Just as each human language has a set of rules defining how words are grouped to form a phrase, XML allows us to define rules for how an XML document should be put together.
“Why rules?” the sceptics ask. “Isn’t XML meant to be a meta language where we define our own tags and make our own rules?” True: XML allows you to define your own tags, your own “rules” and your own “language”, and it doesn’t force you to assign a “grammar” to it at all. As stated before, both pieces of XML presented above are perfectly acceptable XML, and you can find all sorts of other variations to represent the same logical entity. However, when you want your XML document to be “read” and “understood” by a third party, this third party will need to know about your XML “language” in order to make sense of it.
Going back to the example with human languages: for somebody who doesn’t speak English, the phrase “the quick brown fox jumps over the lazy dog” is absolutely meaningless and, as far as s/he is concerned, might not even be English! Put an English dictionary and an English grammar book next to it (err, ok, and a few years of practice) and it all becomes perfectly understandable. Even more, put a Windows logo next to it and the person will actually know you’re talking about the Fonts applet in the Control Panel! The same thing happens with XML: feed some XML tags into an application, and if you do the magic dance around it at midnight, it might actually understand the logical contents of your document; describe the contents of this XML document, however, and the dance will more than likely not be needed (could this be the reason why there are so few programmers going to disco clubs nowadays? :O)
There are two common ways to describe an XML “grammar”: through a DTD (Document Type Definition) and through an XML Schema. We’ll have a brief look at each of them shortly. Before that, though, let’s look at one simple thing: take the previous example and wrap it all up into a single XML document like this:
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu</Name>
<Surname>Tudor</Surname>
<Birthday>14/02/1975</Birthday>
</Employee>
This is a proper XML document that any XML parser will accept. Now have a look at this one:
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu
<Surname>Tudor
<Birthday>14/02/1975
</Employee>
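It “nearly” looks the same as the previous one. However, this second form is not a proper XML document: the Name, Surname and Birthday elements are opened but never closed. Such XML will be rejected by any XML-enabled application as “invalid data” or similar.
It is then about time we introduce the proper term for this, “well-formed XML”. (Many people loosely call such a document “valid”, but strictly speaking “valid XML” is a well-formed document that also conforms to its DTD or XML Schema.) A well-formed document follows the basic XML conventions:
- it has one single root element
- elements are properly nested (tags do not overlap)
- every opening tag has a matching closing tag.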
A document can “look” like an XML document and still not be one. Consider HTML, which is full of tags that are commonly left without an ending tag (for example <p>): even though it is a markup language built on tags, an HTML page will generally not pass as an XML document. (There is nowadays something called XHTML, which reformulates HTML as an XML application. From my experience, websites are now starting to transition over to XHTML.) Even more, a document can have pairs of open/close tags and still be rejected by an XML parser because those tags overlap, for example:
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu</Name>
<Surname>Tudor<Birthday>14/02/1975</Surname></Birthday>
</Employee>
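To see how a parser treats the three documents shown so far, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The helper name is_well_formed and the variable names are made up for this illustration; the parser only checks the structural rules listed above, nothing about the meaning of the data.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # Raised for unclosed tags, overlapping tags, multiple roots, etc.
        return False


# The complete document: every element is closed and properly nested.
complete = b"""<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu</Name>
<Surname>Tudor</Surname>
<Birthday>14/02/1975</Birthday>
</Employee>"""

# Name, Surname and Birthday are opened but never closed.
unclosed = b"""<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu
<Surname>Tudor
<Birthday>14/02/1975
</Employee>"""

# Every tag has a partner, but Surname and Birthday overlap.
overlapping = b"""<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu</Name>
<Surname>Tudor<Birthday>14/02/1975</Surname></Birthday>
</Employee>"""

for label, doc in [("complete", complete),
                   ("unclosed", unclosed),
                   ("overlapping", overlapping)]:
    print(label, is_well_formed(doc))
```

Only the first document is accepted; the other two make the parser raise ParseError, which the helper turns into False.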
Now, assuming the XML is structurally sound, how do we know that the actual data it represents is logically correct? For example, how do we know that this structure:
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu</Name>
<Surname>Tudor</Surname>
<Birthday>14/02/1975</Birthday>
</Employee>
is logically correct, and that the application that produced the XML didn’t instead intend to “say” something like this:
<?xml version="1.0" encoding="UTF-8"?>
<Employee Name="Liviu" Surname="Tudor">
<Date Birthday="14/02/1975" />
</Employee>
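Even more, as pointed out earlier, how do we know that the Birthday is a correct date? We can only tell this based on the XML “grammar”: a DTD or an XML Schema describes the structure the document is expected to have, and an XML Schema can additionally constrain data types such as dates.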
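Without a grammar, each consuming application has to hard-code its own expectations. The sketch below, again in Python with the standard library only, illustrates the problem: both representations parse without complaint, yet a reader written for the element-based layout finds nothing in the attribute-based one. The function read_birthday and the DD/MM/YYYY date format are assumptions made for this example, not something the XML itself declares.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime
from typing import Optional

elements_form = b"""<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Liviu</Name>
<Surname>Tudor</Surname>
<Birthday>14/02/1975</Birthday>
</Employee>"""

attributes_form = b"""<?xml version="1.0" encoding="UTF-8"?>
<Employee Name="Liviu" Surname="Tudor">
<Date Birthday="14/02/1975" />
</Employee>"""

# Same layout as elements_form, but with a date that does not exist.
impossible_date = elements_form.replace(b"14/02/1975", b"31/02/1975")


def read_birthday(document: bytes) -> Optional[date]:
    """Hypothetical reader that expects a <Birthday> child element
    holding a DD/MM/YYYY date (an assumed convention)."""
    root = ET.fromstring(document)      # succeeds for all three documents
    text = root.findtext("Birthday")    # None if the layout is different
    if text is None:
        return None
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None                     # parsed fine as XML, but not a real date


print(read_birthday(elements_form))     # 1975-02-14
print(read_birthday(attributes_form))   # None
print(read_birthday(impossible_date))   # None
```

The parser is happy with all three inputs; only the hand-written checks notice the different layout and the impossible date. A shared grammar lets producer and consumer agree on those expectations up front instead of burying them in code like this.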
More By Liviu Tudor