An Introduction to XML - A Basic XML Example (Page 3 of 4 )
Let's get going with a basic example of separating content from formatting with XML. In this example, we'll use an XML file to store headlines and associated news items, and an HTML page to present them. As time passes, more and more dedicated XML editors are appearing on the market. This is all very well and good if you have the time to sit down and learn how to use those as well as learning how to use XML itself. For the purpose of this article, I have produced everything using just Notepad. Until you are a little more familiar with XML itself (for the duration of this article at least!), I suggest you do the same.
In Notepad then, on a blank document, type the following line of code:
<?xml version="1.0" standalone="yes"?>
This is what is known as the XML declaration and should be present in all XML documents, whether they are merely well-formed or also valid. It must be in lowercase and must be the very first thing in the file. Strictly speaking, the declaration itself is optional in XML 1.0, and a document without one is processed as version 1.0 anyway; if you do include the declaration, however, the version attribute is mandatory. It is good practice to include it.
In addition to this there are some other optional attributes that can be used here, such as the character encoding, which by default is UTF-8, and the standalone attribute, which by default is set to no. In this example, we'll be creating an internal DTD (although we won't actually use it), so this attribute can be safely set to 'yes'. For info, if you do decide to use the encoding attribute, it must come directly after the version attribute and before the standalone attribute.
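As a quick illustration of the attribute order, a declaration that uses all three attributes might look like the line below. The encoding value shown is simply the default, spelled out here for illustration; it is not needed for our example.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
```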
As we are creating a news repository, we'll need somewhere to store the news headlines and somewhere to store the actual news. XML documents consist of a tree hierarchy of nested elements. The top, or document, element contains all other elements and is known as the root element. The logical order and structure of the elements in the document are defined in the DTD, and a valid document must conform to them. The DTD begins with a document type declaration that either references an external DTD file or paves the way for an internal one. If we wanted to reference an external DTD, the document type declaration would be as follows:
<!DOCTYPE News SYSTEM "my.dtd">
This tells the parser the name of the root element and the location of the DTD. Our Document Type Definition is internal, however, so we will use the following statement:
<!DOCTYPE News [
This statement names the DTD as News; the name must match the root element of the document. Instead of the SYSTEM keyword, you use one open square bracket, which we'll close at the end of the DTD. Now we need to name the elements that will appear in our document, in the order that they'll appear. Remember that XML is case-sensitive, so each name must be typed exactly the same way everywhere it is used:
<!ELEMENT News (Article)>
This statement defines News as the root element of the document; this element will act as a container for all other elements of the document. You name its child elements in parentheses after the name of the root element. In this case our News element will contain one child element - the Article element. We now need to define the Article element:
<!ELEMENT Article (Headline+, Story+)>
This declares the Article element as having two child elements, Headline and Story, each of which must appear one or more times (the plus sign denotes this). The order in which the elements are declared is the order in which they must appear in your document. The comma denotes a sequence: both must appear, Headline first. If we were to substitute a pipe symbol (|) for the comma, it would mean that either element could appear but not both. Now define the Headline and Story elements:
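To make the difference between the comma and the pipe concrete, here is a small sketch of the two alternative content models, each with a fragment that would satisfy it. The second declaration is hypothetical and is not part of our News DTD; a real DTD would contain only one declaration per element.

```xml
<!-- Sequence: one or more Headline elements, then one or more Story elements -->
<!ELEMENT Article (Headline+, Story+)>
<!-- Matches: <Article><Headline>...</Headline><Story>...</Story></Article> -->

<!-- Choice (hypothetical alternative): one Headline or one Story, never both -->
<!ELEMENT Article (Headline | Story)>
<!-- Matches: <Article><Headline>...</Headline></Article> -->
<!-- Matches: <Article><Story>...</Story></Article> -->
```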
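<!ELEMENT Headline (#PCDATA)>
<!ELEMENT Story (#PCDATA)>
The #PCDATA keyword means that the elements will contain parsed character data. All text within these elements is parsed, so it cannot contain literal mark-up characters (the left angle bracket and the ampersand, for instance); these must be written as entity references such as &lt; and &amp; instead. If you want to include mark-up characters literally in an element's content, wrap the text in a CDATA section in the document itself. Note that CDATA, which stands for character data, is not a content type you can give an element in a DTD; within a DTD it appears only as an attribute type.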
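As a rough sketch of the two ways to get mark-up characters into an element declared as #PCDATA, the fragment below uses entity references in one element and a CDATA section in the other. The headline and story text are made-up sample content.

```xml
<!-- Mark-up characters in parsed character data are written as entity references -->
<Headline>Profits &lt; Losses at Smith &amp; Jones</Headline>

<!-- Inside a CDATA section the same characters can appear literally -->
<Story><![CDATA[The <b> tag makes text bold & needs no escaping here.]]></Story>
```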
As a demonstration of other statements that can sometimes appear in DTDs, we'll define an attribute. Attributes are used to store information about an element without making it part of the element's content, for example to provide information to the parser. We'll give our Article element a unique reference number using an attribute. It won't be used in this example, but attributes can be very useful and are widely used. Add the following code:
<!ATTLIST Article ArticleNumber ID #REQUIRED>
This sets ArticleNumber as an attribute of the Article element, of the ID type. The #REQUIRED keyword means it is compulsory. If you wanted the attribute to be optional, you would use the keyword #IMPLIED in place of #REQUIRED.
This is all our DTD needs to include. You'll find that there are many more keywords that can be used in DTDs, but that is another article in itself. Mark the end of the DTD with the closing square and angle brackets:
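For illustration, this is roughly how the attribute would then be written in the document. The value "a001" and the sample text are made up. Note that a value of the ID type must be a valid XML name, so it cannot begin with a digit, and it must be unique within the document.

```xml
<!-- In the DTD: ArticleNumber is compulsory on every Article element -->
<!ATTLIST Article ArticleNumber ID #REQUIRED>

<!-- In the document: "a001" is a made-up example value -->
<Article ArticleNumber="a001">
  <Headline>Sample headline</Headline>
  <Story>Sample story text.</Story>
</Article>
```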