Welcome to the third part of a three-part series on RELAX NG. In this part, we will discuss datatypes, the grammar element, and creating named patterns. That's a lot of ground to cover, so let's get started.
In the last two articles, an element that did not contain child elements was either empty or contained plain text. This works fine in some situations; for cities and states, for example, there's no other practical way to represent the data. In other situations, however, the text pattern just doesn't work. The date attribute, for instance, accepts any text at all. We intended the date to take the form YYYY-MM-DD, like this:
<person date="2008-06-30">
However, nothing stops a user from writing the date in the form MM-DD-YYYY instead:
<person date="06-30-2008">
Nor is there anything to stop a user from doing something like this:
<person date="a week or so ago">
The document will still validate. Clearly, the text pattern is far too loose for many situations, such as representing a date. Fortunately, RELAX NG supports datatypes, which let you replace the text pattern with something stricter. RELAX NG itself defines only two datatypes, string and token. The string datatype compares values exactly as written, leaving whitespace untouched. The token datatype normalizes whitespace first: leading and trailing whitespace is stripped, and internal runs of whitespace are collapsed to a single space.
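To make the string/token difference concrete, here is a small sketch in the XML syntax. It assumes a hypothetical state element and uses RELAX NG's value pattern, which accepts exactly one fixed value of a given datatype. The element name and the value are illustrative assumptions only.

```xml
<!-- Hypothetical "state" element, used only to illustrate comparison.
     No datatypeLibrary is in effect here, so type="token" refers to
     RELAX NG's built-in token datatype. -->
<element name="state" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- token normalizes whitespace before comparing, so a document
       containing <state>  New   York </state> still matches.
       With type="string", only the exact text "New York" would match. -->
  <value type="token">New York</value>
</element>
```

Neither datatype helps with dates, though: both accept any text when used without a fixed value.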
These two datatypes will only get you so far, though; neither of them can represent our date properly. However, most RELAX NG implementations also let you use the XML Schema datatypes. These provide many more datatypes than just string and token, including one that represents a date properly.
Before these datatypes can be used, they must be identified as a datatype library so that RELAX NG knows which datatypes it is working with. In the XML syntax, this is done with the datatypeLibrary attribute, whose value is a URI. For the XML Schema datatypes, the URI is “http://www.w3.org/2001/XMLSchema-datatypes”.
Any element in the schema can carry this attribute. When the attribute is set, that element and all of its descendants use the corresponding datatype library unless a descendant specifies a different one. In other words, the value is inherited, so we can simply set the attribute once on the root element of the schema.
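Here is a minimal sketch of what this might look like. It makes two assumptions: the root element is named people, and the person element is trimmed down to its date attribute. The schema from the earlier parts contains more than this.

```xml
<!-- "people" is a hypothetical root element name -->
<element name="people"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Every pattern below inherits the XML Schema datatype library -->
  <oneOrMore>
    <element name="person">
      <attribute name="date">
        <!-- Still plain text for now; only the library is declared -->
        <text/>
      </attribute>
    </element>
  </oneOrMore>
</element>
```

Declaring the library changes nothing on its own. The date attribute still accepts any text until a datatype from the library is actually used.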
In the compact syntax, datatype libraries are identified by a prefix. The xsd prefix is predeclared for the XML Schema datatypes. If you wanted to declare a prefix yourself, you'd add a datatypes declaration at the top of the schema.
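Such a declaration is a single line. This sketch binds the xsd prefix. Because the compact syntax already predeclares xsd with this same URI, the line is redundant here, and any other prefix name could be declared the same way.

```rnc
# Binds the prefix "xsd" to the XML Schema datatype library.
# Optional for xsd, which the compact syntax predeclares.
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
```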
Now the datatypes are ready to use. In the XML syntax, a datatype is referenced with the data element, which matches the datatype named by its type attribute. To constrain the person element's date attribute properly, we'd use the date datatype, which requires a date in the form YYYY-MM-DD (optionally followed by a time zone). In the schema, the text pattern inside the date attribute is therefore replaced by a data element whose type attribute is set to date.
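Here is a sketch of the revised attribute declaration. It is a fragment meant to sit inside the person element, and it assumes the XML Schema library is inherited from a datatypeLibrary attribute on the root element as described above.

```xml
<!-- Fragment: the date attribute of the person element -->
<attribute name="date">
  <!-- Resolves to the XML Schema date datatype, so 2008-06-30 is
       accepted, while 06-30-2008 and "a week or so ago" now fail -->
  <data type="date"/>
</attribute>
```

If you'd rather not set the library on the root element, you can put the same datatypeLibrary attribute directly on the data element instead.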
The compact syntax makes this even simpler, because the datatype goes directly inside the braces of an element or attribute pattern. Only one word in the compact schema has to change: the text keyword in the date attribute becomes xsd:date.
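Here is a sketch of the compact-syntax version. As before, it assumes a hypothetical people root element and shows only the date attribute of person.

```rnc
# Previously: attribute date { text }
# Only the keyword inside the braces changes; xsd is predeclared.
element people {
  element person {
    attribute date { xsd:date }
  }+
}
```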