XML is gaining acceptance today, not because it is a great technology looking for a problem, but because today's problems require its flexibility and simplicity. In this article Doug talks about how XML can be used to accommodate human-authored content. He also discusses structured and unstructured data, offers tips for designing XML DTDs, and more.
XML Unlocks Information - Tips for Designing an XML DTD or Schema (Page 3 of 5 )
XML transfers information between two parties, whether human or machine. Just as two people must know the same language, both parties must speak the same XML dialect. The dialect, defined in the DTD (document type definition) or schema, is the vocabulary and grammar used to describe the information being transferred.
The producer and the processor of XML information must share a common DTD or schema. Because the DTD or schema is vital to the success of XML, this article provides guidelines for designing a DTD or schema. Even if you are not designing a DTD or schema, it is worthwhile to understand the rationale behind their design, since it is the structure of XML data that gives it meaning. This structure changes a random sequence of unintelligible words to speech, that is, it transforms data to information.
When designing a DTD or schema for XML data, analyze the nature of the data and how it is created and processed. Consider how data is stored in a relational database, with a clearly defined structure of records, fields and tables.
Before you begin your design, decide whether to store each piece of data as the value of an attribute or as text (even if numeric) enclosed in an element's tags. Generally, it is better to store data in elements, as this approach is more flexible when used with XSL. (XSL is a specification for transforming XML to HTML or some other XML structure.)
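One structural reason elements tend to be more flexible can be sketched in Python rather than XSL. The book record, its field names and the use of Python's standard xml.etree.ElementTree module are assumptions made for illustration and do not come from the article. The sketch shows that an element can repeat, while an attribute name can appear only once per element.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, written once with attributes and once with elements.
AS_ATTRIBUTES = '<book title="XML Basics" author="A. Writer" />'
AS_ELEMENTS = """
<book>
  <title>XML Basics</title>
  <author>A. Writer</author>
  <author>B. Editor</author>
</book>
"""

attr_book = ET.fromstring(AS_ATTRIBUTES)
elem_book = ET.fromstring(AS_ELEMENTS.strip())

# An attribute holds exactly one string value per name.
attr_author = attr_book.get("author")

# An element may repeat, so a second author needs no redesign.
elem_authors = [author.text for author in elem_book.findall("author")]

# Repeating an attribute name is not well-formed XML, so the parser rejects it.
try:
    ET.fromstring('<book author="A. Writer" author="B. Editor" />')
    duplicate_attribute_ok = True
except ET.ParseError:
    duplicate_attribute_ok = False

print(attr_author)
print(elem_authors)
print(duplicate_attribute_ok)
```

If a field might later need to repeat or carry nested structure, this is one argument for modeling it as an element from the start.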
Be careful not to design solely from a developer's perspective. Consider who produces the XML data. If it is produced and processed programmatically, a developer-friendly perspective is appropriate. In fact, XML for B2B transactions should be designed from this perspective to generate fast, reliable and efficient transfer of information. However, if a human will author or read the XML data, consider those needs when designing a DTD or schema.
Elements and Attributes
An attribute is a name-value pair that appears inside a start tag, after the tag name. An element is a tag along with its attributes and all the text and elements that it encloses. Elements within another element are called child elements. Consider the following example.
<tag_name attr_name1="value1" attr_name2="value2">
  <child_tag attr_name3="value3" />
  <child_with_text>This is some text</child_with_text>
  This text is part of the tag_name element
</tag_name>
As seen in this illustration, the tags are tag_name, child_tag and child_with_text. The attribute attr_name1 has a value of "value1". The element, tag_name, consists of the following attributes and child elements:
Attributes: attr_name1, attr_name2
Child Elements: child_tag, child_with_text
Note that, in XML, every attribute value must be enclosed in single (') or double (") quotes. Also, every element must either have a closing tag or be written as a single tag ending in "/>". Because the child_tag element has no child elements or text, its tag ends with "/>" rather than being followed by a separate closing tag such as "</child_tag>".
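The breakdown above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not something the article uses). It parses the sample element, lists its attributes and child elements, and confirms that an unquoted attribute value or a missing closing tag is rejected. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample element from the text above.
SAMPLE = """
<tag_name attr_name1="value1" attr_name2="value2">
  <child_tag attr_name3="value3" />
  <child_with_text>This is some text</child_with_text>
  This text is part of the tag_name element
</tag_name>
"""

root = ET.fromstring(SAMPLE.strip())

# Attributes of tag_name and the tags of its direct child elements.
attribute_names = sorted(root.attrib)
child_names = [child.tag for child in root]
child_text = root.find("child_with_text").text


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts the text as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An unquoted attribute value and a child without a closing tag both fail.
unquoted_ok = is_well_formed('<tag_name attr_name1=value1 />')
unclosed_ok = is_well_formed('<tag_name><child_tag></tag_name>')

print(attribute_names, child_names, child_text)
print(unquoted_ok, unclosed_ok)
```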
Michael C. Daconta, in his article "Are Elements and Attributes Interchangeable?" (XML Journal volume 2 issue 7, page 42), presents eight practical rules for deciding whether to use elements or attributes. Some rules depend on whether the design is implemented in a DTD or schema. DTDs cannot enforce constraints between attributes and elements as extensively as schemas can. As a result, the decision to use an attribute may depend on whether a value is constrained.
Elements vs. Attributes with Semi-Structured Documents
When creating semi-structured data and content-oriented documents, place human-readable text in elements, not attributes. This is because attributes are part of the structure, not the content. If you can separate structure from content, you can extract content without tags while retaining the human-readable information.
Text within an element should be considered "viewable." Attribute values, on the other hand, are either invisible or rendered in some other way by a graphical object. Use attribute values to modify or further identify specific elements.
<prompt type="boolean">Do you want the information?
  <choice value="true">Yes, please send the information</choice>
  <choice value="false">Don't send me the information</choice>
</prompt>
<photo width="x" height="y" src="URL" alt="Text if photo not rendered or on mouse-over">
  This is the caption for the photo.
</photo>
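To see why this separation helps, here is a minimal sketch that strips the tags from the two samples and keeps only the viewable text. It assumes Python's standard xml.etree.ElementTree module in place of XSL or a browser DOM, and the function name viewable_text is hypothetical. Attribute values such as "boolean", "true" and the alt text never appear in the output, because they are structure rather than content.

```python
import xml.etree.ElementTree as ET

PROMPT = """
<prompt type="boolean">Do you want the information?
  <choice value="true">Yes, please send the information</choice>
  <choice value="false">Don't send me the information</choice>
</prompt>
"""

PHOTO = """
<photo width="x" height="y" src="URL" alt="Text if photo not rendered or on mouse-over">
  This is the caption for the photo.
</photo>
"""


def viewable_text(xml_text):
    """Return the non-empty text pieces of a document, ignoring all attributes."""
    root = ET.fromstring(xml_text.strip())
    # itertext() walks element text in document order; attributes are skipped.
    pieces = (piece.strip() for piece in root.itertext())
    return [piece for piece in pieces if piece]


prompt_lines = viewable_text(PROMPT)
photo_lines = viewable_text(PHOTO)

for line in prompt_lines + photo_lines:
    print(line)
```

Had the choice labels been stored in attributes instead, this plain-text view would lose them.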
If you follow this rule, the XSL value-of element or the nodeValue property in the XML DOM (or the text property in the Microsoft XML DOM) can easily extract the content for publication on an unformatted device, as illustrated below.