
XML Unlocks Information

XML is gaining acceptance today, not because it is a great technology looking for a problem, but because today's problems require its flexibility and simplicity. In this article Doug talks about how XML can be used to accommodate human-authored content. He also discusses structured and unstructured data and offers tips for designing XML DTDs, among other topics.

Author Info:
By: Doug Domeny
May 13, 2002
  1. XML Unlocks Information
  2. How XML Accommodates Human-Authored Content
  3. Tips for Designing an XML DTD or Schema
  4. Elements vs. Attributes with Database Oriented Data
  5. Conclusion


XML Unlocks Information - How XML Accommodates Human-Authored Content

While highly structured data is independent of the style used to present it, unstructured data is full of style and format. Contrast plain text (no style) with rich text (full of style).

Text documents meant for human authoring and reading have design needs that only XML can address. Examples of semi-structured documents include catalogs, press releases, news reports, and technical documentation. Even highly structured data becomes semi-structured if it includes comments, descriptions, or instructions meant to be read by people.

XML supports the development of semi-structured documents that contain both relational meta data (the structure) and free-form (unstructured) formatted text. The meta data (that is, the XML tags) meets the programmatic need for structure. Without meta data, a computer program cannot understand the content. Formatted text meets the human and business need to express richly styled content. Without style, the content is dry and unattractive.

The paragraph you are reading now is an example of formatted text. Most document editors display content (unstructured data) as WYSIWYG (what you see is what you get). For a business user to comfortably create semi-structured textual documents, a document editor must allow the author to add style to the text.

Variations of Structured and Unstructured Data

Between highly structured data and unstructured documents lie two kinds of semi-structured data, giving four variations in all:
  • highly structured data
  • structured data with unstructured elements
  • unstructured documents with tagged meta data
  • unstructured documents
Structured data with unstructured elements is commonly used in web forms, where most fields are tightly constrained (for example, "State" must be selected from a list and "ZIP" must be all digits), yet a "Comment" field is available for free-form, human-readable text.
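The split between constrained and free-form fields can be sketched in a few lines of Python. This is an illustration only: the field names and the short list of state codes are hypothetical placeholders, and a real form would carry the full list of choices.

```python
import re

# Hypothetical, shortened list of choices for the constrained "State" field.
STATE_CHOICES = {"CA", "NY", "TX"}


def validate_form(fields: dict[str, str]) -> list[str]:
    """Check the constrained fields; return a list of error messages."""
    errors = []
    # Constrained: the value must be one of the listed choices.
    if fields.get("state") not in STATE_CHOICES:
        errors.append("State must be selected from the list")
    # Constrained: the value must consist of ASCII digits only.
    if not re.fullmatch(r"[0-9]+", fields.get("zip", "")):
        errors.append("ZIP must be all digits")
    # The "comment" field is deliberately left unchecked: it holds
    # free-form, human-readable text.
    return errors


print(validate_form({"state": "CA", "zip": "94105", "comment": "Leave it by the door."}))  # []
print(validate_form({"state": "ZZ", "zip": "9410A"}))
# ['State must be selected from the list', 'ZIP must be all digits']
```

Only the structured fields are checked; whatever the user types into the comment passes through untouched, which is exactly the "structured data with unstructured elements" pattern.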

For example,

<name>Deluxe Widget</name>
<listprice units="usd">$19.95</listprice>
<description>This <em>deluxe <strong>gold</strong> plated</em> product fits most attachments.</description>
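To see why this mix works for programs as well as people, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the fields are wrapped in a hypothetical product root element (so the fragment is a single well-formed document) and that the free-form sentence sits in a description element; the function name read_product is also illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the <product> root and <description> element are
# assumed so that the fragment forms one well-formed document.
DOC = """<product>
  <name>Deluxe Widget</name>
  <listprice units="usd">$19.95</listprice>
  <description>This <em>deluxe <strong>gold</strong> plated</em> product fits most attachments.</description>
</product>"""


def read_product(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    price = root.find("listprice")
    desc = root.find("description")
    return {
        # Structured fields: a program can read these directly by tag name.
        "name": root.findtext("name"),
        "price": price.text,
        "units": price.get("units"),
        # Unstructured field: flatten the mixed content to plain text...
        "description_text": "".join(desc.itertext()),
        # ...or keep its inline formatting tags for display.
        "description_markup": ET.tostring(desc, encoding="unicode"),
    }


print(read_product(DOC)["description_text"])
# This deluxe gold plated product fits most attachments.
```

The structured fields come back as plain values, while the description can be used either as searchable text or as styled markup.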

For this kind of document, use a DTD or schema to validate the structure, and include an unstructured element (for example, description) that allows both text and tags. In a DTD, this element would typically be defined as

<!ELEMENT description ANY>
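Under ANY, the description may hold character data and any element type that the DTD itself declares. When only a known set of inline tags should be allowed, a DTD would typically use a mixed-content model such as <!ELEMENT description (#PCDATA | em | strong)*>. Python's standard library does not validate against DTDs, so the sketch below hand-rolls a comparable check; it is not a DTD validator, and the allowed tag set and the function name disallowed_tags are assumptions for illustration. It checks every element nested anywhere inside the description, which roughly matches declaring em and strong with the same mixed model.

```python
import xml.etree.ElementTree as ET

# Assumed set of inline formatting tags the description may contain.
ALLOWED_INLINE = {"em", "strong"}


def disallowed_tags(description_xml: str) -> list[str]:
    """Return tags nested anywhere inside <description> that are not allowed."""
    desc = ET.fromstring(description_xml)
    # iter() yields the element itself first, so skip the <description> root.
    return [el.tag for el in desc.iter()
            if el is not desc and el.tag not in ALLOWED_INLINE]


print(disallowed_tags(
    "<description>This <em>deluxe <strong>gold</strong> plated</em> product.</description>"))
# []
print(disallowed_tags(
    "<description>Call <phone>555-0100</phone> today.</description>"))
# ['phone']
```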

Unstructured documents with tagged meta data are less common but offer the best promise for content that can be searched effectively. HTML provides a few tags that carry meaning, such as <ADDRESS> and <CODE>, but XML provides the flexibility to create custom tags. For example:


<owner studentid="2456">Jim Smith</owner> owns a <automobile model="OCC96">Cutlass Ciera</automobile>.
<my:conditional value="birds">
<my:author>Joe Kluck</my:author> in his article <my:title type="article">Why Chicken have Wings</my:title> <my:bibliography>(<my:source><my:periodical>Poultry Monthly</my:periodical> <my:issue>September 2001</my:issue></my:source>, page <my:page>9</my:page>)</my:bibliography> dispels the usual stereotypes of flightless birds.
</my:conditional>

This kind of document must be well formed to allow processing by an XML parser, but it is usually not validated against a DTD or schema. For such a document, XHTML is a natural choice: it is well formed, has extensive formatting capability, and accepts custom XML tags without causing display problems in browsers. Note that the namespace prefix "my" distinguishes the custom XML tags from standard HTML tags; for the document to parse, the prefix must also be bound to a namespace URI with an xmlns:my attribute on an enclosing element.
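Both points (the parser's well-formedness requirement and the searchability of tagged meta data) can be tried with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the namespace URI and the function names is_well_formed and search are hypothetical, and nothing here validates against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the "my" prefix only needs to be bound to some URI.
MY_NS = "http://example.com/my"

# The tagged sentence from the article, with the prefix declared on the root.
DOC = f"""<my:conditional xmlns:my="{MY_NS}" value="birds">
<my:author>Joe Kluck</my:author> in his article
<my:title type="article">Why Chicken have Wings</my:title>
<my:bibliography>(<my:source><my:periodical>Poultry Monthly</my:periodical>
<my:issue>September 2001</my:issue></my:source>, page <my:page>9</my:page>)</my:bibliography>
dispels the usual stereotypes of flightless birds.
</my:conditional>"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text (no DTD or schema validation)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def search(xml_text: str, tag: str) -> list[str]:
    """Return the text of every my:<tag> element anywhere in the document."""
    root = ET.fromstring(xml_text)
    # The namespaces mapping lets the query use the same "my" prefix.
    return [el.text for el in root.findall(f".//my:{tag}", {"my": MY_NS})]


print(is_well_formed("<p><my:author>Joe Kluck</my:author></p>"))  # False: "my" is not bound
print(search(DOC, "author"))      # ['Joe Kluck']
print(search(DOC, "periodical"))  # ['Poultry Monthly']
```

A plain-text search for "Joe Kluck" cannot tell whether he is the author or the subject of a sentence; a query on my:author can.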
© 2003-2019 by Developer Shed. All rights reserved.