Sample Chapter: Beginning XML


Still haven't gotten into XML but want to know how? In this article Tim takes a look at a sample chapter from Wrox's extremely popular title "Beginning XML". Beginning XML was written for those who know it's a good idea to learn XML but aren't exactly sure why and how they could use it to their advantage. This sample chapter talks about well-formed XML, attributes, empty elements, processing instructions and parsing XML.

Author Info:
By: Tim Pabst
April 01, 2002
TABLE OF CONTENTS:
  1. Sample Chapter: Beginning XML
  2. Well-Formed XML
  3. Attributes
  4. Comments
  5. Empty Elements
  6. Processing Instructions
  7. Parsing XML

Sample Chapter: Beginning XML - Attributes
(Page 3 of 7)

In addition to tags and elements, XML documents can also include attributes.

Attributes are simple name/value pairs associated with an element.

They are attached to the start-tag, as shown below, but not to the end-tag:

<name nickname='Shiny John'>
  <first>John</first>
  <middle>Fitzgerald Johansen</middle>
  <last>Doe</last>
</name>
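
As a minimal sketch of how an application might read that attribute, the following uses Python's standard xml.etree.ElementTree module. The choice of Python is ours, for illustration only; the chapter itself uses IE5 as its parser. The attribute is available on the parsed <name> element, separately from its child elements.

```python
import xml.etree.ElementTree as ET

# The <name> document from the text, with the nickname attribute on the start-tag.
NAME_XML = """<name nickname='Shiny John'>
  <first>John</first>
  <middle>Fitzgerald Johansen</middle>
  <last>Doe</last>
</name>"""

name = ET.fromstring(NAME_XML)

# Attributes are exposed as name/value pairs on the element.
nickname = name.get("nickname")
attributes = dict(name.attrib)

# Child elements are reached separately from attributes.
first = name.findtext("first")

print(nickname, attributes, first)
```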


Attributes must have values, even if that value is just an empty string (like ""), and those values must be in quotes. So the following, which is part of a common HTML tag, is not legal in XML:

<INPUT checked>

and neither is this:

<INPUT checked=true>

Either single quotes or double quotes are fine, but they have to match. For example, to make this well-formed XML, you can use one of these:

<INPUT checked='true'>

<INPUT checked="true">


but you can't use:

<INPUT checked="true'>
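
These quoting rules can be checked with any XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in for IE5. The is_well_formed helper is our own name, and each tag is written as an empty element (ending in "/>") so that it forms a complete document on its own; that closing slash is our addition.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Attribute with no value at all: rejected.
no_value = is_well_formed("<INPUT checked/>")
# Attribute value without quotes: rejected.
unquoted = is_well_formed("<INPUT checked=true/>")
# Matching single or double quotes: accepted.
single_quoted = is_well_formed("<INPUT checked='true'/>")
double_quoted = is_well_formed('<INPUT checked="true"/>')
# Opening double quote, closing single quote: rejected.
mismatched = is_well_formed("<INPUT checked=\"true'/>")
# An empty string is still a value, so this is accepted.
empty_string = is_well_formed('<INPUT checked=""/>')

for label, ok in [
    ("no value", no_value),
    ("unquoted", unquoted),
    ("single quotes", single_quoted),
    ("double quotes", double_quoted),
    ("mismatched quotes", mismatched),
    ("empty string", empty_string),
]:
    print(f"{label}: {'well-formed' if ok else 'error'}")
```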

Because either single or double quotes are allowed, it's easy to include quote characters in your attribute values: delimit the value with the other kind of quote, as in "John's nickname" or 'I said "hi" to him'. You just have to be careful not to accidentally close your attribute, as in 'John's nickname'; if an XML parser sees an attribute value like this, it will treat the second single quote as the end of the value, and will raise an error when it sees the "s" that comes right after it.
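
The same point can be sketched in Python with xml.etree.ElementTree. The <person> element and its note attribute are hypothetical names chosen for the example. The third case uses the predefined &apos; entity, which the text above does not cover, as one way to keep a single quote inside a single-quoted value.

```python
import xml.etree.ElementTree as ET

# Double-quoted value containing a single quote.
a = ET.fromstring('<person note="John\'s nickname"/>')
# Single-quoted value containing double quotes.
b = ET.fromstring("<person note='I said \"hi\" to him'/>")
# Same delimiter inside the value, written with the predefined &apos; entity.
c = ET.fromstring("<person note='John&apos;s nickname'/>")

note_a = a.get("note")
note_b = b.get("note")
note_c = c.get("note")

# The accidental early close from the text: the value ends right after "John",
# and the parser then trips over the "s" that follows.
try:
    ET.fromstring("<person note='John's nickname'/>")
    early_close_rejected = False
except ET.ParseError:
    early_close_rejected = True

print(note_a, note_b, note_c, early_close_rejected, sep="\n")
```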

The same rules apply to naming attributes as apply to naming elements: names are case sensitive, can't start with "xml", and so on. Also, you can't have more than one attribute with the same name on an element. So if we create an XML document like this:

<bad att="1" att="2"></bad>

we will get an error in IE5 instead of seeing the document displayed.

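The duplicate-attribute rule is not specific to IE5. As a rough check, assuming Python's xml.etree.ElementTree as the parser, the <bad> document above is rejected, while two attributes whose names differ only in case are accepted because names are case sensitive. The <ok> element is a hypothetical example of ours.

```python
import xml.etree.ElementTree as ET

# Two attributes with the same name on one element: not well-formed.
try:
    ET.fromstring('<bad att="1" att="2"></bad>')
    duplicate_error = None
except ET.ParseError as err:
    duplicate_error = err

# Names are case sensitive, so att and Att are two different attributes.
mixed_case = ET.fromstring('<ok att="1" Att="2"></ok>')
mixed_case_attrs = dict(mixed_case.attrib)

print("duplicate rejected:", duplicate_error is not None)
print("mixed case attributes:", mixed_case_attrs)
```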
Try It Out: Adding Attributes to Al's CD

With all of the information we recorded about our CD in our earlier Try It Out, we forgot to include the CD's serial number, or the length of the disc. Let's add some attributes, so that our hypothetical CD Player application can easily find this information out.
  1. Open your cd.xml file created earlier, and resave it to your hard drive as cd2.xml.
  2. With our new-found attributes knowledge, add two attributes to the <CD> element, like this:

    <CD serial=B6B41B
        disc-length='36:55'>
      <artist>"Weird Al" Yankovic</artist>
      <title>Dare to be Stupid</title>
      <genre>parody</genre>
      <date-released>1990</date-released>
      <song>
        <title>Like A Surgeon</title>
        <length>
          <minutes>3</minutes>
          <seconds>33</seconds>
        </length>
        <parody>
          <title>Like A Virgin</title>
          <artist>Madonna</artist>
        </parody>
      </song>
      <song>
        <title>Dare to be Stupid</title>
        <length>
          <minutes>3</minutes>
          <seconds>25</seconds>
        </length>
        <parody></parody>
      </song>
    </CD>

  3. If you typed in exactly what's written above, IE5 will show a parse error when you open the file, rather than the document itself.

  4. Now edit the first attribute, like this:

    <CD serial='B6B41B'
        disc-length='36:55'>
  5. Re-save the file, and view it in IE5. This time the browser displays the XML document correctly.

How It Works

Using attributes, we added some information about the CD's serial number and length to our document:

<CD serial=B6B41B
    disc-length='36:55'>


When the XML parser got to the "=" character after the serial attribute, it expected an opening quotation mark, but instead it found a B. This is an error, and it caused the parser to stop and raise the error to the user.

So we changed our serial attribute declaration:

<CD serial='B6B41B'

and this time the browser displayed our XML correctly.
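
To see the same behavior outside the browser, here is a small sketch using Python's xml.etree.ElementTree in place of IE5's parser. The document is shortened to the start-tag, one child, and the end-tag, and parse_error_position is a helper name of our own. The parser reports where it gave up, which is on the first line, where the unquoted value begins.

```python
import xml.etree.ElementTree as ET

# Shortened version of cd2.xml with the unquoted serial value.
BROKEN = """<CD serial=B6B41B
    disc-length='36:55'>
  <title>Dare to be Stupid</title>
</CD>"""

# The corrected version from step 4.
FIXED = BROKEN.replace("serial=B6B41B", "serial='B6B41B'")


def parse_error_position(xml_text):
    """Return (line, column) of the first parse error, or None if well formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position
    return None


broken_position = parse_error_position(BROKEN)
fixed_position = parse_error_position(FIXED)

print("broken:", broken_position)
print("fixed:", fixed_position)
```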

The information we added might be useful, for example, in the CD Player application we considered earlier. We could write our CD Player to use the serial number of a CD to load any settings the user may have previously saved (such as a custom play list).
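
A sketch of that idea, under stated assumptions: SAVED_SETTINGS stands in for whatever store the hypothetical CD Player would use, and disc_length_seconds is a helper of our own that assumes the 'minutes:seconds' format used in the example.

```python
import xml.etree.ElementTree as ET

CD_XML = """<CD serial='B6B41B'
    disc-length='36:55'>
  <title>Dare to be Stupid</title>
</CD>"""

# Hypothetical store of per-disc settings, keyed by serial number.
SAVED_SETTINGS = {
    "B6B41B": {"playlist": ["Dare to be Stupid", "Like A Surgeon"]},
}


def disc_length_seconds(text):
    """Convert a 'minutes:seconds' string such as '36:55' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


cd = ET.fromstring(CD_XML)
serial = cd.get("serial")
length = disc_length_seconds(cd.get("disc-length"))

# Fall back to empty settings for a disc we have not seen before.
settings = SAVED_SETTINGS.get(serial, {})

print(serial, length, settings)
```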

Why Use Attributes?

There have been many debates in the XML community about whether attributes are really necessary, and if so, where they should be used. Here are some of the main points in that debate:

Attributes Can Provide Metadata that May Not be Relevant to Most Applications Dealing with Our XML

For example, if we know that some applications may care about a CD's serial number, but most won't, it may make sense to make it an attribute. This logically separates the data most applications will need from the data that most applications won't need.

In reality, there is no such thing as "pure metadata"; all information is "data" to some application. Think about HTML; you could break the information in HTML into two types of data: the data to be shown to a human, and the data to be used by the web browser to format the human-readable data. From one standpoint, the data used to format the data would be metadata, but to the browser or the person writing the HTML, the metadata is the data. Therefore, attributes can make sense when we're separating one type of information from another.
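
One way to picture this separation, sketched with Python's xml.etree.ElementTree on a shortened CD document: code that only walks the child elements never encounters the serial number, while code that wants it reads the attributes.

```python
import xml.etree.ElementTree as ET

CD_XML = """<CD serial='B6B41B' disc-length='36:55'>
  <artist>"Weird Al" Yankovic</artist>
  <title>Dare to be Stupid</title>
  <genre>parody</genre>
</CD>"""

cd = ET.fromstring(CD_XML)

# A consumer that only walks child elements sees the main data...
data = {child.tag: child.text for child in cd}

# ...while one that cares about the extra information reads the attributes.
metadata = dict(cd.attrib)

print(data)
print(metadata)
```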

What Do Attributes Buy Me that Elements Don't?

Can't elements do anything attributes can do?

In other words, on the face of it there's really no difference between:

<name nickname='Shiny John'></name>

and:

<name>
  <nickname>Shiny John</nickname>
</name>


So why bother to pollute the language with two ways of doing the same thing?

The main reason that XML was invented was that SGML could do some great things, but it was too massively difficult to use without a fully-fledged SGML expert on hand. So one concept behind XML is a simpler, kinder, gentler SGML. For this reason, many people don't like attributes, because they add a complexity to the language that they feel isn't needed.

On the other hand, some people find attributes easier to use; for example, they don't require nesting and you don't have to worry about crossed tags.
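
From an application's point of view, the two forms of the nickname shown earlier carry the same information; only the call used to fetch it differs. A minimal sketch, assuming Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring("<name nickname='Shiny John'></name>")
as_element = ET.fromstring("<name><nickname>Shiny John</nickname></name>")

# Attribute form: look the value up by attribute name.
from_attribute = as_attribute.get("nickname")

# Element form: look up the text of the child element.
from_element = as_element.findtext("nickname")

print(from_attribute == from_element)
```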

Why Use Elements, if Attributes Take Up So Much Less Space?

Wouldn't it save bandwidth to use attributes instead?

For example, if we were to rewrite our <name> document to use only attributes, it might look like this:

<name nickname='Shiny John' first='John' middle='Fitzgerald Johansen' last='Doe'></name>

This takes up much less space than our earlier code using elements.

However, in systems where size is really an issue, it turns out that simple compression techniques would work much better than trying to optimize the XML. And because of the way compression works, you end up with almost the same file sizes regardless of whether attributes or elements are used.
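
If you want to measure this for yourself, the sketch below builds the same made-up records in both styles and compares raw and compressed sizes. The generated record contents, the <names> wrapper element, and the use of zlib are all our assumptions; the text does not name a particular compression technique, and the numbers will vary with real data.

```python
import zlib


def element_record(i):
    """One <name> record written with child elements."""
    return (
        f"<name><nickname>Nick {i}</nickname><first>First {i}</first>"
        f"<middle>Middle {i}</middle><last>Last {i}</last></name>"
    )


def attribute_record(i):
    """The same record written with attributes only."""
    return (
        f"<name nickname='Nick {i}' first='First {i}' "
        f"middle='Middle {i}' last='Last {i}'></name>"
    )


def build(record, count=500):
    """Wrap count generated records in a hypothetical <names> root element."""
    body = "".join(record(i) for i in range(count))
    return f"<names>{body}</names>".encode("utf-8")


elements_doc = build(element_record)
attributes_doc = build(attribute_record)

# Map each style to (raw size, compressed size) in bytes.
sizes = {
    "elements": (len(elements_doc), len(zlib.compress(elements_doc))),
    "attributes": (len(attributes_doc), len(zlib.compress(attributes_doc))),
}

for style, (raw, compressed) in sizes.items():
    print(f"{style}: {raw} bytes raw, {compressed} bytes compressed")
```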

Besides, when you try to optimize XML this way, you lose many of the benefits XML offers, such as readability and descriptive tag names. And there are cases where using elements allows more flexibility and scope for extension. For example, if we decided that first needed additional metadata in the future, it would be much simpler to modify our code if we'd used elements rather than attributes.
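
To illustrate the extension point: an element can later be given attributes or children of its own, whereas an attribute value is just a string with nowhere to attach more information. In this sketch the preferred attribute is a hypothetical extension, not something from the chapter.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring("<name><first>John</first><last>Doe</last></name>")

# Hypothetical extension: attach extra information to the <first> element.
name.find("first").set("preferred", "true")

extended = ET.tostring(name, encoding="unicode")

# Code that only reads the element's text keeps working unchanged.
first = name.findtext("first")

print(extended)
```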

Why Use Attributes when Elements Look So Much Better? I Mean, Why Use Elements when Attributes Look So Much Better?

Many people have different opinions as to whether attributes or child elements "look better". In this case, it comes down to a matter of personal preference and style.

In fact, much of the attributes versus elements debate comes from personal preference. Many, but not all, of the arguments boil down to "I like the one better than the other". But since XML has both elements and attributes, and neither one is going to go away, you're free to use both. Choose whichever works best for your application, whichever looks better to you, or whichever you're most comfortable with.
© 2003-2017 by Developer Shed. All rights reserved.