HTML Comes of Age: XHTML - Rocky Upgrade Path
The recommendation and its associated documentation describe a number of ways in which XHTML differs from HTML. The differences arise from the looseness allowed by early HTML specifications, from the relative sloppiness most browsers tolerate when rendering HTML, and from the rigor required by XML.
An XHTML document must be structured properly, and some elements that HTML lets you omit are mandatory in XHTML; leaving them out is an error. The root element of an XHTML 1.0 document must be <html>, and it must declare the XHTML 1.0 namespace (xmlns="http://www.w3.org/1999/xhtml"). The <head> and <body> elements cannot be omitted, and the <head> element must contain a <title> element.
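As a rough illustration, the structural rules above can be checked with Python's standard-library ElementTree parser. This is a minimal sketch, not a validator: it assumes the whole document is in a string, it does not consult the XHTML DTD, and the helpers structure_problems and q are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET
from typing import List

# XHTML 1.0 namespace URI. ElementTree reports namespaced tags
# in the form "{uri}localname".
XHTML_NS = "http://www.w3.org/1999/xhtml"


def q(name: str) -> str:
    """Return the namespace-qualified tag name for an XHTML element."""
    return f"{{{XHTML_NS}}}{name}"


def structure_problems(source: str) -> List[str]:
    """List violations of the basic XHTML 1.0 document-structure rules."""
    root = ET.fromstring(source)  # raises ET.ParseError if not well-formed
    if root.tag != q("html"):
        return [f"root element is {root.tag!r}, not html in the XHTML namespace"]
    problems = []
    head = root.find(q("head"))
    if head is None:
        problems.append("missing <head>")
    elif head.find(q("title")) is None:
        problems.append("<head> has no <title>")
    if root.find(q("body")) is None:
        problems.append("missing <body>")
    return problems


good = ('<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>T</title></head><body><p>Hi</p></body></html>")
print(structure_problems(good))  # []
```

Because the namespace is part of every parsed tag name, a document that omits xmlns fails the first check even though its element names look the same.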
XHTML documents must be well-formed, strictly complying with XML syntax rules. Tags must be nested properly, and every element must either have a closing tag or be written in a special form that combines the opening and closing tags. Element and attribute names must be lower case: XML is case-sensitive, and the XHTML DTDs define all names in lower case.
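A quick way to see the case-sensitivity and closing-tag rules in action is to hand fragments to a strict XML parser. The sketch below uses Python's ElementTree as a stand-in for any XML processor; is_well_formed is a hypothetical helper, and it checks only well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if `source` parses as well-formed XML."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Lower-case tags</p>"))  # True
print(is_well_formed("<P>Mixed-case tags</p>"))  # False: <P> and </p> differ
print(is_well_formed("<p>No closing tag"))       # False: element never closed
```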
User-defined attribute values, however, can be in any case. All attribute values, including those that appear to be numeric, must be enclosed in single or double quotes:
<td rowspan="3">
rather than left unquoted, in the form acceptable in HTML:
<td rowspan=3>
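The quoting rule can be demonstrated with a strict XML parser as well. In this sketch, Python's ElementTree plays the parser, and parse_attributes and the sample <td> markup are hypothetical. A quoted value parses and comes back as a string, while the unquoted HTML form is rejected outright.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_attributes(source: str) -> Dict[str, str]:
    """Parse a single element and return its attributes as a dict."""
    return dict(ET.fromstring(source).attrib)


# Double and single quotes are both acceptable; values are always strings.
print(parse_attributes("<td rowspan=\"3\" class='total'>cell</td>"))
# {'rowspan': '3', 'class': 'total'}

# The unquoted HTML form is not well-formed XML.
try:
    parse_attributes("<td rowspan=3>cell</td>")
except ET.ParseError as err:
    print("rejected:", err)
```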
Empty elements must either have an end tag, or the start tag must end with />; the latter is sometimes called a self-terminating element. For example, the <hr> element can be written in either of the following ways. The first version, called the minimized tag syntax, is generally preferred over a pair of tags with no content between them. In the minimized form, placing a space before the / keeps the tag usable in some older browsers.
<hr />
<hr></hr>
All elements other than those declared as EMPTY in the DTD must have an end tag.
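The sketch below, which assumes Python's ElementTree as the parser and uses a hypothetical helper, parses_as_empty, shows that the minimized and paired spellings of an empty element parse to the same thing, while an HTML-style <hr> with no end tag is not well-formed.

```python
import xml.etree.ElementTree as ET


def parses_as_empty(source: str) -> bool:
    """True if `source` is a single well-formed element with no content."""
    element = ET.fromstring(source)
    return len(element) == 0 and not element.text


# The minimized form (with or without the space) and the paired form
# produce the same parsed element.
for source in ("<hr />", "<hr/>", "<hr></hr>"):
    print(repr(source), parses_as_empty(source))  # True for all three

# HTML's bare <hr> is an unclosed start tag to an XML parser.
try:
    parses_as_empty("<div><hr></div>")
except ET.ParseError as err:
    print("not well-formed:", err)
```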
Elements must also be properly nested: closing tags must appear in the reverse order of their opening tags. For example, browsers accept this code as HTML:
<p><i>An italicized paragraph</p></i>
but it is unacceptable in XHTML because the closing tags are reversed. The following code conforms to the XHTML standard because the tags are properly nested:
<p><i>An italicized paragraph</i></p>
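Proper nesting amounts to a stack discipline: each end tag must close the most recently opened element that is still open. The following is a minimal sketch of that check built on Python's html.parser.HTMLParser, which reports start and end tags without enforcing any nesting itself. NestingChecker and nesting_error are hypothetical names, and the checker looks at nothing but tag order.

```python
from html.parser import HTMLParser
from typing import List, Optional


class NestingChecker(HTMLParser):
    """Track open elements and record the first out-of-order end tag."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []
        self.error: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)  # HTMLParser lower-cases tag names

    def handle_startendtag(self, tag, attrs):
        pass  # <hr /> opens and closes at once, so nothing stays open

    def handle_endtag(self, tag):
        if self.error is not None:
            return  # report only the first problem
        if not self.open_tags:
            self.error = f"</{tag}> has no matching start tag"
        elif self.open_tags[-1] != tag:
            self.error = f"</{tag}> found while <{self.open_tags[-1]}> is open"
        else:
            self.open_tags.pop()


def nesting_error(markup: str) -> Optional[str]:
    """Return a description of the first nesting error, or None."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    if checker.error is None and checker.open_tags:
        return f"<{checker.open_tags[-1]}> is never closed"
    return checker.error


print(nesting_error("<p><i>An italicized paragraph</p></i>"))
# </p> found while <i> is open
print(nesting_error("<p><i>An italicized paragraph</i></p>"))
# None
```

HTMLParser reports a minimized tag such as <hr /> through handle_startendtag, which is why the checker never pushes it onto the stack.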
An attribute is minimized when it is written as a bare word, with no = sign or quoted value, which HTML allows for attributes that have only one possible value. For example, in the form control
<input type="checkbox" ... checked>
the checked attribute has been minimized. Because XML does not support attribute minimization, XHTML 1.0 requires every attribute-value pair to be written in full; for an attribute with only one possible value, the attribute name is repeated as the value:
<input type="checkbox" ... checked="checked" />
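To see why the minimized form has to go, compare how an XML parser treats the two spellings. This sketch uses Python's ElementTree; the name="agree" attribute and the helper checkbox_is_checked are hypothetical stand-ins for the ... in the example above.

```python
import xml.etree.ElementTree as ET


def checkbox_is_checked(source: str) -> bool:
    """Parse an <input> element and report whether it starts out checked."""
    return ET.fromstring(source).get("checked") == "checked"


# Hypothetical markup standing in for the "..." in the example above.
full = '<input type="checkbox" name="agree" checked="checked" />'
unchecked = '<input type="checkbox" name="agree" />'
minimized = '<input type="checkbox" name="agree" checked />'

print(checkbox_is_checked(full))       # True
print(checkbox_is_checked(unchecked))  # False

try:
    checkbox_is_checked(minimized)
except ET.ParseError as err:
    print("minimized attribute rejected:", err)
```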
Browsers handle white space characters, such as line breaks, inconsistently. XHTML 1.0 specifies that when user agents process attribute values, they strip leading and trailing white space and map each sequence of white space characters (including line breaks) to a single space. Because older browsers don't all follow this rule, avoid line breaks and multiple white space characters within attribute values.
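The normalization rule described above can be expressed as a small function. This is a sketch of the rule itself, not of any particular browser's behavior; normalize_attribute is a hypothetical name, and it treats only the four XML white space characters (space, tab, carriage return, line feed) as white space.

```python
import re

# The four white space characters defined by XML: space, tab, CR, LF.
_XML_SPACE = re.compile(r"[ \t\r\n]+")


def normalize_attribute(value: str) -> str:
    """Collapse each run of white space (including line breaks) to one
    space, then strip leading and trailing spaces."""
    return _XML_SPACE.sub(" ", value).strip(" ")


print(repr(normalize_attribute("  Click\n   to   continue ")))
# 'Click to continue'
```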
Because < and & characters are treated as the start of markup in XHTML (a tag and an entity reference, respectively), the contents of script and style elements that contain them must be wrapped in a CDATA section, which tells the parser not to treat those characters as markup. The only delimiter recognized inside a CDATA section is the "]]>" string that ends it. Alternatively, you can avoid the problem by moving scripts and style sheets into external files.
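The effect of a CDATA section is easy to see with a strict XML parser. In this sketch, Python's ElementTree is the parser, and the helper script_text and the sample JavaScript are hypothetical. A script containing < and && is rejected when left bare and preserved verbatim when wrapped in <![CDATA[ ... ]]>.

```python
import xml.etree.ElementTree as ET

# Hypothetical script that uses both characters XML treats as markup.
code = "if (a < b && b < c) { ordered = true; }"

bare = f'<script type="text/javascript">{code}</script>'
wrapped = f'<script type="text/javascript"><![CDATA[{code}]]></script>'


def script_text(markup: str) -> str:
    """Parse a <script> element and return its character content."""
    return ET.fromstring(markup).text


try:
    script_text(bare)
except ET.ParseError as err:
    print("bare script rejected:", err)

print(script_text(wrapped) == code)  # True: the code survives verbatim
```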
Comments pose another problem. XML parsers are not required to preserve comments, so you can no longer hide script code from older browsers by enclosing it in comments: an XML parser may discard the comments, and the script inside them, before the document is processed. This is actually a good thing, because comment hiding had become too much of a catch-all for concealing every new feature in a Web page from browsers that couldn't understand it. Instead, wrap the script in a CDATA section like this:
<script type="text/javascript"><![CDATA[ script goes here ]]></script>
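Python's ElementTree is one XML parser that drops comments by default, which makes it a convenient way to watch the old comment-hiding trick fail. In the sketch below, surviving_script and the sample script are hypothetical; the point is that the commented-out code never reaches the parsed tree, while the CDATA-wrapped version does.

```python
import xml.etree.ElementTree as ET


def surviving_script(markup: str) -> str:
    """Parse a <script> element and return the code the parser kept."""
    return ET.fromstring(markup).text or ""


code = 'document.write("Hello");'

# The old trick: hide the script from non-scripting browsers in a comment.
commented = f'<script type="text/javascript"><!-- {code} --></script>'
# The XHTML approach: mark the script as character data instead.
wrapped = f'<script type="text/javascript"><![CDATA[ {code} ]]></script>'

print(repr(surviving_script(commented)))  # '' (the comment was discarded)
print(surviving_script(wrapped).strip() == code)  # True
```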
The id and name attributes are used as fragment identifiers, so that you can identify an element and the fragment of code or content it contains. XML, however, recognizes only the id attribute as an identifier. You can use both id and name, with identical values, if you need to support older browsers, but name has been formally deprecated on elements such as <a>, <form>, and <img>, so you can't count on it appearing in future versions of the specification.
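Resolving a fragment identifier such as #details then comes down to finding the element whose id matches. The sketch below walks an ElementTree tree to do that; resolve_fragment and the sample page are hypothetical, and it deliberately ignores name, since XML identifies elements only by id.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def resolve_fragment(root: ET.Element, fragment: str) -> Optional[ET.Element]:
    """Return the element whose id matches a fragment such as '#details'."""
    target = fragment.lstrip("#")
    for element in root.iter():  # document order, including the root itself
        if element.get("id") == target:
            return element
    return None


# Hypothetical page; the <a> carries matching id and name values.
page = ET.fromstring(
    "<body>"
    '<h2 id="intro">Introduction</h2>'
    '<a id="details" name="details">Details</a>'
    "</body>"
)

print(resolve_fragment(page, "#intro").tag)     # h2
print(resolve_fragment(page, "#details").text)  # Details
print(resolve_fragment(page, "#missing"))       # None
```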
The rules for nesting elements are also much tighter than in HTML. Table 2 lists some of the prohibitions.
Table 2: XHTML Element Prohibitions.
<a> cannot contain other <a> elements.
<pre> cannot contain the <img>, <object>, <big>, <small>, <sub>, or <sup> elements.
<button> cannot contain the <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe>, or <isindex> elements.
<label> cannot contain other <label> elements.
<form> cannot contain other <form> elements.
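XML DTDs cannot express exclusions of this kind (SGML DTDs could), so a validating parser will not catch these violations on its own. The sketch below checks them directly over an ElementTree tree. PROHIBITIONS transcribes Table 2, prohibited_nestings is a hypothetical helper, and the sketch assumes the document was parsed without an XHTML namespace, so tag names are plain strings such as "a".

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Element -> elements it may not contain (transcribed from Table 2).
PROHIBITIONS = {
    "a": {"a"},
    "pre": {"img", "object", "big", "small", "sub", "sup"},
    "button": {"input", "select", "textarea", "label", "button",
               "form", "fieldset", "iframe", "isindex"},
    "label": {"label"},
    "form": {"form"},
}


def prohibited_nestings(root: ET.Element) -> List[Tuple[str, str]]:
    """Return (outer, inner) tag pairs that break the Table 2 rules.

    The prohibitions apply at any depth, so every descendant is checked.
    """
    found = []
    for outer in root.iter():
        banned = PROHIBITIONS.get(outer.tag, set())
        for inner in outer.iter():
            if inner is not outer and inner.tag in banned:
                found.append((outer.tag, inner.tag))
    return found


doc = ET.fromstring(
    "<body>"
    '<p><a href="#x">one <a href="#y">two</a></a></p>'
    "<pre>H<sub>2</sub>O</pre>"
    "</body>"
)
print(prohibited_nestings(doc))  # [('a', 'a'), ('pre', 'sub')]
```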
There are many benefits to tightening up the markup in a Web page. Browser parsing engines can become much leaner; today's parsers carry a lot of extra code for coping with sloppy HTML and for deciding how a particular browser should handle undefined situations. Best of all, an XHTML document either works or it doesn't, and you'll know why. You may lose some of the tricks you've learned to force HTML into submission, but you'll also be a far more productive and precise developer.