When offering content on a website, many webmasters find that they must consider the context of that content. Will it be offered as HTML? A PDF document? Or do you want to let your visitors choose? If you choose the last option, how do you avoid having to redo the entire document by hand in each format? XML lets you generate context-specific representations of rich content sources through both modular construction and data transformation. This article is taken from chapter one of XML Publishing with AxKit by Kip Hampton (O'Reilly, 2004; ISBN 0596002165).
XML as a Publishing Technology - Publishing XML Content
In the most general sense, delivering XML documents over the Web is much the same as serving any other type of document—a client application makes a request over a network to a server for a given resource, the server then interprets that request (URI, headers, content), returns the appropriate response (headers, content), and closes the connection. However, unlike serving HTML documents or MP3 files, the intended use for an XML document is not apparent from the format (or content type) itself. Further processing is usually required. For example, even though most modern web browsers offer a way to view XML documents, there is no way for the browser to know how to render your custom grammar visually. Simply presenting the literal markup or an expandable tree view of the document’s contents usually communicates nothing meaningful to the user. In short, the document must be transformed from the markup grammar that best fits your needs into the format that best fits the expectations of the requesting client.
This separation between the source content and the form in which it will be presented (and the need to transform one into the other) is the heart and soul of XML publishing. Not only does making a clear distinction between content and presentation allow you to use the grammar that best captures your content, it provides a clear and logical path toward reusing that content in novel ways without altering the data’s source. Suppose you want to publish the poems from the collection mentioned in the previous section as HTML. You simply transform the documents from the poemsfrag grammar into the grammar that an HTML browser expects. Later, if you decide that PDF or PostScript is the best way to deliver the content, you only need to change the way the source is transformed, not the source itself. Similarly, if your XML expresses more record-oriented data (generated from the result of an SQL query, for example), the separation between content and presentation offers a way to provide the data through a variety of interfaces just by changing the way the markup is transformed.
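As a minimal sketch of that idea, the Python below renders one XML source in two ways without touching the source. The element names (poem, title, line) and the sample text are invented for illustration and are not the poemsfrag grammar from the book. The two functions are hand-written stand-ins for a real stylesheet processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the element names are assumptions.
SOURCE = """<poem>
  <title>Evening</title>
  <line>The light goes down</line>
  <line>over the &amp; under the hill</line>
</poem>"""


def to_html(poem):
    """Render the poem as a small HTML page."""
    title = escape(poem.findtext("title", default=""))
    lines = "<br/>\n".join(escape(l.text or "") for l in poem.findall("line"))
    return f"<html><body><h1>{title}</h1>\n<p>{lines}</p></body></html>"


def to_text(poem):
    """Render the same poem as plain text."""
    title = poem.findtext("title", default="")
    lines = [l.text or "" for l in poem.findall("line")]
    return "\n".join([title.upper(), ""] + lines)


poem = ET.fromstring(SOURCE)   # parse the source once
html_page = to_html(poem)      # presentation 1
text_page = to_text(poem)      # presentation 2, same source
```

Adding a third output format here means adding a third function. The source document stays as it is.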
Although there are many ways to transform XML content, the most common is to pass the document, together with a stylesheet document, into a specialized processor that transforms or renders the data based on the rules set forth in the stylesheet. Extensible Stylesheet Language Transformations (XSLT) and Cascading Style Sheets (CSS) are two popular variations of this model. Putting aside features offered by various stylesheet-based transformative processors for later chapters, you still need to decide where the transformation is to take place.
In the client-side processing model, the remote application, typically a web browser, is responsible for transforming the requested XML document into the desired format. This is usually achieved by extracting the URL for the appropriate stylesheet from the href attribute of an xml-stylesheet processing instruction or link element contained in the document, followed by a separate request to the remote server to fetch that stylesheet. The stylesheet is then applied to the XML document using the client’s internal processor and, assuming no errors occur along the way, the result of the transformation is rendered in the browser. (See Figure 1-2.)
Figure 1-2. The client-side processing model
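The first step of the client-side model, locating the stylesheet, can be sketched roughly as follows. This is an illustration in Python and not how any particular browser implements it. The document URL, the sample document, and the function name stylesheet_urls are assumptions made for the example.

```python
import re
from urllib.parse import urljoin
from xml.dom import minidom

# Hypothetical document and the URL it was fetched from.
DOC_URL = "http://example.com/poems/evening.xml"
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/styles/poem.xsl"?>
<poem><title>Evening</title></poem>"""

# Pseudo-attributes inside the processing instruction look like name="value".
PSEUDO_ATTR = re.compile(r"""([\w:-]+)\s*=\s*(["'])(.*?)\2""")


def stylesheet_urls(xml_text, base_url):
    """Return absolute URLs of stylesheets named by xml-stylesheet PIs."""
    dom = minidom.parseString(xml_text)
    urls = []
    for node in dom.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "xml-stylesheet":
            attrs = {name: value
                     for name, _quote, value in PSEUDO_ATTR.findall(node.data)}
            if "href" in attrs:
                # Resolve the href against the document's own URL.
                urls.append(urljoin(base_url, attrs["href"]))
    return urls
```

A client would then issue a second request for each returned URL and hand the stylesheet and document to its internal processor. The sketch does not decode character references inside pseudo-attribute values.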
Using the client-side approach has several benefits. First, it is trivial to set up a web server to deliver XML documents in this manner—perhaps adding a few lines to the server’s mime.conf file to ensure that the proper content type is part of the outgoing response. Also, since the client handles all processing, no additional XML tools need to be installed and configured on the server. There is no additional performance hit over and above serving static HTML pages, since documents are offered up as is, without additional processing by the server.
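To give a sense of how small that server-side change can be, here is a sketch using Python's built-in http.server module in place of a production web server. The class name XMLAwareHandler, the port, and the choice of application/xml as the content type are assumptions for the example.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class XMLAwareHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .xml files with an XML content type."""

    # Extend the default extension-to-type table with one extra entry.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".xml": "application/xml",
    }


def serve(port=8000):
    """Serve the current directory as is; no XML processing happens here."""
    with ThreadingHTTPServer(("127.0.0.1", port), XMLAwareHandler) as httpd:
        httpd.serve_forever()
```

The server only attaches the content type to the response. All transformation work is still left to the client.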
Client-side processing also has weaknesses. It assumes that the user at the other end of the request has an appropriate browser installed that can process and render the data correctly. Years of working around browser idiosyncrasies have taught web developers not to rely too heavily on client-side processing. The stakes are higher when you expect the browser to be solely responsible for extracting, transforming, and rendering the information for the user. Developers lose one of the important benefits of XML publishing, namely, the ability to repurpose content for different types of client devices such as PDAs, WAP phones, and set-top boxes. Many of these platforms cannot or do not implement the processors required to transform the documents into the proper format.
With preprocessed transformations, the appropriate stylesheets are applied to the source content offline, and only the results of those transformations are published. Typically, the source content is transformed into the desired formats in a staging area. The results are then copied from there into the appropriate location on the publicly available server, as shown in Figure 1-3.
Figure 1-3. The preprocessed transformation model
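The staging workflow might be scripted along these lines. This is a sketch under stated assumptions: the directory layout, the function names, and the poem/title/line elements are hypothetical, and transform_to_html is a placeholder for whatever stylesheet processor a real site would use.

```python
import shutil
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path


def transform_to_html(xml_text):
    """Placeholder transformation for a hypothetical <poem> grammar."""
    poem = ET.fromstring(xml_text)
    title = escape(poem.findtext("title", default=""))
    body = "<br/>".join(escape(l.text or "") for l in poem.findall("line"))
    return f"<html><body><h1>{title}</h1><p>{body}</p></body></html>"


def build_and_publish(source_dir, staging_dir, public_dir):
    """Transform every source into staging, then copy results to public."""
    source_dir, staging_dir, public_dir = map(
        Path, (source_dir, staging_dir, public_dir))
    staging_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(source_dir.glob("*.xml")):
        out = staging_dir / (src.stem + ".html")
        out.write_text(transform_to_html(src.read_text(encoding="utf-8")),
                       encoding="utf-8")
    # Publishing starts only after every transformation has succeeded,
    # so a malformed source never leaves a half-updated public site.
    public_dir.mkdir(parents=True, exist_ok=True)
    published = []
    for staged in sorted(staging_dir.glob("*.html")):
        published.append(Path(shutil.copy2(staged, public_dir / staged.name)))
    return published
```

Only the transformed files reach the public directory. The XML sources stay behind in the source directory.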
On the plus side, transforming content into the correct format ahead of time solves potential problems that can arise from expecting too much from the requesting client. The browser, for example, receives data that it can cope with, just as if you had authored the content in HTML to begin with, so no additional risk is introduced. Also, as with client-side transformations, no additional tools need to be installed on the web-server machine; any vanilla web server can capably deliver the preprocessed documents.
On the down side, offline preprocessing adds at least one additional step to publishing every document. Each time a document changes, it must be retransformed and the new version published. As the site grows or the number of team members increases, the chances of collision and missed or slow updates increase. Also, making the same content available in different formats greatly increases complexity. A simple text change, for example, requires a content transformation for each format, as well as a separate URL for each variation of every document. Scripted automation can help reduce some costs and risks, but someone must write and maintain the code for the automation process. That means more time and money spent. In any case, the static site that results from offline preprocessing lacks the ability to repurpose content on the fly in response to the client’s request.
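One piece of such automation, finding which published variations have fallen behind their sources, could look roughly like this. The format list, the directory layout, and the function name stale_outputs are assumptions for the sketch. It compares file modification times only.

```python
from pathlib import Path

# Hypothetical set of output formats; each one needs its own file and URL.
FORMATS = ("html", "txt")


def stale_outputs(source_dir, output_dir, formats=FORMATS):
    """Return (source, target) pairs whose target is missing or outdated."""
    stale = []
    for src in sorted(Path(source_dir).glob("*.xml")):
        for fmt in formats:
            target = Path(output_dir) / f"{src.stem}.{fmt}"
            # A target is stale if it does not exist yet or if the
            # source was modified after the target was written.
            if (not target.exists()
                    or target.stat().st_mtime < src.stat().st_mtime):
                stale.append((src, target))
    return stale
```

Every source contributes one entry per format, so the work list grows with both the number of documents and the number of formats. This is the complexity cost described above.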