This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).
Crawling the Semantic Web - Getting Attached: Attaching Dublin Core to HTML Documents
One of our original reasons for exploring RDF (besides it being cool!) was the limited linking capability of HTML. We’d like web browsers to still display our HTML and web content, yet also have metadata available for processing by search engines and automated knowledge discovery systems. Given that most websites will probably keep using HTML for many more years, has RDF solved our link metadata problem yet? In some ways it has. There are several ways of marking up HTML documents with Dublin Core or other RDF metadata. The method I’ll use here is the one suggested by the Dublin Core Metadata Initiative; it embeds the metadata without affecting the browser’s view of the data and without breaking XHTML validation.
The browser may or may not know how to do anything with our RDF data, but we assume that other programs can process it. We need to embed the metadata so that it doesn’t interfere with the browser’s understanding or rendering of the HTML. We can do this by using link and meta tags in our HTML. Any program that reads this data should have a way to discover which technique we are using. Rather than let programs make assumptions (which could be wrong), we place a marker, the profile attribute, on the head tag of the HTML, telling programs how to retrieve this metadata.
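As an illustration, a head tag carrying this marker might look like the sketch below. The profile URI shown is an assumption: it is the one given in the DCMI guidelines for expressing Dublin Core in HTML meta and link elements around the time the book was written, and a different profile may apply to newer documents.

```html
<!-- Sketch only: the profile URI is assumed from the DCMI HTML encoding guidelines -->
<head profile="http://dublincore.org/documents/dcq-html/">
  <title>The World is Full of RDF</title>
  <!-- the link and meta tags that carry the metadata go here -->
</head>
```

A browser ignores the profile attribute when rendering, so the page looks the same with or without it.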
The profile URI means that there is metadata in the HTML document and that it should be interpreted in the manner associated with the given profile. Any software processing this document will also need to know the schemas for the RDF prefixes used in the metadata. We declare them by placing link tags in the head section, one per prefix.
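A sketch of such declarations, following the rel="schema.PREFIX" convention from the DCMI guidelines, is shown below. The DC prefix matches the DC.title and DC.subject names used in this section; the DCTERMS line is included only as an assumed second example and can be dropped if no DCTERMS names appear in the document.

```html
<!-- Sketch only: each link tag binds a prefix used in meta names to a schema URI -->
<!-- "DC" is bound to the Dublin Core element set namespace -->
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<!-- "DCTERMS" is bound to the Dublin Core terms namespace (assumed extra example) -->
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
```

With these in place, a program that meets a name such as DC.title can expand the DC prefix to the full namespace URI instead of guessing what it means.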
You can now add the actual Dublin Core properties as meta tags in the head section. Each meta tag acts as an RDF triple whose subject is implicitly the current HTML document: the name attribute gives the property and the content attribute gives the value. Here is an example showing how to attach title and subject metadata to a document:
<meta name="DC.title" xml:lang="en" content="The World is Full of RDF" />
<meta name="DC.subject" content="earth" />
See this book’s website for more information on HTML metadata and the Dublin Core.