
Crawling the Semantic Web

This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).

Author Info:
By: No Starch Press
February 23, 2006
  1. · Crawling the Semantic Web
  2. · This Somethings That: A Short Introduction to N3 and Jena
  3. · Triple the Fun: Creating an RDF Vocabulary for Your Organization
  4. · Who’s a What? Using RDF Hierarchies in Jena
  5. · Getting Attached: Attaching Dublin Core to HTML Documents
  6. · What’s the Reason? Making Queries with Jena RDQL


Crawling the Semantic Web - Getting Attached: Attaching Dublin Core to HTML Documents
(Page 5 of 6)

One of our original reasons for exploring RDF (besides it being cool!) was the limited linking capability of HTML. We’d like web browsers to still be able to display our HTML and web content, yet also have metadata available for processing by search engines and automated knowledge discovery systems. Given that most websites will probably keep using HTML for many more years, has RDF solved our link metadata problem yet? In some ways it has. There are several ways of marking up HTML documents with Dublin Core or other RDF metadata. The method I’ll be using here is the one suggested by the Dublin Core Metadata Initiative; it embeds the metadata without affecting the browser’s view of the data and without breaking XHTML validation.

The browser may or may not know how to do anything with our RDF data, but we are assuming that other programs can process it. We need to embed the metadata so that it doesn’t interfere with the browser’s understanding or rendering of the HTML. We can do this by using link and meta tags in our HTML. Any program that reads this data should have a way to discover which technique we are using. Rather than let programs make assumptions (which could be wrong), we place a marker in the profile attribute of the head tag, telling programs how to interpret this metadata:

<head profile="http://dublincore.org/documents/dcq-html/">

The profile URI signals that there is metadata in the HTML document and that it should be interpreted in the manner associated with the given profile. Any software processing this document will also need to know the schemas for the RDF prefixes used in the metadata. We declare them by placing link tags in the head section:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"/>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/"/>
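
As a rough sketch of how a consuming program might discover this setup, the following Java class checks the head tag for the Dublin Core profile and collects the schema prefixes declared in link tags. It assumes the page is well-formed XHTML, so the JDK's standard XML parser can read it; real-world HTML often is not, and would need a more forgiving parser. The class and method names (DcSchemaLinks, usesDcProfile, schemaPrefixes) are hypothetical names chosen for this illustration, not part of any library.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DcSchemaLinks {

    // Profile URI from the article; marks Dublin Core metadata in the head.
    static final String DC_PROFILE = "http://dublincore.org/documents/dcq-html/";

    // Parses a well-formed XHTML string into a DOM tree (assumption: no DOCTYPE to fetch).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    // True if the head tag's profile attribute lists the Dublin Core profile.
    static boolean usesDcProfile(Document doc) {
        NodeList heads = doc.getElementsByTagName("head");
        if (heads.getLength() == 0) {
            return false;
        }
        String profile = ((Element) heads.item(0)).getAttribute("profile");
        // The attribute may hold several whitespace-separated URIs.
        for (String uri : profile.trim().split("\\s+")) {
            if (uri.equals(DC_PROFILE)) {
                return true;
            }
        }
        return false;
    }

    // Maps each declared prefix (the part after "schema.") to its namespace URI.
    static Map<String, String> schemaPrefixes(Document doc) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        NodeList links = doc.getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            String rel = link.getAttribute("rel");
            if (rel.startsWith("schema.")) {
                prefixes.put(rel.substring("schema.".length()), link.getAttribute("href"));
            }
        }
        return prefixes;
    }

    public static void main(String[] args) {
        String page =
              "<html><head profile=\"" + DC_PROFILE + "\">"
            + "<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\"/>"
            + "<link rel=\"schema.DCTERMS\" href=\"http://purl.org/dc/terms/\"/>"
            + "</head><body/></html>";
        Document doc = parse(page);
        System.out.println("Uses DC profile: " + usesDcProfile(doc));
        System.out.println("Prefixes: " + schemaPrefixes(doc));
    }
}
```

Link tags whose rel value does not start with "schema." (a stylesheet link, for instance) are simply skipped, so ordinary page markup does not pollute the prefix map.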

You can now add the actual Dublin Core properties as meta tags in the head section. It’s the same as using RDF triples, but the implicit subject of each triple is the current HTML document. Here is an example showing how to attach title and subject metadata to a document:

<meta name="DC.title" xml:lang="en"
      content="The World is Full of RDF"/>
<meta name="DC.subject" content="earth"/>
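
To make the "implicit subject" idea concrete, here is a minimal Java sketch that reads such meta tags and writes each one out as a triple in N-Triples syntax, using the document's own URI as the subject. It assumes well-formed XHTML and uses only the JDK's XML parser rather than Jena. The class name DcMetaTriples, its methods, and the example.org address are hypothetical and serve only as illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DcMetaTriples {

    // Returns one N-Triples line per meta tag whose prefix was declared in a link tag.
    static List<String> toTriples(String documentUri, String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }

        // Collect prefix declarations such as rel="schema.DC".
        Map<String, String> namespaces = new HashMap<>();
        NodeList links = doc.getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            String rel = link.getAttribute("rel");
            if (rel.startsWith("schema.")) {
                namespaces.put(rel.substring("schema.".length()), link.getAttribute("href"));
            }
        }

        List<String> triples = new ArrayList<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            String name = meta.getAttribute("name");   // e.g. "DC.title"
            int dot = name.indexOf('.');
            if (dot < 0) {
                continue;                              // not a prefixed property
            }
            String namespace = namespaces.get(name.substring(0, dot));
            if (namespace == null) {
                continue;                              // prefix was never declared
            }
            String literal = "\"" + escape(meta.getAttribute("content")) + "\"";
            String lang = meta.getAttribute("xml:lang");
            if (!lang.isEmpty()) {
                literal += "@" + lang;                 // language-tagged literal
            }
            // Subject is the document itself; predicate is namespace + property name.
            triples.add("<" + documentUri + "> <" + namespace + name.substring(dot + 1)
                    + "> " + literal + " .");
        }
        return triples;
    }

    // Minimal escaping of backslashes and quotes inside a literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String page =
              "<html><head profile=\"http://dublincore.org/documents/dcq-html/\">"
            + "<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\"/>"
            + "<meta name=\"DC.title\" xml:lang=\"en\" content=\"The World is Full of RDF\"/>"
            + "<meta name=\"DC.subject\" content=\"earth\"/>"
            + "</head><body/></html>";
        // Hypothetical address standing in for the page's real URI.
        for (String triple : toTriples("http://example.org/page.html", page)) {
            System.out.println(triple);
        }
    }
}
```

Meta tags with an undeclared prefix are skipped rather than guessed at, which follows the earlier point that programs should not make assumptions about how the metadata is encoded.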

See this book’s website for more information on HTML metadata and the Dublin Core.
