Crawling the Semantic Web

This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).

Author Info:
By: No Starch Press
February 23, 2006
  1. Crawling the Semantic Web
  2. This Somethings That: A Short Introduction to N3 and Jena
  3. Triple the Fun: Creating an RDF Vocabulary for Your Organization
  4. Who’s a What? Using RDF Hierarchies in Jena
  5. Getting Attached: Attaching Dublin Core to HTML Documents
  6. What’s the Reason? Making Queries with Jena RDQL

Crawling the Semantic Web - Who’s a What? Using RDF Hierarchies in Jena
(Page 4 of 6)

Earlier we created a hierarchy of terms to use for our metadata. We used the word “vocabulary” to refer to this collection of terms, but it is often called an ontology if it defines relationships between the terms. According to the Wikipedia definition, an ontology (in the computer science sense) is a “data structure containing all the relevant entities and their relationships and rules (theorems, regulations) within a domain.”

In Jena, there are built-in helper classes for working with commonly used vocabularies. RDF Schema is one of these: Jena's RDFS helper class has static variables for its terms, including the subClassOf property. You can create the graph from the previous section by using this code:

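// Package names are for Apache Jena 3 and later; Jena 2.x used com.hp.hpl.jena.
import java.io.FileWriter;
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDFS;

Model model = ModelFactory.createDefaultModel();
String ns = "http://example.org/wcjava/";
model.setNsPrefix("wcj", ns);
// createResource and createProperty take full URIs; prefixes are not expanded here.
Resource employee = model.createResource(ns + "employee");
Resource person = model.createResource(ns + "person");
Resource employer = model.createResource(ns + "employer");
// The argument on the next line was truncated in the source; "organization" is assumed by analogy.
Resource organization = model.createResource(ns + "organization");
Property hires = model.createProperty(ns + "hires");
model.add(employer, hires, employee);
model.add(employer, RDFS.subClassOf, organization);
model.add(employee, RDFS.subClassOf, person);
// FileWriter can throw IOException; try-with-resources closes the file.
try (FileWriter out = new FileWriter("ourEntities.rdf")) {
    model.write(out, "RDF/XML");
}

The setNsPrefix call registers a namespace prefix for our graph, which makes the output easier to read because the writer can abbreviate the URIs. Jena applies the prefix only when it serializes the model; createResource and createProperty take full URIs, so the code builds each one from the namespace string. There is nothing special about the choice of “wcj” as our prefix. It could have been any string of letters, but whichever value is used becomes the prefix that appears in the output file. The RDF/XML output type is the XML representation of our RDF graph. Most applications exchange RDF graphs using the XML format rather than N3. As you can see, Jena’s RDF model can work with either type.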
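To see what a subClassOf hierarchy buys you, it helps to walk it. The following is a minimal sketch in plain Java, not Jena code: the class SubclassWalk and its methods are hypothetical names invented for this illustration, and the “manager” term is an assumed extra level that does not appear in the graph above. It stores only subClassOf statements and collects every direct and indirect superclass of a term, which is the kind of question Jena's own inference support can answer over a real model.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for an RDF graph that stores only subClassOf statements. */
final class SubclassWalk {
    // Example namespace, matching the one used for the "wcj" prefix.
    static final String NS = "http://example.org/wcjava/";

    // Maps each class URI to its direct superclasses.
    private final Map<String, Set<String>> parents = new HashMap<>();

    /** Records the statement "child subClassOf parent". */
    void addSubClassOf(String child, String parent) {
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    /** Returns every direct and indirect superclass of the given class. */
    Set<String> superClassesOf(String cls) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(cls);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            for (String parent : parents.getOrDefault(current, Collections.<String>emptySet())) {
                // Only follow a superclass the first time it is seen (guards against cycles).
                if (seen.add(parent)) {
                    toVisit.push(parent);
                }
            }
        }
        return seen;
    }

    /** Builds the hierarchy from the article plus one hypothetical extra level. */
    static SubclassWalk example() {
        SubclassWalk g = new SubclassWalk();
        g.addSubClassOf(NS + "employee", NS + "person");
        g.addSubClassOf(NS + "employer", NS + "organization");
        g.addSubClassOf(NS + "manager", NS + "employee"); // hypothetical term
        return g;
    }

    public static void main(String[] args) {
        // Lists the superclasses of the hypothetical "manager" term.
        System.out.println(example().superClassesOf(NS + "manager"));
    }
}
```

Because subClassOf is transitive, a resource typed as a manager can also be treated as an employee and as a person without anyone stating those facts explicitly.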

Once you have an RDF vocabulary defined for your data, you will want to put it onto a website so that applications can use it. You can use your new vocabulary to semantically tag any components within applications. For the database example above, you might create a new table to hold metadata linking each table and column name to its RDF type. It could be as simple as an entry for each table or column name and the corresponding URI from your RDF vocabulary that describes its meaning. You might use this for automatically generating documentation or for analyzing and reusing application code. Using RDF for this type of metadata is a convenient way to tag the data without changing anything in the existing data structures. For our Java classes, we could also add code annotations or Javadoc tags to semantically mark up our code to facilitate its reuse.
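The metadata table described above can be as small as a lookup from names to URIs. Here is a minimal sketch under stated assumptions: the class ColumnMetadata, the "Table.column" key convention, and the column name employer_id are all hypothetical, and an in-memory map stands in for the database table that would hold the entries in practice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a metadata table linking database names to vocabulary URIs. */
final class ColumnMetadata {
    // Example namespace for the vocabulary terms.
    static final String NS = "http://example.org/wcjava/";

    // Key: "Table" or "Table.column"; value: URI of the term that describes it.
    private final Map<String, String> typeByName = new LinkedHashMap<>();

    /** Tags a table or column name with a vocabulary term. */
    void tag(String tableOrColumn, String termUri) {
        typeByName.put(tableOrColumn, termUri);
    }

    /** Returns the term URI for a name, or null if the name is untagged. */
    String typeOf(String tableOrColumn) {
        return typeByName.get(tableOrColumn);
    }

    /** Produces one documentation line per tagged name, in insertion order. */
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : typeByName.entrySet()) {
            sb.append(e.getKey()).append(" -> ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Builds a small example; the column name is an assumption. */
    static ColumnMetadata example() {
        ColumnMetadata m = new ColumnMetadata();
        m.tag("Employee", NS + "employee");
        m.tag("Employee.employer_id", NS + "employer"); // hypothetical column
        return m;
    }

    public static void main(String[] args) {
        System.out.print(example().describe());
    }
}
```

Keeping the tags in a separate structure is the point of the design: the Employee table itself is untouched, and the describe method hints at how documentation could be generated from the tags.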

There are some well-known standard RDF vocabularies that you can use to build your own vocabulary. The first one to consider is a vocabulary extension to RDF, created by the W3C, called the OWL Web Ontology Language. It includes vocabulary along with formal semantics that you can use in your own definitions. OWL builds on the framework created by the RDF and RDF Schema vocabularies. Although we used RDF Schema’s subClassOf property, OWL offers a much more comprehensive way to describe classes, adding formal semantics such as property restrictions and set operations. Jena has an OWL helper class with static variables for each of the OWL resources and properties.

Another common RDF standard is the Dublin Core (DC), an element set for describing metadata about information resources of any kind. It defines generic properties such as title, creator, type, format, language, and rights. The type property takes its values from the DCMI Type Vocabulary, part of the Dublin Core. Some examples of types are collection, dataset, interactive resource, and software. In Jena, there is a DC class with static Property variables for each of the Dublin Core properties, and a DCTypes class for the type values. You can add a type property to an item within a model by using:

model.add(myDatabaseResource, DC.type, DCTypes.Dataset);

This marks the resource myDatabaseResource as having the type Dataset. Combined with RDF Schema or OWL, these vocabularies give you a baseline for creating your own hierarchy of terms. For example, you might create terms for “JDBC-accessible database,” “relational database table,” and “relational database column” that are RDF subclasses of Dataset. You could then define unique URIs for specific instances of these and make statements about them in RDF: “MySQL instance #743234 at OurOrganization contains data about employees, stored in the table named Employee.” Having such metadata available can make managing IT resources much easier.
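As a rough illustration of what those statements might look like, the sketch below writes them out as N-Triples lines using plain Java string handling instead of Jena. Only the RDF, RDF Schema, and DCMI Type URIs are standard; the class DatasetStatements, the terms JdbcDatabase and RelationalTable, the containsTable property, and the instance URIs under example.org are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Writes a few statements about database resources as N-Triples lines. */
final class DatasetStatements {
    // Standard URIs from RDF, RDF Schema, and the DCMI Type Vocabulary.
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf";
    static final String DCMI_DATASET = "http://purl.org/dc/dcmitype/Dataset";

    // Hypothetical namespace for our own terms and instances.
    static final String NS = "http://example.org/wcjava/";

    /** Formats one statement whose subject, predicate, and object are all URIs. */
    static String triple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    /** Builds the example statements; every NS-based name is an assumption. */
    static List<String> example() {
        String database = NS + "db/743234";
        String table = NS + "db/743234/Employee";
        List<String> lines = new ArrayList<>();
        // Our own terms are declared as subclasses of the DCMI Dataset type.
        lines.add(triple(NS + "JdbcDatabase", RDFS_SUBCLASS_OF, DCMI_DATASET));
        lines.add(triple(NS + "RelationalTable", RDFS_SUBCLASS_OF, DCMI_DATASET));
        // Specific instances get their own URIs and types.
        lines.add(triple(database, RDF_TYPE, NS + "JdbcDatabase"));
        lines.add(triple(table, RDF_TYPE, NS + "RelationalTable"));
        // The database instance is linked to the table that holds employee data.
        lines.add(triple(database, NS + "containsTable", table));
        return lines;
    }

    public static void main(String[] args) {
        for (String line : example()) {
            System.out.println(line);
        }
    }
}
```

In a real application you would add the same statements to a Jena model and let it handle serialization; the string version only makes the shape of the triples visible.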

Eventually there will probably be a standard upper-level ontology for all information technology terms. Many groups are working to create standard vocabularies for various domains. One effort, the Suggested Upper Merged Ontology (SUMO), aims to develop an upper-level hierarchy for all abstract concepts. Future applications that use ontologies based on this may be able to make high-level inferences using data from entirely different domains. There are some domain-specific hierarchies that are also based on SUMO. In this section’s resource page, there is an updated list of some existing vocabularies that you can use. In the next section, we attach an RDF document as metadata for an HTML document.
