
Crawling the Semantic Web


This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).

Author Info:
By: No Starch Press
February 23, 2006
TABLE OF CONTENTS:
  1. Crawling the Semantic Web
  2. This Somethings That: A Short Introduction to N3 and Jena
  3. Triple the Fun: Creating an RDF Vocabulary for Your Organization
  4. Who’s a What? Using RDF Hierarchies in Jena
  5. Getting Attached: Attaching Dublin Core to HTML Documents
  6. What’s the Reason? Making Queries with Jena RDQL


Crawling the Semantic Web - This Somethings That: A Short Introduction to N3 and Jena

The theory behind the RDF standard is actually quite simple. Everything has a Uniform Resource Identifier (URI), and by this I mean everything: not only documents but also generic concepts and relationships between them. Even though you are not a document (or are you?), there could be a URI assigned to represent you as an entity. This URI can then be used to make connections to other things. For the “you” URI, these connections might represent related organizations, addresses, and phone numbers. URIs do not have to return an actual document! This is what sometimes confuses developers when they see a URI referenced somewhere and find that there is nothing at the location. These addresses are often used as markers or unique identifiers to represent concepts. We make links between URIs to represent relationships between things. This functions much like a simple sentence in English:

Programmers enjoy Java.

To begin with, let’s use a shorthand notation, called N3, to encode this as an RDF graph. N3 is an easy way to learn RDF because the syntax is only slightly more complex than the sentence above! In essence, N3 is merely a set of triples, or “subject predicate object” relationships. Here is the N3 version of the sentence:

@prefix wcj: <http://example.org/wcjava/uri/> .
wcj:programmers wcj:enjoy wcj:java .

We first define a prefix to make the N3 code less verbose. Wherever the prefix appears in the document, it stands for the beginning part of a URI, so wcj:java becomes http://example.org/wcjava/uri/java. A full URI in N3 is written within < and > markers; these have nothing to do with XML. The three items together are called a triple, and the verb is usually called the predicate. RDF makes a link by stating that a subject URI is related by a predicate URI to an object URI. The predicate represents a relationship between the subject and the object; it tells how things link together. This is very different from an anchor in HTML, because here the type of relationship is explicitly defined. Remember that URIs in RDF can identify anything: concepts, documents, or people. In some cases the object of a triple can even be a string literal rather than a URI. In theoretical terms, we are creating a labeled directed graph of the relationship. A graph representation of the above might look like Figure 4-1.
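The prefix mechanism and the triple form are simple enough to sketch without any RDF library. The plain-Java sketch below is not how Jena or any real N3 parser works internally: the class `PrefixedTriple` and its methods are hypothetical, and it assumes a prefixed name is simply a declared prefix, a colon, and a local part (real N3 has stricter rules for local names). It expands `wcj:` names into full URIs and writes the sentence above as one N3 statement.

```java
import java.util.Map;

/**
 * Hypothetical helper, not part of Jena: expands N3 prefixed names such as
 * "wcj:java" into full URIs and writes one triple as an N3 statement.
 * Simplification: it splits on the first colon and ignores the rest of the
 * N3 grammar for prefixed names.
 */
final class PrefixedTriple {
    private final Map<String, String> prefixes;

    PrefixedTriple(Map<String, String> prefixes) {
        this.prefixes = Map.copyOf(prefixes);
    }

    /** Replaces the prefix before the colon with the URI declared for it. */
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String namespace = prefixes.get(prefixedName.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Undeclared prefix in: " + prefixedName);
        }
        // Full URI = declared namespace + local part after the colon
        return namespace + prefixedName.substring(colon + 1);
    }

    /** Writes subject, predicate, and object as full URIs in angle brackets, ending with " .". */
    String toN3(String subject, String predicate, String object) {
        return "<" + expand(subject) + "> <" + expand(predicate) + "> <" + expand(object) + "> .";
    }

    public static void main(String[] args) {
        PrefixedTriple n3 = new PrefixedTriple(
            Map.of("wcj", "http://example.org/wcjava/uri/"));
        System.out.println(n3.expand("wcj:java"));
        System.out.println(n3.toN3("wcj:programmers", "wcj:enjoy", "wcj:java"));
    }
}
```

Running it prints the expanded URI for wcj:java, then the same statement as the N3 example above, written out without the prefix.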


Figure 4-1:
RDF subject, predicate, and object

As you might expect, there is a Java API for creating and managing RDF and N3 documents. Jena is an open-source API for working with RDF graphs. Here is one way to create the graph in Jena and serialize it to an N3 document:

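import com.hp.hpl.jena.rdf.model.*;  // Jena 2.x; Apache Jena 3 and later use org.apache.jena.rdf.model
import java.io.FileOutputStream;
import java.io.IOException;

public class EnjoyJava {
    public static void main(String[] args) throws IOException {
        // Create an empty, in-memory RDF graph
        Model model = ModelFactory.createDefaultModel();
        // Subject, predicate, and object of "programmers enjoy java"
        Resource programmers = model.createResource(
             "http://example.org/wcjava/uri/programmers");
        Property enjoy = model.createProperty(
             "http://example.org/wcjava/uri/enjoy");
        Resource java = model.createResource(
             "http://example.org/wcjava/uri/java");
        model.add(programmers, enjoy, java);
        // Serialize as N3; try-with-resources closes the file even if writing fails
        try (FileOutputStream outStream = new FileOutputStream("out.n3")) {
            model.write(outStream, "N3");
        }
    }
}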

Here, Jena uses the term property for the predicate and resource for anything used as a subject or object. The model’s write method can also serialize the graph in formats other than N3, such as RDF/XML, which is what write produces when no format is given. With the Jena API, you can connect many entities together into very large semantic networks. Let’s add some relationships using the resources and properties we just created, producing the graph shown in Figure 4-2.
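Underneath the API, a network like this is the labeled directed graph described earlier: every triple is an edge from a subject to an object, labeled with a predicate. As a rough, library-free illustration (this is not Jena’s Model), the hypothetical `LabeledGraph` class below stores the four links of Figure 4-2, using short names in place of full URIs, and follows edges breadth-first to find everything a node is connected to.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory triple store, not Jena's Model: each triple is an
 * edge from subject to object, labeled with the predicate.
 */
final class LabeledGraph {
    // subject -> (predicate -> objects); insertion order keeps output predictable
    private final Map<String, Map<String, Set<String>>> edges = new LinkedHashMap<>();

    void add(String subject, String predicate, String object) {
        edges.computeIfAbsent(subject, s -> new LinkedHashMap<>())
             .computeIfAbsent(predicate, p -> new LinkedHashSet<>())
             .add(object);
    }

    /** Objects linked from subject by predicate (empty if none). */
    Set<String> objects(String subject, String predicate) {
        Set<String> found = edges.getOrDefault(subject, Map.of())
                                 .getOrDefault(predicate, Set.of());
        return Collections.unmodifiableSet(found);
    }

    /** Nodes reachable from start by following edges forward, breadth-first. */
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            for (Set<String> targets : edges.getOrDefault(node, Map.of()).values()) {
                for (String target : targets) {
                    if (seen.add(target)) {  // enqueue each node only the first time it is seen
                        queue.addLast(target);
                    }
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // The four links of Figure 4-2, with short names instead of full URIs
        LabeledGraph g = new LabeledGraph();
        g.add("programmers", "enjoy", "java");
        g.add("java", "typeOf", "progLang");
        g.add("programmers", "use", "computers");
        g.add("computers", "understand", "progLang");
        System.out.println(g.objects("programmers", "use"));  // [computers]
        System.out.println(g.reachableFrom("programmers"));   // [java, computers, progLang]
    }
}
```

Following the edges from programmers reaches java and computers directly, and progLang through either of them; this kind of navigation is what linking many resources into one network makes possible.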


Figure 4-2:
An RDF graph with multiple subjects

Here is the additional code to produce the network in Figure 4-2:

// Continues the same method: model, programmers, and java are defined above
Property typeOf = model.createProperty(
   "http://example.org/wcjava/typeOf");
Property use = model.createProperty(
   "http://example.org/wcjava/use");
Property understand = model.createProperty(
   "http://example.org/wcjava/understand");
Resource computers = model.createResource(
   "http://example.org/wcjava/computers");
Resource progLang = model.createResource(
   "http://example.org/wcjava/progLang");
// Link the new resources with the new properties
model.add(java, typeOf, progLang);
model.add(programmers, use, computers);
model.add(computers, understand, progLang);
// Write the updated model, closing the file when done
try (FileOutputStream out2 = new FileOutputStream("out2.n3")) {
   model.write(out2, "N3");
}

The N3 output of this code, which also includes the “programmers enjoy java” triple from the first listing, is the following:

<http://example.org/wcjava/uri/java>
  <http://example.org/wcjava/typeOf>
    <http://example.org/wcjava/progLang> .

<http://example.org/wcjava/computers>
  <http://example.org/wcjava/understand>
    <http://example.org/wcjava/progLang> .

<http://example.org/wcjava/uri/programmers>
  <http://example.org/wcjava/uri/enjoy>
    <http://example.org/wcjava/uri/java> ;
  <http://example.org/wcjava/use>
    <http://example.org/wcjava/computers> .

The semicolon in the N3 document is a shortcut indicating that another property is attached to the same subject (“programmers enjoy java, and programmers use computers”). The meanings of elements within a document are often defined in terms of a predefined set of resources and properties, called a vocabulary. Your RDF data can be combined with data that uses existing vocabularies, enabling semantic searches and analysis of complex RDF graphs. In the next section, we illustrate how to build on existing RDF vocabularies to create your own.
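The semicolon shortcut amounts to grouping statements by subject. The hypothetical `SemicolonWriter` below is a simplified sketch, not Jena’s N3 writer: it takes triples as {subject, predicate, object} arrays of full URIs, writes each subject once, separates that subject’s predicate-object pairs with ` ;`, and closes the group with ` .`. Unlike the Jena output above, it keeps each predicate and its object on one line; N3 treats the line breaks between terms as ordinary whitespace, so both layouts mean the same thing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Jena's N3 writer: groups triples by subject so the
 * subject is written once, separates its predicate-object pairs with " ;",
 * and ends the group with " .". Each triple is a {subject, predicate, object}
 * array of full URIs.
 */
final class SemicolonWriter {
    static String write(List<String[]> triples) {
        // subject -> "<predicate> <object>" pairs, in the order they were given
        Map<String, List<String>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], s -> new ArrayList<>())
                     .add("<" + t[1] + "> <" + t[2] + ">");
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> group : bySubject.entrySet()) {
            out.append('<').append(group.getKey()).append(">\n    ")
               .append(String.join(" ;\n    ", group.getValue()))
               .append(" .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/wcjava/";
        List<String[]> triples = List.of(
            new String[] {base + "uri/programmers", base + "uri/enjoy", base + "uri/java"},
            new String[] {base + "uri/programmers", base + "use", base + "computers"});
        System.out.print(write(triples));
    }
}
```

For the two programmers statements, it prints the programmers URI once, followed by the enjoy and use pairs separated by a semicolon.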

