This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).
Crawling the Semantic Web - This Somethings That: A Short Introduction to N3 and Jena
The theory behind the RDF standard is actually quite simple. Everything has a Uniform Resource Identifier (URI), and by this I mean everything: not only documents but also generic concepts and relationships between them. Even though you are not a document (or are you?), there could be a URI assigned to represent you as an entity. This URI can then be used to make connections to other things. For the “you” URI, these connections might represent related organizations, addresses, and phone numbers. URIs do not have to return an actual document! This is what sometimes confuses developers when they see a URI referenced somewhere and find that there is nothing at the location. These addresses are often used as markers or unique identifiers to represent concepts. We make links between URIs to represent relationships between things. This functions much like a simple sentence in English:
Programmers enjoy Java.
To begin with, let’s use a shorthand notation, called N3, to encode this as an RDF graph. N3 is an easy way to learn RDF because the syntax is only slightly more complex than the sentence above! In essence, N3 is merely a set of triples, or “subject predicate object” relationships. Here is the N3 version of the sentence:
@prefix wcj: <http://example.org/wcjava/uri/> .
wcj:programmers wcj:enjoy wcj:java .
We first define a prefix to make the N3 code less verbose. The prefix stands for the beginning of a URI wherever it appears in the document, so wcj:java becomes http://example.org/wcjava/uri/java. (In N3, a full URI is written between < and > markers; these have nothing to do with XML.) The three items together are called a triple, and the verb is usually called a predicate. RDF makes a link by stating that a subject URI is related by a predicate URI to an object URI. The predicate represents some relationship between the subject and object: it tells how things link together. This is very different from an anchor in HTML, because here the type of relationship is explicitly defined. Remember that the things named in RDF could be anything: concepts, documents, or even, in the object position, plain string literals rather than URIs. In theoretical terms, we are creating a labeled directed graph of the relationship. A graph representation of the above might look like Figure 4-1.
Figure 4-1: RDF subject, predicate, and object
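To make the prefix mechanism concrete, here is a minimal sketch in plain Java, with no RDF library involved. It only illustrates the idea of expanding a prefixed name into a full URI and collecting three URIs into a triple; it is not how Jena or any N3 parser actually implements it. The class name PrefixSketch, the Triple record, and the expand method are hypothetical names invented for this illustration.

```java
import java.util.Map;

/** Illustrative sketch only: expands N3-style prefixed names into full URIs. */
final class PrefixSketch {

    /** One "subject predicate object" statement, each part held as a full URI string. */
    record Triple(String subject, String predicate, String object) {}

    /** Expands a prefixed name such as "wcj:java" using the given prefix table. */
    static String expand(String prefixedName, Map<String, String> prefixes) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No prefix in: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String base = prefixes.get(prefix);
        if (base == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        // The local name is appended directly to the URI bound to the prefix.
        return base + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        // The same prefix binding used in the text.
        Map<String, String> prefixes = Map.of("wcj", "http://example.org/wcjava/uri/");

        Triple t = new Triple(
                expand("wcj:programmers", prefixes),
                expand("wcj:enjoy", prefixes),
                expand("wcj:java", prefixes));

        // Print the triple with every URI written out in full between < and >.
        System.out.println("<" + t.subject() + "> <" + t.predicate() + "> <" + t.object() + "> .");
    }
}
```

Running main prints the sentence "programmers enjoy java" as a single statement in which all three parts are full URIs, which is exactly what the prefix saves you from typing.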
As you might expect, there is a Java API for creating and managing RDF and N3 documents. Jena is an open-source API for working with RDF graphs. Here is one way to create the graph in Jena and serialize it to an N3 document:
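// Jena 3 and later use the org.apache.jena packages; older releases used com.hp.hpl.jena.
import java.io.FileOutputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
// Create an empty in-memory RDF graph.
Model model = ModelFactory.createDefaultModel();
// The subject, predicate, and object are each identified by a URI.
Resource programmers = model.createResource("http://example.org/wcjava/uri/programmers");
Property enjoy = model.createProperty("http://example.org/wcjava/uri/enjoy");
Resource java = model.createResource("http://example.org/wcjava/uri/java");
// Add the triple "programmers enjoy java" to the graph.
model.add(programmers, enjoy, java);
// Serialize the graph as N3; try-with-resources closes the stream even if writing fails.
try (FileOutputStream outStream = new FileOutputStream("out.n3")) {
    model.write(outStream, "N3");
}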
Here, Jena is using the term property to refer to the predicate and resource to refer to something used as a subject or object. The model’s write method also has options to write out the document in other formats besides N3. With the Jena API, you can connect many entities together into very large semantic networks. Let’s make some additional relationships using the entities and relationships that we just created. We will produce the graph shown in Figure 4-2.
Figure 4-2: An RDF graph with multiple subjects
Here is the additional code to produce the network in Figure 4-2:
The semicolon in the N3 document is a shortcut that indicates we are going to attach another property to the same subject (“programmers enjoy java, and programmers use computers”). The meanings of elements within a document are often defined in terms of a predefined set of resources and properties called a vocabulary. Your RDF data can be combined with other data in existing vocabularies to allow semantic searches and analysis of complex RDF graphs. In the next section, we illustrate how to extend existing RDF vocabularies to create your own.
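The semicolon shortcut is easy to see in a small sketch. The plain-Java code below, which again uses no RDF library, groups statements by subject and joins the ones that share a subject with a semicolon. It is only an illustration of the notation, not Jena's serializer, and it does not reproduce the full graph of Figure 4-2. The names SemicolonSketch, Triple, and toN3 are hypothetical, and the local names wcj:use and wcj:computers are assumed from the phrase "programmers use computers".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: writes triples as N3 text using the ';' shortcut. */
final class SemicolonSketch {

    /** A statement whose parts are already written as prefixed names. */
    record Triple(String subject, String predicate, String object) {}

    /** Joins all statements that share a subject into one N3 sentence. */
    static String toN3(List<Triple> triples) {
        // Collect "predicate object" pairs per subject, keeping insertion order.
        Map<String, List<String>> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.subject(), k -> new ArrayList<>())
                     .add(t.predicate() + " " + t.object());
        }

        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : bySubject.entrySet()) {
            // A ';' continues with the same subject; the final '.' ends the sentence.
            out.append(e.getKey()).append(' ')
               .append(String.join(" ;\n    ", e.getValue()))
               .append(" .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
                new Triple("wcj:programmers", "wcj:enjoy", "wcj:java"),
                // Assumed local names, taken from "programmers use computers".
                new Triple("wcj:programmers", "wcj:use", "wcj:computers"));

        System.out.print("@prefix wcj: <http://example.org/wcjava/uri/> .\n\n");
        System.out.print(toN3(triples));
    }
}
```

Because both statements have wcj:programmers as their subject, the subject is written once and the second predicate and object follow the semicolon. A statement with a different subject would start a new sentence of its own.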