This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).
Crawling the Semantic Web - Who’s a What? Using RDF Hierarchies in Jena
Earlier we created a hierarchy of terms to use for our metadata. We used the word “vocabulary” to refer to this collection of terms, but when it defines relationships between the terms, it is often called an ontology. According to the Wikipedia definition, an ontology (in the computer science sense) is a “data structure containing all the relevant entities and their relationships and rules (theorems, regulations) within a domain.”
Jena includes built-in helper classes for working with commonly used ontologies, and RDF Schema (RDFS) is one of them. Jena’s RDFS helper class has a static variable for the subClassOf property. You can create the graph from the previous section by using this code:
The second line sets a namespace prefix for our graph, which makes the code easier to read because we can write the URIs in a shorter form. There is nothing special about the choice of “wcj” as our prefix. It could have been any short string of letters, but whichever value you choose is the prefix written to the output file. The RDF/XML output format is the XML representation of our RDF graph. Most applications exchange RDF graphs in the XML format rather than N3, and as you can see, Jena’s RDF model can write either one.
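A minimal sketch of this kind of code follows, assuming a current Apache Jena release (packages under org.apache.jena; older Jena 2 code used com.hp.hpl.jena). The namespace URI http://example.org/wcj# and the terms Employee, Programmer, and Manager are hypothetical placeholders rather than the vocabulary from the previous section. The second serialization uses Turtle, the standardized subset of N3.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDFS;

class WcjHierarchy {
    // Hypothetical namespace for the vocabulary; substitute a URI you control.
    static final String WCJ = "http://example.org/wcj#";

    static Model build() {
        Model model = ModelFactory.createDefaultModel();
        // Register the "wcj" prefix so serializers can abbreviate URIs in this namespace.
        model.setNsPrefix("wcj", WCJ);

        // Hypothetical terms: Programmer and Manager are both kinds of Employee.
        Resource employee = model.createResource(WCJ + "Employee");
        Resource programmer = model.createResource(WCJ + "Programmer");
        Resource manager = model.createResource(WCJ + "Manager");
        programmer.addProperty(RDFS.subClassOf, employee);
        manager.addProperty(RDFS.subClassOf, employee);
        return model;
    }

    public static void main(String[] args) {
        Model model = build();
        model.write(System.out, "RDF/XML"); // XML representation of the graph
        model.write(System.out, "TURTLE");  // same graph in Turtle (an N3 subset)
    }
}
```

In the Turtle output, the registered prefix lets Jena write wcj:Programmer instead of the full URI.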
Once you have an RDF vocabulary defined for your data, you will want to publish it on a website so that applications can use it. You can use your new vocabulary to semantically tag any components within applications. For the database example above, you might create a new table to hold metadata linking each table and column name to its RDF type. It could be as simple as one entry per table or column name, holding the URI from your RDF vocabulary that describes its meaning. You might use this metadata to generate documentation automatically or to analyze and reuse application code. Using RDF for this type of metadata is a convenient way to tag the data without changing anything in the existing data structures. For our Java classes, we could also add code annotations or JavaDoc tags to mark up the code semantically and make it easier to reuse.
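For the Java-class case, one possible approach is a custom runtime annotation that carries a vocabulary URI and is read back through reflection. The annotation name SemanticType, the Employee class, and the URIs are all hypothetical; this is a sketch of the idea, not an established convention.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation that carries the URI of an RDF vocabulary term.
@Retention(RetentionPolicy.RUNTIME) // kept in the class file and visible to reflection
@Target({ElementType.TYPE, ElementType.FIELD})
@interface SemanticType {
    String value();
}

// Example class tagged with hypothetical vocabulary URIs.
@SemanticType("http://example.org/wcj#Employee")
class Employee {
    @SemanticType("http://example.org/wcj#EmployeeName")
    String name;

    int badgeNumber; // deliberately left untagged
}

class SemanticTagReader {
    // Returns the vocabulary URI on a class, or null if it is not tagged.
    static String typeOf(Class<?> type) {
        SemanticType tag = type.getAnnotation(SemanticType.class);
        return tag == null ? null : tag.value();
    }

    // Returns the vocabulary URI on a declared field, or null if the field
    // does not exist or is not tagged.
    static String typeOfField(Class<?> type, String fieldName) {
        try {
            SemanticType tag = type.getDeclaredField(fieldName).getAnnotation(SemanticType.class);
            return tag == null ? null : tag.value();
        } catch (NoSuchFieldException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(typeOf(Employee.class));
        System.out.println(typeOfField(Employee.class, "name"));
        System.out.println(typeOfField(Employee.class, "badgeNumber"));
    }
}
```

Because the annotation has RUNTIME retention, a documentation generator or code-analysis tool can discover the tags from compiled classes; a JavaDoc tag, by contrast, lives in a comment and is visible only to tools that read the source.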
There are some well-known standard RDF vocabularies that you can use as a basis for your own vocabulary. The first one to consider is a vocabulary extension to RDF created by the W3C: the OWL Web Ontology Language. It provides vocabulary along with formal semantics that you can use in your own definitions. OWL builds on the framework created by the RDF and RDF Schema vocabularies. We used the RDF Schema subClassOf property; OWL reuses that property but adds a much richer vocabulary with formal semantics, such as property restrictions and set operations (union, intersection, and complement) on classes. Jena has an OWL helper class with static variables for each of the OWL resources and properties.
Another common RDF standard is the Dublin Core (DC), an element set for describing metadata about information resources of any kind. It defines generic properties such as title, creator, type, format, language, and rights. The recommended values for the type property come from the DCMI Type Vocabulary, which is part of the Dublin Core. Some examples of types are collection, dataset, interactive resource, and software. In Jena, there is a DC class with static Property variables for each of the Dublin Core properties. You can add a type property to an item within a model by using:
This marks the resource myDatabaseResource as having the type Dataset. By combining these terms with RDF Schema or OWL, you can build your own hierarchy of terms with them as a baseline. For example, you might create terms for “JDBC-accessible database,” “relational database table,” and “relational database column” that are RDF subclasses of Dataset. You could then define unique URIs for specific instances of these and make statements about them in RDF, such as: “MySQL instance #743234 at OurOrganization contains data about employees, stored in the table named Employee.” Having such metadata available can make managing IT resources much easier.
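The following sketch combines the Dublin Core and RDF Schema ideas above using Apache Jena’s DC, DCTerms, DCTypes, RDF, and RDFS vocabulary classes. The wcj and example.org URIs, the two local class names, and the resources standing in for “MySQL instance #743234” and its Employee table are hypothetical. The data is wrapped in Jena’s built-in RDFS reasoner so that the table’s Dataset membership can be queried even though it is never stated directly.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.DC;
import org.apache.jena.vocabulary.DCTerms;
import org.apache.jena.vocabulary.DCTypes;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

class DatasetHierarchy {
    // Hypothetical namespaces; substitute URIs your organization controls.
    static final String WCJ = "http://example.org/wcj#";
    static final String ORG = "http://example.org/our-organization/";

    static InfModel build() {
        Model base = ModelFactory.createDefaultModel();
        base.setNsPrefix("wcj", WCJ);

        // Hypothetical local classes, declared as subclasses of the DCMI Dataset type.
        Resource jdbcDatabase = base.createResource(WCJ + "JdbcAccessibleDatabase");
        Resource relationalTable = base.createResource(WCJ + "RelationalDatabaseTable");
        jdbcDatabase.addProperty(RDFS.subClassOf, DCTypes.Dataset);
        relationalTable.addProperty(RDFS.subClassOf, DCTypes.Dataset);

        // A specific database instance, tagged with dc:type and with the local class.
        Resource database = base.createResource(ORG + "mysql/743234");
        database.addProperty(DC.type, DCTypes.Dataset);
        database.addProperty(RDF.type, jdbcDatabase);

        // The Employee table inside that database.
        Resource employeeTable = base.createResource(ORG + "mysql/743234/Employee");
        employeeTable.addProperty(RDF.type, relationalTable);
        employeeTable.addProperty(DC.title, "Employee");
        database.addProperty(DCTerms.hasPart, employeeTable);

        // Wrap the asserted data in Jena's RDFS reasoner.
        return ModelFactory.createRDFSModel(base);
    }

    public static void main(String[] args) {
        InfModel inf = build();
        Resource table = inf.getResource(ORG + "mysql/743234/Employee");
        // Entailed by rdf:type plus rdfs:subClassOf; not asserted in the base model.
        System.out.println("Employee table is a Dataset: "
                + inf.contains(table, RDF.type, DCTypes.Dataset));
    }
}
```

The table is typed with rdf:type rather than dc:type because an RDFS reasoner propagates rdf:type along rdfs:subClassOf, while a dc:type value is just ordinary data to it. The dc:type statement on the database resource mirrors the simpler tagging described above.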
Eventually there will probably be a standard upper-level ontology for all information technology terms. Many groups are working to create standard vocabularies for various domains. One effort, the Suggested Upper Merged Ontology (SUMO), aims to develop an upper-level hierarchy for all abstract concepts. Future applications that use ontologies based on it may be able to make high-level inferences using data from entirely different domains. Some domain-specific hierarchies are also based on SUMO. This section’s resource page has an updated list of some existing vocabularies that you can use. In the next section, we attach an RDF document as metadata for an HTML document.