
Crawling the Semantic Web


This article, the first of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).

Author Info:
By: No Starch Press
February 23, 2006
TABLE OF CONTENTS:
  1. · Crawling the Semantic Web
  2. · This Somethings That: A Short Introduction to N3 and Jena
  3. · Triple the Fun: Creating an RDF Vocabulary for Your Organization
  4. · Who’s a What? Using RDF Hierarchies in Jena
  5. · Getting Attached: Attaching Dublin Core to HTML Documents
  6. · What’s the Reason? Making Queries with Jena RDQL


Crawling the Semantic Web - What’s the Reason? Making Queries with Jena RDQL

 You’ve built the perfect ontology for your organization’s knowledge base. You’ve encoded it in RDF based on standard vocabularies, so you can exchange data with other applications. And now you have a large amount of data encoded using this vocabulary. “But what can I do with all this data?” you think to yourself. “It’s not like I can just use a query language like SQL!” Well, actually, you can—not specifically with the SQL language but with a similar structured language designed for querying knowledge bases. In this section, we’ll use an RDF query language to retrieve information from an existing knowledge base.

Because RDF data is not organized into tables, columns, and rows like a relational database, SQL won’t work for querying RDF graphs. Instead, we need to search within a graph to find subgraphs that match some pattern of RDF nodes (subject, predicate, and object).

For instance, you might ask a knowledge base whether a particular employee is a supervisor. In this case, you know the subject, predicate, and object that you are looking for. You can directly ask whether the given structure exists in the RDF. However, most often you won’t know every part of the target structure, such as when you want a list of supervisors having a salary less than $100,000. Because we don’t know the URI of each item, we will have to use variables to represent the unknown items in the query. In this type of query, we are asking: “Show me all X where X is a supervisor, and X has salary Y, and Y < 100000.” The response will list all the possible values for X that would match the desired properties.

Jena’s built-in query language is called RDF Data Query Language (RDQL). An RDQL query has several parts:

  • What values the query should return
  • The RDF sources to query
  • The query predicates
  • Optional namespace prefixes
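The variable-based matching described above can be sketched in plain Java without Jena. This is only an illustration of the idea, not how RDQL is implemented: the TriplePatternSketch class, its method names, and the ex: sample data are all hypothetical. Terms that start with ? are treated as variables, and a variable that appears in two patterns must take the same value in both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Jena-free illustration of matching triple patterns with variables.
final class TriplePatternSketch {

    // One statement or pattern: subject, predicate, object as plain strings.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Bind one pattern term to a concrete value. Terms starting with '?' are
    // variables; any other term is a constant that must match exactly.
    static boolean bind(String term, String value, Map<String, String> row) {
        if (!term.startsWith("?")) return term.equals(value);
        String existing = row.putIfAbsent(term, value);
        return existing == null || existing.equals(value);
    }

    // Match every pattern in turn, carrying variable bindings forward so that
    // a variable shared between patterns must take the same value in each.
    static List<Map<String, String>> query(List<Triple> graph, List<Triple> patterns) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(new LinkedHashMap<>());
        for (Triple pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> row : rows) {
                for (Triple t : graph) {
                    Map<String, String> candidate = new LinkedHashMap<>(row);
                    if (bind(pattern.s, t.s, candidate)
                            && bind(pattern.p, t.p, candidate)
                            && bind(pattern.o, t.o, candidate)) {
                        next.add(candidate);
                    }
                }
            }
            rows = next;
        }
        return rows;
    }

    // "Show me all X where X is a supervisor, X has salary Y, and Y < limit."
    static List<String> supervisorsBelow(List<Triple> graph, int limit) {
        List<Triple> patterns = List.of(
            new Triple("?x", "rdf:type", "ex:Supervisor"),
            new Triple("?x", "ex:salary", "?y"));
        List<String> result = new ArrayList<>();
        for (Map<String, String> row : query(graph, patterns)) {
            if (Integer.parseInt(row.get("?y")) < limit) result.add(row.get("?x"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Triple> graph = List.of(
            new Triple("ex:alice", "rdf:type", "ex:Supervisor"),
            new Triple("ex:alice", "ex:salary", "90000"),
            new Triple("ex:bob", "rdf:type", "ex:Engineer"),
            new Triple("ex:bob", "ex:salary", "80000"),
            new Triple("ex:carol", "rdf:type", "ex:Supervisor"),
            new Triple("ex:carol", "ex:salary", "120000"));
        System.out.println(supervisorsBelow(graph, 100000));
    }
}
```

A real query engine would index the graph rather than scan every triple for every pattern; the nested loops here are only meant to make the binding logic visible.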

RDQL will let us declare the RDF source (where the data is coming from) directly within the query String, but that is very inefficient for multiple queries against the same source. It’s usually better to run the query from an RDF model already in memory. Let’s run a query on the Suggested Upper Merged Ontology (SUMO), a very high-level ontology created by the IEEE. SUMO has standard names for high-level abstractions such as Process, Organization, and GeopoliticalArea. These are not Java classes; they are classes in the mathematical sense: a set whose members share one or more properties in common. We’ll look at Organization and find all of its direct subclasses, using the RDQL query:

SELECT ?x
WHERE (?x <rdfs:subClassOf> <sumo:Organization>)
USING rdfs FOR <http://www.w3.org/2000/01/rdf-schema#>
      sumo FOR <http://reliant.teknowledge.com/DAML/SUMO.owl#>
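The USING clause binds the rdfs and sumo prefixes to namespace URIs, so each prefixed name in the query stands for a full URI. A rough sketch of that expansion in plain Java follows. The PrefixExpansionSketch class and its methods are hypothetical and are not part of Jena; they only show the string substitution involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jena: expands prefix:local names to full URIs.
final class PrefixExpansionSketch {

    private final Map<String, String> prefixes = new LinkedHashMap<>();

    // Plays the role of "USING prefix FOR <namespace>" in the query.
    PrefixExpansionSketch using(String prefix, String namespace) {
        prefixes.put(prefix, namespace);
        return this;
    }

    // Expand "sumo:Organization" to namespace + "Organization".
    // Names with an unknown prefix, or with no colon, are returned unchanged.
    String expand(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) return name;
        String namespace = prefixes.get(name.substring(0, colon));
        return namespace == null ? name : namespace + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixExpansionSketch p = new PrefixExpansionSketch()
            .using("rdfs", "http://www.w3.org/2000/01/rdf-schema#")
            .using("sumo", "http://reliant.teknowledge.com/DAML/SUMO.owl#");
        System.out.println(p.expand("rdfs:subClassOf"));
        System.out.println(p.expand("sumo:Organization"));
    }
}
```

Because the expansion is plain concatenation, the local name has to match the ontology’s spelling and capitalization exactly.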

The ?x in this query is a variable representing something that we want the query to locate. The query engine will try to substitute a value for ?x wherever it finds a subclass of Organization. Remember that all entities in RDF are URIs. The rdfs and sumo prefixes make the URIs in the query much shorter and less awkward. To run the query in Jena, we first load the SUMO ontology into memory. Then we run the query using the static exec method of Jena’s Query class and process the results. The following code performs this query:

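// Package names assume the Jena 2.x layout (com.hp.hpl.jena).
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.rdql.*;

// Load the SUMO ontology into an in-memory model.
Model sumo = ModelFactory.createOntologyModel();
String sumoURL = "http://reliant.teknowledge.com/DAML/SUMO.owl";
sumo.read(sumoURL);
sumo.setNsPrefix("sumo", sumoURL + "#");

// URIs are case-sensitive, so the names must match the ontology exactly.
String rdq = "SELECT ?x " +
   "WHERE (?x <rdfs:subClassOf> <sumo:Organization>) " +
   "USING rdfs FOR <http://www.w3.org/2000/01/rdf-schema#> " +
   "sumo FOR <" + sumoURL + "#>";

// Run the query against the model and visit each value bound to ?x.
QueryResults results = Query.exec(rdq, sumo);
RDFVisitor aVisitor = new SysoutVisitor();
while (results.hasNext()) {
   ResultBindingImpl binding = (ResultBindingImpl) results.next();
   RDFNode node = (RDFNode) binding.get("x");
   node.visitWith(aVisitor);
}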

This matches the known subclasses of the Organization entity in SUMO. To visit each node and display its URI, you’ll need to write a visitor, using Jena’s RDFVisitor interface. My SysoutVisitor class prints out the URI of each node that it visits. You can do more interesting things with a visitor besides just printing a node’s value, such as visiting nodes connected to it by a particular property. Here is the code for SysoutVisitor:

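// Package names assume the Jena 2.x layout (com.hp.hpl.jena).
import com.hp.hpl.jena.rdf.model.AnonId;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.RDFVisitor;
import com.hp.hpl.jena.rdf.model.Resource;

// Prints each visited node; the return values are not used.
public class SysoutVisitor implements RDFVisitor {

   // Called for blank (anonymous) nodes.
   public Object visitBlank(Resource r, AnonId id) {
      System.out.println("anon: " + id);
      return null;
   }

   // Called for resources identified by a URI.
   public Object visitURI(Resource r, String uri) {
      System.out.println("uri: " + uri);
      return null;
   }

   // Called for literal values.
   public Object visitLiteral(Literal l) {
      System.out.println(l);
      return null;
   }
}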
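SysoutVisitor returns null from every method. For contrast, here is a sketch of a visitor whose return value is actually used. To stay self-contained it does not use Jena: the NodeVisitor and Node interfaces, the factory methods, and LabelVisitor are all hypothetical stand-ins that only mirror the three-callback shape of the visitor above, with plain strings in place of Jena’s types.

```java
// Hypothetical, Jena-free mirror of the visitor shape used by SysoutVisitor.
final class ReturningVisitorSketch {

    // The same three callbacks, with plain strings instead of Jena types.
    interface NodeVisitor {
        Object visitBlank(String anonId);
        Object visitURI(String uri);
        Object visitLiteral(String lexicalForm);
    }

    // A node hands itself to the matching callback and passes the result back.
    interface Node {
        Object visitWith(NodeVisitor visitor);
    }

    static Node blankNode(String anonId) {
        return visitor -> visitor.visitBlank(anonId);
    }

    static Node uriNode(String uri) {
        return visitor -> visitor.visitURI(uri);
    }

    static Node literalNode(String lexicalForm) {
        return visitor -> visitor.visitLiteral(lexicalForm);
    }

    // Returns a short label for each node instead of printing it.
    static final class LabelVisitor implements NodeVisitor {
        public Object visitBlank(String anonId) {
            return "_:" + anonId;
        }

        public Object visitURI(String uri) {
            // Keep only the part after '#', if there is one.
            int hash = uri.lastIndexOf('#');
            return hash >= 0 ? uri.substring(hash + 1) : uri;
        }

        public Object visitLiteral(String lexicalForm) {
            return "\"" + lexicalForm + "\"";
        }
    }

    public static void main(String[] args) {
        NodeVisitor labels = new LabelVisitor();
        Node node = uriNode("http://reliant.teknowledge.com/DAML/SUMO.owl#Corporation");
        // The caller decides what to do with the returned label.
        System.out.println(node.visitWith(labels));
    }
}
```

The calling loop stays the same whichever visitor is used; only the visitor object passed to visitWith changes.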

The Visitor pattern also lets a visitor return a value, but SysoutVisitor does not use that feature: each of its methods returns null. To make the program do something other than print each node’s value, all you need to do is plug in a different visitor. The previous query matches the following nodes:

http://reliant.teknowledge.com/DAML/SUMO.owl#Corporation
http://reliant.teknowledge.com/DAML/SUMO.owl#PoliticalOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#EducationalOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#JudicialOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#ReligiousOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#GovernmentOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#Organization
http://reliant.teknowledge.com/DAML/SUMO.owl#MercantileOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#Manufacturer
http://reliant.teknowledge.com/DAML/SUMO.owl#Government
http://reliant.teknowledge.com/DAML/SUMO.owl#PoliceOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#MilitaryOrganization
http://reliant.teknowledge.com/DAML/SUMO.owl#MilitaryForce
http://reliant.teknowledge.com/DAML/SUMO.owl#ParamilitaryOrganization

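Note that the list includes Organization itself, which a query for direct subclasses alone would not return. This suggests that the ontology model is applying RDFS-style subclass inference, under which every class counts as a subclass of itself.

Jena can also make rule-based inferences. You can create a knowledge base, combine it with SUMO facts, and query the model while applying matching rules. See the documentation and tutorial links on the resource page for more details. The W3C recently created its own query language called SPARQL, which works very similarly to RDQL. See this book’s website for updated information on this and other query languages.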
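To give a feel for what a rule-based inference does, here is a minimal sketch of a single rule applied by naive forward chaining: if A is a subclass of B and B is a subclass of C, then A is a subclass of C. It is written in plain Java and does not use Jena’s rule engine; the SubclassInferenceSketch class, its methods, and the ex: class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, Jena-free illustration of one forward-chaining rule.
final class SubclassInferenceSketch {

    // Repeatedly apply the rule
    //   (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    // until a full pass adds no new facts.
    static Map<String, Set<String>> closure(Map<String, Set<String>> direct) {
        Map<String, Set<String>> all = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : direct.entrySet()) {
            all.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Set<String> supers : all.values()) {
                // Iterate over a copy, because supers may grow inside the loop.
                for (String mid : new ArrayList<>(supers)) {
                    Set<String> inherited = all.get(mid);
                    if (inherited != null && supers.addAll(new ArrayList<>(inherited))) {
                        changed = true;
                    }
                }
            }
        }
        return all;
    }

    // All classes whose inferred superclasses include the target.
    static Set<String> subclassesOf(Map<String, Set<String>> inferred, String target) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : inferred.entrySet()) {
            if (e.getValue().contains(target)) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy hierarchy; the class names are made up for illustration.
        Map<String, Set<String>> direct = new TreeMap<>();
        direct.put("ex:Team", Set.of("ex:Department"));
        direct.put("ex:Department", Set.of("ex:Organization"));
        System.out.println(subclassesOf(closure(direct), "ex:Organization"));
    }
}
```

Only ex:Department is stated as a subclass of ex:Organization, but after the rule runs ex:Team is found as well. A production reasoner handles many rules at once and avoids rescanning everything on each pass.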

