Crawling the Semantic Web, concluded
By: No Starch Press
    2006-03-02

    Table of Contents:
  • Crawling the Semantic Web, concluded
  • Guess What? Publishing RSS Newsfeeds with Informa
  • What’s Up? Aggregating RSS Newsfeeds
  • Heading to the Polls: Polling RSS Feeds with Informa
  • All the News Fit to Print: Filtering RSS Feeds with Informa

    Crawling the Semantic Web, concluded - Guess What? Publishing RSS Newsfeeds with Informa


    (Page 2 of 5 )

     RDF Site Summary (RSS) is a standard for summarizing content on a web server. An RSS feed is stored in an XML file, and it might include items such as recent news, changes to a website, or new blog entries. A client program called an aggregator collects RSS feeds from multiple web servers and displays them in summary form, sorted by category. The user then chooses to view the full content of any summaries that are of interest. The summary has metadata, such as its subject, encoded along with a text summary. Over time I expect that document metadata will have much more than the Dublin Core and other terms that RSS currently uses. In theory, you could plug into other ontologies such as SUMO, and the meaning of an entire article could be encoded using RDF. This is possible only if you are using an ontology that is expressive enough. This is certainly a lot of effort, but the long-term advantage is that machines would have access to the fully encoded semantics of the text. This probably won’t happen for a while, but adding metadata such as RSS descriptions is a good start in that direction and has an immediate benefit of giving us more accurate categorization of content.

    There are several standards named RSS, all of them XML-based and used for similar purposes. Unfortunately, the different standards not only have different XML structures but even expand the RSS acronym differently. Most aggregators are able to understand all RSS flavors, though. The version we discuss here, RDF Site Summary 1.0, uses RDF and is most closely related to the semantic work we’ve done so far in this chapter. Even so, using any flavor of RSS is better than encoding no metadata at all. There are ways to map between the semantics of each standard, although not all of them are equally expressive. One common practice is to use XSLT stylesheets to transform between the different forms of RSS.
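    As a rough illustration of the stylesheet approach, the sketch below applies an XSLT stylesheet with the JDK's standard javax.xml.transform API. The class name RssFlavorTransform and the inline stylesheet are assumptions made up for this example; the stylesheet copies only titles and links from an RSS 2.0-style document into an RSS 1.0 skeleton, so a real mapping would need to cover far more (descriptions, the rdf:Seq item list, Dublin Core terms, and so on).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RssFlavorTransform {

    // Hypothetical, deliberately tiny stylesheet: it copies the channel title
    // and each item's title and link into RSS 1.0 (RDF) elements.
    // It does not produce a complete RSS 1.0 document.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns='http://purl.org/rss/1.0/'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<rdf:RDF>"
        + "<channel rdf:about='{rss/channel/link}'>"
        + "<title><xsl:value-of select='rss/channel/title'/></title>"
        + "</channel>"
        + "<xsl:for-each select='rss/channel/item'>"
        + "<item rdf:about='{link}'>"
        + "<title><xsl:value-of select='title'/></title>"
        + "<link><xsl:value-of select='link'/></link>"
        + "</item>"
        + "</xsl:for-each>"
        + "</rdf:RDF>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the given RSS 2.0-style XML and returns the result.
    static String transform(String rss2Xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(rss2Xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String rss2 =
              "<rss version='2.0'><channel>"
            + "<title>Latest Bug Fixes</title>"
            + "<link>http://example.org/wcj/</link>"
            + "<item><title>Annoying Bug #25443 Now Fixed</title>"
            + "<link>http://example.org/wcj/bugfix25443.html</link></item>"
            + "</channel></rss>";
        System.out.println(transform(rss2));
    }
}
```

    In practice the stylesheet would live in its own .xsl file and be loaded with a StreamSource over that file; it is inlined here only to keep the sketch self-contained.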

    Because RSS 1.0 is built on RDF and XML, there are several ways of creating feeds: a DOM parser, an RDF API, or an RSS-specific API. DOM is lower-level than necessary for creating RDF. Jena has RSS support through its RSS class, which has static objects that represent RSS properties you can use in building an RSS-compatible RDF graph. But if you’re going to be working a lot with RSS, you’ll want to use an RSS-specific API that understands the different RSS versions in common use.
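    To make the "lower-level than necessary" point concrete, here is a sketch that builds a one-item RSS 1.0 feed with nothing but the JDK's DOM and transformer classes. The class name DomRssSketch and the helper addText are made up for this illustration, and the feed values are the same illustrative example.org ones used elsewhere in this section. Every element, namespace, and attribute has to be wired up by hand, which is the bookkeeping an RSS-specific API takes care of.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomRssSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";

    // Appends a text-only element in the RSS 1.0 namespace to the given parent.
    static void addText(Document doc, Element parent, String name, String text) {
        Element element = doc.createElementNS(RSS_NS, name);
        element.setTextContent(text);
        parent.appendChild(element);
    }

    // Builds a channel with a single item and returns the serialized XML.
    static String buildFeed() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element rdf = doc.createElementNS(RDF_NS, "rdf:RDF");
            doc.appendChild(rdf);

            String channelUrl = "http://example.org/wcj/bugs.rss";
            String itemUrl = "http://example.org/wcj/bugfix25443.html";

            // The channel: metadata about the feed as a whole
            Element channel = doc.createElementNS(RSS_NS, "channel");
            channel.setAttributeNS(RDF_NS, "rdf:about", channelUrl);
            rdf.appendChild(channel);
            addText(doc, channel, "title", "Latest Bug Fixes");
            addText(doc, channel, "link", channelUrl);
            addText(doc, channel, "description", "The latest news on our bug fixes");

            // RSS 1.0 lists the channel's items in an rdf:Seq
            Element items = doc.createElementNS(RSS_NS, "items");
            Element seq = doc.createElementNS(RDF_NS, "rdf:Seq");
            Element li = doc.createElementNS(RDF_NS, "rdf:li");
            li.setAttributeNS(RDF_NS, "rdf:resource", itemUrl);
            seq.appendChild(li);
            items.appendChild(seq);
            channel.appendChild(items);

            // The item itself is a sibling of the channel under rdf:RDF
            Element item = doc.createElementNS(RSS_NS, "item");
            item.setAttributeNS(RDF_NS, "rdf:about", itemUrl);
            rdf.appendChild(item);
            addText(doc, item, "title", "Annoying Bug #25443 Now Fixed");
            addText(doc, item, "link", itemUrl);

            // Serialize the DOM tree with an identity transform
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the feed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildFeed());
    }
}
```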

    Informa is an open-source API for reading and writing RSS in Java. One of its most powerful features is the ability to persist the feed metadata in a database. Informa can also read data from external feeds (as described in a later section), perform text-filtering tasks, and update RSS content on a periodic schedule. Let’s use it to create a feed using the basic in-memory builder—the ChannelBuilder class from the de.nava.informa.impl.basic package. In RSS terminology, a channel is another name for metadata about some content (such as a website) and is the main entity in a newsfeed. Each RSS file defines a channel and items belonging to the channel. Rather than work with the XML directly, which can be somewhat tedious, we’ll use a ChannelBuilder to create the RSS file.

    // Imports needed by the snippet below
    import java.net.URL;
    import de.nava.informa.core.ChannelExporterIF;
    import de.nava.informa.core.ChannelIF;
    import de.nava.informa.core.ItemIF;
    import de.nava.informa.exporters.RSS_1_0_Exporter;
    import de.nava.informa.impl.basic.ChannelBuilder;

    ChannelBuilder builder = new ChannelBuilder();
    ChannelIF myChannel = builder.createChannel("Latest Bug Fixes");
    // This is the URL for which we are describing the metadata
    URL channelURL = new URL("http://example.org/wcj/bugs.rss");
    myChannel.setLocation(channelURL);
    myChannel.setDescription("The latest news on our bug fixes");

    // We create a first item
    String title = "Annoying Bug #25443 Now Fixed";
    String desc = "A major bug in OurGreatApplication is fixed. " +
        "Bug #25443, which has been annoying users ever since 3.0, " +
        "was due to a rogue null pointer.";
    URL url = new URL("http://example.org/wcj/bugfix25443.html");
    ItemIF anItem = builder.createItem(myChannel, title, desc, url);
    anItem.setCreator("Ecks Amples");

    // We create a second item
    title = "Bug #12121 not Fixed in 7.1";
    desc = "Bug #12121 will not be fixed in OurGreatApplication " +
        "release 7.1, so that developers can focus on adding " +
        "the WickedCool feature.";
    url = new URL("http://example.org/wcj/bugfix12121.html");
    anItem = builder.createItem(myChannel, title, desc, url);
    anItem.setCreator("Dee Veloper");

    // Export the document to disk, in RSS 1.0 format
    ChannelExporterIF exporter = new RSS_1_0_Exporter("bugs.rss");
    exporter.write(myChannel);

    You can place the XML-encoded RSS feed anywhere on your site. The main page of your site should include a link to the feed. For automated discovery by RSS crawlers such as Syndic8, you can do this with a link tag in the page’s head section:

    <link rel="alternate" type="application/rss+xml"
          title="Bugs" href="http://your-site/bugs.rss"/>
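    To show the other side of autodiscovery, the sketch below scans a page's HTML for link tags that advertise an RSS feed, roughly as a crawler might. The class name FeedAutodiscovery is made up for this example, and the regular expressions are a simplification that handles only double-quoted attribute values; a production crawler would more likely use a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeedAutodiscovery {
    // Matches a whole <link ...> tag, in any letter case
    private static final Pattern LINK_TAG =
        Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" pairs inside a tag (double-quoted values only)
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns the href of every link tag with rel="alternate"
    // and type="application/rss+xml".
    static List<String> findRssFeeds(String html) {
        List<String> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String rel = null;
            String type = null;
            String href = null;
            Matcher attribute = ATTRIBUTE.matcher(tag.group());
            while (attribute.find()) {
                String name = attribute.group(1);
                String value = attribute.group(2);
                if (name.equalsIgnoreCase("rel")) {
                    rel = value;
                } else if (name.equalsIgnoreCase("type")) {
                    type = value;
                } else if (name.equalsIgnoreCase("href")) {
                    href = value;
                }
            }
            if ("alternate".equalsIgnoreCase(rel)
                    && "application/rss+xml".equalsIgnoreCase(type)
                    && href != null) {
                feeds.add(href);
            }
        }
        return feeds;
    }

    public static void main(String[] args) {
        String page =
              "<html><head>"
            + "<link rel=\"stylesheet\" href=\"site.css\">"
            + "<link rel=\"alternate\" type=\"application/rss+xml\""
            + " title=\"Bugs\" href=\"http://your-site/bugs.rss\"/>"
            + "</head><body></body></html>";
        System.out.println(findRssFeeds(page));
    }
}
```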

    You’ll also want a hypertext link for human visitors, so they can add your site to their aggregator. If you are going to be creating large feeds that change often, or working with many feeds simultaneously, use the Hibernate-based version of the builder, which will persist the RSS metadata in a database. Hibernate is an API for mapping Java objects to relational database structures and automatically translating data between them. See the Informa documentation, and this section’s resource page, for more information. In the next section, we’ll see how to read newsfeeds with Informa.


    This article is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).
