
Crawling the Semantic Web, concluded

This article, the second of two parts, examines the problems raised by the glut of information available through the web, and how to tame it. It is excerpted from the book Wicked Cool Java, written by Brian D. Eubanks (No Starch Press, 2005; ISBN: 1593270615).

Author Info:
By: No Starch Press
March 02, 2006
  1. Crawling the Semantic Web, concluded
  2. Guess What? Publishing RSS Newsfeeds with Informa
  3. What's Up? Aggregating RSS Newsfeeds
  4. Heading to the Polls: Polling RSS Feeds with Informa
  5. All the News Fit to Print: Filtering RSS Feeds with Informa


Heading to the Polls: Polling RSS Feeds with Informa

We just showed how Informa can retrieve data from an RSS channel, using the ChannelBuilder class. Ideally, updating your copy of the feed should be an automated process, and Informa can also do this. The Poller class (located in the de.nava.informa.utils.poller package) can periodically poll a Channel object's RSS feed and trigger some action whenever there are changes. By default, this polling occurs every 60 minutes but can be configured to use longer or shorter periods. The Poller class works by notifying an observer object whenever something changes in the feed. To use this process, you must first create a class implementing the PollerObserverIF interface. This interface has methods for poll tracking, error handling, and feed change notification.

Let's look at an example of a PollerObserverIF that uses the itemFound method, which the Poller calls whenever the feed has a new item. However, the new item will not be added to the copy in your Channel object unless the observer explicitly adds it. Here is a PollerObserverIF implementation that does not add feed changes to the Channel object but instead prints a notification message to the console:

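import de.nava.informa.core.ChannelIF;
import de.nava.informa.core.ItemIF;
import de.nava.informa.utils.poller.PollerObserverIF;

public class AnObserver implements PollerObserverIF {
  // Called for each new item in the feed; this observer only reports it.
  public void itemFound(ItemIF item, ChannelIF channel) {
    System.out.println("New item found");
  }

  // Called at the start of each polling event.
  public void pollStarted(ChannelIF channel) {
    System.out.println(
      "Started poll with " + channel.getItems().size() +
      " items in channel");
  }

  // Called at the end of each polling event.
  public void pollFinished(ChannelIF channel) {
    System.out.println(
      "Finished poll with " + channel.getItems().size() +
      " items in channel");
  }

  // Channel changes and polling errors are ignored in this example.
  public void channelChanged(ChannelIF channel) {}
  public void channelErrored(ChannelIF channel, Exception e) {}
}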
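As noted earlier, a new item reaches your copy of the channel only if some observer adds it. The following is a minimal, self-contained sketch of that division of labor. It does not use Informa itself: the Item, Channel, and Observer types and the notifyItemFound helper are hypothetical stand-ins invented for this illustration, loosely modeled on ItemIF, ChannelIF, and the itemFound callback, so that the example compiles without the library.

```java
import java.util.ArrayList;
import java.util.List;

public class AddingObserverSketch {
    // Hypothetical stand-ins for Informa's ItemIF and ChannelIF, reduced to
    // what this sketch needs. They are NOT the real library types.
    interface Item { String getTitle(); }

    interface Channel {
        void addItem(Item item);
        List<Item> getItems();
    }

    // Stand-in for the item-notification part of an observer interface.
    interface Observer {
        void itemFound(Item item, Channel channel);
    }

    // Simple in-memory channel used only for this demonstration.
    static class SimpleChannel implements Channel {
        private final List<Item> items = new ArrayList<>();
        public void addItem(Item item) { items.add(item); }
        public List<Item> getItems() { return items; }
    }

    // Reports the new item but leaves the channel untouched.
    static class PrintingObserver implements Observer {
        public void itemFound(Item item, Channel channel) {
            System.out.println("New item found: " + item.getTitle());
        }
    }

    // The single observer given the task of storing new items.
    static class AddingObserver implements Observer {
        public void itemFound(Item item, Channel channel) {
            channel.addItem(item);
        }
    }

    // Hypothetical helper: simulates the notification sent for a new item.
    static void notifyItemFound(List<Observer> observers, Item item, Channel channel) {
        for (Observer observer : observers) {
            observer.itemFound(item, channel);
        }
    }

    public static void main(String[] args) {
        Channel channel = new SimpleChannel();
        Item item = () -> "Hello RSS";

        // With only the printing observer, the channel stays empty.
        List<Observer> observers = new ArrayList<>();
        observers.add(new PrintingObserver());
        notifyItemFound(observers, item, channel);
        System.out.println("Items stored: " + channel.getItems().size());

        // Once an adding observer is attached, the item is stored.
        observers.add(new AddingObserver());
        notifyItemFound(observers, item, channel);
        System.out.println("Items stored: " + channel.getItems().size());
    }
}
```

In this sketch the stored count is 0 after the first notification and 1 after the second. Keeping the adding logic in exactly one observer avoids storing the same item twice when several observers are attached.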

The AnObserver class prints information about the beginning and end of each polling event and reports any new items in the feed, but it does not add new items to the object model.

Warning: An observer does not add new items to the Channel object unless you explicitly call the addItem method. If you have more than one observer attached, one of them should be assigned the task of adding the new item to the Channel.

With real RSS feeds, you'll want to set a polling frequency that doesn't clog the network or the site with unnecessary traffic. A polling period of 60 minutes (the default) or longer should be frequent enough for most sites. The following code fragment uses the observer that we just defined and polls the RSS feed for a previously loaded Channel object every 60 minutes.

Poller poller = new Poller();
poller.addObserver(new AnObserver());
poller.registerChannel(channel);

To use a three-hour interval instead of the default, you can call:

poller.registerChannel(channel, 3 * 60 * 60 * 1000);
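The interval passed above is a plain millisecond count, so a product such as 3 * 60 * 60 * 1000 is easy to mistype. As an optional alternative, the value can be computed with the standard java.util.concurrent.TimeUnit class. This sketch uses only the JDK; the PollingIntervals class and hoursToMillis helper are names made up for the illustration and are not part of Informa.

```java
import java.util.concurrent.TimeUnit;

public class PollingIntervals {
    // Converts a period in hours to a millisecond count.
    static long hoursToMillis(long hours) {
        return TimeUnit.HOURS.toMillis(hours);
    }

    public static void main(String[] args) {
        long threeHours = hoursToMillis(3);
        // Compare against the hand-written product 3 * 60 * 60 * 1000.
        System.out.println(threeHours == 3L * 60 * 60 * 1000);
        System.out.println(threeHours);
    }
}
```

The resulting long could then be passed as the second argument in a call like the registerChannel one shown above.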

Remember that the polling interval is specified in milliseconds! If you are going to filter items from the feed, the observers should not do the filtering. A separate component can approve polled changes before the observers are notified. This keeps the observers focused on their task of propagating changes rather than filtering data. The process is more scalable that way, as you may want many observers to receive approved changes. This filtering and approval process is described in the next section.
