Creating a User Interface for a Search Service
By: O'Reilly Media
    2006-11-22

    Table of Contents:
  • Creating a User Interface for a Search Service
  • Changes to the Original Code to Fit the JSP
  • Setting Up the Indexer
  • Making Use of the Configuration Service



    Creating a User Interface for a Search Service - Making Use of the Configuration Service


    (Page 4 of 4 )

    If we jump straight in and start using the search as it’s currently configured, we’ll notice a problem. Our searches are returning lots of results—more than can be possible given the number of products in the database. In fact, a search for “dog” returns over 20 results, even though there are only 6 dogs in the database.

    This is happening because of the brute-force nature of the crawling service. Without extra help, the crawler finds every link on every page and follows it, adding the results to the index. The problem is that in addition to the links that allow users to browse animals in the catalog, there are also links that allow users to add the animals to their shopping carts, links to let them remove those items from their carts, links to a sign-in page (which, by default in jPetStore, loads with real credentials stored in the text boxes), and a live link for “Login,” which the crawler will happily follow, thus generating an entirely new set of links with a session ID attached to them.

    We need to make sure our crawler doesn’t get suckered into following all the extraneous links and generating more results than are helpful for our users. In the first part of Chapter 9, we talked about the three major problems that turn up in a naïve approach to crawling a site:

    Infinite loops
       Once a link has been followed, the crawler must ignore it.

    Off-site jumps
       Since we are looking at http://localhost/jpetstore, we don’t want links to external resources to be indexed: that would lead to indexing the entire Internet (or, at least, blowing up the application due to memory problems after hours of trying).

    Pages that shouldn’t be indexed
       In this case, that’s pages like the sign-in page, any page with a session ID attached to it, and so on.

    Our crawler/indexer service handles the first two issues for us automatically. Let’s go back and look at the code. The IndexLinks class has three collections it consults every time it considers a new link:

      Set linksAlreadyFollowed = new HashSet();
      HashSet linkPrefixesToFollow = new HashSet();
      HashSet linkPrefixesToAvoid = new HashSet();

    Every time a link is followed, it gets added to linksAlreadyFollowed. The crawler never revisits a link stored here. The other two collections hold the link prefixes that are allowed and the ones that are denied. When we call IndexLinks.setInitialLink, we add the root link to the linkPrefixesToFollow set:

      linkPrefixesToFollow.add(new URL(initialLink));

    IndexLinks also exposes a method, initAvoidPrefixesFromSystemProperties, which tells the IndexLinks bean to read the configured system properties in order to initialize the list:

      public void initAvoidPrefixesFromSystemProperties() throws MalformedURLException {
        String avoidPrefixes = System.getProperty("com.relevance.ss.AvoidLinks");
        if (avoidPrefixes == null || avoidPrefixes.length() == 0) return;
        String[] prefixes = avoidPrefixes.split(" ");
        if (prefixes != null && prefixes.length != 0) {
          setAvoidPrefixes(prefixes);
        }
      }
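
    The property value is a single string of prefixes separated by spaces. As a rough illustration of that format (not the Simple Spider's own code), the sketch below parses such a value into a list. The class name AvoidPrefixParsingSketch and the helper parseAvoidPrefixes are hypothetical, and it splits on runs of whitespace rather than on a single space, which is a small variation on the original split(" ").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: illustrates the space-separated property format only.
class AvoidPrefixParsingSketch {
    // Property name taken from the article's code.
    static final String PROPERTY = "com.relevance.ss.AvoidLinks";

    // Splits a space-separated property value into individual prefixes.
    // Assumption: splitting on "\\s+" (runs of whitespace) instead of " ",
    // so accidental double spaces do not produce empty entries.
    static List<String> parseAvoidPrefixes(String value) {
        String trimmed = (value == null) ? "" : value.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        // Stand-in for the value that the configuration would normally supply.
        System.setProperty(PROPERTY,
            "http://localhost:8080/jpetstore/shop/signonForm.do "
          + "http://localhost:8080/jpetstore/shop/viewCart.do");
        for (String prefix : parseAvoidPrefixes(System.getProperty(PROPERTY))) {
            System.out.println(prefix);
        }
    }
}
```

    The same value could also be supplied on the command line with the JVM's -D option, for example -Dcom.relevance.ss.AvoidLinks="...".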

    First, the logic for considering a link checks to make sure the new link matches one of the prefixes in linkPrefixesToFollow. For us, the only value stored there is http://localhost/jpetstore. If it is a subpage of that prefix, we make sure the link doesn’t match one of the prefixes in linkPrefixesToAvoid.
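
    The decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual IndexLinks implementation: the class LinkFilterSketch and its methods are hypothetical, it stores prefixes as plain strings (the original adds URL objects), it assumes a simple startsWith comparison, and the viewProduct.do link in main is a made-up sample.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the three-collection check; not the Simple Spider's code.
class LinkFilterSketch {
    private final Set<String> linksAlreadyFollowed = new HashSet<>();
    private final Set<String> linkPrefixesToFollow = new HashSet<>();
    private final Set<String> linkPrefixesToAvoid = new HashSet<>();

    void addFollowPrefix(String prefix) { linkPrefixesToFollow.add(prefix); }
    void addAvoidPrefix(String prefix) { linkPrefixesToAvoid.add(prefix); }

    // Returns true if the link should be crawled, and records it as followed.
    boolean shouldFollow(String link) {
        if (linksAlreadyFollowed.contains(link)) return false;      // infinite loops
        if (!matchesAny(linkPrefixesToFollow, link)) return false;  // off-site jumps
        if (matchesAny(linkPrefixesToAvoid, link)) return false;    // pages not to index
        linksAlreadyFollowed.add(link);
        return true;
    }

    // Assumption: a prefix "matches" when the link text starts with it.
    private static boolean matchesAny(Set<String> prefixes, String link) {
        for (String prefix : prefixes) {
            if (link.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinkFilterSketch filter = new LinkFilterSketch();
        filter.addFollowPrefix("http://localhost:8080/jpetstore");
        filter.addAvoidPrefix("http://localhost:8080/jpetstore/shop/signonForm.do");

        String[] candidates = {
            "http://localhost:8080/jpetstore/shop/viewProduct.do", // sample catalog link
            "http://localhost:8080/jpetstore/shop/viewProduct.do", // same link again
            "http://localhost:8080/jpetstore/shop/signonForm.do",  // avoided prefix
            "http://example.com/"                                   // off-site
        };
        for (String link : candidates) {
            System.out.println(filter.shouldFollow(link) + "  " + link);
        }
    }
}
```

    Only the first candidate is accepted; the repeat, the sign-on form, and the off-site link are each rejected by a different one of the three checks.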

    A special side note: good code documentation is an important part of maintainability and flexibility. Notice the rather severe lack of comments in the code for the Simple Spider. On the other hand, it has rather lengthy method and type names (like initAvoidPrefixesFromSystemProperties), which make comments redundant, since they clearly describe the entity at hand. Good naming, not strict commenting discipline, is often the key to code readability.

    All that remains is to populate the linkPrefixesToAvoid collection. ConsoleSearch already calls initAvoidPrefixesFromSystemProperties for us, so we only have to add the necessary values to the com.relevance.ss.properties file:

      AvoidLinks=http://localhost:8080/jpetstore/shop/signonForm.do http://localhost:8080/jpetstore/shop/viewCart.do http://localhost:8080/jpetstore/shop/searchProducts.do http://localhost:8080/jpetstore/shop/viewCategory.do;jsessionid= http://localhost:8080/jpetstore/shop/addItemToCart.do http://localhost:8080/jpetstore/shop/removeItemFromCart.do

    These prefixes represent, in order, the sign-on form of the application, any links that show the current user’s cart, the results of another search, any pages that are the result of a successful logon, pages that add items to a user’s cart, and pages that remove items from a user’s cart.
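
    To see how such an entry behaves once loaded, here is a small sketch using java.util.Properties. It is an illustration under stated assumptions, not the book's configuration service: the class AvoidLinksConfigSketch and its helpers are hypothetical, the configuration text is embedded in a string instead of being read from the real file, the prefix comparison is assumed to be a simple startsWith, and the two links checked in main are made-up samples. The entry is written with trailing backslashes, the line-continuation syntax that java.util.Properties supports, purely to keep the lines short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: loads the AvoidLinks entry and tests links against it.
class AvoidLinksConfigSketch {
    // Same six prefixes as the article; a trailing backslash continues the value.
    static final String CONFIG =
        "AvoidLinks=http://localhost:8080/jpetstore/shop/signonForm.do \\\n"
      + "    http://localhost:8080/jpetstore/shop/viewCart.do \\\n"
      + "    http://localhost:8080/jpetstore/shop/searchProducts.do \\\n"
      + "    http://localhost:8080/jpetstore/shop/viewCategory.do;jsessionid= \\\n"
      + "    http://localhost:8080/jpetstore/shop/addItemToCart.do \\\n"
      + "    http://localhost:8080/jpetstore/shop/removeItemFromCart.do\n";

    // Parses properties text and returns the whitespace-separated prefixes.
    static String[] loadAvoidPrefixes(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("AvoidLinks", "").trim();
        return value.isEmpty() ? new String[0] : value.split("\\s+");
    }

    // Assumption: a link is avoided when it starts with any configured prefix.
    static boolean isAvoided(String link, String[] prefixes) {
        for (String prefix : prefixes) {
            if (link.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] prefixes = loadAvoidPrefixes(CONFIG);
        // Made-up sample links: one carrying a session ID, one without.
        String withSession =
            "http://localhost:8080/jpetstore/shop/viewCategory.do;jsessionid=ABC123";
        String plain = "http://localhost:8080/jpetstore/shop/viewCategory.do";
        System.out.println(isAvoided(withSession, prefixes));
        System.out.println(isAvoided(plain, prefixes));
    }
}
```

    Under these assumptions, the category page with a session ID is skipped while the plain category page is still crawled, which is the effect the viewCategory.do;jsessionid= prefix is meant to have.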

    Principles in Action

    1. Keep it simple: use existing Properties tools, not XML
    2. Choose the right tools: java.util.Properties
    3. Do one thing, and do it well: the service worries about following provided links; the configuration files worry about deciding what links can be followed
    4. Strive for transparency: the service doesn’t know ahead of time what kinds of links will be acceptable; configuration files make that decision transparent to the service
    5. Allow for extension: expandable list of allowable link types

    Please check back next week for the conclusion of this article.




    This article is excerpted from chapter 10 of the book Better, Faster, Lighter Java, written by Bruce A. Tate and Justin Gehtland (O'Reilly; ISBN: 0596006764).
