The ability to search a website can be invaluable to visitors. Even if you have a website that searches a product database, your visitors might want to search the contents of your entire site. This article helps you set up such a service. It is excerpted from chapter 10 of the book Better, Faster, Lighter Java, written by Bruce A. Tate and Justin Gehtland (O'Reilly; ISBN: 0596006764).
Adding a Search Service to a Java Application
The previous chapter introduced the workhorse Simple Spider service with its console-based user interface and web service endpoint. In this chapter, we see how easy it is to add the Spider to an existing application, jPetStore. Some might argue the jPetStore already has a search tool; but that tool only searches the database of animals in the pet store, not all the pages on the site. Our customer needs to search the entire site; jPetStore has at least one page in the current version that isn’t searchable at all (the Help page) and text describing the different animals that doesn’t show up in a query.
We’ll add the Spider to the jPetStore, paying careful attention to what we need to change in the code in order to enable the integration. In addition, we will replace the existing persistence layer with Hibernate. By carefully adhering to our core principles, our code will be reusable, and since the jPetStore is based on a lightweight framework (Spring), it doesn’t make unreasonable demands on our code in order to incorporate the search capability or the new persistence layer. Coming and going, the inclusion will be simple and almost completely transparent.
A Brief Look at the Existing Search Feature
The search feature that comes with jPetStore takes one or more keywords separated by spaces and returns a list of animals with a name or category that includes the term. A search for “dog” turns up six results, while a search for “snake” nets one. However, a search for “venomless” gets no results, even though animal EST-11 is called the Venomless Rattlesnake. Even worse, none of the other pages (such as the Help page) shows up in the search at all; neither will any other pages you might add, unless they’re an animal entry in the database.
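The behavior described above can be sketched in a few lines of Java. This is only an illustration, not the jPetStore code: the `Product` class, its fields, the sample rows, and the rule that a product matches when any keyword appears in its name or category are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class KeywordSearchSketch {

    /** Hypothetical stand-in for a product row; not the real jPetStore class. */
    static final class Product {
        final String name;
        final String category;
        final String itemDetail; // stored, but never consulted by the search

        Product(String name, String category, String itemDetail) {
            this.name = name;
            this.category = category;
            this.itemDetail = itemDetail;
        }
    }

    /** Returns products whose name or category contains any space-separated keyword. */
    static List<Product> search(List<Product> catalog, String keywords) {
        List<Product> hits = new ArrayList<>();
        String normalized = keywords.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return hits;
        }
        String[] terms = normalized.split("\\s+");
        for (Product product : catalog) {
            String name = product.name.toLowerCase(Locale.ROOT);
            String category = product.category.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (name.contains(term) || category.contains(term)) {
                    hits.add(product);
                    break; // one matching keyword is enough
                }
            }
        }
        return hits;
    }

    /** Illustrative sample rows, invented for this sketch. */
    static List<Product> sampleCatalog() {
        return Arrays.asList(
                new Product("Rattlesnake", "Reptiles", "Venomless Rattlesnake"),
                new Product("Bulldog", "Dogs", "Male Adult Bulldog"));
    }

    public static void main(String[] args) {
        List<Product> catalog = sampleCatalog();
        System.out.println("snake hits: " + search(catalog, "snake").size());
        System.out.println("venomless hits: " + search(catalog, "venomless").size());
    }
}
```

Because the item-level text is never compared against the keywords, a term that appears only there cannot produce a hit, no matter how the rest of the search is tuned.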
The search feature has the following architecture (shown in Figure 10-1):
Any page of the jPetStore application may contain a search entry box with a Search button.
Clicking the button fires a request (for /shop/searchProducts.do) passing the keywords along as part of the request.
petstore-servlet.xml, the configuration file for the MVC portion of the jPetStore Spring application, has the following definition:
This creates a handler for the “/shop/searchProducts.do” request and maps it to an instance of SearchProductsController, passing along an instance of petStoreImpl called petStore.
SearchProductsController creates an instance of a class that implements the ProductsDao interface, asking it to search the database for the specified keywords.
ProductsDao queries the database and creates an instance of Product for each returned row.
ProductDao passes a HashMap containing all of the Product instances back to SearchProductsController.
SearchProductsController creates a new ModelAndView instance, passing in the name of the JSP page to display the results (SearchProducts) and the HashMap of values. The JSP page then renders the results using the PagedListHolder control (a list/table with built-in paging functionality).
Figure 10-1. The original jPetStore search architecture
Only the ProductsDao knows how to interact with the underlying data. Product is a straightforward class with information about each product, and the view (SearchProducts.jsp) simply iterates through the returned results to create the output page.
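The division of labor in this architecture can be sketched without Spring on the classpath. The types below are simplified stand-ins that borrow the names used in the text; the method names (`search`, `handleRequest`), the choice of product id as the map key, and the in-memory DAO are assumptions for illustration, not the actual jPetStore or Spring signatures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SearchFlowSketch {

    /** Simplified product; the real class carries more information. */
    static final class Product {
        final String id;
        final String name;

        Product(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Only implementations of this interface touch the underlying data. */
    interface ProductsDao {
        Map<String, Product> search(String keywords);
    }

    /** Stand-in for Spring's ModelAndView: a view name plus the model values. */
    static final class ModelAndView {
        final String viewName;
        final Map<String, Product> model;

        ModelAndView(String viewName, Map<String, Product> model) {
            this.viewName = viewName;
            this.model = model;
        }
    }

    /** The controller delegates the query and only chooses the view. */
    static final class SearchProductsController {
        private final ProductsDao dao;

        SearchProductsController(ProductsDao dao) {
            this.dao = dao;
        }

        ModelAndView handleRequest(String keywords) {
            Map<String, Product> results = dao.search(keywords);
            return new ModelAndView("SearchProducts", results);
        }
    }

    /** In-memory DAO used only by this sketch; a real one would query the database. */
    static ProductsDao inMemoryDao(final List<Product> rows) {
        return keywords -> {
            Map<String, Product> hits = new HashMap<>();
            String term = keywords.trim().toLowerCase(Locale.ROOT);
            for (Product row : rows) {
                if (!term.isEmpty() && row.name.toLowerCase(Locale.ROOT).contains(term)) {
                    hits.put(row.id, row);
                }
            }
            return hits;
        };
    }
}
```

Since the controller depends only on the `ProductsDao` interface, the data access behind it can be swapped without touching the controller or the view, which is the property the rest of the chapter relies on.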
Deciding on the Spider
We’ve identified how the current search feature works and its limitations: the search feature only searches products in the database, not the site as a whole, and even then it doesn’t search all available data about the products. The results it returns are extremely limited—though well-formatted.
Where the existing search focuses on the database, the Simple Spider is crawler-based: it searches everywhere on the site, not just the products table, and it treats any textual information visible to users as part of the search domain. The Spider does have one major limitation: because it is based on a web crawler, it can only catalog pages that are linked from other pages on the site. If a page is only accessible via some server-side logic (for instance, selecting a product from a drop-down list and submitting the form to the server, which returns a client-side or server-side redirect), the crawler never reaches that page and it won’t be part of the search.
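The reachability limit is easy to see in a small sketch. The code below is not the Simple Spider itself; it is a plain breadth-first traversal over a hypothetical link graph, and the page paths are invented for the example. A page that no other page links to never enters the queue, so it never gets cataloged.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CrawlReachabilitySketch {

    /** Follows links breadth-first from the start page and returns every page reached. */
    static Set<String> crawl(Map<String, List<String>> links, String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            List<String> targets = links.getOrDefault(page, Collections.<String>emptyList());
            for (String target : targets) {
                // Set.add returns false for pages already seen, which prevents loops.
                if (visited.add(target)) {
                    queue.add(target);
                }
            }
        }
        return visited;
    }

    /** Hypothetical site: page path mapped to the pages it links to. */
    static Map<String, List<String>> sampleSite() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("/index.html", Arrays.asList("/help.html", "/catalog.html"));
        links.put("/catalog.html", Arrays.asList("/product.html", "/index.html"));
        // Exists on the server, but is reached only by submitting a form.
        links.put("/form-result.html", Collections.<String>emptyList());
        return links;
    }

    public static void main(String[] args) {
        System.out.println(crawl(sampleSite(), "/index.html"));
    }
}
```

In this sample graph the help page is found because the index page links to it, while `/form-result.html` is skipped even though it exists on the server.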
With a problem like this, in which a feature of the application is too limited to be of much service to our users, we have to decide between refining the existing service and replacing it entirely. The limitation of the jPetStore search is partly due to the fundamental nature of the service (it searches the database, not the site). Refining it to accomplish the full-site search would be horribly inefficient. The Spider is the obvious solution, but we must consider what we are already dealing with (remember, you are what you eat). If jPetStore uses a lot of server-side logic to handle navigation, the Spider simply won’t be able to provide a complete catalog. In this case, though, all the navigation on the site is handled client-side, so the Spider is a perfect fit for solving our problem and coexisting with our current application.