Crawling the Web with Java


Are you playing with the possibilities of Java? This article explores in detail how to build a Web crawler in Java, walking through the SearchCrawler class and its methods. It is excerpted from chapter six of The Art of Java, written by Herbert Schildt and James Holmes (McGraw-Hill, 2004; ISBN: 0072229713).

Author Info:
By: McGraw-Hill/Osborne
June 09, 2005
TABLE OF CONTENTS:
  1. Crawling the Web with Java
  2. Fundamentals of a Web Crawler
  3. An Overview of the Search Crawler
  4. The SearchCrawler Class part 1
  5. The SearchCrawler Class part 2
  6. SearchCrawler Variables and Constructor
  7. The search() Method
  8. The showError() and updateStats() Methods
  9. The addMatch() and verifyURL() Methods
  10. The downloadPage(), removeWwwFromURL(), and
  11. An Overview of Regular Expression Processing
  12. A Close Look at retrieveLinks()
  13. The searchStringMatches() Method
  14. The crawl() Method
  15. Compiling and Running the Search Web Crawler


Crawling the Web with Java - The showError() and updateStats() Methods
(Page 8 of 15 )

The showError( ) Method

The showError( ) method, shown here, displays an error dialog box on the screen with the given message. This method is invoked if any required search options are missing or if there are any problems opening, writing to, or closing the log file.

// Show dialog box with error message.
private void showError(String message) {
  JOptionPane.showMessageDialog(this, message, "Error", 
    JOptionPane.ERROR_MESSAGE);
}

The updateStats( ) Method

The updateStats( ) method, shown here, updates the values displayed in the Stats section of the interface:

// Update crawling stats.
private void updateStats(
  String crawling, int crawled, int toCrawl, int maxUrls)
{
  crawlingLabel2.setText(crawling);
  crawledLabel2.setText("" + crawled);
  toCrawlLabel2.setText("" + toCrawl);

  // Update progress bar.
  if (maxUrls == -1) {
    progressBar.setMaximum(crawled + toCrawl);
  } else {
    progressBar.setMaximum(maxUrls);
  }
  progressBar.setValue(crawled);

  // Update the Search Matches label.
  matchesLabel2.setText("" + table.getRowCount());
}

First, the crawling results are updated to reflect the current URL being crawled, the number of URLs crawled thus far, and the number of URLs that are left to crawl. Take note that the URLs to Crawl field may be misleading. It displays the number of links that have been aggregated and put in the To Crawl queue, not the difference between the specified maximum URLs and the number of URLs that have been crawled thus far. Notice also that when setText( ) is called with crawled and toCrawl, it is passed an empty string ("") plus an int value. Concatenating an int with a string makes Java convert the int value into a String object, which the setText( ) method requires.
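The string conversion described above can be tried in isolation. The following is a minimal sketch, not part of the SearchCrawler listing: the class IntToLabelText and its two methods are hypothetical names chosen for illustration. It compares the empty-string concatenation used by updateStats( ) with the explicit String.valueOf( ) call, which yields the same text.

```java
// Hypothetical helper class for illustration; it is not part of SearchCrawler.
class IntToLabelText {
  // Concatenating with an empty string, as updateStats() does.
  static String viaConcatenation(int value) {
    return "" + value;
  }

  // Explicit conversion that produces the same text.
  static String viaValueOf(int value) {
    return String.valueOf(value);
  }

  public static void main(String[] args) {
    int crawled = 42; // sample value, assumed for the demo
    System.out.println(viaConcatenation(crawled));
    System.out.println(viaValueOf(crawled));
  }
}
```

Either form produces a String that setText( ) will accept. The concatenation is shorter to type, while String.valueOf( ) states the intent more directly.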

Next, the progress bar is updated to reflect the current progress made toward finishing crawling. If the Max URLs to Crawl text field was left blank, which specifies that crawling should not be capped, the maxUrls variable will have the value -1. In this case, the progress bar's maximum is set to the number of URLs that have been crawled plus the number of URLs left to crawl. If, on the other hand, a Max URLs to Crawl value was specified, it will be used as the progress bar's maximum. After establishing the progress bar's maximum value, its current value is set. The JProgressBar class uses the maximum and current values to calculate the percentage shown in text on the progress bar.
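The arithmetic behind the progress bar can be sketched without any Swing components. The class ProgressMath, the constant NO_LIMIT, and both method names below are assumptions made for this illustration and do not appear in the SearchCrawler listing. The percentage helper assumes the bar's minimum is 0 and rounds to a whole number, so it only approximates the figure JProgressBar would display.

```java
// Hypothetical helper class for illustration; it is not part of SearchCrawler.
class ProgressMath {
  // Sentinel meaning "no Max URLs to Crawl value was entered".
  static final int NO_LIMIT = -1;

  // Mirrors the if/else in updateStats(): with no cap, the maximum
  // grows with the queue; otherwise the cap itself is the maximum.
  static int progressMaximum(int crawled, int toCrawl, int maxUrls) {
    return (maxUrls == NO_LIMIT) ? crawled + toCrawl : maxUrls;
  }

  // Whole-number percentage of the maximum reached so far,
  // assuming the progress bar's minimum is 0.
  static int percentComplete(int crawled, int maximum) {
    if (maximum <= 0) {
      return 0; // nothing to measure against yet
    }
    return (int) Math.round(100.0 * crawled / maximum);
  }

  public static void main(String[] args) {
    // Uncapped crawl: 3 pages done, 7 queued.
    int uncapped = progressMaximum(3, 7, NO_LIMIT);
    System.out.println(uncapped + " -> " + percentComplete(3, uncapped) + "%");

    // Capped crawl: same counts, but a limit of 50 URLs.
    int capped = progressMaximum(3, 7, 50);
    System.out.println(capped + " -> " + percentComplete(3, capped) + "%");
  }
}
```

One consequence of the uncapped case is that the maximum changes as new links are queued, so the displayed percentage can move backward even while the crawl advances.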

Finally, the Search Matches label is updated to reflect the current number of URLs that contain the specified search string.

