
Tracking the News with Google News

In this article, you'll create a graphical report from Google News RSS data, using a handy utility called FeedTools and a plug-in called CSS Graphs Helper. This article is excerpted from chapter 11 of the book Practical Reporting with Ruby and Rails, written by David Berube (Apress; ISBN: 1590599330).

Author Info:
By: Apress Publishing
April 15, 2010

Company News Coverage Reporting

Suppose your company is investing heavily in public relations in the hope that the media coverage will lead to sales. However, because so much money is being spent on a public relations firm, your managers don't want to spend additional money on a "press clipping" service. A press clipping service would monitor how well the public relations firm is doing by searching newspapers for stories, typically called clippings, about your firm.

Your boss hopes you can do something similar and create a report that details how many times a day your company is mentioned in the press. Fortunately, it's easy to do this with Google News. You've decided to implement two programs to produce this report: a Ruby script that loads Google News stories into a database and a Rails application that performs the actual reporting.

Loading the Data

As noted, you'll use FeedTools to help load the data into a database. Install FeedTools using the following command:

gem install -y feedtools

Need to update xx gems from http://gems.rubyforge.org
Successfully installed feedtools-x.y.zz
Installing ri documentation for feedtools-x.y.zz...
Installing RDoc documentation for feedtools-x.y.zz...


You also need to create a database named company_pr and edit the establish_connection line at the top of the following loader script. (However, note that this code will automatically load a config/database.yml file if it exists, so if you run this application from the Rails application directory you'll create later, you don't need to edit the establish_connection line.)

Now create the loader script, as shown in Listing 11-1.

Listing 11-1. RSS Loader (rss_loader.rb)

require 'feed_tools'
require 'active_record'
require 'cgi'

(puts "usage: #{$0} query"; exit) unless ARGV.length==1

# If there's a config/database.yml file, like you'd find in a Rails app,
# read from that . . .

if File.exist?('./config/database.yml')
  require 'yaml'
  ActiveRecord::Base.establish_connection(
    YAML.load(File.read('config/database.yml'))['development'])
else
  # . . . otherwise, connect to the default settings.
  ActiveRecord::Base.establish_connection(
    :adapter  => "mysql",
    :host     => "your_mysql_hostname_here",
    :username => "your_mysql_username_here",
    :password => "your_mysql_password_here",
    :database => "company_pr")
end

class Stories < ActiveRecord::Base
end

unless Stories.table_exists? # If this is the first time running this app,
                             # create the tables we need.

  ActiveRecord::Schema.define do
    create_table :stories do |t|
      t.column :guid, :string
      t.column :title, :string
      t.column :source, :string
      t.column :url, :string
      t.column :published_at, :datetime
      t.column :created_at, :datetime
    end

    create_table :cached_feeds do |t|
      t.column :url, :string
      t.column :title, :string
      t.column :href, :string
      t.column :link, :string
      t.column :feed_data, :text
      t.column :feed_data_type, :string, :length=>25
      t.column :http_headers, :text
      t.column :last_retrieved, :datetime
    end

    # Without the following line,
    # you can't retrieve large results
    # like those we use in this script.

    execute "ALTER TABLE cached_feeds
            CHANGE COLUMN feed_data feed_data MEDIUMTEXT;"
  end
end

output_format = 'rss'
per_page = 100
query = ARGV[0]
query_encoded = CGI.escape(query) # CGI.escape will escape values like "&"
                                  # that would mess up our URL.

# per_page needs to_s: String#<< would treat an Integer as a character code.
feed_url = "http://news.google.com/news" <<
           "?hl=en&ned=us&ie=UTF8" <<
           "&num=" << per_page.to_s <<
           "&output=" << output_format <<
           "&q=" << query_encoded

# Set up our cache:

FeedTools.configurations[:feed_cache] = "FeedTools::DatabaseFeedCache"

# Create our feed object:

feed = FeedTools::Feed.open(feed_url)

if !feed.live?

  puts "feed is cached..."
  puts "last retrieved: #{ feed.last_retrieved }"
  puts "expires: #{ feed.last_retrieved + feed.time_to_live }"

else
  feed.items.each do |feed_story|
    # Assumption: the guid lookup and the guid, url, and save lines below
    # are reconstructed from the schema and the other lookups.
    if not (Stories.find_by_title(feed_story.title) or
            Stories.find_by_url(feed_story.link) or
            Stories.find_by_guid(feed_story.guid))
      puts "processing story '#{feed_story.title}' - new"
      Stories.new do |new_story|
        new_story.title = feed_story.title.gsub(/<[^>]*>/, '') # strip HTML
        new_story.guid = feed_story.guid
        new_story.source = feed_story.publisher.name if feed_story.publisher.name
        new_story.url = feed_story.link
        new_story.published_at = feed_story.published
        new_story.save
      end
    else
      # do nothing
    end
  end
end

Save this script as rss_loader.rb.
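
The URL-building step of the loader can also be tried on its own, without FeedTools or a database. The following is only a sketch: build_feed_url is a hypothetical helper name, and it uses URI.encode_www_form from Ruby's standard library instead of the string appends in the listing (URI.encode itself was removed in Ruby 3.0). It shows how a query containing "&" is escaped, and it leaves the Integer-to-String conversion of the page size to the library, because appending an Integer with String#<< adds a character code rather than digits.

```ruby
require 'uri'

# Hypothetical helper (not part of the loader script): assembles a Google News
# RSS URL with the same parameters the loader uses.
def build_feed_url(query, per_page = 100, output_format = 'rss')
  params = {
    'hl'     => 'en',
    'ned'    => 'us',
    'ie'     => 'UTF8',
    'num'    => per_page,       # converted to "100" by encode_www_form
    'output' => output_format,
    'q'      => query           # "&" and spaces are escaped here
  }
  'http://news.google.com/news?' + URI.encode_www_form(params)
end

puts build_feed_url('AT&T')
# prints http://news.google.com/news?hl=en&ned=us&ie=UTF8&num=100&output=rss&q=AT%26T
```

Because the ampersand in the query becomes %26, it can't be mistaken for a parameter separator.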

You can run this script as follows:

ruby rss_loader.rb Microsoft

processing story 'Microsoft Exchange Troubleshooting Assistant v1.1 (MSI)
ZDNet' - new
processing story 'Being MVP and posting Microsoft copyrighted material without
.. - ZDNet UK' - new
. . .
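
Before a title is stored, the loader strips HTML tags from it with a regular expression. Here is that step in isolation, as a rough sketch; strip_tags is a hypothetical name, and the sample title is made up for illustration.

```ruby
# Hypothetical helper wrapping the gsub call from the loader script.
# The pattern removes anything that looks like <...>; it is a quick
# cleanup, not a full HTML parser.
def strip_tags(text)
  text.gsub(/<[^>]*>/, '')
end

puts strip_tags('<b>Microsoft</b> Delays Windows Server 2008')
# prints Microsoft Delays Windows Server 2008
```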

Note  You may get errors about require_gem being obsolete , but the script should still run fine.

Now, if you run the script again, it will detect that the feed was recently loaded and is in the cache, so it will exit:

ruby rss_loader.rb Microsoft

feed is cached...
last retrieved: Sun Sep 02 18:55:12 UTC 2007
expires: Sun Sep 02 19:55:12 UTC 2007
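
The "expires" value is simply the retrieval time plus the feed's time to live, which is how the script computes it. The sketch below reproduces that arithmetic with plain Time objects. It assumes the time to live is a number of seconds (the one-hour gap in the output above suggests 3,600), and cache_expired? is a hypothetical helper, not FeedTools code.

```ruby
# Hypothetical helper: true once the cached copy has outlived its time to live.
def cache_expired?(last_retrieved, time_to_live, now = Time.now)
  now >= last_retrieved + time_to_live
end

last_retrieved = Time.utc(2007, 9, 2, 18, 55, 12)
one_hour = 60 * 60 # seconds (assumed, based on the sample output)

puts last_retrieved + one_hour
# prints 2007-09-02 19:55:12 UTC
puts cache_expired?(last_retrieved, one_hour, Time.utc(2007, 9, 2, 19, 0, 0))
# prints false
puts cache_expired?(last_retrieved, one_hour, Time.utc(2007, 9, 2, 20, 0, 0))
# prints true
```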

Not all reporting requires a script. SQL itself can be used for reporting from the MySQL client. This approach is very useful if you want to find a few pieces of information (or just one). Let's use a SQL query to verify that the data has been inserted into the MySQL database, as shown in Listing 11-2.

Listing 11-2. SQL to Verify Data Loading

mysql company_pr -u your_mysql_username -p
Password: your_password_here

mysql> SELECT id,
              CONCAT(LEFT(title,40),
                     CASE WHEN(LENGTH(title)>40)
                          THEN '...'
                          ELSE '' END) AS story_title
         FROM stories;

Running the query in Listing 11-2 produces results similar to the following:

+-----+---------------------------------------------+
| id  | story_title                                 |
+-----+---------------------------------------------+
|   1 | Judge approves final settlement in Iowa ... |
|   2 | Microsoft Webcast: Security Series (Part... |
|   3 | Security Showdown - Redmond Channel Part... |
|   4 | Linux: Hasta la Vista, Microsoft! - LXer... |
|   5 | Microsoft Vista desktops don&#39;t play...  |
|   6 | Major Computer Viruses Over 25 Years - F... |
|   7 | Ford Syncs Up with Microsoft to Smooth t... |
|   8 | Sony connects with Microsoft&#39;s DRM ...  |
|   9 | HP&#39;s MediaSmart Server Launch Delaye... |
|  10 | Customize Microsoft Management Console (... |
|  11 | Microsoft keeps businesses connected - B... |
|  12 | Yahoo! ups the ante in e-mail - Times On... |
|  13 | Microsoft Antitrust Settlement Is a Succ... |
|  14 | Microsoft settles eight year patent case... |
|  15 | Microsoft Delays Windows Server 2008 - C... |
. . .
| 124 | Microsoft Exchange Troubleshooting Assis... |
| 125 | Being MVP and posting Microsoft copyrigh... |
+-----+---------------------------------------------+

95 rows in set (0.00 sec)

Of course, you'll get different results depending on the stories that are current when you run the script. Note that apostrophes are represented as &#39;, which is the HTML numeric entity for the ASCII apostrophe ('). These entities display correctly in a web browser, but you'll need to decode them if you intend to display the titles in, say, a PDF or text file. (You might see other numeric entities; they all begin with &#.)
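
If you do need plain text, Ruby's standard library can decode these entities. The following minimal sketch uses CGI.unescapeHTML on one of the sample titles; how you fetch the title from the database is up to you.

```ruby
require 'cgi'

# Decode numeric HTML entities such as &#39; back into plain characters.
encoded = 'Microsoft Vista desktops don&#39;t play'
puts CGI.unescapeHTML(encoded)
# prints Microsoft Vista desktops don't play
```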

Additionally, note the call to CONCAT, which has three parts. The first part is the call to CONCAT itself, which joins two strings together. The next two parts are the strings to join. The first string is LEFT(title,40), which pulls out the leftmost 40 characters of the story's title. The second string is CASE WHEN LENGTH(title)>40 THEN '...' ELSE '' END, which evaluates to three periods if the title is longer than 40 characters and to an empty string otherwise. In other words, if the title is longer than 40 characters, display the first 40 characters of the title followed by three periods.

Note  Strictly speaking, the mark after the 40-character maximum in this example should be an ellipsis, not three periods. An ellipsis is a single character whose dots are set closer together, so all three fit in the width of one character. However, text-only applications, like the MySQL console, don't have ellipses.
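
The same truncation rule can be written in Ruby, which is handy if you later want to shorten titles in application code rather than in SQL. This is just a sketch; truncate_title is a hypothetical name, and the sample title is invented. It counts characters, whereas MySQL's LENGTH counts bytes, so the two can disagree on titles with multibyte characters.

```ruby
# Hypothetical helper mirroring CONCAT(LEFT(title,40), CASE ... END):
# keep the first max characters and append three periods only when
# the title is longer than max.
def truncate_title(title, max = 40)
  title.length > max ? title[0, max] + '...' : title
end

puts truncate_title('Microsoft settles eight year patent case against rival')
# prints Microsoft settles eight year patent case...
puts truncate_title('Short title')
# prints Short title
```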

Now let's take a look at the code in the loading script.
