Home arrow Ruby-on-Rails arrow Calculating Statistics with Active Record

Calculating Statistics with Active Record

If you've done any database work, you know that many of them have built-in features that let you reach the data and sometimes even analyze it. This article will show you how to perform two tasks using Ruby's Active Record. It is excerpted from chapter two of the book Practical Reporting with Ruby and Rails, written by David Berube (Apress; ISBN: 1590599330).

Author Info:
By: Apress Publishing
Rating: 5 stars5 stars5 stars5 stars5 stars / 4
March 16, 2010
  1. · Calculating Statistics with Active Record
  2. · Analyzing Data with Grouping and Aggregates
  3. · Calculating Salary Distribution
  4. · Calculating Drink/Win Distribution
  5. · Dissecting the Code

print this article

Calculating Statistics with Active Record
(Page 1 of 5 )

The previous chapter discussed the fundamentals of accessing and manipulating data with Active Record. The statistical analyses--the highest salary, average salary, and so forth--were done manually using Ruby code. While that's a plausible approach, it's easier and often quicker to let the database do the work for you.

Databases typically have numerous built-in features for speeding up data access. Indexes, for example, are subsets of your table data, which are automatically maintained by your database and can make searching much faster. You can think of indexes like the table of contents in a book. It's much faster to find something by using the table of contents than it is to read every page of the book looking for the desired information. Additionally, the database's query planner uses speed-enhancing techniques automatically. This query planner has access to statistical information on the various tables and columns that your query uses, and it will formulate a query plan based on that information. In other words, it estimates how long each method of retrieving the data you requested will take, and it uses the quickest method. Because of the capabilities of the database, it's typically best to use the techniques described in this chapter, as they are considerably faster than doing your statistics in your Ruby code.

In this chapter, you'll learn how to use the database to perform two common tasks: grouping and aggregation. Let's look at how these tasks are useful, and then work through an example that uses them for reporting.

Grouping and Aggregation

Grouping refers to a way to reduce a table into a subset, where each row in the subset represents the set of records having a particular grouped value or values. For example, if you were tracking automobile accidents, and you had a table of persons, with their age and number of accidents, you could group by age and retrieve every distinct age in the database. In other words, you would get a list of the age of every person, with the duplicates removed.

If you were using an Active Record model named
Person with an age column, you could find all of the distinct ages of the people involved, as follows:

ages = Person.find(:all, :group=>'age')

However, to perform useful work on grouped queries, you'll typically use aggregate functions. For example, you'll need to use aggregate functions to retrieve the average accidents per age group or the count of the people in each age group.

You've probably encountered a number of aggregate functions already. Some common ones are MAX and MIN, which give you the maximum and minimum value; AVG, which gives you the average value; SUM, which returns the sum of the values; and COUNT, which returns the total number of values. Each database engine may define different statistical functions, but nearly all provide those just mentioned.

Continuing with the Active Record model named Person with an age column, you could find the highest age from your table as follows:

oldest_age = Person.calculate(:max, :age)

Note that calculate takes the max function's name, as a symbol, as its first argument, but Active Record also has a number of convenience functions named after their respective purposes: count, sum, minimum, maximum, and average. For example, the following two lines are identical:

average_accident_count = Person.calculate(:avg, :accident_count) average_accident_count = Person.average(:accident_count)

Both print out the average number of accidents for all rows .

Note  The calculate form takes the abbreviated version of the function name, such as avg for average. However, the shortcut form takes a longer version. For example, you could use either
Person.calculate(:avg, :age) or
Person.average(:age). This is confusing, but the idea is that the calculate form passes your function directly to your database , so you can use any statistical function defined in your database, whereas the convenience functions are fixed, so they can have easier to understand names.

You can also combine grouping and aggregate functions. For example, if you wanted to print the average accident count for each age, you could do so as follows:

Person.calculate(:avg, :accident_count, :group=>'age').each do |player|
  age, accident_count_average = *player
  puts "Average Accident Count #{'%0.3f' % accident_count_average} for age #{age}"

Note that the object passed to the block is an array. In the array, the grouped field comes first, followed by the calculated field. (If you wanted to group by more than one field, it would be in the order specified by the group parameter.)

Depending on your data, your results would look something like this:

Average Accident Count 3.000 for age 18 Average Accident Count 2.020 for age 19 Average Accident Count 1.010 for age 20
. . .

However, for more complex queries, you may need to craft a SQL statement manually. Specifically, you can use find_by_sql to search for any arbitrary SQL query, and this allows you to ask for virtually any type of information. For example, if a vehicle_model column existed in your Accident model, you could group by vehicle_model and run two aggregate functions: average age and average accident_count. This would give you an average owner age and an average number of accidents per vehicle, so you could get some idea whether, say, a Ford Explorer was a safer ride than a Honda CRX. (Of course, driver control would play a significant part, and hence the average
age is a helpful statistic.) You can't use multiple aggregate functions with calculate, but you can with find_by_sql, like this:

sql = "SELECT vehicle_model,
AVG(age) as average_age,
              AVG(accident_count) AS average_accident_count
FROM persons
           BY vehicle_model
Person.find_by_sql(sql).each do |row|
  puts "#{row.vehicle_model}, " <<
       "avg. age: #{row.average_age}, " <<
       "avg. accidents: #{row.average_accident_count}"

The output would look something like this:

Ford Explorer, avg. age: 43.010, avg. accidents: 0.400
Honda CRX, avg. age: 18.720, avg. accidents: 1.250
. . .

Let's put this knowledge to use with a more complicated example.

blog comments powered by Disqus

- Ruby-on-Rails Faces Second Security Flaw in ...
- Ruby 2.0 Prepped for February 2013 Release
- Why LinkedIn Switched from Ruby on Rails
- Adding Style with Action Pack
- Handling HTML in Templates with Action Pack
- Filters, Controllers and Helpers in Action P...
- Action Pack and Controller Filters
- Action Pack Categories and Events
- Logging Out, Events and Templates with Actio...
- Action Pack Sessions and Architecture
- More on Action Pack Partial Templates
- Action Pack Partial Templates
- Displaying Error Messages with the Action Pa...
- Action Pack Request Parameters
- Creating an Action Pack Registration Form

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 

Developer Shed Affiliates


© 2003-2019 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials