If you're working on analyzing your web site traffic, keep reading. This article will show you how to create a custom report to analyze web site traffic and sales, based on a high volume of data in Apache logs. It is excerpted from chapter 10 of the book Practical Reporting with Ruby on Rails, written by David Berube (Apress; ISBN: 1590599330).
Calculating Costs by Analyzing Apache Web Logs (Page 1 of 6 )
Atthe time of this writing, the Web is less than 20 years old. While that may seem like a lifetime in software terms, in business terms, the Web is still an infant, with countless new ventures being built around it.
Because the Web is such a new business concept, we’re still developing benchmarks for measuring the success of new approaches. Therefore, it can be difficult to determine exactly how well a fledgling startup is doing, particularly if it’s in a prerevenue phase. Of course, one way to measure success is through advertising revenues gained by traffic. However, many startups have not defined a plan to monetize their traffic. Rather, they hope to create a massive source of traffic and sell out to a large company, which will then devise a plan for deriving revenue from the traffic. As a result, the essential value of those startups is their web traffic, and analyzing that traffic—not to mention finding ways to increase it—is vitally important.
Logically, traffic plays a significant role beyond advertising results. For businesses selling a product or service, traffic is extremely important, because web-based sales are a function of traffic. Accordingly, it makes sense for these businesses to analyze traffic much in the same way brick-and-mortar stores analyze customer visitation and purchase patterns. While some of these questions can be answered by prepackaged tools like AWStats (http://awstats.sourceforge.net/), analyzing complex questions sometimes calls for custom reporting software.
In this chapter, you’ll see how to create a custom report to analyze web site traffic and sales, based on a high volume of data in Apache logs. For this example, you will use Gruff, which was introduced in Chapter 3, plus two new tools: ActiveRecord::Extensions and PDF::Writer. Let’s begin by looking at what these two tools can do for you.
Speeding Up Insertions with ActiveRecord::Extensions
ActiveRecord::Extensions is, as the name implies, a collection of extensions for Active Record. These are generally performance extensions. The extension you’ll use in this chapter’s example allows you to insert multiple rows of data at a time into a single table. This will speed up performance significantly and can reduce memory use, since you won’t need to create large numbers of objects. Since the example calls for inserting large numbers of records into a database from the Apache log, this performance boost will be significant.
ActiveRecord::Extensions is very easy to use. Let’s take a look at a brief example to see how it works. Suppose you want to insert some data into a table named webhosts, which has two columns: domain and description. You could use the following Active Record code:
Webhost.new(:domain=>'www.somecompany.example', :description=>'Some Company Site').save Webhost.new(:domain=>'www.teststore.example', :description=>'A Test Store').save Webhost.new(:domain=>'www.smallblog.example', :description=>'A Small Blog').save
You could also use the following ActiveRecord::Extensions code:
Webhost.import([:domain, :description], [['www.somecompany.example, 'Some Company Site'], ['www.teststore.example', 'A Test Store'], ['www.smallblog.example', 'A Small Blog']]
This method saves the time required to create the Active Record object, which can be very significant. It also runs just one query rather than three. In other words, the first Active Record example produces queries like this:
INSERT INTO webhosts (domain, description) ('www.somecompany.example', 'Some Company Site'); INSERT INTO webhosts (domain, description) ('www.teststore.example', 'A Test Store'); INSERT INTO webhosts (domain, description) ('www.smallblog.example', 'A Small Blog');
On the other hand, the ActiveRecord::Extensions code produces a query like this:
INSERT INTO webhosts (domain, description) ('www.somecompany.example', 'Some Company Site'), ('www.teststore.example', 'A Test Store'), ('www.smallblog.example', 'A Small Blog');
As you can see, the same data is inserted in three queries in the first example but in just one query in the second example. It won’t take very long to insert three records no matter what you do. However, in cases where you need to scale, and even more so when you need to insert large amounts of data at once, the time savings can be quite valuable.
You might initially think that a database like MySQL or PostgreSQL would take the same amount of time to insert a given number of rows, without regard to how many queries are used, but that’s not true. Significant overhead is associated with each additional query. In fact, as you’ll see when we examine the completed solution later in this chapter, the code that inserts Apache log data using this extension will perform at triple the speed of Active Record alone. If you have a more complex situation—with a number of keys or with a large amount of data being read at a time, for example—it will speed up processing even more.
Tip Actually, the fastest way to do MySQL inserts is by using the LOAD DATA INFILE statement. This requires your data to be on disk in CSV format, however. In some cases, you can convert data from another format, write it as CSV, and then load it with LOADDATA INFILE.This will still be faster than using separateINSERTstatements, but the additional complexity may not be worth it, particularly if you’re dealing with dynamic data from, say, a user filling out a form on a web site. Also,LOAD DATA INFILEhas the disadvantage of being MySQL-specific, and you need to place the CSV file on the server or else deal with some security issues. You can find more information about LOAD DATA INFILE, as well as on speeding upINSERTstatements (including the relatively obscure INSERT DELAYED statement) at http://dev.mysql.com/doc/refman/5.0/en/ insert-speed.html.