Home arrow SQL arrow Page 2 - Focusing SQL Queries
SQL

Focusing SQL Queries


In this conclusion to a three-part series that examines the SQL query, you will learn some alternative ways of phrasing a query, and how to query large quantities of data. This article is excerpted from chapter four of The Art of SQL, written by Stephane Faroult and Peter Robson (O'Reilly; ISBN: 0596008945). Copyright © 2006 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

Author Info:
By: O'Reilly Media
Rating: 4 stars4 stars4 stars4 stars4 stars / 26
January 17, 2008
TABLE OF CONTENTS:
  1. · Focusing SQL Queries
  2. · Lessons to be learned from the Batmobile trade
  3. · Querying Large Quantities of Data
  4. · The Proportions of Retrieved Data

print this article
SEARCH DEVARTICLES

Focusing SQL Queries - Lessons to be learned from the Batmobile trade
(Page 2 of 4 )

The various examples of SQL that you saw in the preceding section may look like an idle exercise in programming dexterity, but they are more than that. The key point is that there are many different ways in which we can attack the data, and that we don’t necessarily have to go first through customers, then orders, thenorderdetail, and thenarticles as some of the ways of writing the query might suggest.

If we represent the strength of our search criteria with arrows—the more discriminant the criterion, the larger the arrow—we can assume that we have very few customers in Gotham, but that we sell quite a number of Batmobiles and business has been brisk for the past six months, in which case our battle map may look like Figure 4-6. Although we have a condition on the article name, the medium arrow points toorderdetailbecause that is what truly matters. We may have very few articles for sale, which may represent similar percentages of our revenue, or we may have a huge number of articles, of which one of the best sellers is the Batmobile.


Figure 4-6.  When query discrimination is based on location

Alternatively, we can assume that most of our customers are indeed based in Gotham, but that very few actually buy Batmobiles, in which case our battle plan will look more like Figure 4-7. It is quite obvious then, that we really have to cut to pieces theorderdetailtable, which is the largest one. The faster we slash this table, the faster our query will run.


Figure 4-7.  When query discrimination is based on purchase

Note also—and this is a very important point—that the criterion “during the last six months” is not a very precise one. But what if we change the criterion to specify the last two months and happen to have 10 years of sales history online? In that case, it may be more efficient to get to those recent orders first—which, thanks to some techniques described in Chapter 5, may be clustered together—and then start from there, selecting customers from Gotham, on the one hand, and orders for Batmobiles on the other. To put it another way, the best execution plan does not only depend on the data values, it may also evolve over time.

What then can we conclude from all this? First, that there is more than one way to skin a cat...and that an expression of a query is usually associated with implicit assumptions about the data. With each different expression of a query we will obtain the same result set, but it may be at significantly different speeds. The way we write the query may influence the execution path, especially when we have to apply criteria that cannot be expressed within the truly relational part of the environment. If the optimizer is to be allowed to function at its best, we must try to maximize the amount of true relational processing and ensure the non-relational component has minimum impact on the final result.

We have assumed all along in this chapter that statements will be run as suggested by the way they are written. Be aware though, that an optimizer may rewrite queries—sometimes pretty aggressively. You could argue that rewrites by the optimizer don’t matter, because SQL is supposed to be a declarative language in which you state what you want and let the DBMS provide it. However, you have seen that each time we have rewritten a query in a different way, we have had to change assumptions about the distribution of data and about existing indexes. It is highly important, therefore, to anticipate the work of the optimizer to be certain that it will find what it needs, whether in terms of indexes or in terms of detailed-enough statistical information about the data.

The correct result from an SQL statement is only the first step in building the best SQL.


blog comments powered by Disqus
SQL ARTICLES

- Focusing SQL Queries
- Complex SQL Queries
- A Close Look at the SQL Query
- Generating Reports with SQL Date Handling
- Creating SQL Reports Based on Date Criteria
- SQL Date Handling and Data Trends
- Date Handling
- Introduction to SQL
- Lies, Damn Lies, Statistics, and SQL

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
 
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 
Support 

Developer Shed Affiliates

 




© 2003-2017 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials