Home arrow SQL arrow Page 4 - Complex SQL Queries

Complex SQL Queries

In this second part to a three-part series that covers the SQL query, you'll learn what factors you must take into account for complex queries. This article is excerpted from chapter four of The Art of SQL, written by Stephane Faroult and Peter Robson (O'Reilly; ISBN: 0596008945). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

Author Info:
By: O'Reilly Media
Rating: 3 stars3 stars3 stars3 stars3 stars / 93
January 10, 2008
  1. · Complex SQL Queries
  2. · Number of Other Users
  3. · Evaluation of Filtering Conditions
  4. · Buyers of Batmobiles

print this article

Complex SQL Queries - Buyers of Batmobiles
(Page 4 of 4 )

Assume that we have four tables, namely customers,orders,orderdetail, and a table ofarticles, as shown in Figure 4-5. Please note that in the figure the sizes of the boxes representing each table are more or less proportional to the volume of data in each table, not simply to the number of columns. Primary key columns are underlined.

Figure 4-5.  A classical order schema

Let’s now suppose that our SQL problem is to find the names of all the customers living in the city named “Gotham” who have ordered the article called “Batmobile” during the last six months. We have, of course, several ways to formulate this query; the following is probably what an ANSI SQL fan would write:

  select distinct c.custname
  from customers c
       join orders o
         on o.custid = c.custid
       join orderdetail od
         on od.ordid = o.ordid
       join articles a
         on a.artid = od.artid
where c.city = 'GOTHAM'
and a.artname = 'BATMOBILE'
and o.ordered >= somefunc

somefunc  is supposed to be a function that returns the date six months prior to the current date. Notice too, the presence ofdistinct, which may be required if one of our customers is an especially heavy consumer of Batmobiles and has recently ordered several of them.

Let’s forget for a while that the optimizer may rewrite the query, and look at the execution plan such a statement suggests. First, we walk thecustomerstable, keeping only rows for which the city happens to be Gotham. Then we search theorderstable, which means that thecustidcolumn there had better be indexed, because otherwise the only hope the SQL engine has of executing the query reasonably fast is to perform some sorting and merging or to scan theorderstable to build a hash table and then operate against that. We are going to apply another filter at this level, against the order date. A clever optimizer will not mind finding the filtering condition in thewhereclause and will understand that in order to minimize the amount of data to join it must filter on the date before performing the join. A not so clever optimizer might be tempted to join first, and then filter, and may therefore be grateful to you for specifying the filtering condition with the join condition, as follows:

  join orders o
    on o.custid = c.custid
    and a.ordered >= somefunc

Even if the filtering condition really has nothing to do with the join, it is sometimes difficult for the optimizer to understand when that is the case. If the primary key oforderdetail is defined as(ordid, artid)then, becauseordid is the first attribute of the index, we can make use of that index to find the rows associated with an order as in Chapter 3. But if the primary key happens to be(artid, ordid)(and note, either version is exactly the same as far as relational theory is concerned), then tough luck. Some products may be able to make some use of the index* in that case, but it will not provide the efficient access that(ordid, artid)would have allowed. Other products will be totally unable to use the index. The only circumstance that may save us is the existence of a separate index onordid.

Once we have linkedorderdetails toorders, we can proceed toarticles--without any problem this time since we foundartid, the primary key, inorderdetail. Finally, we can check whether the value inarticles is or is not a Batmobile. Is this the end of the story? Not quite. As instructed bydistinct, we must now sort the resulting set of customer names that have passed across all the filtering layers so as to eliminate duplicates.

It turns out that there are several alternative ways of expressing the query that I’ve just described. One example is to use the older join syntax, as follows:

  select distinct c.custname
  from customers c,
orders o,
orderdetail od,
articles a
where c.city = 'GOTHAM'
and c.custid = o.custid
and o.ordid = od.ordid
and od.artid = a.artid
and a.artname = 'BATMOBILE'
and o.ordered >= somefunc

It may just be old habits dying hard, but I prefer this older way, if only for one simple reason: it makes it slightly more obvious that from a logical point of view the order in which we process data is arbitrary, because the same data will be returned irrespective of the order in which we inspect tables. Certainly thecustomers tableis particularly important, since that is the source from which we obtain the data that is ultimately required, while in this very specific context, all the other tables are used purely to support the remaining selection processes. One really has to understand that there is no one recipe that works for all cases. The pattern of table joins will vary for each situation you encounter. The deciding factor is the nature of the data you are dealing with.

A given approach in SQL may solve one problem, but make another situation worse. The way queries are written is a bit like a drug that may heal one patient but kill another.

Please check back next week for the conclusion to this article.

DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.

blog comments powered by Disqus

- Focusing SQL Queries
- Complex SQL Queries
- A Close Look at the SQL Query
- Generating Reports with SQL Date Handling
- Creating SQL Reports Based on Date Criteria
- SQL Date Handling and Data Trends
- Date Handling
- Introduction to SQL
- Lies, Damn Lies, Statistics, and SQL

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 

Developer Shed Affiliates


© 2003-2018 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials