Focusing SQL Queries - Lessons to be learned from the Batmobile trade (Page 2 of 4 )
The various examples of SQL that you saw in the preceding section may look like an idle exercise in programming dexterity, but they are more than that. The key point is that there are many different ways in which we can attack the data, and that we don’t necessarily have to go first through customers, then orders, thenorderdetail, and thenarticles as some of the ways of writing the query might suggest.
If we represent the strength of our search criteria with arrows—the more discriminant the criterion, the larger the arrow—we can assume that we have very few customers in Gotham, but that we sell quite a number of Batmobiles and business has been brisk for the past six months, in which case our battle map may look like Figure 4-6. Although we have a condition on the article name, the medium arrow points toorderdetailbecause that is what truly matters. We may have very few articles for sale, which may represent similar percentages of our revenue, or we may have a huge number of articles, of which one of the best sellers is the Batmobile.
Figure 4-6. When query discrimination is based on location
Alternatively, we can assume that most of our customers are indeed based in Gotham, but that very few actually buy Batmobiles, in which case our battle plan will look more like Figure 4-7. It is quite obvious then, that we really have to cut to pieces theorderdetailtable, which is the largest one. The faster we slash this table, the faster our query will run.
Figure 4-7.When query discrimination is based on purchase
Note also—and this is a very important point—that the criterion “during the last six months” is not a very precise one. But what if we change the criterion to specify the last two months and happen to have 10 years of sales history online? In that case, it may be more efficient to get to those recent orders first—which, thanks to some techniques described in Chapter 5, may be clustered together—and then start from there, selecting customers from Gotham, on the one hand, and orders for Batmobiles on the other. To put it another way, the best execution plan does not only depend on the data values, it may also evolve over time.
What then can we conclude from all this? First, that there is more than one way to skin a cat...and that an expression of a query is usually associated with implicit assumptions about the data. With each different expression of a query we will obtain the same result set, but it may be at significantly different speeds. The way we write the query may influence the execution path, especially when we have to apply criteria that cannot be expressed within the truly relational part of the environment. If the optimizer is to be allowed to function at its best, we must try to maximize the amount of true relational processing and ensure the non-relational component has minimum impact on the final result.
We have assumed all along in this chapter that statements will be run as suggested by the way they are written. Be aware though, that an optimizer may rewrite queries—sometimes pretty aggressively. You could argue that rewrites by the optimizer don’t matter, because SQL is supposed to be a declarative language in which you state what you want and let the DBMS provide it. However, you have seen that each time we have rewritten a query in a different way, we have had to change assumptions about the distribution of data and about existing indexes. It is highly important, therefore, to anticipate the work of the optimizer to be certain that it will find what it needs, whether in terms of indexes or in terms of detailed-enough statistical information about the data.
The correct result from an SQL statement is only the first step in building the best SQL.