Focusing SQL Queries - The Proportions of Retrieved Data
(Page 4 of 4 )
A typical and frequently quoted saying is the famous “don’t use indexes when your query returns more than 10% of the rows of a table.” This states implicitly that (regular) indexes are efficient when an index key points to 10% or less of the rows in a table. As I have already pointed out in Chapter 3, this rule of thumb dates back to a time when relational databases were still regarded with suspicion in many companies. In those days, their use was mostly confined to that of departmental databases. This was a time when a 100,000–row table was considered a really big one. Compared to 10% of a 500 million–row table, 10% of 100,000 rows is a trifle. Can we seriously hope that the best execution plan in one case will still be the best execution plan in the other case? Such is wishful thinking.
Independently from the evolution of table sizes since the time when the “10% of rows” rule of thumb was first coined, be aware that the number of rows returned means nothing in itself, except in terms of response time expectations by end users. If you compute an average value over 1 billion rows, you return a single row, and yet the DBMS performs a lot of work. Even without any aggregation, what matters is the number of data pages that the DBMS is going to hit when performing the query. Data page hits don’t only depend on the existence of indexes: as you saw in Chapter 3, the relation of indexes to the physical order of rows in the table can make a significant difference in the number of pages to visit. Other implementation issues that I am going to discuss in Chapter 5 play an important part, too: depending on how data is physically stored, the same number of rows returned may mean that you have to visit massively different numbers of data pages. Furthermore, operations that would execute sequentially with one access path may be massively parallelized with another one. Don’t fall into the row percentage trap.
When we want a lot of data, we don’t necessarily want an index.
* A good example would be sqlite, a remarkable storage engine that allows the management of data inside a file using SQL, but that is not a database server.
* The optimizer may also sometimes push criteria down into the view.
* A feature known as skip-scan may allow for searching the index.
| DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware. |
|
This article is excerpted from chapter four of The Art of SQL, written by Stephane Faroult and Peter Robson (O'Reilly; ISBN: 0596008945). Check it out today at your favorite bookstore. Buy this book now.
|
|