How to Tune a SQL Statement

published: 2010-06-27

One of my standard questions I ask when interviewing a candidate with some database or SQL know how is:

Assume you have a fairly complex select statement operating on some serious tables (tens of millions of rows). The customer complains the SQL statement is to slow. What options would you consider for tuning such a select statement? If it matters assume it is an Oracle database.

Just in case you want to apply for a job at a LINEAS here is, what I would consider a damn good answer:

First I check if the SQL statement is really slow, for that I'll execute the SQL statement with a low level tool (SQL Plus, SQL Developer ...) on the database. I fetch the same amount of rows, as the customer (or his application) does. I do that on the DB machine or one a machine with very good connectivity to the database, in order to take network issues out of the loop. I measure the execution time. If this is as slow as described by the customer, the SQL statement plus DB seem to be the problem. If not probably something else is the problem (network, application logic ...)

I establish a target performance, we need to meet the customer requirements.

I check the execution plan of the statement comparing it with the structure of the schema (tables, indexes, number of rows in tables compared to number of rows actually selected). Are there any obvious problems? For example;

Full Table Scans that result only in a small percentage of the rows of that table
Index accesses on a large percentage of rows
Indexes that don't get used, although one would expect them to
Identical expensive pieces of the execution plan, on different placesÂ of the execution plan
Estimated number of rows in the execution plan that differ significantly from the actual number of rows returned

Depending on what I find I would try some of the following actions, to improve performance

gather statistics
create an additional index supporting the where clauses used in the statement
add columns to an index, in order to remove the need for additional table accesses
use hints to force certain execution plans
exchange (not) in with (not) exists and vice versa
rearrange the select statement (e.g. turn a union ofÂ joins into a join of union)
partitioning of tables and indexes
use materialized views
in very special cases use bitmap indexes or dimensions

With these options the performance of most of the 'slow' statements I encountered so far could get brought to acceptable levels. Often improving performance by multiple orders of magnitude. When I am able to identify the problem, but not the solution Ask Tom is what I would do next.

There is one answer I get very often, which I consider pretty bad, if it comes without further comment:

Replace outer joins with inner joins

Outer joins have a different meaning than inner joins. Normally you can't replace one for the other. If you actually can, I'd consider using an outer join a bug, so replacing it with an inner join is bug fixing, not tuning.

Talks

Wan't to meet me in person to tell me how stupid I am? You can find me at the following events:

« Possibly the Most Important Skill for Managers, Coaches, Teachers and Parents « » Scrum is a Social Thing »