One of the standard questions I ask when interviewing a candidate with some database or SQL know-how is:
Assume you have a fairly complex select statement operating on some serious tables (tens of millions of rows). The customer complains the SQL statement is too slow. What options would you consider for tuning such a select statement? If it matters, assume it is an Oracle database.
Just in case you want to apply for a job at LINEAS, here is what I would consider a damn good answer:
First I check whether the SQL statement really is slow. For that I execute the statement with a low-level tool (SQL*Plus, SQL Developer …) on the database, fetching the same number of rows as the customer (or his application) does. I do this on the DB machine, or on a machine with very good connectivity to the database, in order to take network issues out of the loop, and I measure the execution time. If it is as slow as the customer describes, the SQL statement plus the DB seem to be the problem. If not, the problem is probably something else (network, application logic …).
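As a sketch, the timing check in SQL*Plus might look like this (the query and table names are hypothetical; ARRAYSIZE should roughly match the application's fetch size so the comparison is fair):

```sql
SET TIMING ON
SET ARRAYSIZE 100
-- TRACEONLY fetches all rows without printing them,
-- so terminal output doesn't distort the measurement:
SET AUTOTRACE TRACEONLY

SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
 WHERE o.status = 'OPEN';
```

If the elapsed time reported here is close to what the customer sees, the statement itself is the suspect; if it is much faster, look at the network or the application.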
I establish a performance target; we need to meet the customer's requirements.
I check the execution plan of the statement, comparing it with the structure of the schema (tables, indexes, number of rows in tables compared to number of rows actually selected). Are there any obvious problems? For example:
- Full table scans that return only a small percentage of the rows of that table
- Index accesses on a large percentage of the rows
- Indexes that don’t get used, although one would expect them to
- Identical expensive pieces of the execution plan appearing in different places of the plan
- Estimated row counts in the execution plan that differ significantly from the actual number of rows returned
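To get the plan, and to compare estimated against actual row counts, Oracle's EXPLAIN PLAN and the DBMS_XPLAN package can be used; a sketch with a hypothetical query:

```sql
-- Show the plan Oracle would choose for the statement:
EXPLAIN PLAN FOR
SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
 WHERE o.status = 'OPEN';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- To see estimated (E-Rows) next to actual (A-Rows) row counts,
-- run the statement once with the GATHER_PLAN_STATISTICS hint,
-- then display the cursor's runtime statistics:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Large gaps between estimated and actual rows usually point at stale or missing statistics.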
Depending on what I find, I would try some of the following actions to improve performance:
- gather statistics
- create an additional index supporting the where clauses used in the statement
- add columns to an index, in order to remove the need for additional table accesses
- use hints to force certain execution plans
- exchange (NOT) IN with (NOT) EXISTS and vice versa
- rearrange the select statement (e.g. turn a union of joins into a join of unions)
- partitioning of tables and indexes
- use materialized views
- in very special cases use bitmap indexes or dimensions
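A few of these actions sketched in Oracle SQL (all schema, table, and index names are hypothetical):

```sql
-- Gather fresh optimizer statistics for a table:
EXEC DBMS_STATS.GATHER_TABLE_STATS('SHOP', 'ORDERS');

-- Index supporting a frequent WHERE clause; the extra column
-- lets queries that only need ORDER_ID skip the table access:
CREATE INDEX orders_status_ix ON orders (status, order_id);

-- Rewrite NOT IN as NOT EXISTS. Careful: the two are not
-- equivalent when the subquery can return NULLs -- NOT IN
-- then returns no rows at all.
SELECT c.customer_id
  FROM customers c
 WHERE NOT EXISTS (SELECT 1
                     FROM orders o
                    WHERE o.customer_id = c.customer_id);
```

These are sketches of the techniques, not drop-in fixes; which one helps depends entirely on what the execution plan revealed.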
With these options the performance of most of the ‘slow’ statements I have encountered so far could be brought to acceptable levels, often improving by multiple orders of magnitude. When I am able to identify the problem but not the solution, Ask Tom is what I would try next.
There is one answer I get very often which I consider pretty bad if it comes without further comment:
Replace outer joins with inner joins
Outer joins have a different meaning than inner joins; normally you can’t substitute one for the other. If you actually can, I’d consider the use of the outer join a bug, so replacing it with an inner join is bug fixing, not tuning.
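A quick illustration with hypothetical tables of why the two are not interchangeable:

```sql
-- The outer join keeps customers without any orders
-- (ORDER_ID is NULL for them):
SELECT c.name, o.order_id
  FROM customers c
  LEFT OUTER JOIN orders o ON o.customer_id = c.customer_id;

-- The inner join silently drops those customers:
SELECT c.name, o.order_id
  FROM customers c
  JOIN orders o ON o.customer_id = c.customer_id;
```

If the result sets happen to be identical, the outer join never did anything, and removing it is a correctness fix rather than a tuning measure.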