Query Optimization: How a Database Turns SQL Into a Fast Execution Plan (Visualized)
When you run a SQL query, the database does not execute it literally โ a query optimizer rewrites it into the cheapest execution plan it can find. This guide explains the planner, EXPLAIN, cost-based optimization, scan and join algorithms, predicate pushdown, covering indexes, and common anti-patterns โ with live animations of each idea.
Query optimization is the process by which a database's query planner transforms a declarative SQL statement into the cheapest physical execution plan that produces the correct result. SQL only says what data you want; it never says how to fetch it. The optimizer's entire job is to invent the how.
The same query can be executed in dozens of ways: scan a whole table or jump through an index, join tables in one order or another, use a nested loop or build a hash table. Each choice has a wildly different cost. A good plan can return in milliseconds; a bad plan for the identical query can take minutes. Understanding the optimizer is the difference between a database that scales and one that falls over under load.
The Query Planner: From SQL to a Plan Tree
Before any rows are read, the database runs a pipeline. (1) Parse the SQL text into a syntax tree. (2) Rewrite it โ expand views, flatten subqueries, apply constant folding. (3) Plan/optimize โ enumerate candidate execution strategies and estimate the cost of each. (4) Execute the winning plan. The output of step 3 is a plan tree: a tree of physical operators (scans, joins, sorts, aggregates) where rows flow up from the leaves to the root.
In PostgreSQL the cost-based optimizer is built around a search over join orders and access methods. MySQL's optimizer plays the same game with a different search strategy. Either way, the planner is choosing a tree, and you can see that tree with EXPLAIN.
Reading EXPLAIN and EXPLAIN ANALYZE
EXPLAIN prints the plan the optimizer chose, with estimated row counts and costs โ without running the query. EXPLAIN ANALYZE actually executes it and reports real timings and row counts alongside the estimates. The gap between estimated and actual rows is the single most useful debugging signal you have: if the planner thinks an operation returns 5 rows but it returns 5 million, its statistics are stale and every downstream choice is suspect.
EXPLAIN ANALYZE
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'DE'
AND o.created_at >= '2024-01-01';
-- Nested Loop (cost=0.42..douple rows=120 width=16)
-- (actual time=0.03..1.21 rows=118 loops=1)
-- -> Index Scan using customers_country_idx on customers c
-- -> Index Scan using orders_customer_id_idx on orders o
-- Planning Time: 0.31 ms
-- Execution Time: 1.34 msRead the tree from the inside out and bottom up: the indented children run first and feed their parent. The animation below shows rows streaming up a plan tree, with each operator lighting up as it processes its input.
Cost-Based Optimization and Statistics
Modern optimizers are cost-based: they assign an estimated cost to every candidate plan and pick the cheapest. Cost is a unitless number combining estimated disk page reads, CPU work per row, and (for some operators) memory and startup cost. To estimate how many rows an operator emits โ its selectivity โ the planner relies on statistics the database collects about each column: row counts, the number of distinct values, histograms of value distribution, and most-common-value lists.
These statistics are gathered by ANALYZE in PostgreSQL or ANALYZE TABLE in MySQL, usually automatically. When they go stale โ after a bulk load, a big delete, or skewed growth โ the planner estimates badly and picks bad plans. The classic fix for a query that suddenly got slow is to re-run ANALYZE so the optimizer sees reality again.
Index Scan vs Sequential Scan
The most fundamental access decision is how to read a table. A sequential scan reads every row and checks the predicate on each โ O(n) work, but cheap per row because it reads pages in order. An index scan walks a B-tree to jump straight to matching rows โ O(log n) to locate, but each match may need a random fetch into the table heap.
Counter-intuitively, an index is not always faster. If a predicate matches most of the table, a sequential scan wins because random index lookups cost more than reading pages in bulk. The optimizer uses selectivity to decide: high selectivity (few matching rows) favors the index; low selectivity (many matches) favors the seq scan. The animation contrasts the two on the same dataset.
Join Algorithms: Nested Loop, Hash, Merge
When a query joins two tables, the optimizer picks one of three physical join strategies. Nested loop join takes each row of the outer table and probes the inner table for matches โ cheap when the outer side is tiny or the inner side has an index, but quadratic if both sides are large. Hash join builds an in-memory hash table from the smaller table, then streams the larger one through it โ excellent for big, unsorted, equality joins. Merge join sorts both inputs on the join key and walks them in lockstep โ ideal when inputs are already sorted (for example, by an index).
| Join | How it works | Cost (n outer, m inner) | Best when |
|---|---|---|---|
| Nested loop | For each outer row, probe inner | O(n ร m), or O(n log m) with index | Small outer side, indexed inner |
| Hash join | Build hash table on smaller side, probe with larger | O(n + m) | Large unsorted equality joins |
| Merge join | Sort both, walk in lockstep | O(n log n + m log m), or O(n + m) if pre-sorted | Inputs already sorted on key |
The animation below contrasts a nested loop (re-scanning the inner side for every outer row) with a hash join (one build pass, then O(1) probes). Watch how the comparison counter explodes for the nested loop while the hash join stays linear.
Predicate Pushdown
Predicate pushdown is an optimization that moves filters as close to the data source as possible โ ideally into the scan itself, before rows are joined, sorted, or shipped across a network. Filtering 10 million rows down to 200 at the scan means every operator above it does dramatically less work. The same idea powers columnar and distributed engines, where pushing predicates into the storage layer (or a remote shard) avoids reading and transferring data you will only discard.
Covering Indexes
A covering index contains every column a query needs, so the database answers the query from the index alone and never touches the table heap. This is called an index-only scan. Because the heap fetch (the random I/O that often dominates an index scan) disappears, covering indexes can turn a slow query fast. In PostgreSQL you add non-key columns with INCLUDE; in MySQL's InnoDB, the secondary index already stores the primary key, and you extend it by adding columns to the index definition.
-- Without covering: index finds rows, then heap fetch for o.total
CREATE INDEX ix_orders_customer ON orders (customer_id);
-- Covering: total is in the index, so an index-only scan suffices
CREATE INDEX ix_orders_customer_cov
ON orders (customer_id) INCLUDE (total);
SELECT total FROM orders WHERE customer_id = 42;
-- Index Only Scan using ix_orders_customer_cov (Heap Fetches: 0)Common Anti-Patterns
N+1 queries: an application loads a list with one query, then fires one more query per row to fetch related data โ 1 + N round trips where a single join or batched IN would do. ORMs cause this silently; the fix is eager loading or a join. SELECT *: fetching every column defeats covering indexes, ships unused bytes, and breaks when the schema changes โ select only the columns you need.
Non-sargable predicates: a predicate is sargable (Search ARGument ABLE) if the database can use an index for it. Wrapping a column in a function โ WHERE LOWER(email) = 'a@b.com' or WHERE created_at::date = '2024-01-01' โ forces a full scan because the index is built on the raw column, not the transformed value. Rewrite to a range (created_at >= '2024-01-01' AND created_at < '2024-01-02') or add an expression index on the transformation.
How to Tune a Slow Query
Run EXPLAIN ANALYZE and look for three things: a large gap between estimated and actual rows (stale statistics โ run ANALYZE), a sequential scan where you expected an index (missing or unusable index, often a non-sargable predicate), and a join algorithm that mismatches the data sizes (often a symptom of bad row estimates). Fix the estimate first; the plan usually fixes itself.
Frequently Asked Questions
Why does the optimizer sometimes ignore my index?
Usually because it estimates the predicate is not selective enough โ if a query matches a large fraction of the table, a sequential scan is genuinely cheaper than thousands of random index lookups. Other causes are stale statistics (run ANALYZE), a non-sargable predicate that wraps the column in a function, or a type mismatch between the column and the literal. Check EXPLAIN ANALYZE to see the estimated vs actual rows.
What is the difference between EXPLAIN and EXPLAIN ANALYZE?
EXPLAIN shows the chosen plan with the optimizer's estimates and does not run the query. EXPLAIN ANALYZE actually executes the query and reports the real timing and row counts next to the estimates, so you can spot where the planner guessed wrong. Use ANALYZE for debugging, but remember it runs the query โ avoid it on expensive writes.
Does adding more indexes always make queries faster?
No. Each index speeds up matching reads but slows down every insert, update, and delete because the index must be maintained, and it consumes storage and memory. Too many indexes also give the optimizer more plans to consider and more chances to choose wrong. Index the columns your real queries filter and join on, prefer composite and covering indexes over many single-column ones, and drop indexes that EXPLAIN never uses.
SQL tells the database what you want; the optimizer decides how to get it. Learn to read the plan, and a slow query stops being a mystery and becomes a checklist.
โ alokknight Engineering
