How We Debugged a 10-Second Database Query Down to 50ms
A step-by-step walkthrough of how we diagnosed and fixed a crippling PostgreSQL performance issue in production, using EXPLAIN ANALYZE, proper indexing, and query restructuring.
The Problem: Dashboard That Took 10 Seconds to Load
A client's analytics dashboard was taking 10+ seconds to load. Users were abandoning it. The CTO was panicking. The backend team said 'it works fine locally.' Sound familiar?
Here's exactly how we traced the issue, fixed it, and brought the response time down to 50ms. No magic — just systematic debugging.
Step 1: Find the Slow Query
Before guessing, we enabled PostgreSQL's slow query log to catch anything over 500ms:
```sql
-- In postgresql.conf
log_min_duration_statement = 500

-- Or for a single session:
SET log_min_duration_statement = 500;
```

The culprit was immediately obvious: a single query joining 4 tables was taking 10.2 seconds:
```sql
SELECT o.id, o.total, c.name, c.email,
       COUNT(oi.id) AS item_count,
       SUM(oi.quantity * oi.unit_price) AS calculated_total
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.created_at >= '2025-01-01'
  AND o.status IN ('completed', 'shipped')
  AND c.region = 'asia-pacific'
GROUP BY o.id, o.total, c.name, c.email
ORDER BY o.created_at DESC
LIMIT 50;
```

Step 2: EXPLAIN ANALYZE — See What PostgreSQL Actually Does
Running `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)` revealed the real problem:
```
Seq Scan on orders o  (cost=0.00..185432.00 rows=892145 width=48)
  Filter: (created_at >= '2025-01-01' AND status = ANY(...))
  Rows Removed by Filter: 3241876
  Buffers: shared hit=12 read=142891
```

A sequential scan over 4 million rows. PostgreSQL was reading the entire orders table from disk because there was no index on created_at or status. The `Buffers: shared hit=12 read=142891` line confirmed massive disk I/O: at PostgreSQL's default 8 kB block size, 142,891 buffers is roughly 1.1 GB read from disk for a single query.
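A quick way to sanity-check what that buffer count implies is to ask PostgreSQL for the table's size directly (table name as used throughout this post):

```sql
-- ~142k buffers at 8 kB each suggests the scan reads on the order of 1 GB
SELECT pg_size_pretty(pg_relation_size('orders'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('orders')) AS with_indexes_and_toast;
```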
Step 3: The Fix — Composite Index + Partial Index
A single-column index on created_at would help, but a composite index targeting this exact query pattern is far better:
```sql
-- Composite index matching the WHERE clause
CREATE INDEX CONCURRENTLY idx_orders_status_created
    ON orders (status, created_at DESC)
    WHERE status IN ('completed', 'shipped');

-- Index on the foreign key join column (often missed!)
CREATE INDEX CONCURRENTLY idx_order_items_order_id
    ON order_items (order_id)
    INCLUDE (quantity, unit_price); -- covering index
```

The `WHERE status IN (...)` clause on the first index makes it a partial index: it only indexes the rows we actually query, keeping it small and fast. The `INCLUDE` clause on the second makes it a covering index: PostgreSQL can satisfy the join and the SUM from the index alone, without touching the table.
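After creating the indexes, it's worth confirming the planner actually uses them rather than assuming it does. Besides re-running EXPLAIN, the cumulative statistics views show whether an index is ever touched; a quick check against the index names above:

```sql
-- idx_scan should start climbing once the dashboard query hits the new indexes;
-- a permanently zero idx_scan means the planner is ignoring the index.
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_orders_status_created', 'idx_order_items_order_id');
```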
Step 4: Query Restructuring with a CTE
The original query calculated aggregates across the entire result set before applying LIMIT. We restructured it to filter first, then aggregate:
```sql
WITH recent_orders AS (
    SELECT o.id, o.total, o.customer_id, o.created_at
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= '2025-01-01'
      AND o.status IN ('completed', 'shipped')
      AND c.region = 'asia-pacific'
    ORDER BY o.created_at DESC
    LIMIT 50
)
SELECT ro.id, ro.total, c.name, c.email,
       COUNT(oi.id) AS item_count,
       SUM(oi.quantity * oi.unit_price) AS calculated_total
FROM recent_orders ro
JOIN customers c ON c.id = ro.customer_id
JOIN order_items oi ON oi.order_id = ro.id
GROUP BY ro.id, ro.total, c.name, c.email, ro.created_at
ORDER BY ro.created_at DESC;
```

This fetches only 50 orders first, then joins. Two details matter: the customers join and the region filter have to live inside the CTE, otherwise the LIMIT would pick the 50 most recent orders before filtering by region and return the wrong rows; and the products join from the original query is gone entirely, since nothing in the SELECT used it. Instead of aggregating across millions of rows, we're joining against 50.
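One simple way to get before/after wall-clock numbers like the ones below is psql's built-in timing (note this measures round-trip from the psql client, so latency seen by the application can differ):

```sql
-- psql meta-command: print an elapsed-time line after every statement
\timing on
-- now re-run the query; psql appends "Time: ... ms" to the result
```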
The Result
| Metric | Before | After |
|---|---|---|
| Query time | 10,200ms | 47ms |
| Rows scanned | 4,134,021 | 50 |
| Disk reads | 142,891 buffers | 12 buffers |
| Index used | None (seq scan) | idx_orders_status_created |
Lessons Learned
1. Always check EXPLAIN ANALYZE in production, not just locally. Local databases have warm caches and small datasets.
2. Foreign key columns need indexes. Django/Rails create them automatically, but raw SQL schemas often miss them.
3. LIMIT doesn't prevent full table scans when combined with GROUP BY. Restructure the query to filter first.
4. Use CONCURRENTLY when creating indexes in production to avoid locking the table.
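Lesson 2 is easy to automate. A catalog query along these lines (one common variant; it only checks whether the foreign key column is the leading column of some index) lists FK columns with no index:

```sql
-- Foreign key columns that are not the first column of any index on their table
SELECT c.conrelid::regclass AS table_name,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = ANY (c.conkey)
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indkey[0] = a.attnum
      );
```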
The most expensive query is the one nobody profiled. Make EXPLAIN ANALYZE part of your code review process for any query touching more than 10K rows.
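On that note, the slow query log from Step 1 isn't the only way to keep watch continuously. A sketch using the pg_stat_statements extension, assuming it is installed and preloaded (column names are for PostgreSQL 13+; older versions call the column mean_time):

```sql
-- Top 10 statements by average execution time across the whole workload
SELECT query, calls,
       round(mean_exec_time::numeric, 1)  AS avg_ms,
       round(total_exec_time::numeric, 1) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```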
— alokknight Engineering
