advancedPerformance100% Free

SQL Query Optimization

Definition

SQL query optimization is the process of rewriting queries and adjusting database settings to minimize execution time and resource consumption.

Introduction to SQL Query Optimization

Query optimization is both an art and a science. A poorly written query on a 10M-row table can take 30 seconds; the same data retrieved with an optimized query and the right indexes takes 5ms. This guide covers the most impactful techniques used by senior database engineers daily.

Syntax

SQL
-- Always check execution plan first:
EXPLAIN ANALYZE SELECT ...; -- PostgreSQL
EXPLAIN SELECT ...;          -- MySQL
SET STATISTICS IO ON;       -- SQL Server

Examples

1

Avoid SELECT * — Select Only Needed Columns

-- Slow: fetches all columns, more I/O and network
SELECT * FROM orders WHERE customer_id = 42;

-- Fast: fetches only needed columns
SELECT id, total_amount, status, created_at
FROM orders
WHERE customer_id = 42;

SELECT * transfers every column even if your application uses only 3. This wastes I/O, memory, and network bandwidth.

Try in Playground
2

Use EXISTS Instead of COUNT for Existence Checks

-- Slow: counts all matching rows
SELECT * FROM customers
WHERE (SELECT COUNT(*) FROM orders WHERE customer_id = customers.id) > 0;

-- Fast: stops at first match
SELECT * FROM customers
WHERE EXISTS (
  SELECT 1 FROM orders WHERE customer_id = customers.id
);

EXISTS short-circuits at the first match. COUNT(*) must scan all matching rows even when you only need a yes/no answer.

Try in Playground
3

Move Filtering to WHERE, Not HAVING

-- Slow: aggregates all rows, then filters
SELECT dept, COUNT(*)
FROM employees
GROUP BY dept
HAVING dept = 'Engineering';

-- Fast: filters before aggregation
SELECT dept, COUNT(*)
FROM employees
WHERE dept = 'Engineering'
GROUP BY dept;

WHERE filters rows before GROUP BY processes them, dramatically reducing the work done by aggregation.

Try in Playground

Common Mistakes

Using functions on columns in WHERE (e.g., WHERE YEAR(created_at) = 2024) — prevents index usage

Implicit type conversions in JOIN conditions (int vs varchar) — disables indexes

Not using bind parameters — prevents query plan caching

Using DISTINCT when JOIN logic should be fixed instead

Fetching more rows than needed — always use LIMIT when you only need top N results

Frequently Asked Questions

How do you identify slow SQL queries?
Use the database's slow query log: MySQL slow_query_log, PostgreSQL pg_stat_statements, or SQL Server Query Store. These capture queries exceeding a threshold (e.g., >100ms). Then use EXPLAIN/EXPLAIN ANALYZE on slow queries to understand their execution plan and find optimization opportunities.
What is an N+1 query problem in SQL?
The N+1 problem occurs when an application executes 1 query to fetch a list of N records, then executes N individual queries to fetch related data for each record. Fix it by using a single JOIN query to fetch all needed data at once, or use batch loading.
What does EXPLAIN ANALYZE show in SQL?
EXPLAIN ANALYZE executes the query and shows the actual execution plan including: which indexes were used, actual vs estimated row counts, time spent at each step, and join strategies. Look for 'Seq Scan' on large tables (needs an index), high 'actual rows' vs 'estimated rows' discrepancy (stale statistics), and expensive sort operations.
What is query plan caching in SQL databases?
Most databases cache the compiled execution plan for a query so subsequent executions skip the optimization phase. Use parameterized queries / prepared statements to maximize cache hits. Hardcoded literal values (WHERE id = 42) often bypass the cache, while bind parameters (WHERE id = $1) reuse the cached plan.

Related SQL Topics

Practice This in the SQL Playground

Write real queries, see live results, and master SQL Query Optimization hands-on. 100% free.