intermediatePerformanceFree Guide

Database Indexing Explained

Definition

A database index is a separate data structure that stores a sorted subset of table column values with pointers to full rows, enabling fast data retrieval without full table scans.

Introduction to Database Indexing Explained

Indexing is the most impactful performance optimization in relational databases. Without indexes, every query that searches by a non-primary-key column must read every row in the table — a full table scan. With the right index, the database can jump directly to matching rows in O(log n) time using a B-tree structure.

Key Takeaways

Indexes trade write performance for read performance — every INSERT/UPDATE/DELETE must update all indexes
B-tree indexes are the default in most databases and support range queries (<, >, BETWEEN)
The primary key is always indexed; foreign keys should almost always be indexed
Composite indexes follow the leftmost prefix rule
Covering indexes eliminate the need to access the actual table rows
EXPLAIN/EXPLAIN ANALYZE shows whether your query uses indexes

Real-World Examples & SQL Schema

Create a Simple Index

-- Without index: 1M row table scan in ~500ms
SELECT * FROM users WHERE email = 'alice@example.com';

-- Create index:
CREATE INDEX idx_users_email ON users(email);

-- Same query with index: ~1ms
SELECT * FROM users WHERE email = 'alice@example.com';

A B-tree index on email reduces lookup from O(n) to O(log n).

Run code in Playground

Composite Index — Leftmost Prefix Rule

CREATE INDEX idx_orders ON orders(customer_id, status, created_at);

-- Uses index (leftmost columns):
SELECT * FROM orders WHERE customer_id = 5;
SELECT * FROM orders WHERE customer_id = 5 AND status = 'pending';

-- Does NOT use index (skips customer_id):
SELECT * FROM orders WHERE status = 'pending';

Composite indexes work left-to-right. Design column order based on query patterns.

Run code in Playground

Primary Use Cases

Columns frequently used in WHERE clauses

Foreign key columns used in JOINs

Columns used in ORDER BY for sort-heavy queries

Columns with high cardinality (many unique values)

Columns in GROUP BY for aggregate queries

Frequently Asked Questions

How does a database index work internally?

Most database indexes are B-trees (balanced trees). The B-tree stores column values in sorted order with pointers to the table rows. When you search WHERE col = 'value', the database traverses the B-tree from root to leaf in O(log n) comparisons, then follows the row pointer to retrieve the full record. This is dramatically faster than reading every row.

How many indexes should a table have?

There's no universal rule, but a good guideline: index every foreign key, index columns frequently used in WHERE/JOIN/ORDER BY, and avoid over-indexing write-heavy tables. Most OLTP tables have 3-8 indexes. Run EXPLAIN on your most frequent and slowest queries to identify what indexes are needed.

Do indexes slow down INSERT and UPDATE?

Yes. Every index on a table must be updated when data changes. A table with 10 indexes requires 10 index updates per INSERT. This is why write-heavy tables should have fewer indexes. The tradeoff: optimize for reads (more indexes) or writes (fewer indexes) based on your workload pattern.

Related Concepts

Query Optimization Transactions Normalization Replication