### Slicing Datasets with SQL Ranking Window Functions In complex business analytics, you frequently need to rank items, segment customers, or classify employees based on numeric metrics (like revenue, scorecards, or salary pools). To do this dynamically without altering your database tables, SQL provides a set of powerful analytical operators called **Ranking Window Functions**: `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, and `NTILE()`. Let's break down exactly how they work, their subtle mathematical differences, and how they behave inside partition boundaries! ---\n ### 👑 Meet the Four Ranking Sovereigns To understand how each function behaves, imagine three employees who all earn the exact same salary of **$80,000**: 1. **`ROW_NUMBER()`** (Sequential Counter): Assigns a strictly sequential, unique integer to every row starting from `1`. It has **no duplicate ranks**, even if values are identical. The sorting order of duplicates is non-deterministic unless you provide an additional tie-breaker sorting column. * *Result for our employees:* `1`, `2`, `3` 2. **`RANK()`** (Olympic-Style ranking with gaps): Assigns the same rank to identical values. However, if there's a tie, it **skips subsequent ranks**. The skip size equals the number of tied elements. This behaves exactly like sports rankings (if two runners tie for Gold, the next runner gets Bronze, and Silver is skipped). * *Result for our employees:* `1`, `1`, `1`, `4` (the next row is fourth!) 3. **`DENSE_RANK()`** (Olympic-Style ranking without gaps): Assigns the same rank to identical values, but it **never skips subsequent ranks**. The next distinct value receives the immediately following integer. * *Result for our employees:* `1`, `1`, `1`, `2` (the next row is second!) 4. **`NTILE(N)`** (Bracket Segmenter): Divides the sorted result set into `N` roughly equal groups (brackets) and assigns the bracket number (`1` to `N`) to each row. This is exceptionally useful for calculating percentiles, quartiles, or segmenting customers into high, medium, and low cohorts. * *Result with `NTILE(3)`:* Segments staff into Top Third (`1`, Middle Third (`2`), and Bottom Third (`3`). ---\n ### 🧱 Mastering the `PARTITION BY` Boundary Clause By default, window functions calculate rankings across the entire active dataset. If you want to compute rankings **separately within each department**, add the **`PARTITION BY`** clause before the `ORDER BY`: ```sql -- Calculate employee rank inside their own department SELECT name, department_id, salary, DENSE_RANK() OVER ( PARTITION BY department_id ORDER BY salary DESC ) as dept_salary_rank FROM employees; ``` #### How the Engine Computes This: 1. **Splits** the table into virtual sub-tables (one per unique `department_id`). 2. **Sorts** rows inside each sub-table separately by salary in descending order. 3. **Ranks** rows inside each sub-table, starting over at `1` for every department! ---\n ### ⚠️ Common Pitfalls for Beginners > [!CAUTION] > **Using Window Functions in the WHERE Clause:** > Attempting to write `SELECT * FROM employees WHERE RANK() OVER(...) <= 3;` will cause a compile error. Window functions are computed during the SELECT phase, *after* the WHERE clause has already filtered rows. To filter on rankings, wrap your query inside a **Common Table Expression (CTE)** or subquery! > > ```sql > WITH ranked_staff AS ( > SELECT name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rank > FROM employees > ) > SELECT * FROM ranked_staff WHERE rank <= 3; > ```