How a Regional Bank Rebuilt Its Credit Risk Audit System Using SQL — A Real-World Case Study

Q: What is the core focus of the study 'How a Regional Bank Rebuilt Its Credit Risk Audit System Using SQL — A Real-World Case Study'?

Discover how a mid-sized regional bank rearchitected its credit risk audit database from a flat reporting structure into a normalized, queryable system built for performance.

Q: Which database schema is recommended to practice this study?

To practice the queries and concepts in this article, we recommend using the 'apexbank' database schema available in the SQLMarrow playground.

# How a Regional Bank Rebuilt Its Credit Risk Audit System Using SQL — A Real-World Case Study *SQLMarrow Case Studies · Advanced · 10 min read* --- There's a moment every database engineer dreads: someone in compliance walks over to your desk, sits down calmly, and says — *"We need to pull every loan that was approved in the last 18 months where the applicant's debt-to-income ratio was above 43%, cross-referenced against accounts that have since missed two or more payments."* And you smile, nod, and quietly realize your current schema was never designed for that question. This is the story of how a mid-sized regional bank — around 2.3 million customers, roughly $18 billion in assets under management — rearchitected its credit risk audit database from a flat reporting structure into a properly normalized, queryable system built to answer exactly that kind of question. Instantly. At 3am before a regulatory deadline. Without breaking production. Every schema decision, every query, every painful index rebuild is documented here. Let's get into it. --- ## The Starting Point: A Schema That Grew Without a Plan When the bank's data team inherited the system, the core loan database looked something like this — one massive table that had accumulated columns for six years: ```sql -- The original monolith (simplified) CREATE TABLE loans_all ( loan_id INT, customer_name VARCHAR(200), customer_email VARCHAR(200), loan_amount DECIMAL(15,2), interest_rate DECIMAL(5,4), loan_type VARCHAR(50), approved_date DATE, approval_officer VARCHAR(100), credit_score INT, annual_income DECIMAL(15,2), total_debt DECIMAL(15,2), property_value DECIMAL(15,2), payment_jan DECIMAL(10,2), payment_feb DECIMAL(10,2), payment_mar DECIMAL(10,2), -- ... payment columns through December, for every year payment_jan_2024 DECIMAL(10,2), payment_feb_2024 DECIMAL(10,2), -- ... you get the idea missed_payments INT, account_status VARCHAR(20) ); ``` This table had 94 columns. Payment history was stored as individual monthly columns rather than rows. Customer data was duplicated across every loan — so if a customer had three loans, their name and income appeared three times, and updating their contact details required touching multiple rows with no guarantee of consistency. Queries were slow. JOIN-based analysis was impossible because there was nothing to join *against*. And any question that involved payment history trends required either a 94-column pivot in application code or a stored procedure so long it had its own documentation page. The compliance team's questions were getting harder. The schema had to change. --- ## Step 1 — Normalization: Breaking the Monolith Apart The first task was decomposing the flat table into a properly normalized relational schema. The team identified four distinct entities hiding inside `loans_all`: ```sql -- Customers: one row per person, not per loan CREATE TABLE customers ( customer_id INT PRIMARY KEY, full_name VARCHAR(200) NOT NULL, email VARCHAR(200) UNIQUE NOT NULL, date_of_birth DATE, created_at DATETIME2(0) DEFAULT GETDATE() ); -- Loan applications: the financial snapshot at approval time CREATE TABLE loan_applications ( loan_id INT PRIMARY KEY, customer_id INT NOT NULL REFERENCES customers(customer_id), loan_type VARCHAR(50) NOT NULL, -- MORTGAGE, AUTO, PERSONAL, HELOC principal_amount DECIMAL(15,2) NOT NULL, interest_rate DECIMAL(5,4) NOT NULL, term_months INT NOT NULL, approved_date DATE NOT NULL, approval_officer VARCHAR(100), -- Credit profile at the moment of approval (never updated) credit_score_at_approval INT NOT NULL, annual_income_at_approval DECIMAL(15,2) NOT NULL, total_debt_at_approval DECIMAL(15,2) NOT NULL, property_value DECIMAL(15,2) ); -- Loan accounts: the live account state (updated regularly) CREATE TABLE loan_accounts ( account_id INT PRIMARY KEY, loan_id INT NOT NULL REFERENCES loan_applications(loan_id), account_status VARCHAR(20) NOT NULL DEFAULT 'CURRENT', -- CURRENT, DELINQUENT, DEFAULT, PAID_OFF, CHARGED_OFF current_balance DECIMAL(15,2) NOT NULL, missed_payments INT NOT NULL DEFAULT 0, last_payment_date DATE, updated_at DATETIME2(0) DEFAULT GETDATE() ); -- Payment ledger: every payment event as its own row CREATE TABLE payment_history ( payment_id BIGINT PRIMARY KEY IDENTITY, account_id INT NOT NULL REFERENCES loan_accounts(account_id), payment_date DATE NOT NULL, amount_due DECIMAL(10,2) NOT NULL, amount_paid DECIMAL(10,2) NOT NULL, payment_status VARCHAR(20) NOT NULL, -- ON_TIME, LATE_30, LATE_60, LATE_90, MISSED days_late INT DEFAULT 0 ); ``` Three things the team got right here that are worth calling out: **Separating approval-time data from live account data.** The `loan_applications` table stores the credit score, income, and debt figures *as they were at approval*. These never change — they're the basis on which the loan was made. The `loan_accounts` table stores the live state that changes every month. Keeping these separate means compliance can ask "what was this customer's DTI when we approved them?" without that answer being contaminated by their current financial situation. **Payment history as rows, not columns.** The old schema had 24+ payment columns. The new `payment_history` table has one row per payment event. This makes every trend query — rolling averages, delinquency streaks, early payoff detection — a standard aggregation instead of a nightmare pivot. **Status as a controlled vocabulary.** Account status and payment status are constrained to known values at the application level (and ideally via `CHECK` constraints). When every status is a known token, reporting queries don't need `LIKE` wildcards and fuzzy string matching. --- ## Step 2 — The Core Credit Risk Query: Debt-to-Income at Approval The compliance team's first question: flag every loan approved in the last 18 months where the applicant's debt-to-income ratio exceeded 43% — the standard regulatory threshold for qualified mortgages in the US — and cross-reference it against accounts currently showing delinquency. This is a query that would have required a 40-line stored procedure on the old schema. On the new one: ```sql -- Flag high-DTI approvals that have since gone delinquent WITH dti_at_approval AS ( SELECT la.loan_id, la.customer_id, la.loan_type, la.principal_amount, la.approved_date, la.credit_score_at_approval, -- DTI: total monthly debt obligations / gross monthly income ROUND( (la.total_debt_at_approval / (la.annual_income_at_approval / 12.0)) * 100, 2 ) AS dti_ratio_pct FROM loan_applications la WHERE la.approved_date >= DATEADD(month, -18, GETDATE()) ), delinquent_accounts AS ( SELECT acc.loan_id, acc.account_status, acc.missed_payments, acc.current_balance FROM loan_accounts acc WHERE acc.account_status IN ('DELINQUENT', 'DEFAULT') OR acc.missed_payments >= 2 ) SELECT c.full_name, c.email, dti.loan_type, dti.principal_amount, dti.approved_date, dti.credit_score_at_approval, dti.dti_ratio_pct, da.account_status, da.missed_payments, da.current_balance FROM dti_at_approval dti JOIN delinquent_accounts da ON dti.loan_id = da.loan_id JOIN customers c ON dti.customer_id = c.customer_id WHERE dti.dti_ratio_pct > 43 ORDER BY dti.dti_ratio_pct DESC; ``` The CTE structure here isn't just stylistic. `dti_at_approval` is reusable — the team runs it against different thresholds for different loan types. And isolating the delinquency filter in its own CTE means that logic can be tightened (say, changing `>= 2` to `>= 1`) without touching the DTI calculation. --- ## Step 3 — Payment Behavior Analysis with Window Functions The second request from compliance was trickier: identify customers who had three or more consecutive missed payments within any 6-month window. This is the kind of early warning signal that predicts charge-off before the account status has been formally updated. This is a classic gaps-and-islands problem, and it's where window functions earn their keep: ```sql -- Detect streaks of consecutive missed payments WITH payment_flags AS ( SELECT ph.account_id, ph.payment_date, ph.payment_status, -- Flag missed payments as 1, on-time as 0 CASE WHEN ph.payment_status = 'MISSED' THEN 1 ELSE 0 END AS is_missed, -- Assign a "group" to consecutive missed-payment streaks ROW_NUMBER() OVER (PARTITION BY ph.account_id ORDER BY ph.payment_date) - ROW_NUMBER() OVER (PARTITION BY ph.account_id, ph.payment_status ORDER BY ph.payment_date) AS streak_group FROM payment_history ph ), missed_streaks AS ( SELECT account_id, MIN(payment_date) AS streak_start, MAX(payment_date) AS streak_end, COUNT(*) AS consecutive_misses FROM payment_flags WHERE is_missed = 1 GROUP BY account_id, streak_group HAVING COUNT(*) >= 3 ) SELECT c.full_name, la.loan_type, la.principal_amount, ms.streak_start, ms.streak_end, ms.consecutive_misses, acc.account_status, acc.current_balance FROM missed_streaks ms JOIN loan_accounts acc ON ms.account_id = acc.account_id JOIN loan_applications la ON acc.loan_id = la.loan_id JOIN customers c ON la.customer_id = c.customer_id ORDER BY ms.consecutive_misses DESC, ms.streak_start; ``` The double `ROW_NUMBER()` trick in `payment_flags` is the heart of the gaps-and-islands approach. By subtracting a partition-aware row number from an overall row number, consecutive rows with the same status get the same `streak_group` value. This turns what would otherwise be a recursive or cursor-based loop into a clean, set-based aggregation. --- ## Step 4 — Portfolio Risk Exposure by Loan Officer The third question came from the Chief Risk Officer: which approval officers had the highest concentration of now-delinquent loans in their approval history, and what was the average DTI they were approving? This is exactly the kind of accountability query that flat schemas make politically sensitive but technically trivial in a normalized design: ```sql SELECT la.approval_officer, COUNT(la.loan_id) AS total_approvals, COUNT(da.loan_id) AS delinquent_count, ROUND(COUNT(da.loan_id) * 100.0 / COUNT(la.loan_id), 1) AS delinquency_rate_pct, ROUND(AVG( (la.total_debt_at_approval / (la.annual_income_at_approval / 12.0)) * 100 ), 2) AS avg_dti_approved, SUM(CASE WHEN da.loan_id IS NOT NULL THEN acc.current_balance ELSE 0 END) AS exposed_balance FROM loan_applications la LEFT JOIN loan_accounts acc ON la.loan_id = acc.loan_id LEFT JOIN ( SELECT loan_id FROM loan_accounts WHERE account_status IN ('DELINQUENT', 'DEFAULT') OR missed_payments >= 2 ) da ON la.loan_id = da.loan_id WHERE la.approved_date >= DATEADD(year, -2, GETDATE()) GROUP BY la.approval_officer HAVING COUNT(la.loan_id) >= 10 -- Only officers with meaningful sample sizes ORDER BY delinquency_rate_pct DESC; ``` One note on `exposed_balance`: this sums the *current* outstanding balance on delinquent accounts only, giving the CRO a dollar figure for the actual capital at risk — not just a count of problem loans. That distinction matters when you're presenting to a board. --- ## Step 5 — Indexing for Audit Query Performance With the schema cleaned up and the queries written, the final task was making sure these ran fast enough to use during live regulatory reviews. The team ran `sys.dm_db_missing_index_details` and came up with four targeted indexes: ```sql -- Support DTI lookups by approval date and loan type CREATE NONCLUSTERED INDEX IX_LoanApps_ApprovalDate_Type ON loan_applications (approved_date, loan_type) INCLUDE (customer_id, principal_amount, credit_score_at_approval, annual_income_at_approval, total_debt_at_approval, approval_officer); -- Support delinquency filters on account status and missed payments CREATE NONCLUSTERED INDEX IX_LoanAccounts_Status_Missed ON loan_accounts (account_status, missed_payments) INCLUDE (loan_id, current_balance, last_payment_date); -- Support payment history streak analysis CREATE NONCLUSTERED INDEX IX_PaymentHistory_Account_Date ON payment_history (account_id, payment_date) INCLUDE (payment_status, amount_due, amount_paid, days_late); -- Support officer-level aggregations CREATE NONCLUSTERED INDEX IX_LoanApps_Officer_Date ON loan_applications (approval_officer, approved_date) INCLUDE (loan_id, loan_type, principal_amount, annual_income_at_approval, total_debt_at_approval); ``` The `INCLUDE` clause on each index is deliberate. These are covering indexes — the queries the compliance team runs can be satisfied entirely from index pages without touching the base table. For the approval officer aggregation query, which previously ran in 28 seconds on the flat schema, execution time dropped to under 400ms after the index was in place. --- ## The Numbers After Rebuilding The migration happened over six weeks, with the new schema running in parallel before the old `loans_all` table was archived. Here's what changed: | Metric | Old schema | New schema | |---|---|---| | DTI compliance query | ~28 sec | 380ms | | Consecutive missed payment detection | Not possible without ETL | 1.1 sec | | Officer delinquency report | ~45 sec | 620ms | | Storage (payment data) | 94 columns × 8.2M rows | 68M rows × 8 columns | | Schema columns on main table | 94 | 12 (loan_applications) | | Data consistency issues | ~14,000 duplicate customer records | 0 (FK enforced) | The storage comparison deserves a word: the new payment history table has more rows, but the row width is a fraction of the original. Overall storage for payment data dropped by 34% after normalization, despite having a more complete historical record. --- ## What This Case Study Is Really About The credit risk queries here are valuable, but they're almost secondary to the design decisions that made them possible. You can't write a clean gaps-and-islands window function against a schema where payment history lives in 24 separate columns. You can't do approval-time DTI analysis if the credit snapshot was overwritten every time a customer's file was updated. The queries are only as powerful as the schema underneath them. That's the thing most SQL tutorials skip: schema design isn't just a database administrator concern. Every analyst who ever had to write a 12-subquery monstrosity to answer a simple business question was, at some level, working around a schema that wasn't designed for questions. When the schema is right, the queries are almost obvious. SQLMarrow's interactive playground lets you run every query in this case study against a live seed dataset — load the banking schema, execute the window functions, modify the DTI threshold, and watch the result set change in real time. That feedback loop, between reading a pattern and actually executing it, is how this kind of SQL thinking becomes second nature. The compliance team filed on time. No cursors were harmed in the making of this audit. --- **Practice the queries from this case study in the [SQLMarrow SQL Playground →](https://sqlmarrow.com/playground)** *Related: [Advanced Window Functions — Chapter 10](https://sqlmarrow.com/handbook) · [ACID Transactions & Concurrency — Chapter 22](https://sqlmarrow.com/handbook) · [SQL Interview Guide](https://sqlmarrow.com/blog/sql-interview-guide)* --- *Tags: banking SQL, credit risk database, SQL window functions, gaps and islands SQL, debt-to-income query, SQL audit, T-SQL case study, SQLMarrow*