### 📊 The Power of Customer Insights: Why Market Research Databases Matter Every second, consumers are rating products, sharing feedback, and filling out demographic surveys. But for a consumer goods company, this data is useless if it sits in disconnected CSVs or scattered forms. To truly understand customer preferences, buying habits, and product satisfaction, we need a **centralized, highly relational SQL database**. In this case study, we will design a robust database schema from scratch for consumer market research. Then, we will walk through **6 real-world practice SQL challenges** to extract deep strategic insights. --- ### 🗂️ The Survey Insights Schema Architecture Our database is designed around 7 core tables. Each table serves a unique structural purpose, bridging the gap between demographic parameters, survey metadata, catalog logs, and unstructured qualitative comments. #### 1. `respondents` (Demographic Profile) Stores anonymized participant records. * `id` (INT, Primary Key) * `age` (INT) * `gender` (VARCHAR) * `location` (VARCHAR) * `income_level` (VARCHAR: Low, Medium, High) #### 2. `surveys` (Survey Metadata) Maintains cataloged surveys. * `id` (INT, Primary Key) * `title` (VARCHAR) * `date` (DATE) * `target_audience` (VARCHAR) #### 3. `questions` (Survey Questions) Maps out questions linked to specific surveys. * `id` (INT, Primary Key) * `survey_id` (INT, Foreign Key referencing `surveys.id`) * `question_text` (TEXT) * `type` (VARCHAR: MCQ, Rating, Text) #### 4. `responses` (User Answers) Links respondents to specific survey answers. * `id` (INT, Primary Key AUTOINCREMENT) * `respondent_id` (INT, Foreign Key referencing `respondents.id`) * `question_id` (INT, Foreign Key referencing `questions.id`) * `answer_text` (TEXT - for MCQs or open text answers) * `rating` (INT - for rating-scale questions) #### 5. `products` (Product Catalog) Stores the company's active product catalog. * `id` (INT, Primary Key) * `name` (VARCHAR) * `category` (VARCHAR) * `price` (DECIMAL) * `launch_date` (DATE) #### 6. `feedback` (Qualitative Comments) Captures unstructured comments with calculated sentiment scores. * `id` (INT, Primary Key AUTOINCREMENT) * `respondent_id` (INT, Foreign Key referencing `respondents.id`) * `product_id` (INT, Foreign Key referencing `products.id`) * `comments` (TEXT) * `sentiment` (VARCHAR: Positive, Neutral, Negative) #### 7. `competitors` (Benchmarking Metrics) Maintains competitor product metrics for direct comparison. * `id` (INT, Primary Key) * `name` (VARCHAR - e.g., Sony, Bose, Colgate) * `product_name` (VARCHAR) * `price` (DECIMAL) * `avg_rating` (DECIMAL) --- ### 🎯 6 Practice Query Challenges (With SQL Solutions) Let's put this database to work! Below are 6 real business questions and the exact SQL queries required to answer them. #### 📈 Challenge 1: Identify the Top Product Preference * **Business Need:** Which of our launched products is most preferred by survey respondents in our MCQ questions? * **The Query:** ```sql SELECT p.name AS product_name, COUNT(r.id) AS preference_count FROM responses r INNER JOIN questions q ON r.question_id = q.id INNER JOIN products p ON r.answer_text = p.name WHERE q.id = 101 -- Question: Which headset would you purchase? GROUP BY p.id ORDER BY preference_count DESC; ``` * **Logic:** We filter responses for question `101`, join the answers with our product catalog based on name equality, and count the occurrences grouped by product ID to isolate the winner. --- #### 👥 Challenge 2: Demographic Breakdown (Age Group vs. Product Choice) * **Business Need:** How do different age groups prefer our products? Do younger respondents lean toward electronics while older consumers prefer skincare? * **The Query:** ```sql SELECT CASE WHEN res.age < 25 THEN 'Under 25' WHEN res.age BETWEEN 25 AND 40 THEN '25-40' WHEN res.age BETWEEN 41 AND 60 THEN '41-60' ELSE 'Over 60' END AS age_group, p.name AS preferred_product, COUNT(r.id) AS vote_count FROM responses r INNER JOIN respondents res ON r.respondent_id = res.id INNER JOIN questions q ON r.question_id = q.id INNER JOIN products p ON r.answer_text = p.name WHERE q.type = 'MCQ' GROUP BY age_group, preferred_product ORDER BY age_group, vote_count DESC; ``` * **Logic:** Using a `CASE WHEN` statement, we bucket continuous age integers into logical categorical bins, join them to respondents and product names, and tally selections. --- #### ⭐ Challenge 3: Satisfaction Rating Per Product * **Business Need:** What is the average rating for each of our products based on survey rating scales? * **The Query:** ```sql SELECT p.name AS product_name, ROUND(AVG(r.rating), 2) AS average_rating, COUNT(r.rating) AS total_ratings_received FROM responses r INNER JOIN questions q ON r.question_id = q.id INNER JOIN products p ON (q.survey_id = 10 AND p.category = 'Electronics') OR (q.survey_id = 11 AND p.category = 'Skincare') WHERE q.type = 'Rating' AND r.rating IS NOT NULL GROUP BY p.id; ``` * **Logic:** We filter for questions that are marked as `Rating` scale and aggregate the average rating, rounding to two decimal places to deliver clean product health cards. --- #### ⚖️ Challenge 4: Competitor Comparison (Benchmarking) * **Business Need:** Compare our average ratings in skincare and electronics against direct competitor products to identify market gaps. * **The Query:** ```sql WITH our_ratings AS ( SELECT 'Electronics' AS category, 'AeroBeats Wireless Headset' AS our_product, 4.50 AS our_rating UNION ALL SELECT 'Skincare' AS category, 'Organic Honey Face Wash' AS our_product, 3.67 AS our_rating ) SELECT o.category, o.our_product, o.our_rating, c.name AS competitor_brand, c.product_name AS competitor_product, c.avg_rating AS competitor_rating, ROUND(o.our_rating - c.avg_rating, 2) AS rating_delta FROM our_ratings o INNER JOIN competitors c ON o.category = ( CASE WHEN c.name IN ('Sony', 'Bose') THEN 'Electronics' ELSE 'Skincare' END ); ``` * **Logic:** We build a CTE holding our internal ratings and join them to the `competitors` benchmarking table using conditional brand categorizations to calculate rating differentials. --- #### 📊 Challenge 5: Survey Participation Rate * **Business Need:** Calculate the percentage of our overall database respondents who completed answers for each active survey. * **The Query:** ```sql SELECT s.title AS survey_title, (SELECT COUNT(*) FROM respondents) AS total_invited, COUNT(DISTINCT r.respondent_id) AS participants_responded, ROUND( (COUNT(DISTINCT r.respondent_id) * 100.0) / (SELECT COUNT(*) FROM respondents), 1 ) AS participation_rate_percentage FROM responses r INNER JOIN questions q ON r.question_id = q.id INNER JOIN surveys s ON q.survey_id = s.id GROUP BY s.id; ``` * **Logic:** We count the unique respondent IDs grouped by survey, cross-multiply by 100, and divide by a scalar subquery of the total respondents count to output relative engagement percentages. --- #### 🎭 Challenge 6: Sentiment Analysis on Qualitative Feedback * **Business Need:** Segment text comments to identify patterns in positive vs. negative feedback for skincare and personal care catalog entries. * **The Query:** ```sql SELECT sentiment, COUNT(*) AS feedback_count, GROUP_CONCAT(comments, ' | ') AS comments_summary FROM feedback GROUP BY sentiment; ``` * **Logic:** By grouping strictly on the pre-calculated `sentiment` metadata column, we retrieve feedback distribution volumes and concatenate their strings using `GROUP_CONCAT` to view the underlying transcripts in one row. --- ### 🚀 Why This Case Study Works This survey system simulates standard enterprise analytical requirements: 1. **Structured + Unstructured Integration:** Combines hard metric averages (ratings) with complex qualitative evaluations (comments & sentiment). 2. **Flexible Relational Schemas:** Respondent profiles, question types, and responses map clean paths for joins without redundancy. 3. **Real-Time Benchmarking:** Directly measures internal product metrics against external marketplace competitors.