This End-to-End project show a cookies business by generating data for customers, orders, and cookies sold. The goal is to create a structured relational database for a bakery that tracks its sales performance and customer behavior. This project aims to showcase my skills and tools used and provide insightful data analysis like sales trends, repeat customer behavior, and forecasts based on historical data.
You can check the full dashboard: PowerBI_Cookies
The following SQL queries were used to build and populate a fictional Cookies Business Database consisting of four main tables: Cookies, Customers, Orders, and Order_Details.
Schema and Data: You can check the creation and importation of the data to the Database Engine (SQL Server Management Studio) Click here
- customer_id (Primary Key)
- name
- address
- order_id (Primary Key)
- customer_id (Foreign Key references Customers)
- order_date
- order_id (Foreign Key references Orders)
- cookie_name
- quantity
- total_price
- cookie_id (Primary Key)
- name
- description
- price
SQL Queries for Data Insertion: A Python script was used to generate 1,000 fictional customers, orders, and corresponding order details. Used Chatgpt to generate the python script.
- Handling Missing Values: Ensuring there are no NULL values in critical fields (e.g., customer_id, order_id).
- Standardizing Data Formats: Ensuring dates are in the correct format (ISO format).
- Removing Duplicates: Using DISTINCT to avoid duplicate records in the analysis.
- Data Transformation: Using functions like SUM, COUNT, ROUND, and aggregating data to calculate totals and averages.
- Conditional Logic: Utilizing CASE statements to categorize data (e.g., determining month names).
- Joining Tables: Using JOIN operations to combine data from different tables (e.g., linking Orders and Order_Details).
Subqueries used for aggregating data and filtering results based on specific criteria and Common Table Expressions (CTEs) to simplify complex queries and organize logic making SQL code more readable and maintainable.
--Sale SELECT c.name, SUM(od.total_price) AS total_sale FROM Order_Details od LEFT JOIN Cookies c ON od.cookie_id = c.cookie_id GROUP BY c.name ORDER BY total_sale DESC | name | total_sale |
|---|---|
| Double Chocolate | 4060.00 |
| Peanut Butter | 3827.25 |
| Chocolate Chip | 3121.50 |
| Oatmeal Raisin | 2498.75 |
| Sugar Cookie | 2172.00 |
--Order SELECT c.name, SUM(od.quantity) AS highest_order_cookie FROM Order_Details od LEFT JOIN Cookies c ON od.cookie_id = c.cookie_id GROUP BY c.name ORDER BY highest_order_cookie DESC | name | highest_order_cookie |
|---|---|
| Peanut Butter | 2187 |
| Sugar Cookie | 2172 |
| Chocolate Chip | 2081 |
| Double Chocolate | 2030 |
| Oatmeal Raisin | 1999 |
Analyze cookie product sales to determine top cookie sale and highest order
SELECT SUM(od.quantity) AS total_order_for_the_day, o.order_date, SUM(od.total_price) as total_sale FROM Order_Details od LEFT JOIN Orders o ON od.order_id = o.order_id GROUP BY o.order_date ORDER BY o.order_date ASC | total_order_for_the_day | order_date | total_sale |
|---|---|---|
| 7 | 2023-10-01 | 10.50 |
| 12 | 2023-10-03 | 15.25 |
| 36 | 2023-10-04 | 57.00 |
| 20 | 2023-10-06 | 20.00 |
| 17 | 2023-10-07 | 21.00 |
| 57 | 2023-10-08 | 86.25 |
| 38 | 2023-10-11 | 70.50 |
| 9 | 2023-10-12 | 31.50 |
| 21 | 2023-10-15 | 31.50 |
| 107 | 2023-10-16 | 140.50 |
| 6 | 2023-10-17 | 7.50 |
| 163 | 2023-10-19 | 244.50 |
| 46 | 2023-10-20 | 60.50 |
| 60 | 2023-10-21 | 105.25 |
| 28 | 2023-10-22 | 33.50 |
| 23 | 2023-10-23 | 41.50 |
| 107 | 2023-10-25 | 158.00 |
| 26 | 2023-10-26 | 62.00 |
| 39 | 2023-10-28 | 46.25 |
Visualizes sales performance trends across the year helping to track growth or decline.
WITH repeat_order AS ( SELECT customer_id, COUNT(*) AS repeat_orders FROM Orders GROUP BY customer_id HAVING COUNT(*) > 1 ) SELECT COUNT(repeat_orders) AS total_repeat_order, (SELECT COUNT(customer_id) FROM Orders) AS total_orders, CAST(COUNT(repeat_orders) AS DECIMAL(10, 2)) / CAST((SELECT COUNT(customer_id) FROM Orders) AS DECIMAL(10, 2)) * 100 AS percent_of_repeat_orders FROM repeat_order; | total_repeat_order | total_orders | percent_of_repeat_orders |
|---|---|---|
| 257 | 1000 | 25.70 |
Calculates the percentage of repeat customers, providing insight into customer loyalty and retention.
SELECT YEAR(o.order_date) AS year_cookies, CASE WHEN MONTH(o.order_date) = 1 THEN 'January' WHEN MONTH(o.order_date) = '2' THEN 'February' WHEN MONTH(o.order_date) = '3' THEN 'March' WHEN MONTH(o.order_date) = '4' THEN 'April' WHEN MONTH(o.order_date) = '5' THEN 'May' WHEN MONTH(o.order_date) = '6' THEN 'June' WHEN MONTH(o.order_date) = '7' THEN 'July' WHEN MONTH(o.order_date) = '8' THEN 'August' WHEN MONTH(o.order_date) = '9' THEN 'September' WHEN MONTH(o.order_date) = '10' THEN 'October' WHEN MONTH(o.order_date) = '11' THEN 'November' WHEN MONTH(o.order_date) = '12' THEN 'December' END AS month_cookies, SUM(od.total_price) as total_sale FROM Order_Details od LEFT JOIN Orders o ON od.order_id = o.order_id GROUP BY YEAR(o.order_date), MONTH(o.order_date) ORDER BY year_cookies DESC | year_cookies | month_cookies | total_sale |
|---|---|---|
| 2024 | January | 1310.25 |
| 2024 | February | 1286.00 |
| 2024 | March | 1158.00 |
| 2024 | April | 1361.50 |
| 2024 | May | 1488.50 |
| 2024 | June | 1333.50 |
| 2024 | July | 1613.25 |
| 2024 | August | 1305.50 |
| 2024 | September | 1015.00 |
| 2023 | October | 1366.50 |
| 2023 | November | 1025.25 |
| 2023 | December | 1416.25 |
Breaks down sales performance by month to help identify seasonality and sales trends.
5.1 Can we forecast which products will perform well in the future based on past performance? (9 months past performance in 2024 based on orders and sales)
--Order WITH nine_month_performance AS ( SELECT c.name AS cookies, SUM(od.quantity) AS highest_order_cookie, YEAR(o.order_date) AS year_cookies, CASE WHEN MONTH(o.order_date) = '1' THEN 'January' WHEN MONTH(o.order_date) = '2' THEN 'February' WHEN MONTH(o.order_date) = '3' THEN 'March' WHEN MONTH(o.order_date) = '4' THEN 'April' WHEN MONTH(o.order_date) = '5' THEN 'May' WHEN MONTH(o.order_date) = '6' THEN 'June' WHEN MONTH(o.order_date) = '7' THEN 'July' WHEN MONTH(o.order_date) = '8' THEN 'August' WHEN MONTH(o.order_date) = '9' THEN 'September' WHEN MONTH(o.order_date) = '10' THEN 'October' WHEN MONTH(o.order_date) = '11' THEN 'November' WHEN MONTH(o.order_date) = '12' THEN 'December' END AS month_cookies, SUM(od.total_price) as total_sale FROM Order_Details od LEFT JOIN Cookies c ON od.cookie_id = c.cookie_id LEFT JOIN Orders o ON od.order_id = o.order_id GROUP BY YEAR(o.order_date), MONTH(o.order_date), c.name ) SELECT cookies, SUM(highest_order_cookie) AS total_order FROM nine_month_performance WHERE month_cookies IN ('January','February','March','April','May','June','July','August','September') AND year_cookies = 2024 GROUP BY cookies ORDER BY total_order DESC; | cookies | total_order |
|---|---|
| Peanut Butter | 1712 |
| Sugar Cookie | 1657 |
| Chocolate Chip | 1612 |
| Double Chocolate | 1504 |
| Oatmeal Raisin | 1434 |
--Sale WITH nine_month_performance AS ( SELECT c.name AS cookies, SUM(od.quantity) AS highest_order_cookie, YEAR(o.order_date) AS year_cookies, CASE WHEN MONTH(o.order_date) = '1' THEN 'January' WHEN MONTH(o.order_date) = '2' THEN 'February' WHEN MONTH(o.order_date) = '3' THEN 'March' WHEN MONTH(o.order_date) = '4' THEN 'April' WHEN MONTH(o.order_date) = '5' THEN 'May' WHEN MONTH(o.order_date) = '6' THEN 'June' WHEN MONTH(o.order_date) = '7' THEN 'July' WHEN MONTH(o.order_date) = '8' THEN 'August' WHEN MONTH(o.order_date) = '9' THEN 'September' WHEN MONTH(o.order_date) = '10' THEN 'October' WHEN MONTH(o.order_date) = '11' THEN 'November' WHEN MONTH(o.order_date) = '12' THEN 'December' END AS month_cookies, SUM(od.total_price) as total_sale FROM Order_Details od LEFT JOIN Cookies c ON od.cookie_id = c.cookie_id LEFT JOIN Orders o ON od.order_id = o.order_id GROUP BY YEAR(o.order_date), MONTH(o.order_date), c.name ) SELECT cookies, SUM(total_sale) AS total_sale FROM nine_month_performance WHERE month_cookies IN ('January','February','March','April','May','June','July','August','September') AND year_cookies = 2024 GROUP BY cookies ORDER BY total_sale DESC; | cookies | total_sale |
|---|---|
| Double Chocolate | 3008.00 |
| Peanut Butter | 2996.00 |
| Chocolate Chip | 2418.00 |
| Oatmeal Raisin | 1792.50 |
| Sugar Cookie | 1657.00 |
5.2 Using the moving average method for a 4-quarter 2024. (Estimated sale in Oct, Nov, Dec).What are the expected sales for the next quarter based on historical trends?
WITH sale_per_month AS ( SELECT YEAR(o.order_date) AS year_cookies, CASE WHEN MONTH(o.order_date) = '1' THEN 'January' WHEN MONTH(o.order_date) = '2' THEN 'February' WHEN MONTH(o.order_date) = '3' THEN 'March' WHEN MONTH(o.order_date) = '4' THEN 'April' WHEN MONTH(o.order_date) = '5' THEN 'May' WHEN MONTH(o.order_date) = '6' THEN 'June' WHEN MONTH(o.order_date) = '7' THEN 'July' WHEN MONTH(o.order_date) = '8' THEN 'August' WHEN MONTH(o.order_date) = '9' THEN 'September' WHEN MONTH(o.order_date) = '10' THEN 'October' WHEN MONTH(o.order_date) = '11' THEN 'November' WHEN MONTH(o.order_date) = '12' THEN 'December' END AS month_cookies, SUM(od.total_price) as total_sale FROM Order_Details od LEFT JOIN Orders o ON od.order_id = o.order_id GROUP BY YEAR(o.order_date), MONTH(o.order_date) ) SELECT SUM(total_sale) AS first_qtr_sale, (SELECT SUM(total_sale) FROM sale_per_month WHERE month_cookies IN ('April','May','June') AND year_cookies = 2024) AS second_qtr_sale, (SELECT SUM(total_sale) FROM sale_per_month WHERE month_cookies IN ('July','August','September') AND year_cookies = 2024) AS third_qtr_sale, (SELECT ROUND(SUM(total_price)/ 3, 2) FROM Order_Details od LEFT JOIN Orders o ON od.order_id = o.order_id WHERE YEAR(o.order_date) = '2024') AS avg_fourth_qtr_sale FROM sale_per_month WHERE month_cookies IN ('January','February','March') AND year_cookies = 2024 | first_qtr_sale | second_qtr_sale | third_qtr_sale | avg_fourth_qtr_sale |
|---|---|---|---|
| 3754.25 | 4183.50 | 3933.75 | 3957.17 |
The moving average for a period of four quarters is typically calculated by taking the average of the sales from the last three quarters. In this case, since we only have three quarters of data for 2024. Formula (Total Sale(Q1,Q2,Q3/3 Quarters)) = Moving Average. Thus 3957.17, we can use this as an estimated sales figure for each month (October, November, December) in the fourth quarter of 2024.
- SQL Server Management Studio (SSMS): For creating the relational database, and inserting records.
- SQL: Run queries and data manipulation.
- Python (Faker Library): To automate the generation of 1000 fictional customer, order, and order details records.
- CSV Files: The data generated was saved in CSV format and then inserted into the SSMS database.
- Power BI: Used for building an interactive dashboard to visualize sales trends and customer behavior.
- Resolving foreign key constraint issues during data insertion.
- Handling and normalizing non-standard date formats.
- Managing large data inserts and query performance.
This project demonstrates practical skills in data generation, querying, and analysis to simulate a real-world cookies business database. The 1000 fictional data entries offer an excellent base for advanced analysis providing valuable insights for business growth and optimization strategies.

