SQL + Python + Power BI
Help retail managers identify top products, seasonal trends, and revenue drivers to improve sales strategy.
This project analyzes retail sales data across multiple cities in India to understand business performance, product demand, customer purchasing behavior, and revenue trends.
The objective of this project is to transform raw retail sales data into meaningful business insights using SQL for data cleaning and analysis, Python for data processing and exploratory analysis, and Power BI for interactive visualization and reporting.
This project demonstrates a complete end-to-end data analytics workflow, including data preparation, transformation, KPI calculation, trend analysis, and dashboard development.
The business wants to analyze retail sales data to answer key business questions:
- What is the total revenue generated?
- How many orders were placed?
- Which product categories generate the most revenue?
- Which cities contribute the highest sales?
- What are the monthly and seasonal sales trends?
- Which products are top sellers, and which are underperforming?
- Which payment methods are most used by customers?
The goal is to convert raw sales data into actionable insights that help businesses improve decision-making and profitability.
The following Key Performance Indicators (KPIs) were calculated during the analysis:
- Total Revenue → Total sales generated after discount
- Total Orders → Number of orders placed
- Average Order Value (AOV) → Total Revenue ÷ Total Orders
- Total Quantity Sold → Total number of product units sold
- Average Discount → Average discount percentage applied to orders
These KPIs help evaluate the overall business performance and customer purchasing patterns.
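As a rough illustration of how these KPIs relate to each other, the sketch below computes them over a few made-up orders. The field names (`order_id`, `quantity`, `unit_price`, `discount_pct`) and the sample values are assumptions for illustration only, not the project's actual schema or data.

```python
# Hypothetical sample orders; field names and values are illustrative assumptions.
orders = [
    {"order_id": "O1", "quantity": 2, "unit_price": 500.0, "discount_pct": 10.0},
    {"order_id": "O2", "quantity": 1, "unit_price": 1200.0, "discount_pct": 0.0},
    {"order_id": "O3", "quantity": 4, "unit_price": 250.0, "discount_pct": 20.0},
]


def net_sales(order):
    """Revenue for one order after applying its discount percentage."""
    gross = order["quantity"] * order["unit_price"]
    return gross * (100 - order["discount_pct"]) / 100


def compute_kpis(rows):
    """Return the five KPIs listed above for a list of order dicts."""
    total_revenue = sum(net_sales(r) for r in rows)
    total_orders = len({r["order_id"] for r in rows})  # count distinct orders
    return {
        "total_revenue": total_revenue,
        "total_orders": total_orders,
        # AOV = Total Revenue / Total Orders, guarded against empty input
        "aov": total_revenue / total_orders if total_orders else 0.0,
        "total_quantity": sum(r["quantity"] for r in rows),
        "avg_discount_pct": (
            sum(r["discount_pct"] for r in rows) / len(rows) if rows else 0.0
        ),
    }


kpis = compute_kpis(orders)
print(kpis)
```

In the dashboard itself these figures would presumably be DAX measures; the Python version is only meant to make the definitions concrete.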
- Monthly sales trend (Line Chart)
- Daily sales trend (Bar Chart)
These charts help identify seasonal sales patterns and business growth trends.
- Sales by product category (Bar Chart)
- Sales by payment method (Pie Chart)
- Sales by city (Map Visualization)
These visualizations help understand customer purchasing preferences and regional sales distribution.
- Top 10 products by revenue (Bar Chart)
- Bottom 10 products by revenue (Bar Chart)
- Total sales by category (Column Chart)
This analysis helps businesses identify best-performing and underperforming products.
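A top-N / bottom-N ranking of this kind can be sketched in a few lines. The product names and revenue figures below are hypothetical, and N is set to 2 only to keep the example small; the dashboard uses the top and bottom 10.

```python
from collections import defaultdict

# Hypothetical (product, revenue) line items; not taken from the project dataset.
line_items = [
    ("Laptop", 900.0),
    ("Phone", 600.0),
    ("Laptop", 300.0),
    ("Jacket", 150.0),
    ("Shirt", 50.0),
]


def revenue_by_product(items):
    """Sum revenue per product and return (product, revenue) pairs, highest first."""
    totals = defaultdict(float)
    for product, revenue in items:
        totals[product] += revenue
    # Sort by revenue descending; product name breaks ties deterministically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def top_n_share(ranked, n):
    """Fraction of total revenue contributed by the top n products."""
    total = sum(rev for _, rev in ranked)
    return sum(rev for _, rev in ranked[:n]) / total if total else 0.0


ranked = revenue_by_product(line_items)
top_2 = ranked[:2]
bottom_2 = ranked[-2:][::-1]  # lowest revenue first
share = top_n_share(ranked, 2)
print(top_2, bottom_2, share)
```

The same `top_n_share` helper is one way to check a concentration claim such as "the top 10 products account for X% of revenue".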
The dataset contains retail transaction records including:
- Order ID
- Order Date
- Region
- Product Category
- Product Name
- Quantity
- Unit Price
- Sales Amount
- Profit
- Discount Percentage
- Payment Method
- Customer Age
The dataset, 600+ records in total, initially contained duplicates, missing values, and inconsistent records, which were cleaned and transformed before analysis.
- Imported raw retail dataset
- Inspected dataset structure
- Identified missing values, duplicates, and inconsistencies
- Prepared the dataset for SQL processing
The dataset was imported into SQL Server (using SSMS) for data cleaning and transformation.
The following operations were performed:
- Removed duplicate records
- Handled missing values
- Standardized column names
- Created calculated columns for total sales
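The cleaning operations above might look roughly like the following. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for SQL Server so that it runs anywhere; the table names, column names, and the choice to fill missing values with `'Unknown'` and `0` are all assumptions, since the project's actual rules are not spelled out here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_sales (order_id TEXT, category TEXT, "
    "quantity INTEGER, unit_price REAL, discount_pct REAL)"
)
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?, ?, ?)",
    [
        ("O1", "Electronics", 2, 500.0, 10.0),
        ("O1", "Electronics", 2, 500.0, 10.0),  # exact duplicate row
        ("O2", "Clothing", 1, 800.0, None),     # missing discount
        ("O3", None, 3, 100.0, 0.0),            # missing category
    ],
)

# DISTINCT removes exact duplicates, COALESCE fills missing values,
# and total_sales is the calculated column (sales net of discount).
conn.execute(
    """
    CREATE TABLE clean_sales AS
    SELECT DISTINCT
        order_id,
        COALESCE(category, 'Unknown') AS category,
        quantity,
        unit_price,
        COALESCE(discount_pct, 0.0) AS discount_pct,
        quantity * unit_price * (100 - COALESCE(discount_pct, 0.0)) / 100
            AS total_sales
    FROM raw_sales
    """
)

rows = conn.execute(
    "SELECT order_id, category, total_sales FROM clean_sales ORDER BY order_id"
).fetchall()
print(rows)
```

`SELECT DISTINCT` and `COALESCE` are also available in SQL Server, though creating a table from a query there would typically use `SELECT ... INTO` rather than `CREATE TABLE ... AS`.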
Using SQL queries, the following analyses were performed:
- Total sales by category
- Monthly sales trend
- Top-selling products
- Region-wise sales
- Product ranking using window functions
- Average order value calculation
- Filtering using WHERE and HAVING clauses
- Aggregation using SUM, COUNT, AVG
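Two of the query patterns listed above, aggregation with `HAVING` and product ranking with a window function, can be sketched as follows. SQLite (via Python's `sqlite3`, version 3.25 or later for window functions) again stands in for SQL Server, and the `sales` table, its columns, and the revenue threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, category TEXT, total_sales REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Laptop", "Electronics", 900.0),
        ("Phone", "Electronics", 600.0),
        ("Jacket", "Clothing", 300.0),
        ("Shirt", "Clothing", 100.0),
        ("Laptop", "Electronics", 100.0),
    ],
)

# Total sales by category; HAVING keeps only categories above a threshold.
category_totals = conn.execute(
    """
    SELECT category, SUM(total_sales) AS revenue, COUNT(*) AS line_items
    FROM sales
    GROUP BY category
    HAVING SUM(total_sales) > 500
    ORDER BY revenue DESC
    """
).fetchall()

# Rank products within each category by revenue using a window function.
product_ranks = conn.execute(
    """
    WITH product_revenue AS (
        SELECT product, category, SUM(total_sales) AS revenue
        FROM sales
        GROUP BY product, category
    )
    SELECT product, category, revenue,
           RANK() OVER (PARTITION BY category ORDER BY revenue DESC)
               AS rank_in_category
    FROM product_revenue
    ORDER BY category, rank_in_category
    """
).fetchall()

print(category_totals)
print(product_ranks)
```

Aggregating in a CTE first and ranking afterwards keeps the window function simple, and the same query shape should carry over to SQL Server with little change.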
After SQL cleaning, the dataset was exported for further analysis using Python.
- Pandas → Data manipulation
- NumPy → Numerical operations
- Matplotlib → Data visualization
- Data validation
- Exploratory Data Analysis (EDA)
- Monthly sales trend analysis
- Category performance analysis
- City-wise sales comparison
- Visualization of revenue patterns
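The validation and grouping steps above could be sketched with Pandas as shown below. The DataFrame is built inline with made-up rows instead of loading the exported file, and the column names (`order_date`, `category`, `city`, `total_sales`) are assumptions.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned export from SQL.
df = pd.DataFrame(
    {
        "order_date": ["2023-11-03", "2023-11-20", "2023-12-05", "2023-12-18"],
        "category": ["Electronics", "Clothing", "Electronics", "Electronics"],
        "city": ["Mumbai", "Pune", "Mumbai", "Delhi"],
        "total_sales": [900.0, 300.0, 1200.0, 600.0],
    }
)
df["order_date"] = pd.to_datetime(df["order_date"])

# Data validation: no missing values and no negative sales amounts.
assert not df.isna().any().any()
assert df["total_sales"].ge(0).all()

# Monthly sales trend: group by calendar month of the order date.
monthly_sales = df.groupby(df["order_date"].dt.to_period("M"))["total_sales"].sum()

# Category performance and city-wise comparison, highest revenue first.
category_sales = df.groupby("category")["total_sales"].sum().sort_values(ascending=False)
city_sales = df.groupby("city")["total_sales"].sum().sort_values(ascending=False)

print(monthly_sales, category_sales, city_sales, sep="\n\n")
```

Each of these Series can then be passed to Matplotlib, for example through `monthly_sales.plot(kind="line")`, to produce the corresponding chart.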
The cleaned dataset was imported into Power BI to create an interactive business dashboard.
- Imported dataset into Power BI
- Created calculated measures using DAX
- Built interactive visualizations
- Designed a business-friendly dashboard layout
- KPI Cards (Revenue, Orders, AOV)
- Sales by Category
- Monthly Sales Trend
- City-wise Sales Map
- Payment Mode Distribution
- Top Performing Products
The analysis surfaced the following key insights:
- The Electronics category generated the highest revenue.
- Sales peak during November and December.
- The West region contributes the largest share of total sales.
- The top 10 products generate nearly 60% of revenue.
Based on these findings, this analysis helps retail businesses:
- Increase inventory before peak seasonal demand.
- Focus marketing on high-performing categories.
- Improve pricing strategy for low-profit products.
These insights can help businesses improve marketing strategies and optimize operations.
Raw Data (Excel)
⬇
Data Cleaning (SQL)
⬇
Exploratory Data Analysis (Python)
⬇
Visualization (Power BI)
⬇
Business Insights
Power BI dashboard screenshots are included in this repository.
The dashboard provides an interactive view of:
- Sales performance
- Product category insights
- Regional sales distribution
- Customer payment behavior
- Monthly revenue trends
This project demonstrates how SQL, Python, and Power BI can be combined to build a complete data analytics solution.
It showcases practical skills including:
- Data cleaning and transformation
- Business KPI calculation
- Exploratory data analysis
- Dashboard development
- Business insight generation
The project highlights how raw data can be converted into meaningful insights that support strategic business decisions.
Through this project, I gained experience in:
- Writing real-world SQL queries
- Performing data cleaning and transformation
- Conducting exploratory data analysis using Python
- Designing professional Power BI dashboards
- Communicating data insights effectively
