A statistical and visual analysis of restaurant tipping behavior using Python, Pandas, NumPy, SciPy, and Matplotlib.
This project explores patterns in tipping habits, outliers, gender differences, and the relationship between bill amount and tip amount.
This analysis covers:
- Dataset cleaning
- Descriptive statistics
- Exploratory visualizations
- Outlier detection using IQR
- Confidence interval estimation
- Gender-based tip comparison
- Correlation & regression analysis
- Scatter plot with regression line
restaurant-tips-analysis/
│
├── analysis.ipynb (or analysis.py) # Your analysis code
├── Set 17 - restaurant tips.csv # Dataset
└── README.md
- Cleaning Actions: Removed duplicates and dropped missing values in `total_bill` and `tip`.
- Result:
  - Original rows: 244
  - After cleaning: 221
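The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `clean_tips` and the sample rows are made up, and only the column names `total_bill`, `tip`, and `size` come from this README.

```python
import pandas as pd


def clean_tips(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then rows missing total_bill or tip."""
    deduped = df.drop_duplicates()
    return deduped.dropna(subset=["total_bill", "tip"]).reset_index(drop=True)


# Made-up rows for illustration only (not the real dataset)
raw = pd.DataFrame({
    "total_bill": [16.99, 16.99, 10.34, None, 21.01],
    "tip": [1.01, 1.01, 1.66, 3.50, None],
    "size": [2, 2, 3, 2, 3],
})
cleaned = clean_tips(raw)
print(f"Original rows: {len(raw)}, after cleaning: {len(cleaned)}")
```

To apply the same idea to the real data, the frame would presumably be loaded with `pd.read_csv("Set 17 - restaurant tips.csv")` first.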
Computed for: `total_bill`, `tip`, `size`
| Metric | total_bill | tip | size |
|---|---|---|---|
| Mean | 20.97 | 3.31 | 2.57 |
| Median | 17.92 | 3.00 | 2.00 |
| Std Dev | 11.06 | 2.32 | 0.94 |
| Min | 3.07 | 1.00 | 1 |
| Max | 73.40 | 18.00 | 6 |
| Range | 70.33 | 17.00 | 5 |
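A table like the one above could be produced along these lines. This is a sketch under assumptions: `describe_columns` is a hypothetical helper, the sample rows are invented, and pandas' default sample standard deviation (`ddof=1`) may or may not match what the original analysis used.

```python
import pandas as pd


def describe_columns(df: pd.DataFrame, columns=("total_bill", "tip", "size")) -> pd.DataFrame:
    """Return mean, median, std, min, max, and range for the given columns."""
    stats = df[list(columns)].agg(["mean", "median", "std", "min", "max"])
    # Range is not a built-in aggregation, so derive it from max and min
    stats.loc["range"] = stats.loc["max"] - stats.loc["min"]
    return stats.round(2)


# Made-up rows for illustration only
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.0, 2.0, 3.0],
    "size": [2, 2, 4],
})
summary = describe_columns(sample)
print(summary)
```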
Plots created using Matplotlib:
- Histogram of `total_bill`
- Histogram of `tip`
- Boxplot of tips by day
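The three plots listed above might be drawn roughly like this. It is only a sketch: the function name `plot_overview`, the bin count, the figure layout, and the `day` column name are assumptions, and the sample rows are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame):
    """Draw two histograms and a per-day boxplot of tips on one figure."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    axes[0].hist(df["total_bill"], bins=20, edgecolor="black")
    axes[0].set(title="Distribution of total_bill", xlabel="total_bill", ylabel="Count")

    axes[1].hist(df["tip"], bins=20, edgecolor="black")
    axes[1].set(title="Distribution of tip", xlabel="tip", ylabel="Count")

    # One box per day; the "day" column name is an assumption
    days = sorted(df["day"].unique())
    groups = [df.loc[df["day"] == d, "tip"].to_numpy() for d in days]
    axes[2].boxplot(groups)
    axes[2].set_xticks(list(range(1, len(days) + 1)))
    axes[2].set_xticklabels(days)
    axes[2].set(title="Tips by day", xlabel="day", ylabel="tip")

    fig.tight_layout()
    return fig, axes


# Made-up rows for illustration only
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "day": ["Sun", "Sun", "Sun", "Sat", "Sat", "Sat"],
})
fig, axes = plot_overview(sample)
```

In a notebook or interactive session, calling `plt.show()` afterwards would display the figure.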
Formula: a point is an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1.
- Total bill outliers: 12
- Tip outliers: 12
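The IQR rule can be sketched as below. The helper name `iqr_outliers` and the sample values are illustrative assumptions; pandas' default linear quantile interpolation is used, which may differ slightly from other quartile conventions.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Made-up tips for illustration only; 18.0 is far above the rest
tips = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 18.0])
flagged = iqr_outliers(tips)
print(f"Outliers found: {len(flagged)}")
```

Running the same function on `total_bill` and `tip` separately would give per-column outlier counts like those reported above.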
A 95% confidence interval for the true mean tip was estimated using the t-distribution.
- Mean tip: $3.31
- 95% CI: ($3.00, $3.62)
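A t-based interval for the mean can be computed along these lines. This is a sketch, assuming the standard formula mean ± t × s/√n with n − 1 degrees of freedom; the helper name `mean_confidence_interval` and the sample values are made up.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(values, confidence: float = 0.95):
    """Return (mean, lower, upper) of a t-based confidence interval for the mean."""
    data = np.asarray(values, dtype=float)
    n = data.size
    mean = data.mean()
    sem = stats.sem(data)  # sample standard deviation (ddof=1) divided by sqrt(n)
    low, high = stats.t.interval(confidence, df=n - 1, loc=mean, scale=sem)
    return mean, low, high


# Made-up tips for illustration only
mean, low, high = mean_confidence_interval([2.0, 3.0, 4.0])
print(f"Mean: {mean:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

As a rough sanity check on the reported numbers: if the standard deviation of 2.32 refers to the 221 cleaned rows, the margin is about 1.97 × 2.32 / √221 ≈ 0.31, which is consistent with the interval ($3.00, $3.62) around $3.31.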
Performed Levene's test (equality of variances) followed by an independent two-sample t-test.
- Sample Size: Male (131), Female (81)
- Avg Tip: Male ($3.42), Female ($3.03)
- Variances equal? Yes (p = 0.2672)
- Difference significant? No (t = 1.2339, p = 0.2186)
Conclusion: No statistically significant difference in tipping based on gender.
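The two-step comparison could look roughly like this. It is a sketch under assumptions: `compare_groups` is a hypothetical helper, the 0.05 threshold is assumed, and `scipy.stats.levene` is used with its default median centering, which may differ from the original analysis. The function takes two arrays of tips, so splitting the data by the gender column (whose name is not given in this README) is left to the caller.

```python
from scipy import stats


def compare_groups(a, b, alpha: float = 0.05) -> dict:
    """Levene's test for equal variances, then an independent two-sample t-test."""
    levene_stat, levene_p = stats.levene(a, b)
    equal_var = bool(levene_p > alpha)
    # Falls back to Welch's t-test when the variances look unequal
    t_stat, t_p = stats.ttest_ind(a, b, equal_var=equal_var)
    return {
        "levene_p": float(levene_p),
        "equal_var": equal_var,
        "t": float(t_stat),
        "p": float(t_p),
        "significant": bool(t_p < alpha),
    }


# Made-up tips for two groups, for illustration only
group_a = [3.0, 3.5, 4.0, 2.5, 3.2]
group_b = [3.1, 3.4, 3.9, 2.6, 3.3]
result = compare_groups(group_a, group_b)
print(result)
```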
- Pearson correlation: 0.774
- Interpretation: Strong positive relationship (larger bills → larger tips).
- Regression Model: Tip = -0.105 + 0.163 × Total Bill
- R²: 0.600
- Interpretation: 60% of the variance in tip is explained by the bill amount.
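The correlation and regression figures above can be obtained with SciPy along these lines. This is a sketch: `fit_tip_model` is a hypothetical helper and the sample values are invented, chosen to be exactly linear so the result is easy to check by hand.

```python
from scipy import stats


def fit_tip_model(total_bill, tip) -> dict:
    """Pearson correlation plus a simple linear regression of tip on total_bill."""
    r, r_p = stats.pearsonr(total_bill, tip)
    fit = stats.linregress(total_bill, tip)
    return {
        "r": float(r),
        "intercept": float(fit.intercept),
        "slope": float(fit.slope),
        "r_squared": float(fit.rvalue ** 2),
    }


# Made-up, perfectly linear data for illustration only (tip = 0.15 * bill)
model = fit_tip_model([10.0, 20.0, 30.0, 40.0], [1.5, 3.0, 4.5, 6.0])
print(f"Tip = {model['intercept']:.3f} + {model['slope']:.3f} x Total Bill, R^2 = {model['r_squared']:.3f}")
```

For a simple one-predictor regression, R² equals the square of the Pearson correlation, which matches the reported values (0.774² ≈ 0.60). Plugging a $20 bill into the reported model gives -0.105 + 0.163 × 20 ≈ 3.155, so a predicted tip of about $3.16.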
Scatter plot showing the relationship between bill and tip, with fitted regression line:
- Positive linear trend
- Clear upward slope
- Some variance but strong overall pattern
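A scatter plot with a fitted line might be produced as follows. This is a sketch rather than the project's plotting code: the function name `plot_bill_vs_tip`, the styling, and the axis labels are assumptions, the sample values are invented, and `np.polyfit` is used here for the least-squares line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_bill_vs_tip(total_bill, tip):
    """Scatter tip against total_bill and overlay a least-squares line."""
    x = np.asarray(total_bill, dtype=float)
    y = np.asarray(tip, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest degree first

    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(x, y, alpha=0.6, label="Observed tips")
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, intercept + slope * xs, color="red",
            label=f"Fit: tip = {intercept:.3f} + {slope:.3f} x total_bill")
    ax.set(title="Total bill vs. tip", xlabel="Total bill ($)", ylabel="Tip ($)")
    ax.legend()
    return fig, ax


# Made-up values for illustration only
fig, ax = plot_bill_vs_tip([10.0, 20.0, 30.0, 40.0], [1.6, 3.1, 4.4, 6.1])
```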
- pandas
- numpy
- matplotlib
- scipy
Install dependencies using pip:
pip install pandas numpy matplotlib scipy
Run the script via terminal:
python analysis.py
Or open the Jupyter Notebook if using .ipynb.