A breast cancer prediction system built with Random Forest and SHAP explainability, deployed as a full-stack web application.
Live Demo · Kaggle Notebook · Dataset
Takes 5 tumor measurements as input, predicts whether the tumor is benign or malignant, and explains the prediction with SHAP feature attribution, so the result is never just a black box.
| Metric | Score |
|---|---|
| Accuracy | 97.2% |
| Precision | 96% |
| Recall | 97% |
| F1 | 97% |
| False negatives | 3 / 114 test cases |
Evaluated using 5-fold stratified cross-validation, not a single train/test split.
| Layer | Technology |
|---|---|
| Model | scikit-learn Random Forest |
| Explainability | SHAP (TreeExplainer) |
| Backend | FastAPI + Uvicorn |
| Frontend | Vanilla HTML/CSS/JS |
| Hosting | Render (backend) |
cancer-app/
├── main.py # FastAPI backend — /predict and /features endpoints
├── index.html # Frontend — sliders, results, SHAP bar chart
├── model.pkl # Trained Random Forest model
├── X_train.pkl # Training data for SHAP background
├── feature_names.json # Top 5 + all 15 feature names
├── requirements.txt # Python dependencies
└── notebook/
    └── breast-cancer-prediction.ipynb
Data cleaning
- Removed the `id` column (row identifier, not a medical feature)
- Label-encoded diagnosis: B → 0, M → 1
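A minimal sketch of these two steps with pandas, using a tiny hand-made frame in place of the real dataset. The `id` and `diagnosis` column names come from the bullets above; `radius_mean` and all values are placeholders chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the raw dataset; the values are illustrative only.
raw = pd.DataFrame(
    {
        "id": [101, 102, 103],
        "diagnosis": ["B", "M", "B"],
        "radius_mean": [12.1, 20.6, 11.4],
    }
)

# Drop the row identifier, which carries no medical information.
clean = raw.drop(columns=["id"])

# Encode the label: benign -> 0, malignant -> 1.
clean["diagnosis"] = clean["diagnosis"].map({"B": 0, "M": 1})

print(clean)
```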
Class imbalance
- Used `class_weight='balanced'` in Random Forest to prevent the model from being biased toward the majority benign class
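For intuition, scikit-learn documents the `balanced` mode as weighting each class by `n_samples / (n_classes * class_count)`, so the rarer class gets the larger weight. The sketch below reproduces that formula in plain Python; `balanced_class_weights` is a hypothetical helper name and the labels are toy data, not the project's real class counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels: 6 benign (0) and 2 malignant (1); not the real class counts.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(toy_labels)
print(weights)
```

In this toy case the minority class is weighted three times as heavily as the majority class, which is the effect the setting relies on.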
Feature selection
- Ranked all 30 features by Random Forest importance
- Kept top 15 to remove redundant correlated columns (mean/se/worst versions of the same measure)
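The ranking step might look roughly like the sketch below. It uses scikit-learn's bundled breast cancer dataset as a stand-in, which is an assumption: it also has 30 features, but it may not be the exact file this project uses, and its column names are spelled differently (for example `worst concave points`). The hyperparameters other than `class_weight` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): scikit-learn's bundled dataset with 30 features.
data = load_breast_cancer()
X, y = data.data, data.target

# n_estimators and random_state are illustrative choices.
forest = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42
)
forest.fit(X, y)

# Sort feature indices by importance, highest first, and keep the top 15.
order = np.argsort(forest.feature_importances_)[::-1]
top15 = [str(data.feature_names[i]) for i in order[:15]]

# Reduced feature matrix containing only the selected columns.
X_top15 = X[:, order[:15]]
print(top15)
```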
Evaluation
- 5-fold stratified cross-validation for reliable, stable metrics
- Confusion matrix analysis with focus on minimizing false negatives (missed malignant cases)
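A sketch of how such an evaluation could be set up, again on scikit-learn's bundled dataset as a stand-in (an assumption). It is not expected to reproduce the scores in the metrics table, and the model settings other than `class_weight` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

data = load_breast_cancer()
X = data.data
# scikit-learn encodes malignant as 0; flip so malignant = 1 as in this README.
y = 1 - data.target

forest = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42
)

# Stratified folds keep the benign/malignant ratio the same in every fold.
# Each sample is predicted by a model that did not see it during training.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_pred = cross_val_predict(forest, X, y, cv=folds)

# With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
print(f"false negatives (missed malignant): {fn}")
print(f"recall on malignant: {tp / (tp + fn):.3f}")
```

Reading `fn` directly from the confusion matrix keeps the focus on missed malignant cases rather than on overall accuracy.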
Explainability
- SHAP TreeExplainer generates per-prediction feature attribution
- Every prediction shows which measurements drove the result and by how much
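To illustrate how per-prediction attributions can be read, the sketch below ranks a set of SHAP values by magnitude and rebuilds the model output from them. It does not call SHAP itself: the numbers are invented, `rank_contributions` is a hypothetical helper, and it assumes one value per feature, with positive values pushing toward the malignant class.

```python
def rank_contributions(feature_names, shap_values):
    """Pair each feature with its SHAP value, largest absolute effect first."""
    pairs = list(zip(feature_names, shap_values))
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Feature names match the /predict inputs; the SHAP numbers are made up.
features = [
    "concave points_worst",
    "perimeter_worst",
    "concave points_mean",
    "radius_worst",
    "area_worst",
]
hypothetical_shap = [0.21, 0.12, -0.03, 0.09, 0.15]
hypothetical_base = 0.40

ranked = rank_contributions(features, hypothetical_shap)
for name, value in ranked:
    # Assumption: positive values push the prediction toward malignant.
    direction = "toward malignant" if value > 0 else "toward benign"
    print(f"{name:>22}: {value:+.2f} ({direction})")

# SHAP values are additive: base value + sum of contributions = model output.
model_output = hypothetical_base + sum(hypothetical_shap)
print(f"reconstructed output: {model_output:.2f}")
```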
GET /features
Returns min, max and mean values for the top 5 features — used to populate slider ranges in the frontend.
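The summary behind this endpoint could be computed along the lines below. This is a sketch, not the code in `main.py`: `summarize_features`, the row format, and the shape of the returned dictionary are all assumptions, and the two rows are toy data.

```python
from statistics import mean


def summarize_features(rows, feature_names):
    """Return {feature: {"min", "max", "mean"}} computed over training rows."""
    summary = {}
    for name in feature_names:
        column = [row[name] for row in rows]
        summary[name] = {
            "min": min(column),
            "max": max(column),
            "mean": mean(column),
        }
    return summary


# Two toy rows standing in for the training data; values are illustrative.
toy_rows = [
    {"radius_worst": 12.0, "area_worst": 400.0},
    {"radius_worst": 30.0, "area_worst": 2500.0},
]
stats = summarize_features(toy_rows, ["radius_worst", "area_worst"])
print(stats["radius_worst"])
```

A frontend can then use `min` and `max` as slider bounds and `mean` as the default slider position.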
POST /predict
{
  "inputs": {
    "concave points_worst": 0.14,
    "perimeter_worst": 120.0,
    "concave points_mean": 0.07,
    "radius_worst": 16.0,
    "area_worst": 900.0
  }
}

Returns:
{
  "prediction": 1,
  "label": "Malignant",
  "confidence": 94.2,
  "shap_values": [...],
  "shap_base": 0.37
}

git clone https://github.com/aliiakbarkhan/cancer-app.git
cd cancer-app
pip install -r requirements.txt
uvicorn main:app --reload

Then open http://localhost:8000/app
| Feature | Benign case | Malignant case |
|---|---|---|
| Concave points worst | 0.053 | 0.261 |
| Perimeter worst | 78.3 | 197.4 |
| Concave points mean | 0.020 | 0.147 |
| Radius worst | 12.1 | 29.8 |
| Area worst | 422.0 | 2501.0 |
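The sample cases in the table can also be sent to `POST /predict` from a script. The sketch below uses only the Python standard library and assumes the server is running locally on port 8000; `build_request` and `predict` are hypothetical helper names. Only the request is built here, so the snippet runs without a live server.

```python
import json
import urllib.request

# Values copied from the sample table above.
BENIGN_CASE = {
    "concave points_worst": 0.053,
    "perimeter_worst": 78.3,
    "concave points_mean": 0.020,
    "radius_worst": 12.1,
    "area_worst": 422.0,
}
MALIGNANT_CASE = {
    "concave points_worst": 0.261,
    "perimeter_worst": 197.4,
    "concave points_mean": 0.147,
    "radius_worst": 29.8,
    "area_worst": 2501.0,
}


def build_request(inputs, base_url="http://localhost:8000"):
    """Build a POST request for /predict without sending it."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(inputs, base_url="http://localhost:8000"):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(inputs, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(MALIGNANT_CASE)
print(request.full_url, request.get_method())
# With the server running: print(predict(MALIGNANT_CASE)["label"])
```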
This tool is for research and educational purposes only. It is not a substitute for professional medical diagnosis.