This project implements a complete MLOps pipeline for a U.S. Visa Approval Classification System, covering every stage from data ingestion to deployment and monitoring. The goal is to predict whether a visa application will be approved or denied, using machine learning and production-grade MLOps tools.
The application runs in Docker containers on Amazon EC2. An automated GitHub Actions CI/CD pipeline stores the image in Amazon Elastic Container Registry (ECR) and pulls it from there for deployment.
This project demonstrates:
- Data ingestion & transformation
- Model training & hyperparameter optimization
- Model registry and versioning with AWS S3
- FastAPI deployment
- Continuous evaluation with Evidently AI
The design follows end-to-end MLOps best practices and aims for scalability, reproducibility, and maintainability.
| Category | Tools / Libraries |
|---|---|
| Data Processing | pandas, numpy, matplotlib, seaborn, plotly |
| ML Modeling | scikit-learn, xgboost, catboost, imblearn, scipy |
| MLOps & Monitoring | dill, PyYAML, neuro_mf, boto3, botocore, mypy-boto3-s3, evidently==0.2.8 |
| Database | pymongo |
| Backend/API | fastapi, uvicorn, jinja2, python-multipart |
| Utilities | from_root, certifi, dnspython |
MLOpsE2EClassificationTermProject/
│
├── data/                 # Raw & processed data
├── notebooks/            # Exploratory analysis notebooks
├── src/
│   ├── components/       # Data ingestion, transformation, training modules
│   ├── pipeline/         # Training & prediction pipelines
│   ├── utils/            # Helper functions
│   ├── logger.py         # Custom logging
│   └── exception.py      # Error handling
│
├── app.py                # FastAPI main application
├── template.py           # Folder structure generator
├── requirements.txt
├── setup.py
└── README.md
conda create -n visa python=3.8 -y
conda activate visa
pip install -r requirements.txt
pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
- Handles missing values and outliers
- Encodes categorical variables
- Normalizes numeric features
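
The transformation steps above can be sketched with a scikit-learn `ColumnTransformer`. This is a minimal illustration, not the project's actual module: the column names are assumptions borrowed from the sample request payload in this README, the imputation strategies are arbitrary choices, and outlier handling is left out.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, taken from the sample request payload in this README.
NUMERIC_COLS = ["job_experience", "employer_size", "prev_visa_denials"]
CATEGORICAL_COLS = ["country_of_origin", "education_level"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])


# Tiny made-up frame with a few missing values, for illustration only.
df = pd.DataFrame({
    "job_experience": [5, np.nan, 12, 1],
    "employer_size": [200, 15000, 45, 800],
    "prev_visa_denials": [0, 1, 0, 0],
    "country_of_origin": ["India", "Mexico", "India", np.nan],
    "education_level": ["Masters", "Bachelors", np.nan, "Doctorate"],
})

features = build_preprocessor().fit_transform(df)
print(features.shape)
```

Fitting the preprocessor on training data only and reusing the fitted object at prediction time keeps the encoding and scaling consistent between the two pipelines.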
- Trains multiple models (XGBoost, CatBoost, RandomForest, etc.)
- Uses GridSearchCV for parameter optimization
- Saves model artifacts with `dill`
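
As a rough sketch of the tuning step, the snippet below runs `GridSearchCV` over a small grid for a random forest. The synthetic data, the grid values, and the F1 scoring choice are all assumptions for illustration; the project's real search space and metrics are not specified here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the real pipeline would use the transformed visa features.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Illustrative grid only; 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on all of X_train.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The same pattern applies to the other estimators by swapping the model and its grid. The resulting `best_model` is the object that would then be serialized.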
- Stores trained models and metadata in AWS S3
- Uses `boto3` and `neuro_mf` for version tracking
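
One simple way to version models in S3 is to write each artifact under a timestamped key. The sketch below shows that idea only; it does not reproduce how `neuro_mf` tracks versions. The key layout and the function names are hypothetical. The only real API it relies on is `upload_file(Filename, Bucket, Key)` on a `boto3` S3 client, which is passed in as a parameter.

```python
from datetime import datetime, timezone


def versioned_key(model_name: str, when: datetime, prefix: str = "models") -> str:
    """Build an object key such as models/visa_model/20240101T120000Z/model.pkl."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{model_name}/{stamp}/model.pkl"


def upload_model(s3_client, local_path: str, bucket: str, model_name: str, when=None) -> str:
    """Upload a serialized model under a timestamped key and return that key.

    s3_client is expected to behave like boto3.client("s3").
    """
    when = when or datetime.now(timezone.utc)
    key = versioned_key(model_name, when)
    s3_client.upload_file(local_path, bucket, key)  # arguments: Filename, Bucket, Key
    return key


def latest_key(keys):
    """Pick the newest version of one model; the UTC timestamps sort lexicographically."""
    return max(keys) if keys else None
```

In use, the client would come from `boto3.client("s3")` and the bucket from the `BUCKET_NAME` environment variable. Accepting the client as an argument also makes the function easy to exercise with a stub in tests.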
- REST API endpoint for prediction: `/predict`
- Web UI using Jinja2 templates
- Deployed using Uvicorn
- Integrated with Evidently AI (v0.2.8) for drift detection
- Tracks model performance and feature drift over time
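
To make the idea of feature drift concrete, here is a small standard-library sketch of the Population Stability Index (PSI), one common drift statistic. It is a stand-in for illustration and is not Evidently's API; the bin count, the smoothing constant, and the sample data are assumptions.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges come from the reference sample; values outside that range
    are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant reference

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up feature values: the current batch is shifted upward by 10.
reference = [i % 20 for i in range(200)]
current = [v + 10 for v in reference]

print(psi(reference, reference))  # identical samples give 0.0
print(psi(reference, current) > 0.2)  # a large shift exceeds the usual 0.2 rule of thumb
```

A PSI above roughly 0.2 is a commonly cited rule of thumb for meaningful drift, though a suitable threshold depends on the feature and the bin count.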
python src/pipeline/training_pipeline.py
uvicorn app:app --reload
python src/components/data_monitoring.py
{
"case_id": "A12345",
"country_of_origin": "India",
"education_level": "Masters",
"job_experience": 5,
"employer_size": 200,
"prev_visa_denials": 0
}
Response:
{
"prediction": "Approved",
"probability": 0.89
}
Create a .env file in the root directory:
AWS_ACCESS_KEY_ID=<your_aws_key>
AWS_SECRET_ACCESS_KEY=<your_secret_key>
MONGODB_CLUSTER_URI=<your_mongo_connection_string>
BUCKET_NAME=<your_s3_bucket_name>
AWS_DEFAULT_REGION=<your_aws_region>
ECR_REPO=<your_ecr_url>
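
A quick check that these variables are present can save a failed run later. The sketch below parses a `.env` file with the standard library and reports any missing names. It is an assumption about how one might load the file, not a description of how the project reads it, and the function names are hypothetical.

```python
import os

# Variable names listed in this README's .env example.
REQUIRED = [
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "MONGODB_CLUSTER_URI",
    "BUCKET_NAME", "AWS_DEFAULT_REGION", "ECR_REPO",
]


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> list:
    """Load the file into os.environ and return the required names still missing.

    Variables already set in the environment are not overwritten.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            for key, value in parse_env(fh.read()).items():
                os.environ.setdefault(key, value)
    return [name for name in REQUIRED if not os.environ.get(name)]
```

Calling `load_env()` at startup and failing fast when the returned list is non-empty gives a clearer error than a credentials failure deep inside the pipeline.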
If you face MongoDB issues:
pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
If S3 upload fails:
- Check your AWS credentials
- Verify IAM role permissions
- Ensure correct bucket region
A[Data Ingestion] --> B[Data Transformation]
B --> C[Model Training]
C --> D[Model Evaluation]
D --> E[Model Storage (AWS S3)]
E --> F[FastAPI Deployment]
F --> G[Prediction API]
G --> H[Monitoring (Evidently AI)]
H --> A
pandas
numpy
matplotlib
plotly
seaborn
scipy
scikit-learn
imblearn
xgboost
catboost
pymongo
from_root
evidently==0.2.8
dill
PyYAML
neuro_mf
boto3
mypy-boto3-s3
botocore
fastapi
uvicorn
jinja2
python-multipart
-e .
Pankaj Kumar Pramanik
Data, AI & MLOps Engineer
pankajpramanik.com