End-to-end MLOps classification system with AWS CI/CD and automated deployment.

# MLOpsE2EClassificationTermProject

This project implements a complete MLOps pipeline for a U.S. Visa Approval Classification System, covering all essential components from data ingestion to deployment and monitoring. The goal is to predict whether a visa application will be approved or denied, using machine learning and production-grade MLOps tools.

The application is deployed on AWS EC2 using Docker containers, with the image stored and pulled directly from AWS Elastic Container Registry (ECR) through an automated GitHub Actions CI/CD pipeline.


## 📘 Overview

This project demonstrates:

  • Data ingestion & transformation
  • Model training & hyperparameter optimization
  • Model registry and versioning with AWS S3
  • FastAPI deployment
  • Continuous evaluation with Evidently AI

It is designed following end-to-end MLOps best practices, ensuring scalability, reproducibility, and maintainability.


โš™๏ธ Tech Stack

| Category | Tools / Libraries |
| --- | --- |
| Data Processing | pandas, numpy, matplotlib, seaborn, plotly |
| ML Modeling | scikit-learn, xgboost, catboost, imblearn, scipy |
| MLOps & Monitoring | dill, PyYAML, neuro_mf, boto3, botocore, mypy-boto3-s3, evidently==0.2.8 |
| Database | pymongo |
| Backend/API | fastapi, uvicorn, jinja2, python-multipart |
| Utilities | from_root, certifi, dnspython |

## 📂 Project Structure

```
MLOpsE2EClassificationTermProject/
│
├── data/                        # Raw & processed data
├── notebooks/                   # Exploratory analysis notebooks
├── src/
│   ├── components/              # Data ingestion, transformation, training modules
│   ├── pipeline/                # Training & prediction pipelines
│   ├── utils/                   # Helper functions
│   ├── logger.py                # Custom logging
│   ├── exception.py             # Error handling
│
├── app.py                       # FastAPI main application
├── template.py                  # Folder structure generator
├── requirements.txt
├── setup.py
└── README.md
```

## 🧩 Installation

### 1️⃣ Create and activate the conda environment

```bash
conda create -n visa python=3.8 -y
conda activate visa
```

### 2️⃣ Install dependencies

```bash
pip install -r requirements.txt
```

### 3️⃣ If a MongoDB error occurs

```bash
pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
```

## 🧠 Features

### 🧮 Data Preprocessing

  • Handles missing values and outliers
  • Encodes categorical variables
  • Normalizes numeric features
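The steps above can be sketched as a scikit-learn `ColumnTransformer` (column names are illustrative, borrowed from the example API request; the project's real schema lives in `src/components/`):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["job_experience", "employer_size"]
CATEGORICAL = ["education_level", "country_of_origin"]

preprocessor = ColumnTransformer(
    transformers=[
        # Impute missing numbers with the median, then normalize.
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), NUMERIC),
        # Impute missing categories with the mode, then one-hot encode.
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), CATEGORICAL),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

df = pd.DataFrame({
    "job_experience": [5, None, 2],      # None becomes NaN and is imputed
    "employer_size": [200, 50, None],
    "education_level": ["Masters", "Bachelors", "Masters"],
    "country_of_origin": ["India", "India", "Brazil"],
})
X = preprocessor.fit_transform(df)
print(X.shape)  # (3, 6): 2 scaled numeric columns + 4 one-hot columns
```

Fitting the transformer once and reusing it at prediction time keeps training and serving features consistent.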

### 🧠 Model Training

  • Trains multiple models (XGBoost, CatBoost, RandomForest, etc.)
  • Uses GridSearchCV for parameter optimization
  • Saves model artifacts with dill
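A minimal sketch of that loop on synthetic data: grid-search one of the candidate models and persist the winner. The project serializes with `dill`; `pickle` is used as a drop-in fallback here so the sketch runs without it.

```python
import os
import tempfile

try:
    import dill as serializer  # what the project uses
except ImportError:
    import pickle as serializer  # stdlib fallback for this sketch

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the transformed visa dataset.
X, y = make_classification(n_samples=200, n_features=6, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
    scoring="f1",
)
search.fit(X, y)

# Persist the best estimator as an artifact, as the pipeline does before upload.
path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:
    serializer.dump(search.best_estimator_, f)
print(search.best_params_)
```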

โ˜๏ธ Model Versioning & Storage

  • Stores trained models and metadata in AWS S3
  • Uses boto3 and neuro_mf for version tracking
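One simple way to version artifacts in S3 is a timestamped key per model, sketched below. The key layout (`models/<name>/<timestamp>.pkl`) and function names are hypothetical; the real scheme lives in `src/`. The `upload_file` call is standard boto3 (`s3_client = boto3.client("s3")`).

```python
from datetime import datetime, timezone
from typing import Optional

def versioned_key(model_name: str, ts: Optional[datetime] = None) -> str:
    """Build an S3 object key like models/visa_clf/2024-01-02T03-04-05.pkl."""
    ts = ts or datetime.now(timezone.utc)
    return f"models/{model_name}/{ts.strftime('%Y-%m-%dT%H-%M-%S')}.pkl"

def upload_model(s3_client, bucket: str, model_name: str, local_path: str) -> str:
    """Upload a serialized model under a fresh versioned key; return the key."""
    key = versioned_key(model_name)
    s3_client.upload_file(local_path, bucket, key)
    return key
```

Because every upload gets a new key, older versions stay retrievable for rollback.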

### ⚡ Deployment via FastAPI

  • REST API endpoint for prediction: /predict
  • Web UI using Jinja2 templates
  • Deployed using Uvicorn

### 📊 Continuous Monitoring

  • Integrated with Evidently AI (v0.2.8) for drift detection
  • Tracks model performance and feature drift over time

## 🚀 Usage

### 🧪 Run the training pipeline

```bash
python src/pipeline/training_pipeline.py
```

### ⚙️ Start the API server

```bash
uvicorn app:app --reload
```

### 📈 Generate an Evidently report

```bash
python src/components/data_monitoring.py
```

### 🧾 Example API Request

`POST /predict`

```json
{
  "case_id": "A12345",
  "country_of_origin": "India",
  "education_level": "Masters",
  "job_experience": 5,
  "employer_size": 200,
  "prev_visa_denials": 0
}
```

Response:

```json
{
  "prediction": "Approved",
  "probability": 0.89
}
```

โ˜๏ธ AWS Integration

Environment Variables

Create a .env file in the root directory:

AWS_ACCESS_KEY_ID=<your_aws_key>
AWS_SECRET_ACCESS_KEY=<your_secret_key>
MONGODB_CLUSTER_URI=<your_mongo_connection_string>
BUCKET_NAME=<your_s3_bucket_name>
AWS_DEFAULT_REGION=<your_aws_region>
ECR_REPO=<your_ecr_url>
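A fail-fast settings loader is one way to catch a missing variable before the pipeline starts (stdlib-only sketch; the project may instead load `.env` via a helper such as python-dotenv):

```python
import os
from typing import Dict

# Names from the .env template above.
REQUIRED = [
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "MONGODB_CLUSTER_URI",
    "BUCKET_NAME", "AWS_DEFAULT_REGION", "ECR_REPO",
]

def load_settings() -> Dict[str, str]:
    """Return all required settings, or raise naming every missing one."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise EnvironmentError(f"Missing env vars: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```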

## 🧹 Troubleshooting

If you face MongoDB issues:

```bash
pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
```

If S3 upload fails:

  • Check your AWS credentials
  • Verify IAM role permissions
  • Ensure correct bucket region

## 📊 MLOps Pipeline (Flow)

```mermaid
graph TD
    A[Data Ingestion] --> B[Data Transformation]
    B --> C[Model Training]
    C --> D[Model Evaluation]
    D --> E["Model Storage (AWS S3)"]
    E --> F[FastAPI Deployment]
    F --> G[Prediction API]
    G --> H["Monitoring (Evidently AI)"]
    H --> A
```

## 📦 Requirements Summary

```text
pandas
numpy
matplotlib
plotly
seaborn
scipy
scikit-learn
imblearn
xgboost
catboost
pymongo
from_root
evidently==0.2.8
dill
PyYAML
neuro_mf
boto3
mypy-boto3-s3
botocore
fastapi
uvicorn
jinja2
python-multipart
-e .
```

๐Ÿ‘จโ€๐Ÿ’ป Author

Pankaj Kumar Pramanik Data, AI & MLOps Engineer ๐ŸŒ pankajpramanik.com
