A machine learning system that classifies receipts as either real or AI-generated using a hybrid image-text approach.
This project implements a receipt classification system that combines computer vision and natural language processing techniques to determine if a receipt image is authentic or AI-generated. The system uses a hybrid model architecture that processes both the visual features of the receipt image and the text extracted from it.
- Hybrid Classification: Combines image features from EfficientNet-B0 and text features from BERT
- OCR Integration: Uses TrOCR (Transformer OCR) for high-quality text extraction from receipt images
- REST API: Provides a FastAPI endpoint for easy integration with other systems
- Multi-format Support: Handles both image files (JPG, PNG) and PDF documents
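Handling both images and PDFs implies some step that tells the two apart before processing. The sketch below shows one common way to do that, by checking the leading "magic" bytes of the upload. The helper name `sniff_upload_kind` is hypothetical and is not taken from this project, which may instead rely on file extensions or content types.

```python
# Hypothetical helper, not part of this repository: classify an upload
# as "pdf" or "image" from its leading bytes.
PDF_MAGIC = b"%PDF-"               # every PDF file starts with this marker
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker


def sniff_upload_kind(data: bytes) -> str:
    """Return "pdf" or "image" based on the file's leading bytes."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(PNG_MAGIC) or data.startswith(JPEG_MAGIC):
        return "image"
    raise ValueError("unsupported file type: expected JPG, PNG, or PDF")
```

Checking content rather than the filename keeps a mislabeled upload (for example, a PDF saved with a .jpg extension) from reaching the wrong processing path.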
The system uses a three-part architecture:
- Image Feature Extraction: EfficientNet-B0 processes the visual aspects of the receipt
- Text Extraction & Processing: TrOCR extracts text from the image, which is then processed by BERT
- Hybrid Classifier: Combines image and text features to make the final classification
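To make the fusion step concrete, here is a minimal pure-Python sketch of one way image and text features can be combined: concatenate the two vectors, apply a single linear layer, and normalize with softmax. This is an illustration under assumptions, not the project's actual classifier head. The 768-dimensional text vector assumes a BERT-base encoder, the features and weights are random placeholders, and the function name `fuse_and_classify` is hypothetical. The label names follow the API response format.

```python
import math
import random

IMAGE_DIM = 1280  # size of EfficientNet-B0's pooled feature vector
TEXT_DIM = 768    # assumption: a BERT-base encoder
LABELS = ("ai_generated", "real")  # label names as used in the API response


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(image_feat, text_feat, weights, bias):
    """Concatenate both feature vectors, apply one linear layer, then softmax."""
    fused = list(image_feat) + list(text_feat)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return dict(zip(LABELS, softmax(logits)))


# Placeholder inputs: random features and untrained weights, for shape only.
rng = random.Random(0)
image_feat = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]
text_feat = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
weights = [[rng.gauss(0, 0.01) for _ in range(IMAGE_DIM + TEXT_DIM)] for _ in LABELS]
bias = [0.0, 0.0]

probs = fuse_and_classify(image_feat, text_feat, weights, bias)
print(probs)
```

In a real model the linear layer would be trained, and the head may well contain additional layers, dropout, or a different fusion scheme; consult the training notebook for the actual design.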
- Python 3.8+
- Poppler (for PDF processing)
- Clone the repository:
    git clone https://github.com/yuguda999/receipt_classifier.git
    cd receipt_classifier
- Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Install Poppler (required for PDF processing):
  - On Ubuntu/Debian:
      sudo apt-get install poppler-utils
  - On macOS:
      brew install poppler
  - On Windows: Download from poppler releases
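After installing, it can help to confirm that Poppler is actually reachable from Python. The sketch below looks for `pdftoppm`, one of the command-line tools shipped with Poppler, on the PATH. The function name `poppler_available` is hypothetical, and the check assumes the project converts PDFs through Poppler's command-line tools (as PDF-to-image wrappers commonly do).

```python
import shutil


def poppler_available() -> bool:
    """Return True if Poppler's pdftoppm binary can be found on PATH."""
    return shutil.which("pdftoppm") is not None


if poppler_available():
    print("Poppler found")
else:
    print("Poppler not found on PATH; PDF inputs will likely fail")
```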
Start the FastAPI server:
uvicorn app:app --host 0.0.0.0 --port 8000
The API will be available at http://localhost:8000.
POST /predict accepts an image or PDF file and returns the classification result.
Request:
- Form data with a file field named file
Response:
{
  "prediction": "real",
  "probabilities": {
    "ai_generated": 0.05,
    "real": 0.95
  },
  "extracted_text": "Sample receipt text extracted from the image..."
}

import requests
url = "http://localhost:8000/predict"
with open("path/to/receipt.jpg", "rb") as f:  # close the file once the upload finishes
    response = requests.post(url, files={"file": f})
result = response.json()
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['probabilities'][result['prediction']]:.2f}")

The model was trained on a dataset of real and AI-generated receipts. The training process is documented in the receipt_classifier.ipynb notebook.
receipt_dataset/
├── ai_generated/ # AI-generated receipt images
└── real/ # Real receipt images
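Given this layout, labels can be derived from the folder names. The following sketch walks the two class directories and returns (path, label) pairs. The function name `list_samples`, the integer label ids, and the set of accepted file suffixes are assumptions for illustration; the notebook may load the data differently.

```python
from pathlib import Path

# Assumption: integer ids are arbitrary; only the folder names come from the layout.
LABELS = {"ai_generated": 0, "real": 1}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_samples(root):
    """Return (path, label_id) pairs for images under root/<class>/, class by class."""
    samples = []
    for class_name, label_id in LABELS.items():
        class_dir = Path(root) / class_name
        if not class_dir.is_dir():
            continue  # skip a missing class folder instead of failing
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label_id))
    return samples


samples = list_samples("receipt_dataset")
print(f"Found {len(samples)} labeled images")
```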
The hybrid model distinguishes real from AI-generated receipts with high accuracy: validation accuracy reached 100% after training.
- EfficientNet for image feature extraction
- Hugging Face Transformers for BERT and TrOCR models
- FastAPI for the API implementation
- PyTesseract for OCR capabilities