This project is an AI application that combines computer vision and natural language processing (NLP) to generate descriptive text captions for images automatically. Rather than training a model from scratch, it uses a pre-trained transformer model, so no training data or training time is needed.
- Image Processing: Loads and processes standard image formats (JPEG, PNG).
- Transformer Architecture: Utilizes the Bootstrapping Language-Image Pre-training (BLIP) model for robust feature extraction and text generation.
- Interactive CLI: Provides a simple command-line interface for users to input image paths and receive instant captions.
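The image-processing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `SUPPORTED_EXTENSIONS`, `is_supported_image`, and `load_image` are hypothetical, and the extension list is an assumption based on the formats named above.

```python
from pathlib import Path

# Assumed set of accepted extensions, based on the formats listed above (JPEG, PNG).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension looks like JPEG or PNG (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def load_image(path: str):
    """Open an image and convert it to RGB before handing it to the model."""
    # Imported here so the extension check above works even without Pillow installed.
    from PIL import Image

    if not is_supported_image(path):
        raise ValueError(f"Unsupported image format: {path}")
    # convert("RGB") normalizes palette and alpha-channel images (common with PNG).
    with Image.open(path) as img:
        return img.convert("RGB")
```

Checking the extension first gives a clear error for obviously wrong files; Pillow still raises its own error if a file with a valid extension turns out not to be a readable image.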
- Language: Python 3.x
- Deep Learning Framework: torch (PyTorch)
- Pre-trained Models: Salesforce/blip-image-captioning-base via the Hugging Face transformers library
- Image Handling: Pillow (PIL)
Before running the script, ensure you have the required external libraries installed:
pip install transformers torch Pillow
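To confirm the installation worked, a quick check along these lines may help. It is only a sketch: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the check only verifies that each package can be located, not that its version is compatible.

```python
import importlib.util

# Import names for the three pip packages; note that Pillow is imported as "PIL".
REQUIRED_MODULES = ("transformers", "torch", "PIL")


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `print(missing_modules())` should print an empty list once all three packages are installed.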
Run the Python file from your terminal:
python image_captioning.py
When prompted, provide the absolute or relative path to the image file you want to caption. The script downloads the model from Hugging Face on the first run only, then processes the image and prints the generated caption.
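The flow described above might look roughly like the sketch below. It is an assumption about how such a script could be written, not a copy of image_captioning.py: the helper names `normalize_path` and `caption_image` and the `max_new_tokens` default are hypothetical, while the `BlipProcessor` and `BlipForConditionalGeneration` classes come from the Hugging Face transformers library.

```python
from pathlib import Path

MODEL_NAME = "Salesforce/blip-image-captioning-base"


def normalize_path(raw: str) -> Path:
    """Strip whitespace and surrounding quotes from user input and expand '~'."""
    cleaned = raw.strip().strip("\"'")
    return Path(cleaned).expanduser()


def caption_image(image_path: Path, max_new_tokens: int = 30) -> str:
    """Generate a caption for one image with the pre-trained BLIP model."""
    # Heavy imports are kept local so path handling works without loading them.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # from_pretrained downloads the files on first use and reads the local cache afterwards.
    processor = BlipProcessor.from_pretrained(MODEL_NAME)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_NAME)

    with Image.open(image_path) as img:
        image = img.convert("RGB")

    # The processor turns the image into PyTorch tensors for the model.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

With these helpers, an interactive prompt could be as small as `print(caption_image(normalize_path(input("Image path: "))))`. Loading the model inside the function keeps the sketch short; a longer-running program would likely load it once and reuse it across images.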
Nithin Kumar Badduluri
- GitHub: @NKumar-B