A Flask-based computer-vision application combining camera/image input, model inference (e.g., object detection/classification) and a web UI to display results.

SSSM0602/flask-computer-vision-app

Flask Computer Vision Web Application

A Flask-based web application that offers two computer vision models: Vision Transformer (ViT) for image classification and YOLO for object detection. Users upload an image and choose either a text prediction (ViT) or an annotated copy of the image with bounding boxes drawn around detected objects (YOLO).

Features

  • Dual Model Support: Choose between Vision Transformer and YOLO models
  • Image Upload Interface: Secure file upload with filename sanitization
  • Vision Transformer (ViT): Returns text-based object predictions
  • YOLO Object Detection: Returns images with bounding boxes drawn around detected objects
  • Responsive Web Interface: Clean, user-friendly interface for model selection and results
  • Static File Serving: Integrated handling of uploaded images and prediction outputs

Technology Stack

  • Backend: Flask (Python web framework)
  • Computer Vision Models:
    • Vision Transformer (ViT) for image classification
    • YOLO (You Only Look Once) for object detection
  • File Handling: Werkzeug secure filename utilities
  • Frontend: HTML templates with static file serving

Project Structure

flask-app/
├── app.py                    # Main Flask application
├── vit_model.py             # Vision Transformer model implementation
├── yolo_model.py            # YOLO model implementation
├── static/
│   └── uploads/             # Directory for uploaded images
├── templates/
│   ├── index.html           # Home page with model selection
│   ├── vit_prediction.html  # ViT prediction results
│   └── yolo_prediction.html # YOLO prediction results
└── README.md

Installation

  1. Clone the repository:
     git clone <repository-url>
     cd flask-computer-vision-app
  2. Install required dependencies:
     pip install flask werkzeug torch torchvision transformers opencv-python ultralytics
  3. Create the uploads directory:
     mkdir -p static/uploads
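
After installation, a quick check can confirm that every dependency is importable before starting the app. The sketch below is not part of the repository: the helper name `missing_packages` is hypothetical, and the import names are assumed to correspond to the pip packages listed above (note that opencv-python is imported as `cv2`).

```python
import importlib.util

# Import names for the pip packages listed above (opencv-python imports as cv2).
REQUIRED = ["flask", "werkzeug", "torch", "torchvision",
            "transformers", "cv2", "ultralytics"]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only locates the modules without importing them, so it stays fast even for large packages such as torch.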

Model Implementations

Vision Transformer (ViT) Model

  • File: vit_model.py
  • Function: predict_vit(file_path)
  • Output: Text-based object classification
  • Use Case: Image classification with descriptive labels

YOLO Model

  • File: yolo_model.py
  • Function: predict_yolo(file_path)
  • Output: Image file path with bounding boxes
  • Use Case: Object detection with visual localization

Usage

Starting the Application

python app.py

The application will run on http://localhost:5000 in debug mode.

Using the Web Interface

  1. Navigate to Home Page (/)
    • View descriptions of both available models
    • Choose between ViT and YOLO models
  2. Upload and Predict
    • Select your preferred model (ViT or YOLO)
    • Upload an image file
    • Click predict to process
  3. View Results
    • ViT Results: Text-based classification displayed on results page
    • YOLO Results: Original image with bounding boxes overlaid

API Endpoints

GET /

  • Description: Home page with model selection interface
  • Returns: HTML page with model descriptions and upload form

POST /predict

  • Description: Process uploaded image with selected model
  • Parameters:
    • model: Selected model type ('vit' or 'yolo')
    • file: Uploaded image file
  • Returns:
    • ViT: Prediction results page with text classification
    • YOLO: Prediction results page with annotated image
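
The POST /predict behavior described above could be implemented roughly as follows. This is a minimal sketch, not the repository's actual app.py: the two predictor functions are placeholders standing in for vit_model.predict_vit and yolo_model.predict_yolo, the 400 responses for bad input are an assumption, and plain strings are returned instead of rendered templates so the example runs without the HTML files.

```python
import os

from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "static/uploads"


def predict_vit(file_path):
    """Placeholder for vit_model.predict_vit: returns a text label."""
    return "cat"


def predict_yolo(file_path):
    """Placeholder for yolo_model.predict_yolo: returns an image path."""
    return file_path


@app.route("/predict", methods=["POST"])
def predict():
    model = request.form.get("model")
    file = request.files.get("file")
    # Reject requests with no file or an unrecognized model (assumed behavior).
    if file is None or file.filename == "":
        return "No file uploaded.", 400
    if model not in ("vit", "yolo"):
        return "Unknown model.", 400
    # Sanitize the client-supplied name before building a path from it.
    filename = secure_filename(file.filename)
    if not filename:
        return "Invalid filename.", 400
    os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
    file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
    file.save(file_path)
    result = predict_vit(file_path) if model == "vit" else predict_yolo(file_path)
    if not result:
        return "Prediction failed."
    # The real app would render vit_prediction.html or yolo_prediction.html here.
    return f"{model}: {result}"
```

In the real application the final return would presumably call render_template with the matching results page.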

Model Descriptions

Vision Transformer (ViT)

"Vision Transformer Model: Predicts objects in images and returns the predicted object as text."

  • Provides semantic understanding of image content
  • Returns descriptive text labels
  • Ideal for general image classification tasks
  • Based on transformer architecture adapted for computer vision

YOLO (You Only Look Once)

"YOLO Model: Predicts objects in images and returns an image with bounding boxes drawn."

  • Real-time object detection and localization
  • Returns visual results with bounding boxes
  • Identifies multiple objects in a single image
  • Provides spatial information about detected objects

File Handling

  • Upload Security: Uses secure_filename() to sanitize uploaded filenames
  • File Storage: Images saved to static/uploads/ directory
  • Static Serving: Flask configured to serve uploaded files and prediction results
  • Supported Formats: Common image formats (JPG, PNG, etc.)
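
Filename sanitization does not restrict file types by itself, so an extension allow-list is a common companion check. The sketch below is an assumption rather than code from the repository: the helper name `allowed_file` and the exact set of extensions are illustrative, since the list above only says "JPG, PNG, etc."

```python
import os

# Assumed allow-list; the README only says "JPG, PNG, etc."
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def allowed_file(filename):
    """Return True if the filename's extension is on the allow-list."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in ALLOWED_EXTENSIONS
```

A check like this would typically run on the name returned by secure_filename(), before the file is saved to static/uploads/.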

Configuration

app = Flask(__name__, static_folder='static/uploads')
app.config['UPLOAD_FOLDER'] = 'static/uploads'

Development Notes

Error Handling

  • Returns "Prediction failed." message for unsuccessful predictions
  • Implements secure filename handling to prevent directory traversal
  • Debug mode enabled for development

Model Integration

  • Modular design with separate files for each model
  • Consistent function signatures for easy model swapping
  • Flexible return types (text for ViT, image path for YOLO)
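
Because both predictors take a single file_path argument, model selection can be reduced to a dictionary lookup. The following is a sketch under assumptions: `PREDICTORS` and `run_model` are hypothetical names, and the two functions are stand-ins for the real implementations in vit_model.py and yolo_model.py.

```python
def predict_vit(file_path):
    """Stand-in for vit_model.predict_vit: returns a text label."""
    return "cat"


def predict_yolo(file_path):
    """Stand-in for yolo_model.predict_yolo: returns an annotated image path."""
    return file_path


# Map the form's 'model' value to the function that handles it.
PREDICTORS = {"vit": predict_vit, "yolo": predict_yolo}


def run_model(model, file_path):
    """Look up the predictor registered for `model` and run it on `file_path`."""
    predictor = PREDICTORS.get(model)
    if predictor is None:
        raise ValueError(f"Unknown model: {model!r}")
    return predictor(file_path)
```

With this layout, adding a third model only requires registering another function in the dictionary; the caller still has to handle the different return types (text versus image path).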

Example Workflow

  1. User visits home page and sees model options
  2. User selects Vision Transformer model
  3. User uploads a photo of a cat
  4. System processes image through ViT model
  5. Results page displays: "Predicted object: Cat"

Alternatively:

  1. User selects YOLO model
  2. User uploads a street scene photo
  3. System processes image through YOLO model
  4. Results page displays original image with bounding boxes around cars, pedestrians, etc.

Requirements

  • Python 3.7+
  • Flask
  • Computer vision libraries: torch, torchvision, transformers, opencv-python and ultralytics (see Installation)
  • Sufficient storage space for uploaded images
  • GPU recommended for faster model inference (optional)

Future Enhancements

  • Add support for batch processing multiple images
  • Implement confidence scores for predictions
  • Add model performance benchmarking
  • Include data augmentation options
  • Add user session management
  • Implement prediction history
  • Add support for video input
  • Include model comparison features
  • Add JSON API endpoints for programmatic access (the current endpoints return HTML pages)

Troubleshooting

  • Model Loading Issues: Ensure model files are properly downloaded and accessible
  • File Upload Problems: Check upload directory permissions and disk space
  • Memory Issues: Consider using GPU acceleration or reducing image sizes
  • Template Errors: Verify all HTML templates are in the templates/ directory
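
The upload-directory and template checks above can be automated with a small script. This is an illustrative sketch, not part of the repository: `check_layout` is a hypothetical helper, and the file names are taken from the Project Structure section.

```python
import os

# File and directory names taken from the Project Structure section.
TEMPLATES = ["index.html", "vit_prediction.html", "yolo_prediction.html"]


def check_layout(root="."):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    uploads = os.path.join(root, "static", "uploads")
    if not os.path.isdir(uploads):
        problems.append(f"missing upload directory: {uploads}")
    elif not os.access(uploads, os.W_OK):
        problems.append(f"upload directory is not writable: {uploads}")
    for name in TEMPLATES:
        path = os.path.join(root, "templates", name)
        if not os.path.isfile(path):
            problems.append(f"missing template: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_layout():
        print(problem)
```

Run from the project root, the script prints nothing when the expected directories and templates are all in place.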
