Flask Computer Vision Web Application

A Flask-based web application that provides dual computer vision capabilities through two state-of-the-art models: Vision Transformer (ViT) for image classification and YOLO for object detection. Users can upload images and choose between text-based predictions and visual object detection with bounding boxes.

Features

Dual Model Support: Choose between Vision Transformer and YOLO models
Image Upload Interface: Secure file upload with filename sanitization
Vision Transformer (ViT): Returns text-based object predictions
YOLO Object Detection: Returns images with bounding boxes drawn around detected objects
Responsive Web Interface: Clean, user-friendly interface for model selection and results
Static File Serving: Integrated handling of uploaded images and prediction outputs

Technology Stack

Backend: Flask (Python web framework)
Computer Vision Models:
- Vision Transformer (ViT) for image classification
- YOLO (You Only Look Once) for object detection
File Handling: Werkzeug secure filename utilities
Frontend: HTML templates with static file serving

Project Structure

```
flask-computer-vision-app/
├── flask_app/
│   ├── app.py                   # Main Flask application
│   ├── vit_model.py             # Vision Transformer model implementation
│   ├── yolo_model.py            # YOLO model implementation
│   ├── static/
│   │   └── uploads/             # Directory for uploaded images
│   └── templates/
│       ├── index.html           # Home page with model selection
│       ├── vit_prediction.html  # ViT prediction results
│       └── yolo_prediction.html # YOLO prediction results
└── README.md
```

Installation

Clone the repository and change into the application folder:

```bash
git clone <repository-url>
cd flask-computer-vision-app
cd flask_app
```

Install required dependencies:

```bash
pip install flask werkzeug torch torchvision transformers opencv-python ultralytics
```

Create the uploads directory:

```bash
mkdir -p static/uploads
```

Model Implementations

Vision Transformer (ViT) Model

File: vit_model.py
Function: predict_vit(file_path)
Output: Text-based object classification
Use Case: Image classification with descriptive labels
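A minimal sketch of what `predict_vit(file_path)` could look like, assuming it wraps the Hugging Face `transformers` image-classification pipeline. The checkpoint name `google/vit-base-patch16-224`, the optional `classifier` parameter (added so the function can be tested without downloading a model) and the helper `_get_classifier` are assumptions, not taken from the repository's `vit_model.py`.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumption: a public ImageNet-trained ViT checkpoint; vit_model.py may use another.
VIT_CHECKPOINT = "google/vit-base-patch16-224"

Classifier = Callable[[str], List[Dict[str, Any]]]
_classifier: Optional[Classifier] = None


def _get_classifier() -> Classifier:
    """Build the Hugging Face pipeline once and reuse it across requests."""
    global _classifier
    if _classifier is None:
        from transformers import pipeline  # lazy import: the model loads on first use

        _classifier = pipeline("image-classification", model=VIT_CHECKPOINT)
    return _classifier


def predict_vit(file_path: str, classifier: Optional[Classifier] = None) -> Optional[str]:
    """Return the highest-scoring label for the image, or None if prediction fails."""
    if classifier is None:
        classifier = _get_classifier()
    try:
        # The pipeline accepts a file path and returns [{"label": ..., "score": ...}, ...].
        predictions = classifier(file_path)
    except Exception:  # unreadable image, model error, ...
        return None
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"]
```

Returning `None` on failure gives the caller a single signal for showing "Prediction failed.". ImageNet labels can read like "tabby, tabby cat", so a results template may want to tidy them before display.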

YOLO Model

File: yolo_model.py
Function: predict_yolo(file_path)
Output: Path to a copy of the image with bounding boxes drawn
Use Case: Object detection with visual localization
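A minimal sketch of `predict_yolo(file_path)`, assuming it uses the Ultralytics `YOLO` class and OpenCV to save the annotated image next to the upload. The weights file `yolov8n.pt`, the `_yolo` filename suffix and the optional `model`/`imwrite` parameters (which let the function be tested without PyTorch or OpenCV) are assumptions, not the repository's actual code.

```python
import os
from typing import Any, Callable, Optional

# Assumption: a small pretrained Ultralytics checkpoint; the repository's
# yolo_model.py may load different weights.
YOLO_WEIGHTS = "yolov8n.pt"

_model: Optional[Any] = None


def _get_model() -> Any:
    """Load the YOLO weights once and reuse them for every request."""
    global _model
    if _model is None:
        from ultralytics import YOLO  # lazy import: PyTorch loads on first use

        _model = YOLO(YOLO_WEIGHTS)
    return _model


def annotated_path(file_path: str, suffix: str = "_yolo") -> str:
    """Derive the output path, e.g. static/uploads/street.jpg -> static/uploads/street_yolo.jpg."""
    root, ext = os.path.splitext(file_path)
    return f"{root}{suffix}{ext or '.jpg'}"


def predict_yolo(
    file_path: str,
    model: Optional[Any] = None,
    imwrite: Optional[Callable[[str, Any], bool]] = None,
) -> Optional[str]:
    """Detect objects, save an annotated copy next to the upload, and return its path."""
    if model is None:
        model = _get_model()
    if imwrite is None:
        import cv2  # opencv-python, listed in the install step

        imwrite = cv2.imwrite
    results = model(file_path)  # one Results object per input image
    annotated = results[0].plot()  # BGR array with boxes and class labels drawn
    out_path = annotated_path(file_path)
    # cv2.imwrite returns False when the file could not be written.
    return out_path if imwrite(out_path, annotated) else None
```

Saving the output inside the uploads folder means it can be served by the same static route as the original image.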

Usage

Starting the Application

```bash
python app.py
```

The application will run on http://localhost:5000 in debug mode.

Using the Web Interface

1. Navigate to the home page (`/`)
   - View descriptions of both available models
   - Choose between the ViT and YOLO models
2. Upload and predict
   - Select your preferred model (ViT or YOLO)
   - Upload an image file
   - Click Predict to process the image
3. View results
   - ViT results: text-based classification displayed on the results page
   - YOLO results: original image with bounding boxes overlaid

API Endpoints

`GET /`

Description: Home page with model selection interface
Returns: HTML page with model descriptions and upload form

`POST /predict`

Description: Process uploaded image with selected model
Parameters (form fields, sent as `multipart/form-data`):
- model: Selected model type ('vit' or 'yolo')
- file: Uploaded image file
Returns:
- ViT: Prediction results page with text classification
- YOLO: Prediction results page with annotated image
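A minimal sketch of how `app.py` could wire the two endpoints together. The template names come from the project structure above; the template variables `prediction` and `image_file`, the helpers `get_predictors` and `choose_result`, and the 400 responses are hypothetical choices for illustration, not the repository's code.

```python
import os
from typing import Callable, Dict, Optional, Tuple

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename

UPLOAD_FOLDER = "static/uploads"

app = Flask(__name__, static_folder=UPLOAD_FOLDER)
app.config["UPLOAD_FOLDER"] = UPLOAD_FOLDER

Predictor = Callable[[str], Optional[str]]


def get_predictors() -> Dict[str, Predictor]:
    # Imported lazily so importing this module does not load the model libraries.
    from vit_model import predict_vit
    from yolo_model import predict_yolo

    return {"vit": predict_vit, "yolo": predict_yolo}


def choose_result(
    model_name: Optional[str], file_path: str, predictors: Dict[str, Predictor]
) -> Optional[Tuple[str, dict]]:
    """Map the selected model and saved upload to (template, context), or None on failure."""
    predict = predictors.get(model_name or "")
    if predict is None:
        return None
    result = predict(file_path)
    if not result:
        return None
    if model_name == "vit":
        return "vit_prediction.html", {"prediction": result}
    # YOLO returns a file path; the template only needs the name under static/uploads.
    return "yolo_prediction.html", {"image_file": os.path.basename(result)}


@app.route("/")
def index():
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    model_name = request.form.get("model")
    upload = request.files.get("file")
    if upload is None or not upload.filename:
        return "No image uploaded.", 400
    filename = secure_filename(upload.filename)  # blocks names like ../../etc/passwd
    if not filename:
        return "Invalid filename.", 400
    os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
    file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
    upload.save(file_path)
    outcome = choose_result(model_name, file_path, get_predictors())
    if outcome is None:
        return "Prediction failed."
    template, context = outcome
    return render_template(template, **context)
```

Keeping the dispatch in `choose_result` lets it be tested without templates or model weights. To start the server with `python app.py`, the file would also need the usual `if __name__ == "__main__": app.run(debug=True)` guard.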

Model Descriptions

Vision Transformer (ViT)

"Vision Transformer Model: Predicts objects in images and returns the predicted object as text."

Provides semantic understanding of image content
Returns descriptive text labels
Ideal for general image classification tasks
Based on transformer architecture adapted for computer vision

YOLO (You Only Look Once)

"YOLO Model: Predicts objects in images and returns an image with bounding boxes drawn."

Real-time object detection and localization
Returns visual results with bounding boxes
Identifies multiple objects in a single image
Provides spatial information about detected objects
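The spatial information lives in the Ultralytics `Results` object before it is drawn onto the image. A minimal sketch, assuming `predict_yolo` has access to that object: `summarize_detections` and its `min_conf` parameter are hypothetical, while `boxes.xyxy`, `boxes.conf`, `boxes.cls` and `names` are attributes of Ultralytics results. This is also a possible starting point for the "confidence scores" enhancement listed later.

```python
from typing import Any, Dict, List


def summarize_detections(result: Any, min_conf: float = 0.0) -> List[Dict[str, Any]]:
    """Turn one Ultralytics Results object into plain dicts (label, confidence, box)."""
    boxes = result.boxes
    detections = []
    # .tolist() converts the tensors into plain Python lists of floats.
    for cls_id, conf, xyxy in zip(boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist()):
        if conf < min_conf:
            continue  # drop low-confidence detections
        detections.append(
            {
                "label": result.names[int(cls_id)],  # class id -> class name
                "confidence": round(conf, 3),
                "box": [round(v, 1) for v in xyxy],  # x1, y1, x2, y2 in pixels
            }
        )
    return detections
```

A list of plain dictionaries like this could be passed to a template or returned as JSON.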

File Handling

Upload Security: Uses secure_filename() to sanitize uploaded filenames
File Storage: Images saved to static/uploads/ directory
Static Serving: Flask configured to serve uploaded files and prediction results
Supported Formats: Common image formats (JPG, PNG, etc.)
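`secure_filename()` only sanitizes the name; it does not check the format or prevent two uploads with the same name from overwriting each other. A minimal sketch of both checks, assuming an extension allowlist and a random prefix are acceptable; `build_upload_path` and `ALLOWED_EXTENSIONS` are hypothetical names.

```python
import os
import uuid
from typing import Optional

from werkzeug.utils import secure_filename

# Assumption: the formats the app should accept; extend as needed.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def build_upload_path(upload_folder: str, original_name: str) -> Optional[str]:
    """Sanitize an uploaded filename and return a unique path inside upload_folder."""
    name = secure_filename(original_name)  # strips path separators and unsafe characters
    if not name or os.path.splitext(name)[1].lower() not in ALLOWED_EXTENSIONS:
        return None
    # A random prefix keeps two uploads called "cat.jpg" from overwriting each other.
    return os.path.join(upload_folder, f"{uuid.uuid4().hex}_{name}")
```

For example, `secure_filename("../../../etc/passwd")` returns `"etc_passwd"`, so the path can never leave the uploads folder.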

Configuration

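```python
from flask import Flask

# The app object must exist before it can be configured.
# static_folder points Flask's static route at the uploads directory,
# so uploaded images and YOLO outputs can be served directly.
app = Flask(__name__, static_folder='static/uploads')
app.config['UPLOAD_FOLDER'] = 'static/uploads'
```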
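With `static_folder='static/uploads'`, Flask's `static_url_path` defaults to the folder's base name, so saved images are served under `/uploads/` rather than `/static/`. A minimal sketch of building those URLs; `uploaded_file_url` is a hypothetical helper, and a template could call `url_for('static', filename=...)` the same way. One side effect of this configuration is that files placed directly in `static/` (a stylesheet, for instance) are not served by this route.

```python
from flask import Flask, url_for

# Same configuration as in this section: the uploads directory is the static folder.
app = Flask(__name__, static_folder="static/uploads")
app.config["UPLOAD_FOLDER"] = "static/uploads"


def uploaded_file_url(filename: str) -> str:
    """Build the URL at which a file saved in UPLOAD_FOLDER is served."""
    return url_for("static", filename=filename)


# url_for needs an app/request context; a test context is enough to try it out.
with app.test_request_context():
    print(uploaded_file_url("street_yolo.jpg"))  # /uploads/street_yolo.jpg
```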

Development Notes

Error Handling

Returns "Prediction failed." message for unsuccessful predictions
Implements secure filename handling to prevent directory traversal
Debug mode enabled for development
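Flask's documentation warns against enabling debug mode on a production server, because the interactive debugger allows executing arbitrary code. A minimal sketch of switching it on only through an environment variable; `debug_enabled` and the accepted values are assumptions, not part of the repository.

```python
import os
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when FLASK_DEBUG is explicitly set to a truthy value."""
    return env.get("FLASK_DEBUG", "").strip().lower() in TRUTHY
```

Wherever `app.py` calls `app.run(...)`, passing `debug=debug_enabled()` keeps debugging off by default; during development the app can then be started with `FLASK_DEBUG=1 python app.py`.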

Model Integration

Modular design with separate files for each model
Consistent function signatures for easy model swapping
Flexible return types (text for ViT, image path for YOLO)
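Because both models share the signature `predict_*(file_path)`, new models can be plugged in through a registry. A minimal sketch, assuming a decorator-based design; `register`, `run_model`, `PREDICTORS` and the demo model `echo` are hypothetical names, and the fallback text reuses the app's "Prediction failed." message.

```python
from typing import Callable, Dict, Optional

# A predictor takes an image path and returns text (ViT), an image path (YOLO), or None.
Predictor = Callable[[str], Optional[str]]

PREDICTORS: Dict[str, Predictor] = {}


def register(name: str) -> Callable[[Predictor], Predictor]:
    """Decorator that adds a predictor to the registry under `name`."""
    def decorator(func: Predictor) -> Predictor:
        if name in PREDICTORS:
            raise ValueError(f"model {name!r} is already registered")
        PREDICTORS[name] = func
        return func
    return decorator


def run_model(name: str, file_path: str) -> str:
    """Dispatch to the selected model, falling back to the failure message."""
    predictor = PREDICTORS.get(name)
    result = predictor(file_path) if predictor else None
    return result if result else "Prediction failed."


@register("echo")  # hypothetical demo model used only to illustrate registration
def predict_echo(file_path: str) -> Optional[str]:
    return f"received {file_path}"
```

With this design, `vit_model.py` and `yolo_model.py` would decorate their functions with `@register("vit")` and `@register("yolo")`, and the `/predict` route would only need to call `run_model`.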

Example Workflow

User visits home page and sees model options
User selects Vision Transformer model
User uploads a photo of a cat
System processes image through ViT model
Results page displays: "Predicted object: Cat"

Alternatively:

User selects YOLO model
User uploads a street scene photo
System processes image through YOLO model
Results page displays original image with bounding boxes around cars, pedestrians, etc.

Requirements

Python 3.8+ (recent releases of Flask and transformers may require a newer version)
Flask
Computer vision libraries: PyTorch, torchvision, transformers, OpenCV (opencv-python) and ultralytics, as installed above
Sufficient storage space for uploaded images
GPU recommended for faster model inference (optional)

Future Enhancements

Add support for batch processing multiple images
Implement confidence scores for predictions
Add model performance benchmarking
Include data augmentation options
Add user session management
Implement prediction history
Add support for video input
Include model comparison features
Add API endpoints for programmatic access

Troubleshooting

Model Loading Issues: Ensure model files are properly downloaded and accessible
File Upload Problems: Check upload directory permissions and disk space
Memory Issues: Resize large images before inference, use a smaller model checkpoint, or run inference on a GPU if one is available
Template Errors: Verify all HTML templates are in the templates/ directory
