🚗 Autonomous Drone Car Navigation System

A real-time autonomous vehicle navigation system built with YOLOv8, AirSim, and Flask. The car operates inside an Unreal Engine simulation, streams live camera footage to a web browser, and autonomously chases any object the user clicks on, using depth perception, multi-object tracking, and PID control.


📖 Project Overview

This project bridges computer vision and control systems to create a fully autonomous vehicle inside a simulation. From the browser, a user can:

  • Drive the car manually with keyboard input
  • Click on any detected object in the live video feed to hand control over to the autonomous system
  • Watch the car track, approach, and follow the selected object in real-time
  • Receive collision alerts and seamlessly return to manual control

The system fuses two AirSim camera feeds (RGB + depth), runs YOLOv8 inference on every frame, and uses a 10 Hz PID control loop running in a background thread to steer the vehicle.


🧱 Architecture

┌──────────────────────────────────────────────────────────┐
│                      AirSim (Unreal Engine)              │
│   • RGB camera feed (Scene)                              │
│   • Depth camera feed (DepthPerspective)                 │
│   • Car dynamics & physics                               │
└──────────────┬───────────────────────────────────────────┘
               │  AirSim Python API
               ▼
┌──────────────────────────────────────────────────────────┐
│                      Flask Backend (app.py)              │
│                                                          │
│  ┌─────────────────────┐   ┌──────────────────────────┐  │
│  │   generate_frames() │   │ autonomous_navigation    │  │
│  │   (main thread)     │   │ _thread() (daemon, 10Hz) │  │
│  │                     │   │                          │  │
│  │  • YOLOv8 inference │   │  • PID steering          │  │
│  │  • ByteTrack IDs    │   │  • PID throttle          │  │
│  │  • Depth extraction │   │  • Collision detection   │  │
│  │  • MJPEG streaming  │   │  • Stuck recovery        │  │
│  └─────────────────────┘   └──────────────────────────┘  │
│                                                          │
│  REST API routes: /select_object  /select_track          │
│                   /reset_selection  /manual_control      │
│                   /get_mode  /get_frame_size             │
│                   /get_current_detections                │
└──────────────┬───────────────────────────────────────────┘
               │  HTTP / MJPEG stream
               ▼
┌──────────────────────────────────────────────────────────┐
│                  Browser UI (index.html)                 │
│   • Live MJPEG video with YOLO bounding boxes            │
│   • Canvas overlay (click zones, crosshair, IDs)         │
│   • Keyboard manual control (WASD / arrow keys)          │
│   • Click-to-track object selection                      │
│   • Mode status: MANUAL / AUTO / ARMING / COLLISION      │
└──────────────────────────────────────────────────────────┘

✨ Key Features

| Feature | Description |
| --- | --- |
| Multi-Object Tracking | YOLOv8 + ByteTrack assigns persistent IDs across frames |
| Depth Perception | AirSim DepthPerspective camera provides per-pixel distance in metres |
| 3D Localisation | Converts 2D pixel + depth → 3D world coordinates using camera intrinsics |
| Dual PID Control | Separate PID controllers for steering (image-space error) and throttle |
| Visual Servoing | Primary control loop keeps the target centred in the image frame |
| Collision Safety | Armed collision detection → automatic MANUAL fallback + emergency brake |
| Stuck Recovery | Detects zero-speed state and applies a throttle burst to break static friction |
| MJPEG Streaming | Annotated frames streamed directly to the browser at full speed |
| Canvas Overlay | Transparent click-zone highlights and live crosshair drawn over the video |
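
The Stuck Recovery row can be illustrated with a small counter-based detector. This is only a sketch of the idea: the class name `StuckDetector`, the speed threshold, the number of stalled ticks, and the burst value are all hypothetical and are not taken from app.py.

```python
class StuckDetector:
    """Replace the throttle with a short burst after repeated stalled readings."""

    def __init__(self, speed_eps=0.1, patience=10, burst_throttle=1.0):
        self.speed_eps = speed_eps            # m/s below which the car counts as stopped (assumed)
        self.patience = patience              # consecutive stalled ticks before a burst (assumed)
        self.burst_throttle = burst_throttle  # throttle used to break static friction (assumed)
        self._stalled_ticks = 0

    def adjust(self, speed, throttle):
        """Return the throttle to send for this control tick."""
        if throttle > 0.0 and abs(speed) < self.speed_eps:
            self._stalled_ticks += 1
        else:
            self._stalled_ticks = 0
        if self._stalled_ticks >= self.patience:
            self._stalled_ticks = 0
            return self.burst_throttle
        return throttle
```

With `patience=3`, three consecutive ticks at zero speed with a positive throttle command produce one burst, after which the counter starts again.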

📁 Project Structure

DC/
├── requirements.txt                   # Python dependencies
├── yolov8n.pt                         # Pre-trained YOLOv8 nano weights
└── yolo_flask_app/
    ├── app.py                         # Flask server + all backend logic
    ├── yolov8n.pt                     # Model weights (mirror in app dir)
    ├── PROJECT_FEATURES.md            # Detailed feature reference
    ├── templates/
    │   ├── index.html                 # Main UI (video feed, controls, overlay)
    │   └── processing.html            # Alternate single-page tracking view
    └── static/
        ├── js/
        │   └── main.js                # Shared JS utilities
        └── uploads/                   # Runtime upload directory

⚙️ How It Works

1. Video Pipeline

Every iteration of generate_frames():

  1. Captures an RGB frame and a DepthPerspective frame simultaneously from AirSim
  2. Runs model.track(frame, persist=True), which performs YOLOv8 detection and tracking in one call (Ultralytics uses ByteTrack when tracker="bytetrack.yaml" is passed; its default tracker is BoT-SORT)
  3. If no target is selected → streams the YOLO-annotated frame with all boxes
  4. If a target is selected → overlays a red box + depth readout on the annotated frame and updates TARGET_CENTER_PX and TARGET_DEPTH_M
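
The streaming half of this pipeline can be sketched as a generator that wraps each encoded JPEG in a multipart chunk. This is a minimal illustration rather than the code in app.py: the helper name `mjpeg_chunks` and the boundary name `frame` are assumptions.

```python
def mjpeg_chunks(jpeg_frames, boundary=b"frame"):
    """Wrap each JPEG byte string in one multipart/x-mixed-replace part."""
    for jpeg in jpeg_frames:
        yield (
            b"--" + boundary + b"\r\n"
            b"Content-Type: image/jpeg\r\n\r\n" + jpeg + b"\r\n"
        )
```

In Flask, a generator like this is typically passed to `Response(..., mimetype="multipart/x-mixed-replace; boundary=frame")`, so the browser replaces the displayed image each time a new part arrives.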

2. Autonomous Control Loop (10 Hz)

The background thread autonomous_navigation_thread():

  1. Reads TARGET_CENTER_PX (image-space target position)
  2. Calculates horizontal error: err_x = (cx - frame_center) / frame_half_width
  3. Feeds error into Steering PID → clamped to [-1.0, 1.0]
  4. Derives throttle from base value minus turn penalty, clamped to [0.35, 0.65]
  5. Falls back to world-space navigation using TARGET_POSITION_3D if image data is stale
  6. Sends CarControls(throttle, steering) to AirSim via thread-safe lock
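
Steps 2 to 4 can be sketched in plain Python as follows. The clamp ranges and the error formula come from the list above; the function names, the `base` and `turn_penalty` values, and the proportional-only steering term are assumptions made for illustration.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def horizontal_error(cx, frame_width):
    """Normalised offset of the target from the image centre, in [-1, 1]."""
    half_width = frame_width / 2.0
    return (cx - half_width) / half_width


def throttle_for(steering, base=0.65, turn_penalty=0.30):
    """Slow down in proportion to how hard the car is turning.

    base and turn_penalty are assumed values, not taken from app.py.
    """
    return clamp(base - turn_penalty * abs(steering), 0.35, 0.65)


def control_step(cx, frame_width, kp=0.5):
    """One loop iteration using a proportional-only steering term."""
    err_x = horizontal_error(cx, frame_width)
    steering = clamp(kp * err_x, -1.0, 1.0)
    return throttle_for(steering), steering
```

A target at the image centre yields zero steering and the maximum throttle; a target at the right edge yields positive steering and a reduced throttle.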

3. Object Selection

Clicking on the video triggers a two-stage resolution:

  • Client-side: detections are fetched from /get_current_detections and their boxes are scaled to display coordinates; the nearest box containing the click is selected and /select_track is called with its track ID (this avoids a coordinate mismatch)
  • Server-side fallback: if no box is hit locally, the scaled click coordinates are sent to /select_object, which runs a strict bounding-box hit test followed by a margin-expanded one
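
The strict-then-expanded hit test can be sketched as below. The {id, box} detection shape follows the REST API reference; the [x1, y1, x2, y2] box layout, the 20-pixel margin, and the nearest-centre tie-break are assumptions, and the function names are hypothetical.

```python
import math


def contains(box, x, y, margin=0.0):
    """True if (x, y) lies inside box grown by margin pixels on every side."""
    x1, y1, x2, y2 = box
    return x1 - margin <= x <= x2 + margin and y1 - margin <= y <= y2 + margin


def centre_distance(box, x, y):
    """Euclidean distance from (x, y) to the centre of box."""
    x1, y1, x2, y2 = box
    return math.hypot((x1 + x2) / 2.0 - x, (y1 + y2) / 2.0 - y)


def select_track_id(detections, x, y, margin=20.0):
    """Try a strict hit test first, then retry with margin-expanded boxes."""
    for m in (0.0, margin):
        hits = [d for d in detections if contains(d["box"], x, y, m)]
        if hits:
            nearest = min(hits, key=lambda d: centre_distance(d["box"], x, y))
            return nearest["id"]
    return None
```

Running the strict pass first means a click that lands squarely inside one box is never captured by a neighbouring box's margin.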

4. PID Controllers

steering_pid  →  Kp=0.5, Ki=0.0, Kd=0.1   (image-space horizontal error)
throttle_pid  →  Kp=0.3, Ki=0.0, Kd=0.05  (speed error, world-space fallback only)
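
For reference, a textbook discrete PID controller using these gains might look like the sketch below. The class layout and the choice to return a zero derivative on the first sample are assumptions; app.py may implement its controllers differently.

```python
class PID:
    """Minimal discrete PID controller (illustrative, not the app.py class)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error sample."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet, so skip the derivative kick
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


steering_pid = PID(kp=0.5, ki=0.0, kd=0.1)
throttle_pid = PID(kp=0.3, ki=0.0, kd=0.05)
```

At 10 Hz the loop would call `update(err_x, dt=0.1)` once per tick. With Ki set to 0.0, both controllers behave as PD controllers.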

5. 3D Coordinate Estimation

focal_length = (image_width / 2) / tan(radians(FOV_deg) / 2)
x_cam = (pixel_x - cx) / focal_length × depth
y_cam = (pixel_y - cy) / focal_length × depth
→ transform using car pose quaternion → world (x, y, z)
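
The formulas above can be sketched with the standard library only. The function names are hypothetical, the image centre is assumed to be the principal point, and the depth value is treated as distance along the optical axis, which is an approximation for a perspective depth image. The quaternion helper is a generic rotation; mapping camera axes to AirSim's world axes is left out because it depends on the camera mounting.

```python
import math


def pixel_to_camera(pixel_x, pixel_y, depth_m, image_width, image_height, fov_deg=90.0):
    """Back-project a pixel plus depth into camera-frame coordinates (pinhole model)."""
    focal_length = (image_width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = image_width / 2.0, image_height / 2.0  # assumed principal point
    x_cam = (pixel_x - cx) / focal_length * depth_m
    y_cam = (pixel_y - cy) / focal_length * depth_m
    return x_cam, y_cam, depth_m


def rotate_by_quaternion(v, q):
    """Rotate vector v = (x, y, z) by unit quaternion q = (w, x, y, z)."""
    w, qx, qy, qz = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (qy * vz - qz * vy)
    ty = 2.0 * (qz * vx - qx * vz)
    tz = 2.0 * (qx * vy - qy * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (qy * tz - qz * ty),
        vy + w * ty + (qz * tx - qx * tz),
        vz + w * tz + (qx * ty - qy * tx),
    )
```

A world-space estimate would then be the rotated vector plus the car's position, once the camera-to-vehicle axis convention has been applied.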

🛠️ Prerequisites

| Requirement | Version / Notes |
| --- | --- |
| Python | ≥ 3.9 |
| Unreal Engine + AirSim | Latest Blocks environment |
| CUDA (optional) | For GPU inference acceleration |

🚀 Setup & Running

1. Clone the repository

git clone https://github.com/AradhyaChhabdi/DC.git
cd DC

2. Create and activate a virtual environment (Windows PowerShell)

python -m venv myenv
.\myenv\Scripts\Activate.ps1

3. Install dependencies

pip install -r requirements.txt

4. Set the Flask secret key (recommended)

$env:FLASK_SECRET_KEY = "your-random-secret-key-here"
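
On the server side, reading this variable with a warning fallback could look like the sketch below. The helper name `load_secret_key` and the random per-run fallback are assumptions; the README only states that app.py reads FLASK_SECRET_KEY and warns when it is missing.

```python
import os
import secrets
import warnings


def load_secret_key(env=os.environ):
    """Return FLASK_SECRET_KEY if set, otherwise warn and generate a temporary key."""
    key = env.get("FLASK_SECRET_KEY")
    if key:
        return key
    warnings.warn("FLASK_SECRET_KEY is not set; using a random key for this run only")
    return secrets.token_hex(32)  # 32 random bytes, hex-encoded to 64 characters
```

A random fallback invalidates existing sessions on every restart, which is acceptable for local simulation work but not for a deployed server.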

5. Launch AirSim

Open the Blocks environment in Unreal Engine and press Play. Ensure a settings.json with at least one car and a depth camera is configured.

6. Start the Flask server

cd yolo_flask_app
python app.py

7. Open the browser

Navigate to http://127.0.0.1:5000


🎮 Controls

| Input | Action |
| --- | --- |
| ↑ / W | Drive forward |
| ↓ / S | Drive backward |
| ← / A | Turn left |
| → / D | Turn right |
| Space | Brake |
| Click on object | Lock target → engage AUTO mode |
| 🎮 button | Toggle between browser control and AirSim keyboard |

AUTO mode is exited automatically on collision or by clicking Reset in the status bar.


🔌 REST API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| / | GET | Serve the main UI |
| /video_feed | GET | MJPEG stream of annotated frames |
| /select_object | POST | Select target by pixel coordinates {x, y} |
| /select_track | POST | Select target directly by track ID {track_id} |
| /reset_selection | POST | Clear target, switch to MANUAL mode |
| /manual_control | POST | Send a drive action {action} |
| /toggle_control_source | POST | Switch API / simulation keyboard control |
| /get_mode | GET | Returns current mode, collision flag, arming status |
| /get_frame_size | GET | Returns actual frame {width, height} |
| /get_current_detections | GET | Returns list of {id, box} detections |
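
As an illustration of calling one of these endpoints from a script, the sketch below builds a POST request for /select_track with the standard library. It assumes the server accepts a JSON body and runs at the default address from the setup steps; the helper name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default address from the setup steps


def build_select_track_request(track_id, base_url=BASE_URL):
    """Build (but do not send) a POST request selecting a target by track ID."""
    body = json.dumps({"track_id": track_id}).encode("utf-8")  # JSON body is an assumption
    return urllib.request.Request(
        url=f"{base_url}/select_track",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires the Flask server to be running:
# with urllib.request.urlopen(build_select_track_request(3), timeout=2) as resp:
#     print(resp.status, resp.read())
```

Keeping request construction separate from sending makes the payload easy to inspect without a live simulator.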

🐛 Bugs Fixed

| # | File | Issue | Fix |
| --- | --- | --- | --- |
| 1 | templates/processing.html | Entire file wrapped in `<!-- -->`, so it was completely dead HTML | Removed HTML comment wrapper; file is now a valid, functional template |
| 2 | templates/processing.html | JS-style // comments used inline in HTML body (outside `<script>`), so they rendered as visible text | Removed invalid inline comments |
| 3 | app.py | annotated_frame = frame.copy() when target selected, so YOLO annotations for all other objects disappeared | Changed to results[0].plot() so non-tracked detections remain visible |
| 4 | app.py | app.secret_key hardcoded as plain text | Now reads FLASK_SECRET_KEY env var with a warning fallback |
| 5 | templates/index.html | best.prefer key access in click handler, but prefer was never stored in best dict, causing a silent comparison bug | Added prefer to the best dict: best = { id, dist, prefer } |

🔮 Potential Future Enhancements

  • Kalman filter for smoother target state estimation under occlusion
  • Path planning (A* / RRT) to route around obstacles while pursuing the target
  • Multiple target queuing with priority ordering
  • End-to-end deep learning control policy (replaces PID)
  • Real hardware deployment via ROS bridge

📚 Tech Stack

  • Flask: Lightweight Python web framework
  • Ultralytics YOLOv8: Object detection + ByteTrack multi-object tracking
  • OpenCV: Frame encoding and annotation
  • Microsoft AirSim: Unreal Engine simulation API
  • NumPy: Numerical operations

📄 License

This project is for academic and research purposes.
