A real-time autonomous vehicle navigation system built with YOLOv8, AirSim, and Flask. The car operates inside an Unreal Engine simulation, streams live camera footage to a web browser, and autonomously chases any object the user clicks on, using depth perception, multi-object tracking, and PID control.
This project bridges computer vision and control systems to create a fully autonomous vehicle inside a simulation. From the browser, a user can:
- Drive the car manually with keyboard input
- Click on any detected object in the live video feed to hand control over to the autonomous system
- Watch the car track, approach, and follow the selected object in real-time
- Receive collision alerts and seamlessly return to manual control
The system fuses two AirSim camera feeds (RGB + depth), runs YOLOv8 inference on every frame, and uses a 10 Hz PID control loop running in a background thread to steer the vehicle.

    ┌──────────────────────────────────────────────────────────┐
    │                  AirSim (Unreal Engine)                  │
    │  • RGB camera feed (Scene)                               │
    │  • Depth camera feed (DepthPerspective)                  │
    │  • Car dynamics & physics                                │
    └──────────────┬───────────────────────────────────────────┘
                   │ AirSim Python API
                   ▼
    ┌──────────────────────────────────────────────────────────┐
    │                  Flask Backend (app.py)                  │
    │                                                          │
    │  ┌─────────────────────┐  ┌────────────────────────────┐ │
    │  │ generate_frames()   │  │ autonomous_navigation      │ │
    │  │ (main thread)       │  │ _thread() (daemon, 10 Hz)  │ │
    │  │                     │  │                            │ │
    │  │ • YOLOv8 inference  │  │ • PID steering             │ │
    │  │ • ByteTrack IDs     │  │ • PID throttle             │ │
    │  │ • Depth extraction  │  │ • Collision detection      │ │
    │  │ • MJPEG streaming   │  │ • Stuck recovery           │ │
    │  └─────────────────────┘  └────────────────────────────┘ │
    │                                                          │
    │  REST API routes: /select_object   /select_track         │
    │                   /reset_selection /manual_control       │
    │                   /get_mode        /get_frame_size       │
    │                   /get_current_detections                │
    └──────────────┬───────────────────────────────────────────┘
                   │ HTTP / MJPEG stream
                   ▼
    ┌──────────────────────────────────────────────────────────┐
    │                 Browser UI (index.html)                  │
    │  • Live MJPEG video with YOLO bounding boxes             │
    │  • Canvas overlay (click zones, crosshair, IDs)          │
    │  • Keyboard manual control (WASD / arrow keys)           │
    │  • Click-to-track object selection                       │
    │  • Mode status: MANUAL / AUTO / ARMING / COLLISION       │
    └──────────────────────────────────────────────────────────┘

| Feature | Description |
|---|---|
| Multi-Object Tracking | YOLOv8 + ByteTrack assigns persistent IDs across frames |
| Depth Perception | AirSim DepthPerspective camera provides per-pixel distance in metres |
| 3D Localisation | Converts a 2D pixel plus depth into 3D world coordinates using camera intrinsics |
| Dual PID Control | Separate PID controllers for steering (image-space error) and throttle |
| Visual Servoing | Primary control loop keeps the target centred in the image frame |
| Collision Safety | Armed collision detection triggers an automatic MANUAL fallback and emergency brake |
| Stuck Recovery | Detects zero-speed state and applies a throttle burst to break static friction |
| MJPEG Streaming | Annotated frames streamed directly to the browser at full speed |
| Canvas Overlay | Transparent click-zone highlights and live crosshair drawn over the video |
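
The stuck-recovery row can be illustrated with a small sketch. This is not the project's code: the class name `StuckRecovery` and every threshold below are hypothetical values chosen for illustration, and the only behaviour taken from the table is "detect a zero-speed state, then apply a throttle burst".

```python
from dataclasses import dataclass


@dataclass
class StuckRecovery:
    """Hypothetical stuck detector; all thresholds are illustrative assumptions."""
    speed_eps: float = 0.05      # m/s below which the car counts as stopped
    stuck_after_s: float = 1.5   # time stopped before a burst is triggered
    burst_throttle: float = 0.9  # throttle used to break static friction
    burst_s: float = 0.5         # how long the burst continues after it triggers
    _stopped_for: float = 0.0
    _burst_left: float = 0.0

    def update(self, speed: float, commanded_throttle: float, dt: float) -> float:
        """Return the throttle to send for this control tick."""
        # While a burst is active, keep overriding the commanded throttle.
        if self._burst_left > 0.0:
            self._burst_left = max(0.0, self._burst_left - dt)
            return self.burst_throttle
        # Accumulate stopped time only when the car is being asked to move.
        if abs(speed) < self.speed_eps and commanded_throttle > 0.0:
            self._stopped_for += dt
        else:
            self._stopped_for = 0.0
        # Stopped for long enough: start a burst.
        if self._stopped_for >= self.stuck_after_s:
            self._stopped_for = 0.0
            self._burst_left = self.burst_s
            return self.burst_throttle
        return commanded_throttle
```

Calling `update()` once per control tick keeps the detector independent of how the speed is measured.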

    DC/
    ├── requirements.txt            # Python dependencies
    ├── yolov8n.pt                  # Pre-trained YOLOv8 nano weights
    └── yolo_flask_app/
        ├── app.py                  # Flask server + all backend logic
        ├── yolov8n.pt              # Model weights (mirror in app dir)
        ├── PROJECT_FEATURES.md     # Detailed feature reference
        ├── templates/
        │   ├── index.html          # Main UI (video feed, controls, overlay)
        │   └── processing.html     # Alternate single-page tracking view
        └── static/
            ├── js/
            │   └── main.js         # Shared JS utilities
            └── uploads/            # Runtime upload directory

Every iteration of `generate_frames()`:
- Captures an RGB frame and a DepthPerspective frame simultaneously from AirSim
- Runs `model.track(frame, persist=True)`, which performs YOLOv8 detection and ByteTrack tracking in one call
- If no target is selected, streams the YOLO-annotated frame with all boxes
- If a target is selected, overlays a red box and a depth readout on the annotated frame and updates `TARGET_CENTER_PX` and `TARGET_DEPTH_M`
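
The last step, deriving a target centre and depth from a bounding box, might look roughly like the sketch below. The source does not say how the depth value is sampled, so the median over a small patch is an assumption, as are the function name `target_depth_m`, the `patch` parameter, and the `(x1, y1, x2, y2)` box layout. The depth image is modelled as a plain list of rows to keep the example dependency-free.

```python
from statistics import median


def target_depth_m(depth, box, patch=2):
    """Return ((cx, cy), depth_in_metres) for a bounding box.

    depth: 2D list of floats (rows of per-pixel distances in metres).
    box:   (x1, y1, x2, y2) in pixel coordinates (assumed layout).
    patch: half-size of the square sampled around the centre (assumption).
    """
    h, w = len(depth), len(depth[0])
    # Box centre, clamped so it always lies inside the depth image.
    cx = min(max(int((box[0] + box[2]) / 2), 0), w - 1)
    cy = min(max(int((box[1] + box[3]) / 2), 0), h - 1)
    # Collect valid (positive) depths in the patch; the median resists outliers.
    samples = [
        depth[y][x]
        for y in range(max(0, cy - patch), min(h, cy + patch + 1))
        for x in range(max(0, cx - patch), min(w, cx + patch + 1))
        if depth[y][x] > 0.0
    ]
    return (cx, cy), (median(samples) if samples else None)
```

Returning `None` when no valid sample exists lets the caller keep the previous depth rather than act on a bogus reading.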
The background thread `autonomous_navigation_thread()`:
- Reads `TARGET_CENTER_PX` (image-space target position)
- Calculates the horizontal error: `err_x = (cx - frame_center) / frame_half_width`
- Feeds the error into the steering PID, with the output clamped to `[-1.0, 1.0]`
- Derives throttle from a base value minus a turn penalty, clamped to `[0.35, 0.65]`
- Falls back to world-space navigation using `TARGET_POSITION_3D` if image data is stale
- Sends `CarControls(throttle, steering)` to AirSim via a thread-safe lock
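
One tick of the visual-servoing steps above can be sketched as follows. The error formula and both clamp ranges come from the list; `base_throttle`, `turn_penalty`, and the function names are hypothetical, and the steering controller is passed in as any callable so the sketch does not assume a particular PID implementation.

```python
def clamp(value, lo, hi):
    """Limit value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def control_step(cx, frame_width, steering_pid, base_throttle=0.6, turn_penalty=0.25):
    """Compute (err_x, steering, throttle) for one control tick.

    cx:            target centre x in pixels
    steering_pid:  callable mapping normalised error to a raw steering command
    base_throttle, turn_penalty: illustrative values, not taken from the project
    """
    half = frame_width / 2.0
    # Normalised horizontal error: -1 at the left edge, +1 at the right edge.
    err_x = (cx - half) / half
    steering = clamp(steering_pid(err_x), -1.0, 1.0)
    # Slow down in proportion to how hard the car is turning.
    throttle = clamp(base_throttle - turn_penalty * abs(steering), 0.35, 0.65)
    return err_x, steering, throttle
```

For example, `control_step(480, 640, lambda e: 0.5 * e)` treats the controller as a pure proportional gain of 0.5.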
Clicking on the video triggers a two-stage resolution:
- Client-side: detections are fetched from `/get_current_detections`, boxes are scaled to display coordinates, and the nearest box containing the click is selected; `/select_track` is then called with the track ID (avoids coordinate mismatch)
- Server-side fallback: if there is no local hit, the scaled coordinates are sent to `/select_object`, which does a strict and then a margin-expanded bounding-box hit test
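
The strict-then-expanded hit test can be sketched in a few lines. The `{id, box}` shape follows the detections returned by `/get_current_detections`; the `[x1, y1, x2, y2]` box layout, the 15-pixel margin, the tie-break by distance to the box centre, and all function names are assumptions made for illustration.

```python
def contains(box, x, y, margin=0.0):
    """True if (x, y) lies inside box grown by margin on every side."""
    x1, y1, x2, y2 = box
    return x1 - margin <= x <= x2 + margin and y1 - margin <= y <= y2 + margin


def centre_dist_sq(box, x, y):
    """Squared distance from (x, y) to the box centre."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0 - x) ** 2 + ((y1 + y2) / 2.0 - y) ** 2


def scale_click(x, y, display_size, frame_size):
    """Map a click in display pixels to frame pixels."""
    return x * frame_size[0] / display_size[0], y * frame_size[1] / display_size[1]


def pick_track(detections, x, y, margin=15.0):
    """Return the track ID hit by the click, or None.

    Tries a strict hit test first, then retries with expanded boxes.
    """
    for m in (0.0, margin):
        hits = [d for d in detections if contains(d["box"], x, y, m)]
        if hits:
            # Overlapping boxes: prefer the one whose centre is closest.
            return min(hits, key=lambda d: centre_dist_sq(d["box"], x, y))["id"]
    return None
```

Running the strict pass first means a near miss can never steal a click that landed squarely inside another box.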

    steering_pid: Kp=0.5, Ki=0.0, Kd=0.1   (image-space horizontal error)
    throttle_pid: Kp=0.3, Ki=0.0, Kd=0.05  (speed error, world-space fallback only)

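
For reference, a textbook PID controller using these gains could look like the sketch below. The class layout is an assumption; the project's own controller may differ, for example in how it handles the first derivative sample or integral wind-up.

```python
class PID:
    """Minimal textbook PID controller (a sketch, not the project's class)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first sample

    def update(self, error, dt):
        """Return the control output for the current error and time step."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Gains as listed above; dt = 0.1 s corresponds to the 10 Hz loop.
steering_pid = PID(kp=0.5, ki=0.0, kd=0.1)
throttle_pid = PID(kp=0.3, ki=0.0, kd=0.05)
```

With `Ki = 0.0` both controllers are effectively PD controllers, so integral wind-up is not a concern with these settings.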

    focal_length = (image_width / 2) / tan(FOV_deg / 2)
    x_cam = (pixel_x - cx) / focal_length × depth
    y_cam = (pixel_y - cy) / focal_length × depth
    → transform using car pose quaternion → world (x, y, z)

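
The projection above can be sketched end to end in plain Python. Several things here are assumptions rather than facts about the project: depth is treated as distance along the optical axis (as the formulas imply), the camera axes are taken as x right, y down, z forward, the vehicle body frame as x forward, y right, z down, and the camera is assumed to sit at the car origin with no extra rotation. Function names are hypothetical.

```python
import math


def pixel_to_camera(px, py, depth, width, height, fov_deg):
    """Back-project a pixel with known depth into camera coordinates."""
    focal = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = width / 2.0, height / 2.0
    return ((px - cx) / focal * depth, (py - cy) / focal * depth, depth)


def rotate(q, v):
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + q_vec x t
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def camera_to_world(p_cam, car_pos, car_quat):
    """Map a camera-frame point to world coordinates via the car pose."""
    x_cam, y_cam, z_cam = p_cam
    body = (z_cam, x_cam, y_cam)  # assumed axis swap: forward, right, down
    rotated = rotate(car_quat, body)
    return tuple(p + r for p, r in zip(car_pos, rotated))
```

With a 640-pixel-wide image and a 90 degree field of view the focal length is 320 pixels, so a pixel 160 columns right of centre at 10 m depth lies 5 m to the right of the optical axis.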
| Requirement | Version |
|---|---|
| Python | ≥ 3.9 |
| Unreal Engine + AirSim | Latest Blocks environment |
| CUDA (optional) | For GPU inference acceleration |
Clone the repository:

    git clone https://github.com/AradhyaChhabdi/DC.git
    cd DC

Create and activate a virtual environment (PowerShell):

    python -m venv myenv
    .\myenv\Scripts\Activate.ps1

Install the dependencies:

    pip install -r requirements.txt

Set the Flask secret key:

    $env:FLASK_SECRET_KEY = "your-random-secret-key-here"

Open the Blocks environment in Unreal Engine and press Play. Ensure a settings.json with at least one car and a depth camera is configured.

Start the server:

    cd yolo_flask_app
    python app.py

Then navigate to http://127.0.0.1:5000 in a browser.

| Input | Action |
|---|---|
| ↑ / W | Drive forward |
| ↓ / S | Drive backward |
| ← / A | Turn left |
| → / D | Turn right |
| Space | Brake |
| Click on object | Lock target and engage AUTO mode |
| 🎮 button | Toggle between browser control and AirSim keyboard |
AUTO mode is exited automatically on collision or by clicking Reset in the status bar.
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Serve the main UI |
| `/video_feed` | GET | MJPEG stream of annotated frames |
| `/select_object` | POST | Select target by pixel coordinates `{x, y}` |
| `/select_track` | POST | Select target directly by track ID `{track_id}` |
| `/reset_selection` | POST | Clear target, switch to MANUAL mode |
| `/manual_control` | POST | Send a drive action `{action}` |
| `/toggle_control_source` | POST | Switch between API and simulation keyboard control |
| `/get_mode` | GET | Returns current mode, collision flag, arming status |
| `/get_frame_size` | GET | Returns actual frame `{width, height}` |
| `/get_current_detections` | GET | Returns list of `{id, box}` detections |
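
A script can drive these endpoints with nothing but the standard library. The sketch below only builds the requests; it assumes the server accepts JSON bodies, and the `"brake"` action value is a placeholder, since the table lists the `{action}` field but not its allowed values. `post_json` is a hypothetical helper name.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:5000"


def post_json(path, payload):
    """Build (but do not send) a JSON POST request for the given route."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payload keys follow the table above; the values are examples only.
select_by_click = post_json("/select_object", {"x": 320, "y": 240})
select_by_id = post_json("/select_track", {"track_id": 3})
brake = post_json("/manual_control", {"action": "brake"})  # action name assumed
```

Passing any of these objects to `urllib.request.urlopen()` sends the request once the Flask server is running.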
| # | File | Issue | Fix |
|---|---|---|---|
| 1 | `templates/processing.html` | Entire file wrapped in `<!-- -->`, leaving it as completely dead HTML | Removed HTML comment wrapper; file is now a valid, functional template |
| 2 | `templates/processing.html` | JS-style `//` comments used inline in the HTML body (outside `<script>`), so they rendered as visible text | Removed invalid inline comments |
| 3 | `app.py` | `annotated_frame = frame.copy()` when a target was selected, so YOLO annotations for all other objects disappeared | Changed to `results[0].plot()` so non-tracked detections remain visible |
| 4 | `app.py` | `app.secret_key` hardcoded as plain text | Now reads `FLASK_SECRET_KEY` env var with a warning fallback |
| 5 | `templates/index.html` | `best.prefer` accessed in the click handler, but `prefer` was never stored in the `best` dict, causing a silent comparison bug | Added `prefer` to the `best` dict: `best = { id, dist, prefer }` |
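
Fix 4 follows a common pattern, sketched below. The table only says the key is read from `FLASK_SECRET_KEY` with a warning fallback; generating a random per-run key as that fallback is an assumption, and `load_secret_key` is a hypothetical name.

```python
import os
import secrets
import warnings


def load_secret_key(env=os.environ):
    """Return the Flask secret key from the environment, warning if it is unset."""
    key = env.get("FLASK_SECRET_KEY")
    if key:
        return key
    # Assumed fallback: a random key, so sessions do not survive a restart.
    warnings.warn(
        "FLASK_SECRET_KEY is not set; using a random key for this run only.",
        RuntimeWarning,
        stacklevel=2,
    )
    return secrets.token_hex(32)
```

In a Flask app the result would be assigned once at start-up, for example `app.secret_key = load_secret_key()`.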
- Kalman filter for smoother target state estimation under occlusion
- Path planning (A* / RRT) to route around obstacles while pursuing the target
- Multiple target queuing with priority ordering
- End-to-end deep learning control policy (replaces PID)
- Real hardware deployment via ROS bridge
- Flask: Lightweight Python web framework
- Ultralytics YOLOv8: Object detection + ByteTrack multi-object tracking
- OpenCV: Frame encoding and annotation
- Microsoft AirSim: Unreal Engine simulation API
- NumPy: Numerical operations
This project is for academic and research purposes.