Demo Links:

Link1: https://www.youtube.com/watch?v=TNiw3K0_gpU

Link2: https://www.youtube.com/watch?v=ytsUsYFrEZg

Presentation: https://drive.google.com/file/d/1mTUDDzFVPkpeEalwFBjfJOjnJOlAitqd/view?usp=sharing

Architecture & Workflow: https://drive.google.com/file/d/10eHQDelLdlLpawuHmSrbcZaHe2EhJBgh/view?usp=sharing

VoiceBridge - Hackathon Winner - Top 1 DAIN AI Challenge and USC Hackathon

AI-Powered Computer Control for Enhanced Accessibility

Overview

VoiceBridge is an accessibility tool that bridges the gap between technology and users with motor, visual, or cognitive impairments. It uses AI vision to read the screen and natural language processing to interpret voice commands, so users can control their computers by speaking.

Mission

"Making technology accessible to everyone, regardless of physical limitations"

Key Features

Screen Analysis - AI-powered visual understanding of your screen
Voice Control - Natural language commands for computer interaction
Smart Clicking - Intelligent UI element detection and interaction
Text Input - Voice-to-text typing capabilities
Browser Integration - Chrome extension for web navigation
Page Navigation - Forward/backward navigation with voice commands
Scrolling Control - Smooth scrolling with voice commands

Tech Stack

Core Services

DAIN Protocol - AI service framework
Node.js/TypeScript - Main service layer
Python Flask - Computer automation API
Chrome Extension - Browser integration

AI & Vision

OpenAI GPT-4 Vision - Screen analysis and understanding
EasyOCR - Optical Character Recognition
Computer Vision (OpenCV) - Image processing
Levenshtein Distance - Text similarity matching
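
The text similarity step can be illustrated with a small sketch. It assumes the goal is to match a spoken target such as "login" against the strings OCR found on screen, tolerating small recognition errors. The function names and the `max_ratio` threshold are hypothetical, and the distance is implemented directly here rather than through the Levenshtein package the project installs.

```python
from typing import Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def best_match(target: str, candidates: Sequence[str],
               max_ratio: float = 0.4) -> Optional[Tuple[str, int]]:
    """Pick the candidate closest to target, or None if nothing is close enough.

    max_ratio is a hypothetical threshold: the distance may be at most this
    fraction of the longer string's length.
    """
    best: Optional[Tuple[str, int]] = None
    for text in candidates:
        distance = levenshtein(target.lower(), text.lower())
        limit = max_ratio * max(len(target), len(text), 1)
        if distance <= limit and (best is None or distance < best[1]):
            best = (text, distance)
    return best
```

With this sketch, `best_match("login", ["Sign up", "Log in", "Help"])` selects "Log in", since only one inserted space separates the two strings.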

Automation

PyAutoGUI - Screen automation (clicking, typing, scrolling)
Screen Capture - Real-time screen analysis
Keyboard/Mouse Control - System-level automation
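
One way OCR output could feed a click is sketched below. It assumes the screenshot has already been saved to disk and that clicking the center of a matched text box is acceptable. `bbox_center`, `click_text`, and the `scale` parameter are hypothetical names for illustration. EasyOCR's `readtext` returns `(bounding box, text, confidence)` tuples, where the box is four corner points.

```python
from typing import Sequence, Tuple

Point = Sequence[float]


def bbox_center(bbox: Sequence[Point], scale: float = 1.0) -> Tuple[int, int]:
    """Return the center of a four-corner box as integer screen coordinates.

    scale is a hypothetical correction for displays where screenshot pixels
    and mouse coordinates differ (for example 0.5 on some HiDPI screens).
    """
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    center_x = (min(xs) + max(xs)) / 2 * scale
    center_y = (min(ys) + max(ys)) / 2 * scale
    return round(center_x), round(center_y)


def click_text(image_path: str, wanted: str) -> bool:
    """Click the first OCR result containing wanted; return True if found."""
    # Imported lazily so bbox_center can be used without these packages.
    import easyocr
    import pyautogui

    reader = easyocr.Reader(["en"])
    for bbox, text, _confidence in reader.readtext(image_path):
        if wanted.lower() in text.lower():
            x, y = bbox_center(bbox)
            pyautogui.click(x, y)
            return True
    return False
```

Substring matching is used here only to keep the sketch short; a fuzzy comparison would be more forgiving of OCR errors.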

Development

TypeScript 5.5+ - Type-safe development
Hono - Lightweight web framework
Zod - Schema validation
Axios - HTTP client

Quick Start

Prerequisites

Node.js 18+
Python 3.8+
OpenAI API Key
DAIN API Key

Installation

Clone the repository

git clone https://github.com/quachphu/Hackathon.git
cd Hackathon

Install Node.js dependencies

npm install

Install Python dependencies

cd api
pip install flask opencv-python pyautogui easyocr levenshtein matplotlib numpy

Environment Setup

# Create .env.development file
cp .env.development.example .env.development

# Add your API keys
DAIN_API_KEY=your_dain_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
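
Before starting the services, it can help to confirm that both keys are actually set. The following is a minimal sketch, assuming the file uses plain KEY=VALUE lines as shown above; `parse_env` and `missing_keys` are hypothetical helpers, not part of the project.

```python
from typing import Dict, Iterable, List

REQUIRED_KEYS = ("DAIN_API_KEY", "OPENAI_API_KEY")


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].endswith("_here")
    ]
```

A possible use is `missing_keys(parse_env(open(".env.development")))`, which returns an empty list once both keys have real values.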

Chrome Extension Setup

Open Chrome and go to chrome://extensions/
Enable "Developer mode"
Click "Load unpacked" and select the extension/ folder

Running the Application

Start the Python Flask API

cd api
python main.py
# Server runs on http://localhost:3000

Start the DAIN Service

npm run dev
# Service runs on port 2022

Test the connection

curl http://localhost:3000/analyze_screen

API Reference

Flask API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/analyze_screen` | GET | Capture and analyze the current screen |
| `/write_text` | POST | Type text on screen |
| `/press_key` | POST | Emulate key presses |
| `/click_request` | POST | Click on the specified UI element |
| `/scroll` | POST | Scroll the page up/down |
| `/navigate_page` | POST | Navigate forward/backward |
| `/open` | POST | Open a new tab/window |
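
The endpoints above could be called from a script along these lines. This is a sketch using only the Python standard library. It assumes the API runs on localhost:3000 and accepts JSON bodies on POST routes. The request body fields are not documented here, so any field names passed in `payload` (such as `direction`) are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:3000"  # port taken from the README

# Methods as listed in the endpoint table.
ENDPOINT_METHODS = {
    "/analyze_screen": "GET",
    "/write_text": "POST",
    "/press_key": "POST",
    "/click_request": "POST",
    "/scroll": "POST",
    "/navigate_page": "POST",
    "/open": "POST",
}


def build_request(endpoint: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build a request for a known endpoint; raises KeyError for unknown ones."""
    method = ENDPOINT_METHODS[endpoint]
    data = None
    headers: Dict[str, str] = {}
    if method == "POST":
        # Assumption: POST routes take a JSON body.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + endpoint, data=data, headers=headers, method=method
    )


def call(endpoint: str, payload: Optional[Dict[str, Any]] = None,
         timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `call("/analyze_screen")` mirrors the curl check from the Quick Start, while `call("/scroll", {"direction": "down"})` shows the assumed shape of a POST call.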

DAIN Service Tools

analyze-screen - Analyzes user's screen content
click-request - Clicks on specified text/buttons
scroll-page - Controls page scrolling
navigate-page - Page navigation controls
write-string - Text input functionality
press-key - Keyboard emulation
open-tab - Tab management
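
The tools listed here appear to line up one-to-one with the Flask endpoints. The sketch below writes that pairing down as a lookup table. The pairing is an assumption inferred from the matching descriptions, not something confirmed from the service source, and `endpoint_for` is a hypothetical helper.

```python
from typing import Dict

# Assumed pairing, inferred from the matching descriptions in the two lists.
TOOL_TO_ENDPOINT: Dict[str, str] = {
    "analyze-screen": "/analyze_screen",
    "click-request": "/click_request",
    "scroll-page": "/scroll",
    "navigate-page": "/navigate_page",
    "write-string": "/write_text",
    "press-key": "/press_key",
    "open-tab": "/open",
}


def endpoint_for(tool_id: str) -> str:
    """Return the Flask route assumed to back a DAIN tool."""
    try:
        return TOOL_TO_ENDPOINT[tool_id]
    except KeyError:
        raise ValueError(f"unknown DAIN tool: {tool_id!r}") from None
```

Keeping the mapping in one table makes it easy to spot a tool without a backing route, or the reverse, when either list changes.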

Usage Examples

Voice Commands

"What's on my screen?"          → Analyzes current screen content
"Click on the login button"     → Finds and clicks login button
"Scroll down"                   → Scrolls page downward
"Type 'Hello World'"            → Types the specified text
"Press Enter"                   → Presses Enter key
"Go back"                       → Navigates to previous page
"Open new tab"                  → Opens a new browser tab

Programmatic Usage

// Analyze screen
const analysis = await dainService.tools.analyzeScreen({
  question: "What's displayed on the screen?"
});

// Click on element
await dainService.tools.clickRequest({
  question: "Click on the submit button"
});

// Type text
await dainService.tools.writeString({
  question: "Enter my email address"
});

Configuration

DAIN Configuration (`dain.json`)

{
  "project-id": "",
  "api-key": "MUST PUT IN .env.development as DAIN_API_KEY=YOUR_API_KEY",
  "main-file": "src/index.ts",
  "static-dir": "static",
  "out-dir": "dist",
  "environment": "development",
  "version": "1.0.0",
  "runtime": "node"
}

Python API Configuration

Port: 3000 (configurable in main.py)
Image Storage: ../temp/image.png
OCR Language: English (["en"])
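
These settings could be gathered into a single object, as in the sketch below. The defaults come from the list above. The `ApiConfig` class and the `VOICEBRIDGE_*` environment variable names are hypothetical and are not read by the project itself.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass(frozen=True)
class ApiConfig:
    port: int = 3000
    image_path: str = "../temp/image.png"
    ocr_languages: List[str] = field(default_factory=lambda: ["en"])


def load_config(env: Mapping[str, str] = os.environ) -> ApiConfig:
    """Build the config, letting hypothetical VOICEBRIDGE_* variables override defaults."""
    defaults = ApiConfig()
    languages = env.get("VOICEBRIDGE_OCR_LANGS")
    return ApiConfig(
        port=int(env.get("VOICEBRIDGE_PORT", defaults.port)),
        image_path=env.get("VOICEBRIDGE_IMAGE_PATH", defaults.image_path),
        ocr_languages=languages.split(",") if languages else defaults.ocr_languages,
    )
```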

Use Cases

Accessibility

Motor Impairments - Voice control for users with limited hand mobility
Visual Impairments - Screen reading and navigation assistance
Cognitive Assistance - Simplified computer interaction

Productivity

Hands-free Computing - Multitasking while controlling the computer
Voice Automation - Voice-driven workflow automation
Remote Control - Computer control from a distance

Hackathon Achievements

SoCalHackathon2024 - Building inclusive technology that bridges accessibility gaps

What We Built

Complete AI-powered accessibility suite
Real-time screen analysis and control
Natural language computer interaction
Cross-platform browser integration

Technical Challenges Overcome

Real-time OCR and computer vision integration
Seamless AI service communication
Precise UI element detection and interaction
Cross-service architecture coordination

Contributing

We welcome contributions! Please see our Contributing Guidelines.

Development Setup

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Commit changes: git commit -m 'Add amazing feature'
Push to branch: git push origin feature/amazing-feature
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Team

DFPP Team

Dichill - Full Stack Application and System Architecture
Fahat - Frontend & User Experience
Paola - Frontend & User Experience
Phu - Full Stack Application

Acknowledgments

DAIN Protocol - For the amazing AI service framework
OpenAI - For powerful vision and language models
SoCalHackathon2024 - For the opportunity to build inclusive technology
Accessibility Community - For inspiration and feedback

Support

Made with ❤️ for accessibility and inclusion
