- Link 1: https://www.youtube.com/watch?v=TNiw3K0_gpU
- Link 2: https://www.youtube.com/watch?v=ytsUsYFrEZg
- Presentation: https://drive.google.com/file/d/1mTUDDzFVPkpeEalwFBjfJOjnJOlAitqd/view?usp=sharing
- Architecture & Workflow: https://drive.google.com/file/d/10eHQDelLdlLpawuHmSrbcZaHe2EhJBgh/view?usp=sharing
AI-Powered Computer Control for Enhanced Accessibility
VoiceBridge is an accessibility tool that bridges the gap between technology and users with impairments. Using AI vision and natural language processing, it lets users control their computers entirely through voice commands, making technology truly inclusive.
"Making technology accessible to everyone, regardless of physical limitations"
- Screen Analysis - AI-powered visual understanding of your screen
- Voice Control - Natural language commands for computer interaction
- Smart Clicking - Intelligent UI element detection and interaction
- Text Input - Voice-to-text typing capabilities
- Browser Integration - Chrome extension for web navigation
- Page Navigation - Forward/backward navigation with voice commands
- Scrolling Control - Smooth scrolling with voice commands
- DAIN Protocol - AI service framework
- Node.js/TypeScript - Main service layer
- Python Flask - Computer automation API
- Chrome Extension - Browser integration
- OpenAI GPT-4 Vision - Screen analysis and understanding
- EasyOCR - Optical Character Recognition
- Computer Vision (OpenCV) - Image processing
- Levenshtein Distance - Text similarity matching
- PyAutoGUI - Screen automation (clicking, typing, scrolling)
- Screen Capture - Real-time screen analysis
- Keyboard/Mouse Control - System-level automation
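The Levenshtein-distance matching mentioned above can be sketched as follows: given the text the user spoke and the strings EasyOCR found on screen, pick the candidate with the smallest edit distance. This is an illustrative sketch, the `find_best_match` helper is hypothetical and the project itself uses the `levenshtein` package rather than this hand-rolled implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def find_best_match(target: str, candidates: list[str]) -> str:
    """Pick the OCR candidate closest to the spoken target (hypothetical helper)."""
    return min(candidates, key=lambda c: levenshtein(target.lower(), c.lower()))

# Example: OCR found three buttons; the user said "Login"
print(find_best_match("Login", ["Log In", "Sign Up", "Help"]))  # → Log In
```

Fuzzy matching like this is what lets "click on the login button" succeed even when OCR reads the label slightly differently from how the user said it.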
- TypeScript 5.5+ - Type-safe development
- Hono - Lightweight web framework
- Zod - Schema validation
- Axios - HTTP client
- Node.js 18+
- Python 3.8+
- OpenAI API Key
- DAIN API Key
- Clone the repository

  ```bash
  git clone https://github.com/quachphu/Hackathon.git
  cd SoCalHackathon2024-main
  ```

- Install Node.js dependencies

  ```bash
  npm install
  ```

- Install Python dependencies

  ```bash
  cd api
  pip install flask opencv-python pyautogui easyocr levenshtein matplotlib numpy
  ```

- Environment Setup

  ```bash
  # Create .env.development file
  cp .env.development.example .env.development

  # Add your API keys
  DAIN_API_KEY=your_dain_api_key_here
  OPENAI_API_KEY=your_openai_api_key_here
  ```

- Chrome Extension Setup
  - Open Chrome and go to `chrome://extensions/`
  - Enable "Developer mode"
  - Click "Load unpacked" and select the `extension/` folder
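For the environment-setup step, the keys in `.env.development` are simple `KEY=value` pairs. A minimal sketch of loading them looks like this; the project likely relies on a dotenv library, so this parser is illustrative only.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """# Add your API keys
DAIN_API_KEY=your_dain_api_key_here
OPENAI_API_KEY=your_openai_api_key_here"""

print(parse_env(sample)["DAIN_API_KEY"])  # → your_dain_api_key_here
```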
- Start the Python Flask API

  ```bash
  cd api
  python main.py
  # Server runs on http://localhost:3000
  ```

- Start the DAIN Service

  ```bash
  npm run dev
  # Service runs on port 2022
  ```

- Test the connection

  ```bash
  curl http://localhost:3000/analyze_screen
  ```

| Endpoint | Method | Description |
|---|---|---|
| `/analyze_screen` | GET | Capture and analyze current screen |
| `/write_text` | POST | Type text on screen |
| `/press_key` | POST | Emulate key presses |
| `/click_request` | POST | Click on specified UI element |
| `/scroll` | POST | Scroll page up/down |
| `/navigate_page` | POST | Navigate forward/backward |
| `/open` | POST | Open new tab/window |
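A POST to one of these endpoints can be built with nothing but the standard library. The JSON body shape (`{"question": ...}`) is an assumption based on the service's tool examples; check `api/main.py` for the actual request schema.

```python
import json
from urllib import request

API_BASE = "http://localhost:3000"  # the Flask API from this README

def build_post(path: str, question: str) -> request.Request:
    """Build (but do not send) a POST request for a Flask endpoint.
    The {"question": ...} body is an assumption, not a confirmed schema."""
    body = json.dumps({"question": question}).encode("utf-8")
    return request.Request(
        f"{API_BASE}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_post("/click_request", "Click on the login button")
# urllib.request.urlopen(req) would send it once the server is running
```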
- `analyze-screen` - Analyzes user's screen content
- `click-request` - Clicks on specified text/buttons
- `scroll-page` - Controls page scrolling
- `navigate-page` - Page navigation controls
- `write-string` - Text input functionality
- `press-key` - Keyboard emulation
- `open-tab` - Tab management
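Each DAIN tool corresponds to one Flask endpoint. The pairing below is inferred from the names in this README rather than taken from the code, so treat it as a sketch:

```python
# Hypothetical mapping from DAIN tool names to Flask API endpoints,
# inferred from the names in this README.
TOOL_TO_ENDPOINT = {
    "analyze-screen": "/analyze_screen",
    "click-request": "/click_request",
    "scroll-page": "/scroll",
    "navigate-page": "/navigate_page",
    "write-string": "/write_text",
    "press-key": "/press_key",
    "open-tab": "/open",
}
```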
- "What's on my screen?" → Analyzes current screen content
- "Click on the login button" → Finds and clicks login button
- "Scroll down" → Scrolls page downward
- "Type 'Hello World'" → Types the specified text
- "Press Enter" → Presses Enter key
- "Go back" → Navigates to previous page
- "Open new tab" → Opens a new browser tab
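Routing phrases like these to tools can be sketched with a few regular expressions. The real routing is done by the DAIN service's language model, so this `parse_command` helper is purely illustrative:

```python
import re

# Hypothetical patterns mapping the example phrases above to tool names.
PATTERNS = [
    (re.compile(r"what'?s on my screen", re.I), ("analyze-screen", None)),
    (re.compile(r"click on (.+)", re.I), ("click-request", 1)),
    (re.compile(r"scroll (up|down)", re.I), ("scroll-page", 1)),
    (re.compile(r"type '(.+)'", re.I), ("write-string", 1)),
    (re.compile(r"press (\w+)", re.I), ("press-key", 1)),
    (re.compile(r"go (back|forward)", re.I), ("navigate-page", 1)),
    (re.compile(r"open new tab", re.I), ("open-tab", None)),
]

def parse_command(text: str):
    """Return (tool_name, argument) for a recognized phrase, else None."""
    for pattern, (tool, group) in PATTERNS:
        m = pattern.search(text)
        if m:
            return tool, (m.group(group) if group else None)
    return None

print(parse_command("Click on the login button"))  # → ('click-request', 'the login button')
```

An LLM-based router handles paraphrases ("hit the login button") that rigid patterns like these would miss, which is why the project delegates this step to GPT-4 via DAIN.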
```typescript
// Analyze screen
const analysis = await dainService.tools.analyzeScreen({
  question: "What's displayed on the screen?"
});

// Click on element
await dainService.tools.clickRequest({
  question: "Click on the submit button"
});

// Type text
await dainService.tools.writeString({
  question: "Enter my email address"
});
```

```json
{
  "project-id": "",
  "api-key": "MUST PUT IN .env.development as DAIN_API_KEY=YOUR_API_KEY",
  "main-file": "src/index.ts",
  "static-dir": "static",
  "out-dir": "dist",
  "environment": "development",
  "version": "1.0.0",
  "runtime": "node"
}
```

- Port: 3000 (configurable in `main.py`)
- Image Storage: `../temp/image.png`
- OCR Language: English (`["en"]`)
- Motor Impairments - Voice control for users with limited hand mobility
- Visual Impairments - Screen reading and navigation assistance
- Cognitive Assistance - Simplified computer interaction
- Hands-free Computing - Multitasking while controlling computer
- Voice Automation - Streamlined workflow automation
- Remote Control - Computer control from distance
SoCalHackathon2024 - Building inclusive technology that bridges accessibility gaps
- Complete AI-powered accessibility suite
- Real-time screen analysis and control
- Natural language computer interaction
- Cross-platform browser integration
- Real-time OCR and computer vision integration
- Seamless AI service communication
- Precise UI element detection and interaction
- Cross-service architecture coordination
We welcome contributions! Please see our Contributing Guidelines.
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to branch: `git push origin feature/amazing-feature`
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
DFPP Team
- Dichill - Full Stack Application and System Architecture
- Fahat - Frontend & User Experience
- Paola - Frontend & User Experience
- Phu - Full Stack Application
- DAIN Protocol - For the amazing AI service framework
- OpenAI - For powerful vision and language models
- SoCalHackathon2024 - For the opportunity to build inclusive technology
- Accessibility Community - For inspiration and feedback
- Email: quachphuwork@gmail.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for accessibility and inclusion