Naveen910/Voice-Agent

Voice-Agent

Architecture:

🎤 User (voice)
  ↓ (Speech-to-Text, e.g., Whisper)

📝 Text Query
  ↓
🤖 LangChain Agent (LLM + Tools)
  - Google Calendar Tool
  - Gmail Tool
  - SQL/NoSQL Database Tool
  - File Search Tool
  - Custom APIs
  ↓
📝 Text Response
  ↓ (Text-to-Speech, e.g., OpenAI TTS / ElevenLabs)

🔊 Spoken Output
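
The three stages above can be sketched as a simple chain of callables. This is a minimal illustration, not the repository's implementation: `VoicePipeline`, `fake_stt`, `fake_agent`, and `fake_tts` are hypothetical names, and the placeholder stages stand in for a real Whisper wrapper, LangChain agent, and TTS client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages: speech-to-text, agent, text-to-speech."""
    speech_to_text: Callable[[bytes], str]   # assumed: e.g. a Whisper wrapper
    agent: Callable[[str], str]              # assumed: e.g. a LangChain agent call
    text_to_speech: Callable[[str], bytes]   # assumed: e.g. a TTS client wrapper

    def handle(self, audio: bytes) -> bytes:
        query = self.speech_to_text(audio)    # voice -> text query
        response = self.agent(query)          # text query -> text response
        return self.text_to_speech(response)  # text response -> audio


# Placeholder stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(query: str) -> str:
    return f"You said: {query}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_agent, fake_tts)
spoken = pipeline.handle(b"what is on my calendar today")
```

Keeping each stage behind a plain callable means any of the listed STT or TTS providers could be swapped in without touching the agent step.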

Example Flow:

User (voice): "Schedule a meeting with Naveen tomorrow at 10 AM and send him an email confirmation."

  • Whisper → converts the speech to text.
  • LangChain Agent → interprets the intent.
  • Calls Google Calendar Tool to create the event.
  • Calls Gmail Tool to send confirmation.
  • LLM → generates a spoken confirmation: "I've scheduled the meeting and sent Naveen an email."
  • TTS → speaks back.
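
The tool-calling steps in this flow can be sketched as a small dispatch table. This is an illustration under stated assumptions: the tool functions, the `TOOLS` registry, and `run_plan` are hypothetical, and the hard-coded `plan` stands in for whatever the LLM would derive from the user's request. No real Google Calendar or Gmail API is called.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the Google Calendar and Gmail tools.
def create_calendar_event(attendee: str, when: str) -> str:
    return f"event created with {attendee} at {when}"


def send_email(to: str, subject: str) -> str:
    return f"email sent to {to}: {subject}"


# Registry mapping a tool name to the function that implements it.
TOOLS: Dict[str, Callable[..., str]] = {
    "calendar": create_calendar_event,
    "gmail": send_email,
}


def run_plan(plan: List[Tuple[str, dict]]) -> List[str]:
    """Execute each (tool name, arguments) step in order and collect the results."""
    return [TOOLS[name](**args) for name, args in plan]


# Assumed plan: in the real agent the LLM would produce these steps.
plan = [
    ("calendar", {"attendee": "Naveen", "when": "tomorrow 10:00"}),
    ("gmail", {"to": "Naveen", "subject": "Meeting confirmation"}),
]
results = run_plan(plan)
confirmation = "I've scheduled the meeting and sent Naveen an email."
```

The `confirmation` string would then be handed to the TTS stage.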

Stack Flow:

Frontend
🎤 User voice → (STT: Whisper.js / Web Speech API / Vosk WASM / AssemblyAI SDK)
   ↓
📝 Text query → Sent to Backend

Backend
🤖 LangChain Agent (LLM + Tools: Calendar, Gmail, DB, APIs, File Search)
   ↓
📝 Text response → Sent back to Frontend

Frontend
↓
(Text-to-Speech: OpenAI TTS / ElevenLabs / Browser SpeechSynthesis API)
🔊 Spoken Output
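
Because only text crosses the frontend/backend boundary, the backend contract can be as small as one JSON request and one JSON response. The sketch below shows one possible shape; the `query`, `response`, and `error` field names, as well as `handle_request` and `run_agent`, are assumptions for illustration and are not taken from the repository.

```python
import json


def run_agent(query: str) -> str:
    """Placeholder for the backend agent; a real version would call the LLM and tools."""
    return f"Received: {query}"


def handle_request(body: str) -> str:
    """Take the frontend's JSON text query and return a JSON text response."""
    payload = json.loads(body)
    query = payload.get("query", "").strip()
    if not query:
        # Nothing was transcribed, so there is nothing to send to the agent.
        return json.dumps({"error": "empty query"})
    return json.dumps({"response": run_agent(query)})


reply = handle_request(json.dumps({"query": "What meetings do I have today?"}))
```

With this split, speech recognition and synthesis stay in the browser, and the backend never has to handle audio.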
