A real-time voice-to-voice AI agent built with LangChain, AssemblyAI for speech-to-text, and Inworld for text-to-speech. As we're illustrating the "voice agent sandwich" of STT -> LLM -> TTS, our example scenario is also one where you can order a sandwich.
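The STT -> LLM -> TTS flow can be pictured as three chained async generators. The sketch below is only an illustration of that shape, not the project's actual code: `speechToText`, `agent`, `textToSpeech`, `microphone`, `runPipeline`, and the `AudioChunk` type are hypothetical names, and each stage is a stub standing in for AssemblyAI, Claude, and Inworld respectively.

```typescript
// Hypothetical audio container; a real pipeline would carry PCM bytes.
interface AudioChunk {
  bytes: number[];
}

// Stub helpers that treat "audio" as the character codes of some text.
const encode = (text: string): AudioChunk => ({
  bytes: Array.from(text, (c) => c.charCodeAt(0)),
});
const decode = (chunk: AudioChunk): string =>
  String.fromCharCode(...chunk.bytes);

// STT stage (stand-in for AssemblyAI): audio chunks in, transcripts out.
async function* speechToText(
  audio: AsyncIterable<AudioChunk>,
): AsyncGenerator<string> {
  for await (const chunk of audio) {
    yield decode(chunk);
  }
}

// LLM stage (stand-in for the Claude agent): transcripts in, replies out.
async function* agent(
  transcripts: AsyncIterable<string>,
): AsyncGenerator<string> {
  for await (const text of transcripts) {
    yield `You ordered: ${text}`;
  }
}

// TTS stage (stand-in for Inworld): replies in, audio chunks out.
async function* textToSpeech(
  replies: AsyncIterable<string>,
): AsyncGenerator<AudioChunk> {
  for await (const reply of replies) {
    yield encode(reply);
  }
}

// Fake microphone that emits a single utterance.
async function* microphone(): AsyncGenerator<AudioChunk> {
  yield encode("a turkey sandwich");
}

// Chain the three stages and collect the synthesized output as text.
async function runPipeline(
  input: AsyncIterable<AudioChunk>,
): Promise<string[]> {
  const spoken: string[] = [];
  for await (const audio of textToSpeech(agent(speechToText(input)))) {
    spoken.push(decode(audio));
  }
  return spoken;
}
```

Because each stage consumes and produces an async iterable, any one of them can be swapped for a real streaming client without changing the other two.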
- Node.js v20 or higher
- npm v9 or higher
| Service | Environment Variable | Purpose | Get Key |
|---|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Claude LLM | console.anthropic.com |
| AssemblyAI | ASSEMBLYAI_API_KEY | Speech-to-Text | assemblyai.com |
| Inworld | INWORLD_API_KEY | Text-to-Speech | platform.inworld.ai |
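Since all three keys are required, it can help to fail fast at startup when one is missing. The following is a minimal sketch, assuming the keys are read from environment variables with exactly the names in the table; `REQUIRED_KEYS` and `missingKeys` are hypothetical helpers and are not part of this repository.

```typescript
// Variable names taken from the table above.
const REQUIRED_KEYS = [
  "ANTHROPIC_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "INWORLD_API_KEY",
] as const;

// Return the names of required keys that are unset or blank.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}
```

In a Node.js entry point this could be called as `missingKeys(process.env)`, exiting with an error message if the returned array is not empty.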
git clone https://github.com/inworld-ai/langchain-voice-agent-node
cd langchain-voice-agent-node
npm install
cp .env.example .env
# Edit .env and add your API keys
npm run build
npm start

Click the link in the terminal (http://localhost:8000) to open the app. Click "Start Conversation" to begin speaking with the agent.
For development with hot reload:
npm run dev

langchain-voice-agent-node/
├── src/
│   ├── backend/              # Node.js + Hono server
│   │   ├── index.ts          # Server & WebSocket pipeline
│   │   ├── types.ts          # Event type definitions
│   │   ├── utils.ts          # Async iterator utilities
│   │   ├── assemblyai/       # AssemblyAI STT client
│   │   │   ├── index.ts
│   │   │   ├── api-types.ts
│   │   │   └── stt.ts
│   │   └── inworld/          # Inworld TTS client
│   │       ├── index.ts
│   │       ├── api-types.ts
│   │       ├── prompts.ts
│   │       └── tts.ts
│   └── frontend/             # Svelte web app
│       ├── package.json
│       ├── vite.config.ts
│       └── src/
├── package.json
├── tsconfig.json
└── .env.example
| Command | Purpose |
|---|---|
| npm install | Install all dependencies (root + frontend workspace) |
| npm run build | Build frontend, then compile backend |
| npm start | Run server on :8000 |
| npm run dev | Build frontend + run backend with hot-reload |
| npm run lint | Run ESLint on backend code |
| npm run type-check | TypeScript type checking |
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.