This repository is a fork of the original Cartesia Voice Agent example. The main enhancement is Retrieval-Augmented Generation (RAG), which uses OpenAI embeddings and a Qdrant vector database for semantic search. Specifically:
- Document Ingestion: Added the injest.py script, which processes text files, generates embeddings with OpenAI's text-embedding-3-small model, and populates the Qdrant vector database (knowledge_base).
- Custom Retrieval Function: Implemented retrieve_info in main.py to enable the agent to query the vector database for exact and semantic matches, returning concise, context-aware responses.
- Enhanced Logging: Improved logging throughout the ingestion and retrieval processes to assist debugging and ensure visibility into internal operations.
- Async Handling: Ensured all blocking I/O operations (e.g., Qdrant queries, OpenAI embedding requests) run in separate threads (asyncio.to_thread) to maintain asynchronous efficiency.
- Known issue: the thinking_messages in agent\main.py are not used yet. They should be used while the agent is processing the user's query.
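The async handling and the unused thinking_messages described above could be combined roughly as follows. This is a minimal sketch, not the code in main.py: blocking_search, say, and retrieve_with_filler are hypothetical names, the phrases in thinking_messages are invented placeholders, and the sleep stands in for a synchronous Qdrant or OpenAI call.

```python
import asyncio
import random
import time

# Placeholder phrases (assumption); the real list lives in the agent's main.py.
thinking_messages = [
    "Let me look that up.",
    "One moment while I check.",
]


def blocking_search(query: str) -> str:
    """Hypothetical stand-in for a synchronous Qdrant or OpenAI client call."""
    time.sleep(0.2)  # simulate network latency
    return f"result for {query!r}"


async def say(text: str, spoken: list) -> None:
    """Hypothetical stand-in for the agent's speech output; records the text."""
    spoken.append(text)


async def retrieve_with_filler(query: str, spoken: list) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    search = asyncio.create_task(asyncio.to_thread(blocking_search, query))
    # While the search is in flight, speak one filler phrase.
    await say(random.choice(thinking_messages), spoken)
    return await search


if __name__ == "__main__":
    said = []
    print(asyncio.run(retrieve_with_filler("refund policy", said)))
    print(said)
```

Starting the search as a task before speaking lets the filler phrase overlap with the lookup instead of delaying it. asyncio.to_thread requires Python 3.9 or newer, which matches the supported versions listed in this README.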
Feel free to contribute to this repository.
This is a demo of a LiveKit Voice Pipeline Agent using Cartesia and GPT-4o-mini.
The example includes a custom Next.js frontend and Python agent.
- Node.js
- Python 3.9-3.12
- LiveKit Cloud account (or OSS LiveKit server)
- Cartesia API key (for speech synthesis)
- OpenAI API key (for LLM)
- Deepgram API key (for speech-to-text)
- Qdrant API key, Cluster_ID, and URL (for semantic search)
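Before starting the agent, it can help to confirm that the credentials listed above are actually set. The sketch below is only an illustration: every variable name in REQUIRED_VARS is an assumption, so check .env.example for the names this project really reads (including how the Qdrant cluster ID is passed, which is not covered here).

```python
import os

# Assumed variable names; the authoritative list is in .env.example.
REQUIRED_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "CARTESIA_API_KEY",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "QDRANT_URL",
    "QDRANT_API_KEY",
]


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```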
To run the frontend, copy .env.example to .env.local and set the environment variables. Then run:
cd frontend
npm install
npm run dev
To run the agent, copy .env.example to .env and set the environment variables. Then run:
cd agent
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Ingest the documents into the Qdrant vector database (only needs to run once)
python injest.py
python main.py dev
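For orientation, the ingestion step run by python injest.py could look roughly like the sketch below. It is not the script in this repository. Assumptions: knowledge_base is the Qdrant collection name, documents are split on blank lines, the Qdrant credentials come from QDRANT_URL and QDRANT_API_KEY, and load_paragraphs and ingest are hypothetical helper names. It needs the openai and qdrant-client packages (a recent qdrant-client for collection_exists).

```python
import os
from pathlib import Path

COLLECTION = "knowledge_base"  # assumed to be the collection name
EMBED_MODEL = "text-embedding-3-small"
EMBED_DIM = 1536  # default vector size of text-embedding-3-small


def load_paragraphs(path):
    """Split a text file into non-empty paragraphs (assumed chunking rule)."""
    text = Path(path).read_text(encoding="utf-8")
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def ingest(path):
    """Embed each paragraph and upsert it into Qdrant; returns the point count."""
    # Imported lazily so load_paragraphs works without these packages installed.
    from openai import OpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    chunks = load_paragraphs(path)
    if not chunks:
        return 0

    openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = openai_client.embeddings.create(model=EMBED_MODEL, input=chunks)
    vectors = [item.embedding for item in response.data]

    qdrant = QdrantClient(
        url=os.environ["QDRANT_URL"],  # assumed variable name
        api_key=os.environ["QDRANT_API_KEY"],  # assumed variable name
    )
    if not qdrant.collection_exists(COLLECTION):
        qdrant.create_collection(
            collection_name=COLLECTION,
            vectors_config=VectorParams(size=EMBED_DIM, distance=Distance.COSINE),
        )

    # Sequential ids mean a re-run overwrites the same points instead of duplicating them.
    points = [
        PointStruct(id=i, vector=vector, payload={"text": chunk})
        for i, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]
    qdrant.upsert(collection_name=COLLECTION, points=points)
    return len(points)
```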