An outbound voice agent that places calls and walks them through a six-state sales funnel — code controls the flow, the LLM only generates language.
The conversation is a state machine. The LLM never decides where the call
goes — it only writes the next line, and a classifier (interested /
not_interested / unclear) feeds the engine.
┌──────┐    ┌───────┐    ┌──────────┐    ┌───────┐    ┌───────┐    ┌─────┐
│ INIT │ -> │ INTRO │ -> │ QUALIFY  │ -> │ PITCH │ -> │ CLOSE │ -> │ END │
└──────┘    └───────┘    └──────────┘    └───────┘    └───────┘    └─────┘
│ ▲
└──────── not_interested ──────────────┘
Every transition is code-controlled, persisted to SQLite, and emitted as a structured log line.
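A minimal sketch of how a code-controlled funnel like this might look. The state and intent names come from the diagram and the classifier labels above; everything else is an assumption made for illustration. In particular, the sketch assumes `not_interested` jumps to `END` from any state and that `unclear` keeps the call in its current state, neither of which the diagram confirms.

```python
from enum import Enum


class State(str, Enum):
    INIT = "INIT"
    INTRO = "INTRO"
    QUALIFY = "QUALIFY"
    PITCH = "PITCH"
    CLOSE = "CLOSE"
    END = "END"


class Intent(str, Enum):
    INTERESTED = "interested"
    NOT_INTERESTED = "not_interested"
    UNCLEAR = "unclear"


# Happy-path order, taken from the diagram.
FUNNEL = [State.INIT, State.INTRO, State.QUALIFY, State.PITCH, State.CLOSE, State.END]


def next_state(current: State, intent: Intent) -> State:
    """Pick the next funnel state from the classifier label alone."""
    if current is State.END:
        return State.END  # terminal state
    if intent is Intent.NOT_INTERESTED:
        return State.END  # assumption: the bypass edge applies from any state
    if intent is Intent.UNCLEAR:
        return current  # assumption: stay put and ask again
    return FUNNEL[FUNNEL.index(current) + 1]
```

The point of the shape is that `next_state` takes only the current state and a classifier label. The LLM's generated text is never an input, so it cannot steer the call.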
Voice agents fail in production from accumulated complexity, not from missing features. This repo is the smallest thing that places a real call, drives a real funnel, and persists every turn — built against a five-principle constitution:
| # | Principle | Status |
|---|---|---|
| I | Simplicity Over Cleverness | standard |
| II | Deterministic Control, Generative Surface | standard |
| III | Latency Discipline | NON-NEGOTIABLE |
| IV | Conversation Is a Funnel | standard |
| V | Transparency to the Caller | NON-NEGOTIABLE |
Every turn must respond in under 3 seconds end-to-end. The agent identifies itself as AI on request and honors do-not-call signals on the same call.
caller Twilio FastAPI OpenAI
───┐ ───┐ ───┐ ───┐
│ speech │ /twilio/voice │ generate_reply │
│ ─────────────────▶│ ────────────────────▶ │ ──────────────────▶ │
│ │ │ │
│ │ │ ◀── classify ──────│
│ │ TwiML <Say> │ │
│ audio ◀────── │ ◀───────────────────── │ ─── stream TTS ──▶ ElevenLabs
▼ ▼ ▼
SQLite (calls, messages)
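The webhook's reply to Twilio is a TwiML document. A rough sketch of building one with the standard library follows; it covers only the `<Say>` path, not the ElevenLabs stream. The helper name `build_twiml` is hypothetical, and using `<Gather input="speech">` to collect the caller's next utterance is an assumption about how the repo listens, not something the diagram states.

```python
import xml.etree.ElementTree as ET


def build_twiml(reply_text: str, action_url: str = "/twilio/voice", end_call: bool = False) -> str:
    """Return a TwiML document that speaks one line, then listens or hangs up."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = reply_text
    if end_call:
        ET.SubElement(response, "Hangup")
    else:
        # Assumed: Twilio transcribes the caller's next utterance and
        # POSTs it back to action_url for the next turn.
        ET.SubElement(
            response,
            "Gather",
            {"input": "speech", "action": action_url, "method": "POST"},
        )
    # ElementTree escapes characters such as & and < in reply_text.
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")
```

Building the document with an XML library rather than string formatting means LLM output containing `&` or `<` cannot break the response.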
Three layers, no message queue, no Redis, no Alembic:
app/
├── routes/ # POST /call/start, POST /twilio/voice — thin
├── services/ # conversation_engine, ai_service, tts_service, intent, retention
└── db/ # SQLAlchemy async + aiosqlite; two tables: calls, messages
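To make the two-table layout concrete, here is a sketch using the standard-library `sqlite3` module in place of the repo's SQLAlchemy async + aiosqlite stack. Only the table names, `status`, `end_reason`, and the idea of storing the funnel state per turn come from this README. The remaining columns and the `record_turn` helper are assumptions; the real schema lives in `data-model.md`.

```python
import sqlite3

# Hypothetical columns; see specs/001-calling-agent-mvp/data-model.md for the real ones.
SCHEMA = """
CREATE TABLE calls (
    id         INTEGER PRIMARY KEY,
    phone      TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'in_progress',
    end_reason TEXT
);
CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    call_id INTEGER NOT NULL REFERENCES calls(id),
    role    TEXT NOT NULL,   -- 'agent' or 'caller'
    state   TEXT NOT NULL,   -- funnel state when the turn happened
    text    TEXT NOT NULL
);
"""


def record_turn(conn: sqlite3.Connection, call_id: int, role: str, state: str, text: str) -> None:
    """Persist one conversational turn together with its funnel state."""
    conn.execute(
        "INSERT INTO messages (call_id, role, state, text) VALUES (?, ?, ?, ?)",
        (call_id, role, state, text),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
call_id = conn.execute("INSERT INTO calls (phone) VALUES (?)", ("+15555550100",)).lastrowid
record_turn(conn, call_id, "agent", "INTRO", "Hi, this is an AI assistant calling.")
record_turn(conn, call_id, "caller", "INTRO", "Okay, go ahead.")
rows = conn.execute(
    "SELECT role, state FROM messages WHERE call_id = ? ORDER BY id", (call_id,)
).fetchall()
```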
Full walkthrough: specs/001-calling-agent-mvp/quickstart.md · Target: clone → first test call in 10 minutes.
Prerequisites: Docker + Docker Compose, a Twilio account with a
verified test number, and a public URL for webhooks (ngrok http 8000).
git clone <repo-url> && cd VoiceSalesAgent
cp .env.example .env && $EDITOR .env # paste keys + PUBLIC_BASE_URL
docker compose up --build

Place a call:
curl -X POST http://localhost:8000/call/start \
-H 'Content-Type: application/json' \
-d '{"phone": "+1YOUR_TEST_NUMBER"}'

Your phone rings. The agent introduces itself, qualifies, pitches, and closes, or ends politely if you signal disinterest. Every turn lands in SQLite with the funnel state at the time of the turn.
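For scripted test calls, the same request as the `curl` command can be sketched in Python with only the standard library. The helper name `build_start_call_request` is hypothetical; the endpoint, header, and body mirror the command above.

```python
import json
import urllib.request


def build_start_call_request(phone: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /call/start request without sending it."""
    body = json.dumps({"phone": phone}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/call/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_start_call_request("+1YOUR_TEST_NUMBER")

# Sending it requires the service to be running:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read().decode())
```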
| Variable | Required | Notes |
|---|---|---|
| `OPENAI_API_KEY` | yes | LLM + classifier |
| `TWILIO_ACCOUNT_SID` | yes | telephony |
| `TWILIO_AUTH_TOKEN` | yes | webhook signature verification |
| `TWILIO_PHONE_NUMBER` | yes | originating number |
| `ELEVENLABS_API_KEY` | yes | streaming TTS; falls back to Twilio `<Say>` on error |
| `PUBLIC_BASE_URL` | yes | https URL Twilio can reach (ngrok in dev) |
| `HANDOFF_NUMBER` | optional | live-transfer target; FR-021 falls back cleanly if absent |
| `WEBHOOK_BUDGET_SECONDS` | optional | engine short-circuits past this; default 1.8 |
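One way these variables might be loaded and validated at startup is sketched below. The variable names, the required/optional split, and the 1.8 default come from the table; the `Settings` class and `load_settings` function are hypothetical and not taken from the repo.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = (
    "OPENAI_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "ELEVENLABS_API_KEY",
    "PUBLIC_BASE_URL",
)


@dataclass(frozen=True)
class Settings:
    required: Mapping[str, str]
    handoff_number: Optional[str]
    webhook_budget_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on missing required variables; apply documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        required={name: env[name] for name in REQUIRED},
        handoff_number=env.get("HANDOFF_NUMBER") or None,
        webhook_budget_seconds=float(env.get("WEBHOOK_BUDGET_SECONDS", "1.8")),
    )
```

Failing at startup rather than mid-call keeps a misconfigured deployment from ever dialing a number.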
What the agent does when reality gets weird:
| Scenario | What you'll hear | What gets persisted |
|---|---|---|
| Caller asks "are you AI?" | Confirms it's an AI, keeps going | normal turn (FR-002, SC-005) |
| Caller asks for a person | "Transferring you now"; Twilio dials out | status=transferred |
| Caller says "do not call" | Polite end on the same call | end_reason=caller_dnc |
| Silence ≥ 8 s | Polite end | end_reason=silence_timeout |
| LLM error | Fallback line, then end | end_reason=llm_error, status=failed |
| No `HANDOFF_NUMBER` | "Can't transfer right now"; ends | end_reason=handoff_unconfigured |
A simulated harness (tests/integration/test_latency_smoke.py, run with
RUN_LATENCY_SMOKE=1 pytest) measures wall-clock time from POST /twilio/voice
ingress to the TwiML response, with realistic per-stage delays.
| Stage | Budget | Observed (p95) |
|---|---|---|
| LLM | 1.5 s | ~600 ms |
| TTS first byte | 1.5 s | ~700 ms |
| Webhook ack | 2.0 s | ~1.32 s |
| End-to-end | 3.0 s | within budget |
Twilio adds ~700 ms in production (STT silence-detect + TwiML round-trip).
WEBHOOK_BUDGET_SECONDS is the load-bearing constraint; if real LLM tail
latency pushes engine work past it, the call ends as telephony_error
and the breach is logged as event=webhook_budget_exceeded.
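The short-circuit described above could be implemented with `asyncio.wait_for`. This is a sketch of the idea, not the repo's engine: `run_with_budget` and `slow_llm` are hypothetical names, and the return shape is invented for the example. Only the 1.8 default, the `telephony_error` outcome, and the `event=webhook_budget_exceeded` log key come from this README.

```python
import asyncio

WEBHOOK_BUDGET_SECONDS = 1.8  # documented default


async def run_with_budget(engine_work, budget: float = WEBHOOK_BUDGET_SECONDS) -> dict:
    """Await engine_work, but give up once the budget is spent."""
    try:
        reply = await asyncio.wait_for(engine_work, timeout=budget)
        return {"reply": reply, "end_reason": None}
    except asyncio.TimeoutError:
        # Documented behaviour: log the breach and end the call.
        print("event=webhook_budget_exceeded")
        return {"reply": None, "end_reason": "telephony_error"}


async def slow_llm(delay: float) -> str:
    """Stand-in for the LLM call; sleeps instead of hitting an API."""
    await asyncio.sleep(delay)
    return "Thanks for your time."


fast = asyncio.run(run_with_budget(slow_llm(0.01), budget=0.5))
slow = asyncio.run(run_with_budget(slow_llm(0.5), budget=0.05))
```

`wait_for` cancels the pending work when the timeout fires, so a late LLM response cannot arrive after the call has already been ended.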
.
├── app/ # FastAPI service (routes / services / db)
├── tests/
│ ├── contract/ # OpenAPI conformance for /call/start
│ ├── integration/ # full funnel, qualify routing, failure modes,
│ │ # handoff, idempotency, transparency, retention
│ └── unit/ # state machine, intent service
├── specs/001-calling-agent-mvp/
│ ├── spec.md # functional + success criteria
│ ├── plan.md # implementation plan + constitution check
│ ├── research.md # latency budget, library choices
│ ├── data-model.md # tables, columns, indexes
│ ├── quickstart.md # clone → first call (10-minute target)
│ └── contracts/ # OpenAPI + Twilio webhook shape
├── .specify/memory/constitution.md # the five principles
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
- Specification: what the system does and why
- Implementation plan: how it's built, with the constitution check
- Quickstart: clone to first test call
- Data model: `calls` and `messages` schemas
- Constitution: the five principles every change is checked against