
StackThrower/VoiceSalesAgent


AI Calling Agent MVP

An outbound voice agent that places calls and walks each conversation through a six-state sales funnel: code controls the flow, and the LLM only generates language.

Stack: Python · FastAPI · SQLite · Twilio · OpenAI · ElevenLabs · Docker


The funnel

The conversation is a state machine. The LLM never decides where the call goes; it only writes the next line. A classifier labels each caller reply as interested, not_interested, or unclear, and the engine uses that label to choose the next state.

   ┌──────┐    ┌───────┐    ┌──────────┐    ┌───────┐    ┌───────┐    ┌─────┐
   │ INIT │ -> │ INTRO │ -> │ QUALIFY  │ -> │ PITCH │ -> │ CLOSE │ -> │ END │
   └──────┘    └───────┘    └──────────┘    └───────┘    └───────┘    └─────┘
                                  │                                      ▲
                                  └──────── not_interested ──────────────┘
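A minimal sketch of that idea in Python, assuming the diagram is the whole transition table: QUALIFY is the only state that branches on not_interested, and unclear keeps the call where it is so the agent can re-ask. The names State and next_state are hypothetical, not the repo's actual conversation_engine API.

```python
from enum import Enum


class State(str, Enum):
    INIT = "INIT"
    INTRO = "INTRO"
    QUALIFY = "QUALIFY"
    PITCH = "PITCH"
    CLOSE = "CLOSE"
    END = "END"


# The happy path, in order; each state advances to the next one.
HAPPY_PATH = [State.INIT, State.INTRO, State.QUALIFY, State.PITCH, State.CLOSE, State.END]
INTENTS = {"interested", "not_interested", "unclear"}


def next_state(current: State, intent: str) -> State:
    """Deterministic transition: (current state, classifier label) -> next state.

    Assumptions (not confirmed by the README): 'unclear' keeps the call in its
    current state so the agent can re-ask, and END is terminal.
    """
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    if current is State.END:
        return State.END
    if current is State.QUALIFY and intent == "not_interested":
        return State.END  # the only branch drawn in the diagram
    if intent == "unclear" and current is not State.INIT:
        return current  # INIT has no caller reply yet, so it always advances
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1]
```

Because next_state depends only on its two arguments, it can be unit-tested exhaustively without Twilio, OpenAI, or a database.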

Every transition is code-controlled, persisted to SQLite, and emitted as a structured log line.
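One way to emit such a line, sketched with the standard library only. The event name state_transition and the field set are illustrative assumptions; the repo's real log schema may differ.

```python
import json
import logging
import time

logger = logging.getLogger("funnel")


def transition_log_line(call_id: str, from_state: str, to_state: str, intent: str) -> str:
    """Render one transition as a single-line JSON record (field names are hypothetical)."""
    record = {
        "event": "state_transition",
        "call_id": call_id,
        "from": from_state,
        "to": to_state,
        "intent": intent,
        "ts": round(time.time(), 3),
    }
    # sort_keys keeps field order stable, so lines are easy to grep and diff
    return json.dumps(record, sort_keys=True)


def emit_transition(call_id: str, from_state: str, to_state: str, intent: str) -> str:
    """Log the transition and return the exact line that was logged."""
    line = transition_log_line(call_id, from_state, to_state, intent)
    logger.info(line)
    return line
```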


Why this exists

Voice agents fail in production from accumulated complexity, not from missing features. This repo is the smallest thing that places a real call, drives a real funnel, and persists every turn — built against a five-principle constitution:

#    Principle                                    Status
I    Simplicity Over Cleverness                   standard
II   Deterministic Control, Generative Surface    standard
III  Latency Discipline                           NON-NEGOTIABLE
IV   Conversation Is a Funnel                     standard
V    Transparency to the Caller                   NON-NEGOTIABLE

The two non-negotiable principles become hard rules: every turn must respond in under 3 seconds end-to-end (III), and the agent identifies itself as an AI when asked and honors a do-not-call request on the same call (V).
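A hedged sketch of a guard that runs on every caller utterance before the funnel does. It uses keyword regexes purely for illustration (the repo's intent service may rely on the LLM classifier instead); the phrase lists, GuardResult, and transparency_guard are all hypothetical. Only the caller_dnc end reason comes from this README.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase lists. Keyword matching misses paraphrases, so this is a
# fast pre-check, not a replacement for the classifier.
_DO_NOT_CALL = re.compile(
    r"\b(?:do not|don't|stop) call(?:ing)?\b|\bremove me\b|\btake me off\b",
    re.IGNORECASE,
)
_AI_QUESTION = re.compile(
    r"\b(?:are|is) (?:you|this|it) (?:an? )?(?:ai|robot|bot|machine|real person|human)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class GuardResult:
    say: str                          # line the agent speaks next
    end_call: bool                    # True: speak the line, then hang up
    end_reason: Optional[str] = None  # persisted when the call ends


def transparency_guard(utterance: str) -> Optional[GuardResult]:
    """Check one caller utterance; None means the turn goes through the normal funnel."""
    # Do-not-call wins, even if the caller also asked whether this is an AI.
    if _DO_NOT_CALL.search(utterance):
        return GuardResult(
            say="Understood. We won't call you again. Goodbye.",
            end_call=True,
            end_reason="caller_dnc",
        )
    if _AI_QUESTION.search(utterance):
        # Disclose honestly, then keep going: this stays a normal turn.
        return GuardResult(say="Yes, I'm an AI assistant.", end_call=False)
    return None
```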


Architecture at a glance

       caller                Twilio                 FastAPI                OpenAI
        ───┐                  ───┐                    ───┐                  ───┐
           │  speech           │  /twilio/voice         │  generate_reply     │
           │ ─────────────────▶│ ────────────────────▶  │ ──────────────────▶ │
           │                   │                        │                     │
           │                   │                        │  ◀── classify ──────│
           │                   │  TwiML <Say>           │                     │
           │  audio   ◀──────  │ ◀───────────────────── │ ─── stream TTS ──▶  ElevenLabs
           ▼                   ▼                        ▼
                                                    SQLite (calls, messages)

Three layers, no message queue, no Redis, no Alembic:

app/
├── routes/      # POST /call/start, POST /twilio/voice — thin
├── services/    # conversation_engine, ai_service, tts_service, intent, retention
└── db/          # SQLAlchemy async + aiosqlite; two tables: calls, messages
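The thin /twilio/voice route ultimately has to return TwiML. Below is a stdlib-only sketch of one turn's response, assuming the app exposes ElevenLabs audio at a URL Twilio can fetch (the README does not say how) and that the caller's reply is collected with a speech Gather. Gather, Play, and Say are real TwiML verbs; build_twiml and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_twiml(text: str, action_url: str, audio_url: Optional[str] = None) -> str:
    """Render one agent turn as TwiML.

    Plays the ElevenLabs audio when a URL is available and otherwise falls back
    to Twilio's built-in <Say>. The <Gather> posts the caller's next utterance
    back to action_url (e.g. PUBLIC_BASE_URL + "/twilio/voice").
    """
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        input="speech",
        action=action_url,
        method="POST",
        speechTimeout="auto",
    )
    if audio_url:
        ET.SubElement(gather, "Play").text = audio_url
    else:
        ET.SubElement(gather, "Say").text = text  # ElementTree escapes &, <, >
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

Nesting Say or Play inside Gather lets the caller barge in while the agent is still speaking.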

Quickstart

Full walkthrough: specs/001-calling-agent-mvp/quickstart.md · Target: clone → first test call in 10 minutes.

Prerequisites: Docker and Docker Compose, a Twilio account with a verified test number, and a public URL for webhooks (for example, from ngrok http 8000).

git clone <repo-url> && cd VoiceSalesAgent
cp .env.example .env && $EDITOR .env   # paste keys + PUBLIC_BASE_URL
docker compose up --build

Place a call:

curl -X POST http://localhost:8000/call/start \
  -H 'Content-Type: application/json' \
  -d '{"phone": "+1YOUR_TEST_NUMBER"}'
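The same request from Python, using only the standard library. The client-side E.164 check and the assumption that the endpoint replies with a JSON body are illustrative; the OpenAPI contract under specs/001-calling-agent-mvp/contracts/ defines the real request and response shapes.

```python
import json
import re
import urllib.request

# "+" followed by 2 to 15 digits with no leading zero (E.164)
E164 = re.compile(r"\+[1-9]\d{1,14}")


def build_start_call_request(
    phone: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the same POST /call/start request as the curl command above."""
    if not E164.fullmatch(phone):
        raise ValueError(f"expected an E.164 number like +15551234567, got {phone!r}")
    body = json.dumps({"phone": phone}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/call/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_call(phone: str) -> dict:
    """Send the request; assumes the endpoint answers with a JSON body."""
    with urllib.request.urlopen(build_start_call_request(phone), timeout=10) as resp:
        return json.loads(resp.read())
```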

Your phone rings. The agent introduces itself, qualifies, pitches, and closes, or ends the call politely if you say you're not interested. Every turn is written to SQLite along with the funnel state at the moment of that turn.
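A sketch of what the two tables could look like, using the stdlib sqlite3 module in place of the repo's SQLAlchemy async + aiosqlite stack. Column names are assumptions (data-model.md is the source of truth); the point is that each message row carries the funnel state at the moment it was written.

```python
import sqlite3

# Hypothetical columns; see specs/001-calling-agent-mvp/data-model.md for the real ones.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id          TEXT PRIMARY KEY,
    phone       TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'in_progress',
    end_reason  TEXT
);
CREATE TABLE IF NOT EXISTS messages (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    call_id       TEXT NOT NULL REFERENCES calls(id),
    role          TEXT NOT NULL CHECK (role IN ('agent', 'caller')),
    text          TEXT NOT NULL,
    funnel_state  TEXT NOT NULL,  -- state at the time of the turn
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS ix_messages_call_id ON messages(call_id);
"""


def record_turn(
    conn: sqlite3.Connection, call_id: str, role: str, text: str, funnel_state: str
) -> None:
    """Persist one turn together with the funnel state it happened in."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO messages (call_id, role, text, funnel_state) VALUES (?, ?, ?, ?)",
            (call_id, role, text, funnel_state),
        )
```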


Configuration

Variable                  Required  Notes
OPENAI_API_KEY            yes       LLM + classifier
TWILIO_ACCOUNT_SID        yes       telephony
TWILIO_AUTH_TOKEN         yes       webhook signature verification
TWILIO_PHONE_NUMBER       yes       originating number
ELEVENLABS_API_KEY        yes       streaming TTS; falls back to Twilio <Say> on error
PUBLIC_BASE_URL           yes       https URL Twilio can reach (ngrok in dev)
HANDOFF_NUMBER            optional  live-transfer target; if absent, the call falls back cleanly (FR-021)
WEBHOOK_BUDGET_SECONDS    optional  seconds of engine work before it short-circuits; default 1.8
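A stdlib-only sketch of loading these variables at startup and failing fast on missing keys. The Settings dataclass and load_settings are hypothetical (the repo may use pydantic-settings or similar); the variable names, the https requirement, and the 1.8 s default come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = (
    "OPENAI_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "ELEVENLABS_API_KEY",
    "PUBLIC_BASE_URL",
)


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    twilio_account_sid: str
    twilio_auth_token: str
    twilio_phone_number: str
    elevenlabs_api_key: str
    public_base_url: str
    handoff_number: Optional[str] = None
    webhook_budget_seconds: float = 1.8


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, raising on anything missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    base_url = env["PUBLIC_BASE_URL"].rstrip("/")
    if not base_url.startswith("https://"):
        raise RuntimeError("PUBLIC_BASE_URL must be an https URL Twilio can reach")
    return Settings(
        **{name.lower(): env[name] for name in REQUIRED if name != "PUBLIC_BASE_URL"},
        public_base_url=base_url,
        handoff_number=env.get("HANDOFF_NUMBER") or None,  # empty string counts as unset
        webhook_budget_seconds=float(env.get("WEBHOOK_BUDGET_SECONDS") or "1.8"),
    )
```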

Behavior matrix

What the agent does when reality gets weird:

Scenario                    What you'll hear                          What gets persisted
Caller asks "are you AI?"   Confirms it's an AI, keeps going          normal turn (FR-002, SC-005)
Caller asks for a person    "Transferring you now"; Twilio dials out  status=transferred
Caller says "do not call"   Polite end on the same call               end_reason=caller_dnc
Silence ≥ 8 s               Polite end                                end_reason=silence_timeout
LLM error                   Fallback line, then end                   end_reason=llm_error, status=failed
No HANDOFF_NUMBER           "Can't transfer right now"; ends          end_reason=handoff_unconfigured
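The matrix can also be read as data plus a small precedence function. In this sketch only the persisted field values come from the table; the precedence order, the scenario keys, and the helper names are assumptions. Dial is the real TwiML verb for bridging the caller to another number.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape

SILENCE_TIMEOUT_S = 8.0  # the "Silence ≥ 8 s" row

# Fields each row persists, copied from the matrix; anything not listed stays unset.
PERSISTED: Dict[str, Dict[str, str]] = {
    "handoff": {"status": "transferred"},
    "caller_dnc": {"end_reason": "caller_dnc"},
    "silence_timeout": {"end_reason": "silence_timeout"},
    "llm_error": {"end_reason": "llm_error", "status": "failed"},
    "handoff_unconfigured": {"end_reason": "handoff_unconfigured"},
}


def special_case(
    *,
    dnc: bool = False,
    llm_failed: bool = False,
    silence_s: float = 0.0,
    wants_human: bool = False,
    handoff_number: Optional[str] = None,
) -> Optional[str]:
    """Pick the matrix row for one turn; None means a normal funnel turn.

    An "are you AI?" question also returns None: it is answered as a normal turn.
    The precedence order below is an assumption, not taken from the README.
    """
    if dnc:
        return "caller_dnc"
    if llm_failed:
        return "llm_error"
    if silence_s >= SILENCE_TIMEOUT_S:
        return "silence_timeout"
    if wants_human:
        return "handoff" if handoff_number else "handoff_unconfigured"
    return None


def handoff_twiml(handoff_number: str) -> str:
    """Announce the transfer, then bridge the caller to HANDOFF_NUMBER with <Dial>."""
    return (
        "<Response><Say>Transferring you now.</Say>"
        f"<Dial>{escape(handoff_number)}</Dial></Response>"
    )
```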

Performance baseline

A simulated harness (tests/integration/test_latency_smoke.py, run with RUN_LATENCY_SMOKE=1 pytest) measures wall-clock time from POST /twilio/voice ingress to the TwiML response, injecting realistic per-stage delays.

Stage            Budget   Observed (p95)
LLM              1.5 s    ~600 ms
TTS first byte   1.5 s    ~700 ms
Webhook ack      2.0 s    ~1.32 s
End-to-end       3.0 s    within budget
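The p95 column can be reproduced from raw samples with a nearest-rank percentile. A minimal sketch, assuming the harness collects per-stage durations in seconds; p95, over_budget, and the stage keys are hypothetical, while the budget values are copied from the table above.

```python
from typing import Dict, Sequence

# Per-stage budgets from the table, in seconds.
BUDGETS_S = {"llm": 1.5, "tts_first_byte": 1.5, "webhook_ack": 2.0, "end_to_end": 3.0}


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with >= 95% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer math
    return ordered[rank - 1]


def over_budget(samples_by_stage: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Return {stage: p95} for every stage whose p95 exceeds its budget; empty means all green."""
    observed = {stage: p95(samples) for stage, samples in samples_by_stage.items()}
    return {stage: value for stage, value in observed.items() if value > BUDGETS_S[stage]}
```

With 20 samples the nearest rank is the 19th value, so a single outlier does not move the p95.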

Twilio adds roughly 700 ms in production (STT silence detection plus the TwiML round-trip), so WEBHOOK_BUDGET_SECONDS is the load-bearing constraint: the default 1.8 s budget plus that overhead tops out near 2.5 s, inside the 3 s end-to-end limit. If real LLM tail latency pushes engine work past the budget, the call ends with telephony_error and the breach is logged as event=webhook_budget_exceeded.
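A minimal sketch of that short-circuit with asyncio.wait_for, assuming each engine turn is an awaitable. run_within_budget is a hypothetical name; the 1.8 s default, the telephony_error outcome, and the webhook_budget_exceeded event come from this README.

```python
import asyncio
import json
import logging
import time
from typing import Awaitable, Callable, Optional, Tuple

logger = logging.getLogger("webhook")


async def run_within_budget(
    engine_turn: Callable[[], Awaitable[str]],
    budget_s: float = 1.8,  # WEBHOOK_BUDGET_SECONDS default
) -> Tuple[Optional[str], Optional[str]]:
    """Run one engine turn under the webhook budget.

    Returns (reply, None) on success, or (None, "telephony_error") when the
    budget is exceeded; the breach is logged as a structured line.
    """
    started = time.monotonic()
    try:
        reply = await asyncio.wait_for(engine_turn(), timeout=budget_s)
        return reply, None
    except asyncio.TimeoutError:
        elapsed = time.monotonic() - started
        logger.warning(json.dumps({
            "event": "webhook_budget_exceeded",
            "elapsed_s": round(elapsed, 3),
            "budget_s": budget_s,
        }))
        return None, "telephony_error"
```

wait_for cancels the in-flight engine task on timeout, so a slow LLM call cannot keep holding the webhook open after the budget is spent.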


Project layout

.
├── app/                       # FastAPI service (routes / services / db)
├── tests/
│   ├── contract/              # OpenAPI conformance for /call/start
│   ├── integration/           # full funnel, qualify routing, failure modes,
│   │                          # handoff, idempotency, transparency, retention
│   └── unit/                  # state machine, intent service
├── specs/001-calling-agent-mvp/
│   ├── spec.md                # functional + success criteria
│   ├── plan.md                # implementation plan + constitution check
│   ├── research.md            # latency budget, library choices
│   ├── data-model.md          # tables, columns, indexes
│   ├── quickstart.md          # clone → first call (10-minute target)
│   └── contracts/             # OpenAPI + Twilio webhook shape
├── .specify/memory/constitution.md   # the five principles
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
