Giving AI a Human Voice — A service that dynamically modulates synthesized speech based on the detected emotion of the source text.
The Empathy Engine bridges the gap between text-based sentiment and expressive, human-like audio output. Instead of monotone, robotic speech, it detects the emotion behind your text and adjusts the voice's speed, pitch, and volume to match how a human would naturally say it.
Key Features:
- Granular Emotion Detection — Goes beyond positive/negative/neutral to detect 7 emotions: Happy, Excited, Sad, Angry, Fearful, Surprised, and Neutral
- Intensity Scaling — The degree of emotion affects the voice. "This is good" sounds different from "THIS IS THE BEST NEWS EVER!!!"
- SSML-like Pause Injection — Natural pauses at punctuation for more human-like delivery
- Web Interface — A polished Flask UI with instant audio playback
- CLI Mode — A terminal-based interface for quick testing
- API Endpoint — JSON API at `POST /synthesize` for programmatic access
Rather than relying on a single method, the engine uses a two-stage pipeline:
- VADER Sentiment Analysis — Provides a continuous compound score (–1.0 to +1.0) that captures overall valence and intensity. VADER is specifically tuned for social media text and handles slang, emojis, capitalization, and punctuation boosters (like `!!!`) out of the box.
- Keyword Matching — A curated keyword lexicon distinguishes between emotions that share the same VADER polarity. For example, "angry" and "sad" are both negative, but keyword matching differentiates them.
The two stages are cross-validated: keyword matches are only trusted if the VADER polarity aligns, preventing false classifications.
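The cross-validation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the `KEYWORDS` lexicon, the `POLARITY` map, the function names, and the fallback labels are all hypothetical, and "surprised" is left out because its polarity is ambiguous. The ±0.05 cutoff is VADER's conventional threshold for positive and negative text. The compound score is passed in as an argument so the sketch runs without VADER installed.

```python
import re

# Hypothetical lexicon: the project's real keyword lists are not shown here.
KEYWORDS = {
    "happy": {"happy", "glad", "pleased"},
    "excited": {"thrilled", "amazing", "best"},
    "sad": {"sad", "miss", "lonely"},
    "angry": {"angry", "unacceptable", "furious"},
    "fearful": {"worried", "afraid", "scared"},
}

# Expected VADER polarity for each emotion (+1 positive, -1 negative).
POLARITY = {"happy": 1, "excited": 1, "sad": -1, "angry": -1, "fearful": -1}


def polarity_of(compound, threshold=0.05):
    """Map a VADER compound score to +1, -1, or 0 (neutral)."""
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0


def classify(text, compound):
    """Trust a keyword match only if its polarity agrees with VADER's."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    sign = polarity_of(compound)
    for emotion, keywords in KEYWORDS.items():
        if words & keywords and POLARITY[emotion] == sign:
            return emotion
    # Assumed fallback: no trusted keyword, so use the bare polarity.
    if sign > 0:
        return "happy"
    if sign < 0:
        return "sad"
    return "neutral"
```

In a real run the compound score would come from VADER, for example `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package.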
The intensity (0.0–1.0) is derived from VADER's absolute compound score, then boosted by:
- Exclamation marks (`!`) → +0.08 each, capped at +0.3
- ALL CAPS words → +0.06 each, capped at +0.2
This intensity value controls an interpolation between the neutral voice profile and the target emotion's profile. A low-intensity "happy" barely changes the voice; a high-intensity "excited" dramatically increases rate and pitch.
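The intensity calculation described above could look roughly like this. The function name is hypothetical, and two details are assumptions the text does not specify: single-letter words such as "I" are not counted as ALL CAPS, and the final value is clamped to 1.0.

```python
import re


def compute_intensity(text, compound):
    """Estimate emotional intensity in the range 0.0 to 1.0."""
    base = abs(compound)

    # +0.08 per exclamation mark, capped at +0.3.
    exclaim_boost = min(0.08 * text.count("!"), 0.3)

    # +0.06 per ALL CAPS word, capped at +0.2.
    # Assumption: one-letter words like "I" do not count as shouting.
    caps_words = [
        w for w in re.findall(r"[A-Za-z]+", text)
        if len(w) > 1 and w.isupper()
    ]
    caps_boost = min(0.06 * len(caps_words), 0.2)

    # Assumption: clamp so the boosts cannot push the value past 1.0.
    return min(1.0, base + exclaim_boost + caps_boost)
```

With both boosts at their caps, a sentence can gain at most +0.5 on top of VADER's absolute compound score.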
Each emotion maps to a voice profile that modulates three parameters:
| Emotion | Rate (wpm) | Volume | Pitch Δ (Hz) |
|---|---|---|---|
| Excited | ↑ Fast | Full | +40 |
| Happy | ↑ Slightly | Full | +20 |
| Neutral | Normal | 95% | 0 |
| Sad | ↓ Slow | Soft | –30 |
| Angry | ↑ Fast | Full | +15 |
| Fearful | ↑ Fast | Soft | +25 |
| Surprised | ↑↑ Fast | Full | +35 |
These profiles are interpolated by intensity, so the actual values fall on a spectrum.
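A linear interpolation between the neutral profile and a target profile is one way to realize this. In the sketch below, the pitch deltas and the neutral volume come from the table above; the rate values and the "Soft" volume are illustrative assumptions, since the table gives them only qualitatively. The function and dictionary names are hypothetical.

```python
# Pitch deltas and neutral volume follow the table; rate (wpm) and the
# "soft" volume are assumed numbers chosen only for illustration.
NEUTRAL = {"rate": 175, "volume": 0.95, "pitch_delta": 0}
PROFILES = {
    "excited": {"rate": 225, "volume": 1.0, "pitch_delta": 40},
    "sad": {"rate": 140, "volume": 0.7, "pitch_delta": -30},
}


def interpolate_profile(emotion, intensity):
    """Blend from the neutral profile toward the emotion's profile."""
    target = PROFILES.get(emotion, NEUTRAL)
    t = max(0.0, min(1.0, intensity))  # keep the blend factor in [0, 1]
    blend = {
        key: NEUTRAL[key] + (target[key] - NEUTRAL[key]) * t
        for key in NEUTRAL
    }
    return {
        "rate": round(blend["rate"]),
        "volume": round(blend["volume"], 2),
        "pitch_delta": round(blend["pitch_delta"]),
    }
```

At intensity 0.0 the result equals the neutral profile, and at 1.0 it equals the target emotion's profile; unknown emotions fall back to neutral.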
pyttsx3 was chosen for fully offline, zero-API-key operation. It uses the system's native TTS engine (espeak on Linux, SAPI5 on Windows, NSSpeechSynthesizer on macOS). Pitch control is best supported on Linux (espeak backend); on other platforms, rate and volume modulation still produce clearly differentiated emotional speech.
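Applying a voice profile through pyttsx3 might look like the sketch below. It sets only rate and volume, the two properties that work on every backend, and leaves pitch out because its support varies by platform. The helper names and the profile dictionary shape are assumptions; `init`, `setProperty`, `save_to_file`, and `runAndWait` are pyttsx3's own calls.

```python
def apply_profile(engine, profile):
    """Set rate (words per minute) and volume (0.0 to 1.0) on an engine."""
    engine.setProperty("rate", int(profile["rate"]))
    volume = max(0.0, min(1.0, float(profile["volume"])))
    engine.setProperty("volume", volume)


def synthesize_to_file(text, profile, path):
    """Render text to an audio file using the given voice profile."""
    # Imported here so apply_profile can be used without pyttsx3 installed.
    import pyttsx3

    engine = pyttsx3.init()
    apply_profile(engine, profile)
    engine.save_to_file(text, path)
    engine.runAndWait()  # blocks until the file has been written
```

For example, `synthesize_to_file("Hello there", {"rate": 200, "volume": 0.9}, "hello.wav")` would write the speech to `hello.wav` using the system's native voice.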
- Python 3.8+
- espeak (Linux) or native TTS (macOS/Windows)
# On Ubuntu/Debian — install espeak for pyttsx3
sudo apt-get install espeak
# On macOS — no extra install needed (uses NSSpeechSynthesizer)
# On Windows — no extra install needed (uses SAPI5)

# Clone the repository
git clone https://github.com/YOUR_USERNAME/empathy-engine.git
cd empathy-engine
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

python app.py

Open http://localhost:5000 in your browser. Type or paste text, click Synthesize Speech, and listen.
python app.py --cli

Type sentences interactively in the terminal. Emotion analysis and voice parameters are printed, and audio files are saved to static/audio/.
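The `POST /synthesize` endpoint can also be called from Python using only the standard library. This is a minimal client sketch that assumes the server is running locally on port 5000; the function names are hypothetical, and the request body carries the single `text` field the endpoint expects.

```python
import json
import urllib.request


def build_request(text, base_url="http://localhost:5000"):
    """Build the JSON POST request for the /synthesize endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, base_url="http://localhost:5000"):
    """Send the text to the server and return the decoded JSON reply."""
    request = build_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `synthesize("I am absolutely thrilled about this!")` returns a dictionary whose `audio_url` value points at the generated file.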
curl -X POST http://localhost:5000/synthesize \
-H "Content-Type: application/json" \
-d '{"text": "I am absolutely thrilled about this!"}'

Response:
{
"emotion": "excited",
"intensity": 0.82,
"vader_scores": { "neg": 0.0, "neu": 0.359, "pos": 0.641, "compound": 0.6996 },
"voice_profile": { "rate": 218, "volume": 0.94, "pitch_delta": 32 },
"audio_url": "/audio/empathy_a1b2c3d4e5.wav"
}

empathy-engine/
├── app.py # Main application (Flask + emotion + TTS)
├── requirements.txt # Python dependencies
├── README.md # This file
├── templates/
│ └── index.html # Web interface
└── static/
└── audio/ # Generated .wav files (auto-created)
| Input Text | Expected Emotion | Intensity |
|---|---|---|
| "The meeting is at 3 PM." | Neutral | Low |
| "I'm so happy for you!" | Happy | Medium |
| "THIS IS THE BEST NEWS EVER!!!" | Excited | High |
| "This is unacceptable. I demand a refund." | Angry | High |
| "I'm worried about the deadline." | Fearful | Medium |
| "I miss those days so much." | Sad | Medium |
MIT License — free to use, modify, and distribute.