Codex128187/Empathy-Engine

🎙️ The Empathy Engine

Giving AI a Human Voice — A service that dynamically modulates synthesized speech based on the detected emotion of the source text.


🎯 Overview

The Empathy Engine bridges the gap between text-based sentiment and expressive, human-like audio output. Instead of monotone, robotic speech, it detects the emotion behind your text and adjusts the voice's speed, pitch, and volume to match how a person would naturally say it.

Key Features:

  • Granular Emotion Detection — Goes beyond positive/negative/neutral to detect 7 emotions: Happy, Excited, Sad, Angry, Fearful, Surprised, and Neutral
  • Intensity Scaling — The degree of emotion affects the voice. "This is good" sounds different from "THIS IS THE BEST NEWS EVER!!!"
  • SSML-like Pause Injection — Natural pauses at punctuation for more human-like delivery
  • Web Interface — A polished Flask UI with instant audio playback
  • CLI Mode — A terminal-based interface for quick testing
  • API Endpoint — JSON API at POST /synthesize for programmatic access
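
As a rough illustration of the pause-injection idea, the sketch below splits text at punctuation and pairs each chunk with a pause length. The function name `split_with_pauses` and the millisecond values are assumptions made for this example; they are not taken from app.py, which may handle pauses differently.

```python
import re

# Hypothetical pause lengths in milliseconds (assumed, not from app.py).
PAUSE_MS = {",": 150, ";": 250, ":": 250, ".": 400, "!": 400, "?": 400}


def split_with_pauses(text):
    """Split text at punctuation and pair each chunk with a pause length."""
    chunks = []
    # Each match is a run of non-punctuation text followed by any punctuation marks.
    for match in re.finditer(r"([^,;:.!?]+)([,;:.!?]*)", text):
        words = match.group(1).strip()
        marks = match.group(2)
        if not words:
            continue  # skip whitespace-only runs between sentences
        # Use the longest pause among the trailing marks, or 0 if there are none.
        pause = max((PAUSE_MS[m] for m in marks), default=0)
        chunks.append((words + marks, pause))
    return chunks
```

A caller could speak each chunk in turn and sleep for the paired duration between them. Note that this simple pattern also splits decimals such as "3.5", so a real implementation would need a more careful tokenizer.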

🏗️ Architecture & Design Choices

Emotion Detection — Two-Stage Hybrid Approach

Rather than relying on a single method, the engine uses a two-stage pipeline:

  1. VADER Sentiment Analysis — Provides a continuous compound score (–1.0 to +1.0) that captures overall valence and intensity. VADER is specifically tuned for social media text and handles slang, emojis, capitalization, and punctuation boosters (like !!!) out of the box.

  2. Keyword Matching — A curated keyword lexicon distinguishes between emotions that share the same VADER polarity. For example, "angry" and "sad" are both negative, but keyword matching differentiates them.

The two stages are cross-checked against each other: a keyword match is trusted only if it agrees with the VADER polarity, which guards against misclassification.
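
The two-stage idea can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the keyword sets, the `classify` function name, the polarity-only fallback, and the ±0.05 neutral band (the threshold conventionally used with VADER) are all assumptions. The `compound` argument is expected to come from VADER, for example `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. "Surprised" is left out because it can carry either polarity.

```python
# Hypothetical keyword lexicon; the real lists in app.py are likely larger.
KEYWORDS = {
    "happy": {"happy", "glad", "pleased"},
    "excited": {"thrilled", "amazing", "best"},
    "sad": {"sad", "miss", "lonely"},
    "angry": {"angry", "unacceptable", "furious"},
    "fearful": {"worried", "afraid", "scared"},
}
# Expected VADER polarity for each emotion (+1 positive, -1 negative).
POLARITY = {"happy": 1, "excited": 1, "sad": -1, "angry": -1, "fearful": -1}
NEUTRAL_BAND = 0.05  # assumed threshold around zero


def classify(text, compound):
    """Pick an emotion from keywords, trusting it only if the polarity agrees."""
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    if compound >= NEUTRAL_BAND:
        sign = 1
    elif compound <= -NEUTRAL_BAND:
        sign = -1
    else:
        sign = 0
    for emotion, keywords in KEYWORDS.items():
        # A keyword hit counts only when it matches the overall polarity.
        if words & keywords and POLARITY[emotion] == sign:
            return emotion
    # No trusted keyword: fall back on polarity alone (assumed behaviour).
    if sign > 0:
        return "happy"
    if sign < 0:
        return "sad"
    return "neutral"
```

With this sketch, a sentence containing "happy" but scored negative by VADER is not labelled happy; the polarity check overrides the keyword.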

Intensity Scaling

The intensity (0.0–1.0) is derived from VADER's absolute compound score, then boosted by:

  • Exclamation marks: +0.08 each, capped at +0.3
  • ALL-CAPS words: +0.06 each, capped at +0.2

This intensity value controls an interpolation between the neutral voice profile and the target emotion's profile. A low-intensity "happy" barely changes the voice; a high-intensity "excited" dramatically increases rate and pitch.
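
A minimal sketch of the intensity calculation described above, using the stated boosts and caps. The function name `compute_intensity` and the rule for what counts as an ALL-CAPS word (at least two letters, all uppercase) are assumptions; app.py may count them differently.

```python
def compute_intensity(text, compound):
    """Combine VADER's |compound| with punctuation and capitalization boosts."""
    base = abs(compound)

    # +0.08 per exclamation mark, capped at +0.3.
    exclaim_boost = min(0.08 * text.count("!"), 0.3)

    # +0.06 per ALL-CAPS word, capped at +0.2. Requiring two or more letters
    # (an assumption) keeps the pronoun "I" from counting as shouting.
    caps_words = [
        w for w in text.split()
        if w.isupper() and sum(c.isalpha() for c in w) >= 2
    ]
    caps_boost = min(0.06 * len(caps_words), 0.2)

    # Keep the result inside the documented 0.0-1.0 range.
    return min(base + exclaim_boost + caps_boost, 1.0)
```

Because both boosts are capped, punctuation and capitalization together can add at most 0.5 to the VADER-derived base.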

Voice Parameter Modulation

Each emotion maps to a voice profile that modulates three parameters:

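| Emotion   | Rate (wpm) | Volume | Pitch Δ (Hz) |
|-----------|------------|--------|--------------|
| Excited   | ↑ Fast     | Full   | +40          |
| Happy     | ↑ Slightly | Full   | +20          |
| Neutral   | Normal     | 95%    | 0            |
| Sad       | ↓ Slow     | Soft   | –30          |
| Angry     | ↑ Fast     | Full   | +15          |
| Fearful   | ↑ Fast     | Soft   | +25          |
| Surprised | ↑↑ Fast    | Full   | +35          |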

These profiles are interpolated by intensity, so the actual values fall on a spectrum.
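
One way to read "interpolated by intensity" is a linear blend between the neutral profile and the target profile, sketched below. The pitch deltas and the 95% neutral volume come from the table; the numeric rates and the "soft" volume levels are placeholder assumptions, and app.py may use a different curve or different numbers.

```python
# Neutral volume (95%) and pitch deltas follow the table; rates and the
# "soft" volumes are assumed values chosen only for illustration.
NEUTRAL = {"rate": 175, "volume": 0.95, "pitch_delta": 0}
PROFILES = {
    "excited":   {"rate": 220, "volume": 1.00, "pitch_delta": 40},
    "happy":     {"rate": 190, "volume": 1.00, "pitch_delta": 20},
    "sad":       {"rate": 130, "volume": 0.70, "pitch_delta": -30},
    "angry":     {"rate": 210, "volume": 1.00, "pitch_delta": 15},
    "fearful":   {"rate": 210, "volume": 0.75, "pitch_delta": 25},
    "surprised": {"rate": 230, "volume": 1.00, "pitch_delta": 35},
}


def interpolate_profile(emotion, intensity):
    """Blend linearly from the neutral profile toward the emotion's profile."""
    target = PROFILES.get(emotion, NEUTRAL)  # unknown emotions stay neutral
    t = max(0.0, min(intensity, 1.0))        # clamp to the 0.0-1.0 range
    return {
        key: NEUTRAL[key] + t * (target[key] - NEUTRAL[key])
        for key in NEUTRAL
    }
```

At intensity 0.0 every emotion collapses to the neutral voice, and at 1.0 the full target profile is used.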

TTS Engine

pyttsx3 was chosen for fully offline, zero-API-key operation. It uses the system's native TTS engine (espeak on Linux, SAPI5 on Windows, NSSpeechSynthesizer on macOS). Pitch control is best supported on Linux (espeak backend); on other platforms, rate and volume modulation still produce clearly differentiated emotional speech.
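
A hedged sketch of how a voice profile might be applied through pyttsx3. The function name `synthesize_to_file` and the shape of the `profile` dict are assumptions; the engine methods used (`setProperty`, `save_to_file`, `runAndWait`) are part of pyttsx3's engine interface. Pitch is deliberately left out here because its support depends on the backend.

```python
def synthesize_to_file(engine, text, profile, path):
    """Apply rate and volume to a pyttsx3-style engine and write speech to path.

    `engine` is expected to be the object returned by pyttsx3.init().
    """
    # pyttsx3 takes rate as an integer number of words per minute.
    engine.setProperty("rate", int(round(profile["rate"])))
    # Volume is a float between 0.0 and 1.0, so clamp it defensively.
    engine.setProperty("volume", max(0.0, min(profile["volume"], 1.0)))
    # Queue the utterance for file output, then run the queue to completion.
    engine.save_to_file(text, path)
    engine.runAndWait()
```

In use, this would be called as `synthesize_to_file(pyttsx3.init(), "Hello there", {"rate": 190, "volume": 1.0}, "hello.wav")`, assuming pyttsx3 and a system TTS backend are installed.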


🚀 Setup & Installation

Prerequisites

  • Python 3.8+
  • espeak (Linux) or native TTS (macOS/Windows)
# On Ubuntu/Debian — install espeak for pyttsx3
sudo apt-get install espeak

# On macOS — no extra install needed (uses NSSpeechSynthesizer)
# On Windows — no extra install needed (uses SAPI5)

Install

# Clone the repository
git clone https://github.com/YOUR_USERNAME/empathy-engine.git
cd empathy-engine

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

▶️ Running the Application

Web Interface (default)

python app.py

Open http://localhost:5000 in your browser. Type or paste text, click Synthesize Speech, and listen.

CLI Mode

python app.py --cli

Type sentences interactively in the terminal. Emotion analysis and voice parameters are printed, and audio files are saved to static/audio/.

API Usage

curl -X POST http://localhost:5000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "I am absolutely thrilled about this!"}'

Response:

{
  "emotion": "excited",
  "intensity": 0.82,
  "vader_scores": { "neg": 0.0, "neu": 0.359, "pos": 0.641, "compound": 0.6996 },
  "voice_profile": { "rate": 218, "volume": 0.94, "pitch_delta": 32 },
  "audio_url": "/audio/empathy_a1b2c3d4e5.wav"
}
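
The same request can be made from Python using only the standard library. This is a small client sketch: the helper names `build_synthesize_request` and `synthesize` are hypothetical, and it assumes the server is running locally on port 5000 as described above.

```python
import json
import urllib.request


def build_synthesize_request(text, base_url="http://localhost:5000"):
    """Build the POST /synthesize request without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, base_url="http://localhost:5000"):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_synthesize_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `synthesize("I am absolutely thrilled about this!")` should return a dict with the fields shown in the response above, including `audio_url`.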

📂 Project Structure

empathy-engine/
├── app.py                 # Main application (Flask + emotion + TTS)
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── templates/
│   └── index.html         # Web interface
└── static/
    └── audio/             # Generated .wav files (auto-created)

🧪 Example Test Cases

| Input Text | Expected Emotion | Intensity |
|------------|------------------|-----------|
| "The meeting is at 3 PM." | Neutral | Low |
| "I'm so happy for you!" | Happy | Medium |
| "THIS IS THE BEST NEWS EVER!!!" | Excited | High |
| "This is unacceptable. I demand a refund." | Angry | High |
| "I'm worried about the deadline." | Fearful | Medium |
| "I miss those days so much." | Sad | Medium |

📜 License

MIT License — free to use, modify, and distribute.
