🎤 Heartbeat

Native C++17 Inference Engine for Kokoro-82M Text-to-Speech

Heartbeat is a high-performance, standalone TTS engine that runs Kokoro-82M entirely in C++ with no Python runtime dependencies. It is built on GGML for tensor operations and uses a custom ISTFT for audio synthesis.

✨ Features

⚡ Fast: under 200 ms latency for 5-second sentences on AVX2 CPUs
🎯 Portable: Single GGUF model file, no external dependencies at runtime
🔊 High Quality: 24 kHz audio output using the ISTFTNet vocoder
🌍 Multi-Voice: American, British, and Indian English (see Available Voices)

🚀 Quick Start

1. Set Up Dependencies

# Windows (PowerShell as Administrator)
.\scripts\setup_dependencies.ps1

This installs:

espeak-ng - Text-to-phoneme conversion
GGML - Tensor operations library
KissFFT - Fast Fourier Transform
Python packages - For model export

2. Download & Export Model

# Download Kokoro-82M from Hugging Face
python scripts/download_model.py

# Convert to GGUF format
python scripts/export_kokoro.py

3. Build

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release

4. Run

./heartbeat --text "Hello, world!" --voice af --output hello.wav

🎭 Available Voices

Voice Code	Description
`af`	American Female
`am`	American Male
`bf`	British Female
`bm`	British Male
`in_f`	Indian Female
`in_m`	Indian Male
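
The voice codes in the table can be checked before synthesis starts. The following is a small sketch of such a lookup, assuming the six codes listed above are the complete set; `voice_sketch::describe_voice` is a hypothetical name and does not come from the Heartbeat sources.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical lookup for illustration only; the real engine may store
// voices differently (for example, as style vectors inside the GGUF file).
namespace voice_sketch {

inline std::optional<std::string_view> describe_voice(std::string_view code) {
    using Entry = std::pair<std::string_view, std::string_view>;
    static constexpr std::array<Entry, 6> kVoices = {{
        {"af", "American Female"},
        {"am", "American Male"},
        {"bf", "British Female"},
        {"bm", "British Male"},
        {"in_f", "Indian Female"},
        {"in_m", "Indian Male"},
    }};
    for (const auto& [voice_code, description] : kVoices) {
        if (voice_code == code) {
            return description;
        }
    }
    return std::nullopt;  // unknown code: the caller can report an error
}

}  // namespace voice_sketch
```

A command-line front end could call this on the `--voice` argument and print the list of valid codes when the result is empty.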

📖 Usage

# Basic synthesis
heartbeat --text "Welcome to Heartbeat!" --output welcome.wav

# Specify voice
heartbeat --text "नमस्ते" --voice in_f --output namaste.wav

# Benchmark mode
heartbeat --benchmark --text "Performance test sentence."

🏗️ Architecture

Text → Phonemizer (espeak-ng) → PL-BERT Encoder → Duration Predictor
                                      ↓
    WAV ← ISTFT ← ISTFTNet Decoder ← Length Regulator ← Style Vector
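
The Length Regulator stage in the diagram bridges phoneme-level and frame-level sequences: each phoneme's feature vector is repeated for as many frames as the Duration Predictor assigns to it. Below is a minimal sketch of that idea using plain vectors. It is an assumption about the general technique, not Heartbeat's code; the engine may well express this as a GGML tensor operation, and `lr_sketch::regulate_length` is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical illustration of a length regulator; not Heartbeat's code.
namespace lr_sketch {

using Frame = std::vector<float>;

// Expands a phoneme-level sequence to frame level by repeating
// phoneme_features[i] exactly durations[i] times. Durations of zero or
// less contribute no frames.
inline std::vector<Frame> regulate_length(
    const std::vector<Frame>& phoneme_features,
    const std::vector<int>& durations) {
    if (phoneme_features.size() != durations.size()) {
        throw std::invalid_argument("one duration is required per phoneme");
    }
    std::vector<Frame> frames;
    for (std::size_t i = 0; i < phoneme_features.size(); ++i) {
        for (int repeat = 0; repeat < durations[i]; ++repeat) {
            frames.push_back(phoneme_features[i]);
        }
    }
    return frames;
}

}  // namespace lr_sketch
```

With three phonemes and durations of 2, 0, and 3, the output has five frames: two copies of the first vector followed by three copies of the third.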

📁 Project Structure

Heartbeat/
├── extern/           # Third-party libraries
│   ├── ggml/         # Tensor operations
│   └── kissfft/      # FFT library
├── models/           # Model files (.pth, .gguf)
├── scripts/          # Python utilities
├── include/          # C++ headers
├── src/              # C++ implementation
└── tests/            # Unit tests

🤝 Credits

Kokoro-82M - The original model
GGML - Tensor library by Georgi Gerganov
espeak-ng - Text-to-phoneme engine
StyleTTS2 - Original architecture

📄 License

MIT License - See LICENSE for details.
