Local voice cloning using Coqui TTS XTTS v2 on your Raspberry Pi CM5.
- 🎙️ Zero-shot voice cloning - Clone any voice from 6-10 seconds of audio
- 🌍 Multilingual - Supports 17+ languages
- 🔊 Integrated with Palmir hardware - Uses device microphone and speakers
- 💻 Fully local - No internet required after setup
- 🚀 Easy to use - Web GUI + CLI interface
- 🌐 Web Interface - Beautiful drag & drop web UI for easy voice cloning
The environment is already set up! Just activate it:
```bash
cd "/home/distiller/projects/voice cloning"
source venv/bin/activate
```

Launch the beautiful web interface:

```bash
./start_web.sh
```

Then open in your browser:
- Local: http://localhost:5001
- Network: http://<palmir-ip>:5001
Features:
- Drag & drop file upload
- Record directly from microphone
- Real-time voice cloning
- Audio playback & download
- File management
See WEB_GUI_README.md for details.
For CLI usage, see below or check QUICKSTART.md.
Record your voice and generate cloned speech in one command:
```bash
python voice_clone.py --mode full --text "Hello, this is my cloned voice!"
```

This will:
- Record 10 seconds of your voice as reference
- Clone your voice and generate the specified text
- Play back the result
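Under the hood, the full-mode workflow wraps Coqui TTS's XTTS v2 API. A minimal sketch of that call is below — `clone_once` is an illustrative helper, not part of `voice_clone.py`, and it assumes the `TTS` package installed in the project venv:

```python
def clone_once(reference_wav, text, out_path="output.wav", language="en"):
    """Generate `text` in the voice of `reference_wav` using XTTS v2.

    The import is deferred so this module loads even before Coqui TTS
    (a heavy torch-based dependency) is installed.
    """
    from TTS.api import TTS

    # The first call downloads ~2GB of model files; later calls use the cache.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=reference_wav,  # 6-10 s reference clip
        language=language,
        file_path=out_path,
    )
    return out_path
```

Recording from the microphone and playing the result back go through the Palmir SDK, as in the troubleshooting section below.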
```bash
python voice_clone.py --mode record --duration 10 --reference my_voice.wav
```

Speak naturally for the duration. The more expressive, the better!
```bash
python voice_clone.py --mode clone \
  --reference my_voice.wav \
  --text "This is the text I want to speak in the cloned voice" \
  --output output.wav
```

All options:

```bash
python voice_clone.py \
  --mode clone \
  --reference voice_reference.wav \
  --text "Your text here" \
  --output generated.wav \
  --language en \
  --no-play  # Don't auto-play the result
```

Supported languages:

- English: en
- Spanish: es
- French: fr
- German: de
- Italian: it
- Portuguese: pt
- Polish: pl
- Turkish: tr
- Russian: ru
- Dutch: nl
- Czech: cs
- Arabic: ar
- Chinese: zh-cn
- Japanese: ja
- Hungarian: hu
- Korean: ko
- Hindi: hi
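Passing an unsupported code to `--language` wastes a long model load before failing, so it can be worth validating up front. A small sketch (the codes are copied from the list above; `check_language` is an illustrative name, not part of the CLI):

```python
# XTTS v2 language codes, as listed above.
XTTS_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}

def check_language(code: str) -> str:
    """Return the normalized code, or raise before any model loading starts."""
    norm = code.strip().lower()
    if norm not in XTTS_LANGUAGES:
        raise ValueError(
            f"Unsupported language {code!r}; choose one of {sorted(XTTS_LANGUAGES)}"
        )
    return norm
```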
On Raspberry Pi CM5 (ARM64, 4-core):
- First run: ~2-3 minutes (downloads models)
- Voice cloning: ~10-30 seconds per sentence
- RAM usage: ~3-4GB during inference
- Quality: Near-human naturalness
Reference audio quality:
- Use 6-10 seconds of clear speech
- Avoid background noise
- Include emotional variety if possible

Text generation:
- Shorter sentences generate faster
- Natural punctuation improves prosody
- XTTS handles complex text well

Performance:
- First generation takes longer (model loading)
- Subsequent generations are faster
- CPU mode is slower but works well
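Since shorter sentences generate faster on the CM5's CPU, one practical approach is to split a long passage into sentences and feed them to the model one at a time. A naive stdlib sketch (real text may need smarter handling of abbreviations and decimals):

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naively split text on sentence-ending punctuation followed by whitespace.

    Feeding XTTS one sentence per call keeps per-call latency low
    and lets playback of early sentences overlap later generation.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```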
First run downloads ~2GB of models. Ensure a good internet connection:

```bash
python -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
```

Test Palmir audio separately:

```bash
source /opt/distiller-cm5-sdk/activate.sh
python -c "from distiller_cm5_sdk.hardware.audio import Audio; a = Audio(); print('Audio OK')"
```

If you run low on memory during generation, close other applications or reduce the text length.
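If you are unsure whether the model download completed, you can check the cache on disk. Coqui TTS typically stores models under `~/.local/share/tts` on Linux — that path is an assumption; adjust it if your install differs:

```python
from pathlib import Path

def dir_size_mb(path: Path) -> float:
    """Total size in MB of all files under `path` (0.0 if the path is missing)."""
    if not path.exists():
        return 0.0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file()) / 1e6

# Assumed default Coqui TTS cache location; XTTS v2 weights land here
# after the first run, totalling roughly 2000 MB.
cache = Path.home() / ".local" / "share" / "tts"
print(f"TTS cache: {dir_size_mb(cache):.0f} MB")
```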
```bash
# Record yourself
python voice_clone.py --mode record --duration 10

# Generate speech
python voice_clone.py --mode clone \
  --text "I can now speak any text in my own voice!"

# Clone from any WAV file
python voice_clone.py --mode clone \
  --reference /path/to/audio.wav \
  --text "The quick brown fox jumps over the lazy dog"

# Generate in Spanish
python voice_clone.py --mode clone \
  --reference spanish_speaker.wav \
  --text "Hola, ¿cómo estás?" \
  --language es
```

Technical details:

- TTS Engine: Coqui TTS XTTS v2
- Backend: PyTorch (CPU mode)
- Audio I/O: Palmir SDK (ALSA)
- Sample Rate: 22050 Hz
- Format: WAV (16-bit PCM)
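Reference clips and generated files can be sanity-checked against the specs above with the stdlib `wave` module. A small sketch — `wav_info` is an illustrative helper, not part of the tool:

```python
import wave

def wav_info(path: str) -> dict:
    """Read basic WAV header fields with the stdlib `wave` module."""
    with wave.open(path, "rb") as w:
        return {
            "sample_rate": w.getframerate(),  # 22050 Hz for generated output
            "bits": 8 * w.getsampwidth(),     # 16 for 16-bit PCM
            "channels": w.getnchannels(),
            "seconds": w.getnframes() / w.getframerate(),
        }
```

For a reference recording, "seconds" should land in the 6-10 range recommended above.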
This tool uses Coqui TTS, which is open source. Check their license for commercial use.