
feat: add TTS voice message responses via edge-tts #65

Open
asturwebs wants to merge 5 commits into six-ddc:main from asturwebs:feat/tts-edge-tts
asturwebs (Contributor) commented Mar 24, 2026

Summary

  • Text-to-speech: Final assistant responses sent as Telegram voice notes using Microsoft Edge neural voices (edge-tts)
  • Per-user control: /voice toggles TTS on/off and sets voice, /voices lists available voices
  • Non-intrusive: Text always sent first; voice appended after. If TTS fails, text is preserved
  • Multi-locale: 400+ voices across 30+ languages — explore and pick from Telegram
  • Smart UX: Detects /voices <VoiceName> confusion and suggests /voice; graceful 503 handling
  • Text cleanup: Emojis, markdown, code fences, and symbols stripped from audio while text stays enriched
  • Voice validation: Rejects command-like strings to prevent edge-tts crashes
  • Audio quality: Leading pause prevents first-word truncation in OGG/Opus encoding

Changes

src/ccbot/tts.py (new)

  • synthesize(text, user_id) → OGG/Opus audio bytes via edge-tts
  • send_voice_message() → Send voice note to Telegram with silent fallback
  • is_tts_enabled(), toggle_tts(), get_voice(), set_voice() — per-user state
  • clean_text_for_tts() → strip emojis, markdown, symbols, code fences before synthesis
  • set_voice() → validates voice name, rejects command-like strings (/, list, all)
  • Leading "..." pause to prevent first-word truncation in OGG/Opus encoding
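For reviewers, a minimal sketch of how the per-user state helpers and the voice-name check listed above could fit together. This is not the code in src/ccbot/tts.py: the in-memory dictionaries, the module-level constants standing in for the config values, and the loose "locale prefix plus a name ending in Neural" pattern are all assumptions made for illustration.

```python
import re

# Stand-ins for the CCBOT_TTS_* settings (assumed to be loaded elsewhere).
TTS_GLOBALLY_ENABLED = True
TTS_AUTO = False
DEFAULT_VOICE = "es-ES-ElviraNeural"

# Assumed shape of a voice name: language code, then segments ending in "Neural".
_VOICE_RE = re.compile(r"^[a-z]{2,3}-[A-Za-z0-9-]+Neural$")
_REJECTED_WORDS = {"list", "all"}

# Hypothetical in-memory per-user state, keyed by Telegram user id.
_enabled: dict[int, bool] = {}
_voices: dict[int, str] = {}


def is_tts_enabled(user_id: int) -> bool:
    """The global switch wins; otherwise fall back to the per-user flag."""
    if not TTS_GLOBALLY_ENABLED:
        return False
    return _enabled.get(user_id, TTS_AUTO)


def toggle_tts(user_id: int) -> bool:
    """Flip the per-user flag and return the new state."""
    new_state = not _enabled.get(user_id, TTS_AUTO)
    _enabled[user_id] = new_state
    return new_state


def get_voice(user_id: int) -> str:
    return _voices.get(user_id, DEFAULT_VOICE)


def set_voice(user_id: int, voice: str) -> bool:
    """Store the voice if it looks valid; return False for command-like input."""
    name = voice.strip()
    if name.startswith("/") or name.lower() in _REJECTED_WORDS:
        return False
    if not _VOICE_RE.match(name):
        return False
    _voices[user_id] = name
    return True
```

Returning False instead of raising keeps the command handler free to reply with a hint rather than an error trace; whether the PR signals rejection this way is not shown in the description.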

src/ccbot/config.py

  • CCBOT_TTS_ENABLED (default: true) — global toggle
  • CCBOT_TTS_AUTO (default: false) — auto-enable for all users
  • CCBOT_TTS_VOICE (default: es-ES-ElviraNeural) — default voice
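A sketch of how these three variables might be read with their documented defaults. The TTSConfig dataclass, the load_tts_config helper, and the set of strings accepted as "true" are assumptions; the actual parsing in src/ccbot/config.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; unset or empty values fall back to the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TTSConfig:
    enabled: bool  # CCBOT_TTS_ENABLED: global toggle
    auto: bool     # CCBOT_TTS_AUTO: auto-enable for all users
    voice: str     # CCBOT_TTS_VOICE: default voice


def load_tts_config(env: Mapping[str, str] = os.environ) -> TTSConfig:
    """Build the TTS settings from a mapping (the process environment by default)."""
    return TTSConfig(
        enabled=_env_bool(env, "CCBOT_TTS_ENABLED", True),
        auto=_env_bool(env, "CCBOT_TTS_AUTO", False),
        voice=env.get("CCBOT_TTS_VOICE", "").strip() or "es-ES-ElviraNeural",
    )
```

Passing the environment as a mapping makes the defaults easy to exercise in tests without touching the real process environment.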

src/ccbot/bot.py

  • /voice command — toggle or set voice (auto-enables TTS, validates voice name)
  • /voices command — compact locale index or filtered voice list by locale prefix
  • Confusion detection: /voices <VoiceName> → suggests /voice
  • 503 error handling with retry suggestion for Microsoft TTS service

src/ccbot/handlers/message_queue.py

  • MessageTask gains role and is_complete fields for TTS gating
  • TTS voice sent after final assistant text (is_complete=True, role=assistant)
  • Merge role preservation: promotes to "assistant" when any merged task has that role
  • Merge metadata computed from actually merged tasks only (not remaining queue items)
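The gating and merge rules above can be sketched roughly as follows. The MessageTask here is a stripped-down stand-in with only the fields relevant to TTS, and merge_tasks and should_send_voice are hypothetical helper names; combining is_complete with any() is also an assumption about how the merged flag is derived.

```python
from dataclasses import dataclass


@dataclass
class MessageTask:
    text: str
    role: str = "user"          # "user" or "assistant"
    is_complete: bool = False   # True for the final chunk of a response


def merge_tasks(first: MessageTask, items: list[MessageTask], merge_count: int) -> MessageTask:
    """Merge `first` with the first `merge_count` drained items.

    Metadata is computed only from the tasks that are actually merged,
    not from items that will be put back on the queue.
    """
    merged = [first] + items[:merge_count]
    # Promote to "assistant" when any merged task has that role.
    role = "assistant" if any(t.role == "assistant" for t in merged) else first.role
    is_complete = any(t.is_complete for t in merged)
    return MessageTask(
        text="\n\n".join(t.text for t in merged),
        role=role,
        is_complete=is_complete,
    )


def should_send_voice(task: MessageTask, tts_enabled: bool) -> bool:
    """Voice follows only a final assistant message, and only when TTS is on."""
    return tts_enabled and task.role == "assistant" and task.is_complete
```

With merge_count=0 the result carries only the metadata of `first`, so a later assistant task left on the queue cannot trigger a voice note for a user message.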

pyproject.toml

  • edge-tts>=7.2.8 added as dependency

tests/ccbot/test_tts.py (new)

  • TestTTSToggle: toggle on/off, per-user isolation, default off when tts_auto=false (4 tests)
  • TestTTSGlobalDisabled: global disable overrides per-user (1 test)
  • TestCleanTextForTTS: strips emojis, markdown, arrows/symbols; keeps punctuation; empty after clean (5 tests)

Usage

# Toggle TTS on/off:
/voice

# Set a specific voice (auto-enables TTS):
/voice es-AR-ElenaNeural       # Argentine Spanish
/voice en-US-JennyNeural       # US English
/voice zh-CN-XiaoxiaoNeural    # Mandarin Chinese
/voice de-DE-KatjaNeural       # German

# Explore available voices:
/voices                         # Compact index: 30+ locales with voice counts
/voices en                      # All English variants (en-US, en-GB, en-AU, en-IN, ...)
/voices es                      # All Spanish variants (es-ES, es-AR, es-MX, ...)
/voices zh                      # All Chinese variants (zh-CN, zh-TW, zh-HK, ...)
/voices fr                      # French (likewise /voices ja for Japanese, /voices ko for Korean, ...)

Voice names are validated — command-like strings are rejected. Use /voices to discover available voices for any language.
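As a rough illustration of the /voices behaviour described above, the sketch below builds a compact index, filters by prefix, and flags an argument that looks like a voice name. It assumes voice entries are dicts with "ShortName" and "Locale" keys (the shape returned by edge-tts voice listing); the function names, the grouping by language prefix, and the "ends with Neural" heuristic are hypothetical and not taken from src/ccbot/bot.py.

```python
from collections import Counter


def language_index(voices: list[dict]) -> dict[str, int]:
    """Count voices per language prefix, e.g. {"en": 2, "es": 1}."""
    counts = Counter(v["Locale"].split("-")[0] for v in voices)
    return dict(sorted(counts.items()))


def filter_by_prefix(voices: list[dict], prefix: str) -> list[str]:
    """Return the sorted voice names whose locale starts with `prefix`."""
    p = prefix.lower()
    return sorted(v["ShortName"] for v in voices if v["Locale"].lower().startswith(p))


def looks_like_voice_name(arg: str) -> bool:
    """Heuristic for `/voices <VoiceName>`: the user probably meant /voice."""
    return arg.strip().endswith("Neural")
```

A handler could call looks_like_voice_name first and, when it returns True, reply with a suggestion to use /voice instead of running a filter that would match nothing.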

Config

CCBOT_TTS_ENABLED=true        # Global toggle
CCBOT_TTS_AUTO=false          # Auto-enable for all users
CCBOT_TTS_VOICE=es-ES-ElviraNeural  # Default voice

Test plan

  • ruff check passes
  • ruff format --check passes
  • pyright — 0 errors
  • 10 new tests pass
  • 233 existing tests pass (8 pre-existing failures in test_transcribe.py)
  • Production tested: TTS + STT full-duplex working
  • /voice, /voice <name>, /voices, /voices <locale> tested
  • Voice validation: rejects /voices, list, all as voice names
  • 503 error handling tested (Microsoft service unavailable)
  • /voices <VoiceName> confusion detection tested
  • Text cleanup: emojis/markdown/symbols stripped from audio, preserved in text
  • Code fence content stripped from audio (cleanup order fix)
  • First-word truncation fixed with leading pause
  • Merge role preservation: assistant role survives user+assistant merge
  • Merge metadata scoped to actually merged tasks only

🤖 Generated with BytIA

Add text-to-speech support using Microsoft Edge neural voices.
Final assistant responses are sent as Telegram voice notes alongside text.

Features:
- /voice — Toggle TTS on/off (per-user)
- /voice <name> — Set voice and auto-enable TTS
- /voices — Compact locale index with voice counts
- /voices <locale> — List all voices for a locale (es, en, zh...)
- Per-user voice selection with global defaults
- Graceful 503 handling for Microsoft service outages
- Smart /voices vs /voice confusion detection

Config (env vars):
- CCBOT_TTS_ENABLED (default: true)
- CCBOT_TTS_AUTO (default: false)
- CCBOT_TTS_VOICE (default: es-ES-ElviraNeural)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e560196e0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Validate voice names in set_voice() to prevent invalid names like
  '/voices' from crashing edge-tts (ValueError)
- Add clean_text_for_tts() to strip emojis, symbols, and markdown
  artifacts before TTS synthesis for cleaner audio
- Preserve 'assistant' role when merging mixed-role tasks so TTS
  isn't skipped when user+assistant messages are batch-merged
- Add 5 new tests for text cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9307f76a2

asturwebs and others added 2 commits March 24, 2026 19:21
merged_role and merged_complete were derived from all drained queue items,
including non-merged ones that are put back. The merge now uses
[first] + items[:merge_count], which avoids incorrectly labeling merged tasks
when later non-mergeable tasks have assistant role or is_complete=True.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The markdown cleanup pattern stripped backticks before the code fence
pattern could match, leaving code block content as orphan text in TTS.
Now code fences are removed first (with their content), then remaining
inline backticks are cleaned up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
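The ordering fix described in this commit can be sketched as below: fenced blocks are removed together with their content before any backtick stripping, so code never leaks into the spoken text. The regular expressions and the emoji/symbol ranges are illustrative assumptions, not the patterns used in src/ccbot/tts.py.

```python
import re

FENCE = "`" * 3  # a code-fence marker, built this way to keep the example readable

_CODE_FENCE = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)
_LINK = re.compile(r"\[([^\]]+)\]\([^)]+\)")
_MD_MARKERS = re.compile(r"[*_~#]+")
# Assumed ranges: pictographs and emoji, arrows, misc symbols and dingbats,
# and the emoji variation selector.
_EMOJI_AND_SYMBOLS = re.compile(
    "[\U0001F000-\U0001FAFF\u2190-\u21FF\u2600-\u27BF\uFE0F]"
)


def clean_text_for_tts(text: str) -> str:
    """Return a plain-text version of `text` suitable for speech synthesis."""
    # 1. Drop fenced code blocks first, including their content.
    text = _CODE_FENCE.sub(" ", text)
    # 2. Only then strip the remaining inline backticks, keeping the words.
    text = text.replace("`", "")
    # 3. Keep link labels, drop the targets.
    text = _LINK.sub(r"\1", text)
    # 4. Replace markdown markers with spaces so joined words stay separate.
    text = _MD_MARKERS.sub(" ", text)
    # 5. Remove emojis, arrows, and similar symbols; punctuation is kept.
    text = _EMOJI_AND_SYMBOLS.sub("", text)
    # 6. Collapse whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

If step 2 ran before step 1, the fence markers would already be gone when the fence pattern is applied, which is the orphan-text bug the commit describes.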

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d9a3312c8

OGG/Opus encoding trims the first few milliseconds, cutting the
initial phoneme. Prefixing text with "... " forces edge-tts to
start with a brief silence, preserving the first word intact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
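A minimal sketch of the leading-pause workaround from this commit. The helper name with_leading_pause is hypothetical, and skipping text that already starts with an ellipsis or is empty is an added assumption rather than documented behaviour.

```python
LEADING_PAUSE = "... "  # spoken as a brief silence before the first word


def with_leading_pause(text: str) -> str:
    """Prefix `text` with a short pause marker before synthesis."""
    stripped = text.lstrip()
    if not stripped:
        return ""  # nothing to synthesize
    if stripped.startswith("..."):
        return stripped  # already starts with a pause
    return LEADING_PAUSE + stripped
```

The prefix belongs only in the string passed to synthesis; the text message sent to the chat stays unchanged.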
1 participant