Add 60db TTS provider (Hindi/Indian voices) — live-API-verified#27

Open
ConalMullan wants to merge 2 commits into main from feat/sixtydb-tts-integration

@ConalMullan

Collaborator

Adds 60db (https://60db.ai) as a third TTS provider alongside ElevenLabs and Qwen3. Builds on @manishEMS47's work in #26 (their original commit is preserved here) with fixes to make it work against the live 60db API, plus end-to-end verification.

Why 60db

Fills a real gap in the toolkit: native Hindi + Indian-accented English voices, cheaper than ElevenLabs ($0.00002/char) and faster (RTF ~0.22). Qwen3 is English/Chinese-leaning and ElevenLabs is premium-priced for Indic languages. Verified live that the default voice produces good-quality English and Hindi (Devanagari) speech.

The fix (on top of #26)

The original integration coded to 60db's documented /tts-synthesize contract (single JSON {audio_base64}, mp3). In production the endpoint actually streams newline-delimited JSON of raw 48 kHz PCM (Content-Type: application/x-ndjson) with a trailing {metadata} line — so the default path failed with "Invalid JSON response."
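
A minimal sketch of consuming such a stream, not the code in tools/sixtydb_tts.py. It assumes each audio line carries base64 under an `audio_base64` key (borrowed from the documented single-JSON shape; the live field name may differ) and that the last line holds a `metadata` object. The function name `parse_ndjson_pcm` is hypothetical.

```python
import base64
import json
from typing import Iterable, Tuple


def parse_ndjson_pcm(lines: Iterable[bytes]) -> Tuple[bytes, dict]:
    """Collect audio bytes and the metadata object from an NDJSON body.

    ASSUMPTION: audio chunks are base64 strings under "audio_base64";
    the real 60db field name is not confirmed by this sketch.
    """
    audio = bytearray()
    metadata: dict = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        obj = json.loads(line)
        if not isinstance(obj, dict):
            continue  # ignore anything that is not a JSON object
        if "metadata" in obj:
            metadata = obj["metadata"]
        chunk = obj.get("audio_base64")
        if chunk:
            audio.extend(base64.b64decode(chunk))
    if not audio:
        raise ValueError("response contained no audio chunks")
    return bytes(audio), metadata
```

Because a single JSON object is just a one-line NDJSON body, the same loop also accepts the documented shape under the assumption above. With the `requests` library, `response.iter_lines()` would be one way to feed it.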

tools/sixtydb_tts.py:

  • Rewrote _synthesize_rest to consume the NDJSON PCM stream, while still accepting the documented single-JSON shape if 60db ships it (defensive both ways).
  • Added _finalize_audio — sniffs bytes for an audio container (mp3/wav/ogg/flac) and writes/transcodes as-is, else wraps raw PCM as WAV and transcodes to --output-format via ffmpeg.
  • Added _derive_pcm_sample_rate, which infers the rate from byte count ÷ (bytes per sample × metadata.audio_sec) and snaps to the nearest standard rate instead of hardcoding.
  • Surfaced 60db metadata.warnings, routed _synthesize_stream through the same PCM finalizer, and flagged that /tts-stream currently returns HTTP 500 upstream.
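
The sniff-then-wrap idea behind _finalize_audio could look roughly like the sketch below. It is an illustration, not the PR's code: `sniff_container` and `pcm_to_wav` are hypothetical names, the magic-byte checks are the well-known file signatures, and the PCM is assumed to be 16-bit mono at 48 kHz as observed in this PR.

```python
import io
import wave
from typing import Optional


def sniff_container(data: bytes) -> Optional[str]:
    """Guess the audio container from leading magic bytes, or None for raw PCM."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:3] == b"ID3":
        return "mp3"  # MP3 with an ID3v2 tag
    # Bare MPEG frame sync (11 set bits). Raw PCM that happens to start
    # with a sample such as 0xFFFF would also match, so this is a heuristic.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return None


def pcm_to_wav(pcm: bytes, sample_rate: int = 48000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # ASSUMPTION: 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```

The resulting WAV bytes can then be written to a temporary file and handed to ffmpeg (for example `ffmpeg -i in.wav out.mp3`) to reach the requested output format.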

Verified live ✅

  • sixtydb_tts.py → valid MP3 (ID3v2.4, 48 kHz mono), EN + Hindi
  • voiceover.py --provider 60db --scene-dir → correct per-scene MP3s + the JSON shape sync_timing.py consumes
  • Compiles clean (Python 3.9 compatible), dry-run works

Not verified

  • redub.py --tts-provider 60db — needs an ElevenLabs key (Scribe STT) + a video; delegation logic is straightforward but untested end-to-end.
  • WebSocket transport: matches the docs and is not needed for batch voiceover (a minor wss:// vs documented ws:// discrepancy remains to be confirmed).

Closes #26.

🤖 Generated with Claude Code

manishEMS47 and others added 2 commits June 8, 2026 16:16
The original integration coded to 60db's documented /tts-synthesize contract
(single JSON object with `audio_base64` in the requested container format).
In production the endpoint instead streams newline-delimited JSON of raw
16-bit mono PCM (Content-Type: application/x-ndjson) with a trailing
`{metadata}` line, so the default voiceover path failed with "Invalid JSON
response".

Changes (tools/sixtydb_tts.py):
- Rewrite _synthesize_rest to consume the NDJSON PCM stream, while still
  accepting the documented single-JSON shape if 60db ships it.
- Add _finalize_audio: sniff bytes for an audio container (mp3/wav/ogg/flac)
  and write/transcode as-is, else wrap raw PCM as WAV and transcode to the
  requested --output-format via ffmpeg.
- Add _derive_pcm_sample_rate: infer the rate from byte-count and the
  metadata audio_sec (snap to nearest standard rate) instead of hardcoding.
- Surface 60db metadata warnings; route _synthesize_stream through the same
  PCM-aware finalizer and flag that /tts-stream currently 500s upstream.
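
The sample-rate inference described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function name, the list of standard rates, and the 48 kHz fallback are all hypothetical, and the PCM is assumed to be 16-bit mono.

```python
STANDARD_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)


def derive_pcm_sample_rate(
    num_bytes: int,
    audio_sec: float,
    channels: int = 1,
    sample_width: int = 2,  # bytes per sample; 2 means 16-bit PCM
    default: int = 48000,   # ASSUMPTION: fall back to the rate seen live
) -> int:
    """Estimate the sample rate from payload size and reported duration."""
    if num_bytes <= 0 or audio_sec <= 0:
        return default
    # samples per second = bytes / (bytes per sample * channels * seconds)
    estimate = num_bytes / (sample_width * channels * audio_sec)
    # Snap to the closest standard rate to absorb rounding in audio_sec.
    return min(STANDARD_RATES, key=lambda rate: abs(rate - estimate))
```

For example, 240,000 bytes of 16-bit mono PCM reported as 2.5 seconds gives 240000 / (2 × 1 × 2.5) = 48000 Hz.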

Verified live: sixtydb_tts.py and `voiceover.py --provider 60db --scene-dir`
both produce valid 48kHz MP3s for English and Hindi.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tools/sixtydb_tts.py
finally:
    try:
        ws.close()
    except Exception:
@ConalMullan

Collaborator Author

@manishEMS47 — heads up: I built directly on your #26 work here (your original commit is preserved with your authorship) and added one fix so it works against the live 60db API. The only issue was that the code followed 60db's documented /tts-synthesize response shape (single JSON audio_base64), but in production the endpoint streams NDJSON of raw 48 kHz PCM — so I made the parsing handle both and transcode to the requested format. Tested end-to-end (English + Hindi → valid MP3, plus voiceover.py --provider 60db).

Would love a quick look if you have a moment — happy to adjust anything. If I don't hear back in a day or two I'll go ahead and merge so it doesn't stall. Thanks again for adding this — the Hindi/Indian-voice support fills a real gap for us. 🙌
