Add 60db TTS provider (Hindi/Indian voices) — live-API-verified#27
ConalMullan wants to merge 2 commits into
The original integration coded to 60db's documented /tts-synthesize contract
(single JSON object with `audio_base64` in the requested container format).
In production the endpoint instead streams newline-delimited JSON of raw
16-bit mono PCM (Content-Type: application/x-ndjson) with a trailing
`{metadata}` line, so the default voiceover path failed with "Invalid JSON
response".
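A minimal sketch of consuming a stream of this shape. It assumes each audio line is a JSON object carrying a base64-encoded PCM chunk under a hypothetical `audio` key and that the final line carries a `metadata` object. The chunk encoding, the key names, and the helper `parse_ndjson_pcm` are illustrative assumptions, not the code in tools/sixtydb_tts.py.

```python
import base64
import json


def parse_ndjson_pcm(body: bytes, chunk_key: str = "audio"):
    """Split an NDJSON body into concatenated PCM bytes and a metadata dict.

    Assumption: audio lines look like {"audio": "<base64>"} and the trailing
    line looks like {"metadata": {...}}. Both key names are hypothetical.
    """
    pcm = bytearray()
    metadata = {}
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if "metadata" in record:
            metadata = record["metadata"]
        elif chunk_key in record:
            pcm.extend(base64.b64decode(record[chunk_key]))
    return bytes(pcm), metadata


# Usage with a synthetic two-chunk stream (not real 60db output).
sample = b"\n".join([
    json.dumps({"audio": base64.b64encode(b"\x00\x01").decode()}).encode(),
    json.dumps({"audio": base64.b64encode(b"\x02\x03").decode()}).encode(),
    json.dumps({"metadata": {"audio_sec": 0.5}}).encode(),
])
pcm_bytes, meta = parse_ndjson_pcm(sample)
```

Parsing line by line means one malformed record can be reported individually, rather than failing the whole response as a single `json.loads` call would.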
Changes (tools/sixtydb_tts.py):
- Rewrite _synthesize_rest to consume the NDJSON PCM stream, while still
accepting the documented single-JSON shape if 60db ships it.
- Add _finalize_audio: sniff bytes for an audio container (mp3/wav/ogg/flac)
and write/transcode as-is, else wrap raw PCM as WAV and transcode to the
requested --output-format via ffmpeg.
- Add _derive_pcm_sample_rate: infer the rate from byte-count and the
metadata audio_sec (snap to nearest standard rate) instead of hardcoding.
- Surface 60db metadata warnings; route _synthesize_stream through the same
PCM-aware finalizer and flag that /tts-stream currently 500s upstream.
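The container check and the PCM-to-WAV wrapping described for `_finalize_audio` could look roughly like the sketch below. The function names `sniff_container` and `pcm_to_wav` are hypothetical, the magic-byte checks are the commonly used ones rather than the PR's exact logic, and the ffmpeg transcode step is left out.

```python
import io
import wave


def sniff_container(data: bytes):
    """Return 'mp3', 'wav', 'ogg', 'flac', or None for headerless data."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:3] == b"ID3":
        return "mp3"
    # MPEG frame sync: 11 set bits. Raw PCM can match this by chance,
    # so a real implementation may want a stricter check.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return None


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# 480 silent 16-bit samples, wrapped at an assumed 48 kHz rate.
silence = b"\x00\x00" * 480
wav_bytes = pcm_to_wav(silence, sample_rate=48000)
```

Once the bytes are in a WAV container, ffmpeg can read them without being told the sample format, which is presumably why the raw-PCM path wraps before transcoding.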
Verified live: sixtydb_tts.py and `voiceover.py --provider 60db --scene-dir`
both produce valid 48kHz MP3s for English and Hindi.
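The rate inference described for `_derive_pcm_sample_rate` can be sketched as follows. The list of standard rates, the 48 kHz fallback, and the function signature are assumptions for illustration; only the idea (bytes divided by duration, snapped to the nearest standard rate) comes from the change description.

```python
# Candidate rates to snap to; this particular list is an assumption.
STANDARD_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)


def derive_pcm_sample_rate(num_bytes, audio_sec, channels=1,
                           sample_width=2, default=48000):
    """Estimate the sample rate of headerless PCM from size and duration.

    rate ~= bytes / (channels * bytes_per_sample * seconds), then snapped
    to the nearest standard rate. Falls back to `default` (an assumed
    value) when the duration is missing or not positive.
    """
    if not audio_sec or audio_sec <= 0:
        return default
    estimated = num_bytes / (channels * sample_width * audio_sec)
    return min(STANDARD_RATES, key=lambda rate: abs(rate - estimated))


# 192,000 bytes of 16-bit mono audio lasting 2.0 s is 48,000 samples/s.
rate = derive_pcm_sample_rate(num_bytes=192_000, audio_sec=2.0)
```

Snapping absorbs small rounding errors in the reported duration, so a slightly off `audio_sec` still yields a rate that decoders accept.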
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@manishEMS47, heads up: I built directly on your #26 work here (your original commit is preserved with your authorship) and added one fix so it works against the live 60db API. The only issue was that the code followed 60db's documented /tts-synthesize contract, which the live endpoint does not currently match.
Would love a quick look if you have a moment; happy to adjust anything. If I don't hear back in a day or two I'll go ahead and merge so it doesn't stall.
Thanks again for adding this. The Hindi/Indian-voice support fills a real gap for us. 🙌
Adds 60db (https://60db.ai) as a third TTS provider alongside ElevenLabs and Qwen3. Builds on @manishEMS47's work in #26 (their original commit is preserved here) with fixes to make it work against the live 60db API, plus end-to-end verification.
Why 60db
Fills a real gap in the toolkit: native Hindi + Indian-accented English voices, cheaper than ElevenLabs ($0.00002/char) and faster (RTF ~0.22). Qwen3 is English/Chinese-leaning and ElevenLabs is premium-priced for Indic languages. Verified live that the default voice produces good-quality English and Hindi (Devanagari) speech.
The fix (on top of #26)
The original integration coded to 60db's documented `/tts-synthesize` contract (single JSON `{audio_base64}`, mp3). In production the endpoint actually streams newline-delimited JSON of raw 48 kHz PCM (Content-Type: application/x-ndjson) with a trailing `{metadata}` line, so the default path failed with "Invalid JSON response."
Changes in tools/sixtydb_tts.py:
- Rewrote `_synthesize_rest` to consume the NDJSON PCM stream, while still accepting the documented single-JSON shape if 60db ships it (defensive both ways).
- Added `_finalize_audio`: sniffs bytes for an audio container (mp3/wav/ogg/flac) and writes/transcodes as-is, else wraps raw PCM as WAV and transcodes to `--output-format` via ffmpeg.
- Added `_derive_pcm_sample_rate`: infers the rate from byte-count ÷ `metadata.audio_sec` instead of hardcoding.
- Surfaced `metadata.warnings`; routed `_synthesize_stream` through the same PCM finalizer and flagged that `/tts-stream` currently returns HTTP 500 upstream.
Verified live ✅
- `sixtydb_tts.py` → valid MP3 (ID3v2.4, 48 kHz mono), EN + Hindi
- `voiceover.py --provider 60db --scene-dir` → correct per-scene MP3s + the JSON shape `sync_timing.py` consumes
Not verified
- `redub.py --tts-provider 60db`: needs an ElevenLabs key (Scribe STT) + a video; delegation logic is straightforward but untested end-to-end.
- `websocket` transport: matches docs, not needed for batch voiceover (minor `wss://` vs documented `ws://` discrepancy to confirm).
Closes #26.
🤖 Generated with Claude Code