OpenDub 🥥

Open-source pipeline for dubbing English video into Bangla: translation, speech synthesis, voice conversion, and prosody transfer, assembled into a single CLI.


Pipeline

Input video (.mkv / .mp4)
        │
        ├─[ffmpeg]──────► english.wav + english.srt
        │
        ├─[Demucs]──────► vocals.wav + background.wav
        │
        ├─[pysrt]───────► segments[] with per-segment audio slices
        │
        ├─[NLLB-200]────► segments[].bn  (Bangla text)
        │
        ├─[MMS-TTS-ben]─► segments[].bn_audio  (Bangla speech)
        │
        ├─[kNN-VC]──────► voice-converted Bangla audio  (optional)
        │
        ├─[pyworld]─────► F0 + energy transferred from EN speaker  (optional)
        │
        ├─[rubberband]──► duration-aligned, pitch-preserved audio
        │
        └─[ffmpeg]──────► dubbed.mkv  (voice + background mixed)
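
The per-segment slicing in the pysrt step comes down to converting subtitle timestamps into sample offsets. The following is a minimal sketch of that arithmetic, assuming the mono 16 kHz WAV produced by the extraction step. The helper names `srt_time_to_seconds` and `segment_sample_range` are hypothetical and not taken from the OpenDub source, which parses subtitles with pysrt rather than by hand.

```python
SAMPLE_RATE = 16_000  # assumed: matches the mono 16 kHz WAV from the ffmpeg step


def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    clock, millis = stamp.strip().split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def segment_sample_range(start: str, end: str,
                         sample_rate: int = SAMPLE_RATE) -> tuple[int, int]:
    """Return (first_sample, last_sample_exclusive) for one subtitle cue."""
    first = round(srt_time_to_seconds(start) * sample_rate)
    last = round(srt_time_to_seconds(end) * sample_rate)
    if last < first:
        raise ValueError("cue ends before it starts")
    return first, last
```

With the range in hand, a segment is just `audio[first:last]` on the decoded waveform array.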

Installation

git clone https://github.com/Y3454R/OpenDub.git
cd OpenDub
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

macOS: install the rubberband backend (required for pitch-preserving time stretch):

brew install rubberband
pip install pyrubberband

Usage

Fast test (skips voice cloning and prosody transfer):

python cli.py \
  --input  test_files/clip.mkv \
  --srt    test_files/english.srt \
  --output outputs/dubbed/dubbed.mkv \
  --no-voice-clone \
  --no-prosody

Full pipeline (requires kNN-VC checkpoint):

python cli.py \
  --input         movie.mkv \
  --output        dubbed.mkv \
  --vc-checkpoint /path/to/knnvc_checkpoint

Clip a time window:

python cli.py \
  --input  movie.mkv \
  --start  00:23:00 \
  --end    00:27:00 \
  --output outputs/dubbed/clip.mkv \
  --no-voice-clone --no-prosody
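
For orientation, a time window like the one above could map onto an ffmpeg invocation along these lines. This is a sketch under assumptions, not the command OpenDub actually issues: the function name `build_extract_command` is hypothetical, and only standard ffmpeg options are used (`-ss`/`-to` for the window, `-vn` to drop video, `-ac 1` and `-ar 16000` for mono 16 kHz audio).

```python
def build_extract_command(video, wav_out, start=None, end=None):
    """Build an ffmpeg argument list that extracts mono 16 kHz audio.

    start/end are optional HH:MM:SS strings; when given they are passed
    as input options so ffmpeg seeks within the source timeline.
    """
    cmd = ["ffmpeg", "-y"]          # -y: overwrite the output if it exists
    if start is not None:
        cmd += ["-ss", start]       # window start
    if end is not None:
        cmd += ["-to", end]         # window end
    cmd += ["-i", video,
            "-vn",                  # no video stream in the output
            "-ac", "1",             # mono
            "-ar", "16000",         # 16 kHz sample rate
            wav_out]
    return cmd
```

The function only builds the list; passing it to `subprocess.run(cmd, check=True)` would execute it.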

External SRT (skip subtitle extraction):

python cli.py \
  --input  movie.mkv \
  --srt    subtitles.srt \
  --output dubbed.mkv

All flags

| Flag | Default | Description |
| --- | --- | --- |
| --input | (required) | Input video file |
| --output | dubbed.mkv | Output video file |
| --srt | None | External SRT file; skips subtitle extraction from video |
| --start | None | Clip start time, e.g. 00:23:00 |
| --end | None | Clip end time, e.g. 00:27:00 |
| --no-voice-clone | off | Skip kNN-VC voice conversion |
| --no-prosody | off | Skip pyworld prosody transfer |
| --vc-checkpoint | None | Path to kNN-VC model checkpoint |
| --output-dir | outputs | Working directory for intermediate files |
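
The flags and defaults listed above could be declared with `argparse` roughly as follows. This is an illustrative sketch that mirrors the list, not a copy of the real `cli.py`; the function name `build_parser` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(
        prog="cli.py", description="Dub English video into Bangla")
    parser.add_argument("--input", required=True, help="Input video file")
    parser.add_argument("--output", default="dubbed.mkv", help="Output video file")
    parser.add_argument("--srt", default=None,
                        help="External SRT file; skips subtitle extraction")
    parser.add_argument("--start", default=None, help="Clip start, e.g. 00:23:00")
    parser.add_argument("--end", default=None, help="Clip end, e.g. 00:27:00")
    # Boolean switches: absent means the stage runs ("off" in the list above).
    parser.add_argument("--no-voice-clone", action="store_true",
                        help="Skip kNN-VC voice conversion")
    parser.add_argument("--no-prosody", action="store_true",
                        help="Skip pyworld prosody transfer")
    parser.add_argument("--vc-checkpoint", default=None,
                        help="Path to kNN-VC model checkpoint")
    parser.add_argument("--output-dir", default="outputs",
                        help="Working directory for intermediate files")
    return parser
```

Note that argparse exposes hyphenated flags with underscores, so `--no-voice-clone` is read as `args.no_voice_clone`.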

Modules

| Module | Model | Purpose |
| --- | --- | --- |
| audio/extractor.py | ffmpeg | Extract mono 16 kHz WAV + SRT from video |
| audio/separator.py | Demucs htdemucs | Split vocals from background music/effects |
| subtitles/parser.py | pysrt | Parse SRT, clean tags, slice per-segment audio |
| translation/translator.py | NLLB-200-distilled-600M | English → Bangla text translation |
| tts/synthesizer.py | MMS-TTS-ben | Bangla text → speech |
| voice/cloning.py | kNN-VC | Convert TTS voice to match source speaker |
| voice/prosody.py | pyworld (WORLD vocoder) | Transfer F0 contour and energy from EN speaker |
| audio/stretcher.py | rubberband / librosa TSM | Pitch-preserving duration alignment |
| audio/mixer.py | ffmpeg | Assemble segments, mix background, mux to video |
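
Duration alignment in the stretcher reduces to choosing a time-stretch rate per segment. Below is a hedged sketch of that calculation, using the convention that a rate above 1 speeds audio up (the convention of `pyrubberband.time_stretch`). The function name `stretch_rate` and the clamp limits of 0.5 and 2.0 are assumptions for illustration; the README does not state what limits, if any, OpenDub applies.

```python
def stretch_rate(tts_duration: float, target_duration: float,
                 min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Rate that makes a TTS clip fit the original segment's duration.

    rate > 1 shortens (speeds up) the clip, rate < 1 lengthens it.
    The result is clamped so extreme ratios do not wreck naturalness;
    the clamp bounds are illustrative, not taken from OpenDub.
    """
    if tts_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    rate = tts_duration / target_duration
    return max(min_rate, min(max_rate, rate))
```

For example, a 3.0 s Bangla clip that must fit a 2.0 s English segment gives a rate of 1.5. When the clamp engages, the residual mismatch is what remains to be measured after stretching.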

Research context

This pipeline is a baseline for studying isochrony-aware Bangla dubbing: matching dubbed speech duration to the original without degrading naturalness.

Key open problems:

  • Voice cloning for Bangla: kNN-VC was trained on English; cross-lingual transfer to Bangla TTS output is untested
  • Isochrony: Bangla text is typically longer than its English source; the isochrony report printed after each run quantifies the gap per segment
  • Prosody transfer: the F0 contour from the English speaker is transferred via the WORLD vocoder; its interaction with Bangla intonation patterns is an open question

The isochrony report at the end of each run shows mean/std/max duration mismatch across segments; this is the metric the research aims to minimize.
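
As an illustration of the statistics named above, the sketch below computes mean, standard deviation, and maximum mismatch from two lists of segment durations. It assumes mismatch means the absolute difference in seconds between the dubbed and the original segment, and it uses the population standard deviation; the README specifies neither, and the function name `isochrony_report` is hypothetical.

```python
from statistics import mean, pstdev


def isochrony_report(source_durations, dubbed_durations):
    """Summarise per-segment duration mismatch (seconds).

    Assumption: mismatch = |dubbed - source| for each segment pair.
    """
    if not source_durations or len(source_durations) != len(dubbed_durations):
        raise ValueError("need two non-empty lists of equal length")
    mismatches = [abs(dub - src)
                  for src, dub in zip(source_durations, dubbed_durations)]
    return {
        "mean": mean(mismatches),
        "std": pstdev(mismatches),   # population standard deviation
        "max": max(mismatches),
    }
```

Calling `isochrony_report([2.0, 3.0, 4.0], [2.5, 3.0, 5.0])` evaluates mismatches of 0.5, 0.0, and 1.0 seconds.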


Output structure

outputs/
├── segments/
│   ├── english.wav
│   ├── vocals.wav
│   ├── background.wav
│   ├── seg_0001_en.wav
│   ├── seg_0001_bn.wav
│   └── ...
└── dubbed/
    └── dubbed.mkv
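
The segment file names in the tree above follow a zero-padded pattern. A small sketch of how such paths could be generated is shown here; the helper `segment_paths` is hypothetical, and the four-digit padding is inferred from the `seg_0001_*` names shown.

```python
from pathlib import Path


def segment_paths(output_dir, index):
    """Return (english_path, bangla_path) for a 1-based segment index."""
    seg_dir = Path(output_dir) / "segments"
    stem = f"seg_{index:04d}"        # e.g. seg_0001
    return seg_dir / f"{stem}_en.wav", seg_dir / f"{stem}_bn.wav"
```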

Requirements

  • Python 3.10+
  • ffmpeg (system install: brew install ffmpeg / apt install ffmpeg)
  • rubberband (macOS: brew install rubberband)
  • CUDA GPU recommended for full pipeline; M-series Apple Silicon works for development
