Local-first Phase 1 MVP for video translation and dubbing, with one shared in-process engine exposed through a Typer CLI and a FastAPI app.
This repo is useful today for local experiments and personal workflows, especially when you already have a subtitle sidecar. It is better than the earlier pure-placeholder path, but it is still not production-grade translation or dubbing.
- Pipeline stages: `ingest -> caption -> normalize -> translate -> tts -> render -> qa`
- Jobs persist under a local artifact workspace with `job.json` plus append-only JSONL histories
- CLI supports: `run`, `stage run`, `segment rerun`, `config init/show/validate`, `doctor`, `completion show/install`
- API supports: health, create/list/get job, artifacts/logs/qa reads, stage rerun, segment rerun
- Caption strategy defaults to `auto`: a subtitle sidecar is preferred when provided; otherwise the pipeline falls back to local ASR/media translation behavior
- Default ASR model is `small`; for Chinese voice-only extraction you can opt into `medium` for higher accuracy at higher CPU cost
- For subtitle-sourced Chinese jobs, the pipeline prefers offline text translation over audio-derived translation; this is the better path for curated manual subtitles
- For non-subtitle jobs, translation falls back to media translation from the source video
- TTS uses local fallback synthesis; on macOS it can use `say` + `afconvert` to generate spoken audio instead of the older sine-wave placeholder
- Render prefers `ffmpeg`; on macOS, if `ffmpeg` is unavailable or fails and fallback is allowed, the app can use AVFoundation via `swift` to produce a dubbed MP4 without burned subtitles
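
The "append-only JSONL histories" mentioned above can be pictured with a minimal sketch. This is not the repo's code: the helper names `append_history` and `read_history` and the record fields (`stage`, `status`) are hypothetical and only illustrate the idea that each event becomes one new line and earlier lines are never rewritten.

```python
import json
import tempfile
from pathlib import Path


def append_history(path: Path, record: dict) -> None:
    """Append one JSON object as a single line; existing lines are left untouched."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_history(path: Path) -> list[dict]:
    """Read every non-empty line back as a JSON object, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical record shape; the real schema is not shown in this README.
        history = Path(tmp) / "stage_runs.jsonl"
        append_history(history, {"stage": "ingest", "status": "ok"})
        append_history(history, {"stage": "caption", "status": "ok"})
        for row in read_history(history):
            print(row)
```

Because a rerun adds a new line instead of editing an old one, a file such as `stage_runs.jsonl` can double as an audit trail.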
What improved:
- Subtitle-driven Chinese -> English jobs are materially better than before
- macOS native fallbacks can produce spoken English audio and a dubbed MP4 even without `ffmpeg`
- Real local runs now complete end-to-end with persisted artifacts
What this is not:
- not a polished translation system
- not studio-quality dubbing
- not robust enough yet for unattended production media pipelines
- Python 3.11+
- `ffprobe` on `PATH` is preferred for ingest/doctor
- On macOS, media probing can fall back to native tools via `mdls`/`swift`
- `ffmpeg` is recommended, but optional when fallback render paths are allowed
- On macOS, `say` + `afconvert` improves fallback TTS quality
- On macOS, `swift` enables the AVFoundation dubbed-video fallback path

Recommended install:

```bash
python -m pip install -e '.[dev]'
```

Verification test suite:

```bash
python -m pytest
```

Documented fixture files live under `examples/mvp/`:

- `examples/mvp/source.srt`
- `examples/mvp/source.mp4` (placeholder bytes for deterministic tests, not a real video)
- `examples/mvp/vtl.config.json`

Notes:
- `tests/test_mvp_documented_flow.py` keeps the documented fixture flow honest
- For real local runs, replace `examples/mvp/source.mp4` with an actual media file
- Generated job outputs are intentionally not committed under `examples/mvp/`

- If `input_subtitle` is provided, the subtitle sidecar is the source of truth
- Otherwise `caption_strategy=auto` falls back to local audio-derived captioning
- For Chinese ASR, `small` is the default balance; `medium` is available as a slower high-accuracy mode
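
As a rough illustration of the `auto` rule above, the following sketch picks the caption source from the job inputs. The function name `resolve_caption_source` and the return labels `"subtitle"` and `"asr"` are hypothetical; only the `auto` strategy described in this README is modeled.

```python
from typing import Optional


def resolve_caption_source(input_subtitle: Optional[str], caption_strategy: str = "auto") -> str:
    """Return "subtitle" when a sidecar path is supplied, otherwise "asr"."""
    if caption_strategy != "auto":
        # Other strategies may exist in the repo; they are not described here.
        raise ValueError(f"strategy not modeled in this sketch: {caption_strategy!r}")
    if input_subtitle:
        return "subtitle"  # the sidecar is the source of truth
    return "asr"  # fall back to local audio-derived captioning


if __name__ == "__main__":
    print(resolve_caption_source("./examples/mvp/source.srt"))
    print(resolve_caption_source(None))
```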
- Default: `small`
- High-accuracy Chinese mode: `medium`
- When `asr_model=medium` and `source_lang=zh`, the app disables Whisper `condition_on_previous_text` to reduce long-span repetition/hallucination loops seen in local CPU runs
- `medium` is materially slower than `small`; in local tests it improved weighted CER but took about `2.24x` longer
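
The decode tuning described above amounts to one conditional. The sketch below is an assumption-laden illustration, not the app's code: `asr_decode_options` and the `model`/`language` keys are hypothetical names, and leaving `condition_on_previous_text` enabled for every other combination is an assumption that mirrors Whisper's own default.

```python
def asr_decode_options(asr_model: str, source_lang: str) -> dict:
    """Build a hypothetical options dict for a Whisper-style transcribe call."""
    options = {
        "model": asr_model,
        "language": source_lang,
        # Assumed default for all other model/language combinations.
        "condition_on_previous_text": True,
    }
    if asr_model == "medium" and source_lang == "zh":
        # Documented behavior: disabled to reduce repetition/hallucination loops.
        options["condition_on_previous_text"] = False
    return options


if __name__ == "__main__":
    print(asr_decode_options("small", "zh"))
    print(asr_decode_options("medium", "zh"))
```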
- Subtitle-sourced Chinese jobs prefer offline text translation instead of re-translating from audio
- Non-subtitle jobs use media translation from the source video as the fallback path
- Translation quality is still best-effort; current local logic includes some domain overrides/cleanup and optional offline MT behavior
- Default local fallback remains available for deterministic runs
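
The translation routing above can be sketched as a small decision function. The name `choose_translation_path` and the labels `"offline_text"` and `"media_translation"` are hypothetical. The README does not say what happens for subtitle-sourced jobs in other source languages, so this sketch refuses to guess for that case.

```python
def choose_translation_path(caption_source: str, source_lang: str) -> str:
    """Pick a translation path from where the captions came from."""
    if caption_source == "subtitle" and source_lang == "zh":
        return "offline_text"  # translate the curated subtitle text directly
    if caption_source == "asr":
        return "media_translation"  # fallback: translate from the source video
    # Not described in this README, so the sketch does not pick a path.
    raise ValueError(f"unmodeled combination: {caption_source!r}, {source_lang!r}")


if __name__ == "__main__":
    print(choose_translation_path("subtitle", "zh"))
    print(choose_translation_path("asr", "zh"))
```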
- On macOS, `say` + `afconvert` can synthesize spoken WAV clips and usually sounds much better than the placeholder tone path
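
For readers unfamiliar with the two macOS tools, here is a minimal sketch of how a `say` + `afconvert` pair could be chained to get a WAV clip. The exact flags the app passes are not shown in this README; the ones below (`say -o`, `afconvert -f WAVE -d LEI16`) are ordinary usage of those tools, and the helper name `say_afconvert_commands` is hypothetical. The function only builds the command lists, so it can be inspected on any platform.

```python
from pathlib import Path


def say_afconvert_commands(text: str, wav_path: Path) -> list[list[str]]:
    """Return the two commands: speak to AIFF, then convert AIFF to 16-bit WAV."""
    aiff_path = wav_path.with_suffix(".aiff")
    return [
        # say writes synthesized speech to an audio file when -o is given.
        ["say", "-o", str(aiff_path), text],
        # afconvert re-encodes it as a WAVE file with little-endian 16-bit PCM.
        ["afconvert", "-f", "WAVE", "-d", "LEI16", str(aiff_path), str(wav_path)],
    ]


if __name__ == "__main__":
    for command in say_afconvert_commands("Hello there", Path("tts/seg_0001.wav")):
        print(" ".join(command))
```

On a Mac, each list could be passed to `subprocess.run(command, check=True)` in order.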
- Preferred path: `ffmpeg` creates `mix.wav` and `final_en.mp4`
- If that fails and fallback is allowed on macOS, AVFoundation muxing can still create a dubbed `final_en.mp4`
- Fallback-rendered MP4s do not burn subtitles into the video
- If AVFoundation is also unavailable, the last-resort local copy fallback keeps artifacts moving, but the copied MP4 is just the source video
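
The render fallback order above can be summarized in a short sketch. The function name `select_render_path` and its boolean parameters are hypothetical, and raising an error when fallback is disabled is an assumption; how `prefer_ffmpeg` changes this order is not modeled.

```python
def select_render_path(*, ffmpeg_ok: bool, allow_fallback: bool,
                       is_macos: bool, avfoundation_ok: bool) -> str:
    """Return which render path would produce final_en.mp4."""
    if ffmpeg_ok:
        return "ffmpeg"  # preferred: mix.wav plus final_en.mp4
    if not allow_fallback:
        # Assumption: with fallback disabled, a failed ffmpeg render is fatal.
        raise RuntimeError("ffmpeg render failed and fallback is disabled")
    if is_macos and avfoundation_ok:
        return "avfoundation"  # dubbed MP4, no burned-in subtitles
    return "copy"  # last resort: the output is just the source video


if __name__ == "__main__":
    print(select_render_path(ffmpeg_ok=False, allow_fallback=True,
                             is_macos=True, avfoundation_ok=True))
```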
Show help:

```bash
python -m apps.cli.main --help
```

Run doctor:

```bash
python -m apps.cli.main doctor --artifact-root ./.artifacts/mvp-jobs

# make ffmpeg a hard requirement
python -m apps.cli.main doctor \
  --artifact-root ./.artifacts/mvp-jobs \
  --no-allow-render-copy-fallback
```

Run with the documented fixture:

```bash
python -m apps.cli.main run \
  --config ./examples/mvp/vtl.config.json \
  --job-id 00000000-0000-0000-0000-000000000260 \
  --no-prefer-ffmpeg
```

Real local run with an actual video and subtitle sidecar:

```bash
python -m apps.cli.main run \
  --input-video ./我是不白痴.mp4 \
  --input-subtitle ./我是不白痴.srt \
  --source-lang zh \
  --target-lang en \
  --artifact-root ./.artifacts/wobubaichi-run-v5 \
  --no-prefer-ffmpeg
```

Real local run without sidecar subtitles, using higher-accuracy Chinese ASR:

```bash
python -m apps.cli.main run \
  --input-video ./我在迪拜等你.mp4 \
  --source-lang zh \
  --target-lang en \
  --asr-model medium \
  --artifact-root ./.artifacts/dubai-medium-run \
  --no-prefer-ffmpeg
```

Stage rerun:

```bash
python -m apps.cli.main stage run translate \
  --job-id 00000000-0000-0000-0000-000000000260 \
  --artifact-root ./.artifacts/mvp-jobs \
  --no-prefer-ffmpeg
```

Segment rerun:

```bash
python -m apps.cli.main segment rerun seg_0001 \
  --job-id 00000000-0000-0000-0000-000000000260 \
  --artifact-root ./.artifacts/mvp-jobs \
  --reason "manual fix" \
  --execute-stages \
  --no-prefer-ffmpeg
```

Config helpers:

```bash
python -m apps.cli.main config init --output ./vtl.config.json
python -m apps.cli.main config validate --config ./vtl.config.json
python -m apps.cli.main config show --config ./vtl.config.json
```

Notes:

- `run --mode remote` is not implemented in this phase
- `config init` can write `.json` or `.yaml`; TOML write is not supported
- `--asr-model medium` is currently the recommended high-accuracy option for Chinese ASR-only runs when extra latency is acceptable

Start the API:

```bash
python -m uvicorn apps.api.main:app --reload
```

Health checks:

```bash
curl -s http://127.0.0.1:8000/api/v1/health
curl -s http://127.0.0.1:8000/api/v1/health/ready
```

Create a local job:

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/jobs \
  -H 'content-type: application/json' \
  -d '{
    "job_id": "00000000-0000-0000-0000-000000000350",
    "input_video": "./examples/mvp/source.mp4",
    "input_subtitle": "./examples/mvp/source.srt",
    "artifact_root": "./.artifacts/mvp-jobs",
    "asr_model": "medium",
    "prefer_ffmpeg": false,
    "allow_render_copy_fallback": true
  }'
```

Notes:

- `asr_model` is optional on the API; omit it to keep the default `small`
- For Chinese ASR-only jobs, `"asr_model": "medium"` enables the higher-accuracy tuned decode path

Inspect job state and outputs:

```bash
curl -s "http://127.0.0.1:8000/api/v1/jobs?artifact_root=./.artifacts/mvp-jobs"
curl -s "http://127.0.0.1:8000/api/v1/jobs/00000000-0000-0000-0000-000000000350?artifact_root=./.artifacts/mvp-jobs"
curl -s "http://127.0.0.1:8000/api/v1/jobs/00000000-0000-0000-0000-000000000350/artifacts?artifact_root=./.artifacts/mvp-jobs"
curl -s "http://127.0.0.1:8000/api/v1/jobs/00000000-0000-0000-0000-000000000350/logs?artifact_root=./.artifacts/mvp-jobs"
curl -s "http://127.0.0.1:8000/api/v1/jobs/00000000-0000-0000-0000-000000000350/qa?artifact_root=./.artifacts/mvp-jobs"
```

Rerun through the API:

```bash
curl -s -X POST "http://127.0.0.1:8000/api/v1/jobs/00000000-0000-0000-0000-000000000350/stages/translate/rerun" \
  -H 'content-type: application/json' \
  -d '{"artifact_root":"./.artifacts/mvp-jobs","prefer_ffmpeg":false,"allow_render_copy_fallback":true}'

curl -s -X POST "http://127.0.0.1:8000/api/v1/jobs/00000000-0000-0000-0000-000000000350/segments/seg_0001/rerun" \
  -H 'content-type: application/json' \
  -d '{"artifact_root":"./.artifacts/mvp-jobs","reason":"api rerun","execute_stages":true,"prefer_ffmpeg":false,"allow_render_copy_fallback":true}'
```

The repo also ships a local web console under `apps/web/`. It is path-based rather than upload-based: run the frontend on the same machine as the FastAPI service, then paste local video/subtitle paths into the form.

Start the API:

```bash
python -m uvicorn apps.api.main:app --reload
```

Start the frontend:

```bash
cd apps/web
npm install
npm run dev -- --host 127.0.0.1 --port 5173
```

Then open http://127.0.0.1:5173.

Notes:
- The header language switch supports both English and Chinese UI copy
- `ASR Model = medium` is surfaced directly in the form for higher-accuracy Chinese ASR runs
- If `Subtitle Path` is provided, the job will ingest the sidecar instead of exercising ASR extraction

[Screenshot: English home screen]

[Screenshot: completed run view]

For a job at <artifact_root>/<job_id>/:
- Top-level files
  - `job.json`
  - `stage_runs.jsonl`
  - `artifacts.jsonl`
  - `segments.jsonl`
  - `segment_reruns.jsonl` (present after segment reruns)
- Stage directories
  - `ingest/media_info.json`
  - `caption/source_zh.raw.json`
  - `normalize/source_zh.cleaned.json`
  - `normalize/source_zh.srt`
  - `translate/en_subtitle.json`
  - `translate/en_subtitle.srt`
  - `translate/en_dub_text.json`
  - `translate/en_dub_text.txt`
  - `tts/seg_*.wav`
  - `tts/dub_en.wav`
  - `render/output_en.srt`
  - `render/mix.wav`
  - `render/final_en.mp4`
  - `qa/qa_report.json`
  - `qa/qa_report.md`
- `input/` and `logs/` directories may exist and may be empty in current runs
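
A quick way to compare a finished job against the layout above is a small checker like the following sketch. It is not part of the repo: `EXPECTED_FILES` is copied from the list above, `missing_artifacts` is a hypothetical helper, and `segment_reruns.jsonl` is left out because it only appears after segment reruns. Fallback renders may legitimately lack some files (for example `render/mix.wav`, which the README attributes to the `ffmpeg` path), so treat the result as a hint rather than a verdict.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the documented layout of <artifact_root>/<job_id>/.
EXPECTED_FILES = [
    "job.json", "stage_runs.jsonl", "artifacts.jsonl", "segments.jsonl",
    "ingest/media_info.json",
    "caption/source_zh.raw.json",
    "normalize/source_zh.cleaned.json", "normalize/source_zh.srt",
    "translate/en_subtitle.json", "translate/en_subtitle.srt",
    "translate/en_dub_text.json", "translate/en_dub_text.txt",
    "tts/dub_en.wav",
    "render/output_en.srt", "render/mix.wav", "render/final_en.mp4",
    "qa/qa_report.json", "qa/qa_report.md",
]


def missing_artifacts(job_dir: Path) -> list[str]:
    """List documented artifacts that are absent under job_dir."""
    missing = [rel for rel in EXPECTED_FILES if not (job_dir / rel).is_file()]
    # Per-segment clips are matched by pattern; at least one is expected.
    if not any((job_dir / "tts").glob("seg_*.wav")):
        missing.append("tts/seg_*.wav")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An empty directory reports every documented artifact as missing.
        print(len(missing_artifacts(Path(tmp))))
```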
A recent successful real run artifact root is:
.artifacts/wobubaichi-run-v5/00000000-0000-0000-0000-000000009105/
That run used 我是不白痴.mp4 plus a manual SRT, completed all seven stages, and produced:
- subtitle-driven Chinese -> English outputs under `translate/`
- spoken WAV clips plus `tts/dub_en.wav`
- `render/final_en.mp4` and `render/output_en.srt`
- `qa/qa_report.json` and `qa/qa_report.md`

- This is still a local MVP, not a production media pipeline
- Best results currently come from jobs that already have a curated subtitle sidecar
- Non-subtitle jobs rely on local ASR/media translation fallback and are less predictable
- Subtitle quality and dub quality can diverge because subtitle text and dubbing text are stored separately
- Fallback-rendered MP4s do not burn subtitles into the video
- On macOS, AVFoundation fallback can produce a dubbed MP4; if that path is unavailable, the final copy fallback preserves the source MP4 without dubbed audio muxed in
- `doctor` treats `python`, `ffprobe`/macOS probe fallback, and artifact-root writability as hard blockers by default; `ffmpeg` only becomes hard-blocking when you disable render fallback while preferring ffmpeg
- QA can pause a job and cause the CLI to exit with the QA-blocked code
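
The `doctor` blocking rule in the list above can be expressed as a short sketch. The function name `doctor_hard_blockers`, its parameters, and the blocker labels are hypothetical; only the rule itself comes from this README.

```python
def doctor_hard_blockers(*, python_ok: bool, probe_ok: bool,
                         artifact_root_writable: bool, ffmpeg_ok: bool,
                         prefer_ffmpeg: bool,
                         allow_render_copy_fallback: bool) -> list[str]:
    """Return the checks that would hard-block a run under the documented rule."""
    blockers = []
    if not python_ok:
        blockers.append("python")
    if not probe_ok:
        blockers.append("ffprobe/macOS probe fallback")
    if not artifact_root_writable:
        blockers.append("artifact-root")
    # ffmpeg only blocks when it is preferred AND render fallback is disabled.
    if not ffmpeg_ok and prefer_ffmpeg and not allow_render_copy_fallback:
        blockers.append("ffmpeg")
    return blockers


if __name__ == "__main__":
    print(doctor_hard_blockers(python_ok=True, probe_ok=True,
                               artifact_root_writable=True, ffmpeg_ok=False,
                               prefer_ffmpeg=True,
                               allow_render_copy_fallback=False))
```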

Test files:

- `tests/test_mvp_documented_flow.py`
- `tests/test_cli_smoke.py`
- `tests/test_api_smoke.py`

