A command-line tool for transcribing YouTube videos and local media files to text using OpenAI's Whisper.
- Transcribe YouTube videos by URL
- Transcribe local audio/video files
- Multiple output formats: TXT, JSON, SRT, VTT
- Automatic Whisper model downloading
- Signal handling for graceful cleanup
cargo install voxtusgit clone https://github.com/johanthoren/voxtus
cd voxtus
cargo install --path .# Transcribe a YouTube video
voxtus https://www.youtube.com/watch?v=VIDEO_ID
# Transcribe a local file
voxtus recording.mp3
# Specify output format(s)
voxtus -f json,srt video.mp4
# Use a different Whisper model
voxtus --model large-v3 audio.mp3
# Output to stdout (for piping)
voxtus --stdout -f json video.mp4 | jq '.transcript'
# List available models
voxtus --list-modelsArguments:
<INPUT> YouTube URL or local media file path
Options:
-f, --format <FORMAT> Output format(s), comma-separated: txt,json,srt,vtt [default: txt]
-n, --name <NAME> Base name for output files (no extension)
-o, --output <DIR> Output directory [default: current directory]
-v, --verbose Increase verbosity (-v, -vv for debug)
-k, --keep Keep the downloaded/converted audio file
--model <MODEL> Whisper model to use [default: small]
--list-models List available models and exit
--overwrite Overwrite existing files without confirmation
--stdout Output to stdout only (single format, no files created)
-h, --help Show help
-V, --version Show version
Plain text with timestamps:
[0.00 - 5.20]: Welcome to our podcast.
[5.20 - 10.50]: Today we're discussing Rust.
Structured data with metadata:
{
"transcript": [
{"id": 1, "start": 0.0, "end": 5.2, "text": "Welcome to our podcast."}
],
"metadata": {
"title": "Episode 42",
"source": "https://youtube.com/watch?v=...",
"duration": 1523.5,
"model": "small",
"language": "en"
}
}SubRip subtitle format:
1
00:00:00,000 --> 00:00:05,200
Welcome to our podcast.
WebVTT format with metadata:
WEBVTT
NOTE Title
Episode 42
00:00:00.000 --> 00:00:05.200
Welcome to our podcast.
| Model | Parameters | VRAM | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Lower |
| base | 74M | ~1GB | Fast | Basic |
| small | 244M | ~2GB | Moderate | Good |
| medium | 769M | ~5GB | Slow | Better |
| large-v3 | 1550M | ~10GB | Slowest | Best |
English-only variants (.en suffix) are faster for English content.
Models are automatically downloaded on first use to ~/.local/share/voxtus/models/.
This project is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
- whisper.cpp - C/C++ port of OpenAI's Whisper
- whisper-rs - Rust bindings for whisper.cpp
- yt-dlp - YouTube downloader