Voxtus

A command-line tool for transcribing YouTube videos and local media files to text using OpenAI's Whisper.

Features

Transcribe YouTube videos by URL
Transcribe local audio/video files
Multiple output formats: TXT, JSON, SRT, VTT
Automatic Whisper model downloading
Signal handling for graceful cleanup

Installation

Prerequisites

Rust (1.85+)
FFmpeg in your PATH
CMake (for building whisper-rs)

From crates.io

cargo install voxtus

From source

git clone https://github.com/johanthoren/voxtus
cd voxtus
cargo install --path .

Usage

# Transcribe a YouTube video
voxtus https://www.youtube.com/watch?v=VIDEO_ID

# Transcribe a local file
voxtus recording.mp3

# Specify output format(s)
voxtus -f json,srt video.mp4

# Use a different Whisper model
voxtus --model large-v3 audio.mp3

# Output to stdout (for piping)
voxtus --stdout -f json video.mp4 | jq '.transcript'

# List available models
voxtus --list-models

Options

Arguments:
  <INPUT>  YouTube URL or local media file path

Options:
  -f, --format <FORMAT>    Output format(s), comma-separated: txt,json,srt,vtt [default: txt]
  -n, --name <NAME>        Base name for output files (no extension)
  -o, --output <DIR>       Output directory [default: current directory]
  -v, --verbose            Increase verbosity (-v, -vv for debug)
  -k, --keep               Keep the downloaded/converted audio file
      --model <MODEL>      Whisper model to use [default: small]
      --list-models        List available models and exit
      --overwrite          Overwrite existing files without confirmation
      --stdout             Output to stdout only (single format, no files created)
  -h, --help               Show help
  -V, --version            Show version
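
For scripting, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of Voxtus: the helper name build_command and its parameters are hypothetical, and only the flags themselves come from the listing above. It prints the command instead of running it.

```python
import shlex


def build_command(source, formats=("txt",), model=None, output_dir=None, to_stdout=False):
    """Assemble a voxtus argument list (hypothetical helper; flags taken from the help text)."""
    if to_stdout and len(formats) != 1:
        # --stdout is documented as single-format only.
        raise ValueError("--stdout accepts exactly one format")
    command = ["voxtus", "-f", ",".join(formats)]
    if model is not None:
        command += ["--model", model]
    if output_dir is not None:
        command += ["-o", output_dir]
    if to_stdout:
        command.append("--stdout")
    command.append(source)
    return command


# Build the command for two output formats and a non-default model.
command = build_command("video.mp4", formats=("json", "srt"), model="large-v3")
print(shlex.join(command))
```

Pass the returned list to subprocess.run to execute it once voxtus is installed.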

Output Formats

TXT

Plain text with timestamps:

[0.00 - 5.20]: Welcome to our podcast.
[5.20 - 10.50]: Today we're discussing Rust.
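
If you need to read this format back, a line can be split into its parts with a regular expression. This Python sketch assumes every line follows the "[start - end]: text" layout of the sample above, with times in seconds; the function name parse_txt_line is illustrative.

```python
import re

# Pattern inferred from the sample lines: "[start - end]: text" (assumption).
LINE_RE = re.compile(r"^\[(\d+(?:\.\d+)?) - (\d+(?:\.\d+)?)\]: (.*)$")


def parse_txt_line(line):
    """Return (start, end, text) for one transcript line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    start, end, text = match.groups()
    return float(start), float(end), text


print(parse_txt_line("[0.00 - 5.20]: Welcome to our podcast."))
```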

JSON

Structured data with metadata:

{
  "transcript": [
    {"id": 1, "start": 0.0, "end": 5.2, "text": "Welcome to our podcast."}
  ],
  "metadata": {
    "title": "Episode 42",
    "source": "https://youtube.com/watch?v=...",
    "duration": 1523.5,
    "model": "small",
    "language": "en"
  }
}
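
The JSON output is convenient for post-processing. As a small Python sketch, assuming the field names shown in the sample above (the function name summarize is illustrative), this joins the segment texts and reads the title from the metadata:

```python
import json


def summarize(document):
    """Return (title, segment count, full text) from a parsed transcript document."""
    segments = document["transcript"]
    full_text = " ".join(segment["text"].strip() for segment in segments)
    return document["metadata"]["title"], len(segments), full_text


# Sample data copied from the example above.
raw = """
{
  "transcript": [
    {"id": 1, "start": 0.0, "end": 5.2, "text": "Welcome to our podcast."}
  ],
  "metadata": {
    "title": "Episode 42",
    "source": "https://youtube.com/watch?v=...",
    "duration": 1523.5,
    "model": "small",
    "language": "en"
  }
}
"""

print(summarize(json.loads(raw)))
```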

SRT

SubRip subtitle format:

1
00:00:00,000 --> 00:00:05,200
Welcome to our podcast.
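
To illustrate how a segment's start and end times in seconds map onto SRT timestamps, here is a short Python sketch. It is not Voxtus's own implementation; the function names are illustrative.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, start, end, text):
    """Render one SRT cue: index line, timing line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


print(srt_cue(1, 0.0, 5.2, "Welcome to our podcast."))
```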

VTT

WebVTT format with metadata:

WEBVTT

NOTE Title
Episode 42

00:00:00.000 --> 00:00:05.200
Welcome to our podcast.
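
Going the other way, a WebVTT cue timing line can be converted back to seconds. This Python sketch assumes timing lines shaped like the one above; it also accepts the shorter MM:SS.mmm form that WebVTT allows. The function names are illustrative.

```python
def vtt_seconds(timestamp):
    """Convert HH:MM:SS.mmm or MM:SS.mmm to seconds."""
    parts = timestamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_cue_timing(line):
    """Return (start, end) in seconds for a 'start --> end' timing line."""
    start, end = line.split(" --> ")
    # Ignore any cue settings that follow the end timestamp.
    return vtt_seconds(start.strip()), vtt_seconds(end.split()[0])


print(parse_cue_timing("00:00:00.000 --> 00:00:05.200"))
```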

Whisper Models

Model     Parameters  VRAM   Speed     Accuracy
tiny      39M         ~1GB   Fastest   Lower
base      74M         ~1GB   Fast      Basic
small     244M        ~2GB   Moderate  Good
medium    769M        ~5GB   Slow      Better
large-v3  1550M       ~10GB  Slowest   Best

English-only variants (.en suffix) are faster for English content. Run voxtus --list-models to see which model names are available.

Models are automatically downloaded on first use to ~/.local/share/voxtus/models/.
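
As a rough guide to choosing a model, the table above can be turned into a lookup. This Python sketch picks the largest listed model whose approximate memory figure fits a given budget; the figures are the rounded values from the table, and the function name pick_model is illustrative.

```python
# (model name, approximate memory in GB), in the order listed in the table above.
MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(budget_gb):
    """Return the largest listed model that fits the budget, or None if none fits."""
    choice = None
    for name, required_gb in MODELS:
        if required_gb <= budget_gb:
            choice = name  # later entries are larger, so keep the last one that fits
    return choice


print(pick_model(8))
```

The result can be passed to the --model option.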

License

This project is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).

Acknowledgments

whisper.cpp - C/C++ port of OpenAI's Whisper
whisper-rs - Rust bindings for whisper.cpp
yt-dlp - YouTube downloader
