voiceio

 ██╗   ██╗ ██████╗ ██╗ ██████╗███████╗██╗ ██████╗
 ██║   ██║██╔═══██╗██║██╔════╝██╔════╝██║██╔═══██╗
 ██║   ██║██║   ██║██║██║     █████╗  ██║██║   ██║
 ╚██╗ ██╔╝██║   ██║██║██║     ██╔══╝  ██║██║   ██║
  ╚████╔╝ ╚██████╔╝██║╚██████╗███████╗██║╚██████╔╝
   ╚═══╝   ╚═════╝ ╚═╝ ╚═════╝╚══════╝╚═╝ ╚═════╝

Speak → text, locally, instantly.

Quick start

# 1. Install system dependencies (Ubuntu/Debian)
sudo apt install pipx ibus gir1.2-ibus-1.0 python3-gi python3-dev portaudio19-dev

# 2. Install voiceio
pipx install python-voiceio

# 3. Run the setup wizard
voiceio setup

That's it. Press Ctrl+Alt+V (or your chosen hotkey) to start dictating.

Fedora

sudo dnf install pipx ibus python3-gobject python3-devel portaudio-devel
pipx install python-voiceio
voiceio setup

Arch Linux

sudo pacman -S python-pipx ibus python-gobject portaudio
# Note: Arch includes Python headers by default with the python package
pipx install python-voiceio
voiceio setup

Windows

# Option A: Install with pip (requires Python 3.11+)
pip install python-voiceio
voiceio setup

# Option B: Download the installer from GitHub Releases (no Python needed)
# https://github.com/Hugo0/voiceio/releases
# Also available as a portable .zip if you prefer no installation.

Windows uses pynput for hotkeys and text injection. No extra system dependencies required.

macOS

pipx install python-voiceio
voiceio setup

Build from source

Build from source if you want the code locally to hack on or customize for personal use. PRs are welcome!

git clone https://github.com/Hugo0/voiceio
cd voiceio
uv pip install -e ".[linux,dev]"

# Bootstrap CLI commands onto PATH (creates ~/.local/bin/voiceio)
uv run voiceio setup

Note: Source installs live inside a virtualenv, so voiceio isn't on PATH until setup creates symlinks in ~/.local/bin/. If voiceio isn't found after setup, restart your terminal or run export PATH="$HOME/.local/bin:$PATH".

You can also install with uv tool install python-voiceio or pip install python-voiceio.

How it works

hotkey → mic capture → whisper (local) → text at cursor
          pre-buffered   streaming        IBus / clipboard

Press your hotkey to start recording; a 1 s pre-buffer catches the first syllable. Text streams into the focused app as an underlined preview. Press the hotkey again to commit. Transcription runs locally via faster-whisper, and text is injected through IBus (any GTK/Qt app), with a clipboard fallback for terminals.
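
The pre-buffer idea can be sketched as a fixed-size ring buffer that is always recording and is drained when the hotkey fires. This is an illustrative sketch only, not voiceio's actual implementation: the `PreBuffer` class, its method names, and the 0.1 s chunk size are assumptions made for the example.

```python
from collections import deque


class PreBuffer:
    """Hypothetical ring buffer that keeps only the most recent audio chunks."""

    def __init__(self, seconds: float, chunk_seconds: float) -> None:
        # deque(maxlen=...) silently discards the oldest item once it is full.
        self.chunks = deque(maxlen=max(1, round(seconds / chunk_seconds)))

    def push(self, chunk) -> None:
        """Called for every captured chunk, whether or not recording is active."""
        self.chunks.append(chunk)

    def drain(self) -> list:
        """Called when the hotkey fires: hand over the buffered audio and reset."""
        buffered = list(self.chunks)
        self.chunks.clear()
        return buffered


if __name__ == "__main__":
    buf = PreBuffer(seconds=1.0, chunk_seconds=0.1)
    for i in range(25):      # 2.5 s of fake "audio" arrives before the hotkey
        buf.push(i)
    print(buf.drain())       # only the last 10 chunks (about 1 s) survive
```

Prepending the drained chunks to the live recording is what would let the first syllable survive even if you start speaking slightly before pressing the hotkey.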

Features

  • Streaming: text appears as you speak, not after you stop
  • Works everywhere: IBus input method for GUI apps, clipboard for terminals
  • Wayland + X11: evdev hotkeys work on both, no root required
  • Pre-buffer: never miss the first syllable
  • Voice commands: "new line", "comma", "scratch that", punctuation by name
  • Autocorrect: LLM-powered review of recurring Whisper mistakes (voiceio correct)
  • Text-to-speech: hear selected text spoken back (Piper, eSpeak, Edge TTS)
  • Smart post-processing: numbers ("twenty five" → "25"), punctuation, capitalization
  • Auto-healing: falls back to the next working backend if one fails
  • Autostart: optional systemd service, restarts on crash
  • Self-diagnosing: voiceio doctor checks everything, --fix repairs it
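
As an illustration of the number post-processing mentioned above ("twenty five" → "25"), here is a minimal sketch of spoken-number conversion. It is not taken from the voiceio source; the function name `words_to_number` and the supported vocabulary (up to thousands) are assumptions for the example.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: (i + 2) * 10 for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}


def words_to_number(phrase: str) -> int:
    """Convert a spoken English number (below one million) to an int."""
    total = 0    # completed thousands groups
    current = 0  # group currently being built
    for word in phrase.lower().replace("-", " ").split():
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        elif word == "and":
            continue
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


if __name__ == "__main__":
    print(words_to_number("twenty five"))
    print(words_to_number("three hundred forty two"))
```

A real post-processor would also have to locate number phrases inside a longer transcript and decide when to leave them as words; this sketch only covers the conversion step.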

Models

| Model | Size | Speed | Accuracy | Good for |
|---|---|---|---|---|
| tiny | 75 MB | ~10x realtime | Basic | Quick notes, low-end hardware |
| base | 150 MB | ~7x realtime | Good | Daily use (default) |
| small | 500 MB | ~4x realtime | Better | Longer dictation |
| medium | 1.5 GB | ~2x realtime | Great | Accuracy-sensitive work |
| large-v3 | 3 GB | ~1x realtime | Best | Maximum quality, GPU recommended |

Models download automatically on first use. Switch anytime: voiceio --model small.
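
To get a feel for what the speed column means in practice, the rough arithmetic is: processing time ≈ audio length ÷ realtime factor. The sketch below applies that to the approximate figures from the table; actual speed depends on your hardware, and the function name `estimated_seconds` is made up for this example.

```python
# Approximate realtime factors copied from the Models table
# (seconds of audio processed per second of wall-clock time).
REALTIME_FACTOR = {"tiny": 10, "base": 7, "small": 4, "medium": 2, "large-v3": 1}


def estimated_seconds(model: str, audio_seconds: float) -> float:
    """Rough wall-clock time needed to transcribe `audio_seconds` of speech."""
    return audio_seconds / REALTIME_FACTOR[model]


if __name__ == "__main__":
    for name in REALTIME_FACTOR:
        print(f"{name:9s} ~{estimated_seconds(name, 60):.1f} s for 60 s of audio")
```

By this estimate, a minute of speech takes roughly 9 seconds with base and roughly a full minute with large-v3, which is why the larger models are better suited to machines with a GPU.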

Commands

voiceio                  Start the daemon
voiceio setup            Interactive setup wizard
voiceio doctor           Health check (--fix to auto-repair)
voiceio test             Test microphone + live transcription
voiceio demo             Interactive guided tour of all features
voiceio toggle           Toggle recording on a running daemon
voiceio correct          Review and fix recurring transcription errors
voiceio history          View transcription history
voiceio update           Update to latest version
voiceio service install  Autostart on login (systemd / Windows Startup)
voiceio logs             View recent logs
voiceio uninstall        Remove all system integrations

Configuration

voiceio setup handles everything interactively. To tweak later, edit the config file or override at runtime:

  • Linux/macOS: ~/.config/voiceio/config.toml
  • Windows: %LOCALAPPDATA%\voiceio\config\config.toml

voiceio --model large-v3 --language auto -v

See config.example.toml for all options.

Troubleshooting

voiceio doctor           # see what's working
voiceio doctor --fix     # auto-fix issues
voiceio logs             # check debug output

| Problem | Fix |
|---|---|
| No text appears | voiceio doctor --fix (usually a missing IBus component or GNOME input source) |
| Hotkey doesn't work on Wayland | sudo usermod -aG input $USER, then log out and back in |
| Transcription too slow | Use a smaller model: voiceio --model tiny |
| Want to start fresh | voiceio uninstall, then voiceio setup |
| Windows: antivirus blocks hotkeys | pynput uses global keyboard hooks; add an exception for voiceio |
| Windows: no sound feedback | Check voiceio logs for audio device info |
| macOS issues | Experimental; consider aquavoice.com or contribute a PR |

Platform support

| Platform | Status | Text injection | Hotkeys | Streaming preview |
|---|---|---|---|---|
| Ubuntu / Debian (GNOME, Wayland) | Tested daily | IBus | evdev / GNOME shortcut | Yes |
| Ubuntu / Debian (GNOME, X11) | Supported | IBus | evdev / pynput | Yes |
| Fedora (GNOME) | Supported | IBus | evdev / GNOME shortcut | Yes |
| Arch Linux | Supported | IBus | evdev | Yes |
| KDE / Sway / Hyprland | Should work | IBus / ydotool / wtype | evdev | Yes |
| Windows 10/11 | Experimental | pynput / clipboard | pynput | Type-and-correct (no preedit) |
| macOS | Experimental | pynput / clipboard | pynput | Type-and-correct (no preedit) |

voiceio auto-detects your platform and picks the best available backends. Run voiceio doctor to see what's working on your system.
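
The "pick the best available backend, fall back if one fails" behaviour can be pictured as an ordered probe chain. The sketch below is illustrative only: `pick_backend` and the probe functions are hypothetical, and the backend names are simply borrowed from the table above rather than from voiceio's code.

```python
from collections.abc import Callable, Iterable


def pick_backend(candidates: Iterable[tuple[str, Callable[[], bool]]]) -> str:
    """Return the name of the first candidate whose probe succeeds."""
    for name, probe in candidates:
        try:
            if probe():
                return name
        except Exception:
            # A crashing probe counts as "not available"; try the next one.
            continue
    raise RuntimeError("no working backend found")


if __name__ == "__main__":
    def ibus_probe() -> bool:
        raise OSError("IBus daemon not running")  # simulated failure

    chain = [
        ("ibus", ibus_probe),
        ("ydotool", lambda: False),   # simulated: tool not installed
        ("clipboard", lambda: True),  # simulated: always available
    ]
    print(pick_backend(chain))
```

Ordering the candidates from most to least capable means the fallback is only ever used when the preferred options genuinely fail their probe.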

Uninstall

voiceio uninstall        # removes service, IBus, shortcuts, symlinks
pipx uninstall python-voiceio   # removes the package

Roadmap

Contributions welcome! See CONTRIBUTING.md and open issues.

Now

  • macOS polish (IMKit for native preedit, Accessibility API for text injection)

Soon

  • Per-app context awareness (detect focused app, adapt formatting/behavior)
  • File/audio transcription mode (voiceio transcribe recording.mp3)

Backlog

  • Multiple engine backends (whisper.cpp for Vulkan/AMD, VOSK for low-end hardware)
  • Echo cancellation (filter system audio for meeting use)
  • Wake word activation ("Hey voiceio")

Done

  • Text-to-speech output (Piper/eSpeak/Edge TTS, completes the "io")
  • LLM auto-audit dictionary (voiceio correct --auto — scan history with LLM, interactive correction)
  • LLM post-processing via Ollama (grammar cleanup, spelling fixes on final pass)
  • Corrections dictionary — auto-replace misheard words, "correct that" voice command
  • Transcription history — searchable log of everything you've dictated
  • Number-to-digit conversion ("three hundred forty two" → "342")
  • VAD-based silence filtering (Silero VAD, prevents Whisper hallucinations)
  • Voice commands — "new line", "new paragraph", "scratch that", punctuation by name
  • Custom vocabulary / personal dictionary (bias Whisper via initial_prompt)
  • Smart punctuation & capitalization post-processing
  • Windows support
  • System tray icon with animated states
  • Auto-stop on silence

License

MIT

About

Push-to-talk voice-to-text for Linux. Hold a hotkey, speak, release — text appears at your cursor. Local & offline via faster-whisper.
