Stop typing prompts like a caveman. Barkdown is a voice-to-prompt toolkit that listens to your rambling, transcribes it in real time, and refines it into something Claude won't judge you for.

Barkdown

Voice-to-prompt pipeline: wake word detection → streaming STT → Claude refinement.

```mermaid
flowchart LR
    subgraph Input
        M[🎙️ Microphone]
    end
    subgraph Pipeline
        W[Wake Word\nDetection]
        T[Streaming\nTranscription]
        C[Claude\nRefinement]
    end
    subgraph Output
        CB[📋 Clipboard]
    end

    M --> W --> T --> C --> CB
```

Features

  • Wake word activation — Say "Hey Jarvis" to start (hands-free)
  • Real-time streaming — See transcription as you speak
  • Best-in-class accuracy — Parakeet TDT 0.6B v3 (#1 on HuggingFace Open ASR Leaderboard)
  • Auto endpoint detection — Recording stops automatically when you stop speaking
  • Claude refinement — Transform raw dictation into polished LLM prompts
  • GPU-accelerated — <100ms inference on RTX 4090
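
The auto endpoint logic can be sketched as a pure function over per-frame VAD probabilities. This is a simplified illustration of the idea (the actual pipeline feeds Silero VAD scores through the orchestrator; the function and parameter names here are hypothetical):

```python
def endpoint_frame(vad_probs, frame_sec=0.032, vad_threshold=0.5,
                   silence_threshold_sec=1.5):
    """Return the index of the frame where recording should stop,
    i.e. the first frame completing `silence_threshold_sec` of
    continuous silence, or None if speech never ends in the stream."""
    silent_run = 0.0
    for i, p in enumerate(vad_probs):
        if p < vad_threshold:
            silent_run += frame_sec
            if silent_run >= silence_threshold_sec:
                return i
        else:
            silent_run = 0.0  # speech resets the silence timer
    return None
```

Any frame scored as speech resets the timer, so brief pauses mid-sentence don't cut the recording short.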

Installation

NixOS / Nix Flakes (Recommended)

The project uses a Nix flake with devenv to provide a reproducible development environment with CUDA support.

Prerequisites:

  • Nix with flakes enabled
  • direnv with nix-direnv
  • NVIDIA GPU with drivers installed

```bash
# Clone and enter the directory
git clone https://github.com/lokaltog/barkdown.git
cd barkdown

# Allow direnv (loads the flake automatically)
direnv allow

# The environment provides:
# - Python 3.13 with uv
# - CUDA 12.x + cuDNN
# - PortAudio (connects to PulseAudio/PipeWire)
# - Audio tools (ffmpeg, sox)
```

On first load, direnv activates the flake and displays:

```
🎤 Parakeet ASR Development Environment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python: Python 3.13.x
CUDA: 12.x.x
cuDNN: 9.x.x
GPU: NVIDIA GeForce RTX 4090
```

Manual Installation

```bash
# Requires Python 3.13+, CUDA 12.x, and system audio libraries
pip install -e .
# or with uv
uv sync
```

Quick Start

```bash
barkdown               # Default loop mode
barkdown --auto-refine # Skip confirmation, auto-send to Claude
barkdown --studio      # Studio preset (aggressive VAD)
barkdown --long-form   # Long recording preset
barkdown --no-loop     # Single transcription then exit
```

Architecture

```mermaid
flowchart TB
    subgraph TUI["Terminal UI (Textual)"]
        APP[BarkdownApp]
        PC[PipelineController]
        SP[Spectrogram]
        HP[History Panel]
        PP[Preview Panel]
    end

    subgraph Pipeline["Pipeline (Async)"]
        ORCH[PipelineOrchestrator]
        WWD[WakeWordDetector\nOpenWakeWord/CPU]
        STR[StreamingTranscriber\nParakeet TDT/GPU]
        REF[ClaudeRefiner\nclaude-agent-sdk]
        VAD[Silero VAD]
    end

    subgraph Persistence
        HIST[(history.json)]
    end

    APP --> PC
    PC <-->|events| ORCH
    ORCH --> WWD
    ORCH --> STR
    ORCH --> REF
    STR --> VAD

    APP --> SP
    APP --> HP
    APP --> PP
    HP <--> HIST
```

Key Components

| Component | Technology | Purpose |
|---|---|---|
| Wake word | OpenWakeWord (CPU) | Detects "Hey Jarvis" activation phrase |
| ASR | NVIDIA Parakeet TDT 0.6B (GPU) | Streaming speech-to-text |
| VAD | Silero VAD | Voice activity detection for auto-stop |
| Refinement | Claude via claude-agent-sdk | Transforms dictation into LLM prompts |
| TUI | Textual | Async terminal interface |
| Persistence | JSON | History storage at ~/.local/share/barkdown/history.json |

State Machine

The pipeline operates as an event-driven state machine:

```mermaid
stateDiagram-v2
    [*] --> IDLE
    IDLE --> LOADING: StartListening
    LOADING --> LISTENING_WAKE: ModelsLoaded

    LISTENING_WAKE --> RECORDING: WakeWordDetected
    LISTENING_WAKE --> IDLE: AbortOperation

    RECORDING --> TRANSCRIBING: SilenceDetected
    RECORDING --> IDLE: AbortOperation

    TRANSCRIBING --> AWAITING_CONFIRM: !auto_refine
    TRANSCRIBING --> REFINING: auto_refine
    TRANSCRIBING --> IDLE: AbortOperation

    AWAITING_CONFIRM --> REFINING: ConfirmRefinement(yes)
    AWAITING_CONFIRM --> COMPLETE: ConfirmRefinement(no)
    AWAITING_CONFIRM --> IDLE: AbortOperation

    REFINING --> COMPLETE: RefinementDone
    REFINING --> IDLE: AbortOperation

    COMPLETE --> LISTENING_WAKE: ContinueLoop
    COMPLETE --> IDLE: !loop_mode
    COMPLETE --> REFINING: RetryRefinement

    ERROR --> IDLE: ResetFromError

    note right of LISTENING_WAKE: CPU-only\n(OpenWakeWord)
    note right of RECORDING: VAD monitors\nfor silence
    note right of TRANSCRIBING: GPU inference\n(Parakeet)
    note right of REFINING: Claude API call
```

States

| State | Description |
|---|---|
| IDLE | Waiting for user to initiate |
| LOADING | Loading ML models (ASR, wake word) |
| LISTENING_WAKE | Listening for wake word on CPU |
| RECORDING | Recording audio, VAD monitors for silence |
| TRANSCRIBING | Processing audio through Parakeet ASR |
| AWAITING_CONFIRM | Showing transcription, waiting for user to confirm refinement |
| COMPLETE | Output ready, copied to clipboard |
| REFINING | Claude is transforming raw text into a polished prompt |
| ERROR | Error state, can reset to IDLE |

Events

User Actions:

  • StartListening — Begin the pipeline
  • AbortOperation — Cancel current operation (works from any interruptible state)
  • ConfirmRefinement — Confirm or skip Claude refinement
  • RetryRefinement — Re-run refinement from COMPLETE state
  • ResetFromError — Clear error and return to IDLE

Internal Events:

  • ModelsLoaded — Models ready, proceed to listening
  • WakeWordDetected — Wake phrase detected, start recording
  • SilenceDetected — Silence threshold reached, stop recording
  • TranscriptionComplete — ASR finished
  • RefinementDone — Claude refinement complete
  • ContinueLoop — Continue to next iteration (loop mode)
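
The transitions above can be sketched as a simple lookup table. The state and event names follow the diagram; the dict-based representation (and the guard-suffix convention for the TRANSCRIBING branch) is an illustration, not the actual implementation:

```python
# (state, event) -> next state
TRANSITIONS = {
    ("IDLE", "StartListening"): "LOADING",
    ("LOADING", "ModelsLoaded"): "LISTENING_WAKE",
    ("LISTENING_WAKE", "WakeWordDetected"): "RECORDING",
    ("RECORDING", "SilenceDetected"): "TRANSCRIBING",
    ("TRANSCRIBING", "TranscriptionComplete:auto_refine"): "REFINING",
    ("TRANSCRIBING", "TranscriptionComplete:confirm"): "AWAITING_CONFIRM",
    ("AWAITING_CONFIRM", "ConfirmRefinement(yes)"): "REFINING",
    ("AWAITING_CONFIRM", "ConfirmRefinement(no)"): "COMPLETE",
    ("REFINING", "RefinementDone"): "COMPLETE",
    ("COMPLETE", "ContinueLoop"): "LISTENING_WAKE",
    ("COMPLETE", "RetryRefinement"): "REFINING",
    ("ERROR", "ResetFromError"): "IDLE",
}

INTERRUPTIBLE = {"LISTENING_WAKE", "RECORDING", "TRANSCRIBING",
                 "AWAITING_CONFIRM", "REFINING"}

def step(state: str, event: str) -> str:
    """Apply one event; AbortOperation returns to IDLE from any
    interruptible state, and unknown events leave the state unchanged."""
    if event == "AbortOperation" and state in INTERRUPTIBLE:
        return "IDLE"
    return TRANSITIONS.get((state, event), state)
```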

Data Flow

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant TUI as BarkdownApp
    participant Orch as PipelineOrchestrator
    participant Wake as WakeWordDetector
    participant ASR as StreamingTranscriber
    participant Claude as ClaudeRefiner
    participant Clip as Clipboard

    User->>TUI: Press Enter
    TUI->>Orch: StartListening
    Orch->>Orch: Load models (if needed)
    Orch->>Wake: Start detection

    User->>Wake: "Hey Jarvis"
    Wake-->>Orch: WakeWordDetected

    Orch->>ASR: Start streaming

    loop While speaking
        User->>ASR: Audio chunks
        ASR-->>TUI: Partial transcription
        ASR->>ASR: VAD monitoring
    end

    ASR-->>Orch: SilenceDetected
    Orch->>ASR: Finalize
    ASR-->>Orch: TranscriptionComplete

    alt auto_refine = false
        Orch-->>TUI: AWAITING_CONFIRM
        TUI->>User: Show transcription
        User->>TUI: Confirm (Y)
        TUI->>Orch: ConfirmRefinement(yes)
    end

    Orch->>Claude: Refine raw text
    Claude-->>Orch: RefinementDone
    Orch-->>TUI: COMPLETE
    TUI->>Clip: Copy refined text
    TUI->>User: Show result
```

Configuration

Configuration via ~/.config/barkdown/config.toml or environment variables (BARKDOWN_*):

```toml
# Audio
sample_rate = 16000
channels = 1

# Wake word
wake_word_model = "hey_jarvis"
wake_word_threshold = 0.5

# ASR
parakeet_model = "nvidia/parakeet-tdt-0.6b-v3"

# VAD / Recording
vad_threshold = 0.5
silence_threshold_sec = 1.5
max_recording_sec = 60.0

# Claude
claude_model = "claude-opus-4-5-20251101"
auto_refine = false

# Audio processing
normalize_audio = false
strip_silence = true
```

Presets

Studio (--studio) — For high-quality studio equipment:

  • Higher VAD threshold (0.7)
  • Faster endpoint detection (1.0s silence)
  • LUFS normalization enabled

Long-form (--long-form) — For meetings, lectures:

  • Lower VAD threshold (0.3)
  • Longer silence tolerance (3.0s)
  • 1 hour max recording
  • Auto-chunking for memory efficiency

Keyboard Shortcuts

| Key | Action |
|---|---|
| Enter | Start listening / Confirm |
| Y | Confirm refinement |
| N | Skip refinement |
| Escape | Abort current operation |
| R | Retry refinement |
| C | Copy refined text |
| Shift+C | Copy raw transcription |
| ? | Show help |
| Q | Quit |

Requirements

Hardware:

  • NVIDIA GPU with CUDA 12.x support
  • ~4GB VRAM (Parakeet TDT 0.6B)
  • Microphone

Software (handled by Nix flake):

  • Python 3.13+
  • CUDA 12.x + cuDNN
  • PulseAudio/PipeWire + PortAudio

API Keys:

  • ANTHROPIC_API_KEY for Claude refinement

Development

With Nix (recommended)

```bash
# direnv auto-loads the environment on cd
cd barkdown

# Install Python dependencies (auto-runs via devenv)
uv sync --extra dev

# Run the app
barkdown

# Run tests (use cuda-uv for GPU tests)
cuda-uv run pytest

# Type checking
pyright

# Linting
ruff check .
```

Without Nix

Ensure CUDA 12.x, cuDNN, and PortAudio are installed system-wide, then:

```bash
uv sync --extra dev
uv run barkdown
```

Design Principles

  • Async-first — All pipeline components use async/await with thread executors for blocking I/O
  • Process isolation — The pipeline runs in a subprocess, since PyTorch/CUDA calls can block the asyncio event loop even when wrapped in executors
  • Event-driven — State machine with explicit transitions and events
  • Lazy loading — Heavy dependencies (NeMo, OpenWakeWord) imported on first use for fast startup
  • Non-blocking TUI — Textual message passing keeps UI responsive
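
The async-first and lazy-loading principles combine into a familiar pattern: defer the heavy import to first use, and push blocking inference onto a thread. A sketch under those assumptions (class and method names are illustrative, and the real pipeline additionally isolates this work in a subprocess):

```python
import asyncio
import importlib

class LazyTranscriber:
    """Defers the heavy ASR import until first use (fast startup) and
    runs blocking inference in a thread (responsive event loop)."""

    def __init__(self, module_name: str = "nemo.collections.asr"):
        self._module_name = module_name
        self._module = None

    def _ensure_loaded(self):
        if self._module is None:
            # deferred heavy import (e.g. NeMo takes seconds to import)
            self._module = importlib.import_module(self._module_name)
        return self._module

    async def transcribe(self, audio: bytes) -> str:
        # blocking work is pushed off the asyncio event loop
        return await asyncio.to_thread(self._blocking_transcribe, audio)

    def _blocking_transcribe(self, audio: bytes) -> str:
        self._ensure_loaded()
        return ""  # placeholder: real code would run model inference here
```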

License

MIT
