Voice-to-prompt pipeline: wake word detection → streaming STT → Claude refinement.
flowchart LR
subgraph Input
M[🎙️ Microphone]
end
subgraph Pipeline
W[Wake Word\nDetection]
T[Streaming\nTranscription]
C[Claude\nRefinement]
end
subgraph Output
CB[📋 Clipboard]
end
M --> W --> T --> C --> CB
- Wake word activation — Say "Hey Jarvis" to start (hands-free)
- Real-time streaming — See transcription as you speak
- Best-in-class accuracy — Parakeet TDT 0.6B v3 (#1 on HuggingFace Open ASR Leaderboard)
- Auto endpoint detection — Recording stops automatically when you stop speaking
- Claude refinement — Transform raw dictation into polished LLM prompts
- GPU-accelerated — <100ms inference on RTX 4090
The project uses a Nix flake with devenv to provide a reproducible development environment with CUDA support.
Prerequisites:
- Nix with flakes enabled
- direnv with nix-direnv
- NVIDIA GPU with drivers installed
# Clone and enter the directory
git clone https://github.com/lokaltog/barkdown.git
cd barkdown
# Allow direnv (loads the flake automatically)
direnv allow
# The environment provides:
# - Python 3.13 with uv
# - CUDA 12.x + cuDNN
# - PortAudio (connects to PulseAudio/PipeWire)
# - Audio tools (ffmpeg, sox)
On first load, direnv activates the flake and displays:
🎤 Parakeet ASR Development Environment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python: Python 3.13.x
CUDA: 12.x.x
cuDNN: 9.x.x
GPU: NVIDIA GeForce RTX 4090
# Requires Python 3.13+, CUDA 12.x, and system audio libraries
pip install -e .
# or with uv
uv sync
barkdown # Default loop mode
barkdown --auto-refine # Skip confirmation, auto-send to Claude
barkdown --studio # Studio preset (aggressive VAD)
barkdown --long-form # Long recording preset
barkdown --no-loop # Single transcription then exit
flowchart TB
subgraph TUI["Terminal UI (Textual)"]
APP[BarkdownApp]
PC[PipelineController]
SP[Spectrogram]
HP[History Panel]
PP[Preview Panel]
end
subgraph Pipeline["Pipeline (Async)"]
ORCH[PipelineOrchestrator]
WWD[WakeWordDetector\nOpenWakeWord/CPU]
STR[StreamingTranscriber\nParakeet TDT/GPU]
REF[ClaudeRefiner\nclaude-agent-sdk]
VAD[Silero VAD]
end
subgraph Persistence
HIST[(history.json)]
end
APP --> PC
PC <-->|events| ORCH
ORCH --> WWD
ORCH --> STR
ORCH --> REF
STR --> VAD
APP --> SP
APP --> HP
APP --> PP
HP <--> HIST
| Component | Technology | Purpose |
|---|---|---|
| Wake Word | OpenWakeWord (CPU) | Detects "Hey Jarvis" activation phrase |
| ASR | NVIDIA Parakeet TDT 0.6B (GPU) | Streaming speech-to-text |
| VAD | Silero VAD | Voice activity detection for auto-stop |
| Refinement | Claude via claude-agent-sdk | Transforms dictation into LLM prompts |
| TUI | Textual | Async terminal interface |
| Persistence | JSON | History storage at ~/.local/share/barkdown/history.json |
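For orientation, here is a minimal sketch of what the CPU-only wake-word stage could look like using OpenWakeWord with sounddevice for capture. It is not the project's WakeWordDetector: the function name is illustrative, and the frame size (80 ms at 16 kHz) and 0.5 threshold simply mirror common OpenWakeWord usage and the config defaults shown later.

```python
# Illustrative wake-word loop: feed 80 ms frames of 16 kHz mono audio into
# OpenWakeWord and return once any score crosses the threshold. Assumes the
# pretrained "hey_jarvis" model files have already been downloaded.
import numpy as np
import sounddevice as sd
from openwakeword.model import Model

def wait_for_wake_word(threshold: float = 0.5, frame_samples: int = 1280) -> None:
    oww = Model(wakeword_models=["hey_jarvis"])
    with sd.InputStream(samplerate=16000, channels=1, dtype="int16",
                        blocksize=frame_samples) as stream:
        while True:
            frame, _ = stream.read(frame_samples)        # (frame_samples, 1) int16
            scores = oww.predict(np.squeeze(frame))      # {model_name: score}
            if any(score >= threshold for score in scores.values()):
                return                                   # hand off to the recorder
```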
The pipeline operates as an event-driven state machine:
stateDiagram-v2
[*] --> IDLE
IDLE --> LOADING: StartListening
LOADING --> LISTENING_WAKE: ModelsLoaded
LISTENING_WAKE --> RECORDING: WakeWordDetected
LISTENING_WAKE --> IDLE: AbortOperation
RECORDING --> TRANSCRIBING: SilenceDetected
RECORDING --> IDLE: AbortOperation
TRANSCRIBING --> AWAITING_CONFIRM: !auto_refine
TRANSCRIBING --> REFINING: auto_refine
TRANSCRIBING --> IDLE: AbortOperation
AWAITING_CONFIRM --> REFINING: ConfirmRefinement(yes)
AWAITING_CONFIRM --> COMPLETE: ConfirmRefinement(no)
AWAITING_CONFIRM --> IDLE: AbortOperation
REFINING --> COMPLETE: RefinementDone
REFINING --> IDLE: AbortOperation
COMPLETE --> LISTENING_WAKE: ContinueLoop
COMPLETE --> IDLE: !loop_mode
COMPLETE --> REFINING: RetryRefinement
ERROR --> IDLE: ResetFromError
note right of LISTENING_WAKE: CPU-only\n(OpenWakeWord)
note right of RECORDING: VAD monitors\nfor silence
note right of TRANSCRIBING: GPU inference\n(Parakeet)
note right of REFINING: Claude API call
| State | Description |
|---|---|
| IDLE | Waiting for user to initiate |
| LOADING | Loading ML models (ASR, wake word) |
| LISTENING_WAKE | Listening for wake word on CPU |
| RECORDING | Recording audio, VAD monitors for silence |
| TRANSCRIBING | Processing audio through Parakeet ASR |
| AWAITING_CONFIRM | Showing transcription, waiting for user to confirm refinement |
| REFINING | Claude is transforming raw text into polished prompt |
| COMPLETE | Output ready, copied to clipboard |
| ERROR | Error state, can reset to IDLE |
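The auto-stop behaviour in RECORDING can be pictured as Silero VAD scoring short audio chunks and tripping an endpoint once speech probability has stayed below vad_threshold for silence_threshold_sec. The sketch below follows Silero's published torch.hub usage; the helper itself is illustrative and not barkdown's code.

```python
# Hedged sketch of VAD-based endpointing: Silero VAD expects 512-sample
# (32 ms) float32 chunks at 16 kHz and returns a per-chunk speech probability.
import torch

vad_model, _ = torch.hub.load("snakers4/silero-vad", "silero_vad")

def should_stop(chunks, vad_threshold=0.5, silence_threshold_sec=1.5, sample_rate=16000):
    """chunks: iterable of 512-sample float32 torch tensors at 16 kHz."""
    silence = 0.0
    for chunk in chunks:
        prob = vad_model(chunk, sample_rate).item()      # speech probability for this chunk
        silence = 0.0 if prob >= vad_threshold else silence + len(chunk) / sample_rate
        if silence >= silence_threshold_sec:
            return True                                  # endpoint reached: stop recording
    return False
```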
User Actions:
- StartListening — Begin the pipeline
- AbortOperation — Cancel current operation (works from any interruptible state)
- ConfirmRefinement — Confirm or skip Claude refinement
- RetryRefinement — Re-run refinement from COMPLETE state
- ResetFromError — Clear error and return to IDLE
Internal Events:
- ModelsLoaded — Models ready, proceed to listening
- WakeWordDetected — Wake phrase detected, start recording
- SilenceDetected — Silence threshold reached, stop recording
- TranscriptionComplete — ASR finished
- RefinementDone — Claude refinement complete
- ContinueLoop — Continue to next iteration (loop mode)
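A compact way to picture these states and events is an explicit transition table. The sketch below mirrors the diagram; the names come from the documentation above, but the table and dispatch helper are illustrative, not the project's actual PipelineOrchestrator.

```python
# Illustrative event-driven state machine matching the state diagram above.
from enum import Enum

State = Enum("State", [
    "IDLE", "LOADING", "LISTENING_WAKE", "RECORDING", "TRANSCRIBING",
    "AWAITING_CONFIRM", "REFINING", "COMPLETE", "ERROR",
])
Event = Enum("Event", [
    "START_LISTENING", "MODELS_LOADED", "WAKE_WORD_DETECTED", "SILENCE_DETECTED",
    "TRANSCRIPTION_COMPLETE", "CONFIRM_REFINEMENT", "REFINEMENT_DONE",
    "CONTINUE_LOOP", "RETRY_REFINEMENT", "ABORT_OPERATION", "RESET_FROM_ERROR",
])

# Unconditional transitions only; TRANSCRIBING and AWAITING_CONFIRM also branch
# on auto_refine / the user's yes-no answer, which is omitted here.
TRANSITIONS = {
    (State.IDLE, Event.START_LISTENING): State.LOADING,
    (State.LOADING, Event.MODELS_LOADED): State.LISTENING_WAKE,
    (State.LISTENING_WAKE, Event.WAKE_WORD_DETECTED): State.RECORDING,
    (State.RECORDING, Event.SILENCE_DETECTED): State.TRANSCRIBING,
    (State.REFINING, Event.REFINEMENT_DONE): State.COMPLETE,
    (State.COMPLETE, Event.CONTINUE_LOOP): State.LISTENING_WAKE,
    (State.COMPLETE, Event.RETRY_REFINEMENT): State.REFINING,
    (State.ERROR, Event.RESET_FROM_ERROR): State.IDLE,
}

def next_state(current, event):
    if event is Event.ABORT_OPERATION:
        return State.IDLE            # abort is legal from any interruptible state
    return TRANSITIONS.get((current, event), State.ERROR)
```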
sequenceDiagram
autonumber
participant User
participant TUI as BarkdownApp
participant Orch as PipelineOrchestrator
participant Wake as WakeWordDetector
participant ASR as StreamingTranscriber
participant Claude as ClaudeRefiner
participant Clip as Clipboard
User->>TUI: Press Enter
TUI->>Orch: StartListening
Orch->>Orch: Load models (if needed)
Orch->>Wake: Start detection
User->>Wake: "Hey Jarvis"
Wake-->>Orch: WakeWordDetected
Orch->>ASR: Start streaming
loop While speaking
User->>ASR: Audio chunks
ASR-->>TUI: Partial transcription
ASR->>ASR: VAD monitoring
end
ASR-->>Orch: SilenceDetected
Orch->>ASR: Finalize
ASR-->>Orch: TranscriptionComplete
alt auto_refine = false
Orch-->>TUI: AWAITING_CONFIRM
TUI->>User: Show transcription
User->>TUI: Confirm (Y)
TUI->>Orch: ConfirmRefinement(yes)
end
Orch->>Claude: Refine raw text
Claude-->>Orch: RefinementDone
Orch-->>TUI: COMPLETE
TUI->>Clip: Copy refined text
TUI->>User: Show result
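The happy path in the sequence diagram corresponds roughly to an async loop like the one below. Only the ordering of steps comes from the diagram; the component objects (wake, asr, refiner) and their methods are placeholders for illustration, not the real barkdown classes.

```python
# Hypothetical happy-path orchestration loop matching the sequence diagram.
import asyncio
import pyperclip  # clipboard access; the real project may use a different mechanism

async def run_once(wake, asr, refiner, *, auto_refine: bool) -> str:
    await wake.wait_for_wake_word()                  # CPU-only wake-word stage
    async for partial in asr.stream():               # partial transcripts while speaking
        print(partial, end="\r")                     # the TUI would render these instead
    raw = await asr.finalize()                       # VAD silence ends the stream
    if auto_refine or await confirm("Refine with Claude?"):
        text = await refiner.refine(raw)             # Claude turns dictation into a prompt
    else:
        text = raw
    pyperclip.copy(text)                             # COMPLETE: result lands on the clipboard
    return text

async def confirm(prompt: str) -> bool:
    # Stand-in for the TUI's Y/N confirmation in AWAITING_CONFIRM.
    answer = await asyncio.to_thread(input, f"{prompt} [y/N] ")
    return answer.lower().startswith("y")
```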
Configuration via ~/.config/barkdown/config.toml or environment variables (BARKDOWN_*):
# Audio
sample_rate = 16000
channels = 1
# Wake word
wake_word_model = "hey_jarvis"
wake_word_threshold = 0.5
# ASR
parakeet_model = "nvidia/parakeet-tdt-0.6b-v3"
# VAD / Recording
vad_threshold = 0.5
silence_threshold_sec = 1.5
max_recording_sec = 60.0
# Claude
claude_model = "claude-opus-4-5-20251101"
auto_refine = false
# Audio processing
normalize_audio = false
strip_silence = true
Studio (--studio) — For high-quality studio equipment:
- Higher VAD threshold (0.7)
- Faster endpoint detection (1.0s silence)
- LUFS normalization enabled
Long-form (--long-form) — For meetings, lectures:
- Lower VAD threshold (0.3)
- Longer silence tolerance (3.0s)
- 1 hour max recording
- Auto-chunking for memory efficiency
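A minimal sketch of how the config file, BARKDOWN_* environment variables, and the preset flags could be merged is shown below. The key names match the example above, but the loader, the preset dictionaries, and the precedence order are assumptions for illustration, not barkdown's implementation.

```python
# Illustrative config merge: built-in defaults < config.toml < BARKDOWN_* env
# vars < preset overrides (precedence is an assumption).
import os
import tomllib
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "barkdown" / "config.toml"

PRESETS = {
    "studio":    {"vad_threshold": 0.7, "silence_threshold_sec": 1.0, "normalize_audio": True},
    "long_form": {"vad_threshold": 0.3, "silence_threshold_sec": 3.0, "max_recording_sec": 3600.0},
}

def load_config(preset: str | None = None) -> dict:
    config: dict = {"vad_threshold": 0.5, "silence_threshold_sec": 1.5, "max_recording_sec": 60.0}
    if CONFIG_PATH.exists():
        with CONFIG_PATH.open("rb") as f:
            config.update(tomllib.load(f))       # file overrides built-in defaults
    for key, current in list(config.items()):
        raw = os.environ.get(f"BARKDOWN_{key.upper()}")
        if raw is None:
            continue
        if isinstance(current, bool):
            config[key] = raw.lower() in ("1", "true", "yes")
        else:
            config[key] = type(current)(raw)     # coerce to the existing value's type
    if preset is not None:
        config.update(PRESETS[preset])           # --studio / --long-form applied last
    return config
```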
| Key | Action |
|---|---|
| Enter | Start listening / Confirm |
| Y | Confirm refinement |
| N | Skip refinement |
| Escape | Abort current operation |
| R | Retry refinement |
| C | Copy refined text |
| Shift+C | Copy raw transcription |
| ? | Show help |
| Q | Quit |
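In a Textual app these shortcuts would typically be declared as BINDINGS on the App class. The sketch below is illustrative: the action names are not the real BarkdownApp methods, and the exact key names for shifted and punctuation keys may differ slightly.

```python
# Rough sketch of declaring the key bindings above in Textual.
from textual.app import App

class BarkdownAppSketch(App):
    BINDINGS = [
        ("enter", "start_or_confirm", "Start listening / Confirm"),
        ("y", "confirm_refinement", "Confirm refinement"),
        ("n", "skip_refinement", "Skip refinement"),
        ("escape", "abort", "Abort current operation"),
        ("r", "retry_refinement", "Retry refinement"),
        ("c", "copy_refined", "Copy refined text"),
        ("C", "copy_raw", "Copy raw transcription"),   # Shift+C typically arrives as "C"
        ("question_mark", "show_help", "Show help"),
        ("q", "quit", "Quit"),                         # "quit" is a built-in App action
    ]

    def action_abort(self) -> None:
        # Each binding maps to an action_<name> method; only one stub is shown here.
        ...
```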
Hardware:
- NVIDIA GPU with CUDA 12.x support
- ~4GB VRAM (Parakeet TDT 0.6B)
- Microphone
Software (handled by Nix flake):
- Python 3.13+
- CUDA 12.x + cuDNN
- PulseAudio/PipeWire + PortAudio
API Keys:
- ANTHROPIC_API_KEY for Claude refinement
# direnv auto-loads the environment on cd
cd barkdown
# Install Python dependencies (auto-runs via devenv)
uv sync --extra dev
# Run the app
barkdown
# Run tests (use cuda-uv for GPU tests)
cuda-uv run pytest
# Type checking
pyright
# Linting
ruff check .
Ensure CUDA 12.x, cuDNN, and PortAudio are installed system-wide, then:
uv sync --extra dev
uv run barkdown
- Async-first — All pipeline components use async/await with thread executors for blocking I/O
- Process isolation — The pipeline runs in a subprocess because PyTorch/CUDA work can stall the asyncio event loop even when dispatched to executors
- Event-driven — State machine with explicit transitions and events
- Lazy loading — Heavy dependencies (NeMo, OpenWakeWord) imported on first use for fast startup
- Non-blocking TUI — Textual message passing keeps UI responsive
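To make the lazy-loading and non-blocking points concrete, here is a hedged sketch of deferring the NeMo import until first use and pushing blocking inference off the event loop. The NeMo calls follow its documented from_pretrained/transcribe usage but are not taken from the project, and as noted above the real pipeline goes further and isolates this work in a subprocess.

```python
# Illustrative lazy loading plus off-loop inference; not barkdown's code.
import asyncio
from functools import cache

@cache
def _load_asr_model():
    # Importing NeMo (and initialising CUDA) is expensive, so defer it to the
    # first transcription instead of paying the cost at startup.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

async def transcribe_file(wav_path: str) -> str:
    # Blocking GPU work is pushed to a worker thread to keep the Textual event
    # loop responsive; the real pipeline runs in a separate process entirely.
    model = await asyncio.to_thread(_load_asr_model)
    (hyp,) = await asyncio.to_thread(model.transcribe, [wav_path])
    return getattr(hyp, "text", hyp)   # newer NeMo returns Hypothesis objects
```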
MIT