Voice-to-prompt pipeline: wake word detection → streaming STT → Claude refinement.
flowchart LR
subgraph Input
M[🎙️ Microphone]
end
subgraph Pipeline
W[Wake Word\nDetection]
T[Streaming\nTranscription]
C[Claude\nRefinement]
end
subgraph Output
CB[📋 Clipboard]
end
M --> W --> T --> C --> CB
- Wake word activation — Say "Hey Jarvis" to start (hands-free)
- Real-time streaming — See transcription as you speak
- Best-in-class accuracy — Parakeet TDT 0.6B v3 (#1 on HuggingFace Open ASR Leaderboard)
- Auto endpoint detection — Recording stops automatically when you stop speaking
- Claude refinement — Transform raw dictation into polished LLM prompts
- GPU-accelerated — <100ms inference on RTX 4090
The project uses a Nix flake with devenv to provide a reproducible development environment with CUDA support.
Prerequisites:
- Nix with flakes enabled
- direnv with nix-direnv
- NVIDIA GPU with drivers installed
# Clone and enter the directory
git clone https://github.com/lokaltog/barkdown.git
cd barkdown
# Allow direnv (loads the flake automatically)
direnv allow
# The environment provides:
# - Python 3.13 with uv
# - CUDA 12.x + cuDNN
# - PortAudio (connects to PulseAudio/PipeWire)
# - Audio tools (ffmpeg, sox)
On first load, direnv activates the flake and displays:
🎤 Parakeet ASR Development Environment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python: Python 3.13.x
CUDA: 12.x.x
cuDNN: 9.x.x
GPU: NVIDIA GeForce RTX 4090
# Requires Python 3.13+, CUDA 12.x, and system audio libraries
pip install -e .
# or with uv
uv sync
barkdown # Default loop mode
barkdown --auto-refine # Skip confirmation, auto-send to Claude
barkdown --studio # Studio preset (aggressive VAD)
barkdown --long-form # Long recording preset
barkdown --no-loop # Single transcription then exit
flowchart TB
subgraph TUI["Terminal UI (Textual)"]
APP[BarkdownApp]
PC[PipelineController]
SP[Spectrogram]
HP[History Panel]
PP[Preview Panel]
end
subgraph Pipeline["Pipeline (Async)"]
ORCH[PipelineOrchestrator]
WWD[WakeWordDetector\nOpenWakeWord/CPU]
STR[StreamingTranscriber\nParakeet TDT/GPU]
REF[ClaudeRefiner\nclaude-agent-sdk]
VAD[Silero VAD]
end
subgraph Persistence
HIST[(history.json)]
end
APP --> PC
PC <-->|events| ORCH
ORCH --> WWD
ORCH --> STR
ORCH --> REF
STR --> VAD
APP --> SP
APP --> HP
APP --> PP
HP <--> HIST
| Component | Technology | Purpose |
|---|---|---|
| Wake Word | OpenWakeWord (CPU) | Detects "Hey Jarvis" activation phrase |
| ASR | NVIDIA Parakeet TDT 0.6B (GPU) | Streaming speech-to-text |
| VAD | Silero VAD | Voice activity detection for auto-stop |
| Refinement | Claude via claude-agent-sdk | Transforms dictation into LLM prompts |
| TUI | Textual | Async terminal interface |
| Persistence | JSON | History storage at ~/.local/share/barkdown/history.json |
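For orientation, here is a minimal sketch of what the CPU-only wake-word stage could look like using OpenWakeWord with sounddevice for capture. It is not the project's WakeWordDetector: the function name is illustrative, and the frame size (80 ms at 16 kHz) and 0.5 threshold simply mirror common OpenWakeWord usage and the config defaults shown later.

```python
# Illustrative wake-word loop: feed 80 ms frames of 16 kHz mono audio into
# OpenWakeWord and return once any score crosses the threshold. Assumes the
# pretrained "hey_jarvis" model files have already been downloaded.
import numpy as np
import sounddevice as sd
from openwakeword.model import Model

def wait_for_wake_word(threshold: float = 0.5, frame_samples: int = 1280) -> None:
    oww = Model(wakeword_models=["hey_jarvis"])
    with sd.InputStream(samplerate=16000, channels=1, dtype="int16",
                        blocksize=frame_samples) as stream:
        while True:
            frame, _ = stream.read(frame_samples)        # (frame_samples, 1) int16
            scores = oww.predict(np.squeeze(frame))      # {model_name: score}
            if any(score >= threshold for score in scores.values()):
                return                                   # hand off to the recorder
```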
The pipeline operates as an event-driven state machine:
stateDiagram-v2
[*] --> IDLE
IDLE --> LOADING: StartListening
LOADING --> LISTENING_WAKE: ModelsLoaded
LISTENING_WAKE --> RECORDING: WakeWordDetected
LISTENING_WAKE --> IDLE: AbortOperation
RECORDING --> TRANSCRIBING: SilenceDetected
RECORDING --> IDLE: AbortOperation
TRANSCRIBING --> AWAITING_CONFIRM: !auto_refine
TRANSCRIBING --> REFINING: auto_refine
TRANSCRIBING --> IDLE: AbortOperation
AWAITING_CONFIRM --> REFINING: ConfirmRefinement(yes)
AWAITING_CONFIRM --> COMPLETE: ConfirmRefinement(no)
AWAITING_CONFIRM --> IDLE: AbortOperation
REFINING --> COMPLETE: RefinementDone
REFINING --> IDLE: AbortOperation
COMPLETE --> LISTENING_WAKE: ContinueLoop
COMPLETE --> IDLE: !loop_mode
COMPLETE --> REFINING: RetryRefinement
ERROR --> IDLE: ResetFromError
note right of LISTENING_WAKE: CPU-only\n(OpenWakeWord)
note right of RECORDING: VAD monitors\nfor silence
note right of TRANSCRIBING: GPU inference\n(Parakeet)
note right of REFINING: Claude API call
| State | Description |
|---|---|
| IDLE | Waiting for user to initiate |
| LOADING | Loading ML models (ASR, wake word) |
| LISTENING_WAKE | Listening for wake word on CPU |
| RECORDING | Recording audio, VAD monitors for silence |
| TRANSCRIBING | Processing audio through Parakeet ASR |
| AWAITING_CONFIRM | Showing transcription, waiting for user to confirm refinement |
| REFINING | Claude is transforming raw text into polished prompt |
| COMPLETE | Output ready, copied to clipboard |
| ERROR | Error state, can reset to IDLE |
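The auto-stop behaviour in RECORDING can be pictured as Silero VAD scoring short audio chunks and tripping an endpoint once speech probability has stayed below vad_threshold for silence_threshold_sec. The sketch below follows Silero's published torch.hub usage; the helper itself is illustrative and not barkdown's code.

```python
# Hedged sketch of VAD-based endpointing: Silero VAD expects 512-sample
# (32 ms) float32 chunks at 16 kHz and returns a per-chunk speech probability.
import torch

vad_model, _ = torch.hub.load("snakers4/silero-vad", "silero_vad")

def should_stop(chunks, vad_threshold=0.5, silence_threshold_sec=1.5, sample_rate=16000):
    """chunks: iterable of 512-sample float32 torch tensors at 16 kHz."""
    silence = 0.0
    for chunk in chunks:
        prob = vad_model(chunk, sample_rate).item()      # speech probability for this chunk
        silence = 0.0 if prob >= vad_threshold else silence + len(chunk) / sample_rate
        if silence >= silence_threshold_sec:
            return True                                  # endpoint reached: stop recording
    return False
```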
User Actions:
- StartListening — Begin the pipeline
- AbortOperation — Cancel current operation (works from any interruptible state)
- ConfirmRefinement — Confirm or skip Claude refinement
- RetryRefinement — Re-run refinement from COMPLETE state
- ResetFromError — Clear error and return to IDLE
Internal Events:
- ModelsLoaded — Models ready, proceed to listening
- WakeWordDetected — Wake phrase detected, start recording
- SilenceDetected — Silence threshold reached, stop recording
- TranscriptionComplete — ASR finished
- RefinementDone — Claude refinement complete
- ContinueLoop — Continue to next iteration (loop mode)
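A compact way to picture these states and events is an explicit transition table. The sketch below mirrors the diagram; the names come from the documentation above, but the table and dispatch helper are illustrative, not the project's actual PipelineOrchestrator.

```python
# Illustrative event-driven state machine matching the state diagram above.
from enum import Enum

State = Enum("State", [
    "IDLE", "LOADING", "LISTENING_WAKE", "RECORDING", "TRANSCRIBING",
    "AWAITING_CONFIRM", "REFINING", "COMPLETE", "ERROR",
])
Event = Enum("Event", [
    "START_LISTENING", "MODELS_LOADED", "WAKE_WORD_DETECTED", "SILENCE_DETECTED",
    "TRANSCRIPTION_COMPLETE", "CONFIRM_REFINEMENT", "REFINEMENT_DONE",
    "CONTINUE_LOOP", "RETRY_REFINEMENT", "ABORT_OPERATION", "RESET_FROM_ERROR",
])

# Unconditional transitions only; TRANSCRIBING and AWAITING_CONFIRM also branch
# on auto_refine / the user's yes-no answer, which is omitted here.
TRANSITIONS = {
    (State.IDLE, Event.START_LISTENING): State.LOADING,
    (State.LOADING, Event.MODELS_LOADED): State.LISTENING_WAKE,
    (State.LISTENING_WAKE, Event.WAKE_WORD_DETECTED): State.RECORDING,
    (State.RECORDING, Event.SILENCE_DETECTED): State.TRANSCRIBING,
    (State.REFINING, Event.REFINEMENT_DONE): State.COMPLETE,
    (State.COMPLETE, Event.CONTINUE_LOOP): State.LISTENING_WAKE,
    (State.COMPLETE, Event.RETRY_REFINEMENT): State.REFINING,
    (State.ERROR, Event.RESET_FROM_ERROR): State.IDLE,
}

def next_state(current, event):
    if event is Event.ABORT_OPERATION:
        return State.IDLE            # abort is legal from any interruptible state
    return TRANSITIONS.get((current, event), State.ERROR)
```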
sequenceDiagram
autonumber
participant User
participant TUI as BarkdownApp
participant Orch as PipelineOrchestrator
participant Wake as WakeWordDetector
participant ASR as StreamingTranscriber
participant Claude as ClaudeRefiner
participant Clip as Clipboard
User->>TUI: Press Enter
TUI->>Orch: StartListening
Orch->>Orch: Load models (if needed)
Orch->>Wake: Start detection
User->>Wake: "Hey Jarvis"
Wake-->>Orch: WakeWordDetected
Orch->>ASR: Start streaming
loop While speaking
User->>ASR: Audio chunks
ASR-->>TUI: Partial transcription
ASR->>ASR: VAD monitoring
end
ASR-->>Orch: SilenceDetected
Orch->>ASR: Finalize
ASR-->>Orch: TranscriptionComplete
alt auto_refine = false
Orch-->>TUI: AWAITING_CONFIRM
TUI->>User: Show transcription
User->>TUI: Confirm (Y)
TUI->>Orch: ConfirmRefinement(yes)
end
Orch->>Claude: Refine raw text
Claude-->>Orch: RefinementDone
Orch-->>TUI: COMPLETE
TUI->>Clip: Copy refined text
TUI->>User: Show result
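The happy path in the sequence diagram corresponds roughly to an async loop like the one below. Only the ordering of steps comes from the diagram; the component objects (wake, asr, refiner) and their methods are placeholders for illustration, not the real barkdown classes.

```python
# Hypothetical happy-path orchestration loop matching the sequence diagram.
import asyncio
import pyperclip  # clipboard access; the real project may use a different mechanism

async def run_once(wake, asr, refiner, *, auto_refine: bool) -> str:
    await wake.wait_for_wake_word()                  # CPU-only wake-word stage
    async for partial in asr.stream():               # partial transcripts while speaking
        print(partial, end="\r")                     # the TUI would render these instead
    raw = await asr.finalize()                       # VAD silence ends the stream
    if auto_refine or await confirm("Refine with Claude?"):
        text = await refiner.refine(raw)             # Claude turns dictation into a prompt
    else:
        text = raw
    pyperclip.copy(text)                             # COMPLETE: result lands on the clipboard
    return text

async def confirm(prompt: str) -> bool:
    # Stand-in for the TUI's Y/N confirmation in AWAITING_CONFIRM.
    answer = await asyncio.to_thread(input, f"{prompt} [y/N] ")
    return answer.lower().startswith("y")
```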
Configuration via ~/.config/barkdown/config.toml or environment variables (BARKDOWN_*):
# Audio
sample_rate = 16000
channels = 1
# Wake word
wake_word_model = "hey_jarvis"
wake_word_threshold = 0.5
# ASR
parakeet_model = "nvidia/parakeet-tdt-0.6b-v3"
# VAD / Recording
vad_threshold = 0.5
silence_threshold_sec = 1.5
max_recording_sec = 60.0
# Claude
claude_model = "claude-opus-4-5-20251101"
auto_refine = false
# Audio processing
normalize_audio = false
strip_silence = true
Studio (--studio) — For high-quality studio equipment:
- Higher VAD threshold (0.7)
- Faster endpoint detection (1.0s silence)
- LUFS normalization enabled
Long-form (--long-form) — For meetings, lectures:
- Lower VAD threshold (0.3)
- Longer silence tolerance (3.0s)
- 1 hour max recording
- Auto-chunking for memory efficiency
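A minimal sketch of how the config file, BARKDOWN_* environment variables, and the preset flags could be merged is shown below. The key names match the example above, but the loader, the preset dictionaries, and the precedence order are assumptions for illustration, not barkdown's implementation.

```python
# Illustrative config merge: built-in defaults < config.toml < BARKDOWN_* env
# vars < preset overrides (precedence is an assumption).
import os
import tomllib
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "barkdown" / "config.toml"

PRESETS = {
    "studio":    {"vad_threshold": 0.7, "silence_threshold_sec": 1.0, "normalize_audio": True},
    "long_form": {"vad_threshold": 0.3, "silence_threshold_sec": 3.0, "max_recording_sec": 3600.0},
}

def load_config(preset: str | None = None) -> dict:
    config: dict = {"vad_threshold": 0.5, "silence_threshold_sec": 1.5, "max_recording_sec": 60.0}
    if CONFIG_PATH.exists():
        with CONFIG_PATH.open("rb") as f:
            config.update(tomllib.load(f))       # file overrides built-in defaults
    for key, current in list(config.items()):
        raw = os.environ.get(f"BARKDOWN_{key.upper()}")
        if raw is None:
            continue
        if isinstance(current, bool):
            config[key] = raw.lower() in ("1", "true", "yes")
        else:
            config[key] = type(current)(raw)     # coerce to the existing value's type
    if preset is not None:
        config.update(PRESETS[preset])           # --studio / --long-form applied last
    return config
```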
| Key | Action |
|---|---|
| Enter | Start listening / Confirm |
| Y | Confirm refinement |
| N | Skip refinement |
| Escape | Abort current operation |
| R | Retry refinement |
| C | Copy refined text |
| Shift+C | Copy raw transcription |
| ? | Show help |
| Q | Quit |
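In a Textual app these shortcuts would typically be declared as BINDINGS on the App class. The sketch below is illustrative: the action names are not the real BarkdownApp methods, and the exact key names for shifted and punctuation keys may differ slightly.

```python
# Rough sketch of declaring the key bindings above in Textual.
from textual.app import App

class BarkdownAppSketch(App):
    BINDINGS = [
        ("enter", "start_or_confirm", "Start listening / Confirm"),
        ("y", "confirm_refinement", "Confirm refinement"),
        ("n", "skip_refinement", "Skip refinement"),
        ("escape", "abort", "Abort current operation"),
        ("r", "retry_refinement", "Retry refinement"),
        ("c", "copy_refined", "Copy refined text"),
        ("C", "copy_raw", "Copy raw transcription"),   # Shift+C typically arrives as "C"
        ("question_mark", "show_help", "Show help"),
        ("q", "quit", "Quit"),                         # "quit" is a built-in App action
    ]

    def action_abort(self) -> None:
        # Each binding maps to an action_<name> method; only one stub is shown here.
        ...
```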
Hardware:
- NVIDIA GPU with CUDA 12.x support
- ~4GB VRAM (Parakeet TDT 0.6B)
- Microphone
Software (handled by Nix flake):
- Python 3.13+
- CUDA 12.x + cuDNN
- PulseAudio/PipeWire + PortAudio
API Keys:
- ANTHROPIC_API_KEY for Claude refinement
# direnv auto-loads the environment on cd
cd barkdown
# Install Python dependencies (auto-runs via devenv)
uv sync --extra dev
# Run the app
barkdown
# Run tests (use cuda-uv for GPU tests)
cuda-uv run pytest
# Type checking
pyright
# Linting
ruff check .
Ensure CUDA 12.x, cuDNN, and PortAudio are installed system-wide, then:
uv sync --extra dev
uv run barkdown
- Async-first — All pipeline components use async/await with thread executors for blocking I/O
- Process isolation — The pipeline runs in a subprocess because PyTorch/CUDA work can stall the asyncio event loop even when dispatched to executors
- Event-driven — State machine with explicit transitions and events
- Lazy loading — Heavy dependencies (NeMo, OpenWakeWord) imported on first use for fast startup
- Non-blocking TUI — Textual message passing keeps UI responsive
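To make the lazy-loading and non-blocking points concrete, here is a hedged sketch of deferring the NeMo import until first use and pushing blocking inference off the event loop. The NeMo calls follow its documented from_pretrained/transcribe usage but are not taken from the project, and as noted above the real pipeline goes further and isolates this work in a subprocess.

```python
# Illustrative lazy loading plus off-loop inference; not barkdown's code.
import asyncio
from functools import cache

@cache
def _load_asr_model():
    # Importing NeMo (and initialising CUDA) is expensive, so defer it to the
    # first transcription instead of paying the cost at startup.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

async def transcribe_file(wav_path: str) -> str:
    # Blocking GPU work is pushed to a worker thread to keep the Textual event
    # loop responsive; the real pipeline runs in a separate process entirely.
    model = await asyncio.to_thread(_load_asr_model)
    (hyp,) = await asyncio.to_thread(model.transcribe, [wav_path])
    return getattr(hyp, "text", hyp)   # newer NeMo returns Hypothesis objects
```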
MIT