Whisper performance + fast-paste migration + developer UX #196

Open
ericmason wants to merge 15 commits into SuperCmdLabs:main from ericmason:feat/whisper-performance

ericmason commented Apr 10, 2026

Summary

This branch lands a set of whisper + paste-performance improvements plus the dev-UX changes needed to test them reliably.

Core feature work:

  • Persistent whisper.cpp server + native fast-paste for dictation (initial commits: Metal GPU backend, native mic capture with getUserMedia fallback, FFT waveform visualization, initial prompt setting, double-tap resend, model size selector).
  • Finish the fast-paste migration: every paste/type path (clipboard history, snippet paste, snippet keyword auto-expand, prompt-apply-generated-text, replace-live-text, paste-file, paste-text) now uses the native CGEvent addon instead of osascript. osascript is preserved as a safety-net fallback.

Bundled bug fixes and dev UX:

  • Fix (app activation in native addon): `activateWithOptions:0` is a no-op on modern macOS when the caller isn't already frontmost, which we aren't after `mainWindow.hide()`. The addon now uses `NSApplicationActivateIgnoringOtherApps`, so ⌘V actually fires into the previously-frontmost app instead of whatever macOS focused next.
  • Fix (DevTools shortcut scope): ⌘⌥I was registered via `globalShortcut.register` and intercepted the combo system-wide in dev mode. It is now scoped to SuperCmd's own `webContents` via `before-input-event`, so Chrome/VS Code ⌘⌥I works normally again.
  • Dev UX (`open -n` launch scripts): running `./node_modules/.bin/electron .` attributes TCC to whatever terminal/SSH daemon spawned it. `npm run dev:gui` uses `open -n --env ... -a Electron.app` so launchd roots TCC responsibility in Electron itself and the Accessibility prompt correctly says "Electron."
  • Feat (onboarding re-check): if the user has seen onboarding but is missing a core permission (Accessibility or Input Monitoring), the launcher reopens at the Permissions step instead of replaying all 7 intro steps. Handles the common case of reinstalling Electron in dev and losing the per-binary TCC grant.
  • Whisper stability: extra logging around hotkey registration, hold watcher, and type-live; whisper-speak-toggle debounce; first-launch auto-open is now gated on !hasSeenOnboarding.

Whisper indicator: never steal focus (latest update)

Added after initial review to fix a long-standing side effect where showing the whisper overlay blurred the previously-frontmost app:

  • Detached whisper / memory-status overlays are now fully passive on macOS: created with `type: 'panel'` + `show: false`, then surfaced via `childWindow.showInactive()` on `ready-to-show`. The default `show: true` path routes through `makeKeyAndOrderFront:` and triggers `[NSApp activate]` even with `focusable: false`, which was blurring launcher-style apps the user had open and causing them to hide-on-blur.
  • Removed the 50/180/360 ms `activateLastFrontmostApp()` focus-restore loop that ran whenever a detached overlay opened with `preserveFocusWhenHidden`. With the inactive panel the overlay never takes focus, so the "restore" loop was itself re-ordering windows in other apps for no reason.
  • Shortened the whisper double-tap resend window from 3000 ms → 300 ms so a real double-tap reads as resend and a casual second trigger doesn't replay the last transcript.
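
The passive-overlay pattern above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `type`, `show`, `focusable`, `ready-to-show` and `showInactive()` are the Electron names mentioned in the bullets, while `OverlayOptions`, `OverlayWindowLike`, `passiveOverlayOptions` and `showWithoutStealingFocus` are hypothetical names, and the window is passed in as a small interface so the sketch runs without Electron.

```typescript
// Hypothetical stand-ins so the sketch runs without Electron installed.
interface OverlayOptions {
  type: string;
  show: boolean;
  focusable: boolean;
}

interface OverlayWindowLike {
  once(event: "ready-to-show", listener: () => void): void;
  showInactive(): void;
}

// Options for an overlay that should never become the key window.
function passiveOverlayOptions(): OverlayOptions {
  return {
    type: "panel", // macOS panel window type
    show: false, // avoid the default show-on-create path
    focusable: false,
  };
}

// Surface the overlay only once it is ready, and without activating the app.
function showWithoutStealingFocus(win: OverlayWindowLike): void {
  win.once("ready-to-show", () => win.showInactive());
}
```

In the real app the object passed to `showWithoutStealingFocus` would presumably be the `BrowserWindow` created with these options.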

Why the paste migration matters

The osascript paste path has three problems:

  1. Slow — ~200–300ms per osascript spawn.
  2. Permission cliff — requires Automation → System Events, a separate TCC grant that many users reject. The prompt text is unhelpful.
  3. Silent failure — error 1002 ("osascript is not allowed to send keystrokes") is swallowed and produces no user-visible feedback.

The native CGEvent path only needs Accessibility (which users expect for a launcher) and is roughly an order of magnitude faster. Every leaf function now tries the fast path first and falls back to osascript only if the addon is unavailable or fails — so behavior is strictly a superset of what was there before.
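
As a rough sketch of the "fast path first, osascript as fallback" shape described above: only `activateAndPaste` is a name taken from this PR, and its boolean return is an assumption based on the "unavailable or returns false" wording in the commit message. `FastPasteAddon`, `PastePath` and `pasteWithFallback` are hypothetical names for illustration.

```typescript
// Assumed addon surface: the PR names `activateAndPaste`; the boolean
// return value is an assumption for this sketch.
interface FastPasteAddon {
  activateAndPaste(): boolean;
}

type PastePath = "fast-paste" | "osascript";

// Try the in-process CGEvent path first. Fall back to osascript only when
// the addon is missing, reports failure, or throws.
function pasteWithFallback(
  addon: FastPasteAddon | null,
  osascriptPaste: () => void,
): PastePath {
  if (addon !== null) {
    try {
      if (addon.activateAndPaste()) return "fast-paste";
    } catch {
      // Treat an addon error like a failed paste and use the slow path.
    }
  }
  osascriptPaste();
  return "osascript";
}
```

Returning the path taken makes it easy to log which route a paste used, in the spirit of the `[Whisper][whisper-type-live]` logging added in this branch.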

Test plan

  • First-launch onboarding still appears for fresh installs
  • Revoke Accessibility in System Settings, relaunch → onboarding opens at Permissions step
  • Clipboard history → Enter pastes into previously-frontmost app
  • Snippet paste via Enter works
  • Snippet keyword auto-expand (e.g. `;email`) works
  • Whisper dictation types into target app (both streaming and final paste)
  • Triggering the whisper overlay while another launcher-style app is frontmost does not cause that app to hide
  • Double-tapping the whisper hotkey within 300 ms resends the last transcript; a slower second tap starts a fresh recording
  • Global AI prompt "apply generated text" replaces selection
  • ⌘⌥I in SuperCmd launcher opens SuperCmd DevTools
  • ⌘⌥I in Chrome/VS Code opens their DevTools (not SuperCmd)

🤖 Generated with Claude Code

ericmason changed the title from "perf: persistent whisper.cpp server + native fast-paste for dictation" to "Whisper performance + fast-paste migration + developer UX" Apr 14, 2026
ericmason and others added 15 commits April 14, 2026 12:49
Two performance improvements that eliminate the main latency sources
in the whisper dictation flow:

1. Persistent whisper.cpp server process — keeps the model loaded in
   memory between transcriptions instead of spawning a new process
   (and reloading ~150MB from disk) for every transcription. Mirrors
   the existing Parakeet/Qwen3 server pattern. Pre-warms on app
   startup so the first session is instant.

2. Native N-API fast-paste addon for text insertion — replaces the
   osascript-based activate + paste in type-text-live and
   whisper-type-text-live with an in-process CGEvent call. Eliminates
   ~200-300ms of process spawn overhead per text entry.
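
The persistent-server idea in point 1 boils down to "spawn once, reuse while alive". A minimal sketch under stated assumptions: `ServerHandle` and `PersistentServer` are hypothetical names, and the real process handle (a `ChildProcess` per a later commit) is replaced by a plain object so the example is self-contained.

```typescript
// Hypothetical stand-in for a child-process handle.
interface ServerHandle {
  alive: boolean;
}

class PersistentServer {
  private handle: ServerHandle | null = null;
  private readonly spawn: () => ServerHandle;

  constructor(spawn: () => ServerHandle) {
    this.spawn = spawn;
  }

  // Reuse the running server; spawn only when none exists or it has died.
  ensure(): ServerHandle {
    if (this.handle === null || !this.handle.alive) {
      this.handle = this.spawn();
    }
    return this.handle;
  }
}
```

Pre-warming on startup would then amount to calling `ensure()` once before the first dictation.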

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-close

Major whisper dictation improvements:

- Enabled Metal GPU acceleration (use_gpu=true, flash_attn=true) for
  whisper.cpp — transcription ~4x faster (4s → ~500ms on M1 Pro)
- Native mic capture via AVAudioEngine in the whisper-transcriber server,
  bypassing Chromium's getUserMedia (~400ms → ~1ms startup)
- Model size selection (tiny/base/small/medium/large) with download
  validation and cached settings to avoid repeated disk reads
- Auto-close whisper bar after dictation (on by default, configurable)
- Fixed warmup banner flashing when server is already warm
- Show proper error when model not downloaded instead of "warming up"
- Singleton whisper overlay window to prevent duplicates
- Coachmark hint only shows once per session instead of every dictation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The whisper-transcriber server now computes 13-band FFT from the live
mic input and emits frequency levels every 50ms. The renderer drives
the waveform bars from real audio data instead of random animation.
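
To illustrate how a frame of samples can become 13 bar levels, here is a small sketch. It is not the server's implementation (which is native and uses an FFT): the direct DFT, the equal-width band split, and the names `BAND_COUNT`, `dftMagnitudes` and `bandLevels` are assumptions chosen to keep the example short.

```typescript
const BAND_COUNT = 13;

// Magnitudes of DFT bins 1..N/2 (bin 0, the DC offset, is skipped).
// A direct DFT is O(N^2); a real implementation would use an FFT.
function dftMagnitudes(samples: number[]): number[] {
  const n = samples.length;
  const mags: number[] = [];
  for (let k = 1; k <= n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    mags.push(Math.hypot(re, im) / n);
  }
  return mags;
}

// Average the bins into BAND_COUNT equal-width bands, one level per bar.
function bandLevels(samples: number[]): number[] {
  const mags = dftMagnitudes(samples);
  const sums = new Array<number>(BAND_COUNT).fill(0);
  const counts = new Array<number>(BAND_COUNT).fill(0);
  mags.forEach((m, i) => {
    const band = Math.min(BAND_COUNT - 1, Math.floor((i * BAND_COUNT) / mags.length));
    sums[band] += m;
    counts[band] += 1;
  });
  return sums.map((s, b) => (counts[b] > 0 ? s / counts[b] : 0));
}
```

The renderer would then receive one such array of levels per 50 ms tick and map each value to a bar height.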

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Squelch low-level background noise (levels below 15% are dampened to 20%)
and halve the minimum bar heights, so quiet input gives nearly flat bars
and speech gives a prominent response. More dynamic range overall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background noise below -25dB now reads as zero at the source, so the
waveform bars stay flat when quiet. Speech still registers clearly.
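
A noise gate of this kind can be sketched as below. The -25 dB threshold is from the commit message. The amplitude-to-decibel conversion (20·log10 of a 0..1 linear level) and the names `NOISE_FLOOR_DB`, `toDecibels` and `gateLevel` are assumptions, since the commit does not say how the level is measured.

```typescript
const NOISE_FLOOR_DB = -25;

// Convert a linear amplitude in 0..1 to decibels relative to full scale.
// The small floor avoids log10(0) = -Infinity.
function toDecibels(amplitude: number): number {
  return 20 * Math.log10(Math.max(amplitude, 1e-9));
}

// Report anything below the noise floor as silence; pass the rest through.
function gateLevel(amplitude: number): number {
  return toDecibels(amplitude) < NOISE_FLOOR_DB ? 0 : amplitude;
}
```

Under this assumption the gate opens at a linear amplitude of about 0.056.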

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If the native AVAudioEngine capture fails (e.g. on older Macs or
because of permission issues), the app falls back to the Chromium
getUserMedia path instead of showing an error. This keeps dictation
working on macOS versions and hardware where native capture is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New "Initial Prompt" textarea in whisper settings to guide transcription
  with vocabulary, names, or style hints (sets whisper.cpp initial_prompt)
- Double-tap activation key within 3s of finishing dictation re-types the
  last transcript without re-recording — prevents losing long dictations
  that landed in the wrong input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Removed .claude/settings.local.json from tracking, added to .gitignore
- Extracted duplicated fast-paste addon code to shared tryFastPaste() helper
- Typed whisperCppServerProcess as ChildProcess | null instead of any
- Replaced untyped (window as any).__scWhisperNativeCapture with module-level variable
- Moved DOUBLE_TAP_WINDOW_MS to module level (was recreated every render)
- Fixed null stream dereference on warmup failure path
- Documented usleep trade-off in fast_paste.mm

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Vite's dev-server port auto-increments when 5173 is taken, so a static
`wait-on http://localhost:5173` is unreliable. The renderer and main
process now agree on the URL via `dist/.vite-dev-url`, written by a
Vite plugin on `listening` and read by main via `getDevServerBaseUrl()`.
`start:electron` waits for the file to exist before launching Electron.
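
The main-process side of that handshake might look roughly like this. It is a sketch only: the file read is left to the caller, and `resolveDevServerBaseUrl`, `DEFAULT_DEV_URL` and the fall-back-to-5173 behavior are assumptions, not the actual `getDevServerBaseUrl()` from this commit.

```typescript
// Assumed fallback when the URL file is missing or malformed.
const DEFAULT_DEV_URL = "http://localhost:5173";

// `fileContents` is the text of dist/.vite-dev-url, or null if it does not exist.
function resolveDevServerBaseUrl(fileContents: string | null): string {
  if (fileContents === null) return DEFAULT_DEV_URL;
  // Drop surrounding whitespace and any trailing slashes written by the plugin.
  const candidate = fileContents.trim().replace(/\/+$/, "");
  // Accept only a bare origin such as http://localhost:5174.
  return /^https?:\/\/[^\s/]+$/.test(candidate) ? candidate : DEFAULT_DEV_URL;
}
```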

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Harden `sendWhisperCppRequest` against a null `stdin` so we reject
  instead of throwing on a dead server process.
- Log hotkey registration outcome (success / failure / exception) so
  OS-level hotkey conflicts are visible in the console instead of
  silently swallowed.
- Log hold-watcher spawn + exit (code + signal) for whisper hold mode.
- Add speak-toggle toggle-off path: a second hotkey press now closes the
  whisper overlay (with a 350ms debounce to swallow the initial open).
- Gate the first-launch auto-open-launcher behavior on
  `!hasSeenOnboarding`; returning users stay in the background until the
  global hotkey is pressed.
- Log `[Whisper][whisper-type-live]` path taken (fast-paste success vs
  osascript fallback) to aid dictation debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously only the whisper type-live path used the native fast-paste
addon. Every other paste/type path still shelled out to osascript
(`keystroke "v" using command down` / `key code 51` loops), which has
three problems:

1. Spawning osascript is 200-300ms per call.
2. It requires the separate Automation -> System Events TCC grant,
   which many users deny (the prompt text is unhelpful).
3. macOS error 1002 ("osascript is not allowed to send keystrokes")
   is silently swallowed, producing no user-visible feedback.

This commit routes every paste/type path through the CGEvent addon,
falling back to osascript only when the addon is unavailable or
returns false:

- Shared module-level helpers: `tryFastPaste`, `tryFastType`,
  `tryFastReplaceAndPaste`, `tryFastReplaceAndType`, plus addon
  caching via `getFastPasteAddon`.
- Leaf functions `pasteTextToActiveApp`, `typeTextDirectly`,
  `replaceTextDirectly`, `replaceTextViaBackspaceAndPaste` now try
  the fast path first.
- `hideAndPaste` (clipboard history / snippet paste / paste-file /
  paste-text) uses the addon's `activateAndPaste`.
- `expandSnippetKeywordInPlace` (snippet keyword auto-expansion)
  now does backspaces + paste via CGEvent.
- Removed the duplicate inner `tryFastPaste` from the IPC handler
  block (dedup).

Native addon changes (`fast_paste.mm`):
- Add `NSApplicationActivateIgnoringOtherApps` to the app activation
  call. Without this, `activateWithOptions:0` is a no-op on modern
  macOS when the caller isn't already frontmost -- after our window
  hides, we aren't, so the target app never came to the front and
  the Cmd+V was posted into whatever macOS focused next.
- `postBackspaces(count)`: N backspace key events via CGEvent.
- `postText(text)`: Unicode string input via
  `CGEventKeyboardSetUnicodeString`, handles surrogate pairs.
- `isAccessibilityTrusted()` / `requestAccessibilityTrust()`: expose
  the AX trust check + prompt to JS (used by onboarding).
- stderr diagnostic logging for activation failures.

Startup logs the Accessibility trust state once so dev-mode silent
failures (when the Electron binary lacks the grant) are observable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Cmd+Option+I DevTools shortcut was registered via
`globalShortcut.register` in dev mode, which intercepts the combo
system-wide -- pressing it in Chrome or VS Code would steal the combo
and pop open SuperCmd's DevTools instead of the focused app's.

Scoped the shortcut to per-webContents `before-input-event` handlers
so it only fires when a SuperCmd window has keyboard focus. Wired
`installDevToolsInputHandler` via `app.on('web-contents-created', ...)`
so it automatically covers all current and future windows (main,
settings, detached popups, extension views).
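
A per-`webContents` handler of this kind could be sketched as follows. `installDevToolsInputHandler`, `before-input-event` and `toggleDevTools()` are real names (the first from this commit, the others from Electron). The body, the local interfaces `KeyInput`, `PreventableEvent` and `WebContentsLike`, and the choice to match on `code === "KeyI"` are assumptions made so the example runs without Electron.

```typescript
// Minimal local stand-ins for the Electron types involved.
interface KeyInput {
  type: string; // "keyDown" or "keyUp" in Electron
  code: string; // physical key, e.g. "KeyI"
  meta: boolean; // Cmd on macOS
  alt: boolean; // Option on macOS
}

interface PreventableEvent {
  preventDefault(): void;
}

interface WebContentsLike {
  on(
    event: "before-input-event",
    listener: (event: PreventableEvent, input: KeyInput) => void,
  ): void;
  toggleDevTools(): void;
}

// Matching on `code` rather than `key` sidesteps Option-modified characters.
function isDevToolsCombo(input: KeyInput): boolean {
  return input.type === "keyDown" && input.meta && input.alt && input.code === "KeyI";
}

// Only fires while this webContents has keyboard focus, unlike a global shortcut.
function installDevToolsInputHandler(contents: WebContentsLike): void {
  contents.on("before-input-event", (event, input) => {
    if (!isDevToolsCombo(input)) return;
    event.preventDefault();
    contents.toggleDevTools();
  });
}
```

Wiring it up from `app.on('web-contents-created', ...)`, as the commit describes, would pass each newly created `webContents` to this function.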

Also exposes `webContents` from the top-level electron destructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`./node_modules/.bin/electron .` runs Electron as a CLI child of
whatever shell spawned it. macOS attributes TCC (Accessibility, Input
Monitoring, etc.) to the "responsible process" at the top of the
spawn chain -- for terminal launches that's Terminal/iTerm, and for
SSH sessions it's `sshd-keygen-wrapper`. Granting Accessibility to
either of those would be a severe security hole (every future
terminal/SSH child would inherit full keystroke control), so the
Accessibility prompt for dev Electron was effectively unusable.

`open -n` hands off to `launchd`, which creates a fresh TCC
responsibility chain rooted at the launched app itself. The prompt
correctly identifies "Electron" and the grant is scoped to that
exact binary signature.

Scripts added:
- `dev:gui` -- build, watch, run Vite, and launch via `open -n
  --env ... -a Electron.app --args "$PWD"`.
- `start:electron:gui` -- the actual `open -n` invocation.
- `stop:dev` -- `pkill -9 -f "$PWD/node_modules/electron"`,
  scoped to this project's Electron so it won't touch Slack,
  VS Code, Discord, etc.
- `restart:dev` -- `stop:dev && dev:gui`.
- `build:native:fast-paste` -- rebuild + copy just the fast-paste
  addon (~2s) for iteration, vs the full `build:native` which also
  rebuilds 8 Swift helpers + whispercpp + parakeet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sing

Users who completed onboarding once but later lose a core permission
(revoked it in System Settings, reinstalled Electron during dev,
reset TCC, etc.) had no signal -- fast-paste silently fell back to
the slow osascript path (if Automation was granted) or silently
failed outright. They'd have to dig through Settings to find a
"grant permissions" flow.

On launcher mount, the launcher now calls `check-onboarding-permissions`
via IPC. If `hasSeenOnboarding` is true but Accessibility or Input
Monitoring is missing, it re-opens the existing onboarding at the
Permissions step (index 3) rather than replaying all 7 intro steps.
Feature-specific permissions (microphone / speech-recognition /
home-folder) don't force re-onboarding -- users who don't use those
features shouldn't be nagged.

- Adds `initialStepIndex?: number` prop to `OnboardingExtension`;
  initializes the step state from it with bounds clamping.
- Exports `ONBOARDING_STEP_PERMISSIONS` so callers don't hard-code
  the step index.
- `onClose` / `onComplete` now also clear the initial-step state so
  subsequent natural openings start at step 0.
- Shortcut-fix gate stays tied to true first-launch only; re-entry
  for permission re-grant doesn't block on shortcut validation.
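
The decision logic described above can be sketched like this. `ONBOARDING_STEP_PERMISSIONS` and its value 3 come from this commit. The total of 7 steps is read from the PR summary and treated as an assumption, and `OnboardingPermissions`, `resolveOnboardingStep` and `clampStepIndex` are hypothetical names.

```typescript
const ONBOARDING_STEP_PERMISSIONS = 3;
const ONBOARDING_STEP_COUNT = 7; // assumption: 7 steps in total

interface OnboardingPermissions {
  accessibility: boolean;
  inputMonitoring: boolean;
  microphone: boolean; // feature-specific: never forces re-onboarding
}

// Returns the step to open onboarding at, or null to leave the user alone.
function resolveOnboardingStep(
  hasSeenOnboarding: boolean,
  perms: OnboardingPermissions,
): number | null {
  if (!hasSeenOnboarding) return 0;
  const coreMissing = !perms.accessibility || !perms.inputMonitoring;
  return coreMissing ? ONBOARDING_STEP_PERMISSIONS : null;
}

// Bounds clamping for an optional `initialStepIndex` prop.
function clampStepIndex(index: number | undefined): number {
  if (index === undefined || !Number.isFinite(index)) return 0;
  return Math.min(Math.max(Math.trunc(index), 0), ONBOARDING_STEP_COUNT - 1);
}
```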

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… apps

- Make the detached whisper / memory-status overlay truly inert on macOS:
  use `type: 'panel'`, create with `show: false`, then `showInactive()` on
  `ready-to-show`. The default `show: true` path routed through
  `makeKeyAndOrderFront:` and triggered `[NSApp activate]` even with
  `focusable: false`, which blurred whichever app was frontmost and
  caused launcher-style apps (e.g. Keepers) to hide-on-blur.
- Remove the 50/180/360 ms `activateLastFrontmostApp()` focus-restore
  loop that fired whenever a detached overlay opened with
  `preserveFocusWhenHidden`. With the inactive panel, nothing steals
  focus — so the restore loop was itself re-ordering windows in other
  apps for no reason.
- Shorten the whisper double-tap resend window from 3000 ms to 300 ms
  so a real double-tap reads as resend and a casual second trigger
  doesn't replay the last transcript.
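
The resend-window check in the last bullet amounts to a timestamp comparison. A minimal sketch: `DOUBLE_TAP_WINDOW_MS` is the constant named in an earlier commit and 300 ms is the new value, while `HotkeyAction`, `classifyHotkeyPress` and the inclusive comparison are assumptions.

```typescript
const DOUBLE_TAP_WINDOW_MS = 300;

type HotkeyAction = "resend-last-transcript" | "start-recording";

// Decide what a hotkey press means, given when the last dictation finished.
function classifyHotkeyPress(
  lastDictationFinishedAt: number | null,
  now: number,
  hasLastTranscript: boolean,
): HotkeyAction {
  if (lastDictationFinishedAt === null || !hasLastTranscript) {
    return "start-recording";
  }
  const elapsed = now - lastDictationFinishedAt;
  return elapsed >= 0 && elapsed <= DOUBLE_TAP_WINDOW_MS
    ? "resend-last-transcript"
    : "start-recording";
}
```

With the previous 3000 ms value, a press 1.4 s after finishing would have replayed the transcript; with 300 ms it starts a new recording.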
ericmason force-pushed the feat/whisper-performance branch from 3cd0219 to d0179a7 April 14, 2026 20:24