Whisper performance + fast-paste migration + developer UX#196
Open
ericmason wants to merge 15 commits intoSuperCmdLabs:mainfrom
Open
Whisper performance + fast-paste migration + developer UX#196ericmason wants to merge 15 commits intoSuperCmdLabs:mainfrom
ericmason wants to merge 15 commits intoSuperCmdLabs:mainfrom
Conversation
Two performance improvements that eliminate the main latency sources in the whisper dictation flow: 1. Persistent whisper.cpp server process — keeps the model loaded in memory between transcriptions instead of spawning a new process (and reloading ~150MB from disk) for every transcription. Mirrors the existing Parakeet/Qwen3 server pattern. Pre-warms on app startup so the first session is instant. 2. Native N-API fast-paste addon for text insertion — replaces the osascript-based activate + paste in type-text-live and whisper-type-text-live with an in-process CGEvent call. Eliminates ~200-300ms of process spawn overhead per text entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-close Major whisper dictation improvements: - Enabled Metal GPU acceleration (use_gpu=true, flash_attn=true) for whisper.cpp — transcription ~4x faster (4s → ~500ms on M1 Pro) - Native mic capture via AVAudioEngine in the whisper-transcriber server, bypassing Chromium's getUserMedia (~400ms → ~1ms startup) - Model size selection (tiny/base/small/medium/large) with download validation and cached settings to avoid repeated disk reads - Auto-close whisper bar after dictation (on by default, configurable) - Fixed warmup banner flashing when server is already warm - Show proper error when model not downloaded instead of "warming up" - Singleton whisper overlay window to prevent duplicates - Coachmark hint only shows once per session instead of every dictation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The whisper-transcriber server now computes 13-band FFT from the live mic input and emits frequency levels every 50ms. The renderer drives the waveform bars from real audio data instead of random animation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Squelch low-level background noise (below 15% gets dampened to 20%) and halved minimum bar heights so quiet = nearly flat bars, speaking = prominent response. More dynamic range overall. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background noise below -25dB now reads as zero at the source, so the waveform bars stay flat when quiet. Speech still registers clearly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If the native AVAudioEngine capture fails (e.g. on older Macs or permission issues), falls back to the Chromium getUserMedia path instead of showing an error. Ensures compatibility across all macOS versions and hardware. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New "Initial Prompt" textarea in whisper settings to guide transcription with vocabulary, names, or style hints (sets whisper.cpp initial_prompt) - Double-tap activation key within 3s of finishing dictation re-types the last transcript without re-recording — prevents losing long dictations that landed in the wrong input Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Removed .claude/settings.local.json from tracking, added to .gitignore - Extracted duplicated fast-paste addon code to shared tryFastPaste() helper - Typed whisperCppServerProcess as ChildProcess | null instead of any - Replaced untyped (window as any).__scWhisperNativeCapture with module-level variable - Moved DOUBLE_TAP_WINDOW_MS to module level (was recreated every render) - Fixed null stream dereference on warmup failure path - Documented usleep trade-off in fast_paste.mm Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Vite's dev-server port auto-increments when 5173 is taken, so a static `wait-on http://localhost:5173` is unreliable. The renderer and main process now agree on the URL via `dist/.vite-dev-url`, written by a Vite plugin on `listening` and read by main via `getDevServerBaseUrl()`. `start:electron` waits for the file to exist before launching Electron. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Harden `sendWhisperCppRequest` against a null `stdin` so we reject instead of throwing on a dead server process. - Log hotkey registration outcome (success / failure / exception) so OS-level hotkey conflicts are visible in the console instead of silently swallowed. - Log hold-watcher spawn + exit (code + signal) for whisper hold mode. - Add speak-toggle toggle-off path: a second hotkey press now closes the whisper overlay (with a 350ms debounce to swallow the initial open). - Gate the first-launch auto-open-launcher behavior on `!hasSeenOnboarding`; returning users stay in the background until the global hotkey is pressed. - Log `[Whisper][whisper-type-live]` path taken (fast-paste success vs osascript fallback) to aid dictation debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously only the whisper type-live path used the native fast-paste
addon. Every other paste/type path still shelled out to osascript
(`keystroke "v" using command down` / `key code 51` loops), which has
three problems:
1. Spawning osascript is 200-300ms per call.
2. It requires the separate Automation -> System Events TCC grant,
which many users deny (the prompt text is unhelpful).
3. macOS error 1002 ("osascript is not allowed to send keystrokes")
is silently swallowed, producing no user-visible feedback.
This commit routes every paste/type path through the CGEvent addon,
falling back to osascript only when the addon is unavailable or
returns false:
- Shared module-level helpers: `tryFastPaste`, `tryFastType`,
`tryFastReplaceAndPaste`, `tryFastReplaceAndType`, plus addon
caching via `getFastPasteAddon`.
- Leaf functions `pasteTextToActiveApp`, `typeTextDirectly`,
`replaceTextDirectly`, `replaceTextViaBackspaceAndPaste` now try
the fast path first.
- `hideAndPaste` (clipboard history / snippet paste / paste-file /
paste-text) uses the addon's `activateAndPaste`.
- `expandSnippetKeywordInPlace` (snippet keyword auto-expansion)
now does backspaces + paste via CGEvent.
- Removed the duplicate inner `tryFastPaste` from the IPC handler
block (dedup).
Native addon changes (`fast_paste.mm`):
- Add `NSApplicationActivateIgnoringOtherApps` to the app activation
call. Without this, `activateWithOptions:0` is a no-op on modern
macOS when the caller isn't already frontmost -- after our window
hides, we aren't, so the target app never came to the front and
the Cmd+V posted into whatever macOS focused next.
- `postBackspaces(count)`: N backspace key events via CGEvent.
- `postText(text)`: Unicode string input via
`CGEventKeyboardSetUnicodeString`, handles surrogate pairs.
- `isAccessibilityTrusted()` / `requestAccessibilityTrust()`: expose
the AX trust check + prompt to JS (used by onboarding).
- stderr diagnostic logging for activation failures.
Startup logs the Accessibility trust state once so dev-mode silent
failures (when the Electron binary lacks the grant) are observable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Cmd+Option+I DevTools shortcut was registered via
`globalShortcut.register` in dev mode, which intercepts the combo
system-wide -- pressing it in Chrome or VS Code would steal the combo
and pop open SuperCmd's DevTools instead of the focused app's.
Scoped the shortcut to per-webContents `before-input-event` handlers
so it only fires when a SuperCmd window has keyboard focus. Wired
`installDevToolsInputHandler` via `app.on('web-contents-created', ...)`
so it automatically covers all current and future windows (main,
settings, detached popups, extension views).
Also exposes `webContents` from the top-level electron destructure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`./node_modules/.bin/electron .` runs Electron as a CLI child of whatever shell spawned it. macOS attributes TCC (Accessibility, Input Monitoring, etc.) to the "responsible process" at the top of the spawn chain -- for terminal launches that's Terminal/iTerm, and for SSH sessions it's `sshd-keygen-wrapper`. Granting Accessibility to either of those would be a severe security hole (every future terminal/SSH child would inherit full keystroke control), so the Accessibility prompt for dev Electron was effectively unusable. `open -n` hands off to `launchd`, which creates a fresh TCC responsibility chain rooted at the launched app itself. The prompt correctly identifies "Electron" and the grant is scoped to that exact binary signature. Scripts added: - `dev:gui` -- build, watch, run Vite, and launch via `open -n --env ... -a Electron.app --args "$PWD"`. - `start:electron:gui` -- the actual `open -n` invocation. - `stop:dev` -- `pkill -9 -f "$PWD/node_modules/electron"`, scoped to this project's Electron so it won't touch Slack, VS Code, Discord, etc. - `restart:dev` -- `stop:dev && dev:gui`. - `build:native:fast-paste` -- rebuild + copy just the fast-paste addon (~2s) for iteration, vs the full `build:native` which also rebuilds 8 Swift helpers + whispercpp + parakeet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sing Users who completed onboarding once but later lose a core permission (revoked it in System Settings, reinstalled Electron during dev, reset TCC, etc.) had no signal -- fast-paste silently fell back to the slow osascript path (if Automation was granted) or silently failed outright. They'd have to dig through Settings to find a "grant permissions" flow. On launcher mount, now check `check-onboarding-permissions` via IPC. If `hasSeenOnboarding` is true but Accessibility or Input Monitoring is missing, re-open the existing onboarding jumped to the Permissions step (index 3) rather than replaying all 7 intro steps. Feature- specific permissions (microphone / speech-recognition / home-folder) don't force re-onboarding -- users who don't use those features shouldn't be nagged. - Adds `initialStepIndex?: number` prop to `OnboardingExtension`; initializes the step state from it with bounds clamping. - Exports `ONBOARDING_STEP_PERMISSIONS` so callers don't hard-code the step index. - `onClose` / `onComplete` now also clear the initial-step state so subsequent natural openings start at step 0. - Shortcut-fix gate stays tied to true first-launch only; re-entry for permission re-grant doesn't block on shortcut validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… apps - Make the detached whisper / memory-status overlay truly inert on macOS: use `type: 'panel'`, create with `show: false`, then `showInactive()` on `ready-to-show`. The default `show: true` path routed through `makeKeyAndOrderFront:` and triggered `[NSApp activate]` even with `focusable: false`, which blurred whichever app was frontmost and caused launcher-style apps (e.g. Keepers) to hide-on-blur. - Remove the 50/180/360 ms `activateLastFrontmostApp()` focus-restore loop that fired whenever a detached overlay opened with `preserveFocusWhenHidden`. With the inactive panel, nothing steals focus — so the restore loop was itself re-ordering windows in other apps for no reason. - Shorten the whisper double-tap resend window from 3000 ms to 300 ms so a real double-tap reads as resend and a casual second trigger doesn't replay the last transcript.
3cd0219 to
d0179a7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This branch lands a set of whisper + paste-performance improvements plus the dev-UX changes needed to test them reliably.
Core feature work:
getUserMediafallback, FFT waveform visualization, initial prompt setting, double-tap resend, model size selector).prompt-apply-generated-text,replace-live-text,paste-file,paste-text) now uses the native CGEvent addon instead ofosascript. osascript preserved as a safety-net fallback.Bundled bug fixes and dev UX:
activateWithOptions:0is a no-op on modern macOS when the caller isn't already frontmost, which we aren't aftermainWindow.hide(). Now usesNSApplicationActivateIgnoringOtherApps, so ⌘V actually fires into the previously-frontmost app instead of whatever macOS focused next.⌘⌥Iwas registered viaglobalShortcut.registerand intercepted the combo system-wide in dev mode. Scoped to SuperCmd's own `webContents` viabefore-input-event— Chrome/VS Code⌘⌥Iworks normally again.open -nlaunch scripts:./node_modules/.bin/electron .attributes TCC to whatever terminal/SSH daemon spawned it.npm run dev:guiusesopen -n --env ... -a Electron.appso launchd roots TCC responsibility in Electron itself and the Accessibility prompt correctly says "Electron."whisper-speak-toggledebounce; first-launch auto-open is now gated on!hasSeenOnboarding.Whisper indicator: never steal focus (latest update)
Added after initial review to fix a long-standing side effect where showing the whisper overlay blurred the previously-frontmost app:
type: 'panel'+show: false, then surfaced viachildWindow.showInactive()onready-to-show. The defaultshow: truepath routes throughmakeKeyAndOrderFront:and triggers[NSApp activate]even withfocusable: false, which was blurring launcher-style apps the user had open and causing them to hide-on-blur.activateLastFrontmostApp()focus-restore loop that ran whenever a detached overlay opened withpreserveFocusWhenHidden. With the inactive panel the overlay never takes focus, so the "restore" loop was itself re-ordering windows in other apps for no reason.Why the paste migration matters
The
osascriptpaste path has three problems:osascriptspawn.Automation → System Events, a separate TCC grant that many users reject. The prompt text is unhelpful.The native CGEvent path only needs Accessibility (which users expect for a launcher) and is roughly an order of magnitude faster. Every leaf function now tries the fast path first and falls back to osascript only if the addon is unavailable or fails — so behavior is strictly a superset of what was there before.
Test plan
⌘⌥Iin SuperCmd launcher opens SuperCmd DevTools⌘⌥Iin Chrome/VS Code opens their DevTools (not SuperCmd)🤖 Generated with Claude Code