[WIP] Add Windows support for meeting audio transcription #1110
PR #1110 — Windows Meeting Transcription Support (Clips Desktop)
This PR extends the Clips desktop app's local Whisper meeting transcription from macOS-only to also support Windows, using cpal for WASAPI microphone and loopback system audio capture. The approach introduces a clean capture/ dispatch module that lets whisper_speech.rs stay platform-agnostic, with macOS re-exporting existing AVAudioEngine/ScreenCaptureKit paths and the new windows.rs implementing WASAPI mic + loopback. CI is updated with libclang setup for whisper-rs bindgen on Windows, and Cargo.toml properly gates cpal and whisper-rs on target platforms.
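The dispatch pattern described above (cfg-gated platform modules behind one shared re-export) can be sketched as follows. This is a minimal, hypothetical sketch: the platform module and the BACKEND constant are illustrative placeholders standing in for the real start_raw_mic_capture / start_raw_system_capture exports, not the PR's actual capture/mod.rs.

```rust
// Hypothetical sketch of cfg-based platform dispatch in the spirit of capture/mod.rs.
// A descriptive constant stands in for the real capture functions and types.

#[cfg(target_os = "macos")]
mod platform {
    // Real module: thin re-exports of the existing AVAudioEngine / ScreenCaptureKit backends.
    pub const BACKEND: Option<&str> = Some("AVAudioEngine mic + ScreenCaptureKit system audio");
}

#[cfg(target_os = "windows")]
mod platform {
    // Real module: cpal WASAPI input + WASAPI loopback on the default output device.
    pub const BACKEND: Option<&str> = Some("cpal WASAPI mic + WASAPI loopback system audio");
}

#[cfg(not(any(target_os = "macos", target_os = "windows")))]
mod platform {
    // No capture backend: meeting transcription stays unavailable.
    pub const BACKEND: Option<&str> = None;
}

// The engine imports only this re-export, so it never names a platform directly.
pub use platform::BACKEND;

/// Same predicate the PR applies to gate the Whisper engine module.
pub fn meeting_transcription_supported() -> bool {
    cfg!(any(target_os = "macos", target_os = "windows"))
}

fn main() {
    match BACKEND {
        Some(name) => println!("capture backend: {name}"),
        None => println!("meeting transcription is not available on this OS"),
    }
}
```

Adding a platform then means adding one cfg-gated module; the engine code that consumes the re-export does not change.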
Risk: Standard — New Rust backend with platform-specific FFI safety implications.
🔴 Blocking Issues
1. System audio hardcodes 48 kHz — missing set_src_rate for Windows loopback
The mic path correctly calls mic_stream.set_src_rate(mic_cap.sample_rate()) after capture starts (line 533 of whisper_speech.rs). The system audio path instead hardcodes 48000.0, with a comment saying "SCK delivers 48 kHz", which is a macOS-only (ScreenCaptureKit) assumption. On Windows, many output devices run at 44.1 kHz. RawSystemCapture.sample_rate is even tagged #[allow(dead_code)], confirming that the value is never consumed. Result: on 44.1 kHz machines, Windows system audio is resampled for Whisper using the wrong source rate, so it reaches the model roughly 8.8% too fast and pitched up (48000 / 44100 ≈ 1.088), degrading or breaking system-audio transcription.
Fix: expose sample_rate() on RawSystemCapture and call sys_stream.set_src_rate(sys_cap.sample_rate()) after starting the loopback capture, mirroring the mic path.
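To make the impact concrete, the sketch below counts how many 16 kHz samples (whisper.cpp's input rate) one second of 44.1 kHz loopback audio becomes when it is labeled correctly versus labeled with the hardcoded 48 kHz. The helper names are hypothetical and the rounding is a simplification; any resampler that is told the wrong source rate time-scales the audio by the same ratio.

```rust
/// whisper.cpp expects 16 kHz mono PCM.
const WHISPER_RATE_HZ: f64 = 16_000.0;

/// Hypothetical helper: how many 16 kHz samples a resampler emits for
/// `input_len` samples it *believes* were captured at `assumed_src_hz`.
fn resampled_len(input_len: usize, assumed_src_hz: f64) -> usize {
    (input_len as f64 * WHISPER_RATE_HZ / assumed_src_hz).round() as usize
}

/// Playback speed factor when audio captured at `true_hz` is labeled `assumed_hz`
/// (> 1.0 means too fast and pitched up).
fn speed_factor(true_hz: f64, assumed_hz: f64) -> f64 {
    assumed_hz / true_hz
}

fn main() {
    let one_second_at_44k1 = 44_100; // samples from a 44.1 kHz output device
    let correct = resampled_len(one_second_at_44k1, 44_100.0);
    let hardcoded = resampled_len(one_second_at_44k1, 48_000.0);
    println!("correct rate:   {correct} samples");
    println!(
        "hardcoded rate: {hardcoded} samples ({:.3} s as Whisper hears it)",
        hardcoded as f64 / WHISPER_RATE_HZ
    );
    println!("speed factor:   {:.3}", speed_factor(44_100.0, 48_000.0));
}
```

With the correct rate, one second of capture stays one second of model input; with the hardcoded rate, it shrinks to 14,700 samples, so every utterance is compressed in time.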
2. unsafe impl Send for cpal::Stream on Windows — COM teardown risk
cpal 0.15 explicitly marks its Windows Stream as !Send due to COM thread affinity. The safety comment argues that the stream is only moved and dropped across threads, but COM-backed objects must be released on their creation thread, and dropping from a different thread is exactly what !Send prevents. These handles are moved into a global Mutex<Option<Session>> and dropped from whichever Tauri command thread calls stop(), risking undefined behavior or a crash during teardown.
Fix: keep the stream on a dedicated capture thread and signal teardown via a oneshot channel, or confirm that WASAPI COM objects are free-threaded (apartment model documentation required).
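A minimal sketch of the first option, assuming std::sync::mpsc as a dependency-free stand-in for a oneshot channel. NotSendStream and CaptureHandle are hypothetical names: NotSendStream plays the role of cpal::Stream (its PhantomData<*mut ()> marker makes it !Send), and only the Send-safe CaptureHandle would be stored in the global session.

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread::{self, JoinHandle, ThreadId};

/// Hypothetical stand-in for cpal::Stream. The raw-pointer marker makes the
/// type !Send, so the compiler rejects any attempt to move it to another thread.
struct NotSendStream {
    created_on: ThreadId,
    _not_send: PhantomData<*mut ()>,
}

impl NotSendStream {
    fn open() -> Self {
        NotSendStream { created_on: thread::current().id(), _not_send: PhantomData }
    }
}

impl Drop for NotSendStream {
    fn drop(&mut self) {
        // In the real backend, WASAPI/COM resources would be released here.
        assert_eq!(self.created_on, thread::current().id(), "stream dropped off its creation thread");
    }
}

/// Send-safe handle: this, not the stream, goes into Mutex<Option<Session>>.
struct CaptureHandle {
    stop_tx: mpsc::Sender<()>,
    join: JoinHandle<ThreadId>,
}

impl CaptureHandle {
    fn start() -> Self {
        let (stop_tx, stop_rx) = mpsc::channel::<()>();
        let join = thread::spawn(move || {
            // The stream is created, used, and dropped on this thread only.
            let stream = NotSendStream::open();
            // Block until stop() signals, or until the sender is dropped.
            let _ = stop_rx.recv();
            let owner = stream.created_on;
            drop(stream);
            owner
        });
        CaptureHandle { stop_tx, join }
    }

    /// Safe to call from any thread, e.g. a Tauri command handler.
    /// Returns the id of the thread that owned (and dropped) the stream.
    fn stop(self) -> ThreadId {
        let _ = self.stop_tx.send(());
        self.join.join().expect("capture thread panicked")
    }
}

fn main() {
    let handle = CaptureHandle::start();
    // Moving the handle to another thread compiles because CaptureHandle is Send.
    let owner = thread::spawn(move || handle.stop()).join().unwrap();
    assert_ne!(owner, thread::current().id());
    println!("stream was dropped on its own capture thread");
}
```

Only channel endpoints and a JoinHandle cross threads, so no unsafe impl Send is needed. Dropping the handle without calling stop() also unblocks recv(), so the capture thread still releases the stream on its own thread.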
✅ Strengths
- Clean platform dispatch via capture/mod.rs: zero changes needed in whisper_speech.rs for new platforms
- Complete sample format coverage across all 10 cpal SampleFormat variants
- Good unit tests for sample conversion logic
- Well-documented Windows limitations in README (no AEC, no sleep auto-stop)
- Proper CI gating and rollback logic on partial session start failure
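The sample conversion and channel downmix logic credited above can be sketched as plain functions. This is an illustrative sketch, not the PR's code: the helper names are hypothetical, only three integer formats are shown, the full-scale mapping is one common convention, and averaging channels is one common downmix choice.

```rust
// Hypothetical helpers in the spirit of the PR's windows.rs conversions.
// Integer samples are mapped to f32 in [-1.0, 1.0) by full-scale division.

/// Signed 16-bit PCM: divide by 2^15.
fn i16_to_f32(s: i16) -> f32 {
    f32::from(s) / 32_768.0
}

/// Unsigned 16-bit PCM is offset by 2^15 (silence = 32768).
fn u16_to_f32(s: u16) -> f32 {
    (f32::from(s) - 32_768.0) / 32_768.0
}

/// Unsigned 8-bit PCM is offset by 2^7 (silence = 128).
fn u8_to_f32(s: u8) -> f32 {
    (f32::from(s) - 128.0) / 128.0
}

/// Average each interleaved frame down to one mono sample.
/// A trailing partial frame, if any, is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    if channels == 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

fn main() {
    // Two stereo frames of i16 PCM, normalized and then downmixed.
    let stereo: Vec<f32> = [16_384i16, -16_384, 32_767, i16::MIN]
        .iter()
        .map(|&s| i16_to_f32(s))
        .collect();
    println!("mono: {:?}", downmix_to_mono(&stereo, 2));
    println!("u16 silence: {}, u8 minimum: {}", u16_to_f32(32_768), u8_to_f32(0));
}
```

Normalizing before downmixing keeps the averaging in one numeric type regardless of which SampleFormat the device reports.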
🧪 Browser testing: Skipped. The PR only modifies the Rust/Tauri desktop backend and CI config, with no frontend UI impact.
@@ -7,10 +7,10 @@
//! by `source`. whisper.cpp has no such limit: we run one whisper context with
The system stream is hardcoded to 48000.0, but on Windows the default output device often runs at 44.1 kHz. RawSystemCapture.sample_rate is tagged #[allow(dead_code)], confirming that this value is never propagated. Add sys_stream.set_src_rate(sys_cap.sample_rate()) after the start_raw_system_capture call succeeds, mirroring the mic path on line 533.
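A minimal sketch of the suggested wiring, assuming sample_rate() returns the device rate as a u32 (cpal reports sample rates as u32) and that set_src_rate accepts that value. RawSystemCapture, SourceStream, and start_system_stream here are simplified, hypothetical stand-ins for the PR's types, not its actual code.

```rust
/// Hypothetical, simplified stand-in for the PR's RawSystemCapture.
struct RawSystemCapture {
    sample_rate: u32, // rate reported by the WASAPI loopback device
}

impl RawSystemCapture {
    /// New accessor; once it is used, #[allow(dead_code)] on the field can go.
    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }
}

/// Hypothetical stand-in for the per-source stream fed to Whisper.
struct SourceStream {
    src_rate_hz: f64,
}

impl SourceStream {
    fn new() -> Self {
        // Current default: the macOS-only "SCK delivers 48 kHz" assumption.
        SourceStream { src_rate_hz: 48_000.0 }
    }

    fn set_src_rate(&mut self, rate_hz: u32) {
        self.src_rate_hz = f64::from(rate_hz);
    }
}

/// Hypothetical setup step for the system-audio source.
fn start_system_stream(sys_cap: &RawSystemCapture) -> SourceStream {
    let mut sys_stream = SourceStream::new();
    // Mirror the mic path: adopt the device's real rate once capture has started.
    sys_stream.set_src_rate(sys_cap.sample_rate());
    sys_stream
}

fn main() {
    let sys_cap = RawSystemCapture { sample_rate: 44_100 };
    let sys_stream = start_system_stream(&sys_cap);
    println!("system audio source rate: {} Hz", sys_stream.src_rate_hz);
}
```

Because the rate comes from the capture handle on every platform, the same call is also correct on macOS, where ScreenCaptureKit happens to report 48 kHz.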

Summary
Extends the Clips desktop app's local Whisper meeting transcription from macOS-only to also support Windows, using cpal for WASAPI microphone and loopback system audio capture.
Problem
The meeting transcription feature (mic + system audio capture → local Whisper inference) was gated entirely behind #[cfg(target_os = "macos")], leaving Windows users with no meeting notes support even though the rest of the pipeline (detection, notifications, transcript rendering) was already cross-platform.
Solution
A new capture module provides a platform-dispatched abstraction over audio capture. macOS re-exports the existing, proven backends (AVAudioEngine + ScreenCaptureKit). Windows uses cpal for both the microphone (default WASAPI input) and system audio (WASAPI loopback on the default output device). The Whisper engine (whisper_speech.rs) is updated to use this shared interface, making it platform-agnostic.
Key Changes
- src/capture/mod.rs: new platform-dispatch module exposing start_raw_mic_capture, start_raw_system_capture, RawMicCapture, and RawSystemCapture for both macOS and Windows.
- src/capture/macos.rs: thin re-exports of the existing native_speech and system_audio macOS backends under the shared capture contract.
- src/capture/windows.rs: new Windows backend using cpal for WASAPI mic input and WASAPI loopback system audio; includes channel-downmix helpers and unit tests for mono/stereo/multi-channel conversion.
- whisper_speech.rs: renamed the inner macos module to engine, switched all #[cfg(target_os = "macos")] guards to #[cfg(any(target_os = "macos", target_os = "windows"))], and updated capture imports to use crate::capture.
- system_audio.rs: added a Windows branch to system_audio_version_status and system_audio_request_permission (WASAPI loopback needs no permission prompt).
- Cargo.toml: added cpal = "0.15" as a Windows-only dependency; moved whisper-rs from the macOS-only block to a cfg(any(macos, windows)) target block.
- Added a "Set up libclang for whisper-rs bindgen (Windows)" step to both the build-check and release workflows, pointing LIBCLANG_PATH at the LLVM install (with a Chocolatey fallback) so whisper-rs bindgen compiles on Windows runners.
- README.md: documented the macOS + Windows meeting transcription architecture and known Windows v1 limitations (no AEC, no sleep/call-ended auto-stop).

To clone this PR locally, use the GitHub CLI: gh pr checkout 1110
You can tag me at @BuilderIO for anything you want me to fix or change.