[WIP] Add Windows support for meeting audio transcription #1110

Open
shomix wants to merge 1 commit into main from ai_main_4b1a726b10bb48d7a32b

Conversation

shomix commented Jun 9, 2026
Contributor

Summary

Extends the Clips desktop app's local Whisper meeting transcription from macOS-only to also support Windows, using cpal for WASAPI microphone and loopback system audio capture.

Problem

The meeting transcription feature (mic + system audio capture → local Whisper inference) was gated entirely behind #[cfg(target_os = "macos")], leaving Windows users with no meeting notes support despite the rest of the pipeline (detection, notifications, transcript rendering) already being cross-platform.

Solution

A new capture module provides a platform-dispatched abstraction over audio capture. macOS re-exports the existing proven backends (AVAudioEngine + ScreenCaptureKit). Windows uses cpal for both the microphone (default WASAPI input) and system audio (WASAPI loopback on the default output device). The Whisper engine (whisper_speech.rs) is updated to use this shared interface, making it platform-agnostic.
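The dispatch described above could look roughly like the sketch below. Only `start_raw_mic_capture` and `RawMicCapture` are names taken from this PR; the `backend` module name, the `sample_rate` field, the stub bodies, the placeholder rates, and the fallback branch for other platforms are assumptions made so the example compiles anywhere.

```rust
/// Handle for a running microphone capture. The single field is an
/// assumption; the real type also owns the platform stream.
pub struct RawMicCapture {
    sample_rate: f64,
}

impl RawMicCapture {
    /// Sample rate reported by the capture device, in Hz.
    pub fn sample_rate(&self) -> f64 {
        self.sample_rate
    }
}

#[cfg(target_os = "macos")]
mod backend {
    use super::RawMicCapture;

    // Stub standing in for the existing AVAudioEngine-based backend.
    pub fn start_raw_mic_capture() -> Result<RawMicCapture, String> {
        Ok(RawMicCapture { sample_rate: 48_000.0 })
    }
}

#[cfg(target_os = "windows")]
mod backend {
    use super::RawMicCapture;

    // Stub standing in for the cpal/WASAPI backend. The rate here is a
    // placeholder; a real backend would report the device's actual rate.
    pub fn start_raw_mic_capture() -> Result<RawMicCapture, String> {
        Ok(RawMicCapture { sample_rate: 44_100.0 })
    }
}

#[cfg(not(any(target_os = "macos", target_os = "windows")))]
mod backend {
    use super::RawMicCapture;

    // Hypothetical fallback so callers get an error instead of a build failure.
    pub fn start_raw_mic_capture() -> Result<RawMicCapture, String> {
        Err("meeting transcription is not supported on this platform".to_string())
    }
}

// Callers import from one place and never name a platform.
pub use backend::start_raw_mic_capture;
```

With this shape the engine only sees the shared contract, so the `#[cfg]` decisions stay inside the capture module.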

Key Changes

  • src/capture/mod.rs — new platform-dispatch module exposing start_raw_mic_capture, start_raw_system_capture, RawMicCapture, and RawSystemCapture for both macOS and Windows.
  • src/capture/macos.rs — thin re-exports of the existing native_speech and system_audio macOS backends under the shared capture contract.
  • src/capture/windows.rs — new Windows backend using cpal for WASAPI mic input and WASAPI loopback system audio; includes channel-downmix helpers and unit tests for mono/stereo/multi-channel conversion.
  • whisper_speech.rs — renamed inner macos module to engine, switched all #[cfg(target_os = "macos")] guards to #[cfg(any(target_os = "macos", target_os = "windows"))], and updated capture imports to use crate::capture.
  • system_audio.rs — added Windows branch to system_audio_version_status and system_audio_request_permission (WASAPI loopback needs no permission prompt).
  • Cargo.toml — added cpal = "0.15" as a Windows-only dependency; moved whisper-rs from the macOS-only block to a cfg(any(macos, windows)) target block.
  • CI workflows — added a Set up libclang for whisper-rs bindgen (Windows) step to both the build-check and release workflows, pointing LIBCLANG_PATH at the LLVM install (with a Chocolatey fallback) so whisper-rs bindgen compiles on Windows runners.
  • README.md — documented the macOS + Windows meeting transcription architecture and known Windows v1 limitations (no AEC, no sleep/call-ended auto-stop).
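For reviewers unfamiliar with the channel-downmix step mentioned for `src/capture/windows.rs`, here is a minimal sketch of one common approach: averaging the channels of each interleaved frame. The function name and signature are hypothetical and may not match the helpers in this PR.

```rust
/// Downmix interleaved samples to mono by averaging each frame's channels.
/// A trailing partial frame (fewer than `channels` samples) is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    match channels {
        // No channels means no audio; avoid dividing by zero.
        0 => Vec::new(),
        // Mono input passes through unchanged.
        1 => interleaved.to_vec(),
        n => interleaved
            .chunks_exact(n)
            .map(|frame| frame.iter().sum::<f32>() / n as f32)
            .collect(),
    }
}
```

Averaging keeps the output within the input's amplitude range, which avoids clipping when both stereo channels carry the same signal.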

To clone this PR locally, use the GitHub CLI with the command gh pr checkout 1110

You can tag me at @BuilderIO for anything you want me to fix or change

shomix changed the title from "Update from the Builder.io agent" to "[WIP] Update from the Builder.io agent" on Jun 9, 2026
builder-io-integration Bot changed the title from "[WIP] Update from the Builder.io agent" to "Add Windows support for meeting audio transcription" on Jun 9, 2026
shomix changed the title from "Add Windows support for meeting audio transcription" to "[WIP] Add Windows support for meeting audio transcription" on Jun 9, 2026
github-actions Bot commented Jun 9, 2026
Contributor

Here's a visual recap of what changed:

Visual recap

Open the full interactive recap

builder-io-integration Bot left a comment

Contributor

PR #1110 — Windows Meeting Transcription Support (Clips Desktop)

This PR extends the Clips desktop app's local Whisper meeting transcription from macOS-only to also support Windows, using cpal for WASAPI microphone and loopback system audio capture. The approach introduces a clean capture/ dispatch module that lets whisper_speech.rs stay platform-agnostic, with macOS re-exporting existing AVAudioEngine/ScreenCaptureKit paths and the new windows.rs implementing WASAPI mic + loopback. CI is updated with libclang setup for whisper-rs bindgen on Windows, and Cargo.toml properly gates cpal and whisper-rs on target platforms.

Risk: Standard — New Rust backend with platform-specific FFI safety implications.

🔴 Blocking Issues

1. System audio hardcodes 48 kHz — missing set_src_rate for Windows loopback
The mic path correctly calls mic_stream.set_src_rate(mic_cap.sample_rate()) after capture starts (line 533 of whisper_speech.rs). The system audio path instead uses a fixed 48000.0, with a comment that says "SCK delivers 48 kHz", which is a macOS-only assumption. On Windows, many output devices run at 44.1 kHz. RawSystemCapture.sample_rate is even tagged #[allow(dead_code)], confirming it is never consumed. Result: Windows system audio is resampled for Whisper using the wrong source rate, degrading or breaking system-audio transcription on 44.1 kHz machines.
Fix: expose sample_rate() on RawSystemCapture and call sys_stream.set_src_rate(sys_cap.sample_rate()) after starting the loopback capture, mirroring the mic path.
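To illustrate why the source rate matters, here is a small self-contained sketch. `ResampleStream` is a hypothetical stand-in for the stream type in `whisper_speech.rs`, and the linear-interpolation resampler and 16 kHz target are assumptions for illustration; only the `set_src_rate` name comes from the review above.

```rust
/// Target rate assumed for the Whisper input in this sketch.
const WHISPER_RATE: f64 = 16_000.0;

/// Hypothetical stand-in for the per-source stream that resamples to 16 kHz.
struct ResampleStream {
    src_rate: f64,
}

impl ResampleStream {
    /// Starts with the 48 kHz default that the review flags as macOS-specific.
    fn new() -> Self {
        Self { src_rate: 48_000.0 }
    }

    /// Override the source rate once the capture device reports its real rate.
    fn set_src_rate(&mut self, rate: f64) {
        self.src_rate = rate;
    }

    /// Resample mono input to 16 kHz using linear interpolation.
    fn resample(&self, input: &[f32]) -> Vec<f32> {
        if input.is_empty() {
            return Vec::new();
        }
        let step = self.src_rate / WHISPER_RATE;
        let out_len = (input.len() as f64 * WHISPER_RATE / self.src_rate).round() as usize;
        let last = input.len() - 1;
        (0..out_len)
            .map(|i| {
                let pos = i as f64 * step;
                let idx = (pos as usize).min(last);
                let frac = (pos - idx as f64) as f32;
                let a = input[idx];
                let b = input[(idx + 1).min(last)];
                a + (b - a) * frac
            })
            .collect()
    }
}
```

Feeding one second of 44.1 kHz audio (44,100 samples) through the default 48 kHz setting yields 14,700 output samples instead of 16,000, so the audio reaches Whisper sped up by roughly 9%. Calling `set_src_rate(44_100.0)` first restores the expected 16,000 samples.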

2. unsafe impl Send for cpal::Stream on Windows — COM teardown risk
cpal 0.15 explicitly marks its Windows Stream as !Send due to COM thread affinity. The safety comment argues that only moving and dropping across threads is done, but COM-backed objects must be released on their creation thread — dropping from a different thread is exactly what !Send prevents. These handles are moved into a global Mutex<Option<Session>> and dropped from whichever Tauri command thread calls stop(), creating potential UB or crash during teardown.
Fix: keep the stream on a dedicated capture thread and signal teardown via a oneshot channel, or confirm that WASAPI COM objects are free-threaded (apartment model documentation required).
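A minimal sketch of the dedicated-thread option follows, using only the standard library. `ThreadBoundStream` is a hypothetical stand-in for a thread-affine handle such as `cpal::Stream`, and a `std::sync::mpsc` channel is used in place of a oneshot channel; none of these names come from the PR.

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread;

/// Hypothetical thread-affine stream. The raw-pointer marker makes it `!Send`,
/// so the compiler rejects moving it to another thread.
struct ThreadBoundStream {
    created_on: thread::ThreadId,
    _not_send: PhantomData<*const ()>,
}

impl ThreadBoundStream {
    fn open() -> Self {
        Self {
            created_on: thread::current().id(),
            _not_send: PhantomData,
        }
    }
}

impl Drop for ThreadBoundStream {
    fn drop(&mut self) {
        // Panics if the stream is ever dropped away from its creating thread.
        assert_eq!(self.created_on, thread::current().id());
    }
}

/// Sendable handle that can live in shared state such as a `Mutex<Option<_>>`.
struct CaptureHandle {
    stop_tx: mpsc::Sender<()>,
    worker: Option<thread::JoinHandle<()>>,
}

fn start_capture() -> Result<CaptureHandle, String> {
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let (ready_tx, ready_rx) = mpsc::channel::<Result<(), String>>();

    let worker = thread::spawn(move || {
        // The stream is created, used, and dropped on this thread only.
        let stream = ThreadBoundStream::open();
        let _ = ready_tx.send(Ok(()));
        // Blocks until stop() sends a signal or the handle is dropped.
        let _ = stop_rx.recv();
        drop(stream);
    });

    // Wait for the worker to report that the stream opened.
    ready_rx
        .recv()
        .map_err(|_| "capture thread exited early".to_string())??;

    Ok(CaptureHandle { stop_tx, worker: Some(worker) })
}

impl CaptureHandle {
    /// Signal teardown and wait for it; returns false if the worker panicked.
    fn stop(mut self) -> bool {
        let _ = self.stop_tx.send(());
        match self.worker.take() {
            Some(worker) => worker.join().is_ok(),
            None => true,
        }
    }
}
```

Because `CaptureHandle` holds only a channel sender and a join handle, it is `Send` without any `unsafe impl`, and any command thread can call `stop()` while the stream itself never leaves its creating thread.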

✅ Strengths

  • Clean platform dispatch via capture/mod.rs — zero changes needed in whisper_speech.rs for new platforms
  • Complete sample format coverage across all 10 cpal SampleFormat variants
  • Good unit tests for sample conversion logic
  • Well-documented Windows limitations in README (no AEC, no sleep auto-stop)
  • Proper CI gating and rollback logic on partial session start failure

🧪 Browser testing: Skipped — PR only modifies Rust/Tauri desktop backend and CI config, no frontend UI impact.

@@ -7,10 +7,10 @@
//! by `source`. whisper.cpp has no such limit: we run one whisper context with

Contributor

The system stream is hardcoded to 48000.0, but on Windows the default output device often runs at 44.1 kHz. RawSystemCapture.sample_rate is tagged #[allow(dead_code)], confirming that this value is never propagated. Add sys_stream.set_src_rate(sys_cap.sample_rate()) after the start_raw_system_capture call succeeds, mirroring the mic path on line 533.

3 participants