[WIP] Add Windows support for meeting audio transcription by shomix · Pull Request #1110 · BuilderIO/agent-native

shomix · 2026-06-09T14:49:39Z

Summary

Extends the Clips desktop app's local Whisper meeting transcription from macOS-only to also support Windows, using cpal for WASAPI microphone and loopback system audio capture.

Problem

The meeting transcription feature (mic + system audio capture → local Whisper inference) was gated entirely behind #[cfg(target_os = "macos")], leaving Windows users with no meeting notes support despite the rest of the pipeline (detection, notifications, transcript rendering) already being cross-platform.

Solution

A new capture module provides a platform-dispatched abstraction over audio capture. macOS re-exports the existing proven backends (AVAudioEngine + ScreenCaptureKit). Windows uses cpal for both the microphone (default WASAPI input) and system audio (WASAPI loopback on the default output device). The Whisper engine (whisper_speech.rs) is updated to use this shared interface, making it platform-agnostic.
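As a rough illustration of the dispatch described above, here is a minimal sketch. Only the names `start_raw_mic_capture` and `RawMicCapture` come from this PR; the struct field, the signatures, the stub bodies, the placeholder sample rates, and the fallback backend for other platforms are assumptions made so the example compiles anywhere.

```rust
// Hypothetical sketch of a platform-dispatched capture module.
// The real backends wrap AVAudioEngine (macOS) and cpal/WASAPI (Windows);
// the stubs below only stand in for them.

/// Handle for a running microphone capture (the field is an assumption).
pub struct RawMicCapture {
    sample_rate: f64,
}

impl RawMicCapture {
    /// Native sample rate of the capture device, in Hz.
    pub fn sample_rate(&self) -> f64 {
        self.sample_rate
    }
}

#[cfg(target_os = "macos")]
mod backend {
    use super::RawMicCapture;

    // Stub for the existing macOS backend; 48 kHz is a placeholder value.
    pub fn start_raw_mic_capture() -> Result<RawMicCapture, String> {
        Ok(RawMicCapture { sample_rate: 48_000.0 })
    }
}

#[cfg(target_os = "windows")]
mod backend {
    use super::RawMicCapture;

    // Stub for the cpal/WASAPI backend; 44.1 kHz is a placeholder value.
    pub fn start_raw_mic_capture() -> Result<RawMicCapture, String> {
        Ok(RawMicCapture { sample_rate: 44_100.0 })
    }
}

#[cfg(not(any(target_os = "macos", target_os = "windows")))]
mod backend {
    use super::RawMicCapture;

    // Assumed fallback so callers get an error instead of a build failure.
    pub fn start_raw_mic_capture() -> Result<RawMicCapture, String> {
        Err("meeting capture is not supported on this platform".to_string())
    }
}

// Callers import one name and never see the cfg gates.
pub use backend::start_raw_mic_capture;
```

With this shape, the engine code calls `start_raw_mic_capture()` without any `#[cfg]` attributes of its own, which is the property the shared interface is meant to provide.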

Key Changes

src/capture/mod.rs — new platform-dispatch module exposing start_raw_mic_capture, start_raw_system_capture, RawMicCapture, and RawSystemCapture for both macOS and Windows.
src/capture/macos.rs — thin re-exports of the existing native_speech and system_audio macOS backends under the shared capture contract.
src/capture/windows.rs — new Windows backend using cpal for WASAPI mic input and WASAPI loopback system audio; includes channel-downmix helpers and unit tests for mono/stereo/multi-channel conversion.
whisper_speech.rs — renamed inner macos module to engine, switched all #[cfg(target_os = "macos")] guards to #[cfg(any(target_os = "macos", target_os = "windows"))], and updated capture imports to use crate::capture.
system_audio.rs — added Windows branch to system_audio_version_status and system_audio_request_permission (WASAPI loopback needs no permission prompt).
Cargo.toml — added cpal = "0.15" as a Windows-only dependency; moved whisper-rs from the macOS-only block to a cfg(any(macos, windows)) target block.
CI workflows — added a Set up libclang for whisper-rs bindgen (Windows) step to both the build-check and release workflows, pointing LIBCLANG_PATH at the LLVM install (with a Chocolatey fallback) so whisper-rs bindgen compiles on Windows runners.
README.md — documented the macOS + Windows meeting transcription architecture and known Windows v1 limitations (no AEC, no sleep/call-ended auto-stop).
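The channel-downmix helpers in src/capture/windows.rs are not shown on this page. A minimal sketch of the idea, assuming interleaved `f32` frames and simple averaging, might look like the following; the function name `downmix_to_mono` and its signature are hypothetical.

```rust
/// Downmix interleaved frames to mono by averaging the channels of each frame.
/// Hypothetical helper: the PR's actual names and signatures may differ.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels == 0 {
        // No channels means no audio; avoid dividing by zero.
        return Vec::new();
    }
    if channels == 1 {
        // Already mono: pass the samples through unchanged.
        return interleaved.to_vec();
    }
    interleaved
        // `chunks_exact` drops a trailing partial frame, if any.
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Averaging keeps the result within the input range, at the cost of attenuating content that is present on only one channel.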

To check out this PR locally, use the GitHub CLI: gh pr checkout 1110

You can tag me at @BuilderIO for anything you want me to fix or change

github-actions · 2026-06-09T14:58:44Z

Here's a visual recap of what changed:

Open the full interactive recap

builder-io-integration

PR #1110 — Windows Meeting Transcription Support (Clips Desktop)

This PR extends the Clips desktop app's local Whisper meeting transcription from macOS-only to also support Windows, using cpal for WASAPI microphone and loopback system audio capture. The approach introduces a clean capture/ dispatch module that lets whisper_speech.rs stay platform-agnostic, with macOS re-exporting existing AVAudioEngine/ScreenCaptureKit paths and the new windows.rs implementing WASAPI mic + loopback. CI is updated with libclang setup for whisper-rs bindgen on Windows, and Cargo.toml properly gates cpal and whisper-rs on target platforms.

Risk: Standard — New Rust backend with platform-specific FFI safety implications.

🔴 Blocking Issues

1. System audio hardcodes 48 kHz — missing set_src_rate for Windows loopback
The mic path correctly calls mic_stream.set_src_rate(mic_cap.sample_rate()) after capture starts (line 533 of whisper_speech.rs). The system audio path uses 48000.0 with a comment that says "SCK delivers 48 kHz" — a macOS-only assumption. On Windows, many output devices run at 44.1 kHz. RawSystemCapture.sample_rate is even tagged #[allow(dead_code)], confirming it is never consumed. Result: Whisper resamples Windows system audio with the wrong source rate, degrading or breaking system-audio transcription on 44.1 kHz machines.
Fix: expose sample_rate() on RawSystemCapture and call sys_stream.set_src_rate(sys_cap.sample_rate()) after starting the loopback capture, mirroring the mic path.
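To make the impact concrete, here is a small sketch of the resampling arithmetic. `ResampleStream` and `output_len` are hypothetical stand-ins, not the PR's types; only the `set_src_rate` name mirrors the method mentioned above. It assumes a 16 kHz target rate, which is what Whisper models take as input.

```rust
/// Target rate for Whisper input, in Hz.
pub const WHISPER_RATE: f64 = 16_000.0;

/// Hypothetical stand-in for the per-source stream in whisper_speech.rs.
pub struct ResampleStream {
    src_rate: f64,
}

impl ResampleStream {
    pub fn new(src_rate: f64) -> Self {
        Self { src_rate }
    }

    /// Update the source rate once the capture device reports its real rate.
    pub fn set_src_rate(&mut self, rate: f64) {
        self.src_rate = rate;
    }

    /// Number of 16 kHz samples produced for `input_len` captured samples.
    pub fn output_len(&self, input_len: usize) -> usize {
        (input_len as f64 * WHISPER_RATE / self.src_rate).round() as usize
    }
}
```

Under these assumptions, one second of 44.1 kHz loopback audio (44,100 samples) that is treated as 48 kHz yields 14,700 output samples instead of 16,000, so the speech reaches Whisper roughly 9% too fast and pitched up. After `set_src_rate(44_100.0)` the same input yields the expected 16,000 samples.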

2. unsafe impl Send for cpal::Stream on Windows — COM teardown risk
cpal 0.15 explicitly marks its Windows Stream as !Send due to COM thread affinity. The safety comment argues that only moving and dropping across threads is done, but COM-backed objects must be released on their creation thread — dropping from a different thread is exactly what !Send prevents. These handles are moved into a global Mutex<Option<Session>> and dropped from whichever Tauri command thread calls stop(), creating potential UB or crash during teardown.
Fix: keep the stream on a dedicated capture thread and signal teardown via a oneshot channel, or confirm that WASAPI COM objects are free-threaded (apartment model documentation required).
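A minimal sketch of the dedicated-thread option follows. It uses `std::sync::mpsc` as a stand-in for a oneshot channel, and `ThreadBoundStream` is a hypothetical placeholder for the cpal stream; `CaptureHandle` and `start_capture` are likewise illustrative names, not code from this PR.

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread::{self, JoinHandle, ThreadId};

/// Placeholder for a thread-affine resource such as a platform audio stream.
/// The raw-pointer marker makes the type !Send, so the compiler rejects any
/// attempt to move it to another thread.
struct ThreadBoundStream {
    created_on: ThreadId,
    report: mpsc::Sender<(ThreadId, ThreadId)>,
    _not_send: PhantomData<*const ()>,
}

impl Drop for ThreadBoundStream {
    fn drop(&mut self) {
        // Report (creation thread, drop thread) so callers can verify affinity.
        let _ = self.report.send((self.created_on, thread::current().id()));
    }
}

/// Send-able handle: it owns only the stop channel and the join handle,
/// never the stream itself, so no `unsafe impl Send` is needed.
pub struct CaptureHandle {
    stop_tx: mpsc::Sender<()>,
    worker: JoinHandle<()>,
}

impl CaptureHandle {
    /// Ask the capture thread to tear down the stream, then wait for it.
    pub fn stop(self) {
        let _ = self.stop_tx.send(());
        let _ = self.worker.join();
    }
}

pub fn start_capture(report: mpsc::Sender<(ThreadId, ThreadId)>) -> CaptureHandle {
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        // The stream is created on this thread...
        let _stream = ThreadBoundStream {
            created_on: thread::current().id(),
            report,
            _not_send: PhantomData,
        };
        // ...kept alive until a stop signal arrives or the handle is dropped...
        let _ = stop_rx.recv();
        // ...and dropped here, on the same thread that created it.
    });
    CaptureHandle { stop_tx, worker }
}
```

The handle can live in the global `Mutex<Option<Session>>` and be stopped from any Tauri command thread, while creation and release of the stream stay on the capture thread.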

✅ Strengths

Clean platform dispatch via capture/mod.rs — zero changes needed in whisper_speech.rs for new platforms
Complete sample format coverage across all 10 cpal SampleFormat variants
Good unit tests for sample conversion logic
Well-documented Windows limitations in README (no AEC, no sleep auto-stop)
Proper CI gating and rollback logic on partial session start failure
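For readers unfamiliar with the sample conversion mentioned above, a minimal sketch for two of the integer formats is shown here. The function names are hypothetical and the scaling convention (divide by 32,768, with unsigned samples re-centered first) is an assumption; the PR's actual conversion code is not visible on this page.

```rust
/// Convert a signed 16-bit sample to f32 in [-1.0, 1.0).
pub fn i16_to_f32(sample: i16) -> f32 {
    sample as f32 / 32_768.0
}

/// Convert an unsigned 16-bit sample to f32 in [-1.0, 1.0).
/// Unsigned formats put silence at the midpoint (32,768), so re-center first.
pub fn u16_to_f32(sample: u16) -> f32 {
    (sample as f32 - 32_768.0) / 32_768.0
}
```

The other integer widths follow the same pattern with their own midpoints and scale factors.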

🧪 Browser testing: Skipped — PR only modifies Rust/Tauri desktop backend and CI config, no frontend UI impact.

builder-io-integration · 2026-06-10T12:33:28Z

@@ -7,10 +7,10 @@
 //! by `source`. whisper.cpp has no such limit: we run one whisper context with


The system stream is hardcoded to 48000.0 but on Windows the default output device often runs at 44.1 kHz. RawSystemCapture.sample_rate is tagged #[allow(dead_code)] confirming this value is never propagated. Add sys_stream.set_src_rate(sys_cap.sample_rate()) after the start_raw_system_capture call succeeds, mirroring the mic path on line 533.

Add meeting transcription support for Windows desktop

5fbbe3a

shomix changed the title ~~Update from the Builder.io agent~~ [WIP] Update from the Builder.io agent Jun 9, 2026

builder-io-integration Bot changed the title ~~[WIP] Update from the Builder.io agent~~ Add Windows support for meeting audio transcription Jun 9, 2026

builder-io-integration Bot added the builder.io label Jun 9, 2026

shomix changed the title ~~Add Windows support for meeting audio transcription~~ [WIP] Add Windows support for meeting audio transcription Jun 9, 2026

steve8708 approved these changes Jun 9, 2026

builder-io-integration Bot requested changes Jun 10, 2026

shomix wants to merge 1 commit into main from ai_main_4b1a726b10bb48d7a32b

3 participants
