Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions .github/workflows/clips-desktop-build-check.yml
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,20 @@ jobs:
- name: Install dependencies
run: pnpm install --frozen-lockfile

# whisper-rs compiles whisper.cpp (CMake + C/C++) and runs bindgen, which
# needs libclang. windows-latest ships LLVM and CMake preinstalled; point
# bindgen at libclang and fall back to a Chocolatey install if missing.
- name: Set up libclang for whisper-rs bindgen (Windows)
if: runner.os == 'Windows'
shell: bash
run: |
if [ -f "/c/Program Files/LLVM/bin/libclang.dll" ]; then
echo "LIBCLANG_PATH=C:\\Program Files\\LLVM\\bin" >> "$GITHUB_ENV"
else
choco install llvm --no-progress -y
echo "LIBCLANG_PATH=C:\\Program Files\\LLVM\\bin" >> "$GITHUB_ENV"
fi

- name: Build frontend
working-directory: ${{ env.TAURI_APP_PATH }}
run: pnpm run build
Expand Down
15 changes: 15 additions & 0 deletions .github/workflows/clips-desktop-release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,21 @@ jobs:
V=$(node -p "require('./${{ env.TAURI_APP_PATH }}/package.json').version")
echo "version=$V" >> "$GITHUB_OUTPUT"

# whisper-rs compiles whisper.cpp (CMake + C/C++) and runs bindgen, which
# needs libclang. windows-latest ships LLVM and CMake preinstalled; point
# bindgen at libclang and fall back to a Chocolatey install if the image
# ever drops it. macOS already has clang via Xcode.
- name: Set up libclang for whisper-rs bindgen (Windows)
if: runner.os == 'Windows'
shell: bash
run: |
if [ -f "/c/Program Files/LLVM/bin/libclang.dll" ]; then
echo "LIBCLANG_PATH=C:\\Program Files\\LLVM\\bin" >> "$GITHUB_ENV"
else
choco install llvm --no-progress -y
echo "LIBCLANG_PATH=C:\\Program Files\\LLVM\\bin" >> "$GITHUB_ENV"
fi

- name: Build and release (tauri-action)
uses: tauri-apps/tauri-action@84b9d35b5fc46c1e45415bdb6144030364f7ebc5 # v0
env:
Expand Down
31 changes: 31 additions & 0 deletions templates/clips/desktop/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,37 @@ A small Tauri 2.x menu-bar app that lives in the macOS menu bar / Windows system
- **Recent** — your three most recent recordings
- Quick links to **Open library** and **Settings**

## Meeting transcription (macOS + Windows)

When the calendar watcher detects a meeting, the popover surfaces a "Start
notes" notification. Accepting it captures **both** the microphone and the
system (speaker) audio, transcribes them locally with Whisper (whisper.cpp, no
cloud round-trip), and streams a live mic/system-labeled transcript.

Audio capture is platform-dispatched in `src-tauri/src/capture/`:

- **macOS** — AVAudioEngine with VoiceProcessingIO acoustic echo cancellation
(mic) and ScreenCaptureKit (system audio).
- **Windows** — [`cpal`](https://crates.io/crates/cpal): the default WASAPI
input device (mic) and WASAPI **loopback** of the default output device
(system audio). Loopback needs no OS permission prompt.

The Whisper engine (`src-tauri/src/whisper_speech.rs`) and the meeting
detection, notification, and transcript-rendering flows are otherwise identical
across platforms.

### Known Windows limitations (v1)

- **No mic echo cancellation.** macOS applies hardware AEC so the mic stream
doesn't echo the system audio. `cpal` delivers the raw mic, so expect some
speaker bleed into the mic transcript when not on headphones. Mic and system
are transcribed and labeled separately, so this is cosmetic.
- **No sleep / call-ended auto-stop.** The macOS sleep and call-ended watchers
are no-ops on Windows; the silence-based auto-stop still works. Stop notes
manually or via the silence timeout.
- macOS-only features stay macOS-only: screen/window video recording, EventKit
local calendar, and Accessibility-based personal-vocabulary auto-learn.

## Develop

First install the desktop workspace's own deps (this folder is outside the monorepo's `templates/*` glob because it ships its own Tauri/Vite toolchain):
Expand Down
16 changes: 15 additions & 1 deletion templates/clips/desktop/src-tauri/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -112,13 +112,27 @@ screencapturekit = { version = "2.0.0", features = ["macos_15_0"] }
# AudioBufferList type so we can hand SCK's CMSampleBuffer audio bytes to
# AVAudioPCMBuffer for the speech recognizer.
objc2-core-audio-types = "0.3"
whisper-rs = { version = "0.16.0" }
# NOTE: We hand-roll the few EventKit selectors we need via raw `objc2`
# msg_send! in `eventkit.rs` rather than depending on the `objc2-event-kit`
# crate. Its API surface drifts between minor 0.3.x releases (and not
# every version exposes `EKEventStore::eventsMatchingPredicate`); the
# slice we need is small enough that hand-rolling is more durable.

# Windows-only: cross-platform audio capture. Covers both the microphone
# (default WASAPI input) and system audio (WASAPI loopback — an input stream on
# the default output device). One crate for both meeting capture streams.
# NOTE: pin verified against crates.io before release; network was unavailable
# in the dev environment where this was added.
[target."cfg(target_os = \"windows\")".dependencies]
cpal = "0.15"

# Local meeting transcription via whisper.cpp. Cross-platform (bundles its own
# native C/C++ build via CMake), needed only on the two platforms that run the
# Whisper meeting engine. Scoped here rather than the macOS-only block (it is
# not macOS-specific) but kept off Linux, which has no meeting capture.
[target."cfg(any(target_os = \"macos\", target_os = \"windows\"))".dependencies]
whisper-rs = { version = "0.16.0" }

[profile.release]
panic = "abort"
codegen-units = 1
Expand Down
10 changes: 10 additions & 0 deletions templates/clips/desktop/src-tauri/src/capture/macos.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
//! macOS capture backend — thin re-exports of the existing native impls.
//!
//! The high-quality macOS path is left entirely untouched: the microphone runs
//! through `native_speech` (AVAudioEngine + VoiceProcessingIO AEC) and the
//! system audio through `system_audio` (ScreenCaptureKit). This file only
//! surfaces them under the platform-agnostic names the `capture` contract and
//! `whisper_speech.rs` expect.

pub(crate) use crate::native_speech::macos::{start_raw_mic_capture, RawMicCapture};
pub(crate) use crate::system_audio::macos::{start_raw_system_capture, RawSystemCapture};
39 changes: 39 additions & 0 deletions templates/clips/desktop/src-tauri/src/capture/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
//! Platform-dispatched audio capture for meeting transcription.
//!
//! The local Whisper engine (`whisper_speech.rs`) needs exactly two capture
//! primitives — a microphone stream and a system-audio (loopback) stream —
//! each forwarding mono `f32` samples to a callback and exposing the hardware
//! sample rate plus a `stop()`. Everything else in the meeting pipeline
//! (detection, notifications, transcript rendering) is already cross-platform.
//!
//! This module owns that public contract and dispatches to the right backend
//! at compile time:
//!
//! - macOS → thin re-exports of the proven `native_speech` (AVAudioEngine +
//! VPIO AEC) mic path and `system_audio` (ScreenCaptureKit) loopback.
//! - Windows → `cpal` mic + WASAPI loopback (see `windows.rs`).
//!
//! Both backends expose the same names so `whisper_speech.rs` stays
//! platform-agnostic:
//!
//! ```ignore
//! start_raw_mic_capture(app, mic_device_id, mic_device_label, on_samples) -> RawMicCapture
//! start_raw_system_capture(app, on_samples) -> RawSystemCapture
//! ```
//!
//! Each handle type exposes `sample_rate() -> f64` and `stop()` so the session
//! teardown in `whisper_speech.rs` works unchanged across platforms.

#[cfg(target_os = "macos")]
mod macos;
#[cfg(target_os = "macos")]
pub(crate) use macos::{
start_raw_mic_capture, start_raw_system_capture, RawMicCapture, RawSystemCapture,
};

#[cfg(target_os = "windows")]
mod windows;
#[cfg(target_os = "windows")]
pub(crate) use windows::{
start_raw_mic_capture, start_raw_system_capture, RawMicCapture, RawSystemCapture,
};
Loading
Loading