Prefer merge-base baselines for Git-aware daemon startup#76
Conversation
This adds a Git-aware adapter layer to the daemon. When a watched root is inside a Git worktree and native watching is available, the daemon upgrades to and uses Git changed sets to decide what to reconcile, while still falling back to filesystem scanning when Git lookup fails. Constraint: Keep daemon_tick as the single reconciliation path and keep Git awareness advisory instead of mandatory Rejected: A separate public SCM tool | unnecessary surface area because daemon_status already reports backend state and errors Rejected: Replacing native watch wakeups with Git polling alone | loses the benefit of low-latency filesystem wakeups Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep SCM semantics in the adapter layer so future Watchman/Sapling integration can plug in without rewriting daemon core logic Tested: cargo fmt --check; cargo test -p m1nd-mcp daemon_start_detects_git_root_and_head -- --nocapture; cargo test -p m1nd-mcp daemon_tick_uses_git_changed_set_when_available -- --nocapture; MCP smoke for git_native_fs success Not-tested: Non-Git SCMs and long-running Git root changes during daemon lifetime
The daemon now detects in-progress Git operations such as merge, rebase, cherry-pick, and index-lock churn and defers Git-aware reconciliation instead of reacting mid-operation. This keeps the daemon from treating transient working-tree churn as meaningful drift while preserving the existing tick/status contract. Constraint: Preserve Git-aware reconciliation while avoiding false work during unstable repository transitions Rejected: Disabling Git-aware mode entirely during SCM operations | loses useful backend state and observability just when the daemon should explain itself Rejected: Forcing raw filesystem fallback during active merge/rebase churn | still risks noisy, misleading reconciliation during unstable states Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep Git operation detection advisory and status-visible so future Watchman/Sapling adapters can reuse the same defer contract Tested: cargo fmt --check; cargo test -p m1nd-mcp daemon_start_detects_git_root_and_head -- --nocapture; cargo test -p m1nd-mcp daemon_tick_uses_git_changed_set_when_available -- --nocapture; cargo test -p m1nd-mcp daemon_tick_defers_when_git_operation_is_in_progress -- --nocapture; MCP smoke for deferred merge state Not-tested: Long-running rebases/checkouts under concurrent watcher traffic
The Git-aware daemon adapter now prefers a merge-base baseline against the configured upstream when available, falling back to HEAD otherwise. This makes the initial changed-set cursor upstream-aware without changing the daemon's public MCP surface. Constraint: Improve SCM-aware changed-set quality without introducing a full Watchman protocol or altering daemon_tick authority Rejected: Always baseline from HEAD | loses useful repo semantics for feature branches tracking an upstream branch Rejected: Introducing a separate baseline-selection tool | unnecessary public surface for adapter-internal policy Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep baseline selection adapter-local until richer SCM backends share the same contract Tested: cargo fmt --check; cargo test -p m1nd-mcp daemon_start_detects_git_root_and_head -- --nocapture; cargo test -p m1nd-mcp daemon_start_prefers_merge_base_when_upstream_exists -- --nocapture; MCP smoke for merge_base startup state Not-tested: Non-Git SCMs and dynamic upstream changes during long-running daemon sessions
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f88a548e3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ) -> Result<Vec<PathBuf>, String> { | ||
| let mut changed = Vec::new(); | ||
| let diff_args: Vec<&str> = if let Some(reference) = since_ref { | ||
| vec!["diff", "--name-only", reference, "--"] |
There was a problem hiding this comment.
Include untracked files in Git backend change detection
When since_ref is set, git_changed_absolute_paths uses git diff --name-only <ref> --, which does not report untracked files. I verified CLI semantics locally: git diff -h documents the one-commit form (git diff [<options>] [<commit>] [--] [<path>...]), and in a repro repo git diff --name-only HEAD -- returned nothing while git status --porcelain showed ?? new.txt. In this code path, new files in a normal (already-committed) repo never enter changed_entries, so they are not ingested until manually added to Git.
Useful? React with 👍 / 👎.
| .find(|entry| entry.file_path == path_str) | ||
| .cloned() | ||
| { | ||
| changed_entries.push(entry); |
There was a problem hiding this comment.
Avoid reingesting unchanged dirty files on every tick
The Git backend pushes every path returned by git diff directly into changed_entries without comparing against tracked_files. Because git_since_ref is then advanced to HEAD, any file left modified/staged (but not committed) keeps appearing in git diff HEAD on subsequent ticks, so the daemon repeatedly reingests the same file and re-emits proactive alerts. This can create continuous churn and alert noise while a developer is editing but has not committed yet.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Updates daemon startup and tick behavior to use a Git-aware baseline (prefer merge-base vs upstream when available) and to surface Git-related runtime state, improving changed-set detection for feature branches.
Changes:
- Persist and expose additional Git-related daemon runtime fields (git root, baseline ref/kind, last scan stats, operation-in-progress state).
- On daemon start, detect a Git root from watch paths and select an initial baseline (merge-base when upstream exists; otherwise HEAD).
- On daemon ticks, optionally use
git diff/statusto drive the changed file set and defer ticks during active Git operations.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| m1nd-mcp/src/session.rs | Extends persisted DaemonRuntimeState with Git baseline/scan fields. |
| m1nd-mcp/src/server.rs | Adjusts reported watcher backend string when Git mode is active. |
| m1nd-mcp/src/daemon_handlers.rs | Adds Git root/baseline detection, Git-based changed-set logic, deferral during Git operations, and tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub git_root: Option<String>, | ||
| pub git_baseline_ref: Option<String>, | ||
| pub git_baseline_kind: Option<String>, | ||
| pub git_since_ref: Option<String>, | ||
| pub last_git_scan_ms: Option<u64>, | ||
| pub last_git_changed_files: usize, | ||
| pub git_backend_error: Option<String>, | ||
| pub git_operation_in_progress: bool, | ||
| pub git_operation_kind: Option<String>, | ||
| pub deferred_ticks: u64, |
There was a problem hiding this comment.
DaemonRuntimeState is persisted to daemon_state.json; adding new non-optional fields without #[serde(default)] (on the struct or per-field) will cause deserializing older state files to fail, and load_daemon_state will then silently fall back to Default, effectively wiping prior daemon state on upgrade. Consider adding #[serde(default)] to DaemonRuntimeState (or each newly added field) to keep backwards compatibility while still allowing new fields to be introduced safely.
| let output = Command::new("git") | ||
| .args(["rev-parse", "--show-toplevel"]) | ||
| .current_dir(&root_hint) | ||
| .output() | ||
| .ok()?; |
There was a problem hiding this comment.
git_root_for_watch_paths uses .output().ok()? inside the loop; if spawning git fails for the first watch path (e.g., git not on PATH / permission issue), the ? returns early and the function never checks the remaining watch paths. Handle the error per-path (e.g., if let Ok(output) = ...) and continue so other candidate roots can still be detected.
| let output = Command::new("git") | |
| .args(["rev-parse", "--show-toplevel"]) | |
| .current_dir(&root_hint) | |
| .output() | |
| .ok()?; | |
| let output = if let Ok(output) = Command::new("git") | |
| .args(["rev-parse", "--show-toplevel"]) | |
| .current_dir(&root_hint) | |
| .output() | |
| { | |
| output | |
| } else { | |
| continue; | |
| }; |
| state.daemon_state.git_root = git_root_for_watch_paths(&state.daemon_state.watch_paths) | ||
| .map(|root| root.to_string_lossy().to_string()); | ||
| let (git_baseline_ref, git_baseline_kind, _git_upstream_ref) = state | ||
| .daemon_state | ||
| .git_root | ||
| .as_deref() | ||
| .map(|root| git_initial_baseline(Path::new(root))) | ||
| .unwrap_or((None, None, None)); | ||
| state.daemon_state.git_baseline_ref = git_baseline_ref.clone(); | ||
| state.daemon_state.git_baseline_kind = git_baseline_kind; | ||
| state.daemon_state.git_since_ref = git_baseline_ref; | ||
| state.daemon_state.last_git_scan_ms = None; | ||
| state.daemon_state.last_git_changed_files = 0; | ||
| state.daemon_state.git_backend_error = None; | ||
| state.daemon_state.git_operation_in_progress = false; | ||
| state.daemon_state.git_operation_kind = None; | ||
| state.daemon_state.deferred_ticks = 0; | ||
| if state.daemon_state.git_root.is_some() { | ||
| state.daemon_state.watch_backend = "git_native_fs".into(); | ||
| } |
There was a problem hiding this comment.
Switching to watch_backend == "git_native_fs" makes daemon_tick rely on a single detected git_root for change detection. If watch_paths contains multiple roots (or paths outside that git repo), changes in non-git roots will be ignored because only git diff/status results under git_root are considered. Consider only enabling the git changed-set path when all watch_paths are under the same git_root (or compute per-root git baselines), otherwise fall back to the existing inventory comparison for correctness.
| for path in paths { | ||
| let path_str = path.to_string_lossy().to_string(); | ||
| if let Some(entry) = live_inventory | ||
| .values() | ||
| .find(|entry| entry.file_path == path_str) | ||
| .cloned() | ||
| { | ||
| changed_entries.push(entry); |
There was a problem hiding this comment.
In the git-native path, each changed file path is matched by scanning live_inventory.values().find(...), which is O(changed_files * inventory_size) and can get expensive for large repos. Consider building a temporary HashMap<file_path, FileInventoryEntry> (or HashMap<&str, &FileInventoryEntry>) once per tick so path lookups are O(1).
| for path in paths { | |
| let path_str = path.to_string_lossy().to_string(); | |
| if let Some(entry) = live_inventory | |
| .values() | |
| .find(|entry| entry.file_path == path_str) | |
| .cloned() | |
| { | |
| changed_entries.push(entry); | |
| let live_inventory_by_path: HashMap<&str, &FileInventoryEntry> = live_inventory | |
| .values() | |
| .map(|entry| (entry.file_path.as_str(), entry)) | |
| .collect(); | |
| for path in paths { | |
| let path_str = path.to_string_lossy().to_string(); | |
| if let Some(entry) = live_inventory_by_path.get(path_str.as_str()) { | |
| changed_entries.push((*entry).clone()); |
Summary
HEADwhen no upstream or merge-base is availableValidation
git_baseline_kind = merge_baseWhy this matters
This improves the daemon's Git-aware changed-set quality for feature branches. The daemon now starts from a more meaningful baseline than
HEADwhen upstream context exists, which is the next step toward richer SCM-aware semantics.