Defer Git-aware daemon ticks during live repo operations#75
Conversation
This adds a Git-aware adapter layer to the daemon. When a watched root is inside a Git worktree and native watching is available, the daemon upgrades to and uses Git changed sets to decide what to reconcile, while still falling back to filesystem scanning when Git lookup fails. Constraint: Keep daemon_tick as the single reconciliation path and keep Git awareness advisory instead of mandatory Rejected: A separate public SCM tool | unnecessary surface area because daemon_status already reports backend state and errors Rejected: Replacing native watch wakeups with Git polling alone | loses the benefit of low-latency filesystem wakeups Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep SCM semantics in the adapter layer so future Watchman/Sapling integration can plug in without rewriting daemon core logic Tested: cargo fmt --check; cargo test -p m1nd-mcp daemon_start_detects_git_root_and_head -- --nocapture; cargo test -p m1nd-mcp daemon_tick_uses_git_changed_set_when_available -- --nocapture; MCP smoke for git_native_fs success Not-tested: Non-Git SCMs and long-running Git root changes during daemon lifetime
The daemon now detects in-progress Git operations such as merge, rebase, cherry-pick, and index-lock churn and defers Git-aware reconciliation instead of reacting mid-operation. This keeps the daemon from treating transient working-tree churn as meaningful drift while preserving the existing tick/status contract. Constraint: Preserve Git-aware reconciliation while avoiding false work during unstable repository transitions Rejected: Disabling Git-aware mode entirely during SCM operations | loses useful backend state and observability just when the daemon should explain itself Rejected: Forcing raw filesystem fallback during active merge/rebase churn | still risks noisy, misleading reconciliation during unstable states Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep Git operation detection advisory and status-visible so future Watchman/Sapling adapters can reuse the same defer contract Tested: cargo fmt --check; cargo test -p m1nd-mcp daemon_start_detects_git_root_and_head -- --nocapture; cargo test -p m1nd-mcp daemon_tick_uses_git_changed_set_when_available -- --nocapture; cargo test -p m1nd-mcp daemon_tick_defers_when_git_operation_is_in_progress -- --nocapture; MCP smoke for deferred merge state Not-tested: Long-running rebases/checkouts under concurrent watcher traffic
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fbe13e4b2d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let diff_args: Vec<&str> = if let Some(reference) = since_ref { | ||
| vec!["diff", "--name-only", reference, "--"] | ||
| } else { |
There was a problem hiding this comment.
Detect untracked files in git-native change scans
When git_since_ref is set, this code switches to git diff --name-only <ref> --, but that command does not report untracked files. In a normal workflow (new file created but not yet added/committed), the daemon will see zero changed paths, so the new file is never reingested even though it is under a watched root. The previous inventory-vs-tracked comparison did catch this case, so this is a behavior regression for common edits.
Useful? React with 👍 / 👎.
| let value = String::from_utf8_lossy(&output.stdout).trim().to_string(); | ||
| if !value.is_empty() { | ||
| return Some(PathBuf::from(value)); | ||
| } |
There was a problem hiding this comment.
Preserve multi-root watch coverage with git backend
This helper returns the first Git root it finds and stores only that single root, but daemon ticks then derive changed_entries exclusively from that repo’s Git output. If watch_paths contains multiple roots (for example one Git repo plus another watched folder), modifications in paths outside the selected root are silently missed and never reingested. That is a functional regression from the prior full-inventory diff across all watch paths.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR adds Git-aware reconciliation behavior to the daemon so it can detect in-progress Git operations (merge/rebase/cherry-pick, etc.) and defer “git-native” ticks while the repo is unstable, while also surfacing Git/defer state in daemon_status.
Changes:
- Extend
DaemonRuntimeStatewith Git/defer telemetry fields (root, ref, scan stats, operation kind, deferred tick count). - Add Git detection + “defer tick” logic to
handle_daemon_start/handle_daemon_tick, with new tests exercising Git-root detection, changed-file detection, and deferral during merge state. - Update watcher backend labeling to distinguish Git-enabled native watching (
git_native_fs).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
m1nd-mcp/src/session.rs |
Adds persisted daemon runtime fields for Git/defer telemetry. |
m1nd-mcp/src/server.rs |
Labels watch backend as git_native_fs when a Git root is known. |
m1nd-mcp/src/daemon_handlers.rs |
Implements Git-root detection, Git-changed-file listing, in-progress operation detection, tick deferral, and adds tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub git_root: Option<String>, | ||
| pub git_since_ref: Option<String>, | ||
| pub last_git_scan_ms: Option<u64>, | ||
| pub last_git_changed_files: usize, | ||
| pub git_backend_error: Option<String>, | ||
| pub git_operation_in_progress: bool, | ||
| pub git_operation_kind: Option<String>, | ||
| pub deferred_ticks: u64, |
There was a problem hiding this comment.
DaemonRuntimeState is loaded via serde_json::from_str(...).ok().unwrap_or_default(); adding new non-optional fields here without #[serde(default)] (field-level or struct-level) will make older daemon_state.json fail to deserialize and silently reset the entire daemon state to defaults on upgrade. Add serde defaults so missing fields don’t invalidate existing persisted state.
| self.state.daemon_state.watch_backend = | ||
| if self.state.daemon_state.git_root.is_some() { | ||
| "git_native_fs".into() | ||
| } else { | ||
| "native_fs".into() | ||
| }; |
There was a problem hiding this comment.
Introducing the new watch_backend value git_native_fs changes behavior in this file because other logic checks watch_backend == "native_fs" (e.g., daemon_wait_duration_ms uses a coarser interval and the timeout trigger selection treats non-native_fs as polling). As-is, Git-enabled repos will behave like polling for scheduling/trigger semantics even though a native watcher is running; consider keeping watch_backend as native_fs and exposing Git mode separately, or update all native_fs checks to include git_native_fs.
| self.state.daemon_state.watch_backend = | |
| if self.state.daemon_state.git_root.is_some() { | |
| "git_native_fs".into() | |
| } else { | |
| "native_fs".into() | |
| }; | |
| self.state.daemon_state.watch_backend = "native_fs".into(); |
| let output = Command::new("git") | ||
| .args(["rev-parse", "--show-toplevel"]) | ||
| .current_dir(&root_hint) | ||
| .output() | ||
| .ok()?; | ||
| if output.status.success() { | ||
| let value = String::from_utf8_lossy(&output.stdout).trim().to_string(); | ||
| if !value.is_empty() { | ||
| return Some(PathBuf::from(value)); |
There was a problem hiding this comment.
git_root_for_watch_paths aborts the entire search on the first git execution failure because of .output().ok()?; that means a single non-repo path (or a transient spawn failure) prevents checking remaining watch paths. Prefer continuing the loop on errors (e.g., if let Ok(output) = ... { ... }) so one bad path doesn’t suppress Git detection for other roots.
| let output = Command::new("git") | |
| .args(["rev-parse", "--show-toplevel"]) | |
| .current_dir(&root_hint) | |
| .output() | |
| .ok()?; | |
| if output.status.success() { | |
| let value = String::from_utf8_lossy(&output.stdout).trim().to_string(); | |
| if !value.is_empty() { | |
| return Some(PathBuf::from(value)); | |
| if let Ok(output) = Command::new("git") | |
| .args(["rev-parse", "--show-toplevel"]) | |
| .current_dir(&root_hint) | |
| .output() | |
| { | |
| if output.status.success() { | |
| let value = String::from_utf8_lossy(&output.stdout).trim().to_string(); | |
| if !value.is_empty() { | |
| return Some(PathBuf::from(value)); | |
| } |
| let diff_args: Vec<&str> = if let Some(reference) = since_ref { | ||
| vec!["diff", "--name-only", reference, "--"] | ||
| } else { | ||
| vec!["status", "--porcelain"] | ||
| }; | ||
| let output = Command::new("git") | ||
| .args(&diff_args) | ||
| .current_dir(root) | ||
| .output() | ||
| .map_err(|error| error.to_string())?; | ||
| if !output.status.success() { | ||
| return Err(String::from_utf8_lossy(&output.stderr).trim().to_string()); | ||
| } | ||
|
|
||
| let stdout = String::from_utf8_lossy(&output.stdout); | ||
| for raw_line in stdout.lines() { | ||
| let line = raw_line.trim(); | ||
| if line.is_empty() { | ||
| continue; | ||
| } | ||
| let rel = if since_ref.is_some() { | ||
| line.to_string() | ||
| } else { | ||
| line.get(3..).unwrap_or(line).trim().to_string() | ||
| }; | ||
| if rel.is_empty() { | ||
| continue; | ||
| } | ||
| changed.push(root.join(rel)); | ||
| } |
There was a problem hiding this comment.
git_changed_absolute_paths parses git status --porcelain by slicing line[3..], which is not robust (renames include old -> new, paths may be quoted, and spaces are significant). If you need porcelain parsing, prefer --porcelain=v1 -z and parse NUL-separated records, or avoid porcelain entirely and use git diff --name-only -z plus git ls-files --others -z --exclude-standard for untracked files.
| let diff_args: Vec<&str> = if let Some(reference) = since_ref { | ||
| vec!["diff", "--name-only", reference, "--"] | ||
| } else { | ||
| vec!["status", "--porcelain"] | ||
| }; |
There was a problem hiding this comment.
When since_ref is set (the common case after daemon_start), this uses git diff --name-only <ref> which does not include untracked new files. That means newly created files won’t be considered “changed” (and therefore won’t be ingested) until they’re added to Git, unlike the previous non-Git inventory diff behavior. Consider augmenting the changed set with untracked files (e.g., git ls-files --others --exclude-standard) or using git status --porcelain even when since_ref is present.
| for path in paths { | ||
| let path_str = path.to_string_lossy().to_string(); | ||
| if let Some(entry) = live_inventory | ||
| .values() | ||
| .find(|entry| entry.file_path == path_str) | ||
| .cloned() | ||
| { | ||
| changed_entries.push(entry); |
There was a problem hiding this comment.
This is O(changed_paths × live_inventory) because it scans live_inventory.values() for every path returned by Git. On large repos (or when Git returns many changed paths) this can become a noticeable tick-time cost. Consider building a HashMap from file_path to FileInventoryEntry once per tick (or return external_ids directly) to make lookups O(1).
| for path in paths { | |
| let path_str = path.to_string_lossy().to_string(); | |
| if let Some(entry) = live_inventory | |
| .values() | |
| .find(|entry| entry.file_path == path_str) | |
| .cloned() | |
| { | |
| changed_entries.push(entry); | |
| let live_inventory_by_path: HashMap<&str, &FileInventoryEntry> = | |
| live_inventory | |
| .values() | |
| .map(|entry| (entry.file_path.as_str(), entry)) | |
| .collect(); | |
| for path in paths { | |
| let path_str = path.to_string_lossy().to_string(); | |
| if let Some(entry) = live_inventory_by_path.get(path_str.as_str()) { | |
| changed_entries.push((*entry).clone()); |
| fn git_operation_in_progress(root: &Path) -> Option<String> { | ||
| let git_dir = root.join(".git"); | ||
| let checks = [ | ||
| ("rebase-merge", "rebase"), | ||
| ("rebase-apply", "rebase"), | ||
| ("MERGE_HEAD", "merge"), | ||
| ("CHERRY_PICK_HEAD", "cherry-pick"), | ||
| ("BISECT_LOG", "bisect"), | ||
| ("index.lock", "index-lock"), | ||
| ]; | ||
| for (relative, kind) in checks { | ||
| if git_dir.join(relative).exists() { | ||
| return Some(kind.to_string()); | ||
| } | ||
| } | ||
| None |
There was a problem hiding this comment.
git_operation_in_progress assumes .git is a directory under the repo root. That’s not true for linked worktrees/submodules where .git is a file pointing to the actual gitdir, so operation detection (merge/rebase/cherry-pick) will silently fail there. Consider resolving the git directory via git rev-parse --git-dir and checking for these sentinel files relative to that path.
| let (temp, mut state) = build_state(); | ||
| let repo = temp.path().join("repo"); | ||
| std::fs::create_dir_all(repo.join("src")).expect("repo src"); | ||
| std::fs::write(repo.join("src/core.py"), "def core():\n return 1\n").expect("write"); | ||
|
|
||
| Command::new("git") | ||
| .args(["init"]) | ||
| .current_dir(&repo) | ||
| .output() | ||
| .expect("git init"); | ||
| Command::new("git") | ||
| .args(["config", "user.email", "test@example.com"]) | ||
| .current_dir(&repo) | ||
| .output() | ||
| .expect("git email"); | ||
| Command::new("git") | ||
| .args(["config", "user.name", "Test"]) | ||
| .current_dir(&repo) | ||
| .output() | ||
| .expect("git name"); | ||
| Command::new("git") | ||
| .args(["add", "."]) | ||
| .current_dir(&repo) | ||
| .output() | ||
| .expect("git add"); | ||
| Command::new("git") | ||
| .args(["commit", "-m", "init"]) | ||
| .current_dir(&repo) | ||
| .output() | ||
| .expect("git commit"); | ||
|
|
There was a problem hiding this comment.
These tests invoke git but don’t assert that each command succeeded (output.status.success()), so failures can cascade into confusing later assertions. Consider asserting success (and printing stderr on failure), and optionally skipping the tests when git is not available in PATH to avoid CI/environment-specific failures.
| let (temp, mut state) = build_state(); | |
| let repo = temp.path().join("repo"); | |
| std::fs::create_dir_all(repo.join("src")).expect("repo src"); | |
| std::fs::write(repo.join("src/core.py"), "def core():\n return 1\n").expect("write"); | |
| Command::new("git") | |
| .args(["init"]) | |
| .current_dir(&repo) | |
| .output() | |
| .expect("git init"); | |
| Command::new("git") | |
| .args(["config", "user.email", "test@example.com"]) | |
| .current_dir(&repo) | |
| .output() | |
| .expect("git email"); | |
| Command::new("git") | |
| .args(["config", "user.name", "Test"]) | |
| .current_dir(&repo) | |
| .output() | |
| .expect("git name"); | |
| Command::new("git") | |
| .args(["add", "."]) | |
| .current_dir(&repo) | |
| .output() | |
| .expect("git add"); | |
| Command::new("git") | |
| .args(["commit", "-m", "init"]) | |
| .current_dir(&repo) | |
| .output() | |
| .expect("git commit"); | |
| let git_available = Command::new("git") | |
| .args(["--version"]) | |
| .output() | |
| .map(|output| output.status.success()) | |
| .unwrap_or(false); | |
| if !git_available { | |
| return; | |
| } | |
| let (temp, mut state) = build_state(); | |
| let repo = temp.path().join("repo"); | |
| std::fs::create_dir_all(repo.join("src")).expect("repo src"); | |
| std::fs::write(repo.join("src/core.py"), "def core():\n return 1\n").expect("write"); | |
| let run_git = |args: &[&str], label: &str| { | |
| let output = Command::new("git") | |
| .args(args) | |
| .current_dir(&repo) | |
| .output() | |
| .unwrap_or_else(|err| panic!("{label} failed to start: {err}")); | |
| assert!( | |
| output.status.success(), | |
| "{label} failed: {}", | |
| String::from_utf8_lossy(&output.stderr) | |
| ); | |
| }; | |
| run_git(&["init"], "git init"); | |
| run_git(&["config", "user.email", "test@example.com"], "git email"); | |
| run_git(&["config", "user.name", "Test"], "git name"); | |
| run_git(&["add", "."], "git add"); | |
| run_git(&["commit", "-m", "init"], "git commit"); |
Summary
daemon_statusValidation
Why this matters
This gives the daemon the first real "settle/defer" semantics from the Watchman playbook. It can now distinguish between meaningful repo change and an in-progress Git operation that should not trigger structural reconciliation yet.