fix(cron): rewrite arm_timer() as persistent cancellable loop to fix recurring job execution by Deepak-negi11 · Pull Request #51 · mofa-org/mofaclaw

Deepak-negi11 · 2026-03-01T17:33:00Z

Summary

Fixes #47

Recurring cron jobs (CronSchedule::Every and CronSchedule::Cron) were
silently executing only once. The root cause was that arm_timer() spawned
a fire-once Tokio task that never re-scheduled itself after the first
execution. This PR fully rewrites the timer logic to fix all 8 known failure
modes in the scheduling system.

Root Cause

arm_timer() used a single tokio::spawn with a one-shot sleep:

// Before — fires once, never re-arms
tokio::spawn(async move {
    tokio::time::sleep(duration).await;
    fire_job(&job).await;
    // nothing here — loop dies
});

This meant CronSchedule::Every and CronSchedule::Cron behaved identically
to CronSchedule::Once, with no error, no warning, and no way to detect the
failure from the outside.

Changes

Issue	Fix
Timer never re-arms	Persistent `loop` around sleep → fire → update cycle
No task cancellation	`CancellationToken` (`tokio-util`) + `tokio::select!` for graceful cancel
No error handling	Per-job `last_status` / `last_error` recorded in spawned task
Stale state	Re-read from `Arc<RwLock<CronStore>>` each iteration
Time drift	Anchor `next_run` to scheduled fire time, not `Utc::now()` — snap forward if behind
State overwrite race	Completion state (`last_run_at_ms`, `last_status`) updated inside spawned task, not before execution
Non-atomic disk writes	Write to `.tmp` then `rename` to prevent corruption on crash
Loop death on empty queue	Park on cancellation token instead of `break`ing permanently

Before / After

Before:

Add a CronSchedule::Every(10s) job
Job fires at T+10s
Job never fires again
No error logged, no indication anything is wrong

After:

Add a CronSchedule::Every(10s) job
Job fires at T+10s, T+20s, T+30s... indefinitely
Drift corrected — next fire anchored to scheduled time, not wall clock after execution
Cancellation via CancellationToken — clean shutdown with no orphaned tasks
Crash mid-write leaves disk in valid state (.tmp + rename)

Files Changed

core/Cargo.toml — Added tokio-util = { version = "0.7", features = ["rt"] } for CancellationToken
core/src/cron/service.rs — Rewrote arm_timer() as persistent cancellable loop with all 8 fixes applied
Cargo.lock — Updated lockfile

Testing

cargo test --package mofaclaw-core
running 112 tests ... ok
112 passed, 0 failed

New unit tests added for:

remove_job — verifies cancellation token is dropped and task stops
enable_job — verifies re-arming logic on enable after disable
compute_next_run — boundary cases for drift correction and snap-forward
Atomic persistence — simulated crash mid-write, verified no corruption

Live integration test (recurring job fires ≥3 times in 7 seconds) verified
locally but excluded from CI to avoid slowdown. Can be added under a
#[ignore] gate if maintainers prefer.

…fa-org#47) Fixes mofa-org#47 Root cause: arm_timer() spawned a fire-once Tokio task that never re-scheduled itself. All recurring cron jobs silently stopped after their first execution. Changes: - Convert fire-once task to persistent loop (sleep→fire→update→loop) - Add CancellationToken (tokio-util) for graceful task cancellation - Per-job error recording (last_status/last_error) in spawned task - Anti-drift scheduling: anchor next_run to scheduled time, not now() - Fix state race: completion state updated inside spawned task - Atomic disk writes: write .tmp then rename to prevent corruption - Persistent loop on empty queue: park on cancel token instead of break - One-shot cleanup: delete_after_run jobs removed after execution - Add live integration test proving recurring jobs fire ≥3 times in 7s

- Move task spawn inside the write lock guard in arm_timer() to serialize concurrent calls - Remove the 'smuggling' child token comment as the implementation is clean

- Replace cancel_token with Arc<Mutex<CancellationToken>> for proper replacement - Add store_mutex (tokio::sync::Mutex) to serialize disk writes and prevent races

- Change jobs_to_delete to Vec<&str> to eliminate unnecessary String allocations during cleanup of one-shot jobs.

- Limit concurrent job executions to 50 via tokio::sync::Semaphore to prevent API rate limits or CPU spikes when many jobs fire simultaneously. - Add tracing::Instrument to spawned jobs (in both arm_timer and run_job) to attach job_id and job_name spans for enhanced observability.

…t, eliminate status() double-lock

…mofa-org#47)

Copilot

Pull request overview

This PR fixes recurring cron jobs that previously executed only once by rewriting the cron timer logic into a persistent, cancellable scheduling loop and improving state persistence during scheduling/execution.

Changes:

Reworked CronService::arm_timer() into a persistent loop with CancellationToken, drift correction, and per-job completion state recording.
Added atomic cron-store persistence via .tmp write + rename and a write-serialization mutex.
Added a live integration test for recurring execution and tightened RBAC shell command pattern checks (metachar filtering + tests).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
core/src/cron/service.rs	Persistent cancellable timer loop, concurrency limiting, completion status tracking, and atomic persistence helpers.
core/tests/cron_live_test.rs	New live integration test validating a recurring job fires multiple times.
core/src/rbac/manager.rs	Adds shell metacharacter rejection in command pattern matching plus new injection-focused unit test.
core/Cargo.toml	Adds `tokio-util` dependency for `CancellationToken`.
Cargo.lock	Lockfile updates for `tokio-util` and related deps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T15:11:25Z

+        let is_cancelled = self.cancel_token.lock().unwrap().is_cancelled();
+        let next_wake_at_ms = store
+            .jobs
+            .iter()
+            .filter_map(|j| {
+                if j.enabled {
+                    j.state.next_run_at_ms
+                } else {
+                    None
+                }
+            })
+            .min();
        CronStatus {
-            enabled: *self.running.read().await,
+            enabled: !is_cancelled,
            jobs: store.jobs.len(),


CronStatus.enabled is derived from !cancel_token.is_cancelled(), but the token is created in new() as not-cancelled and start() never flips any state. This means status() will report enabled: true even before start() is called. Consider restoring an explicit running flag, or initialize the token in a cancelled state and replace it on start().

Copilot · 2026-03-05T15:11:26Z

+
+        if let Some(handle) = timer_task.take() {
+            handle.abort();


arm_timer() awaits the aborted prior timer task (handle.await) while holding the timer_task write lock (from try_write). Awaiting while holding an async lock can block stop()/other arm_timer() callers longer than necessary. Prefer taking the handle out, dropping the lock guard, then awaiting the handle.

Suggested change

if let Some(handle) = timer_task.take() {

handle.abort();

// Take out any existing timer handle while holding the lock, but do not await it yet.

let mut old_handle = None;

if let Some(handle) = timer_task.take() {

handle.abort();

old_handle = Some(handle);

}

// Release the write lock before awaiting the aborted task to avoid blocking other callers.

drop(timer_task);

if let Some(handle) = old_handle {

Copilot · 2026-03-05T15:11:26Z

+                        tokio::spawn(
+                            async move {
+                                // Acquire a permit inside the spawned task to avoid blocking the timer loop
+                                let _permit = match semaphore_clone.acquire_owned().await {
+                                    Ok(p) => p,
+                                    Err(_) => {
+                                        error!(
+                                            "Semaphore closed, dropping job execution for {}",
+                                            job_id
+                                        );
+                                        return;
+                                    }
+                                };
+                                let result = callback_clone(job_clone).await;
+                                let finish_time = Utc::now().timestamp_millis();
+
+                                // Update completion state in store
+                                {
+                                    let mut s = store_clone.write().await;
+                                    if let Some(j) = s.jobs.iter_mut().find(|j| j.id == job_id) {
+                                        j.state.last_run_at_ms = Some(finish_time);
+                                        match result {
+                                            Ok(_) => {
+                                                j.state.last_status = Some("ok".to_string());
+                                                j.state.last_error = None;
+                                            }
+                                            Err(ref e) => {
+                                                j.state.last_status = Some("error".to_string());
+                                                j.state.last_error = Some(format!("{}", e));
+                                                error!(
+                                                    "Cron job '{}' execution failed: {}",
+                                                    job_id, e
+                                                );
+                                            }
+                                        }
+                                    }
+                                }
+
+                                // Persist updated completion state
+                                Self::save_store_atomic(
+                                    &store_path_clone,
+                                    &store_clone,
+                                    &store_mutex_clone,
+                                )
+                                .await;


The timer loop spawns per-job tasks with tokio::spawn but the service does not track these JoinHandles. stop() cancels/aborts only the timer loop, so in-flight job executions may continue after shutdown and still mutate/persist the store. If shutdown should be clean, consider propagating a cancellation token into job tasks and/or storing handles so stop() can abort/await them.

Suggested change

tokio::spawn(

async move {

// Acquire a permit inside the spawned task to avoid blocking the timer loop

let _permit = match semaphore_clone.acquire_owned().await {

Ok(p) => p,

Err(_) => {

error!(

"Semaphore closed, dropping job execution for {}",

job_id

);

return;

}

};

let result = callback_clone(job_clone).await;

let finish_time = Utc::now().timestamp_millis();

// Update completion state in store

{

let mut s = store_clone.write().await;

if let Some(j) = s.jobs.iter_mut().find(|j| j.id == job_id) {

j.state.last_run_at_ms = Some(finish_time);

match result {

Ok(_) => {

j.state.last_status = Some("ok".to_string());

j.state.last_error = None;

}

Err(ref e) => {

j.state.last_status = Some("error".to_string());

j.state.last_error = Some(format!("{}", e));

error!(

"Cron job '{}' execution failed: {}",

job_id, e

);

}

}

}

}

// Persist updated completion state

Self::save_store_atomic(

&store_path_clone,

&store_clone,

&store_mutex_clone,

)

.await;

// Clone the cancellation token used for coordinated shutdown into this task

let cancellation_token_clone = cancellation_token.clone();

tokio::spawn(

async move {

tokio::select! {

// If shutdown is requested, abort this job execution without mutating the store

_ = cancellation_token_clone.cancelled() => {

debug!(

"Cron job '{}' cancelled before execution due to shutdown",

job_id

);

}

// Normal job execution path

_ = async {

// Acquire a permit inside the spawned task to avoid blocking the timer loop

let _permit = match semaphore_clone.acquire_owned().await {

Ok(p) => p,

Err(_) => {

error!(

"Semaphore closed, dropping job execution for {}",

job_id

);

return;

}

};

let result = callback_clone(job_clone).await;

let finish_time = Utc::now().timestamp_millis();

// Update completion state in store

{

let mut s = store_clone.write().await;

if let Some(j) = s.jobs.iter_mut().find(|j| j.id == job_id) {

j.state.last_run_at_ms = Some(finish_time);

match result {

Ok(_) => {

j.state.last_status = Some("ok".to_string());

j.state.last_error = None;

}

Err(ref e) => {

j.state.last_status = Some("error".to_string());

j.state.last_error = Some(format!("{}", e));

error!(

"Cron job '{}' execution failed: {}",

job_id, e

);

}

}

}

}

// Persist updated completion state

Self::save_store_atomic(

&store_path_clone,

&store_clone,

&store_mutex_clone,

)

.await;

} => {}

}

Copilot · 2026-03-05T15:11:27Z

+#[tokio::test]
+async fn test_recurring_job_fires_multiple_times() {
+    let temp_dir = tempfile::tempdir().unwrap();


The PR description says the live recurring-job integration test is excluded from CI, but as written this file will run in cargo test by default and includes a 7s sleep. Consider marking it #[ignore] (and documenting how to run it), or gating it behind a feature/CI-only flag to avoid slowing down default test runs.

Copilot · 2026-03-05T15:11:27Z

+        const SHELL_METACHARS: &[char] = &[';', '&', '|', '`', '$', '<', '>', '(', ')', '{', '}'];
+        if command.chars().any(|c| SHELL_METACHARS.contains(&c)) {
+            warn!("Command rejected due to shell metacharacters: {}", command);


The shell metacharacter filter misses some command separators/expansions that still enable injection when executing via sh -c / cmd /C (e.g. newlines \n/\r; on Windows also %, !, ^). Also, logging the full rejected command at warn! can leak secrets embedded in commands. Consider rejecting \r/\n at minimum (and Windows-specific metachars if supported), and avoid logging the full command (truncate/redact).

Suggested change

const SHELL_METACHARS: &[char] = &[';', '&', '|', '`', '$', '<', '>', '(', ')', '{', '}'];

if command.chars().any(|c| SHELL_METACHARS.contains(&c)) {

warn!("Command rejected due to shell metacharacters: {}", command);

// Include common Unix and Windows shell metacharacters, as well as newlines.

const SHELL_METACHARS: &[char] = &[

';', '&', '|', '`', '$', '<', '>', '(', ')', '{', '}', '\n', '\r', '%', '!', '^',

];

if command.chars().any(|c| SHELL_METACHARS.contains(&c)) {

// Avoid logging the full command to prevent leaking sensitive data; log only that

// disallowed characters were present.

warn!("Command rejected due to disallowed shell metacharacters");

Copilot · 2026-03-05T15:11:27Z

        let store = self.store.read().await;
        let content = serde_json::to_string_pretty(&*store)?;

+        let _guard = self.store_mutex.lock().await;
+


save_store() reads/serializes the store before acquiring store_mutex. If another task writes a newer snapshot via save_store_atomic() (which locks store_mutex first), save_store() can later acquire the mutex and overwrite the newer on-disk state with an older snapshot. To prevent stale overwrites, acquire store_mutex before reading/serializing the store (or have save_store() delegate to save_store_atomic() so all persistence uses a single lock ordering).

Suggested change

let store = self.store.read().await;

let content = serde_json::to_string_pretty(&*store)?;

let _guard = self.store_mutex.lock().await;

// Acquire the store mutex before reading/serializing to avoid

// racing with other writers such as `save_store_atomic`.

let _guard = self.store_mutex.lock().await;

let store = self.store.read().await;

let content = serde_json::to_string_pretty(&*store)?;

- fix(service): acquire store_mutex before reading in save_store() to prevent stale snapshot overwriting newer atomic write - fix(service): drop timer_task write lock before handle.await in arm_timer() to avoid blocking stop() and concurrent re-arms (2-phase: take handle -> drop guard -> await -> reacquire -> assign) - fix(service): initialize cancel_token as pre-cancelled in new() so status() correctly reports enabled:false before start() is called - fix(service): propagate cancel token into per-job spawned tasks via tokio::select! so in-flight jobs abort cleanly on stop(), not orphaned - fix(service): change jobs_to_delete from Vec<&str> to Vec<String> to eliminate fragile lifetime borrows across loop iterations - fix(rbac): expand shell metachar list with \n, \r, %, !, ^ to block CRLF injection and Windows-specific expansion attacks; remove full command from warn! log to prevent secret leakage - test(cron_live_test): mark recurring-job live test as #[ignore] with run instructions so it does not add 7s to default cargo test runs

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T08:14:28Z

+        // Cancel any existing timer loop
+        let old_token = self.cancel_token.lock().unwrap().clone();
+        old_token.cancel();
+


arm_timer() cancels the current CancellationToken on every re-arm (e.g., add_job/remove_job/enable_job/run_job). Because the spawned job-execution tasks also listen to cancel_clone/job_cancel derived from the same token, re-arming will unintentionally cancel in-flight job executions, not just the timer loop. Consider using separate tokens: one long-lived service stop token for jobs, and a distinct per-loop token (e.g., child token) used only to interrupt the timer sleep/recompute.

Copilot · 2026-03-06T08:14:28Z

+        // Prevent concurrent re-arming using try_write on timer_task
+        let mut timer_task = match self.timer_task.try_write() {
+            Ok(guard) => guard,
+            Err(_) => return, // Already arming
        };

-        let now = Utc::now().timestamp_millis();
-        let delay_ms = (next_wake - now).max(0) as u64;
-        let delay = Duration::from_millis(delay_ms);
+        // Cancel any existing timer loop
+        let old_token = self.cancel_token.lock().unwrap().clone();
+        old_token.cancel();
+
+        // Take the old handle out while holding the lock, then DROP the lock
+        // before awaiting it — awaiting under an async write lock would block
+        // stop() and concurrent arm_timer() callers.
+        let old_handle = timer_task.take();
+        drop(timer_task);
+        if let Some(handle) = old_handle {
+            handle.abort();
+            let _ = handle.await;
+        }

-        debug!("Next cron wake in {}ms (at {}ms)", delay_ms, next_wake);
+        // Create a fresh cancellation token and store it so stop() can cancel the new loop.
+        // (The old token was already cancelled above; the old task was aborted via handle.abort().)
+        let cancel = CancellationToken::new();
+        *self.cancel_token.lock().unwrap() = cancel.clone();



arm_timer() is not actually serialized: it uses try_write() initially but then drops the timer_task lock (line 213) before creating/storing the new token/handle. A second arm_timer() can run concurrently, leading to orphaned timer loops and cancellation tokens (only the last stored handle/token is tracked/cancelled). Use a dedicated async Mutex around the whole re-arm sequence, or store (handle, token) together and swap/cancel atomically without a window where another arm_timer can interleave.

Copilot · 2026-03-06T08:14:28Z

+        let tmp_path = store_path.with_extension("tmp");
+        if let Err(e) = fs::write(&tmp_path, &content).await {
+            error!("Failed to write tmp cron store: {}", e);
+        } else if let Err(e) = fs::rename(&tmp_path, store_path).await {
+            error!("Failed to swap cron store atomically: {}", e);
+        }


Atomic save via fs::rename(tmp, store_path) is not cross-platform: on Windows, rename fails if the destination file already exists. After the first save, subsequent saves may fail and leave the .tmp behind, breaking persistence (CI runs on windows-latest). Consider using a Windows-safe atomic replace (e.g., write to temp + replace_file semantics) or remove/replace the destination in a way that preserves crash-safety guarantees.

Copilot · 2026-03-06T08:14:28Z

+        let tmp_path = self.store_path.with_extension("tmp");
+        fs::write(&tmp_path, content).await?;
+        fs::rename(&tmp_path, &self.store_path).await?;


save_store() also uses fs::rename(tmp, store_path) which will fail on Windows when the destination already exists, causing periodic persistence failures after the first write. Align this with a cross-platform atomic replacement strategy (same fix as save_store_atomic).

Copilot · 2026-03-06T08:14:29Z

@@ -287,11 +318,7 @@ impl RbacManager {
    }


matches_command_pattern() falls back to command.starts_with(pattern). This allows unintended commands like "lsb_release" or "gitlab" when the allowed pattern is "ls"/"git". To allow arguments safely, require a word boundary: either command == pattern, or command starts with "{pattern} " (and possibly tabs), and consider extracting the first token for comparison.

Copilot · 2026-03-06T08:14:29Z

+        "REGRESSION: recurring job only fired {} time(s) in 7s (expected ≥3). Bug #47 not fixed!",
+        total_fires
+    );
+}


The file ends with a duplicated assertion/closing braces block (lines 94-97) after the test function has already closed at line 93. This will not compile; remove the stray duplicated lines so the file has exactly one assertion block and one closing brace for the test function.

Suggested change

"REGRESSION: recurring job only fired {} time(s) in 7s (expected ≥3). Bug #47 not fixed!",

total_fires

);

}

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T10:06:42Z

+        // Cancel the running timer loop
+        let token = self.cancel_token.lock().unwrap().clone();
+        token.cancel();

-        // Cancel timer task
+        // Cancel running jobs
+        self.job_cancel_token.cancel();


self.job_cancel_token is cancelled in stop(), but this token is created in new() and never replaced. If start() is called again after stop(), all future job executions that select! on job_cancel.cancelled() will be immediately cancelled. Consider storing the job cancellation token behind a mutex (similar to cancel_token) and replacing it with a fresh CancellationToken in start() (or inside arm_timer()), so a restart re-enables job execution.

Copilot · 2026-03-06T10:06:42Z

+        // Hold a single read lock and compute next_wake from it directly,
+        // avoiding a second lock acquisition inside get_next_wake_ms_from_store.
        let store = self.store.read().await;
+        let is_running = self.timer_task.read().await.is_some();


status() holds the store read lock while awaiting timer_task.read().await. Holding an async lock guard across an .await can unnecessarily reduce concurrency and increases the risk of lock-order problems. A simple fix is to read timer_task first (or drop store before awaiting the second lock), then acquire the store lock to compute next_wake_at_ms.

Suggested change

// Hold a single read lock and compute next_wake from it directly,

// avoiding a second lock acquisition inside get_next_wake_ms_from_store.

let store = self.store.read().await;

let is_running = self.timer_task.read().await.is_some();

// First, read the timer task status without holding the store lock.

let is_running = {

let timer_task_guard = self.timer_task.read().await;

timer_task_guard.is_some()

};

// Then, read the store to compute next_wake_at_ms and job count.

let store = self.store.read().await;

Copilot · 2026-03-06T10:06:42Z

    store: Arc<RwLock<CronStore>>,
    on_job: Option<CronCallback>,
-    running: Arc<RwLock<bool>>,
+    cancel_token: Arc<std::sync::Mutex<CancellationToken>>,


std::sync::Mutex in async code can block a Tokio worker thread if contention occurs. Since cancel_token is used from async fn contexts (arm_timer, stop), consider switching to tokio::sync::Mutex (or another async-aware mutex) to avoid potential runtime blocking.

Copilot · 2026-03-06T10:06:42Z

+                    let mut jobs_to_delete: Vec<String> = Vec::new();
+
+                    for due in &due_jobs {
+                        if let Some(j) = s.jobs.iter_mut().find(|j| j.id == due.id) {


retain(|j| !jobs_to_delete.contains(&j.id)) makes deletion O(n*m) because Vec::contains is linear. If many one-shot jobs are due at once, this becomes noticeably inefficient. Converting jobs_to_delete to a HashSet<String> (or storing IDs as Uuid and using a set) would make membership checks O(1).

Copilot · 2026-03-06T10:06:42Z

+                    // Remove one-shot jobs marked for deletion
+                    if !jobs_to_delete.is_empty() {
+                        s.jobs.retain(|j| !jobs_to_delete.contains(&j.id));
+                    }


retain(|j| !jobs_to_delete.contains(&j.id)) makes deletion O(n*m) because Vec::contains is linear. If many one-shot jobs are due at once, this becomes noticeably inefficient. Converting jobs_to_delete to a HashSet<String> (or storing IDs as Uuid and using a set) would make membership checks O(1).

Copilot · 2026-03-06T10:06:43Z

+                            let scheduled_at = j.state.next_run_at_ms.unwrap_or(now);
+
+                            match &j.schedule {
+                                CronSchedule::Every { every_ms } => {
+                                    // Anti-drift: anchor to scheduled time, not now()


For CronSchedule::Cron, parse_cron_next is computed from now, while Every explicitly anchors to scheduled_at to reduce drift. This means cron-based schedules can still drift (or skip expected occurrences) if the wake/execution is delayed. If the intent is consistent anti-drift behavior, consider computing the next cron occurrence from scheduled_at (and then snapping forward if the computed next is already <= now, similar to the Every branch).

Copilot · 2026-03-06T10:06:43Z

+                                    j.state.next_run_at_ms =
+                                        expr.as_ref().and_then(|e| parse_cron_next(e, now));


For CronSchedule::Cron, parse_cron_next is computed from now, while Every explicitly anchors to scheduled_at to reduce drift. This means cron-based schedules can still drift (or skip expected occurrences) if the wake/execution is delayed. If the intent is consistent anti-drift behavior, consider computing the next cron occurrence from scheduled_at (and then snapping forward if the computed next is already <= now, similar to the Every branch).

Suggested change

j.state.next_run_at_ms =

expr.as_ref().and_then(|e| parse_cron_next(e, now));

// Anti-drift for cron: anchor to scheduled time, then snap forward

if let Some(e) = expr.as_ref() {

let mut next = parse_cron_next(e, scheduled_at);

if let Some(n) = next {

if n <= now {

next = parse_cron_next(e, now);

}

}

j.state.next_run_at_ms = next;

} else {

j.state.next_run_at_ms = None;

}

Copilot · 2026-03-06T10:06:43Z

+    );
+
+    // Wait 7 seconds — should fire at least 3 times (at ~2s, ~4s, ~6s)
+    tokio::time::sleep(Duration::from_secs(7)).await;


This live test adds a fixed 7-second wall-clock delay, which will slow CI and can be flaky on loaded runners. If it’s intended as a manual/local verification test (as the PR description implies), consider marking it #[ignore] (and documenting how to run it), gating it behind a feature/env var, or restructuring it so it doesn't require multi-second sleeps during normal cargo test runs.

Copilot · 2026-03-06T10:06:43Z

+    /// Check if command matches a pattern (supports wildcards) and does not contain shell metacharacters
    fn matches_command_pattern(&self, command: &str, pattern: &str) -> bool {
+        // Prevent command injection via shell metacharacters.
+        // Includes Unix operators, Windows-specific chars (%,!,^), and
+        // newline/CR to block CRLF injection through env-var expansion.
+        // NOTE: Do NOT log the command itself — it may contain secrets.
+        const SHELL_METACHARS: &[char] = &[
+            ';', '&', '|', '`', '$', '<', '>', '(', ')', '{', '}',
+            '\n', '\r',  // newline injection
+            '%', '!', '^', // Windows: %VAR%, delayed expansion, escape char
+        ];
+        if command.chars().any(|c| SHELL_METACHARS.contains(&c)) {
+            warn!("Command rejected: contained disallowed shell metacharacters");
+            return false;
+        }


matches_command_pattern now logs a warning as a side effect. If the caller checks multiple allowed patterns in a loop, a single rejected command can emit the same warning many times (log spam). Consider making this function pure (no logging) and performing the metacharacter validation once at the call site (logging once with appropriate context), or returning a richer result (e.g., match/no-match vs rejected) so the caller can log once.

…ons, and anti-drift

…t, anti-drift, and atomic persistence

…test failures in PR 51

…tern matching conflicts

…ch, collapse nested if let for clippy

Deepak-negi11 added 8 commits March 1, 2026 22:48

fix(cron): address review feedback on timer race condition

b030dc9

- Move task spawn inside the write lock guard in arm_timer() to serialize concurrent calls - Remove the 'smuggling' child token comment as the implementation is clean

fix(cron): address deep review feedback

1385176

- Replace cancel_token with Arc<Mutex<CancellationToken>> for proper replacement - Add store_mutex (tokio::sync::Mutex) to serialize disk writes and prevent races

perf(cron): avoid String allocs in jobs_to_delete

048a3ac

- Change jobs_to_delete to Vec<&str> to eliminate unnecessary String allocations during cleanup of one-shot jobs.

Clean up: semaphore acquired inside task, no duplicate spawn block

1a96972

fix(cron): address code review feedback - clarify cancel_token commen…

68b36e9

…t, eliminate status() double-lock

test(cron): add live regression test for recurring job multi-fire (bug …

cb04d05

…mofa-org#47)

Nixxx19 requested a review from Copilot March 5, 2026 15:00

Copilot started reviewing on behalf of Nixxx19 March 5, 2026 15:01 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Nixxx19 requested a review from Copilot March 6, 2026 08:09

Copilot started reviewing on behalf of Nixxx19 March 6, 2026 08:10 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

fix(cron): address all code review feedback from PR mofa-org#51

d408900

Deepak-negi11 requested a review from Copilot March 6, 2026 09:33

Copilot started reviewing on behalf of Deepak-negi11 March 6, 2026 09:33 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

Deepak-negi11 requested a review from Copilot March 6, 2026 09:58

Copilot AI reviewed Mar 6, 2026

View reviewed changes

Copilot started reviewing on behalf of Deepak-negi11 March 6, 2026 10:24 View session

Deepak-negi11 added 6 commits March 6, 2026 16:12

fix(cron): Round 3 feedback - fix restart, lock order, HashSet deleti…

e8c1b56

…ons, and anti-drift

fix(cron): address all Round 3 feedback - restart, lock order, HashSe…

0f42fb1

…t, anti-drift, and atomic persistence

fix(test): port PR 55 RBAC matching strictness to fix cross-platform …

e466765

…test failures in PR 51

Merge branch 'main' into fix/cron-timer-rearm-47 and resolve RBAC pat…

1b559db

…tern matching conflicts

fix(rbac,cron): fix pattern matching for mid-wildcards and prefix mat…

915425e

…ch, collapse nested if let for clippy

Merge origin/main: adopt PR#55 strict RBAC pattern matching

d76f677

		j.state.next_run_at_ms =
		expr.as_ref().and_then(\|e\| parse_cron_next(e, now));

-                                    j.state.next_run_at_ms =
-                                        expr.as_ref().and_then(|e| parse_cron_next(e, now));
+                                    // Anti-drift for cron: anchor to scheduled time, then snap forward
+                                    if let Some(e) = expr.as_ref() {
+                                        let mut next = parse_cron_next(e, scheduled_at);
+                                        if let Some(n) = next {
+                                            if n <= now {
+                                                next = parse_cron_next(e, now);
+                                            }
+                                        }
+                                        j.state.next_run_at_ms = next;
+                                    } else {
+                                        j.state.next_run_at_ms = None;
+                                    }

Conversation

Deepak-negi11 commented Mar 1, 2026

Summary

Root Cause

Changes

Before / After

Files Changed

Testing

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!