Refactor validate_run_id to accept a SQLx Executor #288

@daniel-thom

Background

PR #287 fixed a deadlock in batch_complete_jobs by inlining the SELECT used by validate_run_id directly against the active transaction (&mut **tx), since the original helper takes a fresh pool connection and would re-deadlock the handler.

That left two near-duplicate copies of the same logic:

  • src/server/http_server/runtime_support.rs::validate_run_id — used by manage_job_status_change (lifecycle_support.rs:63) and the non-batch apply_job_completion_state path.
  • An inline equivalent in src/server/http_server/jobs_transport.rs::apply_job_completion_state_tx, written against tx.

Copilot raised this in a review comment on PR #287, and the cleanup was deferred from the hotfix to keep the patch's scope minimal during the production incident. Now that the hotfix has shipped, the duplication should be consolidated before the two implementations drift (e.g., a future tweak to error mapping or an extra validation rule lands in only one of them).

Proposed change

Refactor validate_run_id to accept a generic SQLx executor so the same body works for both a borrowed pool and an open transaction:

pub(super) async fn validate_run_id<'e, E>(
    executor: E,
    workflow_id: i64,
    provided_run_id: i64,
) -> Result<(), String>
where
    E: sqlx::Executor<'e, Database = sqlx::Sqlite>,

Then:

  • manage_job_status_change and the non-batch completion path call validate_run_id(&*self.pool, ...).
  • apply_job_completion_state_tx calls validate_run_id(&mut **tx, ...) and drops its inline copy.
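
To make the intent concrete, here is a dependency-free model of the pattern, not the actual SQLx code. Everything in it is a hypothetical stand-in: `Pool`, `Tx`, the `Executor` trait, the `fetch_run_id` method, the error strings, and the single-connection pool size are all assumptions made for illustration. The toy pool returns an error when it is exhausted, where a real pool would block, so the sketch can show the failure mode from PR #287 without hanging.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Toy stand-in for a connection pool (hypothetical, not the sqlx type).
struct Pool {
    free: Cell<usize>,          // connections currently available
    run_ids: HashMap<i64, i64>, // workflow_id -> current run_id
}

/// Toy stand-in for an open transaction; it holds one connection until dropped.
struct Tx<'p> {
    pool: &'p Pool,
}

impl Pool {
    fn new(size: usize, run_ids: HashMap<i64, i64>) -> Self {
        Pool { free: Cell::new(size), run_ids }
    }

    /// Check out a connection. A real pool would block when exhausted;
    /// this model returns an error instead so the example terminates.
    fn begin(&self) -> Result<Tx<'_>, String> {
        let free = self.free.get();
        if free == 0 {
            return Err("pool exhausted (a real pool would block here)".to_string());
        }
        self.free.set(free - 1);
        Ok(Tx { pool: self })
    }
}

impl Drop for Tx<'_> {
    /// Return the connection to the pool when the transaction ends.
    fn drop(&mut self) {
        self.pool.free.set(self.pool.free.get() + 1);
    }
}

/// Minimal analogue of an executor: anything that can run the lookup.
/// It takes `self` by value, so each executor value runs one query.
trait Executor {
    fn fetch_run_id(self, workflow_id: i64) -> Result<Option<i64>, String>;
}

impl Executor for &Pool {
    fn fetch_run_id(self, workflow_id: i64) -> Result<Option<i64>, String> {
        // A borrowed pool must check out a fresh connection for the query.
        let _conn = self.begin()?;
        Ok(self.run_ids.get(&workflow_id).copied())
    }
}

impl Executor for &mut Tx<'_> {
    fn fetch_run_id(self, workflow_id: i64) -> Result<Option<i64>, String> {
        // A transaction reuses the connection it already holds.
        Ok(self.pool.run_ids.get(&workflow_id).copied())
    }
}

/// Single shared validation body, generic over the executor.
fn validate_run_id<E: Executor>(
    executor: E,
    workflow_id: i64,
    provided_run_id: i64,
) -> Result<(), String> {
    match executor.fetch_run_id(workflow_id)? {
        Some(current) if current == provided_run_id => Ok(()),
        Some(current) => Err(format!(
            "run_id mismatch: provided {provided_run_id}, current {current}"
        )),
        None => Err(format!("workflow {workflow_id} not found")),
    }
}

fn main() {
    // One connection, one workflow (id 7) whose current run_id is 3.
    let pool = Pool::new(1, HashMap::from([(7, 3)]));

    // Pool-based call site: no transaction is open, so a connection is free.
    assert!(validate_run_id(&pool, 7, 3).is_ok());

    let mut tx = pool.begin().expect("one connection is free");

    // Passing the pool while the transaction holds the only connection
    // is the case that would block in a real pool.
    assert!(validate_run_id(&pool, 7, 3).is_err());

    // Passing the transaction reuses its connection and runs the same checks.
    assert!(validate_run_id(&mut tx, 7, 3).is_ok());
    assert!(validate_run_id(&mut tx, 7, 4).is_err());
}
```

One design point the model mirrors: SQLx executor methods take the executor by value, so a generic `E` can drive only one query per call. That fits a helper that issues a single SELECT; if validation ever needs a second statement, the signature would likely need revisiting.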

Acceptance criteria

  • Single implementation of run_id validation; no inlined copy in apply_job_completion_state_tx.
  • Both pool-based and transaction-based call sites use the shared helper.
  • Existing tests still pass; tests/test_batch_complete_jobs.rs::test_batch_complete_jobs_does_not_deadlock_in_memory still catches the deadlock regression.

Out of scope

  • Other places where the same anti-pattern could exist. An audit on PR #287 (Fix regression in batch_job_complete) confirmed no remaining instances; this issue is purely about the duplication introduced by that PR.
