Investigate backfill under-fetching in resource-based job claiming #267

@daniel-thom

Problem

Copilot noted a remaining edge case in PR #266: the backfill query currently uses
LIMIT remaining_limit, but the Rust packing pass can still skip some of those returned rows.

As a result, when the top remaining_limit backfill candidates do not pack together, the server may
return fewer jobs than it could have, even though lower-ranked candidates outside that limited
window would fit in the remaining resources.

Example shape:

remaining CPU = 4
remaining_limit = 4

backfill candidates returned by SQL:
  job A: 3 CPU
  job B: 3 CPU
  job C: 3 CPU
  job D: 3 CPU

Rust claims A, then skips B/C/D because only 1 CPU remains.

Lower-ranked 1-CPU jobs may exist, but the backfill query did not fetch them.
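
The example shape above can be reproduced with a small standalone simulation. This is only a sketch: `Candidate`, `job`, `example_candidates`, and `pack_within_window` are hypothetical names made up for illustration, the in-memory slice stands in for the SQL result set, and only CPU is modeled. It does not reflect the project's actual types or query.

```rust
// Hypothetical candidate row; field names are illustrative, not the real schema.
#[derive(Debug, Clone)]
struct Candidate {
    name: &'static str,
    cpus: u32,
}

fn job(name: &'static str, cpus: u32) -> Candidate {
    Candidate { name, cpus }
}

// Ranked candidates: A-D need 3 CPUs each, lower-ranked E-H need 1 CPU each.
fn example_candidates() -> Vec<Candidate> {
    vec![
        job("A", 3), job("B", 3), job("C", 3), job("D", 3),
        job("E", 1), job("F", 1), job("G", 1), job("H", 1),
    ]
}

// Greedy packing over the first `limit` ranked rows only, mimicking a
// `LIMIT`-bounded query followed by an in-memory packing pass.
fn pack_within_window(
    ranked: &[Candidate],
    limit: usize,
    mut remaining_cpus: u32,
) -> Vec<&'static str> {
    let mut claimed = Vec::new();
    for c in ranked.iter().take(limit) {
        if c.cpus <= remaining_cpus {
            remaining_cpus -= c.cpus;
            claimed.push(c.name);
        }
        // Otherwise the row is skipped; nothing replaces it in the window.
    }
    claimed
}
```

With `limit = 4` and 4 remaining CPUs, only A is claimed and 1 CPU stays idle. Widening the window to 8 rows lets the packing pass also reach E, which is the capacity the current backfill query leaves unused.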

Scope

This is distinct from the GPU-saturation paging fix in PR #266. That PR keeps the query bounded and
addresses the observed case where a primary page is dominated by higher-priority GPU jobs and
lower-priority CPU jobs can fill leftover CPU capacity.

Possible approaches

  • Over-fetch a bounded multiple of remaining_limit, with a reasonable cap.
  • Make the backfill pass iterative/page-based until either the claim limit is met, resources are
    saturated, or a maximum number of backfill pages has been scanned.
  • Add instrumentation first to see whether skips in the backfill pass are common enough to justify a
    broader heuristic.
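
The first two approaches could look roughly like the sketch below. Everything in it is an assumption for illustration: `overfetch_limit`, `fetch_page`, and `backfill_paged` are hypothetical names, `fetch_page` slices an in-memory ranked list in place of a paged SQL query, and only CPU is tracked. The point is the three stop conditions (claim limit met, resources saturated, page budget exhausted), not the actual implementation.

```rust
// Hypothetical candidate row; field names are illustrative, not the real schema.
#[derive(Debug, Clone)]
struct Candidate {
    name: &'static str,
    cpus: u32,
}

// Approach 1: over-fetch a bounded multiple of the remaining limit, capped.
fn overfetch_limit(remaining_limit: usize, factor: usize, cap: usize) -> usize {
    remaining_limit.saturating_mul(factor).min(cap)
}

// Stand-in for one paged query: ranked rows in [offset, offset + page_size).
fn fetch_page(ranked: &[Candidate], offset: usize, page_size: usize) -> &[Candidate] {
    let start = offset.min(ranked.len());
    let end = offset.saturating_add(page_size).min(ranked.len());
    &ranked[start..end]
}

// Approach 2: scan pages in rank order until the claim limit is met,
// CPUs are saturated, or `max_pages` pages have been scanned.
fn backfill_paged(
    ranked: &[Candidate],
    claim_limit: usize,
    page_size: usize,
    max_pages: usize,
    mut remaining_cpus: u32,
) -> Vec<&'static str> {
    let mut claimed = Vec::new();
    let mut offset = 0;
    for _ in 0..max_pages {
        if claimed.len() >= claim_limit || remaining_cpus == 0 {
            break;
        }
        let page = fetch_page(ranked, offset, page_size);
        if page.is_empty() {
            break; // no more candidates to scan
        }
        for c in page {
            if claimed.len() >= claim_limit {
                break;
            }
            if c.cpus <= remaining_cpus {
                remaining_cpus -= c.cpus;
                claimed.push(c.name);
            }
        }
        offset += page.len();
    }
    claimed
}
```

Total work stays bounded by `max_pages * page_size` rows, and rows are still visited in rank order, so a higher-ranked candidate that fits is never passed over in favor of a lower-ranked one.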

Acceptance criteria

  • Add a regression test where the first backfill window contains candidates that individually fit the
    SQL remaining-resource filters but do not pack together, while lower-ranked candidates would fit.
  • Keep total SQL work bounded.
  • Preserve existing priority ordering and scheduler fallback behavior.
