Support runner-side job prefetching / batching to reduce idle time between short jobs #276

@nkeilbart

Summary

Add support in torc for job runners to prefetch multiple jobs from the server and execute them sequentially from a local in-memory queue, instead of fetching only a single job at a time.

This is intended to reduce idle time on node resources, especially GPUs, between short-running jobs.

Problem

Today, a torc job runner works roughly like this:

  1. Poll the server for work
  2. Receive a single job
  3. Run the job
  4. Report completion back to the server
  5. Wait until the next polling cycle to request more work

Because polling happens on a configurable interval, there can be a gap between when one job finishes and when the next job begins. During that time, node resources are idle.

This is particularly noticeable for short-running jobs. In my current workload, individual jobs run for about 5 seconds. When running across 10 nodes, and potentially scaling to 200 nodes, the fetch/report/poll cycle can leave GPUs underutilized.
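To put rough numbers on this, here is a minimal sketch that estimates utilization when every job is followed by an idle gap. The gap values are assumptions chosen for illustration, since the configured poll interval is not stated here, and the model ignores job startup cost.

```python
def gpu_utilization(job_seconds: float, gap_seconds: float) -> float:
    """Fraction of wall-clock time spent running jobs when each job
    is followed by an idle gap (fetch + report + waiting for the next poll)."""
    return job_seconds / (job_seconds + gap_seconds)


# Hypothetical gap lengths; only the ~5 second job duration comes from the workload.
for gap in (1.0, 5.0, 30.0):
    print(f"gap={gap:>4.0f}s -> utilization={gpu_utilization(5.0, gap):.0%}")
```

With 5-second jobs, an idle gap as long as the job itself already halves utilization, which is why removing the per-job gap matters more for short jobs than for long ones.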

Requested Feature

Allow each job runner to request multiple eligible jobs at once and keep them in a local in-memory queue.

The runner should then:

  • start the next queued job immediately after the current one finishes
  • continue executing queued jobs sequentially
  • periodically check back in with the server
  • request additional work before it becomes idle, when possible

This should help keep resources busy continuously instead of waiting on the next poll cycle after every single job.
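The loop described above could look something like the following sketch. It is not torc's actual runner code: `fetch_jobs`, `run_job`, `report_done`, `batch_size`, and `refill_threshold` are hypothetical names, and the refill here is synchronous for simplicity, whereas a real runner would probably refill in the background so that fetching never blocks execution.

```python
from collections import deque
from typing import Callable, Deque, List

Job = str  # placeholder; a real job would carry its command and resource needs


class PrefetchingRunner:
    """Runs jobs back-to-back from a local in-memory queue (illustrative only)."""

    def __init__(
        self,
        fetch_jobs: Callable[[int], List[Job]],  # hypothetical: ask the server for up to n jobs
        run_job: Callable[[Job], None],          # hypothetical: execute one job locally
        report_done: Callable[[Job], None],      # hypothetical: report completion to the server
        batch_size: int = 8,
        refill_threshold: int = 2,
    ) -> None:
        self.fetch_jobs = fetch_jobs
        self.run_job = run_job
        self.report_done = report_done
        self.batch_size = batch_size
        self.refill_threshold = refill_threshold
        self.queue: Deque[Job] = deque()

    def _refill(self) -> None:
        # Top the queue back up to batch_size; the server may return fewer jobs.
        wanted = self.batch_size - len(self.queue)
        if wanted > 0:
            self.queue.extend(self.fetch_jobs(wanted))

    def run_until_drained(self) -> int:
        """Run until the queue is empty and the server has no more work."""
        completed = 0
        self._refill()
        while self.queue:
            job = self.queue.popleft()
            self.run_job(job)
            self.report_done(job)
            completed += 1
            # Ask for more work before the queue runs dry, not after.
            if len(self.queue) <= self.refill_threshold:
                self._refill()
        return completed
```

The key difference from the current model is that the next job is already in memory when the previous one finishes, so the only remaining gap is the time to pop it from the queue.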

Desired Behavior

  • A runner can prefetch more than one job at a time
  • Prefetched jobs are stored in memory on the runner
  • Jobs are executed one after another as runner resources become available
  • Batch size should be configurable
  • Prefetch amount should take available resources into account

For example:

  • if a runner has 4 GPUs, it should be able to request enough work to keep those GPUs occupied for multiple job durations
  • torc already has resource-matching logic, so the runner should continue receiving only jobs appropriate for its available resources
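One possible way to size a prefetch request from available resources is sketched below. The formula and every parameter name (`free_gpus`, `gpus_per_job`, `checkin_seconds`, `max_queued`) are assumptions for illustration, not existing torc settings. The idea is to request enough jobs to cover the time until the next check-in, plus one extra round as headroom.

```python
import math


def prefetch_count(
    free_gpus: int,
    gpus_per_job: int,
    job_seconds: float,
    checkin_seconds: float,
    max_queued: int,
) -> int:
    """Number of jobs to request so the free GPUs stay busy until the next check-in."""
    if free_gpus <= 0 or gpus_per_job <= 0 or job_seconds <= 0:
        return 0
    concurrent = free_gpus // gpus_per_job  # jobs that can run side by side
    rounds = math.ceil(checkin_seconds / job_seconds) + 1  # +1 round of headroom
    return min(concurrent * rounds, max_queued)


# 4 GPUs, 1 GPU per job, ~5 s jobs, a hypothetical 30 s check-in interval.
print(prefetch_count(free_gpus=4, gpus_per_job=1, job_seconds=5.0,
                     checkin_seconds=30.0, max_queued=64))
```

In practice the job duration would have to be estimated, for example from recently completed jobs, and the cap keeps a single runner from hoarding work that other nodes could run.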

Why This Matters

The current one-job-at-a-time model introduces unnecessary idle time between jobs, which is especially costly for short simulations.

Expected benefits:

  • higher GPU utilization
  • reduced idle time between jobs
  • fewer poll cycles per unit of work
  • better throughput for short-duration workloads
  • less resource waste on busy clusters

The main success metric for this feature would be improved GPU utilization, ideally keeping GPUs as close to 100% utilized as possible.

Scope / Assumptions

  • This is a feature request for internal development
  • Job dependencies are already handled by the server, not the runner
  • Resource compatibility is already handled by existing torc tooling
  • The runner only needs to execute its assigned jobs in the order received, as local capacity allows

Failure / Recovery Considerations

If a runner checks out multiple jobs and then stops reporting back within a configured timeout window, the server should return any outstanding jobs to the available pool.

Since runners already check in regularly, this seems like a reasonable recovery model for prefetched but unfinished work.
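A minimal sketch of that recovery model on the server side, assuming a simple in-memory table keyed by runner ID. The class and method names are hypothetical, and timestamps are passed in explicitly so the logic is easy to test; a real server would persist this state.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class LeaseTable:
    """Tracks which jobs each runner has checked out and when it last reported."""

    timeout_seconds: float
    last_seen: Dict[str, float] = field(default_factory=dict)
    checked_out: Dict[str, Set[str]] = field(default_factory=dict)

    def check_out(self, runner_id: str, job_ids: Iterable[str], now: float) -> None:
        self.last_seen[runner_id] = now
        self.checked_out.setdefault(runner_id, set()).update(job_ids)

    def complete(self, runner_id: str, job_id: str, now: float) -> None:
        # A completion report also counts as a check-in.
        self.last_seen[runner_id] = now
        self.checked_out.get(runner_id, set()).discard(job_id)

    def reclaim_expired(self, now: float) -> List[str]:
        """Return outstanding jobs of runners that have been silent past the timeout."""
        reclaimed: List[str] = []
        for runner_id, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_seconds:
                reclaimed.extend(sorted(self.checked_out.pop(runner_id, set())))
                del self.last_seen[runner_id]
        return reclaimed
```

The jobs returned by `reclaim_expired` would go back into the available pool. Only unfinished jobs are returned, since completed ones were already removed when they were reported.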

Reporting Considerations

There may be flexibility in how often runners report status. Possible options:

  • report after every completed job
  • report completion in batches on a larger interval
  • report periodically while also requesting more work to avoid becoming idle

Any implementation should preserve correctness while reducing the amount of idle time introduced by per-job reporting.
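As an illustration of the batched option, the sketch below buffers completion results and sends them together once either a count limit or an age limit is reached. The names (`CompletionBuffer`, `send`, `max_items`, `max_age_seconds`) are hypothetical, and the sketch does not cover what happens to buffered results if the runner dies before flushing, which any real design would need to address.

```python
from typing import Callable, List, Optional, Tuple

Result = Tuple[str, int]  # (job id, return code); placeholder shape


class CompletionBuffer:
    """Collects job results and reports them to the server in batches."""

    def __init__(
        self,
        send: Callable[[List[Result]], None],  # hypothetical: one request carrying many results
        max_items: int = 10,
        max_age_seconds: float = 5.0,
    ) -> None:
        self.send = send
        self.max_items = max_items
        self.max_age_seconds = max_age_seconds
        self._pending: List[Result] = []
        self._oldest: Optional[float] = None  # time the oldest buffered result was added

    def add(self, result: Result, now: float) -> bool:
        """Buffer one result; returns True if this triggered a flush."""
        if not self._pending:
            self._oldest = now
        self._pending.append(result)
        return self.flush_if_due(now)

    def flush_if_due(self, now: float) -> bool:
        """Send buffered results if the batch is full or the oldest one is too old."""
        if not self._pending or self._oldest is None:
            return False
        due = (len(self._pending) >= self.max_items
               or now - self._oldest >= self.max_age_seconds)
        if due:
            self.send(list(self._pending))
            self._pending.clear()
            self._oldest = None
        return due
```

The age limit bounds how stale the server's view of a runner can get, which matters if the server uses reports to decide when dependent jobs become eligible.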

Suggested Configuration

Potential configuration options:

  • prefetch enabled/disabled
  • batch size
  • max queued jobs per runner
  • refill threshold, for example request more work when queue depth drops below a certain level
  • lease / timeout for checked-out jobs
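These options could be grouped into a single settings object, sketched below. The field names and default values are placeholders rather than existing torc configuration keys; the validation only shows how the options constrain each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefetchConfig:
    """Hypothetical runner prefetch settings; names and defaults are illustrative."""

    enabled: bool = False                 # prefetch enabled/disabled
    batch_size: int = 8                   # jobs requested per fetch
    max_queued_jobs: int = 32             # upper bound on the local queue
    refill_threshold: int = 2             # refill when queue depth drops to this level
    lease_timeout_seconds: float = 120.0  # server reclaims jobs after this much silence

    def __post_init__(self) -> None:
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.max_queued_jobs < self.batch_size:
            raise ValueError("max_queued_jobs must be >= batch_size")
        if not 0 <= self.refill_threshold < self.max_queued_jobs:
            raise ValueError("refill_threshold must be in [0, max_queued_jobs)")
        if self.lease_timeout_seconds <= 0:
            raise ValueError("lease_timeout_seconds must be positive")
```

Defaulting `enabled` to off would keep the current one-job-at-a-time behavior unless a workflow opts in.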

Example

Current

Runner polls, gets 1 job, runs it for ~5 seconds, reports completion, waits for next poll, then gets another job.

Proposed

Runner polls, gets a batch of eligible jobs, runs them back-to-back from memory, and refills the queue before it runs dry.

Acceptance Criteria

  • Runner can request multiple jobs in a single interaction with the server
  • Runner can maintain a local in-memory queue of prefetched jobs
  • Runner starts the next job immediately after the previous one completes, assuming resources are available
  • Batch size is configurable
  • Existing resource-matching behavior is preserved
  • Server can reclaim jobs that were checked out by an unresponsive runner after timeout
  • GPU idle time between short jobs is measurably reduced
  • Overall GPU utilization improves for short-running workloads
