Add server-side Slurm-job correlation endpoint #365
Closed
daniel-thom wants to merge 1 commit into
`torc slurm diagnose-logs` correlated Slurm job IDs to the Torc jobs they
ran by fetching four full lists (scheduled compute nodes, compute nodes,
results, jobs) and joining them through three HashMaps in
`build_slurm_to_jobs_map` -- the heaviest client-side join in the codebase.
Add `GET /workflows/{id}/slurm_job_correlations`, which performs the whole
join in one SQL query: scheduled_compute_node (scheduler_id = Slurm job ID)
-> compute_node (linked via the scheduler JSON's scheduler_id) -> result ->
job, grouped/ordered by (slurm_job_id, job_id) and covering all runs (matching
the prior all_runs=true behavior). Every table is narrowed by its workflow_id
index before joining; the query plan is index-only with no table scans.
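To make the join chain concrete, here is a minimal Rust sketch of the relation being computed, expressed over in-memory slices. It is not the project's code: every struct and field name is a hypothetical stand-in for the tables named above, and the compute_node link (here `scheduled_compute_node_id`) is an assumption about what the scheduler JSON's scheduler_id resolves to.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical row types; the real schema may differ.
struct ScheduledComputeNode {
    id: i64,
    scheduler_id: String, // the Slurm job ID
}
struct ComputeNode {
    id: i64,
    scheduled_compute_node_id: i64, // assumed link, taken from the scheduler JSON
}
struct JobResult {
    job_id: i64,
    compute_node_id: i64,
}
struct Job {
    id: i64,
    name: String,
}

/// Walk result -> compute_node -> scheduled_compute_node and result -> job,
/// returning deduplicated rows ordered by (slurm_job_id, job_id).
fn correlate(
    scns: &[ScheduledComputeNode],
    nodes: &[ComputeNode],
    results: &[JobResult],
    jobs: &[Job],
) -> Vec<(String, i64, String)> {
    let slurm_by_scn: HashMap<i64, &str> =
        scns.iter().map(|s| (s.id, s.scheduler_id.as_str())).collect();
    let scn_by_node: HashMap<i64, i64> =
        nodes.iter().map(|n| (n.id, n.scheduled_compute_node_id)).collect();
    let name_by_job: HashMap<i64, &str> =
        jobs.iter().map(|j| (j.id, j.name.as_str())).collect();

    // A BTreeSet deduplicates repeated runs and keeps the rows sorted.
    // Slurm job IDs are compared as strings here, not as numbers.
    let mut rows: BTreeSet<(String, i64, String)> = BTreeSet::new();
    for r in results {
        let Some(scn_id) = scn_by_node.get(&r.compute_node_id) else { continue };
        let Some(slurm) = slurm_by_scn.get(scn_id) else { continue };
        let Some(name) = name_by_job.get(&r.job_id) else { continue };
        rows.insert(((*slurm).to_string(), r.job_id, (*name).to_string()));
    }
    rows.into_iter().collect()
}
```

Results whose compute node or job cannot be resolved are skipped, which mirrors inner-join behavior; whether the actual query uses inner joins is an assumption.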
`build_slurm_to_jobs_map` now makes a single call and rebuilds the same
`HashMap<String, Vec<AffectedJob>>`, so all consumers are unchanged. The
now-unused pagination imports are removed.
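The client-side regrouping described above could look roughly like the following sketch. `CorrelationRow` and the fields of `AffectedJob` are hypothetical (only the `AffectedJob` name and the map type appear in this PR), so treat this as an illustration of the grouping step rather than the actual implementation.

```rust
use std::collections::HashMap;

// Hypothetical shape of one row returned by the endpoint.
#[derive(Debug, Clone, PartialEq)]
struct CorrelationRow {
    slurm_job_id: String,
    job_id: i64,
    job_name: String,
}

// Field names are assumed; only the type name comes from the PR.
#[derive(Debug, Clone, PartialEq)]
struct AffectedJob {
    job_id: i64,
    job_name: String,
}

/// Group flat, already-ordered rows into a map keyed by Slurm job ID.
/// Within each key, jobs keep the order in which the rows arrived.
fn group_by_slurm_job(rows: Vec<CorrelationRow>) -> HashMap<String, Vec<AffectedJob>> {
    let mut map: HashMap<String, Vec<AffectedJob>> = HashMap::new();
    for row in rows {
        map.entry(row.slurm_job_id).or_default().push(AffectedJob {
            job_id: row.job_id,
            job_name: row.job_name,
        });
    }
    map
}
```

Because the rows arrive deduplicated and ordered, the client needs no sorting or set logic of its own; a single pass with the `entry` API is enough.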
Adds integration tests for the correlation chain and the 404 path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Superseded by #366, which consolidates all the remaining server-side aggregation work (these two commits are the first two there) plus the running-jobs command and the results/compute-node listing improvements into a single PR.
Summary
Tier 2 follow-up to the `torc status` server-side migration (#363).

`torc slurm diagnose-logs` correlates Slurm job IDs to the Torc jobs they ran. It did this in `build_slurm_to_jobs_map` by fetching four full lists (scheduled compute nodes, compute nodes, results, and jobs) and joining them through three HashMaps in memory. This was the heaviest client-side join in the codebase.

This PR adds `GET /workflows/{id}/slurm_job_correlations`, which performs the entire join in a single SQL query (scheduled_compute_node -> compute_node -> result -> job), grouped/ordered by `(slurm_job_id, job_id)` and covering all runs (matching the prior `all_runs=true` behavior). Each table is narrowed by its `workflow_id` index before joining: `EXPLAIN QUERY PLAN` shows it starting from `result` via `idx_result_workflow_id` then primary-key lookups, with no table scans.

`build_slurm_to_jobs_map` now makes a single call and rebuilds the same `HashMap<String, Vec<AffectedJob>>`, so all consumers (`diagnose-logs` output) are unchanged. The now-unused pagination imports are removed.

Response shape
The client groups `items` by `slurm_job_id`; the server already deduplicates (`GROUP BY`) and orders the rows.

Testing
`test_get_slurm_job_correlations` (builds the full SCN→compute_node→result→job chain and asserts the correlation) and `test_get_slurm_job_correlations_not_found`
`cargo fmt --check`, `cargo clippy --all --all-targets --all-features -- -D warnings`, `dprint check`: clean (pre-commit hook)

🤖 Generated with Claude Code