feat(ci): surface superseded UAT runs (#1274)#1583
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughChangesThis PR adds a new GitHub Actions workflow, “UAT: Superseded Run Notice,” that runs after a cancelled “UAT Run” completion, checks the triggering run’s jobs through the GitHub API, and distinguishes a pending supersede from a genuine mid-run cancellation. For pending supersedes, it writes a step summary and emits a warning using percent-encoded metadata. The contributor UAT documentation is updated to describe this behavior and remove a roadmap item. Estimated code review effort: 3 (Moderate) | ~20 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/uat-superseded-notice.yaml:
- Around line 70-74: Do not classify a workflow run as superseded when the Jobs
API request fails in the superseded-notice workflow. In the
`jobs_json`/`superseded` logic, stop falling back to `'{}'` on `gh api` errors
and instead detect the failure explicitly before running the `jq` check, so
`total_count == 0` only reflects a real empty jobs response. Use the `gh api`
call and the `superseded` assignment in the workflow to locate the fix.
- Around line 62-64: The superseded-run notice is missing the reservation
context, so the redispatch target remains ambiguous. Update the workflow notice
variables in uat-superseded-notice.yaml to include the reservation directly, or
propagate it from the upstream run name if you prefer. Use the existing
RUN_TITLE, RUN_URL, and HEAD_BRANCH handling as the place to add the reservation
so the notice uniquely identifies the intended run.
- Around line 81-94: The `echo "::warning::..."` command in the UAT superseded
notice workflow is vulnerable to command annotation injection if `RUN_TITLE` or
`RUN_URL` contains `%`, CR, or LF. Update the warning construction in
`uat-superseded-notice.yaml` to escape those values before emitting the
`::warning::` command, while leaving the GitHub Step Summary block unchanged;
use the existing `RUN_TITLE` and `RUN_URL` symbols as the inputs to sanitize.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 15519289-719d-497d-a060-ed1625b66a89
📒 Files selected for processing (2)
.github/workflows/uat-superseded-notice.yamldocs/contributor/uat.md
|
🌿 Preview your docs: https://nvidia-preview-ci-uat-superseded-notice.docs.buildwithfern.com/aicr |
Add uat-superseded-notice.yaml, a reactive observer that makes a superseded UAT run visible instead of silent. The reservation lease (uat-run.yaml's `uat-<reservation>` concurrency group, cancel-in-progress: false) holds at most one in-progress plus one pending run; a third contender cancels the older pending run before it starts any job. That run is superseded, not failed, and would otherwise vanish without a trace. The observer triggers on `workflow_run: completed` for "UAT Run" (only top-level workflows emit workflow_run, and only from the default branch), classifies a cancelled run that never started a job as a supersede (total_count == 0, or every job has a null started_at) versus a genuine mid-run cancel, and emits a job-summary entry plus a ::warning. It reads only trusted workflow_run.* metadata via env (no injection) and needs only actions: read. This is the reactive/primary layer; the nightly controller reconciles the same signal synchronously for the cells it dispatches, and a DC6 guard (NVIDIA#1279) will exercise this observer. Related NVIDIA#1274, NVIDIA#1264. Signed-off-by: Nathan Hensley <nhensley@nvidia.com>
f84f2e5 to
282a065
Compare
mchmarny
left a comment
There was a problem hiding this comment.
Clean, well-scoped workflow_run observer that surfaces superseded UAT runs. Security posture is exactly right for this trigger type: no checkout of the triggering run, only trusted workflow_run.* metadata read via env: (never inlined), the ::warning percent-encodes %/CR/LF so a crafted run title can't inject, least-privilege actions: read, repo-gated (matching the repo's existing github.repository == 'nvidia/aicr' convention), timeout + per-run concurrency. The pending-vs-mid-run classifier is sound (total_count == 0 or all started_at null), and it fails toward not crying wolf on a jobs-API fetch error. Docs updated and the DC6 roadmap item retired. actionlint/analyze green; inert until merged to main as expected. LGTM.
Summary
Adds
uat-superseded-notice.yaml, a reactiveworkflow_runobserver that makes a superseded UAT run visible (job-summary entry +::warning) instead of letting it vanish silently.Motivation / Context
Fourth and final PR in the DC1 day/night scheduler series (#1274). The reservation lease (
uat-run.yaml'suat-<reservation>concurrency group,cancel-in-progress: false) holds at most one in-progress + one pending run; a third contender cancels the older pending run before it starts any job. That run is superseded, not failed — a dropped request that would otherwise leave no trace. This observer surfaces it. Builds only on PR-2's top-leveluat-run.yaml(#1569, merged); independent of PR-3 (#1579).Fixes: N/A
Related: #1274, #1264
Type of Change
Component(s) Affected
docs/,examples/).github/workflows/uat-superseded-notice.yaml)Implementation Notes
on: workflow_run: workflows: ["UAT Run"], types: [completed]. Only top-level workflows emitworkflow_run, and only from the default branch — souat-run.yaml(top-level since PR-2) is the right subject, and this observer only runs once merged tomain.total_count == 0or every job has a nullstarted_at; a genuine mid-run cancel (≥1 started job) is ignored. If the jobs API can't be read, it errs toward surfacing rather than silence.::warningnaming the run and reservation, with a re-dispatch hint.workflow_run.*metadata, passed viaenv:(never inlined), so a fork-influenced run title can't inject. Needs onlypermissions: actions: read.uat-nightly-batch.yaml, PR-3) reconciles the same signal synchronously for the cells it dispatches; a DC6 guard (DC6 — Reliability + gap-fills #1279) will exercise this observer. The two are complementary.uat.mdqueuing section now points at this observer instead of deferring to "DC6", and the shipped item is removed from the roadmap. This overlaps the roadmap region PR-3 also edits — whichever merges second takes a one-line rebase.Testing
actionlint+yamllintclean;bash -nclean; longest line 161 (< 200).runs/<id>/jobspayloads: zero-jobs → superseded; all-null-started_at→ superseded; one started → not; mixed → not; empty API response → surfaces (fail-toward-visible).workflow_runobservers fire only from the default branch, so it goes live on merge tomain; CI here only lints it.Risk Assessment
Rollout notes: Inert until merged to
main(default-branch-only trigger). No secrets, no checkout, least-privilege token.Checklist
make testwith-race) — N/A, no Go source changedactionlint+yamllint; no Go →make lintGo gate N/A)docs/contributor/uat.md)workflow_runobservers)git commit -S)