feat(ci): surface superseded UAT runs (#1274) by njhensley · Pull Request #1583 · NVIDIA/aicr

njhensley · 2026-07-01T20:08:44Z

Summary

Adds uat-superseded-notice.yaml, a reactive workflow_run observer that makes a superseded UAT run visible (job-summary entry + ::warning) instead of letting it vanish silently.

Motivation / Context

Fourth and final PR in the DC1 day/night scheduler series (#1274). The reservation lease (uat-run.yaml's uat-<reservation> concurrency group, cancel-in-progress: false) holds at most one in-progress and one pending run. When a third contender arrives, it displaces the older pending run, which is cancelled before it has started any job. That run is superseded, not failed: a dropped request that would otherwise leave no trace. This observer surfaces it. Builds only on PR-2's top-level uat-run.yaml (#1569, merged); independent of PR-3 (#1579).
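To make the lease behaviour concrete, here is a toy bash model of a single concurrency group with cancel-in-progress: false: one running slot, one pending slot, and a newer arrival displacing the pending run. It illustrates the semantics described above and is not GitHub's implementation; the helper names `arrive` and `finish` are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of one concurrency group (e.g. uat-<reservation>) with
# cancel-in-progress: false. Hypothetical helper names; not GitHub's code.
running=""      # run currently holding the lease
pending=""      # at most one run waiting for the lease
superseded=()   # runs cancelled while pending (they never started a job)

# arrive RUN_ID: a new run enters the group.
arrive() {
  if [[ -z $running ]]; then
    running=$1
  elif [[ -z $pending ]]; then
    pending=$1
  else
    # A third contender displaces the older pending run.
    superseded+=("$pending")
    pending=$1
  fi
}

# finish: the running run completes and the pending run (if any) takes the lease.
finish() {
  running=$pending
  pending=""
}
```

With runs A, B and C arriving back to back, `arrive A; arrive B; arrive C` leaves `running=A`, `pending=C` and `superseded=(B)`; a subsequent `finish` hands the lease to C. Run B is exactly the case this observer is meant to surface.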

Fixes: N/A
Related: #1274, #1264

Type of Change

New feature (non-breaking change that adds functionality)
Build/CI/tooling

Component(s) Affected

Docs/examples (docs/, examples/)
Other: CI workflows (.github/workflows/uat-superseded-notice.yaml)

Implementation Notes

Trigger: on: workflow_run: workflows: ["UAT Run"], types: [completed]. Only top-level workflows emit workflow_run, and a workflow_run observer fires only from the copy of its workflow file on the default branch. So uat-run.yaml (top-level since PR-2) is the right subject, and this observer only runs once merged to main.
Classifier: a run cancelled while pending never starts a job. The observer job reads the triggering run's jobs and treats the run as superseded when total_count == 0 or every job has a null started_at; a genuine mid-run cancel (≥1 started job) is ignored. If the jobs API can't be read, it errs toward surfacing rather than silence.
Output: a job-summary entry plus a ::warning naming the run and reservation, with a re-dispatch hint.
Security: never checks out the triggering run's code; reads only trusted workflow_run.* metadata, passed via env: (never inlined), so a fork-influenced run title can't inject. Needs only permissions: actions: read.
Layering: this is the reactive/primary layer. The nightly controller (uat-nightly-batch.yaml, PR-3) reconciles the same signal synchronously for the cells it dispatches; a DC6 guard (DC6 — Reliability + gap-fills #1279) will exercise this observer. The two are complementary.
Doc note: the uat.md queuing section now points at this observer instead of deferring to "DC6", and the shipped item is removed from the roadmap. This overlaps the roadmap region PR-3 also edits — whichever merges second takes a one-line rebase.
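A minimal sketch of the classifier rule described above (superseded when total_count == 0 or every job has a null started_at), written as a bash function over a jobs payload. It assumes the JSON shape of the REST "List jobs for a workflow run" response (`total_count`, `jobs[].started_at`) and requires jq; the function name `classify_jobs` is hypothetical, and fetching the payload (for example with `gh api`) is left outside the function.

```bash
#!/usr/bin/env bash
# classify_jobs: read a jobs payload on stdin and print "superseded" or "started".
# Hypothetical helper; assumes the shape of GET .../actions/runs/{run_id}/jobs.
# Requires jq.
classify_jobs() {
  # jq -e exits 0 only when the last output is neither false nor null.
  # A missing .jobs array is treated as empty, so "{}" also classifies as superseded.
  if jq -e '.total_count == 0 or ((.jobs // []) | all(.started_at == null))' >/dev/null; then
    echo superseded
  else
    echo started
  fi
}
```

Fed the five payload shapes listed under Testing, it gives the stated outcomes: zero jobs and all-null started_at print `superseded`, one-started and mixed print `started`, and an empty object `{}` prints `superseded`. Malformed JSON makes jq exit non-zero, so it prints `started` rather than raising a notice.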

Testing

# New CI workflow + doc line — no Go source touched.
actionlint .github/workflows/uat-superseded-notice.yaml   # CI-pinned v1.7.11, checksum-verified
yamllint  .github/workflows/uat-superseded-notice.yaml
bash -n   <observer embedded script>

actionlint and yamllint clean; bash -n clean; longest line is 161 characters (under 200).
Classifier validated against 5 synthetic runs/<id>/jobs payloads: zero-jobs → superseded; all-null-started_at → superseded; one started → not; mixed → not; empty API response → surfaces (fail-toward-visible).
Cannot be exercised on this PR: workflow_run observers fire only from the default branch, so it goes live on merge to main; CI here only lints it.

Risk Assessment

Low — Isolated, additive single workflow + one doc line; no change to any existing pipeline; easy to revert.

Rollout notes: Inert until merged to main (default-branch-only trigger). No secrets, no checkout, least-privilege token.

Checklist

Tests pass locally (make test with -race) — N/A, no Go source changed
Linter passes (actionlint + yamllint; no Go → make lint Go gate N/A)
I did not skip/disable tests to make CI green
I added/updated tests for new functionality — N/A (observer is default-branch-only; a DC6 regression guard, DC6 — Reliability + gap-fills #1279, will exercise it)
I updated docs if user-facing behavior changed (docs/contributor/uat.md)
Changes follow existing patterns in the codebase (mirrors the repo's workflow_run observers)
Commits are cryptographically signed (git commit -S)

coderabbitai · 2026-07-01T20:13:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: fbf5c936-3846-4d77-ace1-a6c07543e51f

📥 Commits

Reviewing files that changed from the base of the PR and between f84f2e5 and 282a065.

📒 Files selected for processing (2)

.github/workflows/uat-superseded-notice.yaml
docs/contributor/uat.md

📝 Walkthrough

Walkthrough

Changes

This PR adds a new GitHub Actions workflow, “UAT: Superseded Run Notice,” that runs after a cancelled “UAT Run” completion, checks the triggering run’s jobs through the GitHub API, and distinguishes a pending supersede from a genuine mid-run cancellation. For pending supersedes, it writes a step summary and emits a warning using percent-encoded metadata. The contributor UAT documentation is updated to describe this behavior and remove a roadmap item.

Estimated code review effort: 3 (Moderate) | ~20 minutes

Possibly related PRs

NVIDIA/aicr#1569: Introduces the “UAT Run” workflow that this new workflow_run observer listens to.

Suggested reviewers: mchmarny

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: surfacing superseded UAT runs in CI.
Description check	✅ Passed	The description is directly related to the workflow and docs changes in this pull request.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/uat-superseded-notice.yaml:
- Around line 70-74: Do not classify a workflow run as superseded when the Jobs
API request fails in the superseded-notice workflow. In the
`jobs_json`/`superseded` logic, stop falling back to `'{}'` on `gh api` errors
and instead detect the failure explicitly before running the `jq` check, so
`total_count == 0` only reflects a real empty jobs response. Use the `gh api`
call and the `superseded` assignment in the workflow to locate the fix.
- Around line 62-64: The superseded-run notice is missing the reservation
context, so the redispatch target remains ambiguous. Update the workflow notice
variables in uat-superseded-notice.yaml to include the reservation directly, or
propagate it from the upstream run name if you prefer. Use the existing
RUN_TITLE, RUN_URL, and HEAD_BRANCH handling as the place to add the reservation
so the notice uniquely identifies the intended run.
- Around line 81-94: The `echo "::warning::..."` command in the UAT superseded
notice workflow is vulnerable to command annotation injection if `RUN_TITLE` or
`RUN_URL` contains `%`, CR, or LF. Update the warning construction in
`uat-superseded-notice.yaml` to escape those values before emitting the
`::warning::` command, while leaving the GitHub Step Summary block unchanged;
use the existing `RUN_TITLE` and `RUN_URL` symbols as the inputs to sanitize.
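The third comment asks for the ::warning data to be escaped. A minimal bash sketch of that encoding, following the escaping the GitHub Actions toolkit applies to workflow-command data (% → %25, CR → %0D, LF → %0A, with % encoded first so existing sequences are not misread). `escape_data` and `emit_superseded_warning` are hypothetical names; `RUN_TITLE` and `RUN_URL` are the env variables named in the comments above, and the message wording is illustrative.

```bash
#!/usr/bin/env bash
# escape_data STRING: percent-encode %, CR and LF so the value cannot end the
# ::warning:: command early or inject a second workflow command.
escape_data() {
  local s=$1
  s=${s//'%'/%25}     # must run first, or the encodings below would be re-encoded
  s=${s//$'\r'/%0D}
  s=${s//$'\n'/%0A}
  printf '%s' "$s"
}

# emit_superseded_warning: read run metadata from the environment (never
# inlined into the script body) and print a single-line ::warning:: command.
emit_superseded_warning() {
  local msg="Superseded UAT run: ${RUN_TITLE:-unknown} (${RUN_URL:-no url}); re-dispatch to retry"
  printf '::warning::%s\n' "$(escape_data "$msg")"
}
```

A crafted title such as `$'evil\n::error::x'` stays on the warning line as `evil%0A::error::x` instead of starting a new `::error::` command. The step summary can keep the raw values, since it is rendered Markdown rather than parsed as workflow commands.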

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 15519289-719d-497d-a060-ed1625b66a89

📥 Commits

Reviewing files that changed from the base of the PR and between f07962a and f84f2e5.

📒 Files selected for processing (2)

.github/workflows/uat-superseded-notice.yaml
docs/contributor/uat.md

github-actions · 2026-07-01T20:17:25Z

🌿 Preview your docs: https://nvidia-preview-ci-uat-superseded-notice.docs.buildwithfern.com/aicr

Add uat-superseded-notice.yaml, a reactive observer that makes a superseded UAT run visible instead of silent. The reservation lease (uat-run.yaml's `uat-<reservation>` concurrency group, cancel-in-progress: false) holds at most one in-progress plus one pending run; a third contender cancels the older pending run before it starts any job. That run is superseded, not failed, and would otherwise vanish without a trace. The observer triggers on `workflow_run: completed` for "UAT Run" (only top-level workflows emit workflow_run, and only from the default branch), classifies a cancelled run that never started a job as a supersede (total_count == 0, or every job has a null started_at) versus a genuine mid-run cancel, and emits a job-summary entry plus a ::warning. It reads only trusted workflow_run.* metadata via env (no injection) and needs only actions: read. This is the reactive/primary layer; the nightly controller reconciles the same signal synchronously for the cells it dispatches, and a DC6 guard (NVIDIA#1279) will exercise this observer. Related NVIDIA#1274, NVIDIA#1264. Signed-off-by: Nathan Hensley <nhensley@nvidia.com>

mchmarny

Clean, well-scoped workflow_run observer that surfaces superseded UAT runs. Security posture is exactly right for this trigger type: no checkout of the triggering run, only trusted workflow_run.* metadata read via env: (never inlined), the ::warning percent-encodes %/CR/LF so a crafted run title can't inject, least-privilege actions: read, repo-gated (matching the repo's existing github.repository == 'nvidia/aicr' convention), timeout + per-run concurrency. The pending-vs-mid-run classifier is sound (total_count == 0 or all started_at null), and it fails toward not crying wolf on a jobs-API fetch error. Docs updated and the DC6 roadmap item retired. actionlint/analyze green; inert until merged to main as expected. LGTM.
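CodeRabbit's first comment asked for, and the review above confirms, explicit handling of a Jobs API failure: the fetch error is detected before classification instead of being replaced with '{}', so `total_count == 0` only ever reflects a real empty response. A minimal bash sketch of that control flow, assuming jq is available; `decide_notice` is a hypothetical name, and the command it runs stands in for the real `gh api` jobs request.

```bash
#!/usr/bin/env bash
# decide_notice CMD [ARGS...]: run CMD to fetch the triggering run's jobs
# (a stand-in for something like: gh api "repos/{owner}/{repo}/actions/runs/RUN_ID/jobs")
# and print "notify", "skip", or "skip-fetch-error". Requires jq.
decide_notice() {
  local jobs_json
  # Declare and assign separately so the command's exit status is not masked by `local`.
  if ! jobs_json=$("$@"); then
    # A failed fetch is not evidence of a supersede: stay quiet rather than cry wolf.
    echo skip-fetch-error
    return 0
  fi
  if printf '%s' "$jobs_json" \
      | jq -e '.total_count == 0 or ((.jobs // []) | all(.started_at == null))' >/dev/null; then
    echo notify
  else
    echo skip
  fi
}
```

The design choice is that only a successful, well-formed response can trigger a notice: `decide_notice false` prints `skip-fetch-error`, and a successful fetch whose body is not valid JSON makes jq fail and also yields `skip`.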

njhensley requested review from a team as code owners July 1, 2026 20:08

njhensley added the theme/ci-dx CI pipelines, developer experience, and build tooling label Jul 1, 2026

github-actions Bot added area/ci area/docs size/M labels Jul 1, 2026

coderabbitai Bot reviewed Jul 1, 2026

njhensley force-pushed the ci/uat-superseded-notice branch from f84f2e5 to 282a065 July 1, 2026 20:18

mchmarny approved these changes Jul 1, 2026

mchmarny merged commit 1eb9140 into NVIDIA:main Jul 1, 2026 (1 commit from njhensley:ci/uat-superseded-notice); 32 checks passed.

coderabbitai Bot mentioned this pull request Jul 2, 2026 in feat(ci): daytime human-access deployment scheduler (#1281) #1587 (Open, 13 tasks)
