Add manual workflow to reset failed builds at max retries#279

Closed
frostebite wants to merge 1 commit intomainfrom
feat/reset-failed-builds-workflow

Conversation

@frostebite
Member

@frostebite frostebite commented Mar 26, 2026

Summary

Add a workflow_dispatch workflow that calls the resetFailedBuilds backend endpoint (game-ci/versioning-backend#85). Maintainers can trigger it from the Actions tab to reset builds stuck at max failure count.

  • Bulk reset: leave buildId empty to reset all maxed-out builds
  • Single reset: enter a specific build ID to reset just that build
  • Requires typing "yes" to confirm (same safety pattern as clean-up-stuck-builds.yml)
  • Uses the existing VERSIONING_TOKEN secret
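The confirmation guard and the bulk/single choice described in the bullets above could be sketched roughly as follows. This is only an illustration: the function name `describe_reset` and its positional arguments are hypothetical stand-ins for the `confirm` and `buildId` workflow inputs, and the actual workflow file may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch only: describe_reset is a hypothetical helper, not part of the PR.
# $1 stands in for the `confirm` input, $2 for the optional `buildId` input.
describe_reset() {
  local confirm="$1" build_id="$2"

  # Refuse to continue unless the maintainer typed exactly "yes".
  if [ "$confirm" != "yes" ]; then
    echo "::error::Confirmation must be exactly 'yes'" >&2
    return 1
  fi

  # An empty build ID means "reset every maxed-out build".
  if [ -n "$build_id" ]; then
    echo "single:$build_id"
  else
    echo "bulk"
  fi
}
```

Calling `describe_reset yes ""` selects the bulk path, while any confirmation other than `yes` returns a non-zero status so the job step would fail before a request is made.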

Depends on

Test plan

  • Workflow appears in Actions tab with manual trigger
  • Bulk reset: resets all builds with failureCount >= 15
  • Single reset: resets only the specified build
  • Fails gracefully if confirmation is not "yes"

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Added a new manual workflow for resetting failed builds. Supports resetting individual builds or all failed builds simultaneously, with a confirmation requirement for execution safety.

Manually-triggered workflow that calls the resetFailedBuilds backend
endpoint. Can reset a single build by ID or bulk-reset all builds
stuck at max failure count.

Follows the same pattern as clean-up-stuck-builds.yml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@frostebite frostebite requested a review from webbertakken March 26, 2026 17:17
@github-actions

Cat Gif

@coderabbitai

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

A new GitHub Actions workflow is added for resetting failed builds via manual dispatch. The workflow requires a confirmation input and optionally accepts a build ID parameter to reset either a specific build or all maxed-out builds via an authenticated API call.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Actions Workflow: .github/workflows/reset-failed-builds.yml | New workflow triggered via workflow_dispatch with required confirm input validation and optional buildId parameter. Performs authenticated POST request to resetFailedBuilds Cloud Function and validates HTTP 200 response before success. |
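
The "validates HTTP 200 response" step in the summary above might look something like the sketch below. It assumes the response was captured with `curl -w "\n%{http_code}"`, so the status code is the last line and everything before it is the body. The helper names are hypothetical, and the parsing in the actual workflow file is not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.
# Input format assumed: <body>, newline, <three-digit HTTP status>.

# Last line of the captured output is the status code.
parse_status() { printf '%s\n' "$1" | tail -n 1; }

# Everything except the last line is the response body.
parse_body() { printf '%s\n' "$1" | sed '$d'; }

check_response() {
  local response="$1" status body
  status=$(parse_status "$response")
  body=$(parse_body "$response")

  if [ "$status" != "200" ]; then
    # ::error:: is a GitHub Actions workflow command that annotates the run.
    echo "::error::Backend returned HTTP $status: $body" >&2
    return 1
  fi
  echo "::notice::Reset succeeded: $body"
}
```

A non-200 status makes `check_response` return non-zero, which fails the step when it is the last command or when the script runs with `set -e`.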

Sequence Diagram

sequenceDiagram
    actor User
    participant GitHub Actions
    participant Cloud Function
    
    User->>GitHub Actions: Trigger workflow_dispatch
    GitHub Actions->>GitHub Actions: Validate confirm='yes'
    alt confirm != 'yes'
        GitHub Actions->>User: Abort workflow
    else confirm == 'yes'
        alt buildId provided
            GitHub Actions->>GitHub Actions: Build payload {buildId}
        else buildId empty
            GitHub Actions->>GitHub Actions: Build payload {}
        end
        GitHub Actions->>Cloud Function: POST with VERSIONING_TOKEN
        Cloud Function->>Cloud Function: Process reset request
        Cloud Function-->>GitHub Actions: HTTP Response
        alt Status == 200
            GitHub Actions->>GitHub Actions: Parse response JSON
            GitHub Actions->>User: Emit success notice
        else Status != 200
            GitHub Actions->>User: Exit non-zero / Fail job
        end
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A workflow hops into the fold,
Resetting builds both new and old,
With tokens bright and checks so keen,
Cloud functions dance in between,
Building stronger, step by step! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a manual workflow to reset failed builds, which matches the core purpose of the changeset. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering the purpose, usage modes, safety features, dependencies, and test plan, though it deviates from the repository template structure. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/reset-failed-builds.yml (1)

33-37: Harden the backend call with timeout and transport-error handling.

Recommend adding --show-error, --connect-timeout, and --max-time (and handling curl non-zero exit) to avoid hanging/manual triage on transient network failures.

Proposed resiliency update
-          RESPONSE=$(curl -s -w "\n%{http_code}" -X POST \
+          RESPONSE=$(curl -sS --connect-timeout 10 --max-time 60 -w "\n%{http_code}" -X POST \
             -H "Authorization: Bearer ${{ secrets.VERSIONING_TOKEN }}" \
             -H "Content-Type: application/json" \
             -d "$PAYLOAD" \
-            "https://europe-west3-unity-ci-versions.cloudfunctions.net/resetFailedBuilds")
+            "https://europe-west3-unity-ci-versions.cloudfunctions.net/resetFailedBuilds") || {
+              echo "::error::Request failed before receiving an HTTP response"
+              exit 1
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/reset-failed-builds.yml around lines 33 - 37, The curl
call that assigns RESPONSE should be hardened: add --show-error,
--connect-timeout (e.g. 10) and --max-time (e.g. 30) flags to the curl
invocation that posts to "https://.../resetFailedBuilds" and capture curl's exit
status; after running curl (the RESPONSE assignment for the POST to
resetFailedBuilds) check the exit code ($?) and if non-zero log an error and
fail the job (or retry) instead of proceeding with a possibly empty/hanging
RESPONSE so transport errors/timeouts are surfaced and handled.
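
As an alternative to the `|| { ... }` form in the proposed diff, the explicit exit-code check described in the prompt could be sketched like this. It is a sketch under assumptions: `post_reset` and `CURL_BIN` are hypothetical names (the latter only exists so the call can be stubbed in a dry run), and `VERSIONING_TOKEN` is assumed to be exported to the step as an environment variable rather than interpolated inline.

```shell
#!/usr/bin/env bash
# Sketch only: post_reset and CURL_BIN are hypothetical, not part of the PR.
ENDPOINT="https://europe-west3-unity-ci-versions.cloudfunctions.net/resetFailedBuilds"
CURL_BIN="${CURL_BIN:-curl}"

post_reset() {
  local payload="$1" response rc

  # -sS: quiet, but still print transport errors.
  # --connect-timeout / --max-time: bound how long the step can hang.
  response=$("$CURL_BIN" -sS --connect-timeout 10 --max-time 60 \
    -w '\n%{http_code}' -X POST \
    -H "Authorization: Bearer ${VERSIONING_TOKEN:-}" \
    -H "Content-Type: application/json" \
    -d "$payload" "$ENDPOINT")
  rc=$?

  # A non-zero curl exit means no usable HTTP response was received.
  if [ "$rc" -ne 0 ]; then
    echo "::error::curl exited with code $rc before a usable HTTP response" >&2
    return 1
  fi

  # Body followed by the status code on the last line.
  printf '%s\n' "$response"
}
```

Keeping the transport check separate from the HTTP status check means a timeout surfaces as its own error instead of as an empty status code.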
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a5a4522-a9f8-43cd-b917-378b74495bab

📥 Commits

Reviewing files that changed from the base of the PR and between 53b69c3 and d1351fd.

📒 Files selected for processing (1)
  • .github/workflows/reset-failed-builds.yml


if [ -n "$BUILD_ID" ]; then
echo "Resetting failure count for build: $BUILD_ID"
PAYLOAD="{\"buildId\": \"$BUILD_ID\"}"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verifies current interpolation can produce invalid JSON.
BUILD_ID='editor-linux-"bad"-id'
PAYLOAD="{\"buildId\": \"$BUILD_ID\"}"
echo "$PAYLOAD" | jq . >/dev/null

Repository: game-ci/docker

Length of output: 115


🏁 Script executed:

cat .github/workflows/reset-failed-builds.yml

Repository: game-ci/docker

Length of output: 1638


🏁 Script executed:

#!/bin/bash
# Verify the proposed jq fix works with problematic BUILD_ID
BUILD_ID='editor-linux-"bad"-id'
PAYLOAD=$(jq -n --arg buildId "$BUILD_ID" '{buildId: $buildId}')
echo "$PAYLOAD" | jq . >/dev/null && echo "Valid JSON: $PAYLOAD"

Repository: game-ci/docker

Length of output: 112


Safely escape buildId when constructing JSON payload.

The code directly interpolates user input (github.event.inputs.buildId) into the JSON string, which will produce invalid JSON if the input contains quotes or backslashes. Use jq --arg to safely construct the payload:

Proposed fix
          if [ -n "$BUILD_ID" ]; then
            echo "Resetting failure count for build: $BUILD_ID"
-           PAYLOAD="{\"buildId\": \"$BUILD_ID\"}"
+           PAYLOAD=$(jq -n --arg buildId "$BUILD_ID" '{buildId: $buildId}')
          else
            echo "Resetting ALL failed builds at max retries..."
            PAYLOAD="{}"
          fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-           PAYLOAD="{\"buildId\": \"$BUILD_ID\"}"
+           if [ -n "$BUILD_ID" ]; then
+             echo "Resetting failure count for build: $BUILD_ID"
+             PAYLOAD=$(jq -n --arg buildId "$BUILD_ID" '{buildId: $buildId}')
+           else
+             echo "Resetting ALL failed builds at max retries..."
+             PAYLOAD="{}"
+           fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/reset-failed-builds.yml at line 27, The PAYLOAD
construction currently interpolates BUILD_ID directly into a JSON string which
will break for quotes/backslashes; change the logic that sets PAYLOAD (the
PAYLOAD variable that uses BUILD_ID) to build the JSON safely using jq --arg
(e.g., echo '{}' | jq --arg buildId "$BUILD_ID" '.buildId = $buildId') or an
equivalent jq command so BUILD_ID is properly escaped and valid JSON is
produced.
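
The jq-based construction suggested above can be wrapped in a small helper so both the bulk and single cases go through one place. This is a sketch, assuming `jq` is available on the runner; `build_payload` is a hypothetical name, and `-c` is added only to keep the output on one line.

```shell
#!/usr/bin/env bash
# Sketch only: build_payload is a hypothetical helper, not part of the PR.
build_payload() {
  local build_id="$1"
  if [ -n "$build_id" ]; then
    # --arg passes the value as a JSON string, so quotes and
    # backslashes in the build ID are escaped by jq itself.
    jq -cn --arg buildId "$build_id" '{buildId: $buildId}'
  else
    # Empty object: the backend is asked to reset all maxed-out builds.
    echo '{}'
  fi
}
```

In the workflow step this would be used as `PAYLOAD=$(build_payload "$BUILD_ID")`, and reading the value back with `jq -r .buildId` should return the original input unchanged.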

Member

@webbertakken webbertakken left a comment

Personally I don't think we need this if we're going to make the UI more user friendly. The GitHub UI requires you to know the exact parameters etc, whereas in our UI we have all the parameters ready. So in my opinion upgrading https://game.ci/docs/docker/versions/ to become really great would be preferable.

@frostebite
Member Author

> Personally I don't think we need this if we're going to make the UI more user friendly. The GitHub UI requires you to know the exact parameters etc, whereas in our UI we have all the parameters ready. So in my opinion upgrading https://game.ci/docs/docker/versions/ to become really great would be preferable.

game-ci/documentation#548

We do have this. I usually go for a manual workflow too, but I am not opposed to dropping it.

@frostebite frostebite closed this Mar 26, 2026
@frostebite frostebite reopened this Mar 28, 2026
@frostebite
Member Author

Closing per preference: all build management tools via documentation admin UI, not workflow-based interventions.

@frostebite frostebite closed this Mar 28, 2026
@frostebite frostebite deleted the feat/reset-failed-builds-workflow branch March 28, 2026 21:43