cluster-init: continue processing remaining clusters on error#4996
cluster-init: continue processing remaining clusters on error#4996joepvd wants to merge 1 commit intoopenshift:mainfrom
Conversation
The `cluster-init onboard config update` command previously aborted on the first cluster failure, preventing subsequent clusters from being processed. This masked whether failures were isolated to a single cluster or affected multiple clusters. Collect errors from all clusters and report them together at the end so that one broken cluster does not block updates to the others. Made-with: Cursor
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
WalkthroughModified error handling in a cluster configuration update function to aggregate multiple errors across cluster processing iterations rather than returning immediately on the first failure. The function now collects wrapped errors and returns them all using Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/cluster-init/cmd/onboard/config/update.go (1)
94-101: Consider adding kubeconfig failures to the aggregated errors.The error from
newKubeClients(line 98-101) is logged but not added to theerrsslice, whereas errors fromaddClusterInstallRuntimeInfoandrunConfigStepsare collected. This inconsistency means kubeconfig failures won't be surfaced to the caller in the returned error.If this is intentional (e.g., missing kubeconfig is a configuration vs. runtime issue), a brief comment would clarify the design decision.
♻️ Optional: collect kubeconfig errors for consistent reporting
var errs []error for clusterName, clusterInstall := range clusterInstalls { ctrlClient, kubeClient, config, err := newKubeClients(kubeconfigs, clusterName) - clusterInstall.Config = config if err != nil { + errs = append(errs, fmt.Errorf("kubeconfig for cluster %s: %w", clusterName, err)) log.WithField("cluster", clusterName).WithError(err).Warn("Skipping cluster due to missing or invalid kubeconfig") continue } + clusterInstall.Config = config🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster-init/cmd/onboard/config/update.go` around lines 94 - 101, The kubeconfig error from newKubeClients is currently only logged and not appended to the errs slice, causing kubeconfig failures to be omitted from the aggregated return; update the loop handling (where newKubeClients is called and clusterInstall.Config is set) to append the returned err to errs (same aggregation used for addClusterInstallRuntimeInfo and runConfigSteps) before continuing, or if skipping is intentional add a clarifying comment explaining why kubeconfig errors are excluded; reference newKubeClients, errs, clusterInstall, addClusterInstallRuntimeInfo, and runConfigSteps to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/cluster-init/cmd/onboard/config/update.go`:
- Around line 94-101: The kubeconfig error from newKubeClients is currently only
logged and not appended to the errs slice, causing kubeconfig failures to be
omitted from the aggregated return; update the loop handling (where
newKubeClients is called and clusterInstall.Config is set) to append the
returned err to errs (same aggregation used for addClusterInstallRuntimeInfo and
runConfigSteps) before continuing, or if skipping is intentional add a
clarifying comment explaining why kubeconfig errors are excluded; reference
newKubeClients, errs, clusterInstall, addClusterInstallRuntimeInfo, and
runConfigSteps to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d19f1f9e-6794-44a4-8c04-6f51d3b046fc
📒 Files selected for processing (1)
cmd/cluster-init/cmd/onboard/config/update.go
|
Scheduling required tests: |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: joepvd, psalajova The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required |
|
Scheduling required tests: |
|
/test e2e |
1 similar comment
|
/test e2e |
|
/override e2e |
|
@joepvd: joepvd unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test e2e |
2 similar comments
|
/test e2e |
|
/test e2e |
|
@joepvd: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
cluster-init onboard config updatecommand previously aborted on the first cluster failure, preventing subsequent clusters from being processedbuild07) or affected multiple clustersContext
The
breaking-changespresubmit has been failing consistently becausebuild07's AWS STS/OIDC credentials return a 401 onec2:DescribeSecurityGroups. Sincebuild07is processed beforebuild11(alphabetical order), the fatal error prevents us from knowing whetherbuild11is also affected.Test plan
breaking-changespresubmit progresses pastbuild07and processesbuild11Made with Cursor
Summary by CodeRabbit