OTA-1010: extract manifests with new capabilities #1958
Conversation
@hongkailiu: This pull request references OTA-1010, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from ad75be6 to 38aeb1d
Force-pushed from 69216c5 to 916427e
Force-pushed from 916427e to 27e03eb
/retest-required

/cc
Force-pushed from c1909d5 to 0431ce4
/retest-required

Some testing results from 0431ce4 (outdated). Cluster-bot:

/retest-required
petr-muller
left a comment
I only went through the pkg/cli/admin/release/extract.go diff and left some comments. The code is very hard to read, which is not your fault: it is mostly caused by constructing a series of very long anonymous callbacks that end up being called at some unspecified later time. I am not sure what to do about it, though. It would definitely help if this were a series of smaller PRs.
pkg/cli/admin/release/extract.go
Outdated
```go
if c := imageConfig.Config; c != nil {
	if v, ok := c.Labels["io.openshift.release"]; ok {
		klog.V(2).Infof("Retrieved the version from image configuration in the image to extract: %s", v)
		versionInImageConfig = v
```
Can this callback be called multiple times, overwriting previous values in `versionInImageConfig`? If yes, can the callback be called in parallel, which would be a race?
If I understand it correctly, `oc adm release extract` has only one release to extract, and one image means one image config.
From my test, it is called only once.
Can we encode that assumption and blow up with a panic or `Fatal` if `versionInImageConfig` is already set?
The code has been moved to https://github.com/openshift/oc/pull/2050/files.
But I will keep this open because I did not address this comment, and you might still think I should do it in this pull.
The concern is valid in general, but it is unlikely to happen here.
The other callbacks of extract.ExtractOptions are not thread-safe either.
It may be beyond the scope of what this PR attempts to do, but I have a feeling that the code could be made more readable if some of the callbacks (currently lambdas that use various option struct members and close over surrounding scope variables) were extracted into dedicated, named, and documented single-purpose types with methods used as the callbacks, plus a constructor that makes the callback inputs an explicit interface.
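As a rough illustration of that suggestion, the anonymous image-config callback could become a hypothetical named type like this (every name here, including `versionRecorder` and `dockerConfig`, is invented for the sketch and not the actual oc API):

```go
package main

import "fmt"

// dockerConfig stands in for the image-config type whose Labels the real
// callback inspects; its shape is an assumption for illustration.
type dockerConfig struct {
	Labels map[string]string
}

// versionRecorder is a hypothetical named, single-purpose type replacing an
// anonymous closure: its inputs are explicit fields, its behavior is a method.
type versionRecorder struct {
	version string
}

// Record is the method that would be wired in as the image-config callback.
func (r *versionRecorder) Record(c *dockerConfig) {
	if c == nil {
		return
	}
	if v, ok := c.Labels["io.openshift.release"]; ok {
		r.version = v
	}
}

func main() {
	rec := &versionRecorder{}
	rec.Record(&dockerConfig{Labels: map[string]string{"io.openshift.release": "4.14.0"}})
	fmt.Println(rec.version) // prints 4.14.0
}
```

The payoff is that the callback's state (`version`) is no longer hidden in a closure over the surrounding scope, so it can be documented and unit-tested on its own.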
I will give this a try.
There are two files, `image-references` and `release-metadata`, that are handled differently from manifest files. When those files arrive, their readers from the upstream are passed to the downstream callback right away. The other files contain manifests; these are parsed out and then sent downstream. We will embed more changes into this part, e.g., collecting all manifests in the image and then using them to calculate the enabled capabilities, which are passed as an argument to the downstream callback. Those changes are coming in other pulls.
Force-pushed from 88d0075 to 3aacfdd
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Force-pushed from dae1d70 to d95f37c
@hongkailiu: This pull request references OTA-1010, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from d95f37c to 1f0a027
Before this pull, the logging existed only for the case where `findClusterIncludeConfigFromInstallConfig` is called, i.e., when the path to an install-config file is provided. This pull extends it to the case where the configuration is taken from the current cluster. Another change in this pull is that the logging messages include the target version, which is determined by inspecting the release image. This is implemented by adding a new callback, `ImageConfigCallback`.
Force-pushed from 1f0a027 to 76338a7
The `ManifestInclusionConfiguration` determines whether a manifest is included on a cluster. Its `Capabilities` field takes the implicitly enabled capabilities into account.

This change removes the workaround that handles the net-new capabilities introduced by a cluster upgrade. E.g., if a cluster is currently at 4.13, the workaround assumes that the capabilities "build", "deploymentConfig", and "ImageRegistry" are enabled, because the components underlying those capabilities are installed by default on 4.13 or earlier and cannot be disabled once installed. Those capabilities become enabled after an upgrade from 4.13 to 4.14, either explicitly or implicitly depending on the current value of `cv.spec.capabilities.baselineCapabilitySet`. https://github.com/openshift/oc/blob/e005223acd7c478bac070134c16f5533a258be12/pkg/cli/admin/release/extract_tools.go#L1241-L1252

CVO already defines the function `GetImplicitlyEnabledCapabilities` to calculate the implicitly enabled capabilities of a cluster after a cluster upgrade. For this function to work, we have to provide:
* the manifests that are currently included on the cluster, and
* the manifests from the payload in the upgrade image.

The existing `ManifestReceiver` is enhanced so that it can provide the enabled capabilities, including both explicit and implicit ones, when the downstream callback is called. This is implemented by adding a cache that collects manifests from the upstream; the downstream is called only after all manifests are collected and the capabilities have been calculated from them using the `GetImplicitlyEnabledCapabilities` function mentioned earlier.

This enhancement is opted into by setting the `needEnabledCapabilities` field of `ManifestReceiver`; otherwise its behaviour stays the same as before. In the case where the inclusion configuration is taken from the cluster, i.e., `--install-config` is not set, `needEnabledCapabilities` is set to `true`.
Force-pushed from 76338a7 to 91537c7
@hongkailiu: The following test failed, say

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now please do so with /lifecycle stale

PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/uncc Cleaning up my dashboards; if I unassign from a PR where my input would still be helpful, feel free to cc/assign me back.

Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now please do so with /lifecycle rotten
Walkthrough

The changes refactor the release manifest extraction pipeline by introducing a structured ManifestReceiver pattern for processing manifests, adding image configuration callbacks to capture release versions, integrating current cluster payload manifests into inclusion logic, and implementing capability-aware filtering for extracted manifests. Multiple functions are updated to propagate version information through the extraction flow.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–60 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/cli/admin/release/extract.go (2)
414-473: Guard against nil `inclusionConfig.Capabilities` to avoid panics with `--install-config`

Inside `manifestReceiver.manifestsCallback`, the Included branch assumes `inclusionConfig.Capabilities` is always non-nil:

```go
clusterVersionCapabilitiesStatus := &configv1.ClusterVersionCapabilitiesStatus{
	KnownCapabilities:   sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.KnownCapabilities, configv1.KnownClusterVersionCapabilities...)...).UnsortedList(),
	EnabledCapabilities: sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.EnabledCapabilities, enabled...)...).UnsortedList(),
}
if err := ms[i].Include(inclusionConfig.ExcludeIdentifier, inclusionConfig.RequiredFeatureSet, inclusionConfig.Profile, clusterVersionCapabilitiesStatus, inclusionConfig.Overrides); err != nil {
	...
}
```

But `findClusterIncludeConfigFromInstallConfig` only populates `config.Capabilities` when `data.Capabilities != nil`. For `--install-config` files without a `capabilities` stanza, `inclusionConfig.Capabilities` stays nil, and this block will panic with a nil dereference despite `--included` being valid there.

A simple guard preserves existing semantics (no capabilities filter when none are configured) and still allows implicit-capabilities handling when available:

```diff
-			} else {
-				clusterVersionCapabilitiesStatus := &configv1.ClusterVersionCapabilitiesStatus{
-					KnownCapabilities:   sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.KnownCapabilities, configv1.KnownClusterVersionCapabilities...)...).UnsortedList(),
-					EnabledCapabilities: sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.EnabledCapabilities, enabled...)...).UnsortedList(),
-				}
-				if err := ms[i].Include(inclusionConfig.ExcludeIdentifier, inclusionConfig.RequiredFeatureSet, inclusionConfig.Profile, clusterVersionCapabilitiesStatus, inclusionConfig.Overrides); err != nil {
+			} else {
+				var capabilitiesStatus *configv1.ClusterVersionCapabilitiesStatus
+				if inclusionConfig.Capabilities != nil {
+					capabilitiesStatus = &configv1.ClusterVersionCapabilitiesStatus{
+						KnownCapabilities:   sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.KnownCapabilities, configv1.KnownClusterVersionCapabilities...)...).UnsortedList(),
+						EnabledCapabilities: sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.EnabledCapabilities, enabled...)...).UnsortedList(),
+					}
+				}
+				if err := ms[i].Include(inclusionConfig.ExcludeIdentifier, inclusionConfig.RequiredFeatureSet, inclusionConfig.Profile, capabilitiesStatus, inclusionConfig.Overrides); err != nil {
 					klog.V(4).Infof("Excluding %s: %s", ms[i].String(), err)
 					ms = append(ms[:i], ms[i+1:]...)
 				}
 			}
```

This way cluster-based inclusion (where Capabilities is always set) still sees the enriched status including implicitly enabled capabilities, while install-config-only flows remain robust when capabilities are not specified.
541-566: Manifest error handling behavior change is confirmed: errors abort the command, contradicting the best-effort comment

The verification confirms the review comment's analysis. Tracing the error flow:

- `ManifestReceiver.TarEntryCallbackDoneCallback` (extract_tools.go:1393) returns `kerrors.NewAggregate(mr.ManifestErrs)`.
- This callback is invoked inside `opts.Run()` (extract/extract.go:555-558) and its error is returned immediately if non-nil.
- The call `if err := opts.Run(); err != nil { return err }` at extract.go:537-538 then aborts the command.
- The manifest error handling block at lines 541-566 is unreachable if any manifest errors exist.

This directly contradicts the comment at lines 551-552 stating manifest errors should not cause the command to fail. The code block attempting to log errors gracefully will never execute when errors occur, breaking the documented best-effort semantics that existing workflows (e.g., mirroring) may depend on.

Either the error handling in `TarEntryCallbackDoneCallback` must be adjusted to record errors without returning them (preserving best-effort behavior), or the comment and manifest error handling block must be removed/updated to reflect the new fatal-on-error behavior.
♻️ Duplicate comments (1)
pkg/cli/admin/release/extract_tools.go (1)
1303-1394: ManifestReceiver behavior is correct for capability computation; be mindful of its error semantics

The new `ManifestReceiver`:

- Skips parsing for `skipNames` and streams those files directly downstream.
- Parses all manifest-like files, collecting parse errors in `ManifestErrs` without immediately aborting.
- When `needEnabledCapabilities` is true, buffers manifests and, in `TarEntryCallbackDoneCallback`, computes implicitly enabled capabilities via `GetImplicitlyEnabledCapabilities`, logs any newly implicit capabilities, and then replays the buffered manifests into the downstream callback with the computed enabled set.

This design matches the requirement to compute capabilities over the full payload before applying inclusion logic.

The main caveat is error semantics: `TarEntryCallbackDoneCallback` currently returns `kerrors.NewAggregate(mr.ManifestErrs)`, which causes `extract.ExtractOptions.Run` (and thus callers like `oc adm release extract`) to fail on any parse or downstream callback error. That behavior coordination is discussed in the main extract.go comment; whichever decision you take there (fatal vs best-effort) will likely require adjusting this return value accordingly.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📒 Files selected for processing (3)

- pkg/cli/admin/release/extract.go (9 hunks)
- pkg/cli/admin/release/extract_tools.go (5 hunks)
- pkg/cli/image/extract/extract.go (3 hunks)
🧰 Additional context used

📓 Path-based instructions (1): `**`

⚙️ CodeRabbit configuration file: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Files:
- pkg/cli/admin/release/extract.go
- pkg/cli/image/extract/extract.go
- pkg/cli/admin/release/extract_tools.go
🔇 Additional comments (6)
pkg/cli/image/extract/extract.go (1)
144-145: New callbacks are wired correctly and keep existing behavior intact

The added `ImageConfigCallback` and `TarEntryCallbackDoneCallback` hooks are guarded for nil, invoked at sensible points (immediately after parsing the image config, and after finishing all layer processing per mapping), and do not alter existing control flow when unset. No issues from this file alone.

Also applies to: 153-155, 429-431, 555-559
pkg/cli/admin/release/extract.go (2)
572-600: `currentPayloadManifests` implementation looks correct and well-scoped

The helper to pull current payload manifests via a derived `ExtractOptions`:

- Filters only manifest-like files by extension.
- Reuses the same inclusion configuration (including capabilities) as the update path.
- Accumulates manifests into a slice and returns them, with contextual error wrapping.

This matches the intended "current vs. update" comparison model and is gated behind `needEnabledCapabilities`, so the extra IO is only paid when necessary.
602-634: `getOptsToGetCurrentPayloadManifests` correctly clones extract options for the current payload

This helper cleanly:

- Discovers the cluster's desired image via `ClusterVersions().Get("version")`.
- Parses it into a `TypedImageReference`.
- Constructs a fresh `extract.ExtractOptions` that copies the calling options' IO streams, security/filter settings, mirror configs, and OnlyFiles mode, with a single `release-manifests/` mapping.

The separation between "source" (current cluster payload) and the update image keeps concerns clear, with no obvious correctness issues.
pkg/cli/admin/release/extract_tools.go (3)
1171-1231: Capability-set logging and include-config helpers are reasonable

`logCapabilitySetMayDiffer` and its integration into `findClusterIncludeConfigFromInstallConfig`:

- Correctly derive the client (`oc`) version with `version.ExtractVersion()`.
- Log informative messages when the extracted image's version and the client version differ, especially for `BaselineCapabilitySetCurrent`.
- Keep failures to determine the `oc` version as hard errors, which is acceptable given this should be rare and indicates a broken client build.

The include-config-from-install-config flow remains mostly unchanged besides this logging, and the new error wrapping on YAML parse/validation improves diagnosability.
1233-1265: Cluster include-config capabilities wiring looks consistent with CVO

The updated `findClusterIncludeConfig`:

- Derives a capabilities baseline set (`capSet`) from `clusterVersion.Spec.Capabilities`, falling back to `ClusterVersionCapabilitySetCurrent`.
- Starts from `clusterVersion.Status.Capabilities.DeepCopy()`, then:
  - Unions the baseline capability set into `EnabledCapabilities`.
  - Overrides `KnownCapabilities` with `configv1.KnownClusterVersionCapabilities`.
- Logs potential capability-set drift via `logCapabilitySetMayDiffer`.

Assuming `clusterVersion.Status.Capabilities` is always non-nil in supported environments, this aligns with the CVO behavior and gives a solid basis for inclusion decisions and implicit-capability calculations.
1396-1452: Implicit-capability computation logic looks sound and matches expectations

`GetImplicitlyEnabledCapabilities`:

- Starts from the provided `currentImplicitlyEnabled` set.
- For each update manifest that fails inclusion under the current capabilities, looks for a matching current manifest by `SameResourceID`.
- For matches where the current manifest passes inclusion, derives new implicit capabilities as:
  - capabilities present in the update manifest but not in the current manifest,
  - excluding those already implicitly enabled,
  - excluding those already explicitly enabled in `manifestInclusionConfiguration.Capabilities.EnabledCapabilities`.
- Unions these into the return set and logs any newly implicitly enabled capabilities at V(2).

Given that it's only called when `manifestInclusionConfiguration.Capabilities` is non-nil (cluster-based include config), this is a reasonable and maintainable port of the CVO behavior.
```go
needEnabledCapabilities := o.ExtractManifests && o.Included && o.InstallConfig == ""
var inclusionConfig manifestInclusionConfiguration
manifestReceiver := ManifestReceiver{skipNames: sets.New[string]("image-references", "release-metadata"),
	needEnabledCapabilities: needEnabledCapabilities}
// o.ExtractManifests implies o.File == ""
if o.ExtractManifests {
	expectedProviderSpecKind := credRequestCloudProviderSpecKindMapping[o.Cloud]

	include := func(m *manifest.Manifest) error { return nil } // default to including everything
	if o.Included {
		context := "connected cluster"
		inclusionConfig := manifestInclusionConfiguration{}
		if o.InstallConfig == "" {
			inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig)
			inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
			currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
			if err != nil {
				err = fmt.Errorf("failed to get the current payload manifests: %w", err)
			} else {
				manifestReceiver.currentPayloadManifests = currentPayloadManifests
			}
		} else {
```
Handle errors from findClusterIncludeConfig before computing current payload manifests
In the `o.InstallConfig == ""` branch you do:

```go
inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
if err != nil {
	err = fmt.Errorf("failed to get the current payload manifests: %w", err)
} else {
	manifestReceiver.currentPayloadManifests = currentPayloadManifests
}
```
If `findClusterIncludeConfig` fails, its error is overwritten by the subsequent `currentPayloadManifests` call, so the first failure is silently ignored while a zero-valued `inclusionConfig` is used. That's a correctness bug.
Consider short-circuiting on the first error and keeping the wrapped context for the second, e.g.:

```diff
-	inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
-	currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
-	if err != nil {
-		err = fmt.Errorf("failed to get the current payload manifests: %w", err)
-	} else {
-		manifestReceiver.currentPayloadManifests = currentPayloadManifests
-	}
+	inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
+	if err != nil {
+		return err
+	}
+	currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
+	if err != nil {
+		return fmt.Errorf("failed to get the current payload manifests: %w", err)
+	}
+	manifestReceiver.currentPayloadManifests = currentPayloadManifests
```
+ manifestReceiver.currentPayloadManifests = currentPayloadManifests🤖 Prompt for AI Agents
In pkg/cli/admin/release/extract.go around lines 378 to 395, the error returned
by findClusterIncludeConfig is overwritten by the subsequent
currentPayloadManifests call; check err immediately after calling
findClusterIncludeConfig and short‑circuit (wrap and return or set the outer
err) if it failed, and only call currentPayloadManifests when inclusionConfig is
valid; also avoid shadowing err (don’t use := for the currentPayloadManifests
error) and wrap the currentPayloadManifests error with context before returning
or assigning manifestReceiver.currentPayloadManifests.