
Conversation

@hongkailiu
Member

@hongkailiu hongkailiu commented Jan 14, 2025

The ManifestInclusionConfiguration determines if a manifest is
included on a cluster. Its Capabilities field takes the implicitly
enabled capabilities into account.

This change removes the workaround that handles the net-new capabilities
introduced by a cluster upgrade. E.g., if a cluster is currently at
4.13, it is assumed that the capabilities "build", "deploymentConfig",
and "ImageRegistry" are enabled. This is because the components
underlying those capabilities are installed by default on 4.13 or
earlier and cannot be disabled once installed. Those capabilities
become enabled after an upgrade from 4.13 to 4.14, either explicitly or
implicitly depending on the current value of
cv.spec.capabilities.baselineCapabilitySet.

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
build := configv1.ClusterVersionCapability("Build")
deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}
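
For illustration, the version check in the workaround above boils down to the following major.minor parse (a standalone sketch; the `majorMinor` helper is a name introduced here, not a function in the PR):

```go
package main

import (
	"fmt"
	"regexp"
)

// majorMinor mirrors the parsing in the removed workaround: it extracts
// the "major.minor" prefix from a full version string, returning "" when
// the string does not match.
func majorMinor(version string) string {
	matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(version)
	if len(matches) < 2 {
		return ""
	}
	return matches[1]
}

func main() {
	// The workaround only special-cased clusters whose desired version
	// parses to "4.13".
	fmt.Println(majorMinor("4.13.12"))     // 4.13
	fmt.Println(majorMinor("4.14.0-rc.0")) // 4.14
}
```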

CVO has already defined the function GetImplicitlyEnabledCapabilities
to calculate the implicitly enabled capabilities of a cluster after a
cluster upgrade. For this function to work, we have to provide

  • the manifests that are currently included on the cluster, and
  • the manifests from the payload in the upgrade image.
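
The two inputs above feed a comparison along the following lines (a simplified, standalone sketch of the rule, not the CVO implementation; manifest identity matching and capability annotations are reduced to plain strings):

```go
package main

import (
	"fmt"
	"sort"
)

// manifest is a minimal stand-in for a payload manifest: its identity and
// the capability annotations it carries (capability.openshift.io/name).
type manifest struct {
	id   string
	caps []string
}

// implicitlyEnabled sketches the idea behind GetImplicitlyEnabledCapabilities:
// if a manifest is already included on the cluster (none of its current
// capability annotations are disabled) but its updated counterpart is
// annotated with capabilities that are not yet enabled, those capabilities
// must become implicitly enabled, because the underlying component is
// installed and cannot be removed.
func implicitlyEnabled(current, updated []manifest, enabled map[string]bool) []string {
	// Index the current payload's manifests by identity.
	curr := map[string]manifest{}
	for _, m := range current {
		curr[m.id] = m
	}
	implicit := map[string]bool{}
	for _, u := range updated {
		c, ok := curr[u.id]
		if !ok {
			continue // net-new manifest: gated normally, not implicit
		}
		// Was the manifest included on the current cluster?
		included := true
		for _, name := range c.caps {
			if !enabled[name] {
				included = false
			}
		}
		if !included {
			continue
		}
		for _, name := range u.caps {
			if !enabled[name] {
				implicit[name] = true
			}
		}
	}
	out := make([]string, 0, len(implicit))
	for name := range implicit {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	// A 4.13 image-registry manifest carries no capability annotation;
	// in 4.14 the same manifest is annotated with ImageRegistry.
	current := []manifest{{id: "registry-operator"}}
	updated := []manifest{{id: "registry-operator", caps: []string{"ImageRegistry"}}}
	fmt.Println(implicitlyEnabled(current, updated, map[string]bool{})) // [ImageRegistry]
}
```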

The existing ManifestReceiver is enhanced so that it can provide
enabled capabilities, including both explicit and implicit ones, when
the downstream callback is called. This is implemented by adding a
cache that collects manifests from the upstream; downstream is called
only once all manifests are collected and the capabilities have been
calculated from them using the GetImplicitlyEnabledCapabilities
function mentioned earlier.

This enhancement is opted into by setting the
needEnabledCapabilities field of ManifestReceiver. Otherwise, its
behaviour stays the same as before.

When the inclusion configuration is taken from the cluster,
i.e., --install-config is not set, needEnabledCapabilities is set to
true.
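
The caching behaviour described above can be sketched as follows (a minimal, standalone illustration; the type name, field names, and callback signatures are simplified stand-ins, not the PR's actual types):

```go
package main

import "fmt"

// Receiver is an illustrative stand-in for ManifestReceiver.
type Receiver struct {
	needEnabledCapabilities bool
	cache                   []string // collected manifest contents
	downstream              func(manifests []string, enabled []string)
}

// Accept is the per-manifest entry point (the upstream tar callback).
// Without the opt-in, each manifest is forwarded immediately; with it,
// manifests are cached until Done.
func (r *Receiver) Accept(m string) {
	if !r.needEnabledCapabilities {
		r.downstream([]string{m}, nil)
		return
	}
	r.cache = append(r.cache, m)
}

// Done is called once the upstream has delivered every manifest; only
// then can the implicitly enabled capabilities be computed from the
// full set and passed downstream.
func (r *Receiver) Done(compute func([]string) []string) {
	if r.needEnabledCapabilities {
		r.downstream(r.cache, compute(r.cache))
	}
}

func main() {
	r := &Receiver{
		needEnabledCapabilities: true,
		downstream: func(ms, enabled []string) {
			fmt.Println(len(ms), enabled)
		},
	}
	r.Accept("a.yaml")
	r.Accept("b.yaml")
	// compute is a placeholder for the capability calculation.
	r.Done(func(ms []string) []string { return []string{"ImageRegistry"} })
}
```

The price of the opt-in is visible here: downstream sees nothing until Done fires, matching the description that the callback runs only after all manifests are collected.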

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 14, 2025
@openshift-ci-robot

openshift-ci-robot commented Jan 14, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2025
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 5 times, most recently from ad75be6 to 38aeb1d Compare January 14, 2025 12:09
@openshift-ci-robot

openshift-ci-robot commented Jan 14, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

This pull adds a ManifestReceiver that works between the upstream TarEntryCallback and the downstream manifestsCallback. With needEnabledCapabilities set, the receiver calls manifestsCallback with the enabled capabilities computed. The price is that manifestsCallback is called only after all the manifests have been collected from the upstream.


@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 2 times, most recently from 69216c5 to 916427e Compare January 14, 2025 12:18
@openshift-ci-robot

openshift-ci-robot commented Jan 14, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Before this pull, we enabled three net-new capabilities for 4.13 clusters:

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
build := configv1.ClusterVersionCapability("Build")
deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}

Now the capabilities for the incoming release are calculated with the CVO function, based on the manifests from the current release and those from the incoming release.

To fit the current code, which already had TarEntryCallback, the above logic is implemented via a ManifestReceiver that works between the upstream TarEntryCallback and the downstream manifestsCallback. With needEnabledCapabilities set, the receiver calls manifestsCallback with the enabled capabilities computed. The price is that manifestsCallback is called only after all the manifests have been collected from the upstream.


@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 916427e to 27e03eb Compare January 14, 2025 23:38
@hongkailiu hongkailiu changed the title [wip]OTA-1010: extract included manifests with net-new capabilities OTA-1010: extract included manifests with net-new capabilities Jan 15, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2025
@hongkailiu
Member Author

/retest-required

@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller February 14, 2025 18:34
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 4 times, most recently from c1909d5 to 0431ce4 Compare March 4, 2025 16:50
@hongkailiu
Member Author

/retest-required

@hongkailiu
Member Author

hongkailiu commented Mar 4, 2025

Some testing results from 0431ce4 (outdated)

Cluster-bot:

launch 4.13.12 aws

$ make oc
$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64
I0304 13:36:44.744021   32452 extract_tools.go:1254] Those capabilities become implicitly enabled for the incoming release [ImageRegistry MachineAPI]
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

$ ll credentials-requests
total 48
-rw-r--r--@ 1 hongkliu  staff   1.8K Mar  4 13:36 0000_30_machine-api-operator_00_credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   738B Mar  4 13:36 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
-rw-r--r--@ 1 hongkliu  staff   1.3K Mar  4 13:36 0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   920B Mar  4 13:36 0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   1.0K Mar  4 13:36 0000_50_cluster-network-operator_02-cncc-credentials.yaml
-rw-r--r--@ 1 hongkliu  staff   1.5K Mar  4 13:36 0000_50_cluster-storage-operator_03_credentials_request_aws.yaml

### without --included
$ rm -rf credentials-requests          
$ ./oc adm release extract --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64                                  
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ ll credentials-requests | wc -l
     682

@hongkailiu
Member Author

/retest-required

Member

@petr-muller petr-muller left a comment


I only went through the pkg/cli/admin/release/extract.go diff and left some comments. The code is very hard to read, which is not your fault - it is mostly caused by constructing a series of very long anonymous callbacks that end up being called at whatever time later... Not sure what to do about it though. It would definitely help if this was a series of smaller PRs.

if c := imageConfig.Config; c != nil {
if v, ok := c.Labels["io.openshift.release"]; ok {
klog.V(2).Infof("Retrieved the version from image configuration in the image to extract: %s", v)
versionInImageConfig = v
Member

Can this callback be called multiple times, overwriting previous values in `versionInImageConfig`? If yes, can the callback be called in parallel, which would be a race?

Member Author

If I understand it correctly, oc adm release extract has only one release to extract, and one image means one image config.

From my test, it is called only once.

Member

@petr-muller petr-muller May 16, 2025

Can we encode that assumption and blow up with a panic or Fatal if versionInImageConfig is already set?

Member Author

The code has been moved to https://github.com/openshift/oc/pull/2050/files.
But I will keep this open because I did not address this comment and you might still think I should do it in this pull.

The concern is valid in general but it is unlikely to happen here.
The other callbacks of extract.ExtractOptions are not multi-thread safe either.

@petr-muller
Member

It may be beyond the scope of what this PR attempts to do, but I have a feeling that the code could be made more readable if some of the callbacks (currently lambdas using various option-struct members and closing over surrounding scope variables) were extracted into dedicated, named, and documented single-purpose types with methods used as the callbacks, and with a constructor that makes the callback inputs an explicit interface.
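
A minimal sketch of that shape (all names here are hypothetical, and the single-invocation guard picks up the earlier review thread about panicking or failing if the version is already set):

```go
package main

import "fmt"

// versionRecorder replaces an anonymous closure over surrounding scope
// variables with a small named type whose state is explicit.
type versionRecorder struct {
	version string
}

// newVersionRecorder is the constructor that would make the callback's
// dependencies explicit (here there are none, but real ones would be
// parameters).
func newVersionRecorder() *versionRecorder { return &versionRecorder{} }

// OnImageConfig has the shape of an image-config callback: it records
// the release version label, refusing to be called twice so the
// single-invocation assumption is enforced rather than silently relied on.
func (v *versionRecorder) OnImageConfig(labels map[string]string) error {
	if v.version != "" {
		return fmt.Errorf("image config callback called twice (had %q)", v.version)
	}
	v.version = labels["io.openshift.release"]
	return nil
}

func main() {
	rec := newVersionRecorder()
	if err := rec.OnImageConfig(map[string]string{"io.openshift.release": "4.14.0"}); err != nil {
		panic(err)
	}
	fmt.Println(rec.version)
	// A second call now fails loudly instead of silently overwriting.
	fmt.Println(rec.OnImageConfig(map[string]string{"io.openshift.release": "4.15.0"}) != nil)
}
```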

@hongkailiu
Member Author

It may be beyond the scope of what this PR attempts to do, but I have a feeling that the code could be made more readable if some of the callbacks (currently lambdas using various option-struct members and closing over surrounding scope variables) were extracted into dedicated, named, and documented single-purpose types with methods used as the callbacks, and with a constructor that makes the callback inputs an explicit interface.

I will give this a try.
(Currently I have another series of pulls to merge. They were also opened a long time ago and have made some progress recently. I think I can close them out quickly, and I will come back to this one right after.)

There are two files, `image-references` and
 `release-metadata`, that are handled differently from
 manifest files. When those files arrive, their readers
 from the upstream are passed to the downstream callback
 right away.

Other files contain manifests. These are parsed out
and then sent downstream. We will embed more
changes into this part, e.g., collecting all manifests
in the image and then using them to calculate the
enabled capabilities, which are passed as an argument to
the downstream callback. Those changes are coming in
other pulls.
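
The dispatch this commit message describes can be sketched as a simple routing decision on the file name (an illustrative sketch; `route` and its return values are names introduced here):

```go
package main

import "fmt"

// route decides how an extracted tar entry is handled: the two special
// files are streamed through untouched, everything else is treated as a
// manifest source.
func route(name string) string {
	switch name {
	case "image-references", "release-metadata":
		// Readers for these files go straight to the downstream callback.
		return "stream"
	default:
		// Manifest files are parsed first; later pulls add caching here
		// so enabled capabilities can be computed over the full set.
		return "parse"
	}
}

func main() {
	for _, n := range []string{
		"image-references",
		"release-metadata",
		"0000_30_machine-api-operator_00_credentials-request.yaml",
	} {
		fmt.Println(n, "->", route(n))
	}
}
```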
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 88d0075 to 3aacfdd Compare June 25, 2025 19:18
@openshift-ci
Contributor

openshift-ci bot commented Jun 25, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign atiratree for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 2 times, most recently from dae1d70 to d95f37c Compare June 25, 2025 19:25
@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

The ManifestInclusionConfiguration is used to
determine if a manifest is included on a cluster.
Its Capabilities field takes the implicitly enabled
capabilities into account.

This change removes the workaround that handles the
net-new capabilities introduced by a cluster upgrade.
E.g., if a cluster is currently at 4.13, it is
assumed that the capabilities "build",
"deploymentConfig", and "ImageRegistry" are enabled.
This is because the components underlying those
capabilities are installed by default on 4.13 or
earlier and cannot be disabled once installed. Those
capabilities become enabled after an upgrade from
4.13 to 4.14, either explicitly or implicitly
depending on the current value of
cv.spec.capabilities.baselineCapabilitySet.

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
build := configv1.ClusterVersionCapability("Build")
deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}

CVO has already defined the function
GetImplicitlyEnabledCapabilities that calculates
the implicitly enabled capabilities of a cluster
after a cluster upgrade. For this function to work,
we have to provide

  • the manifests that are currently included on the
    cluster, and
  • the manifests from the payload in the upgrade image.

The existing ManifestReceiver is enhanced so that
it can provide enabled capabilities, including both
explicit and implicit ones, when the downstream
callback is called. This is implemented by a cache
that collects manifests from the upstream; downstream
is called only once all manifests are collected and
the capabilities have been calculated from them using
the GetImplicitlyEnabledCapabilities function
mentioned earlier.

This enhancement is opted into by setting the
needEnabledCapabilities field of ManifestReceiver.
Otherwise, its behaviour stays the same as before.

When the inclusion configuration is taken from the
cluster, i.e., --install-config is not set,
needEnabledCapabilities is set to true.


@hongkailiu
Member Author

/hold

Will rebase after #2048 and #2050 get in

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 25, 2025
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from d95f37c to 1f0a027 Compare June 25, 2025 19:53
Before this pull, the logging existed only for the
case where `findClusterIncludeConfigFromInstallConfig`
is called, i.e., when the path to an install-config
file is provided.

This pull extends it to the case where the
configuration is taken from the current cluster.

Another change in this pull is that the logging
messages include the target version, which is
determined by inspecting the release image. This is
implemented by adding a new callback,
`ImageConfigCallback`.
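
The wiring can be sketched like this (a standalone illustration; `ExtractOptions` here is a stand-in struct, not the real oc type, and the log line format is invented for the example):

```go
package main

import "fmt"

// ExtractOptions is an illustrative stand-in: the extract options gain an
// image-config callback, and the captured release version is then usable
// in the inclusion-configuration log messages.
type ExtractOptions struct {
	ImageConfigCallback func(labels map[string]string)
}

// run simulates the extraction step where the image config is parsed and
// handed to the callback, if one is registered.
func run(o *ExtractOptions, labels map[string]string) {
	if o.ImageConfigCallback != nil {
		o.ImageConfigCallback(labels)
	}
}

func main() {
	var versionInImageConfig string
	o := &ExtractOptions{
		ImageConfigCallback: func(labels map[string]string) {
			versionInImageConfig = labels["io.openshift.release"]
		},
	}
	run(o, map[string]string{"io.openshift.release": "4.14.0-rc.0"})
	// The target version determined from the release image now appears in
	// the logging for both config sources (cluster and install-config).
	fmt.Printf("computing inclusion configuration for target version %s\n", versionInImageConfig)
}
```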
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 1f0a027 to 76338a7 Compare June 25, 2025 20:07
The `ManifestInclusionConfiguration` determines if a manifest is
included on a cluster. Its `Capabilities` field takes the implicitly
enabled capabilities into account.

This change removes the workaround that handles the net-new capabilities
introduced by a cluster upgrade. E.g., if a cluster is currently at
4.13, it is assumed that the capabilities "build", "deploymentConfig",
and "ImageRegistry" are enabled. This is because the components
underlying those capabilities are installed by default on 4.13 or
earlier and cannot be disabled once installed. Those capabilities
become enabled after an upgrade from 4.13 to 4.14, either explicitly or
implicitly depending on the current value of
`cv.spec.capabilities.baselineCapabilitySet`.

https://github.com/openshift/oc/blob/e005223acd7c478bac070134c16f5533a258be12/pkg/cli/admin/release/extract_tools.go#L1241-L1252

CVO has already defined the function `GetImplicitlyEnabledCapabilities`
to calculate the implicitly enabled capabilities of a cluster after a
cluster upgrade. For this function to work, we have to provide

* the manifests that are currently included on the cluster, and
* the manifests from the payload in the upgrade image.

The existing `ManifestReceiver` is enhanced so that it can provide
enabled capabilities, including both explicit and implicit ones, when
the downstream callback is called. This is implemented by adding a
cache that collects manifests from the upstream; downstream is called
only once all manifests are collected and the capabilities have been
calculated from them using the `GetImplicitlyEnabledCapabilities`
function mentioned earlier.

This enhancement is opted into by setting the
`needEnabledCapabilities` field of `ManifestReceiver`. Otherwise, its
behaviour stays the same as before.

When the inclusion configuration is taken from the cluster,
i.e., `--install-config` is not set, `needEnabledCapabilities` is set to
`true`.
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 76338a7 to 91537c7 Compare June 26, 2025 14:27
@openshift-ci-robot

openshift-ci-robot commented Jun 26, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

The ManifestInclusionConfiguration determines if a manifest is
included on a cluster. Its Capabilities field takes the implicitly
enabled capabilities into account.

This change removes the workaround that handles the net-new capabilities
introduced by a cluster upgrade. E.g., if a cluster is currently at
4.13, it is assumed that the capabilities "build", "deploymentConfig",
and "ImageRegistry" are enabled. This is because the components
underlying those capabilities are installed by default on 4.13 or
earlier and cannot be disabled once installed. Those capabilities
become enabled after an upgrade from 4.13 to 4.14, either explicitly or
implicitly depending on the current value of
cv.spec.capabilities.baselineCapabilitySet.

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
build := configv1.ClusterVersionCapability("Build")
deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}

CVO has already defined the function GetImplicitlyEnabledCapabilities
to calculate the implicitly enabled capabilities of a cluster after a
cluster upgrade. For this function to work, we have to provide

  • the manifests that are currently included on the cluster, and
  • the manifests from the payload in the upgrade image.

The existing ManifestReceiver is enhanced so that it can provide
enabled capabilities, including both explicit and implicit ones, when
the downstream callback is called. This is implemented by adding a
cache that collects manifests from the upstream; downstream is called
only once all manifests are collected and the capabilities have been
calculated from them using the GetImplicitlyEnabledCapabilities
function mentioned earlier.

This enhancement is opted into by setting the
needEnabledCapabilities field of ManifestReceiver. Otherwise, its
behaviour stays the same as before.

When the inclusion configuration is taken from the cluster,
i.e., --install-config is not set, needEnabledCapabilities is set to
true.


@hongkailiu hongkailiu changed the title OTA-1010: extract included manifests with net-new capabilities OTA-1010: extract manifests with new capabilities Jun 26, 2025
@openshift-ci
Contributor

openshift-ci bot commented Jul 16, 2025

@hongkailiu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-aws-ovn-serial-1of2
Commit: 91537c7
Required: true
Rerun command: /test e2e-aws-ovn-serial-1of2

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2025
@openshift-merge-robot
Contributor

PR needs rebase.


@petr-muller
Member

/uncc

Cleaning up my dashboards; if I unassign from a PR where my input would still be helpful, feel free to cc/assign me back :shipit:

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 15, 2025
@coderabbitai

coderabbitai bot commented Nov 15, 2025

Walkthrough

The changes refactor the release manifest extraction pipeline by introducing a structured ManifestReceiver pattern for processing manifests, adding image configuration callbacks to capture release versions, integrating current cluster payload manifests into inclusion logic, and implementing capability-aware filtering for extracted manifests. Multiple functions are updated to propagate version information through the extraction flow.

Changes

  • Manifest extraction flow refactoring (pkg/cli/admin/release/extract.go):
    Reworks manifest extraction to use the ManifestReceiver pattern; adds ImageConfigCallback for capturing the release version from the image config; introduces current-payload manifest retrieval; implements manifest-level filtering based on inclusion rules; updates file-path resolution and error handling to use dynamic filename variables; adds capability/status handling for cluster-version capabilities.
  • Manifest processing infrastructure (pkg/cli/admin/release/extract_tools.go):
    Introduces the ManifestReceiver type with TarEntryCallback and TarEntryCallbackDoneCallback methods for structured manifest processing; adds the logCapabilitySetMayDiffer helper for version-mismatch validation; implements GetImplicitlyEnabledCapabilities to compute capabilities from manifest comparisons; updates function signatures (findClusterIncludeConfig, findClusterIncludeConfigFromInstallConfig) to accept and propagate a versionInImageConfig parameter; adds internal caching and error-collection mechanisms.
  • Callback infrastructure extension (pkg/cli/image/extract/extract.go):
    Adds ImageConfigCallback and TarEntryCallbackDoneCallback fields to the ExtractOptions struct; introduces invocation points after image-config parsing and after layer processing for downstream callback execution.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

  • ManifestReceiver implementation and integration: New public type with internal caching, buffering, and callback mechanics requires careful validation of state management and error handling across TarEntryCallback and TarEntryCallbackDoneCallback.
  • Capability computation and threading: GetImplicitlyEnabledCapabilities and version-aware function signature changes across multiple functions require tracing logic flow and validation of implicit capability derivation.
  • Manifest filtering and inclusion logic: Changes to how manifests are filtered, included, excluded, and written based on new capability and configuration state need verification against expected behavior.
  • Error aggregation and reporting: ManifestErrs collection and aggregated error reporting at extraction completion should be verified for completeness and clarity.
  • Version propagation consistency: versionInImageConfig threading through function calls and its use in cluster config resolution requires validation across all call sites.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/cli/admin/release/extract.go (2)

414-473: Guard against nil inclusionConfig.Capabilities to avoid panics with --install-config

Inside manifestReceiver.manifestsCallback, the Included branch assumes inclusionConfig.Capabilities is always non‑nil:

clusterVersionCapabilitiesStatus := &configv1.ClusterVersionCapabilitiesStatus{
    KnownCapabilities:   sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.KnownCapabilities, configv1.KnownClusterVersionCapabilities...)...).UnsortedList(),
    EnabledCapabilities: sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.EnabledCapabilities, enabled...)...).UnsortedList(),
}
if err := ms[i].Include(inclusionConfig.ExcludeIdentifier, inclusionConfig.RequiredFeatureSet, inclusionConfig.Profile, clusterVersionCapabilitiesStatus, inclusionConfig.Overrides); err != nil {
    ...
}

But findClusterIncludeConfigFromInstallConfig only populates config.Capabilities when data.Capabilities != nil. For --install-config files without a capabilities stanza, inclusionConfig.Capabilities stays nil, and this block will panic with a nil dereference despite --included being valid there.

A simple guard preserves existing semantics (no capabilities filter when none are configured) and still allows implicit-capabilities handling when available:

-                } else {
-                    clusterVersionCapabilitiesStatus := &configv1.ClusterVersionCapabilitiesStatus{
-                        KnownCapabilities:   sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.KnownCapabilities, configv1.KnownClusterVersionCapabilities...)...).UnsortedList(),
-                        EnabledCapabilities: sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.EnabledCapabilities, enabled...)...).UnsortedList(),
-                    }
-                    if err := ms[i].Include(inclusionConfig.ExcludeIdentifier, inclusionConfig.RequiredFeatureSet, inclusionConfig.Profile, clusterVersionCapabilitiesStatus, inclusionConfig.Overrides); err != nil {
+                } else {
+                    var capabilitiesStatus *configv1.ClusterVersionCapabilitiesStatus
+                    if inclusionConfig.Capabilities != nil {
+                        capabilitiesStatus = &configv1.ClusterVersionCapabilitiesStatus{
+                            KnownCapabilities:   sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.KnownCapabilities, configv1.KnownClusterVersionCapabilities...)...).UnsortedList(),
+                            EnabledCapabilities: sets.New[configv1.ClusterVersionCapability](append(inclusionConfig.Capabilities.EnabledCapabilities, enabled...)...).UnsortedList(),
+                        }
+                    }
+                    if err := ms[i].Include(inclusionConfig.ExcludeIdentifier, inclusionConfig.RequiredFeatureSet, inclusionConfig.Profile, capabilitiesStatus, inclusionConfig.Overrides); err != nil {
                         klog.V(4).Infof("Excluding %s: %s", ms[i].String(), err)
                         ms = append(ms[:i], ms[i+1:]...)
                     }
                 }

This way cluster-based inclusion (where Capabilities is always set) still sees the enriched status including implicitly enabled capabilities, while install-config-only flows remain robust when capabilities are not specified.


541-566: Manifest error handling behavior change is confirmed: errors abort the command, contradicting the best-effort comment

The verification confirms the review comment's analysis. Tracing the error flow:

  1. ManifestReceiver.TarEntryCallbackDoneCallback (extract_tools.go:1393) returns kerrors.NewAggregate(mr.ManifestErrs)
  2. This callback is invoked inside opts.Run() (extract/extract.go:555-558) and its error is returned immediately if non-nil
  3. The call if err := opts.Run(); err != nil { return err } at extract.go:537-538 then aborts the command
  4. The manifest error handling block at lines 541-566 is unreachable if any manifest errors exist

This directly contradicts the comment at lines 551-552 stating manifest errors should not cause the command to fail. The code block attempting to log errors gracefully will never execute when errors occur, breaking the documented best-effort semantics that existing workflows (e.g., mirroring) may depend on.

Either the error handling in TarEntryCallbackDoneCallback must be adjusted to record errors without returning them (preserving best-effort behavior), or the comment and manifest error handling block must be removed/updated to reflect the new fatal-on-error behavior.

♻️ Duplicate comments (1)
pkg/cli/admin/release/extract_tools.go (1)

1303-1394: ManifestReceiver behavior is correct for capability computation; be mindful of its error semantics

The new ManifestReceiver:

  • Skips parsing for skipNames and streams those files directly downstream.
  • Parses all manifest-like files, collects parse errors in ManifestErrs without immediately aborting.
  • When needEnabledCapabilities is true, buffers manifests and, in TarEntryCallbackDoneCallback, computes implicitly enabled capabilities via GetImplicitlyEnabledCapabilities, logs any newly implicit capabilities, and then replays the buffered manifests into the downstream callback with the computed enabled set.

This design matches the requirement to compute capabilities over the full payload before applying inclusion logic.

The main caveat is error semantics: TarEntryCallbackDoneCallback currently returns kerrors.NewAggregate(mr.ManifestErrs), which causes extract.ExtractOptions.Run (and thus callers like oc adm release extract) to fail on any parse or downstream callback error. That behavior coordination is discussed in the main extract.go comment; whichever decision you take there (fatal vs best‑effort) will likely require adjusting this return value accordingly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ea5c720 and 91537c7.

📒 Files selected for processing (3)
  • pkg/cli/admin/release/extract.go (9 hunks)
  • pkg/cli/admin/release/extract_tools.go (5 hunks)
  • pkg/cli/image/extract/extract.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/cli/admin/release/extract.go
  • pkg/cli/image/extract/extract.go
  • pkg/cli/admin/release/extract_tools.go
🔇 Additional comments (6)
pkg/cli/image/extract/extract.go (1)

144-145: New callbacks are wired correctly and keep existing behavior intact

The added ImageConfigCallback and TarEntryCallbackDoneCallback hooks are guarded for nil, invoked at sensible points (immediately after parsing the image config, and after finishing all layer processing per mapping), and do not alter existing control flow when unset. No issues from this file alone.

Also applies to: 153-155, 429-431, 555-559

pkg/cli/admin/release/extract.go (2)

572-600: currentPayloadManifests implementation looks correct and well-scoped

The helper to pull current payload manifests via a derived ExtractOptions:

  • Filters only manifest-like files by extension.
  • Reuses the same inclusion configuration (including capabilities) as the update path.
  • Accumulates manifests into a slice and returns them, with contextual error wrapping.

This matches the intended “current vs. update” comparison model and is gated behind needEnabledCapabilities, so the extra IO is only paid when necessary.


602-634: getOptsToGetCurrentPayloadManifests correctly clones extract options for the current payload

This helper cleanly:

  • Discovers the cluster’s desired image via ClusterVersions().Get("version").
  • Parses it into a TypedImageReference.
  • Constructs a fresh extract.ExtractOptions that copies the calling options’ IO streams, security/filter settings, mirror configs, and OnlyFiles mode, with a single release-manifests/ mapping.

The separation between “source” (current cluster payload) and the update image keeps concerns clear with no obvious correctness issues.

pkg/cli/admin/release/extract_tools.go (3)

1171-1231: Capability-set logging and include-config helpers are reasonable

logCapabilitySetMayDiffer and its integration into findClusterIncludeConfigFromInstallConfig:

  • Correctly derive the client (oc) version with version.ExtractVersion().
  • Log informative messages when the extracted image’s version and the client version differ, especially for BaselineCapabilitySetCurrent.
  • Keep failures to determine the oc version as hard errors, which is acceptable given this should be rare and indicates a broken client build.

The include-config-from-install-config flow remains mostly unchanged besides this logging, and the new error wrapping on YAML parse/validation improves diagnosability.


1233-1265: Cluster include-config capabilities wiring looks consistent with CVO

The updated findClusterIncludeConfig:

  • Derives a capabilities baseline set (capSet) from clusterVersion.Spec.Capabilities falling back to ClusterVersionCapabilitySetCurrent.
  • Starts from clusterVersion.Status.Capabilities.DeepCopy(), then:
    • Unions in the baseline capability set to EnabledCapabilities.
    • Overrides KnownCapabilities with configv1.KnownClusterVersionCapabilities.
  • Logs potential capability-set drift via logCapabilitySetMayDiffer.

Assuming clusterVersion.Status.Capabilities is always non-nil in supported environments, this aligns with the CVO behavior and gives a solid basis for inclusion decisions and implicit-capability calculations.


1396-1452: Implicit-capability computation logic looks sound and matches expectations

GetImplicitlyEnabledCapabilities:

  • Starts from the provided currentImplicitlyEnabled set.
  • For each update manifest that fails inclusion under the current capabilities, looks for a matching current manifest by SameResourceID.
  • For matches where the current manifest passes inclusion, derives new implicit capabilities as:
    • Capabilities present in the update manifest but not in the current manifest,
    • Excluding those already implicitly enabled,
    • Excluding those already explicitly enabled in manifestInclusionConfiguration.Capabilities.EnabledCapabilities.
  • Unions these into the return set and logs any newly implicitly enabled capabilities at V(2).

Given that it’s only called when manifestInclusionConfiguration.Capabilities is non‑nil (cluster‑based include config), this is a reasonable and maintainable port of the CVO behavior.

Comment on lines +378 to 395
needEnabledCapabilities := o.ExtractManifests && o.Included && o.InstallConfig == ""
var inclusionConfig manifestInclusionConfiguration
manifestReceiver := ManifestReceiver{skipNames: sets.New[string]("image-references", "release-metadata"),
	needEnabledCapabilities: needEnabledCapabilities}
// o.ExtractManifests implies o.File == ""
if o.ExtractManifests {
	expectedProviderSpecKind := credRequestCloudProviderSpecKindMapping[o.Cloud]

	include := func(m *manifest.Manifest) error { return nil } // default to including everything
	if o.Included {
		context := "connected cluster"
		inclusionConfig := manifestInclusionConfiguration{}
		if o.InstallConfig == "" {
			inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig)
			inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
			currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
			if err != nil {
				err = fmt.Errorf("failed to get the current payload manifests: %w", err)
			} else {
				manifestReceiver.currentPayloadManifests = currentPayloadManifests
			}
		} else {

⚠️ Potential issue | 🔴 Critical

Handle errors from findClusterIncludeConfig before computing current payload manifests

In the o.InstallConfig == "" branch you do:

inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
if err != nil {
    err = fmt.Errorf("failed to get the current payload manifests: %w", err)
} else {
    manifestReceiver.currentPayloadManifests = currentPayloadManifests
}

If findClusterIncludeConfig fails, its error is overwritten by the subsequent currentPayloadManifests call and the first failure is silently ignored while using a zero-valued inclusionConfig. That’s a correctness bug.

Consider short‑circuiting on the first error and keeping the wrapped context for the second, e.g.:

-        inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
-        currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
-        if err != nil {
-            err = fmt.Errorf("failed to get the current payload manifests: %w", err)
-        } else {
-            manifestReceiver.currentPayloadManifests = currentPayloadManifests
-        }
+        inclusionConfig, err = findClusterIncludeConfig(ctx, o.RESTConfig, versionInImageConfig)
+        if err != nil {
+            return err
+        }
+        currentPayloadManifests, err := currentPayloadManifests(ctx, opts, o.RESTConfig, inclusionConfig)
+        if err != nil {
+            return fmt.Errorf("failed to get the current payload manifests: %w", err)
+        }
+        manifestReceiver.currentPayloadManifests = currentPayloadManifests
🤖 Prompt for AI Agents
In pkg/cli/admin/release/extract.go around lines 378 to 395, the error returned
by findClusterIncludeConfig is overwritten by the subsequent
currentPayloadManifests call; check err immediately after calling
findClusterIncludeConfig and short‑circuit (wrap and return or set the outer
err) if it failed, and only call currentPayloadManifests when inclusionConfig is
valid; also avoid shadowing err (don’t use := for the currentPayloadManifests
error) and wrap the currentPayloadManifests error with context before returning
or assigning manifestReceiver.currentPayloadManifests.


Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.


6 participants