
@mtulio
Contributor

@mtulio mtulio commented Jun 2, 2025

Update k/cloud-provider-aws to pick up support for Service type LoadBalancer NLBs with a managed Security Group, configured through cloud-config under the OpenShift TechPreviewNoUpgrade feature set.
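A minimal sketch of the idea, assuming a simplified map-based model of the INI cloud-config instead of the INI library the operator actually uses. The `[Global]` section, the option name `NLBSecurityGroupMode`, the value `Managed`, the `enforceNLBSecurityGroups` helper and the `gateEnabled` flag (a stand-in for the TechPreviewNoUpgrade feature-gate check) are all assumptions for illustration, not the operator's code.

```go
package main

import "fmt"

// cloudConfig is a simplified, hypothetical model of an INI cloud-config:
// section name -> key -> value. The operator itself works with an INI library.
type cloudConfig map[string]map[string]string

// Assumed names for illustration only; check the upstream cloud-provider-aws
// configuration reference for the exact section, option, and value.
const (
	globalSection           = "Global"
	nlbSecurityGroupModeKey = "NLBSecurityGroupMode"
	managedMode             = "Managed"
)

// enforceNLBSecurityGroups sets the NLB security group mode in the [Global]
// section when gateEnabled is true, overwriting any existing value and
// leaving other keys untouched. With the gate off, cfg is not modified.
func enforceNLBSecurityGroups(cfg cloudConfig, gateEnabled bool) {
	if !gateEnabled {
		return
	}
	if cfg[globalSection] == nil {
		cfg[globalSection] = map[string]string{}
	}
	cfg[globalSection][nlbSecurityGroupModeKey] = managedMode
}

func main() {
	cfg := cloudConfig{globalSection: {"Zone": "us-east-1a"}}
	enforceNLBSecurityGroups(cfg, true)
	fmt.Println(cfg[globalSection][nlbSecurityGroupModeKey]) // Managed
}
```

Only touching the config when the gate is on keeps clusters outside the tech-preview feature set on the existing behavior.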

Upstream feature:

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2025
@openshift-ci
Contributor

openshift-ci bot commented Jun 2, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mtulio mtulio changed the title tmp/DNM: validating NLB+SG config DNM/SPLAT-2253: tmp validation of NLB+SG setup Jun 2, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 2, 2025
@openshift-ci-robot

openshift-ci-robot commented Jun 2, 2025

@mtulio: This pull request references SPLAT-2253 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Bumping cloud-provider-aws is causing crashes, so for now this PR focuses on the change itself so it can be validated with cluster-bot.

This PR was created to be used with cluster-bot:

Ref: openshift/cloud-provider-aws#108

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@mtulio
Contributor Author

mtulio commented Jul 24, 2025

/test all

@mtulio mtulio changed the title DNM/SPLAT-2253: tmp validation of NLB+SG setup DNM/SPLAT-2253: CCM-AWS config enforce to provision Service NLB with SG under gate Jul 24, 2025
@mtulio
Contributor Author

mtulio commented Sep 10, 2025

PR rebased onto the upstream updates, including the CCCMO feature gate (FG) support from #400.

@mtulio
Contributor Author

mtulio commented Sep 10, 2025

Next step: create a CI job to exercise this scenario.

@mtulio
Contributor Author

mtulio commented Sep 10, 2025

/test ?

@openshift-ci
Contributor

openshift-ci bot commented Sep 10, 2025

@mtulio: The following commands are available to trigger required jobs:

/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test fmt
/test images
/test lint
/test okd-scos-images
/test security
/test unit
/test vendor
/test verify-deps
/test vet

The following commands are available to trigger optional jobs:

/test e2e-azure-manual-oidc
/test e2e-azure-ovn
/test e2e-azure-ovn-upgrade
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-ibmcloud-ovn
/test e2e-nutanix-ovn
/test e2e-openstack-ovn
/test e2e-vsphere-ovn
/test level0-clusterinfra-azure-ipi-proxy-tests
/test okd-scos-e2e-aws-ovn
/test regression-clusterinfra-vsphere-ipi-ccm

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-aws-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-aws-ovn-upgrade
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-azure-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-azure-ovn-upgrade
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-gcp-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-gcp-ovn-upgrade
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-openstack-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-vsphere-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-fmt
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-images
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-level0-clusterinfra-azure-ipi-proxy-tests
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-lint
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-okd-scos-e2e-aws-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-okd-scos-images
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-regression-clusterinfra-vsphere-ipi-ccm
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-security
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-unit
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-vendor
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-verify-deps
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-vet

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci-robot

openshift-ci-robot commented Sep 11, 2025

@mtulio: This pull request references SPLAT-2253 which is a valid jira issue.

In response to this:

Bumping cloud-provider-aws is causing crashes, so for now this PR focuses on the change itself so it can be validated with cluster-bot.

This PR was created to be used with cluster-bot:

Ref: openshift/cloud-provider-aws#117


@mtulio mtulio changed the title DNM/SPLAT-2253: CCM-AWS config enforce to provision Service NLB with SG under gate SPLAT-2253/WIP: CCM-AWS config enforce to provision Service NLB with SG under gate Sep 17, 2025
@openshift-ci-robot openshift-ci-robot removed the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 17, 2025
@openshift-ci-robot

@mtulio: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

In response to this:

Bumping cloud-provider-aws is causing crashes, so for now this PR focuses on the change itself so it can be validated with cluster-bot.

This PR was created to be used with cluster-bot:

Ref: openshift/cloud-provider-aws#117


@mtulio
Contributor Author

mtulio commented Sep 17, 2025

/payload-job ?

@openshift-ci
Contributor

openshift-ci bot commented Sep 17, 2025

@mtulio: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@mtulio
Contributor Author

mtulio commented Sep 17, 2025

/testwith openshift/cluster-cloud-controller-manager-operator/main/e2e-aws-ovn openshift/origin#30235 openshift/cloud-provider-aws#117

@deepsm007

/testwith openshift/cluster-cloud-controller-manager-operator/main/e2e-aws-ovn openshift/cloud-provider-aws#117

@openshift-ci-robot

openshift-ci-robot commented Oct 30, 2025

@mtulio: This pull request references SPLAT-2253 which is a valid jira issue.

In response to this:

Update cloud-provider-aws and the OpenShift clients to pick up the NLB+SG feature, enabling configuration that provisions SGs for all NLBs through the sync transformer.

Ref: openshift/cloud-provider-aws#117


@mtulio
Contributor Author

mtulio commented Oct 30, 2025

Upgrading from k8s 1.33 to 1.34 introduced a change in JSON marshaling behavior. After updating the OpenShift clients, the resourceapply unit tests fail when calculating the hash of an object.

Since this is unrelated to the changes introduced in this PR, I will open a separate thread to discuss the correct approach. For now, my view is that this blocks this PR, because it requires updating cloud-provider-aws to 1.34 (which in turn requires the OpenShift and Kubernetes dependencies at 1.34).

> The Problem
The spec-hash annotation is used in production to detect if a resource's spec has changed. Looking at the code:
- Change Detection: The hash is compared to determine if an update is needed
- Backward Compatibility: If the hash calculation changes between library versions, existing resources with old hashes will be incorrectly detected as "changed" and unnecessarily updated

> The Risk
When upgrading from 1.33 to 1.34:
- Existing resources in production will have hashes computed with the old JSON marshaling
- New code will compute different hashes for the same spec due to JSON marshaling changes
- This will cause unnecessary updates to all existing resources on first deployment after upgrade
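To make the risk concrete, here is a minimal sketch of spec-hash style change detection: JSON-encode the spec and take a SHA-256 digest. `specHash`, `specOld` and `specNew` are hypothetical names, not the library-go implementation, and the differing struct tags only simulate a serialization change; they are not the actual 1.33 to 1.34 difference.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// specHash is a hypothetical helper: JSON-encode the spec, then hex-encode
// its SHA-256 digest. Any change in the encoded bytes changes the hash.
func specHash(spec any) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(b)), nil
}

// specOld omits a nil Labels map from the JSON output.
type specOld struct {
	Replicas int               `json:"replicas"`
	Labels   map[string]string `json:"labels,omitempty"`
}

// specNew encodes the same nil map as "labels":null, simulating a
// marshaling change between library versions.
type specNew struct {
	Replicas int               `json:"replicas"`
	Labels   map[string]string `json:"labels"`
}

func main() {
	oldHash, _ := specHash(specOld{Replicas: 1}) // hashes {"replicas":1}
	newHash, _ := specHash(specNew{Replicas: 1}) // hashes {"replicas":1,"labels":null}
	// Same logical spec, different bytes: a hash comparison would flag
	// every existing object as changed and trigger an update.
	fmt.Println(oldHash == newHash) // false
}
```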

cc @rvanderp3 @damdo

@mtulio
Contributor Author

mtulio commented Nov 7, 2025

This PR is blocked by #428, which will provide the bump as well as fixes for the issues found in the unit tests.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2025
@mtulio
Contributor Author

mtulio commented Nov 11, 2025

/test e2e-aws-ovn-techpreview

@mtulio
Contributor Author

mtulio commented Nov 11, 2025

Weird, it passes locally.

/test unmit

@openshift-ci
Contributor

openshift-ci bot commented Nov 11, 2025

@mtulio: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test fmt
/test images
/test lint
/test okd-scos-images
/test unit
/test vendor
/test verify-deps
/test vet

The following commands are available to trigger optional jobs:

/test e2e-aws-ovn-techpreview
/test e2e-azure-manual-oidc
/test e2e-azure-ovn
/test e2e-azure-ovn-upgrade
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-ibmcloud-ovn
/test e2e-nutanix-ovn
/test e2e-openstack-ovn
/test e2e-vsphere-ovn
/test level0-clusterinfra-azure-ipi-proxy-tests
/test okd-scos-e2e-aws-ovn
/test regression-clusterinfra-vsphere-ipi-ccm

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-aws-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-aws-ovn-upgrade
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-fmt
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-images
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-level0-clusterinfra-azure-ipi-proxy-tests
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-lint
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-okd-scos-e2e-aws-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-okd-scos-images
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-regression-clusterinfra-vsphere-ipi-ccm
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-unit
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-vendor
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-verify-deps
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-vet

In response to this:

Weird, it passes locally.

/test unmit


@mtulio
Contributor Author

mtulio commented Nov 11, 2025

/test unit

Update the kubernetes/cloud-provider-aws package to use the latest support for Service
type LoadBalancer NLBs with Security Groups.
@mtulio mtulio force-pushed the SPLAT-2253 branch 2 times, most recently from 50cbb4a to da7b49b November 11, 2025 20:40
Enforce the CCM to manage Security Groups by default on Services of
type LoadBalancer that use a Network Load Balancer (NLB), for
security compliance and best practices.

Fix INI file output so that sections are sorted (co-authored by Claude).

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Marco Braga <[email protected]>
@mtulio
Contributor Author

mtulio commented Nov 11, 2025

We found a bug in the unit tests: sections were not ordered correctly, which made the unit job flaky. The commit also makes sure sections are ordered, addressing the root cause analyzed below:

● Perfect! I've found the root cause!

  Root Cause Analysis

  The test TestCloudConfigTransformer/with_existing_configuration_and_overrides is flaky because of the following:

  1. Map iteration order is non-deterministic in Go: on line 63 of aws_config_transformer.go, there's a loop: for id, override := range cfg.ServiceOverride. The ServiceOverride field is a map keyed by string, and Go explicitly randomizes map iteration order.
  2. Sorting happens and uses string comparison: on line 80, there's a sort: sort.Slice(file.Sections(), func(i, j int) bool { return file.Sections()[i].Name() < file.Sections()[j].Name() }). This is intended to sort section names alphabetically.
  3. The issue: the sections [ServiceOverride "1"] and [ServiceOverride "2"] are added in a varying order:
    - Sometimes the sections are added in the order "1", "2"
    - Sometimes they're added in the order "2", "1"
    - String comparison, quotes included, places [ServiceOverride "1"] before [ServiceOverride "2"] ("2" < "1" is false), so a sort that actually reordered the file would make the output deterministic
    - Since the output still varies, the sort is not taking effect on the file's section order (for example, if it sorts a copy of the section list rather than the file itself)

credits to @jcpowermac :)

@mtulio
Contributor Author

mtulio commented Nov 12, 2025

/test e2e-aws-ovn-techpreview

@huali9

huali9 commented Nov 12, 2025

/verified by @huali9

Details are on https://issues.redhat.com/browse/SPLAT-2377

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 12, 2025
@openshift-ci-robot

@huali9: This PR has been marked as verified by @huali9.

In response to this:

/verified by @huali9

Details are on https://issues.redhat.com/browse/SPLAT-2377


@openshift-ci-robot

openshift-ci-robot commented Nov 12, 2025

@mtulio: This pull request references SPLAT-2253 which is a valid jira issue.

In response to this:

Update k/cloud-provider-aws to pick up support for Service type LoadBalancer NLBs with a managed Security Group, configured through cloud-config under the OpenShift TechPreviewNoUpgrade feature set.

Upstream feature:


@mtulio
Contributor Author

mtulio commented Nov 12, 2025

/test e2e-azure-ovn

@mtulio
Contributor Author

mtulio commented Nov 12, 2025

/assign @damdo @nrb

@openshift-ci openshift-ci bot assigned damdo and nrb Nov 12, 2025
@damdo
Member

damdo commented Nov 12, 2025

/assign @theobarberbany

@openshift-ci
Contributor

openshift-ci bot commented Nov 12, 2025

@mtulio: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mtulio
Contributor Author

mtulio commented Nov 13, 2025

Removing the hold, as this PR is now ready.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2025

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. verified Signifies that the PR passed pre-merge verification criteria

7 participants