e2e: reduce pod-pending-timeout to fail fast on stuck pods #5001

Open
joepvd wants to merge 2 commits into openshift:main from joepvd:e2e-reduce-pod-pending-timeout

Conversation

@joepvd
Contributor

@joepvd joepvd commented Mar 11, 2026

Summary

  • The default --pod-pending-timeout of 60m causes e2e tests to waste up to an hour waiting when a step pod is stuck in Pending state (e.g. due to scheduling or image pull issues).
  • Set it to 10m for e2e tests, which is long enough for normal scheduling but fails fast when something is fundamentally wrong.
  • This mirrors the existing --lease-acquire-timeout=2s override already applied for the same reason.

Context

Observed in the e2e job for PR #4996, where configurable-leases-check-leases (a trivial step that should complete in under a second) failed after 59m59s, almost exactly the 60m --pod-pending-timeout default.

Made with Cursor

Summary by CodeRabbit

  • Tests
    • Adjusted end-to-end testing: set an explicit pod pending timeout for the CI test runner, below the 60m default, so that e2e runs fail fast when a step pod is stuck in Pending.

The default --pod-pending-timeout of 60m causes e2e tests to waste up
to an hour waiting when a step pod is stuck in Pending state (e.g. due
to scheduling or image pull issues). Set it to 10m for e2e tests, which
is long enough for normal scheduling but fails fast when something is
fundamentally wrong. This mirrors the existing --lease-acquire-timeout=2s
override already applied for the same reason.

Made-with: Cursor
@openshift-ci-robot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci bot requested review from deepsm007 and droslean March 11, 2026 07:50
@coderabbitai

coderabbitai bot commented Mar 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 25161fbe-628b-464e-b1bf-2c2d7f2163da

📥 Commits

Reviewing files that changed from the base of the PR and between ae2d54e and 8f8e712.

📒 Files selected for processing (1)
  • test/e2e/framework/ci-operator.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/framework/ci-operator.go

Walkthrough

Added the --pod-pending-timeout=20m flag to the ci-operator command invocation in the test framework, adjusting the pod pending timeout to 20 minutes without other behavioral changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Operator Flag Configuration (test/e2e/framework/ci-operator.go) | Added --pod-pending-timeout=20m flag to the ci-operator command invocation (one-line CLI argument change). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title describes reducing pod-pending-timeout to fail fast on stuck pods. However, the actual change implements a 20-minute timeout, not a reduction to the 10-minute value mentioned in the PR description. | Clarify whether the title accurately reflects the final timeout value (20m vs 10m) to ensure the title matches the implemented change. |
| Test Structure And Quality | ❓ Inconclusive | Cannot access the modified file to assess Ginkgo test code quality. Repository appears to be in an inaccessible state or file does not exist in expected location. | Provide access to test/e2e/framework/ci-operator.go content or verify the file path and repository state to enable proper Ginkgo test quality assessment. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Stable And Deterministic Test Names | ✅ Passed | The modified file ci-operator.go is a framework utility file containing no Ginkgo test definitions (It, Describe, Context, When), making the custom check for stable test names inapplicable. |

@danilo-gemoli
Contributor

/test e2e
/lgtm

@openshift-ci openshift-ci bot added the lgtm and approved labels Mar 11, 2026
@joepvd
Contributor Author

joepvd commented Mar 11, 2026

/retest-required

@openshift-ci openshift-ci bot removed the lgtm label Mar 11, 2026
@joepvd
Contributor Author

joepvd commented Mar 11, 2026

/test e2e

@joepvd
Contributor Author

joepvd commented Mar 11, 2026

/test images
/test e2e

@danilo-gemoli
Contributor

/lgtm
/retest-required

@openshift-ci openshift-ci bot added the lgtm label Mar 12, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danilo-gemoli, joepvd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danilo-gemoli
Contributor

/test images

@openshift-ci-robot
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2b0ca42 and 2 for PR HEAD 8f8e712 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD cc514eb and 1 for PR HEAD 8f8e712 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD aa49356 and 0 for PR HEAD 8f8e712 in total

@openshift-ci
Contributor

openshift-ci bot commented Mar 14, 2026

@joepvd: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/breaking-changes | 8f8e712 | link | false | /test breaking-changes |
| ci/prow/images | 8f8e712 | link | true | /test images |

@openshift-ci-robot
Contributor

/hold

Revision 8f8e712 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold label Mar 14, 2026
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lgtm: Indicates that a PR is ready to be merged.
