Symptom
Every CI run on this repo for the past 2+ weeks has ended as failure or cancelled because of the CI/e2e job. This blocks every PR and every Dependabot bump.
Evidence
gh run list --workflow=CI --limit 20 shows an unbroken streak of failures since at least 2026-05-11 across:
Failure pattern
In every run the Playwright step shows the same shape: tests start, then a flood of ××F (retry-retry-fail) markers, then the job hits the GitHub Actions timeout-minutes limit and is cancelled.
PR #189 raised timeout-minutes from 15 to 30 to rule out an insufficient time budget; the suite still timed out at 30 min with the same pattern, so the cause is real test failures, not just slowness. The suite runs 126 tests with workers: 1 and retries: 2 in CI, per playwright.config.ts.
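As a rough illustration of why a run dominated by failures can exhaust any reasonable budget, the sketch below multiplies out the worst case for this configuration. The 126 tests, retries: 2 and workers: 1 come from this issue; the 30-second per-test timeout is an assumption (it is Playwright's default, and the repo's actual value may differ), and worstCaseMinutes is a hypothetical helper, not something in the repo.

```typescript
// Worst-case wall-clock estimate for a Playwright run in which every
// test times out and uses up all of its retries.
function worstCaseMinutes(
  tests: number,
  retries: number,
  perTestTimeoutSec: number, // ASSUMPTION: not stated in this issue
  workers: number,
): number {
  const attemptsPerTest = 1 + retries; // first attempt plus each retry
  const totalAttempts = tests * attemptsPerTest;
  // With N workers, attempts are spread across N parallel lanes.
  const totalSeconds = (totalAttempts * perTestTimeoutSec) / workers;
  return totalSeconds / 60;
}

// Figures from this issue: 126 tests, retries: 2, workers: 1.
// 30 s is Playwright's default test timeout, assumed here.
const minutes = worstCaseMinutes(126, 2, 30, 1);
console.log(`worst case: ${minutes} min`); // 126 * 3 * 30 / 60 = 189
```

Under the same assumptions, about 20 tests timing out on all three attempts (20 × 3 × 30 s = 30 min) would be enough to consume the whole 30-minute job budget on their own. Tests that fail fast on an assertion cost far less than this per attempt.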
Root cause is unknown
This needs dedicated investigation outside the scope of any single feature PR. Candidates:
- Real product regression somewhere between 2026-05-11 and now
- Server startup / shutdown race against early test requests (some failures show socket hang up against the local dev server)
- Resource pressure with 1-worker × 2-retries × 126-test config
- A specific spec file that hangs, blocking the rest
Suggested next steps
- Reproduce locally with CI=1 npx playwright test --project=desktop --reporter=list and capture which tests are flaky vs hard-failing
- Triage by spec file — disable the slowest/flakiest spec only after attaching a tracking link
- Consider raising workers on CI (the bottleneck is wall clock, not CPU contention)
- Once green, set retries: 0 on the worst offenders so flake stays visible
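One way to capture flaky vs hard-failing tests per spec file is to repeat the local run with Playwright's JSON reporter (--reporter=json instead of list) and tally the per-test status field. The following is a minimal sketch under that assumption: the types cover only the subset of the report it reads, tallyByFile is a hypothetical helper, and the spec file names in the sample are made up.

```typescript
// Minimal subset of the Playwright JSON report shape used below.
type TestOutcome = "expected" | "unexpected" | "flaky" | "skipped";
interface ReportTest { status: TestOutcome }
interface ReportSpec { title: string; file: string; tests: ReportTest[] }
interface ReportSuite { specs?: ReportSpec[]; suites?: ReportSuite[] }
interface Report { suites: ReportSuite[] }

interface FileTally { flaky: number; hardFailing: number }

// Walk nested suites and count flaky vs hard-failing tests per spec file.
function tallyByFile(report: Report): Record<string, FileTally> {
  const tallies: Record<string, FileTally> = {};
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs || []) {
      const tally = tallies[spec.file] || { flaky: 0, hardFailing: 0 };
      for (const t of spec.tests) {
        // "flaky": failed at least once, then passed on a retry.
        if (t.status === "flaky") tally.flaky += 1;
        // "unexpected": never reached its expected status on any attempt.
        if (t.status === "unexpected") tally.hardFailing += 1;
      }
      tallies[spec.file] = tally;
    }
    for (const child of suite.suites || []) visit(child);
  };
  report.suites.forEach(visit);
  return tallies;
}

// Hypothetical sample data, shaped like a trimmed-down report.
const sample: Report = {
  suites: [
    {
      specs: [
        { title: "logs in", file: "login.spec.ts", tests: [{ status: "flaky" }] },
      ],
      suites: [
        {
          specs: [
            { title: "pays", file: "checkout.spec.ts", tests: [{ status: "unexpected" }] },
            { title: "shows cart", file: "checkout.spec.ts", tests: [{ status: "expected" }] },
          ],
        },
      ],
    },
  ],
};

const tallies = tallyByFile(sample);
for (const file of Object.keys(tallies)) {
  console.log(`${file}: flaky=${tallies[file].flaky} hard-failing=${tallies[file].hardFailing}`);
}
```

Sorting the result by the hard-failing count gives a starting order for triage by spec file. A file whose tests never appear in the report at all may point at the hanging-spec candidate listed earlier.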
Out of scope for this issue
Do not paper over the failure by skipping the e2e job, marking it optional, or removing tests. Fix the underlying problem.