fix(ci): check E2E jobs within ci-e2e.yml orchestrator runs #409
…dalone runs

Two bugs in release-gate.yml:

1. Web E2E verification checked the CI E2E orchestrator's overall conclusion before checking the Web E2E job. If Desktop E2E failed, the orchestrator failed, blocking the web gate even though Web E2E passed. Fix: skip the overall conclusion check and inspect the specific Web E2E job directly.
2. Desktop E2E verification searched for standalone desktop-e2e.yml workflow runs. But when ci-e2e.yml calls desktop-e2e.yml via workflow_call, it doesn't create separate workflow runs; the jobs appear within the ci-e2e.yml run. Fix: check for "Desktop E2E" jobs inside ci-e2e.yml runs (same pattern as the web check).

Also fixed the web fallback search to not filter by overall run conclusion, so it can find passing Web E2E jobs even in runs where other jobs failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c41172817edd
Walkthrough
Shifted E2E verification in …
Sequence Diagram(s)
sequenceDiagram
participant ReleaseGate as Release Gate (workflow)
participant GHAPI as GitHub API
participant CI_E2E as ci-e2e.yml run
participant WebJobs as Web E2E jobs
participant DesktopJobs as Desktop E2E jobs
ReleaseGate->>GHAPI: Query latest ci-e2e run for MAIN_SHA / latest completed
GHAPI-->>ReleaseGate: Return run metadata
ReleaseGate->>CI_E2E: Poll until completed (if needed)
CI_E2E-->>ReleaseGate: Run completed
ReleaseGate->>GHAPI: List jobs for run (filter "Web E2E")
GHAPI-->>ReleaseGate: Return WebJobs list + conclusions
alt any Web job has non-success/non-skipped
ReleaseGate->>ReleaseGate: Fail Web gate
else at least one Web job success
ReleaseGate->>ReleaseGate: Pass Web gate
else Web jobs absent/skipped
ReleaseGate->>GHAPI: Search older ci-e2e runs until all Web jobs succeeded
GHAPI-->>ReleaseGate: Return fallback run or not
end
ReleaseGate->>GHAPI: List jobs for run (filter "Desktop E2E")
GHAPI-->>ReleaseGate: Return DesktopJobs list + conclusions
alt any Desktop job failed
ReleaseGate->>ReleaseGate: Fail Desktop gate
else all Desktop jobs succeeded
ReleaseGate->>ReleaseGate: Pass Desktop gate
else Desktop jobs absent/skipped
ReleaseGate->>GHAPI: Search older ci-e2e runs until all Desktop jobs succeeded
GHAPI-->>ReleaseGate: Return fallback run or not
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #409   +/-   ##
=======================================
  Coverage   61.28%   61.28%
=======================================
  Files         133      133
  Lines        9790     9790
  Branches      988      988
=======================================
  Hits         6000     6000
  Misses       3574     3574
  Partials      216      216
Pull request overview
Updates the release gate workflow to validate Web/Desktop E2E results by inspecting specific jobs within ci-e2e.yml orchestrator runs, rather than relying on overall workflow conclusions or searching for standalone workflow runs that don’t exist under workflow_call.
Changes:
- Web E2E gate no longer fails purely on the ci-e2e.yml run's overall conclusion; it's intended to check the "Web E2E" job directly.
- Desktop E2E gate rewritten to look for "Desktop E2E" jobs inside ci-e2e.yml runs instead of desktop-e2e.yml workflow runs.
- Web fallback search no longer filters candidate runs by overall run conclusion.
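The job-level check described in this overview can be sketched roughly as below. This is an illustration, not the workflow's actual script: it assumes the input has the shape of GitHub's "list jobs for a workflow run" response (a `jobs` array whose entries carry `name` and `conclusion`), that `jq` is available, and the helper name `job_conclusions` and the sample job names are invented for the example.

```shell
# job_conclusions NAME: read a jobs response on stdin and print the
# conclusion of every job whose name contains NAME, one per line.
# (Hypothetical helper; the real workflow may filter differently.)
job_conclusions() {
  jq -r --arg name "$1" \
    '.jobs[] | select(.name | contains($name)) | .conclusion'
}

# Made-up sample payload: one Web E2E job and a two-platform Desktop matrix.
sample='{"jobs":[
  {"name":"Web E2E / playwright","conclusion":"success"},
  {"name":"Desktop E2E / test (macos)","conclusion":"failure"},
  {"name":"Desktop E2E / test (windows)","conclusion":"success"}
]}'
```

Piping `$sample` into `job_conclusions "Web E2E"` yields only the Web job's conclusion, so a Desktop failure in the same run no longer affects the Web gate.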
Address review feedback: the gate now distinguishes three states for each E2E check (passed/failed/skipped) instead of two (passed/other).
- Failed jobs trigger immediate failure instead of falling back to older runs, which prevents masking regressions on main.
- Desktop E2E matrix: verifies ALL platform jobs passed, not just that at least one succeeded (prevents partial-failure false pass).
- Fallback to older runs only activates when jobs are absent/skipped.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0b8e5d642de5

Use if/else pattern for Desktop E2E pass/fallback instead of if+exit 0, matching the Web E2E code structure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: bac91d754325

Tighten fallback acceptance condition for both Web and Desktop E2E: require all matching jobs to have succeeded (SUCCESS == TOTAL) instead of just checking no failures. Prevents accepting partial runs where some jobs were skipped.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d282fc26c9a5
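A minimal sketch of the three-state evaluation these commits describe, assuming job conclusions arrive one per line on stdin. The function name `classify_jobs` is hypothetical; only the SUCCESS == TOTAL rule and the passed/failed/skipped states come from the commit messages.

```shell
# classify_jobs: read one job conclusion per line and print the state of
# the whole group: "failed", "passed", or "skipped".
classify_jobs() {
  local total=0 success=0 failed=0 conclusion
  while IFS= read -r conclusion; do
    if [ -z "$conclusion" ]; then continue; fi
    total=$((total + 1))
    case "$conclusion" in
      success) success=$((success + 1)) ;;
      skipped) ;;                          # neither a pass nor a failure
      *)       failed=$((failed + 1)) ;;   # failure, cancelled, timed_out, ...
    esac
  done
  if [ "$failed" -gt 0 ]; then
    echo failed                            # any failure fails the gate
  elif [ "$total" -gt 0 ] && [ "$success" -eq "$total" ]; then
    echo passed                            # SUCCESS == TOTAL
  else
    echo skipped                           # absent or partially skipped
  fi
}
```

With this shape, a matrix reporting `success` and `failure` is classified as `failed`, while `success` plus `skipped` is treated as `skipped` rather than accepted as a pass.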
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/release-gate.yml:
- Around line 224-245: The loop that sets RECENT_RUN_ID currently skips past
newer runs with Web E2E failures; change the logic so you iterate completed
ci-e2e.yml runs newest-first and for each run (use RECENT_RUN_ID / rid,
JOBS_JSON, TOTAL, SUCCESS, FAILED) you first check if TOTAL (Web E2E jobs) is
zero and then continue (absent/skipped), but if TOTAL > 0 treat this run as
authoritative: if FAILED > 0 emit an error and exit non-zero immediately,
otherwise set RECENT_RUN_ID (echo "$rid") and break; ensure you do not continue
scanning older successful runs once a newer run had Web E2E jobs (either pass or
fail).
- Around line 318-340: The current RECENT_RUN_ID loop (using JOBS_JSON and
reading run ids) skips newer runs with failing Desktop E2E jobs and keeps
searching for an older success; change the logic so you examine runs in
descending order and stop at the first run where Desktop E2E actually ran:
compute TOTAL, SUCCESS and FAILED from JOBS_JSON and if TOTAL>0 then if FAILED>0
immediately fail/exit with an error about that run, otherwise set RECENT_RUN_ID
to that run id and break; do not continue searching past the first run that ran
Desktop E2E (use the existing RECENT_RUN_ID, JOBS_JSON, TOTAL, SUCCESS, FAILED
symbols).
📒 Files selected for processing (1)
.github/workflows/release-gate.yml
The fallback loops that search older ci-e2e.yml runs now treat the first run that actually contains the E2E jobs as authoritative: if that run failed, fail the gate immediately instead of scanning further back for an older passing run. This prevents masking regressions.

Also switched from piped while-loop (subshell) to process substitution so that exit 1 inside the loop propagates to the step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9fb58ca125c1
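The subshell issue mentioned in this commit can be reproduced in isolation. The sketch below requires bash (process substitution is not POSIX), uses made-up run ids, and is shown without `set -e`; the function names are hypothetical and not taken from release-gate.yml.

```shell
#!/usr/bin/env bash
# Piped form: the while loop runs in a subshell, so `exit 1` ends only that
# subshell and the function carries on with the next command.
scan_piped() {
  printf '%s\n' 101 102 | while read -r rid; do
    if [ "$rid" = 101 ]; then exit 1; fi
  done
  echo "reached end of scan_piped"
}

# Process substitution: the loop runs in the calling shell, so `exit 1`
# terminates that shell and the final echo is never reached.
scan_procsub() {
  while read -r rid; do
    if [ "$rid" = 101 ]; then exit 1; fi
  done < <(printf '%s\n' 101 102)
  echo "reached end of scan_procsub"
}
```

Calling `scan_piped` prints its final message despite the `exit 1`, whereas running `scan_procsub` inside `( ... )` or `$( ... )` prints nothing and returns status 1.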
Both polling loops waited 20 min (40 * 30s) but desktop-e2e.yml has a 45-minute timeout. If Desktop E2E runs long, the orchestrator won't complete within the polling window, causing false failures.
- Increase both polling loops to 90 * 30s = 45 min
- Increase verify-e2e job timeout from 40 to 50 min
Since both steps poll the same ci-e2e.yml run, the second step finds it already complete, so there is no risk of 45+45=90 min sequential waits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2968bec3da2a
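The timing budget in this commit is plain arithmetic; the snippet below just spells it out using the figures quoted above. The variable names are illustrative only.

```shell
# Polling budget, using the numbers from the commit message.
attempts=90            # polling iterations after the change
interval_s=30          # seconds slept between iterations
old_poll_min=$(( 40 * interval_s / 60 ))
new_poll_min=$(( attempts * interval_s / 60 ))
job_timeout_min=50     # verify-e2e job timeout after the change

echo "old poll window: ${old_poll_min} min"     # old poll window: 20 min
echo "new poll window: ${new_poll_min} min"     # new poll window: 45 min
echo "headroom: $(( job_timeout_min - new_poll_min )) min"   # headroom: 5 min
```

The 5 minutes of headroom is what remains for every other step in the job when one step has to poll for the full window.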
If all matching E2E jobs in a ci-e2e.yml run have conclusion "skipped", the fallback should continue searching older runs (E2E didn't actually run). Previously, TOTAL > 0 but SUCCESS == 0 would trigger a false failure. Now counts only non-skipped (active) jobs for the pass/fail evaluation.

Applies to both Web and Desktop E2E fallback loops.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 75ed4a9a4008
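One way to picture the fallback scan after this change, as a sketch under stated assumptions: candidate runs are fed newest-first on stdin as `<run_id> <conclusion> <conclusion> ...`, and the function name `find_fallback_run` and its return codes are invented for the example. The real loop queries the GitHub API per run instead.

```shell
# find_fallback_run: print the id of the first run whose E2E jobs actually
# ran and all succeeded. Returns 1 if the first run with active jobs has a
# non-success job, and 2 if no run had active jobs at all.
find_fallback_run() {
  local rid conclusions c active success
  while read -r rid conclusions; do
    active=0
    success=0
    for c in $conclusions; do
      if [ "$c" = skipped ]; then continue; fi   # skipped jobs are ignored
      active=$((active + 1))
      if [ "$c" = success ]; then success=$((success + 1)); fi
    done
    if [ "$active" -eq 0 ]; then continue; fi    # E2E did not run: look older
    if [ "$success" -eq "$active" ]; then
      echo "$rid"
      return 0
    fi
    echo "run $rid: E2E jobs ran but did not all succeed" >&2
    return 1                                     # authoritative failure
  done
  return 2
}
```

Given the lines `30 skipped skipped`, `29 success success`, `28 failure`, the scan skips run 30 and accepts run 29 without ever looking at run 28.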
The polling loops selected [0] from runs matching the HEAD SHA, which picks the newest run. If a rerun is in-progress while an older completed run exists, the gate would wait unnecessarily (or time out). Now sorts completed runs first via jq sort_by, so a completed run is used immediately even if a rerun is in-progress.

Also consolidates the conclusion + run ID into a single API call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 90d2204a667c
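A rough sketch of the completed-first selection, assuming `jq` is installed and the input looks like GitHub's workflow-runs listing (a `workflow_runs` array with `id`, `status`, and `conclusion`). The exact jq filter in release-gate.yml is not shown in this PR page, so the filter and the name `pick_run` here are guesses at the idea rather than the real code.

```shell
# pick_run: read a workflow-runs response on stdin and print
# "<id> <status> <conclusion>" for the preferred run. The sort key is
# false for completed runs, and jq orders false before true, so completed
# runs come first. The id and conclusion come out of the same response.
pick_run() {
  jq -r '.workflow_runs
         | sort_by(.status != "completed")
         | .[0]
         | "\(.id) \(.status) \(.conclusion)"'
}

# Made-up sample: a rerun still in progress plus an older completed run.
sample_runs='{"workflow_runs":[
  {"id":2,"status":"in_progress","conclusion":null},
  {"id":1,"status":"completed","conclusion":"success"}
]}'
```

Piping `$sample_runs` into `pick_run` selects run 1 even though run 2 is listed first. With an empty `workflow_runs` array the output is `null null null`, which is why the run id needs checking before it is used in a URL.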
@coderabbitai please review
🔴 Timeout budget mismatch
The job's … But when only Desktop E2E runs (no web changes detected, so the Web step is skipped), the Desktop step must do the full 45-min poll on its own. With 50 min available, there's only ~5 min of margin for everything else in the job. If the desktop suite runs long, the job timeout will fire before the poll completes and the gate exits without a useful error message. Suggested fix: raise …
- Raise verify-e2e job timeout from 50 to 60 min (more margin when
only Desktop E2E runs and needs the full 45-min poll window)
- Add E2E_RUN_ID validation after both poll loops — fail with a clear
error instead of a confusing 404 on "runs/null/jobs"
- Add missing ${DESKTOP_TOTAL:-0} default for consistency with other
variable references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ae1a31740645
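The run-id validation and the `:-0` default from this commit might look something like the following. This is a sketch: `require_run_id` and the error text are hypothetical, while `E2E_RUN_ID` and `DESKTOP_TOTAL` are the variable names mentioned above.

```shell
# require_run_id: fail with a readable message when polling produced no
# usable run id (empty, or the literal string "null" printed by jq).
require_run_id() {
  case "${1:-}" in
    ''|null)
      echo "ERROR: no ci-e2e.yml run id found; refusing to query runs/${1:-}/jobs" >&2
      return 1
      ;;
  esac
}

# ${VAR:-0} substitutes 0 when the variable is unset or empty, so the
# numeric test below cannot fail with "integer expression expected".
unset DESKTOP_TOTAL
if [ "${DESKTOP_TOTAL:-0}" -eq 0 ]; then
  desktop_note="no Desktop E2E jobs counted"
fi
```

A call such as `require_run_id "$E2E_RUN_ID" || exit 1` placed right after the poll loop would stop the step before any request to `runs/null/jobs` is made.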
All three items addressed in 0a8acb9:
When E2E jobs are absent/skipped at HEAD, only fall back to older runs if the ci-e2e.yml run succeeded (meaning detect-changes intentionally skipped the suite). If the run failed or was cancelled, E2E may have been skipped due to upstream failure, and falling back would mask the issue.

Also improved log messages to say "skipped/absent" instead of assuming "no changes in that push".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5051a8c8a04f
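The decision this commit describes can be summarised as a small table in code. The sketch assumes the HEAD jobs have already been reduced to one of `passed`, `failed`, or `skipped`; the function name `decide_gate` and its output words are made up for illustration.

```shell
# decide_gate JOB_STATE RUN_CONCLUSION: print "pass", "fail", or "fallback".
decide_gate() {
  local job_state="$1" run_conclusion="$2"
  case "$job_state" in
    passed) echo pass ;;      # job result wins over the run's conclusion
    failed) echo fail ;;
    skipped)
      if [ "$run_conclusion" = success ]; then
        echo fallback         # suite was skipped on purpose: check older runs
      else
        echo fail             # skipped because something upstream broke
      fi
      ;;
    *) echo fail ;;           # unknown state: be conservative
  esac
}
```

For example, `decide_gate skipped success` prints `fallback`, while `decide_gate skipped cancelled` prints `fail`.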
If the GitHub API call fails (rate limit, auth, network), the polling loop would spin until timeout with a misleading "still running" message. Now uses `if !` to fail fast on API errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 28b16c9575e1
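The `if !` pattern from this commit, sketched with a stand-in for the API call so it runs without a network. `fetch_run_status`, `poll_once`, and the `FAKE_API_DOWN` switch are all hypothetical; in the workflow the call would be the real GitHub API request.

```shell
# fetch_run_status: stand-in for the API call. It fails when
# FAKE_API_DOWN=1 so the error path can be exercised locally.
fetch_run_status() {
  if [ "${FAKE_API_DOWN:-0}" = 1 ]; then return 1; fi
  echo completed
}

# poll_once: one polling iteration that distinguishes "API failed" from
# "run not finished yet".
poll_once() {
  # Declared on its own line: `local v=$(cmd)` would return the status of
  # `local` and hide the command's failure.
  local run_status
  if ! run_status=$(fetch_run_status); then
    echo "ERROR: API call failed; aborting instead of polling until timeout" >&2
    return 1
  fi
  echo "$run_status"
}
```

Because the assignment's exit status is that of the command substitution, a failed request is reported immediately instead of being logged as a run that is still in progress.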
Summary
… desktop-e2e.yml workflow runs (which don't exist when triggered via workflow_call).
Context
The release gate was failing because:
… desktop-e2e.yml runs, but workflow_call runs appear as jobs within the caller, not as separate workflow runs
Both checks now use the same consistent pattern: find ci-e2e.yml run → check specific job by name → fall back to recent runs if skipped.
Test plan
🤖 Generated with Claude Code