Skip to content

fix(ci): stabilize recent dev smoke flakes#930

Merged
Astro-Han merged 3 commits into
devfrom
codex/ci-flake-stabilize
May 26, 2026
Merged

fix(ci): stabilize recent dev smoke flakes#930
Astro-Han merged 3 commits into
devfrom
codex/ci-flake-stabilize

Conversation

@Astro-Han

@Astro-Han Astro-Han commented May 26, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes two recent merged dev CI flakes without changing product behavior:

  • Adds a bounded shutdown fallback to the desktop report-problem smoke script so a successful smoke result cannot hang until the workflow timeout.
  • Keeps the slow same-step loop-gate live test on the default timeout outside Windows, while giving Windows a 30s budget after it exceeded the 10s default by milliseconds.

No issue exists; this came from live investigation of the latest merged dev CI failures.

Why

Recent merged dev runs showed two separate failure shapes:

  • desktop-smoke on 2a6d53d completed the report-problem assertions and printed a ready result, then stayed alive until GitHub cancelled the job at 30 minutes.
  • windows-advisory on 9d1b7a8 failed only because loop gate records same-step repeated tool errors without block or stop timed out at 10.016s on Windows.

Related Issue

None.

Human Review Status

Pending

Review Focus

Please check that the smoke-script shutdown fallback is scoped to test cleanup only, and that the timeout increase stays limited to the known slow Windows live test.

Risk Notes

No product runtime behavior changes. Platform impact considered: this touches macOS Electron smoke cleanup and Windows advisory test timing only. Manual workflow_dispatch for windows-advisory was attempted after the review cleanup, but GitHub returned HTTP 500 for Actions dispatch creation; the Windows-specific path remains covered by the platform-gated test change and will be rechecked on the next dispatchable Windows advisory run.

How To Verify

GitHub Actions log inspection: latest merged dev failures were desktop-smoke cancellation after successful report result, and windows-advisory timeout in one prompt-effect live test.
Focused prompt test: cd packages/opencode && bun test --timeout 30000 test/session/prompt-effect.test.ts -t "loop gate records same-step repeated tool errors without block or stop" -> 1 pass.
Workflow contract tests: cd packages/opencode && bun test test/github/desktop-smoke-workflow.test.ts test/github/ci-workflow.test.ts -> 12 pass.
Desktop build: cd packages/desktop-electron && bun run build -> passed.
Report smoke: cd packages/desktop-electron && perl -e 'alarm shift; exec @ARGV' 25 bun run smoke:report -> passed and exited with code 0.
Diff check: git diff --check -> passed.

Screenshots or Recordings

Not applicable; no visible UI changes.

Checklist

  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced application shutdown reliability on desktop with improved process management and robust termination handling.
  • Tests

    • Improved test stability on Windows with optimized timeout configuration.

Review Change Stack

@Astro-Han Astro-Han added bug Something isn't working ci Continuous integration / GitHub Actions windows Windows-specific P1 High priority desktop labels May 26, 2026
@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@Astro-Han, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 59 minutes and 15 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a94d7cc-1a5b-4edd-96ee-60f9716740f8

📥 Commits

Reviewing files that changed from the base of the PR and between c2ddd87 and 0d4373c.

📒 Files selected for processing (2)
  • packages/desktop-electron/scripts/report-problem-smoke.mjs
  • packages/opencode/test/session/prompt-effect.test.ts
📝 Walkthrough

Walkthrough

This PR hardens test reliability by adding robust process termination logic to the Electron smoke test script and configuring a Windows-specific longer timeout for a flaky prompt-effect live test. Both changes address test stability on different execution environments.

Changes

Test stability across platforms

Layer / File(s) Summary
Electron process shutdown with graceful fallback
packages/desktop-electron/scripts/report-problem-smoke.mjs
New helpers (childIsRunning, withTimeout, waitForExit, closeApp) implement safe app closure with bounded timeouts and SIGKILL fallback. Cleanup phase calls closeApp(app) to ensure child process terminates before temp directory cleanup.
Windows-specific test timeout for prompt-effect
packages/opencode/test/session/prompt-effect.test.ts
slowWindowsLiveTimeout constant applies a 30-second timeout on Windows and defaults elsewhere. Wired into a specific it.live(...) test to reduce flakiness on slower Windows CI runners.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Astro-Han/pawwork#579: Adjusts Windows-specific per-test timeouts in prompt-effect.test.ts via a shared "slow" timeout constant pattern.
  • Astro-Han/pawwork#543: Modifies prompt-effect.test.ts timeout values in the affected it.live(...) test cases.
  • Astro-Han/pawwork#429: Updates prompt-effect timeout and gating logic related to live tests.

Suggested labels

ci, flaky-test

Poem

🐰 A rabbit hops through test logs with care,
Process cleanup and timeouts everywhere!
On Windows slow, the tests now wait,
While Electron shutters with forceful fate,
Flakiness fades—our suite runs great! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(ci): stabilize recent dev smoke flakes' directly and specifically summarizes the main changes: fixing CI flakes in smoke and live tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and follows the required template with all major sections completed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/ci-flake-stabilize

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added platform Electron shell, OS integration, packaging, updater, signing, paths, and permissions harness Model harness, prompts, tool descriptions, and session mechanics and removed ci Continuous integration / GitHub Actions labels May 26, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a closeApp helper function in the electron smoke test script to handle application teardown more robustly, and increases a test timeout in prompt-effect.test.ts. The review identifies two issues in the closeApp implementation: uncleared timeouts that can keep the Node.js event loop active, and an unconditional wait on an already exited process that causes a 5-second hang. A code suggestion is provided to resolve these issues.

Comment thread packages/desktop-electron/scripts/report-problem-smoke.mjs
@Astro-Han Astro-Han force-pushed the codex/ci-flake-stabilize branch from 49e97ed to c2ddd87 Compare May 26, 2026 10:55
@Astro-Han Astro-Han force-pushed the codex/ci-flake-stabilize branch from c2ddd87 to 8cf5a57 Compare May 26, 2026 11:17
@Astro-Han

Copy link
Copy Markdown
Owner Author

Temporarily closing and reopening to retrigger required PR workflows after GitHub did not create check suites for the latest synchronize events.

@Astro-Han Astro-Han closed this May 26, 2026
@Astro-Han Astro-Han reopened this May 26, 2026
@Astro-Han Astro-Han force-pushed the codex/ci-flake-stabilize branch from 67abbef to 8cf5a57 Compare May 26, 2026 11:21
@Astro-Han Astro-Han merged commit 0812f86 into dev May 26, 2026
46 of 66 checks passed
@Astro-Han Astro-Han deleted the codex/ci-flake-stabilize branch May 26, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working desktop harness Model harness, prompts, tool descriptions, and session mechanics P1 High priority platform Electron shell, OS integration, packaging, updater, signing, paths, and permissions windows Windows-specific

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant