feat(retry): add persistent retry mode for unattended CI/CD environments by zhangxy-zju · Pull Request #3080 · QwenLM/qwen-code

zhangxy-zju · 2026-04-10T03:25:09Z

Why

When Qwen Code runs as part of a CI/CD pipeline or as a background daemon (e.g., overnight batch refactoring, large-scale security audits), transient API capacity errors — HTTP 429 (Rate Limit) and 529 (Overloaded) — should not terminate long-running tasks. The current implementation uses a fixed maxAttempts: 7 across all modes, which means a multi-hour automated job can be killed by a brief API outage lasting just a couple of minutes. This is the #1 blocker for using Qwen Code as reliable DevOps infrastructure.

Claude Code solves this with a "persistent retry" mode (services/api/withRetry.ts). This PR brings equivalent capability to Qwen Code.

Closes #3003

What changed

Core: `packages/core/src/utils/retry.ts`

isUnattendedMode() — Detects persistent retry opt-in via QWEN_CODE_UNATTENDED_RETRY env var (explicit only — CI=true alone does NOT activate, to avoid silently turning fast-fail CI jobs into infinite-wait jobs)
isTransientCapacityError() — Classifies only 429 and 529 as transient capacity errors (HTTP 500 is excluded — it may be a permanent server bug)
sleepWithHeartbeat() — Chunked sleep that emits heartbeat callbacks every 30s to keep CI runners alive
retryWithBackoff() enhanced — In persistent mode, transient capacity errors bypass maxAttempts and retry indefinitely with:
- Exponential backoff capped at 5 minutes per retry
- Absolute single-wait cap at 6 hours
- Retry-After header respected when present
- AbortSignal support for graceful cancellation
- Independent persistentAttempt counter (attempt clamping technique from Claude Code)

Callers: `geminiChat.ts`, `baseLlmClient.ts`

Pass persistentMode: isUnattendedMode() and a heartbeatFn that writes to stderr

Docs

docs/users/configuration/settings.md — Added QWEN_CODE_UNATTENDED_RETRY to environment variables table
docs/users/features/headless.md — Added "Persistent Retry Mode" section with activation, examples, and monitoring

Non-breaking

persistentMode defaults to false — zero behavior change for interactive users
All 26 existing tests pass unchanged; 20 new tests added (46 total)

How to verify

Unit tests

npx vitest run packages/core/src/utils/retry.test.ts

All 46 tests pass, covering:

429/529 persistent retry beyond maxAttempts
500 still fails at maxAttempts (not persistent)
CI=true alone does NOT activate persistent mode
Backoff cap enforcement
Heartbeat callback invocation
AbortSignal cancellation
isUnattendedMode() env var detection (truthy/falsy)
Regression: non-persistent mode unchanged

Manual / integration

QWEN_CODE_UNATTENDED_RETRY=1 qwen-code "refactor all files in src/" — enable persistent retry
Temporarily inject a 429 error in the API layer and observe persistent retry with heartbeat logs on stderr
Verify normal interactive mode is unaffected (no env vars set)

github-actions · 2026-04-10T03:26:44Z

📋 Review Summary

This PR introduces a persistent retry mode for handling transient API capacity errors (429/529) in unattended/CI environments, enabling long-running automated tasks to survive temporary API outages. The implementation is well-structured with comprehensive test coverage, but there are some concerns around the attempt clamping technique and potential infinite loop risks that should be addressed before merging.

🔍 General Feedback

Strong test coverage: 20 new tests thoroughly cover persistent mode behavior, edge cases, and regression scenarios
Non-breaking design: persistentMode defaults to false, ensuring zero impact on interactive users
Good separation of concerns: New helper functions (isTransientCapacityError, isUnattendedMode, sleepWithHeartbeat) are well-isolated
Heartbeat mechanism: Excellent addition for keeping CI runners alive during long waits
AbortSignal support: Proper graceful cancellation support for persistent retries
Concern: The attempt clamping technique (if (attempt >= maxAttempts) { attempt = maxAttempts - 1 }) is a clever but potentially confusing pattern that could be refactored for clarity

🎯 Specific Feedback

🔴 Critical

File: packages/core/src/utils/retry.ts:267-271 - The attempt clamping technique creates a potential maintenance hazard. While functional, this pattern of modifying the loop counter to prevent exit is non-standard and could confuse future maintainers:
```
// Clamp attempt so the while-loop never exits
if (attempt >= maxAttempts) {
  attempt = maxAttempts - 1;
}
```
Recommendation: Consider restructuring the loop logic to use separate conditions for persistent vs. normal mode, or extract the persistent retry loop into its own function for clarity.

🟡 High

File: packages/core/src/utils/retry.ts:224-227 - The shouldPersist logic has a subtle issue. When shouldPersist is true, the code bypasses the normal retry exhaustion check entirely, but then still uses attempt in the while condition. This creates two independent counters (attempt and persistentAttempt) that track different things:
```
const shouldPersist = persistent && isTransient;

if (!shouldPersist) {
  if (attempt >= maxAttempts || !shouldRetryOnError(error as Error)) {
    throw error;
  }
}
```
Recommendation: Add a comment explaining why attempt is still used in the while condition even when in persistent mode, or consider using persistentAttempt for the while condition in persistent mode.
File: packages/core/src/utils/retry.ts:15-16 - The constants are defined at module level but the interface RetryOptions allows overriding them. This is fine, but there's no validation that persistentCapMs should be greater than persistentMaxBackoffMs. If someone accidentally swaps them, behavior could be unexpected.
Recommendation: Add validation or at least a comment about the relationship between these values.

🟢 Medium

File: packages/core/src/utils/retry.ts:238-243 - The backoff calculation in persistent mode applies jitter after capping, which could result in delays slightly exceeding the cap (though the comment acknowledges this with + 1 for rounding). The test at line 615 checks 5000 * 1.25 + 1, which is a 25% jitter range, but the code comment says "±25%":
```
delayMs += delayMs * 0.25 * (Math.random() * 2 - 1);
```
This is correct, but the test comment at line 614 says "cap + max jitter" which could be clearer about the 25% figure.
Recommendation: Align the comment to explicitly mention 25% jitter for consistency.
File: packages/core/src/core/geminiChat.ts:592-597 and packages/core/src/core/baseLlmClient.ts:120-125 - The heartbeat function writes to process.stderr with a fixed format. This is good for visibility, but there's no way to customize or suppress this output if a CI environment has specific logging requirements.
Recommendation: Consider making the heartbeat callback optional or providing a way to customize the output format via config.
File: packages/core/src/utils/retry.ts:78-82 - The isEnvTruthy helper function is defined after isUnattendedMode in the diff, but logically it should be defined before since it's a dependency. The current order works but is slightly harder to follow.
Recommendation: Move isEnvTruthy above isUnattendedMode for better code flow.

🔵 Low

File: packages/core/src/utils/retry.ts:51-54 - The HeartbeatInfo interface is exported but only used internally within the retry module. Consider whether this needs to be part of the public API or if it should remain internal.
Suggestion: If external customization of heartbeat callbacks is desired, keep it exported. Otherwise, consider making it internal to reduce API surface.
File: packages/core/src/utils/retry.ts:135-148 - The sleepWithHeartbeat function checks ctx.signal?.aborted at the start of each chunk, but if the signal is aborted mid-chunk (during the delay(chunk) call), it won't be detected until the chunk completes. For a 30-second heartbeat interval, this could mean up to 30 seconds of delay before abort is recognized.
Suggestion: Consider using AbortSignal.timeout() or breaking chunks into smaller intervals for more responsive abort handling, though this is a minor concern given the 30-second default.
File: packages/core/src/utils/retry.test.ts:558-560 - The test for backoff cap enforcement checks delays with 5000 * 1.25 + 1, which includes the 25% jitter. This is correct, but the test comment says "cap + max jitter + rounding" without specifying the 25% figure explicitly.
Suggestion: Add a comment noting the 25% jitter constant for future maintainers.
File: packages/core/src/utils/retry.ts:219 - The debug log message format changes between persistent and normal mode ([Persistent] prefix vs. no prefix). This is good for differentiation, but consider adding the mode indicator to the normal mode logs as well for consistency.
Suggestion: Add [Normal] or similar prefix to non-persistent mode logs for easier log filtering.

✅ Highlights

Excellent test coverage: The test suite comprehensively covers persistent mode behavior including 429/529 indefinite retry, 500 error handling (correctly fails at maxAttempts), backoff cap enforcement, heartbeat invocation, AbortSignal cancellation, and regression tests for normal mode
Clean API design: The persistentMode, heartbeatFn, and signal options are well-integrated into the existing RetryOptions interface without breaking existing callers
Smart use of Retry-After header: The implementation respects server-provided retry timing when available, falling back to exponential backoff otherwise
Heartbeat mechanism: The chunked sleep with heartbeat callbacks is an excellent solution for keeping CI runners alive during long waits
Proper error classification: Excluding HTTP 500 from transient capacity errors is the right call - permanent server bugs shouldn't trigger infinite retries
Non-breaking by default: The design ensures existing behavior is completely unchanged for interactive users

Copilot

Pull request overview

Adds an opt-in “persistent retry” mode to make long-running, unattended Qwen Code runs (e.g., CI/CD) resilient to transient API capacity errors without impacting interactive behavior.

Changes:

Enhanced retryWithBackoff() with persistent retry semantics for 429/529, heartbeat sleeping, and additional options (persistentMode, caps, heartbeat, signal).
Wired persistent retry into core API callers (GeminiChat, BaseLlmClient) and documented the new env var (QWEN_CODE_UNATTENDED_RETRY).
Added unit tests for transient classification, unattended mode detection, and persistent retry behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
packages/core/src/utils/retry.ts	Implements persistent retry mode, heartbeat sleep, and new retry options.
packages/core/src/utils/retry.test.ts	Adds coverage for persistent retry mode + env detection helpers.
packages/core/src/core/geminiChat.ts	Enables persistent mode in Gemini caller with stderr heartbeat logging.
packages/core/src/core/baseLlmClient.ts	Enables persistent mode in base client with stderr heartbeat logging.
docs/users/features/headless.md	Documents “Persistent Retry Mode” usage and monitoring behavior.
docs/users/configuration/settings.md	Adds `QWEN_CODE_UNATTENDED_RETRY` to the environment variables table.

Comments suppressed due to low confidence (1)

packages/core/src/utils/retry.ts:186

The new signal?: AbortSignal option is only consulted inside sleepWithHeartbeat (persistent path). In non-persistent mode (and in the shouldRetryOnContent backoff), delays ignore the signal and the loop doesn't check signal.aborted before calling fn(). If the intent is "graceful cancellation" across all retries, consider checking signal?.aborted at the top of the loop and making all waits abortable (including non-persistent delay(...) calls).

  while (attempt < maxAttempts) {
    attempt++;
    try {
      const result = await fn();

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

wenshao

packages/core/src/core/client.ts:963 still calls retryWithBackoff() without persistentMode: isUnattendedMode() or the new unattended retry wiring, while this PR adds that behavior in BaseLlmClient.generateJson() and GeminiChat.makeApiCallAndProcessStream(). Callers that still route through Client.generateContent()—including packages/core/src/tools/web-fetch.ts:111, packages/cli/src/ui/commands/btwCommand.ts:66, and packages/cli/src/ui/commands/summaryCommand.ts:92—will still fail after the normal retry budget during 429/529 capacity outages, so unattended retry remains inconsistent across top-level model entry points. Please apply the same unattended retry options here, or centralize the policy in a shared wrapper so all top-level model calls opt in consistently.

— gpt-5.4 via Qwen Code /review

zhangxy-zju · 2026-04-12T09:34:54Z

@wenshao Good catch — client.ts:963 was indeed missing the persistent retry wiring.

Fixed in 389e856: added persistentMode: isUnattendedMode() and signal: abortSignal to the retryWithBackoff() call in GeminiClient.generateContent(), matching the existing wiring in baseLlmClient.ts and geminiChat.ts.

All three retryWithBackoff() call sites now consistently forward persistent mode and abort signal.

Fixed in ec7659c — added persistentMode and signal to client.ts:963 retryWithBackoff() call. All three call sites now have consistent wiring.

wenshao

See inline comments. — gpt-5.4 via Qwen Code /review

wenshao · 2026-04-12T21:01:12Z

+
+  while (remaining > 0) {
+    if (ctx.signal?.aborted) {
+      throw new Error('Retry aborted by signal');


[Suggestion] sleepWithHeartbeat() says it supports AbortSignal, but each chunk still waits on a plain timer. If the signal is aborted after a chunk starts, cancellation is delayed until that chunk finishes, which can be up to the full heartbeat interval in persistent mode.

— gpt-5.4 via Qwen Code /review

Acknowledged but won't fix — the worst-case cancel latency in persistent mode is bounded by heartbeatIntervalMs (default 30s), and a single chunk is already a discrete sleep that ends naturally without any work happening during it. Making each chunk Promise.race([delay, signal])-based would add abort-listener wiring/cleanup for ≤30s of latency improvement.

This is consistent with the previously declined Copilot suggestion about non-persistent delay() not honoring signal — same rationale: the bounded wait makes the responsiveness gap not worth the complexity. Happy to revisit if there's a concrete user-facing scenario where 30s cancel lag is a problem.

When running in CI/CD pipelines or background daemon mode, transient API capacity errors (429/529) should not terminate long-running tasks after a fixed number of retries. This adds an environment-aware persistent retry mode that retries indefinitely for transient errors, with exponential backoff capped at 5 minutes and heartbeat keepalives every 30 seconds to prevent CI runner timeouts.

Add environment variable entries (QWEN_CODE_UNATTENDED_RETRY, QWEN_CODE_BG) to the settings reference, and a new "Persistent Retry Mode" section to the headless mode docs covering activation, behavior, and CI/CD usage examples.

…NDED_RETRY Remove QWEN_CODE_BG and CI=true as activation triggers for persistent retry. Having multiple env vars with identical behavior adds confusion, and silently activating infinite retry on CI=true is dangerous — a regular CI test hitting a 429 would hang forever instead of failing fast.

- Forward caller's abortSignal into retryWithBackoff in both baseLlmClient.ts and geminiChat.ts so persistent waits remain cancellable (wenshao) - Re-apply maxBackoff and capMs after jitter so delays strictly respect stated caps (Copilot) - Respect shouldRetryOnError in persistent mode so callers can force fast-fail even for transient 429/529 errors (Copilot) - Guard sleepWithHeartbeat against infinite loop when heartbeat interval is <= 0 via Math.max(1, ...) (Copilot) - Normalize isEnvTruthy with trim/toLowerCase for robust env var parsing across CI conventions (Copilot)

…beat zero-interval guard

Server-specified Retry-After values should only be limited by the absolute cap (capMs/6h), not the exponential backoff cap (maxBackoff/5min). Jitter is also skipped for Retry-After since the server already specified the exact wait time.

…ention Replace custom isEnvTruthy (trim + toLowerCase) with strict matching (val === 'true' || val === '1') to match parseBooleanEnvFlag used elsewhere in the codebase. Prevents inconsistent behavior where 'TRUE' or ' 1 ' would activate persistent retry here but not in telemetry or other env-driven features.

Cover three key behaviors: - Retry-After is NOT capped at maxBackoff (only at capMs) - Retry-After IS capped at persistentCapMs absolute limit - Retry-After delays have no jitter applied

The existing vi.mock for retry.js only exported retryWithBackoff. After adding isUnattendedMode to the retry module, baseLlmClient.ts imports it, causing all 10 generateJson tests to fail with 'No "isUnattendedMode" export is defined on the mock'.

Forward persistentMode and abortSignal to retryWithBackoff() in GeminiClient.generateContent(), matching the existing wiring in baseLlmClient.ts and geminiChat.ts.

Resolve conflict in docs/users/configuration/settings.md by keeping both QWEN_CODE_UNATTENDED_RETRY (this branch) and QWEN_CODE_PROFILE_STARTUP (main) rows. Address PR QwenLM#3080 review (wenshao 2026-04-19): add the missing heartbeatFn to the retryWithBackoff() call in GeminiClient.generateContent(), matching the stderr keepalive pattern already used in geminiChat.ts and baseLlmClient.ts. Without this, unattended retries on the non-streaming content path stay silent during long 429/529 outages, contradicting the heartbeat-based monitoring promised in the new docs.

zhangxy-zju · 2026-04-21T12:03:40Z

Validation Strategy & Results

After the merge + heartbeat-fix commit (3f6020f0), I ran a 4-layer validation. 86 assertions pass, 0 manual steps remaining. Scripts are self-contained — running all 4 layers on a clean checkout takes < 30s.

Layer 1 — Unit tests (56/56)

vitest run packages/core/src/utils/retry.test.ts — covers all branches of retry.ts: base retry, Qwen OAuth quota handling, isTransientCapacityError, isUnattendedMode strict matching, persistent mode core paths, Retry-After, multi-shape getErrorStatus.

Layer 2 — In-process integration (17/17)

Parent spawns a worker subprocess that imports the built dist/ artifact (not the TS source vitest compiles in-memory) and exercises 6 scenarios under real process.env.QWEN_CODE_UNATTENDED_RETRY. Catches issues that vi.stubEnv cannot.

Key evidence — T3 (persistent 429 breaks past maxAttempts=5):

attempts=12, heartbeatCount=29
heartbeatAttempts=[1, 2, 3,3,3, 4,4,4, 5,5,5, 6,6,6, ..., 11,11,11]

Layer 3 — fake provider e2e (8/8)

Local http.Server returning controlled 429/529 + Retry-After + real OpenAI SDK (maxRetries: 0 to disable SDK-internal retry). Validates the real SDK error shape flowing through retryWithBackoff, which mocked unit tests cannot do.

Key evidence:

S1: persistent mode succeeds via exponential backoff (1419ms, heartbeat fires)
S2: 529 survives maxAttempts=1, succeeds on 3rd call
S3: real RateLimitError → getErrorStatus extracts status=429 correctly
S4: AbortSignal cancels within 776ms despite server advertising Retry-After: 5

Layer 4 — SIGINT + streaming stderr e2e (5/5)

Previously considered manual-only. Turns out child.kill('SIGINT') is equivalent to terminal Ctrl+C, and stderr realtime-vs-buffered is observable via timestamped child.stderr.on('data'). So all 3 originally-manual checks (Ctrl+C responsiveness, stderr realtime, GHA-equivalent integration) are scripted.

Key evidence:

[ 1207ms] [qwen-code] Waiting for API capacity... attempt 2, retry in 1s   ← realtime stderr
[ 1218ms] SIGINT sent
[ 1218ms] [worker] SIGINT received, aborting...
[ 1258ms] [worker] EXIT error=Retry aborted by signal                      ← 40ms SIGINT latency, exit code 130

SIGINT latency 40ms — far below the design upper bound of heartbeatIntervalMs (400ms in test, 30s in production).

Pre-existing bug surfaced (NOT introduced by this PR, does NOT block merge)

The fake-provider e2e caught a bug in getRetryAfterDelayMs(): it only reads error.response.headers['retry-after'] (axios shape), but OpenAI SDK 5.x exposes headers at error.headers (a Web Headers instance) with no error.response at all. Unit tests missed this because they construct fake errors in the axios shape.

Observed: server sends Retry-After: 5, retry fires after ~100ms (exponential backoff fallback) instead of 5s.

Scope: affects all OpenAI-SDK-based providers (Qwen DashScope OpenAI-compatible mode, etc.). Pre-dates #3080 (introduced in a9d6965be pre-release commit). Since this PR's core promise (long tasks survive 429/529) is fulfilled by the exponential backoff fallback, fixing this is a separate concern — I'll open a follow-up issue.

One-shot reproducer

# Unit + typecheck
(cd packages/core && npx vitest run src/utils/retry.test.ts && npx tsc --noEmit)

# Then the three e2e scripts (happy to share them in a gist if useful):
node run_integration_test.mjs          # 17 assertions
node run_e2e_with_fake_provider.mjs    # 8 assertions (OpenAI SDK + fake server)
node run_signal_and_stream_test.mjs    # 5 assertions (SIGINT + stderr realtime)

zhangxy-zju · 2026-04-21T12:18:51Z

Follow-up: both SDK paths verified — Gemini path is worse

Validated the @google/genai path too with 7 new e2e assertions (same fake-provider harness as before, pointed at the Gemini SDK). Result: retry.ts works on both SDK paths, but Retry-After is honored on neither — and the Gemini path is worse than OpenAI.

SDK error shape comparison

SDK	`.status`	`.headers`	`.response`	Can we recover `retry-after`?
`openai@5.11.0` (`RateLimitError`)	429 ✓	`Headers` instance (`.get('retry-after')`)	`undefined`	Yes — patch `getRetryAfterDelayMs` to also read `error.headers?.get?.('retry-after')`
`@google/genai@1.30.0` (`ApiError`)	429 ✓	`undefined`	`undefined`	No — SDK drops all headers at the error boundary. Only `[stack, message, name, status]` survive.

// @google/genai ApiError probe output:
ERROR class: ApiError
e.status: 429                ← the only recoverable field
e.headers: undefined
e.response: undefined
all property names: [stack, message, name, status]

Impact for this PR

Neither finding blocks #3080 merge:

Core promise (429/529 survive in unattended mode) is fulfilled via exponential backoff on both SDK paths — isTransientCapacityError only needs status, which both SDKs expose.
Observed behavior: 3 × 429 with Retry-After: 5 from fake server completes in ~260ms (expected ~15s if honored) on both paths — degraded to exponential backoff, but the retry itself works and eventually succeeds.

Follow-up fix requires two things

OpenAI path — local fix in getRetryAfterDelayMs: try error.headers?.get?.('retry-after') and error.headers?.['retry-after'] before the existing error.response.headers[…] axios-shape path. Plus a real RateLimitError regression test case (current unit tests construct fake errors in the axios shape, which is why this wasn't caught).
Gemini path — upstream issue on googleapis/js-genai asking ApiError to preserve response headers (at least retry-after). No local fix possible until then.

Happy to open both as separate issues after this PR lands.

Total script coverage now: 93 assertions (56 unit + 17 in-process + 8 OpenAI e2e + 7 Gemini e2e + 5 SIGINT/stderr e2e), still 0 manual steps remaining.

wenshao

No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review

wenshao · 2026-04-21T14:00:03Z

Ran the full suite on this branch plus a small set of observational tests against the real timer to eyeball the heartbeat / backoff behavior. All green.

Tests

Suite	Result
`packages/core/src/utils/retry.test.ts`	56/56 ✓
`packages/core/src/core/baseLlmClient.test.ts`	18/18 ✓
`packages/core/src/core/geminiChat.test.ts`	52/52 ✓
`tsc --noEmit` (packages/core)	clean

Observational scenarios (real timers, `MANUAL_RETRY_OBSERVE=1`)

#	Scenario	Verified
1	429 persistent retry bypasses `maxAttempts`	`maxAttempts=2`, took 4 calls to succeed; 15 heartbeats fired; exponential backoff visible (~300ms → ~480ms → ~1090ms)
2	500 does not trigger persistent mode	still throws after `maxAttempts=3` — 3 calls total
3	`AbortSignal` cancels mid-sleep	throws `Retry aborted by signal`
4	`heartbeatIntervalMs=0` infinite-loop guard	`Math.max(1, …)` works, returns success
5	`shouldRetryOnError=false` vetoes persistent mode	single call, fast-fail

Non-blocking nits

The three-line heartbeatFn closure writing to process.stderr is duplicated verbatim in geminiChat.ts, client.ts, and baseLlmClient.ts. Worth extracting a defaultHeartbeatToStderr helper in retry.ts so callers can just pass a reference.
editor.ts / editor.test.ts contain unrelated formatting changes (indentation + line-wrap reflow) — probably prettier-on-save. Would be cleaner split into their own commit/PR.
The attempt = maxAttempts - 1 clamp at retry.ts:275-277 works and mirrors the reference implementation, so it's fine — a one-line comment explaining why the loop is kept alive that way would help future readers.

Code looks solid overall, functionality lines up with the PR description. 👍

wenshao · 2026-04-21T14:06:47Z

Since the PR body cites Claude Code's services/api/withRetry.ts as the reference, I went through both side-by-side. Posting the diff-of-capabilities here so future readers can tell what was ported faithfully vs. deliberately dropped.

Ported verbatim / structurally equivalent

Element	Reference	PR #3080
Single backoff cap	`PERSISTENT_MAX_BACKOFF_MS = 5 min`	same constant, same value
Absolute wait cap	`PERSISTENT_RESET_CAP_MS = 6 h`	`PERSISTENT_CAP_MS = 6 h`
Heartbeat interval	`HEARTBEAT_INTERVAL_MS = 30 s`	same constant, same value
Trigger set	429 + 529 only	same
Dual counter (`attempt` + `persistentAttempt`)	yes	yes
Attempt-clamp trick to keep the main loop alive	`if (attempt >= maxRetries) attempt = maxRetries`	`if (attempt >= maxAttempts) attempt = maxAttempts - 1`
Chunked sleep with heartbeat	inline while-loop	extracted into a `sleepWithHeartbeat` helper (cleaner)
`AbortSignal` mid-sleep cancellation	yes	yes
`Retry-After` respected, not clamped to `maxBackoff`	yes	yes
Explicit env opt-in, no implicit `CI=true` activation	`CLAUDE_CODE_UNATTENDED_RETRY`	`QWEN_CODE_UNATTENDED_RETRY`

Correctly dropped (not applicable to qwen-code)

Reference feature	Why it's not a gap here
`anthropic-ratelimit-unified-reset` header handling (one-shot wait until window reset)	Anthropic-specific header; Qwen/DashScope and OpenAI-compatible endpoints don't emit it
`x-should-retry` header	Anthropic-proprietary non-standard header (the reference's own comment: "Note this is not a standard header."), consumed only by `@anthropic-ai/sdk`. qwen-code's upstreams don't send it
`FOREGROUND_529_RETRY_SOURCES` / `QuerySource` taxonomy	qwen-code has no per-call source classification; concept not applicable
Fast-mode cooldown & short-vs-long retry-after branching	Reference-product-specific, no analog here
Subscriber-tier gating (`isClaudeAISubscriber` / `isEnterpriseSubscriber`)	No subscriber model in qwen-code
Consecutive-529 counter + primary→fallback model switch	No built-in primary/fallback model pair
Auth-refresh / credential-cache clearing (401, 403 token-revoked, Bedrock, Vertex, ECONNRESET)	qwen-code handles auth in each provider layer, not in the retry layer

One notable simplification

Heartbeat transport. The reference yields a structured { type: 'system', subtype: 'api_retry', ... } event via async generator, which a downstream SDK/UI can consume. This PR writes a plain text line to stderr: [qwen-code] Waiting for API capacity... attempt N, retry in Ns. That's sufficient for the stated CI/daemon target — any future SDK consumer would need to convert this to a structured event.

Bottom line

The port is faithful on the core mechanism. Everything that was cut falls into "reference-product/SDK-specific" or "architectural choice that doesn't exist in qwen-code" — I don't see a retry-correctness gap. 👍

…nts (#3080) * feat(retry): add persistent retry mode for unattended CI/CD environments When running in CI/CD pipelines or background daemon mode, transient API capacity errors (429/529) should not terminate long-running tasks after a fixed number of retries. This adds an environment-aware persistent retry mode that retries indefinitely for transient errors, with exponential backoff capped at 5 minutes and heartbeat keepalives every 30 seconds to prevent CI runner timeouts. * docs: add persistent retry mode documentation Add environment variable entries (QWEN_CODE_UNATTENDED_RETRY, QWEN_CODE_BG) to the settings reference, and a new "Persistent Retry Mode" section to the headless mode docs covering activation, behavior, and CI/CD usage examples. * refactor(retry): simplify to single explicit env var QWEN_CODE_UNATTENDED_RETRY Remove QWEN_CODE_BG and CI=true as activation triggers for persistent retry. Having multiple env vars with identical behavior adds confusion, and silently activating infinite retry on CI=true is dangerous — a regular CI test hitting a 429 would hang forever instead of failing fast. * fix(retry): address PR review feedback - Forward caller's abortSignal into retryWithBackoff in both baseLlmClient.ts and geminiChat.ts so persistent waits remain cancellable (wenshao) - Re-apply maxBackoff and capMs after jitter so delays strictly respect stated caps (Copilot) - Respect shouldRetryOnError in persistent mode so callers can force fast-fail even for transient 429/529 errors (Copilot) - Guard sleepWithHeartbeat against infinite loop when heartbeat interval is <= 0 via Math.max(1, ...) (Copilot) - Normalize isEnvTruthy with trim/toLowerCase for robust env var parsing across CI conventions (Copilot) * test(retry): add missing UT for shouldRetryOnError override and heartbeat zero-interval guard * fix(retry): do not cap Retry-After delays at maxBackoff Server-specified Retry-After values should only be limited by the absolute cap (capMs/6h), not the exponential backoff cap (maxBackoff/5min). Jitter is also skipped for Retry-After since the server already specified the exact wait time. * refactor(retry): align isUnattendedMode with project env parsing convention Replace custom isEnvTruthy (trim + toLowerCase) with strict matching (val === 'true' || val === '1') to match parseBooleanEnvFlag used elsewhere in the codebase. Prevents inconsistent behavior where 'TRUE' or ' 1 ' would activate persistent retry here but not in telemetry or other env-driven features. * test(retry): add Retry-After handling tests for persistent mode Cover three key behaviors: - Retry-After is NOT capped at maxBackoff (only at capMs) - Retry-After IS capped at persistentCapMs absolute limit - Retry-After delays have no jitter applied * fix(test): add isUnattendedMode to retry.js mock in baseLlmClient tests The existing vi.mock for retry.js only exported retryWithBackoff. After adding isUnattendedMode to the retry module, baseLlmClient.ts imports it, causing all 10 generateJson tests to fail with 'No "isUnattendedMode" export is defined on the mock'. * fix(retry): wire persistent retry mode into client.ts generateContent Forward persistentMode and abortSignal to retryWithBackoff() in GeminiClient.generateContent(), matching the existing wiring in baseLlmClient.ts and geminiChat.ts.

…nts (QwenLM#3080) * feat(retry): add persistent retry mode for unattended CI/CD environments When running in CI/CD pipelines or background daemon mode, transient API capacity errors (429/529) should not terminate long-running tasks after a fixed number of retries. This adds an environment-aware persistent retry mode that retries indefinitely for transient errors, with exponential backoff capped at 5 minutes and heartbeat keepalives every 30 seconds to prevent CI runner timeouts. * docs: add persistent retry mode documentation Add environment variable entries (QWEN_CODE_UNATTENDED_RETRY, QWEN_CODE_BG) to the settings reference, and a new "Persistent Retry Mode" section to the headless mode docs covering activation, behavior, and CI/CD usage examples. * refactor(retry): simplify to single explicit env var QWEN_CODE_UNATTENDED_RETRY Remove QWEN_CODE_BG and CI=true as activation triggers for persistent retry. Having multiple env vars with identical behavior adds confusion, and silently activating infinite retry on CI=true is dangerous — a regular CI test hitting a 429 would hang forever instead of failing fast. * fix(retry): address PR review feedback - Forward caller's abortSignal into retryWithBackoff in both baseLlmClient.ts and geminiChat.ts so persistent waits remain cancellable (wenshao) - Re-apply maxBackoff and capMs after jitter so delays strictly respect stated caps (Copilot) - Respect shouldRetryOnError in persistent mode so callers can force fast-fail even for transient 429/529 errors (Copilot) - Guard sleepWithHeartbeat against infinite loop when heartbeat interval is <= 0 via Math.max(1, ...) (Copilot) - Normalize isEnvTruthy with trim/toLowerCase for robust env var parsing across CI conventions (Copilot) * test(retry): add missing UT for shouldRetryOnError override and heartbeat zero-interval guard * fix(retry): do not cap Retry-After delays at maxBackoff Server-specified Retry-After values should only be limited by the absolute cap (capMs/6h), not the exponential backoff cap (maxBackoff/5min). Jitter is also skipped for Retry-After since the server already specified the exact wait time. * refactor(retry): align isUnattendedMode with project env parsing convention Replace custom isEnvTruthy (trim + toLowerCase) with strict matching (val === 'true' || val === '1') to match parseBooleanEnvFlag used elsewhere in the codebase. Prevents inconsistent behavior where 'TRUE' or ' 1 ' would activate persistent retry here but not in telemetry or other env-driven features. * test(retry): add Retry-After handling tests for persistent mode Cover three key behaviors: - Retry-After is NOT capped at maxBackoff (only at capMs) - Retry-After IS capped at persistentCapMs absolute limit - Retry-After delays have no jitter applied * fix(test): add isUnattendedMode to retry.js mock in baseLlmClient tests The existing vi.mock for retry.js only exported retryWithBackoff. After adding isUnattendedMode to the retry module, baseLlmClient.ts imports it, causing all 10 generateJson tests to fail with 'No "isUnattendedMode" export is defined on the mock'. * fix(retry): wire persistent retry mode into client.ts generateContent Forward persistentMode and abortSignal to retryWithBackoff() in GeminiClient.generateContent(), matching the existing wiring in baseLlmClient.ts and geminiChat.ts.

zhangxy-zju requested review from Copilot, tanzhenxin and wenshao April 10, 2026 04:52

Copilot started reviewing on behalf of zhangxy-zju April 10, 2026 04:53 View session

wenshao requested changes Apr 10, 2026

View reviewed changes

Comment thread packages/core/src/core/baseLlmClient.ts

Comment thread packages/core/src/core/geminiChat.ts

Copilot AI reviewed Apr 10, 2026

View reviewed changes

Comment thread packages/core/src/utils/retry.ts Outdated

Comment thread packages/core/src/utils/retry.ts Outdated

Comment thread packages/core/src/utils/retry.ts Outdated

Comment thread packages/core/src/utils/retry.ts Outdated

tanzhenxin added the type/feature-request New feature or enhancement request label Apr 10, 2026

zhangxy-zju requested review from Copilot and jifeng April 10, 2026 05:12

Copilot started reviewing on behalf of zhangxy-zju April 10, 2026 05:19 View session

zhangxy-zju requested review from Mingholy, pomelo-nwu, wenshao and yiliang114 April 10, 2026 05:22

Copilot AI reviewed Apr 10, 2026

View reviewed changes

Comment thread packages/core/src/utils/retry.ts Outdated

Comment thread packages/core/src/utils/retry.ts

Comment thread packages/core/src/utils/retry.ts

Comment thread packages/core/src/utils/retry.ts

wenshao previously requested changes Apr 12, 2026

View reviewed changes

wenshao reviewed Apr 12, 2026

View reviewed changes

github-actions Bot mentioned this pull request Apr 13, 2026

📊 AI CLI 工具社区动态日报 2026-04-13 gsscsd/big_model_radar#177

Open

zhangxy-zju added 8 commits April 13, 2026 10:36

docs: add persistent retry mode documentation

7c270b9

Add environment variable entries (QWEN_CODE_UNATTENDED_RETRY, QWEN_CODE_BG) to the settings reference, and a new "Persistent Retry Mode" section to the headless mode docs covering activation, behavior, and CI/CD usage examples.

test(retry): add missing UT for shouldRetryOnError override and heart…

7f5c60e

…beat zero-interval guard

test(retry): add Retry-After handling tests for persistent mode

e9569ee

Cover three key behaviors: - Retry-After is NOT capped at maxBackoff (only at capMs) - Retry-After IS capped at persistentCapMs absolute limit - Retry-After delays have no jitter applied

zhangxy-zju added 2 commits April 13, 2026 10:39

fix(retry): wire persistent retry mode into client.ts generateContent

051f367

Forward persistentMode and abortSignal to retryWithBackoff() in GeminiClient.generateContent(), matching the existing wiring in baseLlmClient.ts and geminiChat.ts.

zhangxy-zju force-pushed the feat/persistent-retry-mode branch from ec7659c to 051f367 Compare April 13, 2026 02:39

wenshao reviewed Apr 19, 2026

View reviewed changes

Comment thread packages/core/src/core/client.ts

wenshao approved these changes Apr 21, 2026

View reviewed changes

wenshao merged commit ebe364d into QwenLM:main Apr 21, 2026
13 checks passed

Conversation

zhangxy-zju commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

Core: packages/core/src/utils/retry.ts

Callers: geminiChat.ts, baseLlmClient.ts

Docs

Non-breaking

How to verify

Unit tests

Manual / integration

Uh oh!

github-actions Bot commented Apr 10, 2026

📋 Review Summary

🔍 General Feedback

🎯 Specific Feedback

🔴 Critical

🟡 High

🟢 Medium

🔵 Low

✅ Highlights

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

zhangxy-zju commented Apr 12, 2026

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wenshao Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

zhangxy-zju Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zhangxy-zju commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Validation Strategy & Results

Layer 1 — Unit tests (56/56)

Layer 2 — In-process integration (17/17)

Layer 3 — fake provider e2e (8/8)

Layer 4 — SIGINT + streaming stderr e2e (5/5)

Pre-existing bug surfaced (NOT introduced by this PR, does NOT block merge)

One-shot reproducer

Uh oh!

zhangxy-zju commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Follow-up: both SDK paths verified — Gemini path is worse

SDK error shape comparison

Impact for this PR

Follow-up fix requires two things

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

wenshao commented Apr 21, 2026

Tests

zhangxy-zju commented Apr 10, 2026 •

edited

Loading

Core: `packages/core/src/utils/retry.ts`

Callers: `geminiChat.ts`, `baseLlmClient.ts`

zhangxy-zju commented Apr 21, 2026 •

edited

Loading

zhangxy-zju commented Apr 21, 2026 •

edited

Loading

Observational scenarios (real timers, `MANUAL_RETRY_OBSERVE=1`)