fix(core): honor output language in side queries by Jerry2003826 · Pull Request #4519 · QwenLM/qwen-code

Jerry2003826 · 2026-05-25T21:43:23Z

What this PR does

Honors the configured output language for user-visible side queries without duplicating the language instruction in project-summary prompts.

Why it's needed

Some side-query results are shown directly to users, such as session titles, session recaps, tool-use summaries, suggestions, web-fetch summaries, generated subagent artifacts, arena summaries, and /insight reports. These should follow output-language.md. Internal structured side queries and summary prompts that already carry the language instruction should not receive duplicate language rules.

Reviewer Test Plan

How to verify

Review the changed runSideQuery call sites and confirm only user-visible side queries pass respectOutputLanguagePreference: true, while internal classifier/selector-style calls and project-summary prompts do not. Then run the targeted checks below.

npm run test:ci --workspace=packages/core -- src/permissions/classifier.test.ts src/services/sessionRecap.test.ts src/utils/sideQuery.test.ts
npm run test --workspace=packages/cli -- src/services/insight/generators/DataProcessor.test.ts
npx prettier --check packages/core/src/services/sessionRecap.test.ts packages/cli/src/services/insight/generators/DataProcessor.ts packages/cli/src/services/insight/generators/DataProcessor.test.ts
npx eslint packages/core/src/services/sessionRecap.test.ts packages/cli/src/services/insight/generators/DataProcessor.ts packages/cli/src/services/insight/generators/DataProcessor.test.ts
npm run typecheck --workspace=packages/core
git diff --check

Expected result: the core side-query tests pass, the insight generator tests pass, and the DataProcessor calls for insight-session-analysis and insight-qualitative-generate both include respectOutputLanguagePreference: true.

Evidence (Before & After)

Before: switching runSideQuery from unconditional output-language injection to opt-in could make /insight reports ignore output-language.md, and the session recap path lacked direct coverage that generateSessionRecap() wires the recap side query with output-language opt-in.

After: DataProcessor opts in for both user-visible /insight JSON side queries, DataProcessor.test.ts verifies both call boundaries, and sessionRecap.test.ts verifies generateSessionRecap() calls the recap side query with respectOutputLanguagePreference: true and extracts the tagged recap. This is prompt-construction behavior, so TUI screenshots are N/A.

Tested on

OS	Status
macOS	CI
Windows	tested locally
Linux	CI

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. Full local packages/cli typecheck is not listed as a passing local check because this checkout currently reports unrelated generated/package-link errors in serve/acp/channel/web-template areas; the changed files pass targeted tests and eslint, and CI runs the repository matrix.

Risk & Scope

Main risk or tradeoff: The opt-in policy must include all user-visible side-query callers without broadening to internal classifier/selector calls.
Not validated / out of scope: No TUI screenshot because the change is prompt construction and side-query routing, not a visible TUI state. No unrelated compression, provider, or UI behavior changes.
Breaking changes / migration notes: None expected.

Linked Issues

Fixes #4494

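Chinese description

What this PR does

Makes user-visible side queries honor output-language.md while avoiding duplicate injection of the language instruction into the project-summary prompt.

Why it's needed

Session titles, session recaps, tool-use summaries, follow-up suggestions, web-fetch summaries, subagent artifacts, arena summaries, and /insight reports are shown to users, so they should follow the output-language preference. Internal classifier/selector-style side queries should not be pulled into the opt-in.

Reviewer Test Plan

Reviewers can check that user-visible runSideQuery calls pass respectOutputLanguagePreference: true and that internal structured calls do not opt in. Verified locally on Windows: the combined core side-query/sessionRecap tests, the CLI DataProcessor tests, prettier/eslint on the changed files, the core typecheck, and git diff --check.

Evidence

Before: after the switch to opt-in, user-visible reports such as /insight could miss the output-language preference, and the outer session recap call lacked direct coverage. After: both DataProcessor /insight side queries opt in, with tests covering the call boundary; the generateSessionRecap() test confirms that the recap side query opts in and extracts the <recap> content. This change is prompt-construction logic, so TUI screenshots are not applicable.

Risk and scope

The main risk is opt-in coverage: it must include user-visible side queries without extending to internal classifier/selector calls. No public API or migration impact.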

wenshao

[Suggestion] Missed callers that produce user-visible text

Three additional runSideQuery callers produce user-visible output but were not updated with respectOutputLanguagePreference: true:

packages/core/src/tools/web-fetch.ts:154 — web content summaries shown to users
packages/core/src/agents/arena/ArenaManager.ts:1690 — agent approach summaries for side-by-side comparison
packages/core/src/utils/subagentGenerator.ts:129 — generated agent name/description visible to users

Consider adding the flag to these call sites for consistency with the four callers already updated.

[Suggestion] Test gaps in sideQuery.test.ts

The new tests cover the happy path (flag=true, valid file with content) but miss three realistic edge cases:

File does not exist (exercises the catch {} path at sideQuery.ts:157)
File is empty or whitespace-only (exercises content.trim() || undefined)
getOutputLanguageFilePath() returns undefined with respectOutputLanguagePreference: true (exercises the early return at sideQuery.ts:152)

— qwen3.7-max via Qwen Code /review
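The three edge cases above can be pictured with a small sketch. It assumes a hypothetical helper, resolvePreference, with an injected read function; neither name comes from sideQuery.ts, and only the content.trim() || undefined normalization is taken from the review.

```typescript
// Hypothetical stand-in for the preference lookup; not the code in sideQuery.ts.
type ReadFile = (filePath: string) => string;

export function resolvePreference(
  filePath: string | undefined,
  read: ReadFile,
): string | undefined {
  // No path configured: return early without touching the file system.
  if (!filePath) return undefined;
  try {
    // Empty or whitespace-only content collapses to undefined.
    return read(filePath).trim() || undefined;
  } catch {
    // A missing or unreadable file is treated as "no preference".
    return undefined;
  }
}

// Reader that always fails, standing in for a file that does not exist.
const missing: ReadFile = () => {
  throw new Error('ENOENT');
};

export const edgeCaseResults = {
  noPath: resolvePreference(undefined, () => 'unused'),
  missingFile: resolvePreference('/tmp/none.md', missing),
  blankFile: resolvePreference('/tmp/blank.md', () => ' \n\t '),
  validFile: resolvePreference('/tmp/ok.md', () => '  Respond in Chinese.\n'),
};
```

In this sketch only validFile yields a string; the other three resolve to undefined, which is the condition under which the system instruction should stay unchanged.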

wenshao · 2026-05-26T01:06:32Z

+  config: Config,
+): Promise<string | undefined> {
+  const filePath = config.getOutputLanguageFilePath?.();
+  if (!filePath) return undefined;


[Suggestion] readOutputLanguagePreference calls readFile on every runSideQuery invocation with no caching. The file path is static for the Config instance lifetime, and generateToolUseSummary fires on every completed tool batch — so the same small file is read from disk dozens of times per session.

Consider memoizing the result (keyed by filePath):

    let cachedPreference: { filePath: string; value: string | undefined } | undefined;

    async function readOutputLanguagePreference(
      config: Config,
    ): Promise<string | undefined> {
      const filePath = config.getOutputLanguageFilePath?.();
      if (!filePath) return undefined;
      if (cachedPreference?.filePath === filePath) return cachedPreference.value;
      try {
        const content = await readFile(filePath, 'utf8');
        const value = content.trim() || undefined;
        cachedPreference = { filePath, value };
        return value;
      } catch {
        cachedPreference = { filePath, value: undefined };
        return undefined;
      }
    }

— qwen3.7-max via Qwen Code /review

Handled in 84eec74. readOutputLanguagePreference now memoizes by file path, and the new regression mutates the same output-language.md after the first side query to verify subsequent calls reuse the cached preference instead of re-reading.

wenshao · 2026-05-26T01:06:32Z

    } as unknown as Config;
  });

+  function writeOutputLanguageFile(content: string): string {


[Suggestion] writeOutputLanguageFile is duplicated verbatim across 5 test files (sideQuery.test.ts, sessionRecap.test.ts, sessionTitle.test.ts, toolUseSummary.test.ts, suggestionGenerator.test.ts). Only the temp directory prefix string differs.

Consider extracting to a shared test utility:

    // packages/core/src/test-utils/outputLanguage.ts
    export function writeOutputLanguageFile(content: string): string {
      const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'qwen-output-lang-'));
      const file = path.join(dir, 'output-language.md');
      fs.writeFileSync(file, content, 'utf8');
      return file;
    }

— qwen3.7-max via Qwen Code /review

I left this one out of the focused bug fix for now. Extracting the duplicated test helper would touch several unrelated test files, while this PR's behavioral changes stay limited to side-query output-language plumbing and its direct coverage.

wenshao · 2026-05-26T01:06:32Z

        { role: 'user', parts: [{ text: RECAP_USER_PROMPT }] },
      ],
      systemInstruction: RECAP_SYSTEM_PROMPT,
+      respectOutputLanguagePreference: true,


[Suggestion] RECAP_SYSTEM_PROMPT (line 20) and TITLE_SYSTEM_PROMPT (line 26 in sessionTitle.ts) both contain the instruction "Match the dominant language of the conversation (English or Chinese)." When respectOutputLanguagePreference: true appends a conflicting directive like "You MUST always respond in Chinese", the model receives two contradictory language instructions. The outcome is model-dependent and non-deterministic.

Consider either removing the "Match the dominant language" sentence from these prompts, or appending an explicit override (e.g., "This overrides any prior language instructions in this prompt.").

— qwen3.7-max via Qwen Code /review

Handled in 84eec74. The output-language append now includes an explicit override sentence, so a configured output-language preference supersedes earlier language-selection guidance in prompts like recap/title without editing each caller prompt separately.
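A minimal sketch of how such an append with an override sentence might look. The two sentence strings are quoted from the sideQuery.ts diff shown later in this thread; the helper name, the header constant's name, the way the pieces are joined, and the sample prompt are assumptions made for illustration.

```typescript
// Sentence values quoted from the diff in this thread; the first constant's
// name is assumed, since the diff only shows its value.
const OUTPUT_LANGUAGE_PREFERENCE_HEADER =
  'User output language preference from output-language.md:';
const OUTPUT_LANGUAGE_PREFERENCE_OVERRIDE =
  'This preference overrides any earlier language-selection rule in this system instruction.';

// Hypothetical helper: the joining format is an assumption of this sketch.
export function appendOutputLanguagePreference(
  systemInstruction: string,
  preference: string | undefined,
  respectOutputLanguagePreference = false,
): string {
  // Opt-in only: callers that do not set the flag keep their prompt unchanged.
  if (!respectOutputLanguagePreference || !preference) return systemInstruction;
  return [
    systemInstruction,
    '',
    OUTPUT_LANGUAGE_PREFERENCE_HEADER,
    preference,
    OUTPUT_LANGUAGE_PREFERENCE_OVERRIDE,
  ].join('\n');
}

// Illustrative prompt; only the "Match the dominant language" sentence is
// quoted from the review comment.
const RECAP_LIKE_PROMPT =
  'Summarize the session. Match the dominant language of the conversation (English or Chinese).';

export const withPreference = appendOutputLanguagePreference(
  RECAP_LIKE_PROMPT,
  'You MUST always respond in Chinese.',
  true,
);
export const withoutFlag = appendOutputLanguagePreference(
  RECAP_LIKE_PROMPT,
  'You MUST always respond in Chinese.',
);
```

Because the override sentence comes last, the earlier "match the dominant language" rule stays in the prompt but is explicitly superseded, which avoids editing each caller's prompt.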

pomelo-nwu

Hi @Jerry2003826, thank you for your continued contributions — 9 PRs in a short time is impressive! 🎉

As we review your changes, we'd like to ask you to update each PR to follow the latest PR template on the main branch. The most important section is the Reviewer Test Plan, which significantly accelerates the review and merge process.

Specifically, for each PR please include:

How to verify — clear reproduction steps so a reviewer can confirm the fix/feature
Evidence (Before & After) — use the tmux-real-user-testing skill (or manual tmux capture) to show before/after screenshots of the TUI behavior. Side-by-side evidence makes it much faster for maintainers to validate and merge
Tested on — fill in the OS table (macOS / Windows / Linux)

PRs with a complete Reviewer Test Plan are prioritized for review — without it, review may be delayed.

You can see the full template at: .github/pull_request_template.md

Thanks again for your effort — looking forward to getting these merged! 🚀

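Chinese description

Hi @Jerry2003826, thank you for your continued contributions. Submitting 9 PRs in such a short time is very efficient! 🎉

During review, we would like you to update each PR description to follow the latest PR template on the main branch. The most important part is the Reviewer Test Plan, which significantly speeds up review and merging.

Specifically, please add the following to each PR:

How to verify: clear reproduction steps so a reviewer can confirm the effect of the fix or feature
Evidence (Before & After): use the tmux-real-user-testing skill (or a manual tmux capture) to show before/after TUI screenshots; a side-by-side comparison lets maintainers validate and merge faster
Tested on: fill in the OS test table (macOS / Windows / Linux)

PRs with a complete Reviewer Test Plan are prioritized for review; a missing plan may delay review.

The full template is at: .github/pull_request_template.md

Thanks again for your effort. We look forward to merging these PRs soon! 🚀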

— Qwen Code

Jerry2003826 · 2026-05-26T02:17:53Z

Updated in a22d233. This addresses the review comments by adding respectOutputLanguagePreference: true to the remaining user-visible runSideQuery callers:

web-fetch summaries
Arena approach summaries
generated subagent artifacts

I also added coverage for the requested sideQuery edge cases: missing output-language file, blank output-language file, and getOutputLanguageFilePath() returning undefined while the flag is enabled. The Arena summary test was adjusted to return mock summaries based on each prompt's agentId instead of concurrent call order, because output-language file reads make the two summary calls race in a realistic way.
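The ordering problem behind the Arena test change can be sketched as follows. Everything here is hypothetical: the prompt format ("agentId: <id>"), the helper names, and the delays are stand-ins, and the actual ArenaManager test may be structured differently.

```typescript
type Summarize = (prompt: string) => Promise<string>;

// Order-based mock: hands out canned values in the order calls arrive.
function orderBasedMock(values: string[]): Summarize {
  const queue = [...values];
  return async () => queue.shift() ?? 'exhausted';
}

// Id-based mock: reads the agentId out of the prompt (format assumed here).
function idBasedMock(byId: Record<string, string>): Summarize {
  return async (prompt) => {
    const id = /agentId: (\S+)/.exec(prompt)?.[1] ?? '';
    return byId[id] ?? 'unknown agent';
  };
}

// Stand-in for a caller that awaits something (such as a file read) before it
// reaches the mocked model call, so arrival order is not start order.
async function summarizeAgent(
  id: string,
  preDelayMs: number,
  summarize: Summarize,
): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, preDelayMs));
  return summarize(`Summarize this approach.\nagentId: ${id}`);
}

export async function runBoth(): Promise<{ ordered: string[]; keyed: string[] }> {
  const byOrder = orderBasedMock(['summary A', 'summary B']);
  const byId = idBasedMock({ 'agent-a': 'summary A', 'agent-b': 'summary B' });
  // agent-a is started first but reaches the mock last.
  const ordered = await Promise.all([
    summarizeAgent('agent-a', 20, byOrder),
    summarizeAgent('agent-b', 0, byOrder),
  ]);
  const keyed = await Promise.all([
    summarizeAgent('agent-a', 20, byId),
    summarizeAgent('agent-b', 0, byId),
  ]);
  return { ordered, keyed };
}
```

With the order-based mock, agent-a receives the value intended for agent-b whenever it arrives second; keying the mock on the prompt content removes that dependency on timing.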

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/tools/web-fetch.test.ts -t "output language"
npm run test --workspace=packages/core -- src/utils/subagentGenerator.test.ts -t "output language"
npm run test --workspace=packages/core -- src/agents/arena/ArenaManager.test.ts -t "routes all approach summaries"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language path|output language file is blank"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts src/services/sessionTitle.test.ts src/services/sessionRecap.test.ts src/services/toolUseSummary.test.ts src/followup/suggestionGenerator.test.ts src/tools/web-fetch.test.ts src/agents/arena/ArenaManager.test.ts src/utils/subagentGenerator.test.ts
npx prettier --check packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/agents/arena/ArenaManager.test.ts packages/core/src/utils/subagentGenerator.ts packages/core/src/utils/subagentGenerator.test.ts
npx eslint packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/agents/arena/ArenaManager.test.ts packages/core/src/utils/subagentGenerator.ts packages/core/src/utils/subagentGenerator.test.ts
npm run lint --workspace=packages/core
npm run typecheck --workspace=packages/core
git diff --check

Note: when I initially ran several vitest commands in parallel locally on Windows, two runs hit the known coverage/.tmp race after tests had passed. I reran those checks serially, and they passed.

Jerry2003826 · 2026-05-26T03:10:23Z

Added aaf6883 for the remaining sideQuery.test.ts missing-file edge case.

I also rechecked the three user-visible call sites from the review against the current branch head; they already pass respectOutputLanguagePreference: true:

packages/core/src/tools/web-fetch.ts
packages/core/src/agents/arena/ArenaManager.ts
packages/core/src/utils/subagentGenerator.ts

Validation:

npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/sideQuery.test.ts -t "output language"
npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/sideQuery.test.ts
npx eslint packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/utils/subagentGenerator.ts
npx prettier --check packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/utils/subagentGenerator.ts
npm run lint --workspace=@qwen-code/qwen-code-core
npm run typecheck --workspace=@qwen-code/qwen-code-core
git diff --check

wenshao · 2026-05-26T03:56:15Z

+      );
+    });
+
+    it('leaves JSON system instruction unchanged when output language path is missing', async () => {


[Suggestion] JSON-mode negative test coverage is asymmetric with text mode. Text mode has three negative tests (path undefined, file missing, file blank), but JSON mode has only one (file missing). Also: the test name says "path is missing" when the file at the path is missing — the text-mode tests correctly distinguish these cases. And the bare path.join(os.tmpdir(), 'qwen-side-query-missing.md') is not parallel-safe; the text-mode "file is missing" test at line ~515 correctly uses fs.mkdtempSync for isolation.

Consider adding two JSON-mode tests mirroring text mode, and using mkdtempSync for the existing one:

    it('leaves JSON system instruction unchanged when output language path is undefined', async () => {
      vi.mocked(mockBaseLlmClient.generateJson).mockResolvedValue({ ok: true });
      // mockConfig already returns undefined for getOutputLanguageFilePath
      await runSideQuery<{ ok: boolean }>(mockConfig, { /* ... */ respectOutputLanguagePreference: true });
      expect(callArg.systemInstruction).toBe('custom JSON side query prompt');
    });

    it('leaves JSON system instruction unchanged when output language file is missing', async () => {
      vi.mocked(mockBaseLlmClient.generateJson).mockResolvedValue({ ok: true });
      const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'qwen-side-query-missing-'));
      vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
        path.join(dir, 'output-language.md'),
      );
      // ...
    });

    it('leaves JSON system instruction unchanged when output language file is blank', async () => {
      vi.mocked(mockBaseLlmClient.generateJson).mockResolvedValue({ ok: true });
      vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
        writeOutputLanguageFile(' \n\t '),
      );
      // ...
    });

— qwen3.7-max via Qwen Code /review

Added in e0c28da. JSON mode now mirrors the text-mode negative coverage: undefined output-language path, missing file using an isolated temp dir, and blank file all leave the original JSON systemInstruction unchanged.

wenshao · 2026-05-26T03:56:15Z

+      const callArg = mockGenerateContent.mock.calls[0]?.[0];
+      expect(callArg.systemInstruction).toContain(
+        'You MUST always respond in Chinese.',
+      );


[Suggestion] This assertion only checks that the language preference is present, but doesn't verify the original systemInstruction is preserved. Compare with existing tests in toolUseSummary.test.ts and sessionRecap.test.ts which assert both:

    // Established pattern (e.g. toolUseSummary.test.ts:322-326):
    expect(options.systemInstruction).toContain(TOOL_USE_SUMMARY_SYSTEM_PROMPT);
    expect(options.systemInstruction).toContain('You MUST always answer in Chinese.');

If appendSystemInstructionText had a bug that replaced rather than appended, this test would still pass. Add:

Suggested change

    );
    expect(callArg.systemInstruction).toContain(
      'Extract and summarize the requested information',
    );
    expect(callArg.systemInstruction).toContain(
      'You MUST always answer in Chinese.',
    );

— qwen3.7-max via Qwen Code /review

Added in e0c28da. The web-fetch output-language test now asserts both that the original web-fetch system instruction (Extract and summarize the requested information...) is preserved and that the configured language preference is appended.

wenshao · 2026-05-26T03:56:15Z

+    const generateJsonCall = (mockClient.generateJson as Mock).mock.calls[0];
+    const callParams = generateJsonCall[0];
+    expect(callParams.systemInstruction).toContain(
+      'You MUST always respond in Chinese.',


[Suggestion] Same as web-fetch.test.ts — this only asserts the language preference is present. The established pattern in toolUseSummary.test.ts and sessionRecap.test.ts also verifies the original systemInstruction is preserved:

Suggested change

      'You MUST always respond in Chinese.',
    expect(callParams.systemInstruction).toContain(
      'You are an elite AI agent architect',
    );
    expect(callParams.systemInstruction).toContain(
      'You MUST always answer in Chinese.',
    );

— qwen3.7-max via Qwen Code /review

Added in e0c28da. The subagent output-language test now follows the same pattern: it verifies the original You are an elite AI agent architect system prompt remains present and the configured language preference is appended.

Jerry2003826 · 2026-05-26T09:00:55Z

Updated in e0c28da for the latest review suggestions.

Changes made:

Mirrored text-mode negative coverage for JSON side queries: undefined output-language path, missing file, and blank file all preserve the original JSON systemInstruction.
Strengthened the web-fetch output-language test to assert the original system instruction is preserved while the language directive is appended.
Strengthened the subagent output-language test with the same preserve-and-append assertion pattern.

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts src/tools/web-fetch.test.ts src/utils/subagentGenerator.test.ts -t "output language|JSON system instruction"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts src/tools/web-fetch.test.ts src/utils/subagentGenerator.test.ts
npx prettier --check packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/utils/subagentGenerator.test.ts
npx eslint packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/utils/subagentGenerator.test.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check

wenshao

R3: All R2 suggestions addressed. No new issues found. ✅ — qwen3.7-max via Qwen Code /review

wenshao

⚠️ Downgraded from Approve to Comment: CI failing (Post Coverage Comment, Lint, CodeQL, Test, Classify PR).

R4 incremental review (delta since R3: sideQuery.ts memoization + override sentence, sideQuery.test.ts new tests). The memoization and override sentence address the remaining R1 Suggestions. One new Suggestion below regarding cache invalidation for long-running sessions.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-26T15:32:03Z

+  'User output language preference from output-language.md:';
+const OUTPUT_LANGUAGE_PREFERENCE_OVERRIDE =
+  'This preference overrides any earlier language-selection rule in this system instruction.';
+const outputLanguagePreferenceCache = new Map<


[Suggestion] outputLanguagePreferenceCache is a module-level Map with no invalidation path. When a user runs /language mid-session, languageCommand.ts rewrites the file and refreshes the main conversation's system instruction — but this cache still holds the old resolved Promise. All 7 respectOutputLanguagePreference: true consumers (session title, recap, tool-use summary, follow-up suggestions, web-fetch, subagent generation, Arena) will serve the stale language rule for the remainder of the process.

The new memoization test (sideQuery.test.ts) correctly asserts this behavior, but it codifies a regression for daemon mode or any long-lived session where the user changes language.

Suggested fix: Export a clearOutputLanguagePreferenceCache() function and call it from the /language command handler (alongside the existing config.refreshHierarchicalMemory() call), or scope the cache to the Config instance lifetime rather than the module:

    export function clearOutputLanguagePreferenceCache(): void {
      outputLanguagePreferenceCache.clear();
    }

Then in languageCommand.ts:

    import { clearOutputLanguagePreferenceCache } from '../utils/sideQuery.js';
    // ...after updateOutputLanguageFile():
    clearOutputLanguagePreferenceCache();

— qwen3.7-max via Qwen Code /review

Fixed in 28aee11. Side-query output-language caching now invalidates on file signature changes, and /language output ... explicitly clears the cached path after rewriting output-language.md. Added regressions for both the side-query refresh and the live /language command cache clear.
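A minimal sketch of signature-based invalidation, assuming the mtimeMs:size signature mentioned elsewhere in this thread. The FileIo interface, createPreferenceReader, and the in-memory fake file are hypothetical names introduced so the example runs without touching disk; the real code in sideQuery.ts presumably calls stat and readFile directly.

```typescript
// Injected file access; a hypothetical seam used only for this sketch.
interface FileIo {
  stat(filePath: string): Promise<{ mtimeMs: number; size: number }>;
  read(filePath: string): Promise<string>;
}

interface CacheEntry {
  signature: string;
  preference: Promise<string | undefined>;
}

export function createPreferenceReader(io: FileIo) {
  const cache = new Map<string, CacheEntry>();
  return async function readPreference(
    filePath: string,
  ): Promise<string | undefined> {
    let signature: string;
    try {
      const { mtimeMs, size } = await io.stat(filePath);
      signature = `${mtimeMs}:${size}`;
    } catch {
      cache.delete(filePath); // file is gone: forget any cached value
      return undefined;
    }
    const cached = cache.get(filePath);
    // Same signature: reuse the cached (possibly still in-flight) read.
    if (cached?.signature === signature) return cached.preference;
    const preference = io
      .read(filePath)
      .then((content) => content.trim() || undefined)
      .catch(() => undefined);
    cache.set(filePath, { signature, preference });
    return preference;
  };
}

// In-memory stand-in for output-language.md that counts how often it is read.
export function createFakeFile(initial: string) {
  const file = { content: initial, mtimeMs: 1, reads: 0 };
  const io: FileIo = {
    stat: async () => ({ mtimeMs: file.mtimeMs, size: file.content.length }),
    read: async () => {
      file.reads += 1;
      return file.content;
    },
  };
  const write = (content: string): void => {
    file.content = content;
    file.mtimeMs += 1; // a rewrite bumps the modification time
  };
  return { file, io, write };
}
```

Repeated calls with an unchanged file cost one stat each and no extra read; a rewrite changes the signature, so the next call re-reads even if nothing cleared the cache explicitly.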

wenshao

R5 incremental review (delta since R4: cache invalidation via stat-based signatures + clearOutputLanguagePreferenceCache export). CI 10/10 all pass. 75 tests pass.

Missed caller (out of diff scope): packages/cli/src/ui/commands/summaryCommand.ts:107 calls runSideQuery to generate a user-visible project summary (displayed in TUI and saved to .qwen/PROJECT_SUMMARY.md) but does not set respectOutputLanguagePreference: true. If the user configures output language to Chinese, the project summary would still be generated in English. Consider adding respectOutputLanguagePreference: true to match the PR's stated goal.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-26T21:36:26Z


    // Update the rule file with the resolved language
    updateOutputLanguageFile(settingValue);
+    clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.());


[Suggestion] Path mismatch between write target and cache invalidation target.

updateOutputLanguageFile(settingValue) writes to the global path (~/.qwen/output-language.md, hardcoded in languageUtils.ts), but clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.()) clears the cache for the config's path, which may be a project-level path (<cwd>/.qwen/output-language.md). When a project-level file exists, the write modifies the global file while side queries re-read the unchanged project-level file — the user's /language command has no visible effect on side-query outputs.

Additionally, when config?.getOutputLanguageFilePath?.() returns undefined (e.g., config lacks the method), clearOutputLanguagePreferenceCache(undefined) fires the clear-all branch, wiping every cached path — including entries from other sessions in daemon mode.

Suggested change

-    clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.());
+    clearOutputLanguagePreferenceCache();

Using the no-arg version clears all entries as a belt-and-suspenders approach. The stat-based invalidation in readOutputLanguagePreference handles correctness automatically on the next call, so the only cost is one extra readFile per session.

— qwen3.7-max via Qwen Code /review

Addressed in the current branch. /language output now calls clearOutputLanguagePreferenceCache() without a path after updateOutputLanguageFile(), so project/global path differences cannot leave a stale side-query language preference cached.

wenshao · 2026-05-26T21:36:26Z

+  }
+>();
+
+export function clearOutputLanguagePreferenceCache(filePath?: string): void {


[Suggestion] This new public API has two branches (specific-path deletion and clear-all) but no behavioral unit tests. It is only verified via vi.fn() mock in languageCommand.test.ts, which confirms the function was called but not that it works.

Consider adding tests in sideQuery.test.ts:

Specific-path deletion: populate cache via runSideQuery, call clearOutputLanguagePreferenceCache(path), verify next call re-reads the file.

Clear-all: populate cache with entries, call clearOutputLanguagePreferenceCache(), verify all entries are removed.

Also consider adding a JSDoc note that this function is a performance hint — stat-based invalidation in readOutputLanguagePreference handles correctness automatically.

— qwen3.7-max via Qwen Code /review

Covered in the current branch. sideQuery.test.ts now exercises both cache-clear branches: specific-path deletion preserves other cached entries, and no-argument clearing refreshes all cached preferences. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.

wenshao · 2026-05-27T05:55:01Z

            ],
          },
        ],
+        respectOutputLanguagePreference: true,


[Suggestion] Output language rule is appended twice to this side query. The main session's systemInstruction (extracted at line 99-105 via getGenerationConfig().systemInstruction) already embeds output-language.md content as part of userMemory (loaded through getExtensionContextFilePaths() → loadServerHierarchicalMemory()). Setting respectOutputLanguagePreference: true causes applyOutputLanguagePreference in sideQuery.ts to read the same file and append the language rule a second time.

The model receives the instruction duplicated in every summary generation — redundant tokens and a confusing artifact for anyone debugging system instructions.

Consider either:

Removing respectOutputLanguagePreference: true here since the carried-over chatSystemInstruction already contains the rule, or

Stripping the output-language section from chatSystemInstruction before passing it, and keeping the explicit flag.

— qwen3.7-max via Qwen Code /review

Addressed in the current branch. summaryCommand no longer sets respectOutputLanguagePreference: true for the project-summary side query because it carries over the main chat system instruction, which already includes the configured output-language rule through user memory.

wenshao · 2026-05-27T05:55:01Z

+      expect(secondCallArg.systemInstruction).not.toContain('Chinese');
+    });
+
+    it('clears one cached output language preference by file path', async () => {


[Suggestion] This test creates only a single file and verifies its cache was cleared, but does not include a negative assertion — it never checks that a different cached file's entry survived the selective clear. If clearOutputLanguagePreferenceCache(file) were accidentally changed to call .clear() instead of .delete(filePath), this test would still pass.

Consider creating a second file, caching it via runSideQuery, then after clearOutputLanguagePreferenceCache(firstFile), running a side query for the second file and asserting it still contains the original (stale) preference — proving its cache was not disturbed. This would make the two new tests properly complementary (one proves selective clear, the other proves blanket clear).

— qwen3.7-max via Qwen Code /review

Covered in the current branch. The specific-path cache test now populates two files, clears only the first path, and asserts the second cached preference survives instead of being globally cleared. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.
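The two-file scenario can be reduced to a small sketch. The cache here stores plain strings and the function names are hypothetical; it only illustrates why the second entry must be checked after a selective clear.

```typescript
// Simplified stand-in for the module-level cache (real entries hold more).
const preferenceCache = new Map<string, string>();

export function clearPreferenceCache(filePath?: string): void {
  if (filePath === undefined) {
    preferenceCache.clear(); // clear-all branch
    return;
  }
  preferenceCache.delete(filePath); // specific-path branch
}

export function demoSelectiveThenFullClear(): {
  afterSelective: string[];
  afterFull: string[];
} {
  preferenceCache.set('/a/output-language.md', 'Chinese');
  preferenceCache.set('/b/output-language.md', 'French');

  // Clearing only the first path must leave the second entry in place.
  clearPreferenceCache('/a/output-language.md');
  const afterSelective = Array.from(preferenceCache.keys());

  // The no-argument call removes everything that is left.
  clearPreferenceCache();
  const afterFull = Array.from(preferenceCache.keys());

  return { afterSelective, afterFull };
}
```

If the specific-path branch were changed to call clear(), afterSelective would be empty, which is exactly the regression the negative assertion is meant to catch.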

wenshao · 2026-05-27T16:31:40Z

+      signature,
+      preference: readFile(filePath, 'utf8')
+        .then((content) => content.trim() || undefined)
+        .catch(() => undefined),


[Suggestion] Transient readFile failure permanently caches undefined until the file's mtime or size changes.

When stat() succeeds but readFile() fails (EMFILE, permission flap, transient IO error), .catch(() => undefined) stores a Promise<undefined> as cached.preference with the current signature. On subsequent calls, stat() returns the same mtimeMs:size, the cache hits, and the language preference is silently lost.

Suggested change

-        .catch(() => undefined),
+        .catch(() => {
+          outputLanguagePreferenceCache.delete(filePath);
+          return undefined;
+        }),

— qwen3.7-max via Qwen Code /review

Addressed in the current branch. readOutputLanguagePreference() now deletes the cache entry when readFile() fails for the current signature, so transient read failures do not permanently cache undefined.
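A sketch of the evict-on-failure idea in isolation. It leaves out the stat signature to keep the example short, and readCached and flakyReader are hypothetical names rather than functions from sideQuery.ts.

```typescript
type Reader = (filePath: string) => Promise<string>;

// Simplified cache: keyed by path only, without the stat signature.
const cache = new Map<string, Promise<string | undefined>>();

export function readCached(
  filePath: string,
  read: Reader,
): Promise<string | undefined> {
  const cached = cache.get(filePath);
  if (cached) return cached;
  const pending = read(filePath)
    .then((content) => content.trim() || undefined)
    .catch(() => {
      // Evict on failure so the next call retries instead of reusing undefined.
      cache.delete(filePath);
      return undefined;
    });
  cache.set(filePath, pending);
  return pending;
}

// Reader that fails once (as a transient EMFILE might) and then succeeds.
export function flakyReader(content: string): Reader {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls === 1) throw new Error('EMFILE: too many open files');
    return content;
  };
}
```

With this shape, the first call resolves to undefined but leaves no cache entry behind, so a second call against the same unchanged file performs a fresh read.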

wenshao · 2026-05-27T16:35:58Z

Verification Report — PR #4519

Reviewer: wenshao
Branch: codex/fix-side-query-output-language → main
Environment: macOS Darwin 25.4.0, Node v22.17.0

1. Build & Type Check

Check	Result
`npm install`	✅ Pass
`tsc --noEmit` (packages/core)	✅ Pass (zero errors)
`tsc --noEmit` (packages/cli)	✅ Pass (zero errors)
`eslint` (changed files)	✅ Pass
`prettier --check` (changed files)	✅ Pass

2. Unit Tests

PR-related tests (all pass):

Test File	Result
`sideQuery.test.ts`	✅ 29/29 passed
`languageCommand.test.ts`	✅ 48/48 passed
`ArenaManager.test.ts`	✅ 22/22 passed
`sessionRecap.test.ts`	✅ new test passed
`sessionTitle.test.ts`	✅ all passed
`toolUseSummary.test.ts`	✅ all passed
`web-fetch.test.ts`	✅ all passed
`suggestionGenerator.test.ts`	✅ all passed
`subagentGenerator.test.ts`	✅ all passed
Total across 9 files	✅ 149/149 passed

Full core suite: 9438 passed, 1 failed, 3 skipped (344 files). The single failure (anthropicContentGenerator.test.ts: treats unset baseURL as Anthropic-native) is pre-existing on main — verified by checking out the main version of that test file and confirming the same failure. Unrelated to this PR.

3. Code Review

Architecture (sideQuery.ts)

The design is clean and well-layered:

readOutputLanguagePreference(config) — reads output-language.md with a stat-signature-based cache (mtimeMs:size). Cache is a Map<filePath, {signature, preference}>. The preference value is a Promise<string | undefined> — concurrent callers reuse the same in-flight read.
appendSystemInstructionText(systemInstruction, text) — handles all 4 systemInstruction type variants: string, Part[], Content (with .parts), and single Part. Correctly creates a new object/array without mutating the original.
applyOutputLanguagePreference(config, systemInstruction, flag) — wires the above together; early-returns when respectOutputLanguagePreference is falsy or no preference exists.
clearOutputLanguagePreferenceCache(filePath?) — selective or full cache invalidation, called from languageCommand.ts when the user changes their output language.
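The four systemInstruction shapes listed above might be handled along these lines. Part and Content are simplified local stand-ins for the SDK types, appendText is a hypothetical name, and the choices of separator and of promoting a single Part to an array are assumptions of this sketch.

```typescript
// Simplified local stand-ins for the SDK's Part and Content types.
interface Part {
  text?: string;
}
interface Content {
  role?: string;
  parts?: Part[];
}
type SystemInstruction = string | Part | Part[] | Content;

export function appendText(
  systemInstruction: SystemInstruction,
  text: string,
): SystemInstruction {
  // string: concatenate with a blank line between the two blocks.
  if (typeof systemInstruction === 'string') {
    return `${systemInstruction}\n\n${text}`;
  }
  // Part[]: return a new array so the caller's array is not mutated.
  if (Array.isArray(systemInstruction)) {
    return [...systemInstruction, { text }];
  }
  // Content with .parts: copy the object and its parts array.
  if ('parts' in systemInstruction && Array.isArray(systemInstruction.parts)) {
    return {
      ...systemInstruction,
      parts: [...systemInstruction.parts, { text }],
    };
  }
  // Single Part: promote to a two-element Part[].
  return [systemInstruction as Part, { text }];
}
```

Returning fresh arrays and objects matters because callers often share prompt constants between invocations; appending in place would leak the language rule into later calls that did not opt in.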

Callers — Correct Opt-in

7 side-query callers correctly opt in with respectOutputLanguagePreference: true:

sessionRecap, sessionTitle, toolUseSummary — session management UI text
suggestionGenerator — follow-up suggestions
web-fetch — web content summaries
subagentGenerator — subagent spec generation
ArenaManager — arena approach summaries

summaryCommand.ts correctly does NOT opt in — its comment explains the system instruction already includes the output-language rule via user memory. This avoids the duplicate-language-instruction problem stated in the PR.

Test Coverage

The new tests thoroughly cover:

Append to string / Part[] / Content with .parts / single Part
Undefined / missing / blank output-language file → no-op
Cache hit / miss / invalidation (selective and full)
File content change detection via stat signature
Each caller tested for output-language presence in the prompt

4. Issues Found

Severity	Issue
None	No issues found in this PR's changes.

5. Summary

This is a well-designed, well-tested fix. The opt-in respectOutputLanguagePreference flag is the right approach: it avoids changing default behavior while allowing user-visible side queries to honor the configured output language. The stat-based caching is efficient and the cache invalidation is properly wired into the language change flow. Test coverage is comprehensive, with 149 passing tests across the 9 affected test files.

Recommendation: ✅ Approve.

Verified locally on 2026-05-28

Jerry2003826 · 2026-05-27T16:46:33Z

Updated in b86309b for the transient read failure cache issue.

If stat() succeeds but readFile() fails, the cached entry is now removed before returning undefined, so a later side query with the same file signature retries the preference read instead of permanently treating the preference as absent.
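
A rough sketch of the eviction behavior described above, under stated assumptions: the stat-derived signature and the file read are injected as parameters so the example needs no real file, and the names readPreferenceCached and preferenceCache are hypothetical rather than taken from sideQuery.ts.

```typescript
type CacheEntry = { signature: string; preference: Promise<string | undefined> };

// Module-level cache keyed by file path (illustrative name).
const preferenceCache = new Map<string, CacheEntry>();

// `signature` stands in for the stat-derived "mtimeMs:size" string and `read`
// for the readFile call; both are injected so the sketch stays self-contained.
function readPreferenceCached(
  filePath: string,
  signature: string,
  read: () => Promise<string>,
): Promise<string | undefined> {
  const hit = preferenceCache.get(filePath);
  // Same signature: reuse the in-flight or already settled read.
  if (hit && hit.signature === signature) return hit.preference;

  const preference: Promise<string | undefined> = read()
    .then((text) => text.trim() || undefined) // blank file counts as no preference
    .catch(() => {
      // Evict only our own entry so a newer read for the same path survives,
      // and so the next call with this signature retries the read.
      if (preferenceCache.get(filePath)?.preference === preference) {
        preferenceCache.delete(filePath);
      }
      return undefined;
    });
  preferenceCache.set(filePath, { signature, preference });
  return preference;
}
```

Without the eviction, the rejected read would stay cached under an unchanged signature and every later side query would treat the preference as absent until the file was modified.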

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts
npx prettier --check packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts
npx eslint packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts
npm run typecheck --workspace=packages/core

Jerry2003826 · 2026-05-28T00:21:13Z

CI note for the latest run: the macOS failure appears unrelated to this PR's side-query change.

The failed test is in packages/cli/src/ui/components/messages/AskUserQuestionDialog.test.tsx:

<AskUserQuestionDialog /> > single-select interaction > keeps bare k/j in custom input while Ctrl+P/N still navigates options
AssertionError: expected rendered output to contain '? 4.'

This PR only changes packages/core/src/utils/sideQuery.ts and sideQuery.test.ts. The targeted side-query checks and the failed CLI dialog test both pass locally on Windows:

npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts
npm run test --workspace=packages/cli -- src/ui/components/messages/AskUserQuestionDialog.test.tsx -t "keeps bare k/j"

I tried to rerun the failed workflow, but GitHub rejected it because I do not have repository admin rights. Could a maintainer rerun the failed macOS job?

BZ-D

LGTM. The respectOutputLanguagePreference plumbing is thorough — correctly wired into session title, recap, tool-use summary, suggestion generator, web fetch, subagent generator, and arena manager. The stat-based cache with mtime+size signature is a good balance of freshness vs performance, and clearOutputLanguagePreferenceCache gives the /language command an explicit flush path. Edge cases (missing file, blank content, transient read failures) are all covered.

BZ-D · 2026-06-01T07:51:08Z

This PR has merge conflicts with main — please rebase or merge main to resolve before merging.

Jerry2003826 · 2026-06-01T08:42:52Z

Merged current origin/main in d286835 and resolved the side-query/config conflicts.

Resolution notes:

Kept this PR's opt-in respectOutputLanguagePreference path and cache/clear behavior instead of applying the preference to every side query unconditionally.
Folded main's output-language tests into the opt-in behavior.
Removed the duplicate getOutputLanguageFilePath() implementation introduced by the merge.

Validation:

npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts
npx prettier --check packages/core/src/config/config.ts packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts
npx eslint packages/core/src/config/config.ts packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts
npm run typecheck --workspace=packages/core

wenshao · 2026-06-01T14:42:32Z

  const promptId = options.promptId ?? buildDefaultPromptId(options.purpose);
  const requestConfig = applyThinkingDefault(options.config);
-  const systemInstruction = appendSystemInstruction(
+  const systemInstruction = await applyOutputLanguagePreference(


[Suggestion] Missed caller: packages/cli/src/acp-integration/session/rewrite/LlmRewriter.ts:117 calls runSideQuery without respectOutputLanguagePreference: true. Before this PR, getOutputLanguageInstruction(config) was called unconditionally for ALL side queries, so LlmRewriter received the language rule. After the opt-in gate, it silently loses language enforcement. LlmRewriter produces user-visible ACP progress updates — a user who configured Chinese output will see Chinese titles, recaps, summaries, and subagent specs, but ACP rewrites will come back in whatever language the model picks.

Suggested change

const systemInstruction = await applyOutputLanguagePreference(

const result = await runSideQuery(this.config, {
  purpose: 'acp-rewrite',
  model,
  maxAttempts: 1,
  systemInstruction: this.prompt,
  respectOutputLanguagePreference: true,

— qwen-latest-series-invite-beta-v38 via Qwen Code /review

Covered in the current branch. LlmRewriter now calls runSideQuery with respectOutputLanguagePreference: true, and LlmRewriter.test.ts asserts that option for user-visible rewrites. Verified with npm run test --workspace=packages/cli -- src/acp-integration/session/rewrite/LlmRewriter.test.ts.

wenshao · 2026-06-01T14:42:33Z

 describe('runSideQuery', () => {
  let mockBaseLlmClient: BaseLlmClient;
  let mockConfig: Config;
  let abortController: AbortController;


[Suggestion] No negative test for the opt-in gate. Every test in this file sets respectOutputLanguagePreference: true. There is no test verifying that when the flag is omitted (or false) AND a valid output-language.md file exists, the system instruction is passed through unchanged. This is the central behavioral change of the PR — old code always appended, new code is opt-in — yet no test covers the false/omitted path.

A regression that re-introduces unconditional language injection would go undetected.

Suggested change

let abortController: AbortController;

beforeEach(() => {
  clearOutputLanguagePreferenceCache();
  abortController = new AbortController();
  mockBaseLlmClient = {
    generateJson: vi.fn(),
    generateText: vi.fn(),
  } as unknown as BaseLlmClient;
  mockConfig = {
    getBaseLlmClient: vi.fn(() => mockBaseLlmClient),
    getModel: vi.fn(() => 'qwen3-coder-plus'),
    getFastModel: vi.fn(() => 'fast-model'),
    getOutputLanguageFilePath: vi.fn(),
  } as unknown as Config;
});

Add a test like:

it('does not append language when respectOutputLanguagePreference is omitted', async () => {
  mockTextResult('ok');
  vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
    writeOutputLanguageFile('You MUST always respond in Chinese.'),
  );
  await runSideQuery(mockConfig, {
    purpose: 'p',
    contents: [{ role: 'user', parts: [{ text: 'q' }] }],
    abortSignal: abortController.signal,
    systemInstruction: 'Summarize the tool batch.',
  });
  expect(
    vi.mocked(mockBaseLlmClient.generateText).mock.calls[0][0].systemInstruction,
  ).toBe('Summarize the tool batch.');
});

— qwen-latest-series-invite-beta-v38 via Qwen Code /review

Covered in the current branch. sideQuery.test.ts includes does not append output language preference when opt-in is omitted, which writes a valid output-language.md and verifies the system instruction is unchanged without respectOutputLanguagePreference. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.

wenshao · 2026-06-01T14:42:33Z

-  if (systemInstruction === undefined) return outputLanguageInstruction;
+  text: string,
+): string | Part | Part[] | Content {
+  if (!systemInstruction) {


[Suggestion] appendSystemInstructionText handles five systemInstruction variants (string, Part, Part[], Content, undefined), but the Part, Part[], and Content branches are untested: every test in this PR passes a string or undefined. If a future caller passes a non-string systemInstruction with respectOutputLanguagePreference: true, these three branches (Part[] append, Content.parts append, single-Part wrapping) have no regression protection.

Consider adding at least one test per branch, e.g.:

it('appends to Part[] system instructions', async () => {
  mockTextResult('ok');
  vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
    writeOutputLanguageFile('Respond in Chinese.'),
  );
  await runSideQuery(mockConfig, {
    purpose: 'p',
    contents: [{ role: 'user', parts: [{ text: 'q' }] }],
    abortSignal: abortController.signal,
    systemInstruction: [{ text: 'existing prompt' }],
    respectOutputLanguagePreference: true,
  });
  const parts = vi.mocked(mockBaseLlmClient.generateText).mock.calls[0][0].systemInstruction;
  expect(parts).toEqual([
    { text: 'existing prompt' },
    expect.objectContaining({ text: expect.stringContaining('Respond in Chinese.') }),
  ]);
});

— qwen-latest-series-invite-beta-v38 via Qwen Code /review

Covered in the current branch. sideQuery.test.ts now covers Part[], Content, and single-Part system instructions with respectOutputLanguagePreference: true, verifying the appended preference is preserved in the correct shape. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

DragonnZhang

Review Summary

Verdict: No new findings. This PR has been reviewed exhaustively (8+ prior review rounds, 23 inline comments, 9 issue comments). After reading all 24 changed files in full and cross-referencing every existing comment, I found no high-confidence issues that are not already tracked.

What was checked

sideQuery.ts core logic: respectOutputLanguagePreference opt-in gate, readOutputLanguagePreference caching with stat-based invalidation, appendSystemInstructionText for all 5 systemInstruction variants, clearOutputLanguagePreferenceCache selective and full clear, isJsonOptions type guard, concurrent-call safety via shared Promise deduplication. Implementation is correct.
All 11 caller sites: sessionTitle, sessionRecap, toolUseSummary, suggestionGenerator, web-fetch, subagentGenerator, ArenaManager, DataProcessor (both analyzeSession and generateQualitativeInsights), LlmRewriter, summaryCommand (correctly skipped with explanatory comment). Each user-visible caller opts in; internal/classifier callers do not.
Test coverage: ~512 new lines of tests covering positive opt-in for text and JSON modes, negative paths (missing file, blank file, undefined path), Part[]/Content/single-Part system instruction variants, cache refresh on file change, selective and full cache clear, transient read failure recovery, and opt-in gate negative test. Coverage is thorough.
languageCommand.ts: clearOutputLanguagePreferenceCache() called after updateOutputLanguageFile() with clear-all (no path arg) to handle global-vs-project path differences. Correct.
config.ts: getOutputLanguageFilePath relocated for grouping; no behavioral change.
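
To illustrate why the clear-all call matters when the writer and the reader may disagree on the file path, here is a small sketch. The cache shape is simplified to plain strings, and both paths and the helper name clearPreferenceCache are hypothetical.

```typescript
// Simplified cache: file path -> cached preference text (the real entries differ).
const preferenceCache = new Map<string, string>();

// Hypothetical paths for a global and a project-scoped preference file.
const GLOBAL_PATH = '/home/user/.qwen/output-language.md';
const PROJECT_PATH = '/work/project/.qwen/output-language.md';

// With a path, drop one entry; with no argument, drop everything.
function clearPreferenceCache(filePath?: string): void {
  if (filePath === undefined) {
    preferenceCache.clear();
  } else {
    preferenceCache.delete(filePath);
  }
}

preferenceCache.set(GLOBAL_PATH, 'French');
preferenceCache.set(PROJECT_PATH, 'French');

// Selective clear of the path that was written leaves the other entry behind.
clearPreferenceCache(GLOBAL_PATH);
const staleEntrySurvives = preferenceCache.has(PROJECT_PATH); // true

// Clear-all removes both, whichever path the writer and reader disagree on.
clearPreferenceCache();
const entriesAfterClearAll = preferenceCache.size; // 0
```

The cost of clearing everything is one extra stat and read per cached path on the next side query, which seems a reasonable trade for not having to reason about which path was written.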

CI status

CI is currently failing (Lint, CodeQL, Test, Post Coverage Comment, Classify PR). Some failures appear environment-related (the macOS AskUserQuestionDialog test failure mentioned by the author). Approve downgraded to Comment per presubmit policy.

Open items from prior reviewers

Several open inline comments remain valid and tracked:

Path mismatch between updateOutputLanguageFile (global) and getOutputLanguageFilePath (project-scoped) — mitigated by clear-all strategy
sessionRecap.test.ts reduced from end-to-end to builder-only testing
summaryCommand.ts double-injection concern — addressed by inline comment explaining the carried system instruction already includes the language rule

— qwen-code via Qwen Code /review

Jerry2003826 · 2026-06-10T00:09:48Z

Updated this branch against current origin/main in a57ac5d81 and re-verified the insight output-language path.

The /insight regression mentioned in the review is covered on the current branch:

DataProcessor.ts passes respectOutputLanguagePreference: true for both insight-session-analysis and insight-qualitative-generate.
DataProcessor.test.ts has regression coverage for both side-query call sites.

Local validation on Windows:

npm run test --workspace=packages/cli -- src/services/insight/generators/DataProcessor.test.ts -t "output language" - 2 passed
npm run test --workspace=packages/cli -- src/services/insight/generators/DataProcessor.test.ts - 47 passed
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language" - 18 passed
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts - 36 passed
npm run build --workspace=packages/core - passed
npm run typecheck --workspace=packages/core - passed
npx eslint --max-warnings 0 --no-warn-ignored packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts packages/cli/src/services/insight/generators/DataProcessor.ts packages/cli/src/services/insight/generators/DataProcessor.test.ts packages/core/src/config/config.ts packages/cli/src/ui/commands/summaryCommand.ts - passed
npx prettier --check packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts packages/cli/src/services/insight/generators/DataProcessor.ts packages/cli/src/services/insight/generators/DataProcessor.test.ts packages/core/src/config/config.ts packages/cli/src/ui/commands/summaryCommand.ts - passed

Note: full packages/cli typecheck now gets past core/acp/channel package references after building those packages, but still stops on @qwen-code/web-templates because npm run build --workspace=packages/web-templates fails locally on this Windows path with a space (C:\Users\Jiarui Li\...). The failure happens inside packages/web-templates/build.mjs before this PR's files are typechecked and is unrelated to the side-query output-language change.

DragonnZhang

Incremental Review (R2) — delta since R1

HEAD: 8730d9f06579 (two new commits: merge from origin/main + test: normalize dev launcher paths on Windows)
CI: 13/13 checks pass.

Delta reviewed

scripts/tests/dev.test.js — Adds normalizePath helper that converts Windows backslashes to forward slashes before substring assertions. Correct fix: String(filePath).endsWith('node_modules/tsx/dist/cli.mjs') would fail on native Windows paths where filePath contains backslashes. The refactor to destructure spawnMock.mock.calls[0] and assert each field individually is clearer than the prior monolithic toHaveBeenCalledWith.
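
The normalization described above can be sketched as follows. This is an assumed shape for the helper, not a copy of the one in dev.test.js, and the sample path is made up for illustration.

```typescript
// Hypothetical version of the helper: swap backslashes for forward slashes.
function normalizePath(filePath: unknown): string {
  return String(filePath).replace(/\\/g, '/');
}

// A made-up native Windows path of the kind the spawn mock would receive.
const windowsPath = 'C:\\repo\\node_modules\\tsx\\dist\\cli.mjs';
const suffix = 'node_modules/tsx/dist/cli.mjs';

const rawMatches = windowsPath.endsWith(suffix); // false: separators differ
const normalizedMatches = normalizePath(windowsPath).endsWith(suffix); // true
```

POSIX paths pass through unchanged, so the same assertion works on all three CI platforms.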

Verdict

No high-confidence findings in the delta. The Windows path normalization is a clean, well-scoped test fix.

The six stale inline comments from prior reviewers (IDs 3300572305, 3301123073, 3307003468, 3308736776, 3312297210, 3337943627) are all addressed in the current code:

sessionRecap.ts language override — the appended text includes an explicit override sentence.
JSON negative coverage — mirrored text-mode negative tests now present.
languageCommand path mismatch — clearOutputLanguagePreferenceCache() (no-arg) correctly clears the entire map.
summaryCommand double injection — correctly omits respectOutputLanguagePreference: true since the carried-over system instruction already embeds the rule.
Transient readFile caching — the .catch handler now deletes the cache entry on read failure.
sessionRecap test refactor — the generateSessionRecap end-to-end test is retained with runSideQuery mocked; buildSessionRecapSideQueryOptions gets separate unit coverage.

— qwen3-coder via Qwen Code /review

DragonnZhang

Review: APPROVE

Scope: 30 files, +1151/-101 — adds opt-in respectOutputLanguagePreference to runSideQuery for user-visible side queries (session title, recap, tool-use summary, suggestions, web-fetch summaries, subagent artifacts, arena summaries, /insight reports, LLM rewriter). Includes stat-based file cache with invalidation on language change. Bundles two small fixes (cronScheduler creation-minute guard, chatCompression hard-threshold formula) from upstream main merges.

CI: All checks pass (Lint, Test x3 platforms, CodeQL).

Findings: No high-confidence bugs, security issues, or critical logic errors identified.

Analysis

Side-query opt-in coverage: Correctly applied to all 10 user-visible call sites (sessionTitle, sessionRecap, toolUseSummary, suggestionGenerator, web-fetch, subagentGenerator, ArenaManager, LlmRewriter, DataProcessor x2). Intentionally omitted from internal classifiers (permissions/classifier, relevanceSelector, forget, nextSpeakerChecker, chatCompression) and from summaryCommand (which already carries the language rule through user memory).
Caching: Stat-based signature (mtimeMs:size) with module-level Map keyed by file path. Race-safe: concurrent reads share the same Promise; readFile failures clean up only the current entry. clearOutputLanguagePreferenceCache() called on language change in languageCommand, clearing all entries to handle global/project path differences.
appendSystemInstructionText: Correctly handles all system instruction shapes (string, Part[], Content with a parts array, and the single-Part fallback).
Session recap refactor: Extracted buildSessionRecapSideQueryOptions for testability without behavioral change.
Bundled fixes from main merges: cronScheduler lastFiredAt initialization and chatCompression hard threshold formula — both sound and already validated by their own PRs.

wenshao · 2026-06-14T17:49:49Z

@Jerry2003826 heads up — this PR currently has merge conflicts with main and can't be merged as-is. Could you merge main in (or rebase) and resolve them when you get a chance?

Conflicting files:

packages/cli/src/ui/commands/languageCommand.ts
packages/cli/src/ui/commands/languageCommand.test.ts
packages/core/src/config/config.ts
packages/core/src/services/sessionRecap.ts

The rest merges cleanly. Thanks!


Merge upstream/main into the side-query output language branch and reconcile four conflicting files:

- languageCommand: use writeOutputLanguageAndRegisterPath from main while keeping clearOutputLanguagePreferenceCache on language change
- languageCommand.test: align mocks with config-bound path registration
- config: keep get/setOutputLanguageFilePath near getUserMemory and remove the duplicate getter
- sessionRecap: keep buildSessionRecapSideQueryOptions opt-in path with upstream debug logging

Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>

wenshao · 2026-06-14T18:37:38Z

Verification: honor output language in user-visible side queries

Verdict: PASS for the stated feature — verified end-to-end against a real model. One material caveat: this PR also bundles an undisclosed second change (DeepSeek prompt-cache work) that contradicts the PR body's "No unrelated … provider … changes." That's a merge-hygiene call for you, not a defect in the output-language feature.

I built the PR in an isolated worktree (real npm ci + build, no symlinked node_modules) and drove the real CLI — headless and the interactive TUI under tmux — against a real model (DeepSeek deepseek-chat) through a logging proxy that records exactly what reaches the model. Evidence below is what the model actually received and what the TUI actually rendered.

Claim (my read of the diff): runSideQuery gains an opt-in respectOutputLanguagePreference. The old unconditional output-language injection is removed; now only user-visible side queries opt in, and the injected block is User output language preference from output-language.md:\n<preference>\nThis preference overrides any earlier language-selection rule…. A new stat-signature cache (mtimeMs:size) avoids re-reading the file each call.
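
As a reading aid, the opt-in gate and the injected block in that claim can be sketched as below. This is a string-only approximation with hypothetical names (applyPreference, SideQueryOptionsSketch); the header line is the one quoted above, and the override sentence is kept truncated exactly as quoted because its full wording is not shown here.

```typescript
// Header as quoted in the claim above.
const HEADER = 'User output language preference from output-language.md:';
// Truncated in the source; kept truncated here rather than guessed.
const OVERRIDE_NOTE = 'This preference overrides any earlier language-selection rule…';

interface SideQueryOptionsSketch {
  systemInstruction?: string;
  respectOutputLanguagePreference?: boolean;
}

// Hypothetical, string-only version of the opt-in gate.
function applyPreference(
  options: SideQueryOptionsSketch,
  preference: string | undefined,
): string | undefined {
  const trimmed = preference?.trim();
  // Internal callers omit the flag, and a missing or blank file is a no-op.
  if (!options.respectOutputLanguagePreference || !trimmed) {
    return options.systemInstruction;
  }
  const block = `${HEADER}\n${trimmed}\n${OVERRIDE_NOTE}`;
  return options.systemInstruction
    ? `${options.systemInstruction}\n\n${block}`
    : block;
}
```

A classifier-style caller would pass only systemInstruction and get it back untouched, while a user-visible caller such as the web-fetch summary would add respectOutputLanguagePreference: true and receive the appended block.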

Method: worktree .qwen/tmp/review-pr-4519 (head eeee993, fresh merge-base 75e8f259); isolated HOME with ~/.qwen/output-language.md set to French; OPENAI_BASE_URL → proxy → real DeepSeek. Requests captured; side queries identified by their system instruction.

Steps

✅ User-visible side query injects (headless) — web_fetch https://example.com. The summary side query (Extract and summarize the requested information…) carried the exact injected block:
```
User output language preference from output-language.md:
# Output language preference: French
...
## Rule
You MUST always respond in **French** ...
```
Model answered in French: "La page example.com indique que ce domaine est réservé par l'IANA…"
✅ Session-title side query injects (TUI, tmux) — after a French exchange, /rename --auto fired the Generate a concise, sentence-case title side query with the injected block; the TUI footer showed a French title: Session renamed to "Définition d'un arbre binaire de recherche". The session title also fires automatically via chatRecordingService after the first exchange (also injected).
✅ Tool-use-summary side query injects (TUI) — running echo bonjour-from-shell produced the Write a short summary label… side query (injected) and the TUI rendered the label in French: ● Exécuté echo bonjour-from-shell.
✅ Internal / main calls are NOT injected — the main-agent requests and the internal memory-extraction subagent carried no injected header. Code audit confirms the split: 9 user-visible callers opt in (sessionTitle, sessionRecap, toolUseSummary, suggestionGenerator, web-fetch, subagentGenerator, ArenaManager, DataProcessor ×2, LlmRewriter); the 5 internal runSideQuery callers (classifier, nextSpeakerChecker, chatCompressionService, relevanceSelector, forget) do not.
🔍 Probe — stat-cache invalidation — I edited output-language.md to German directly on disk (no /language, so clearOutputLanguagePreferenceCache is not called — only the mtimeMs:size signature can catch it). The next session-title side query's injected header switched to # Output language preference: German. The cache invalidation works. ✅
🔍 Probe — session-title obedience — with German injected (+ the "overrides any earlier language-selection rule" line), the produced title came back English ("Binary search tree explanation"), not German. session-title's own system prompt hardcodes "Match the dominant language of the conversation (English or Chinese)", which competes with the injected override — and here the hardcoded set won. The web-fetch and tool-use summaries (no competing rule) did obey French. So injection is reliable; effectiveness depends on whether the side query's own prompt fights it.
🔍 Probe — workstream-B scope — on a non-DeepSeek endpoint the 58 tools go out in registration order (sorted? False). The PR's sortToolsForCacheStableRequest is gated on isDeepSeekHostname, so on real api.deepseek.com it would reorder them. I could not exercise that path at runtime (it requires redirecting api.deepseek.com, which I didn't do); confirmed by code + the gated baseline. DeepSeek responses do carry prompt_cache_hit_tokens/prompt_cache_miss_tokens (observed via the proxy), which converter.ts now reads.

Sample (live TUI, French throughout):

> In one sentence, what is a binary search tree?
✦ Un arbre binaire de recherche est une structure de données où chaque nœud possède au plus deux enfants…
● Session renamed to "Définition d'un arbre binaire de recherche"
> Run the shell command: echo bonjour-from-shell — then tell me what it printed.
  ✓  Shell echo bonjour-from-shell → bonjour-from-shell
● Exécuté echo bonjour-from-shell
✦ La commande a affiché : bonjour-from-shell.

Findings

The output-language feature works as intended. These are for your merge decision:

⚠️ This PR is two unrelated changes in one. Its own non-merge commits split by author into (a) output language (640ab41f… "honor output language in side queries", JerryLee) and (b) DeepSeek prompt-cache diagnostics (a0eb7a4f… "deepseek cache diagnostics", Jerry2003826) — the latter touching openaiContentGenerator/converter.ts (cache-token fields), pipeline.ts (sortToolsForCacheStableRequest + provider-labeled diagnostics), runtimeDiagnostics.ts (+128), and 949 lines of docs/superpowers/…/deepseek-cache-diagnostics design docs. None of (b) is on origin/main, so it's genuinely introduced here. The PR body says "No unrelated compression, provider, or UI behavior changes", but (b) reorders the tools sent to DeepSeek and changes cache-token accounting — both provider behavior changes. Recommend splitting (b) into its own PR, or at least updating the description so the provider change is reviewed on its own merits.
⚠️ The injected "override" doesn't reliably win for session-title. Because sessionTitle.ts's system prompt hardcodes "Match the dominant language of the conversation (English or Chinese)", a configured language outside that set (German, in my run) yielded an English title despite the override line. Other user-visible side queries with no competing language rule obeyed correctly. If session titles in the user's language matter, that prompt's hardcoded rule likely needs to defer to the injected preference.
The stat-signature cache is keyed on mtimeMs:size. Two same-size edits within the same mtime tick would collide and serve stale text; /language sidesteps this by calling clearOutputLanguagePreferenceCache, but a direct/programmatic edit relies solely on the signature. Low risk, worth a comment.
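
The collision case in the last finding can be shown with a few lines. The StatLike interface is a minimal stand-in for the two fs.Stats fields involved, and the two snapshots are invented values, not measurements from my run.

```typescript
// Minimal stand-in for the two fs.Stats fields the signature uses.
interface StatLike {
  mtimeMs: number;
  size: number;
}

// Signature format taken from the review: "mtimeMs:size".
function statSignature(stat: StatLike): string {
  return `${stat.mtimeMs}:${stat.size}`;
}

// Two hypothetical snapshots of one file whose content changed from "French"
// to "German" (both 6 bytes) within a single timestamp tick.
const beforeEdit: StatLike = { mtimeMs: 1765000000000, size: 6 };
const afterEdit: StatLike = { mtimeMs: 1765000000000, size: 6 };

// Identical signatures mean the cache cannot tell the two versions apart.
const collides = statSignature(beforeEdit) === statSignature(afterEdit); // true
```

Any edit that changes either the length or the modification time produces a new signature, which is why the German edit in the probe above was picked up.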

Not runtime-exercised: 6 of the 9 user-visible callers (verified by code audit, same runSideQuery mechanism as the 3 I drove); the DeepSeek-only tool-sort firing (gated to api.deepseek.com, not redirected). Per the verify method I did not run the PR's test suite — this is runtime observation, not CI.


wenshao

No review findings. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review

wenshao · 2026-06-14T22:24:02Z

✅ Local Linux verification — author's commands, a real-execution harness, and a live tmux side-query session

Verified this PR three ways on Linux (the table marks macOS/Linux as CI-only), including the live tmux Before/After evidence requested above. Posting as a merge reference.

Setup

Branch codex/fix-side-query-output-language @ dc3cd09 (fresh git worktree + npm install + npm run build), CLI v0.18.0
Platform 🐧 Linux / Node v22.22.2

1) Author's Reviewer Test Plan reproduced — all green

Check	Result
core `sideQuery` + `sessionRecap` + `classifier`	61 passed
all 8 changed-caller core test files (sideQuery, sessionTitle, sessionRecap, toolUseSummary, suggestionGenerator, web-fetch, ArenaManager, subagentGenerator)	157 passed
CLI `insight/generators/DataProcessor.test.ts`	47 passed
`prettier --check` (key changed files)	✅ clean
`eslint` (key changed files)	✅ clean
`tsc --noEmit` (core)	✅ no errors
`git diff --check`	✅ clean

My earlier review's three missed user-visible callers now all opt in: web-fetch.ts:166, ArenaManager.ts:1702, subagentGenerator.ts:134 — and the requested sideQuery edge-case tests (missing file / blank file / undefined path) are present and pass.

2) Real-execution harness against the compiled `runSideQuery` — 10/10

Imported the built runSideQuery from dist and captured the systemInstruction actually handed to the LLM client:

1. opt-in TRUE:  base instruction preserved + output-language.md content injected
                 + the new "overrides any earlier language-selection rule" line     PASS
2. NO flag (internal/classifier query): systemInstruction UNCHANGED (not injected)  PASS
3. opt-in + output-language file MISSING: graceful, systemInstruction unchanged     PASS
4. caching: stat-signature picks up a rewrite (FR → DE on next call)                PASS
5. clearOutputLanguagePreferenceCache exported                                       PASS
==== 10 passed, 0 failed ====

This independently confirms, against the compiled bundle, the core of the fix:

Opt-in only — the output-language preference is appended only when respectOutputLanguagePreference: true; internal structured side queries (classifier/selector) are left untouched (no duplicate / no corruption of structured output).
The new "…overrides any earlier language-selection rule…" line is appended, which is what lets opt-in callers win without conflicting with prompts that already carry a language hint.
Graceful when the file is missing, and the stat-based cache correctly re-reads after the file changes.

3) Live `tmux` session (Linux) — a user-visible side query honors the language end-to-end

Temporarily set the output language to French (global output-language.md backed up and restored afterward), then triggered a user-visible opt-in side query — the web_fetch summary (web-fetch.ts:166, runs on the main model):

✓ WebFetch  Fetching content from https://example.com … prompt: "Summarize the entire page content in one sentence."
✦ The result is in French (likely because my output language is set to French). …
✦ Voici le résumé en une phrase :
  `example.com` est un domaine réservé par l'IANA pour être utilisé comme exemple dans la documentation …

The model itself notes “The result is in French … because my output language is set to French.” — i.e. the web-fetch summary side query honored output-language.md. A plain conversational turn likewise returned French (“Trois avantages des tests unitaires : …”), confirming the plumbing end-to-end on Linux.

Note: the fast-model-based side queries (follow-up suggestions, session title) didn't fire in my environment because the configured fast model (gpt-5.4) isn't reachable here — an environment limitation, not a PR issue. I used web_fetch, which runs on the main model, to exercise the same opt-in path.

Verdict

LGTM — recommend merge. The change cleanly converts side-query output-language injection from unconditional to opt-in: user-visible callers (title, recap, tool-use summary, suggestions, web-fetch, subagent artifacts, arena summaries, /insight, ACP rewriter) opt in and keep honoring output-language.md, while internal classifier/selector queries and the self-carrying project-summary prompt correctly do not — verified against real shipped code (tests), the compiled bundle (harness), and a live session on Linux. My earlier review suggestions are all addressed.

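Chinese version (translated)

✅ Local Linux verification: reproduced the author's commands, ran a real-execution harness, and ran one live tmux side-query session

Verified three ways on Linux (the table marks macOS/Linux as CI-only), adding the tmux Before/After evidence requested in the review above. Posted as a merge reference.

Setup

Branch codex/fix-side-query-output-language @ dc3cd09 (fresh worktree + npm install + npm run build), CLI v0.18.0
Platform 🐧 Linux / Node v22.22.2

1) Author's Reviewer Test Plan reproduced: all green

core sideQuery + sessionRecap + classifier: 61 passed; the 8 changed-caller core test files: 157 passed; CLI DataProcessor.test.ts: 47 passed;
prettier --check ✅, eslint ✅, core tsc --noEmit ✅, git diff --check ✅.
The three missed user-visible callers flagged in my earlier review now all opt in: web-fetch.ts:166, ArenaManager.ts:1702, subagentGenerator.ts:134; the requested sideQuery edge cases (missing file / blank file / undefined path) are covered and pass.

2) Real-execution harness against the compiled `runSideQuery`: 10/10

Imported the compiled runSideQuery from dist and captured the systemInstruction actually passed to the LLM client:

opt-in=true: original system instruction preserved + output-language.md content injected + the new "overrides any earlier language-selection rule" line;
no flag (internal/classifier query): systemInstruction unchanged (not injected), so internal structured queries are unaffected, with no duplication or corruption;
opt-in + missing file: degrades gracefully, systemInstruction unchanged;
caching: based on a stat signature; after the file is rewritten, the next call reads the new content (FR → DE);
clearOutputLanguagePreferenceCache is exported.

==== 10 passed, 0 failed ====

3) Live `tmux` session (Linux): a user-visible side query honors the language end-to-end

Temporarily set the output language to French (global output-language.md backed up and restored after the test), then triggered a user-visible opt-in side query, the web_fetch summary (web-fetch.ts:166, runs on the main model):

✓ WebFetch  Fetching content from https://example.com …
✦ The result is in French (likely because my output language is set to French). …
✦ Voici le résumé en une phrase : `example.com` est un domaine réservé par l'IANA …

The model itself notes "The result is in French … because my output language is set to French", i.e. the web-fetch summary side query honored output-language.md. A plain conversational turn also returned French ("Trois avantages des tests unitaires : …"), so the end-to-end check holds on Linux.

Note: the fast-model-based side queries (follow-up suggestions, session title) did not fire in this environment because the configured fast model (gpt-5.4) is unreachable here; this is an environment limitation, not a PR issue. I used web_fetch, which runs on the main model, to exercise the same opt-in path.

Verdict

LGTM, recommend merge. The change converts side-query output-language injection from unconditional to opt-in: user-visible callers (title, recap, tool-use summary, suggestions, web-fetch, subagent artifacts, arena summaries, /insight, ACP rewriter) opt in and keep honoring output-language.md, while internal classifier/selector queries and the project-summary prompt that carries its own language rule correctly receive no injection. Verified three ways: real shipped code (tests), the compiled bundle (harness), and a live session on Linux. My earlier review suggestions are all addressed.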

wenshao · 2026-06-14T23:07:07Z

Minimal-change assessment: this PR overshoots

TL;DR: The bug this PR targets (#4494) is already fixed on main by #4636 (1c48e412, "apply output language to side queries"); #4494 was closed as COMPLETED on 2026-05-31. So this isn't fixing an open bug — it re-implements an already-working mechanism as a ~1095-line refactor, the net user-facing benefit over main is marginal, and it adds a cache-staleness risk main doesn't have.

`main` vs this PR

	`main` (after #4636)	this PR
Mechanism	`runSideQuery` unconditionally appends `output-language.md` to every side query	opt-in: append only when `respectOutputLanguagePreference: true`
Size	~20 lines	+1095 / −71, 23 files (12 source + 11 test)
The 4 features in #4494 (recap / title / tool-use summary / suggestions)	already honor output language	behavior unchanged (just default-on → explicit-on)

The net behavioral delta is small

After opting in all 10 user-visible callers, the only real change vs main is that ~6–7 internal structured queries stop receiving the append: permissions/classifier (×2), nextSpeakerChecker, relevanceSelector, forget, chatCompression, project-summary.

I checked how those results are consumed — they're constrained by a JSON schema (shouldBlock: boolean, next_speaker: 'user'|'model', ID lists). The unconditional append on main does not break them; at most it renders an internal reasoning free-text field in another language. So the opt-in inversion is a tidiness change, not a correctness fix.

The one genuine improvement

sessionTitle / sessionRecap still carry a "Match the dominant language of the conversation" line that conflicts with the appended preference, so the model can still fall back to English. This PR removes those lines and adds "This preference overrides any earlier language-selection rule". That is the one defensible user-facing win, and it's ~4 lines, independent of the opt-in flip or the cache.

A new risk this PR introduces

The new cache keys on mtimeMs:size. An external edit to output-language.md of the same byte length within the mtime-granularity window can serve stale content. The clearOutputLanguagePreferenceCache() call in languageCommand only patches the /language path; edits made outside it can still go stale. main re-reads the file every time and never has this problem — this is complexity added to optimize, then more code to patch the hole the optimization opened.
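The staleness window described above can be reproduced with a small sketch. It is not the PR's code: `FakeFile` and `PreferenceCache` are hypothetical names, the "file" is an in-memory object rather than a real `stat()` result, and `content.length` stands in for the byte size. Only the `mtimeMs:size` signature idea comes from the review.

```typescript
// In-memory stand-in for a file on disk: content plus a modification time.
interface FakeFile {
  content: string;
  mtimeMs: number;
}

// Hypothetical cache keyed on "mtimeMs:size", mirroring the signature described above.
class PreferenceCache {
  private entry?: { signature: string; content: string };

  read(file: FakeFile): string {
    // content.length stands in for the byte size a real stat() would report.
    const signature = `${file.mtimeMs}:${file.content.length}`;
    if (this.entry && this.entry.signature === signature) {
      return this.entry.content; // signature unchanged: the cached copy is served
    }
    this.entry = { signature, content: file.content };
    return file.content;
  }
}

const cache = new PreferenceCache();
const file: FakeFile = { content: "French", mtimeMs: 1000 };
const first = cache.read(file);

// Same-length edit that lands on the same mtime tick.
file.content = "German";
const stale = cache.read(file);

// A later edit with a new mtime is picked up.
file.mtimeMs = 2000;
const fresh = cache.read(file);

console.log({ first, stale, fresh });
```

In this sketch the second read still returns the old content because neither half of the signature changed, which is the case a re-read on every call never hits.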

Separately, the opt-in direction is a maintainability regression: any future user-visible side query silently regresses to English unless the author remembers to opt in, whereas main's default-on makes new callers correct automatically.

What the minimal version looks like

If the desired gain is "don't append to internal queries," the minimal path is to keep default-on and add an opt-out:

Add skipOutputLanguagePreference?: boolean (default false) to runSideQuery;
Set it on the ~6–7 internal callers;
Remove the conflicting hint in title/recap and add the "overrides" wording.

That's roughly 8 files / 30–40 lines, no new public API, no cache, no languageCommand change — and it annotates fewer call sites (6–7) than this PR's opt-in (10), while preserving the safe default.
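A minimal sketch of that default-on plus opt-out shape, under stated assumptions: `SideQueryOptions` and `buildSystemInstruction` are hypothetical names, the preference text is passed in directly instead of being read from output-language.md, and only the `skipOutputLanguagePreference` flag and the quoted "overrides" sentence come from the proposal.

```typescript
// Hypothetical option shape; only skipOutputLanguagePreference comes from the proposal above.
interface SideQueryOptions {
  systemInstruction: string;
  skipOutputLanguagePreference?: boolean; // default false: the preference is appended
}

// Sentence quoted in the review; the layout of the appended block is an assumption.
const OVERRIDE_LINE =
  "This preference overrides any earlier language-selection rule";

function buildSystemInstruction(
  options: SideQueryOptions,
  preference: string | undefined,
): string {
  const trimmed = (preference ?? "").trim();
  // Internal callers opt out; a missing or blank preference leaves the prompt untouched.
  if (options.skipOutputLanguagePreference || trimmed === "") {
    return options.systemInstruction;
  }
  return `${options.systemInstruction}\n\n${trimmed}\n${OVERRIDE_LINE}`;
}

// User-visible caller: no flag needed, the preference is appended by default.
const userVisible = buildSystemInstruction(
  { systemInstruction: "Write a session title." },
  "Respond in French.",
);

// Internal structured caller: opts out explicitly.
const internal = buildSystemInstruction(
  { systemInstruction: "Classify the command.", skipOutputLanguagePreference: true },
  "Respond in French.",
);

console.log(userVisible);
console.log(internal);
```

The design point is that a new caller that forgets the flag still gets the preference, so forgetting errs on the side of honoring the user's language.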

Bottom line: the opt-in + cache redesign is a reasonable long-term design, but #4494 is already fixed by #4636 in ~20 lines. Replacing it with ~1095 lines for marginal net gain plus a new staleness risk overshoots the minimal-change bar. The ~4-line conflicting-hint cleanup is the part worth keeping.

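Chinese version (translated)

Minimal-change assessment: this PR overshoots

TL;DR: The bug this PR wants to fix (#4494) has already been fixed on main by #4636 (1c48e412, "apply output language to side queries"), and #4494 was closed as COMPLETED on 2026-05-31. So this PR is not fixing an open bug; it rewrites an already-working mechanism as a roughly 1095-line refactor. The net user-visible gain over main is marginal, and it introduces a cache-staleness risk that main does not have.

`main` vs this PR

	`main` (after #4636)	this PR
Mechanism	`runSideQuery` unconditionally appends `output-language.md` to every side query	opt-in: appends only when `respectOutputLanguagePreference: true` is passed
Size	~20 lines	+1095 / −71, 23 files (12 source + 11 test)
The 4 features in #4494 (recap / title / tool-use summary / suggestion)	already honor output language	behavior unchanged (only from default-on to explicit-on)

The net behavioral delta is small

After all 10 user-visible call sites opt in, the only real change relative to main is that about 6 to 7 internal structured queries no longer get the language instruction appended: permissions/classifier (×2), nextSpeakerChecker, relevanceSelector, forget, chatCompression, project-summary.

I checked the consumers of these return values: they are all constrained by a JSON schema (shouldBlock: boolean, next_speaker: 'user'|'model', ID lists). The unconditional append on main does not break them; at most it turns the internal reasoning free text into another language. So the opt-in inversion changes tidiness, not correctness.

The one genuine improvement

sessionTitle / sessionRecap still contain a "Match the dominant language of the conversation" line, which conflicts with the appended language preference, so the model may still fall back to English. This PR deletes those two lines and adds "This preference overrides any earlier language-selection rule". This is its only defensible user-visible improvement, and it is only about 4 lines, unrelated to the opt-in flip or the cache.

The new risk it introduces

The new cache uses mtimeMs:size as its signature. If output-language.md is edited externally with the same byte length within the mtime-granularity window, the cache may serve stale content. The clearOutputLanguagePreferenceCache() call in languageCommand only covers the /language path; edits made outside it can still go stale. main re-reads the file every time and does not have this problem at all. This is adding complexity for an optimization and then writing more code to patch the hole the optimization opened.

In addition, the opt-in direction itself is a maintainability regression: any user-visible side query added later silently falls back to English if its author forgets to opt in, whereas main's default-on makes new call sites correct automatically.

What the minimal version should look like

If the desired gain is "don't append the language to internal queries," the minimal approach is to keep default-on and add an opt-out:

Add skipOutputLanguagePreference?: boolean (default false) to runSideQuery;
Set it on those 6 to 7 internal call sites;
Remove the conflicting hint in title/recap and add the "overrides" wording.

That is roughly 8 files / 30 to 40 lines, with no new public API, no cache, and no changes to languageCommand. It also annotates fewer call sites (6 to 7) than this PR's opt-in (10), while preserving the safe default.

Bottom line: the opt-in + cache redesign is reasonable as a long-term design, but #4494 has already been fixed by #4636 in about 20 lines. Replacing it with about 1095 lines, for marginal net gain and a new staleness risk, overshoots by the minimal-change principle. The part really worth keeping is the roughly 4 lines that remove the conflicting hint.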

Replace the oversized opt-in refactor with the minimal approach from review feedback: keep main's default-on language append, add skipOutputLanguagePreference for internal structured queries, and remove the conflicting English/Chinese-only hints from session title/recap prompts so the appended preference can take effect. Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>

wenshao · 2026-06-15T08:19:12Z

🧪 Local runtime verification (real CLI + mock provider, tmux)

This change is prompt-construction routing, so I verified it on the real on-the-wire system instruction: a real qwen (built from this PR's core) talking to a mock OpenAI server that captures every /v1/chat/completions body, with a sentinel output-language file (~/.qwen/output-language.md → Klingon_SENTINEL_4519). I then checked, per side query, whether the injected marker "Follow the user-visible output language preference below for this side query." is present. Built on PR head bf77c7bb6.

TL;DR

✅ Implementation is correct and correctly routed. Internal/structured side queries no longer receive the output-language rule; every user-visible one still does. One caveat: the PR description is stale — it describes an opt-in respectOutputLanguagePreference design and DataProcessor.ts changes that do not exist in the diff (the code uses opt-out skipOutputLanguagePreference). Please refresh the description before merge.

Evidence 1 — Real-wire A/B (tmux, mock provider)

Captured side-query system instructions from a live session with the preference configured:

Side query (purpose)	Class	Output-language marker on the wire
main agent turn	user response	present (via main prompt — unaffected)
`session-recap`	user-visible	PRESENT ✓
`chat-compression`	internal	ABSENT ✓
`chat-compression` (skip flag removed)	counterfactual	PRESENT again ⟵ proves the flag is the cause

The counterfactual is the clincher: I surgically deleted skipOutputLanguagePreference: true from the compression call site, rebuilt, and the marker (plus the PR's new override line and the full preference block) reappeared on the compression request — then vanished again once restored. So the flag is exactly what removes the injection.

session-recap is also the regression check: this PR deletes recap's own "Match the dominant language…" line, so recap now depends on the centralized injection — and the wire confirms it still gets it.
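The per-request marker check in this A/B can be sketched as a small predicate over captured request bodies. This is an illustration, not the script used for the verification: `CapturedBody` and `hasOutputLanguageMarker` are hypothetical names, the sample bodies are made up, and it assumes the system instruction arrives as a `system`-role message with string content. The marker sentence and the sentinel are the ones quoted above.

```typescript
// Marker sentence quoted in the verification above.
const MARKER =
  "Follow the user-visible output language preference below for this side query.";

// Minimal slice of a /v1/chat/completions request body; real bodies carry more fields.
interface ChatMessage {
  role: string;
  content: string;
}
interface CapturedBody {
  messages: ChatMessage[];
}

// True when any system message in the captured body contains the marker.
function hasOutputLanguageMarker(body: CapturedBody): boolean {
  return body.messages.some(
    (m) => m.role === "system" && m.content.includes(MARKER),
  );
}

// Illustrative captures: one user-visible side query, one internal one.
const recapBody: CapturedBody = {
  messages: [
    { role: "system", content: `Summarize the session.\n\n${MARKER}\nKlingon_SENTINEL_4519` },
    { role: "user", content: "Recap the conversation so far." },
  ],
};
const compressionBody: CapturedBody = {
  messages: [
    { role: "system", content: "Compress the chat history." },
    { role: "user", content: "History follows." },
  ],
};

console.log(hasOutputLanguageMarker(recapBody), hasOutputLanguageMarker(compressionBody));
```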

Evidence 2 — Complete routing audit (every `runSideQuery` call site)

`skipOutputLanguagePreference: true` (internal — 7)	default = inject (user-visible — 9)
`permission_classifier_stage1` / `stage2`	`session-title`, `session-recap`
`auto-memory-recall`, `auto-memory-forget-selection`	`tool-use-summary`, `web-fetch`
`next-speaker`	`arena-approach-summary`, `subagent-generator`
`chat-compression`	`prompt-suggestion`, `insight-session-analysis` / `-qualitative-generate`
`project-summary`	`acp-rewrite`

Both non-obvious calls check out:

acp-rewrite stays inject — its prompt produces "user-friendly progress updates" (user-visible), so it should honor the language. ✓
project-summary skips — but it passes systemInstruction: chatSystemInstruction (the main session prompt, which already embeds the language rule), so skipping just avoids a duplicate; the summary still respects the language. ✓
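The audit table can be restated as data. The sketch below is only a reading aid: the PR sets the flag at each call site, so a central `INTERNAL_PURPOSES` set and the `shouldInject` helper are hypothetical. The purpose strings are copied from the left-hand column of the table.

```typescript
// Purposes listed under `skipOutputLanguagePreference: true` in the audit table.
const INTERNAL_PURPOSES: ReadonlySet<string> = new Set([
  "permission_classifier_stage1",
  "permission_classifier_stage2",
  "auto-memory-recall",
  "auto-memory-forget-selection",
  "next-speaker",
  "chat-compression",
  "project-summary",
]);

// Default is inject: anything not explicitly listed as internal gets the preference.
function shouldInject(purpose: string): boolean {
  return !INTERNAL_PURPOSES.has(purpose);
}

console.log(shouldInject("session-recap"), shouldInject("chat-compression"));
```

A purpose that is not in the set, including one added in the future, falls through to injection, which matches the default-on behavior the audit describes.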

Evidence 3 — Mechanism, the new override line, and test teeth

sideQuery.test.ts: 21 pass, including the new "skips output language when skipOutputLanguagePreference is true". Mutation check: make runSideQuery ignore the flag (always inject) → that test fails, so it genuinely pins the behavior.
The PR also adds "This preference overrides any earlier language-selection rule in this system instruction." to the injected block. It appears on the wire when injected (confirmed in the counterfactual), and it's what lets session-title/session-recap safely drop their own per-prompt language lines — the centralized rule now explicitly wins over any earlier one.
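The mutation check can be illustrated with a toy version of the same idea: run one assertion against a flag-respecting builder and against a mutant that always injects. All names here (`Builder`, `real`, `mutant`, `skipTestPasses`) are hypothetical and the builders are far simpler than runSideQuery; the sketch only shows why a test that fails on the mutant is pinning the behavior.

```typescript
// A builder takes the base prompt, the preference text, and the skip flag.
type Builder = (base: string, preference: string, skip: boolean) => string;

// Respects the flag: skip leaves the base prompt untouched.
const real: Builder = (base, preference, skip) =>
  skip ? base : `${base}\n\n${preference}`;

// Mutant: ignores the flag and always injects.
const mutant: Builder = (base, preference, _skip) =>
  `${base}\n\n${preference}`;

// Stand-in for the "skips output language when the flag is true" test.
function skipTestPasses(build: Builder): boolean {
  const out = build("Classify the command.", "Respond in French.", true);
  return !out.includes("Respond in French.");
}

console.log({ real: skipTestPasses(real), mutant: skipTestPasses(mutant) });
```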

Evidence 4 — Description ≠ implementation (please fix before merge)

The PR body and its "Reviewer Test Plan" describe a different design than the code:

PR body says	Actual diff
opt-in flag `respectOutputLanguagePreference: true`	opt-out flag `skipOutputLanguagePreference: true` (0 occurrences of the body's name)
`DataProcessor.ts` opts the `/insight` queries in; `DataProcessor.test.ts` added	`DataProcessor.ts` unchanged (insight injects by default); no such test
`sessionRecap.test.ts` added	not in the diff

The commit message ("skip output language on internal side queries") matches the real code; the PR description appears to be from an earlier revision. The implementation is fine — just update the prose/test-plan so history and reviewers aren't misled.

Verdict

✅ Safe to merge once the description is refreshed. The routing is correct end-to-end: internal/structured queries (classifiers, memory selectors, next-speaker, compression, project-summary) no longer get told to answer in the user's language — which is what could corrupt their JSON/enum output (#4494) — while every user-visible summary still honors output-language.md.

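🇨🇳 Chinese version (translated)

🧪 Local runtime verification (real CLI + mock provider, tmux)

This change is prompt construction/routing, so I verified it on the real on-the-wire system instruction: a real qwen (built from this PR's core) connected to a mock OpenAI server that captures every /v1/chat/completions body, with a sentinel output-language file configured (~/.qwen/output-language.md → Klingon_SENTINEL_4519). I then checked, for each side query, whether the injected marker sentence "Follow the user-visible output language preference below for this side query." is present. Based on PR head bf77c7bb6.

TL;DR

✅ The implementation is correct and correctly routed. Internal/structured side queries no longer receive the output-language rule; all user-visible ones still do. One caveat: the PR description is stale. It describes an opt-in respectOutputLanguagePreference design and DataProcessor.ts changes that do not exist in the diff (the actual code uses the opt-out skipOutputLanguagePreference). Please update the description before merging.

Evidence 1: Real-wire A/B (tmux + mock provider)

Side-query system instructions captured from a real session with the language preference configured:

Side query (purpose)	Class	Output-language marker on the wire
main agent turn	user response	present (from the main prompt, unaffected)
`session-recap`	user-visible	present ✓
`chat-compression`	internal	absent ✓
`chat-compression` (skip flag removed)	counterfactual	present again ⟵ proves the flag is the cause

The counterfactual is decisive: I surgically deleted skipOutputLanguagePreference: true from the compression call site and rebuilt, and the marker (plus the PR's new override line and the full preference block) reappeared on the compression request, then disappeared again once the flag was restored. So it is exactly this flag that removes the injection.

session-recap also serves as the regression check: this PR deletes recap's own "Match the dominant language…" line, so recap now depends on the centralized injection, and the wire confirms it still receives it.

Evidence 2: Complete routing audit (every `runSideQuery` call site)

`skipOutputLanguagePreference: true` (internal: 7)	default = inject (user-visible: 9)
`permission_classifier_stage1` / `stage2`	`session-title`, `session-recap`
`auto-memory-recall`, `auto-memory-forget-selection`	`tool-use-summary`, `web-fetch`
`next-speaker`	`arena-approach-summary`, `subagent-generator`
`chat-compression`	`prompt-suggestion`, `insight-session-analysis` / `-qualitative-generate`
`project-summary`	`acp-rewrite`

The two less obvious call sites also check out:

acp-rewrite keeps injection: its prompt produces "user-friendly progress updates" (user-visible), so it should honor the language. ✓
project-summary skips, but it passes systemInstruction: chatSystemInstruction (the main session prompt, which already embeds the language rule), so skipping only avoids a duplicate; the summary still honors the language. ✓

Evidence 3: Mechanism, the new override line, and test teeth

sideQuery.test.ts: 21 pass, including the new "skips output language when skipOutputLanguagePreference is true". Mutation check: make runSideQuery ignore the flag (always inject) → that test fails, which shows it really pins the behavior.
The PR also adds "This preference overrides any earlier language-selection rule in this system instruction." to the injected block. It appears on the wire when injected (confirmed in the counterfactual), and it is what lets session-title/session-recap safely drop the language lines in their own prompts: the centralized rule now explicitly wins over any earlier rule.

Evidence 4: Description ≠ implementation (please fix before merge)

The PR body and its "Reviewer Test Plan" describe a different design from the code:

PR body says	Actual diff
opt-in flag `respectOutputLanguagePreference: true`	opt-out flag `skipOutputLanguagePreference: true` (0 occurrences of the body's name)
`DataProcessor.ts` opts `/insight` in; `DataProcessor.test.ts` added	`DataProcessor.ts` unchanged (insight injects by default); no such test
`sessionRecap.test.ts` added	not in the diff

The commit message ("skip output language on internal side queries") matches the real code; the PR description appears to be left over from an earlier revision. The implementation is fine; only the prose and test plan need updating so that history and reviewers are not misled.

Verdict

✅ Safe to merge once the description is updated. The routing is correct end-to-end: internal/structured queries (classifiers, memory selectors, next-speaker, compression, project-summary) are no longer asked to answer in the user's language, which is what could corrupt their JSON/enum output (#4494), while every user-visible summary still honors output-language.md.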

Method: real qwen (bundled from this PR's core) + a mock OpenAI server capturing every request body, in tmux; sentinel ~/.qwen/output-language.md; per-side-query marker check; surgical flag-removal counterfactual; complete call-site audit; sideQuery.test.ts + a stub-mutation teeth check. PR head bf77c7bb6.

wenshao

No review findings. Downgraded from Approve to Comment: CI still running. The opt-out flip is clean — caller categorization is correct (internal callers skip, user-visible callers get the preference by default), override sentence resolves manual-language conflicts, and the test covers both inject and skip paths. — qwen3.7-max via Qwen Code /review

DragonnZhang

Well-designed fix: skipOutputLanguagePreference flag prevents the user's output-language.md rule from being injected into internal machine-to-machine side queries (permission classifier, compression, memory recall, next-speaker checker). The flag defaults to false so user-visible side queries, including session title and recap, still honor the preference. Removed redundant "match the dominant language" instructions from recap/title prompts. Test coverage validates the skip behavior. CI green. LGTM ✅ — claude-opus-4-6 via Qwen Code /review

wenshao · 2026-06-18T22:53:24Z

@qwen-code /triage

qwen-code-ci-bot · 2026-06-18T22:56:27Z

Thanks for the PR!

Template looks good ✓ — all required sections present.

On direction: the bug (#4494) was already fixed on main by #4636 (merged 2026-05-31), which applies output language to all side queries unconditionally. This PR is a refinement on top of that — adding an opt-out (skipOutputLanguagePreference) for internal side queries that don't produce user-visible text (classifier, compression, memory selectors, next-speaker checker) and cleaning up redundant language hints from prompts that now receive automatic injection. That's a legitimate improvement: it reduces token waste on internal queries and eliminates the duplicated "Match the dominant language" hints in sessionRecap/sessionTitle. Aligned with the project.

On approach: the current diff (+55/-4 across 10 files) is tight and focused. Each change serves the stated goal. Two minor notes:

PR body is stale. The body repeatedly references respectOutputLanguagePreference: true (opt-in semantics) but the code uses skipOutputLanguagePreference (opt-out). The body should be updated to match the actual implementation — reviewers reading the description get the wrong impression of the API surface.
Override sentence not documented. The diff adds 'This preference overrides any earlier language-selection rule in this system instruction.' to getOutputLanguageInstruction(), but this behavioral addition isn't mentioned in the PR body. It's a reasonable addition (resolves conflicts between old per-prompt language hints and the new automatic injection), but should be called out.

Neither is blocking. Moving on to code review. 🔍

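Chinese notes (translated)

Thanks for the contribution!

Template is complete ✓

Direction: the underlying bug (#4494) was already fixed by #4636 (merged 2026-05-31), which injects the output language into all side queries unconditionally. This PR is a refinement on top of that: it adds a skipOutputLanguagePreference opt-out for internal side queries that do not produce user-visible text (classifier, compression, memory selectors, next-speaker checker) and cleans up redundant language hints already present in prompts. This is a reasonable improvement: it reduces token waste on internal queries and removes the duplicated "Match the dominant language of the conversation" hints in sessionRecap/sessionTitle. The direction is aligned with the project.

Approach: the current diff (+55/-4, 10 files) is tight and focused, and each change serves the stated goal. Two minor issues:

PR description is stale. The description repeatedly mentions respectOutputLanguagePreference: true (opt-in semantics), but the code uses skipOutputLanguagePreference (opt-out). Please update the description to match the actual implementation; otherwise reviewers will misunderstand the API design.
Override sentence not documented. The diff adds 'This preference overrides any earlier language-selection rule in this system instruction.' to getOutputLanguageInstruction(), but the PR description does not mention this behavioral change. The addition is reasonable (it resolves conflicts between old per-prompt language hints and the new automatic injection), but it should be stated explicitly in the description.

Neither is blocking. Moving on to code review 🔍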

— Qwen Code · qwen3.7-max

qwen-code-ci-bot · 2026-06-18T23:05:16Z

Code Review

Independent proposal (before reading diff): add an opt-out flag to runSideQuery for internal callers, remove redundant language hints from sessionRecap/sessionTitle, add a test for the skip path.

The PR matches this proposal exactly. Caller categorization is correct:

Correctly skipped (internal, not user-visible):

classifier.ts (permission classifier stages 1 & 2)
chatCompressionService.ts (chat compression summary)
forget.ts (memory selection for /forget)
relevanceSelector.ts (auto-memory recall)
nextSpeakerChecker.ts (next speaker detection)
summaryCommand.ts (uses main agent's systemInstruction which already contains the language preference)

Correctly getting language by default (user-visible):

sessionRecap.ts — old "Match the dominant language" hint removed ✅
sessionTitle.ts — old "Match the dominant language" hint removed ✅

No critical blockers. No AGENTS.md violations. The override sentence added to getOutputLanguageInstruction() is a reasonable conflict-resolution mechanism for prompts that still carry legacy language hints.

Test Results

All targeted tests pass on the PR branch:

✓ src/utils/sideQuery.test.ts          — 21 tests (20 existing + 1 new skip test)
✓ src/permissions/classifier.test.ts   — 23 tests
✓ src/services/insight/generators/DataProcessor.test.ts — 45 tests
✓ TypeScript typecheck (packages/core) — clean
✓ ESLint (all 8 changed source files)  — clean
✓ Build                                — clean (0 errors)

Smoke Test (tmux)

$ npm run dev -- -p 'say hello in one word' --max-turns 1 2>&1 | tee /tmp/triage-smoke.log

[CLI help displayed — --max-turns is not a valid flag (--max-session-turns)]
[CLI started and exited cleanly, no crashes or runtime errors]
runner@runnervm7b5n9:~/work/qwen-code/qwen-code/.qwen/worktrees/triage$

CLI starts and exits cleanly. The help display is from an incorrect flag name in my test, not a PR issue. The prompt-construction change is verified by unit tests (the new test asserts the system instruction does NOT contain the output language instruction when skipOutputLanguagePreference: true).

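Chinese notes (translated)

Code Review

Independent proposal (before reading the diff): add an opt-out flag to runSideQuery for internal callers, remove the redundant language hints from sessionRecap/sessionTitle, and add a test for the skip path.

The PR matches the proposal exactly. Caller categorization is correct:

Correctly skipped (internal, not user-visible): classifier (both permission classifier stages), chatCompressionService, forget (memory selection), relevanceSelector (auto-memory recall), nextSpeakerChecker, summaryCommand (uses the main agent's systemInstruction, which already contains the language preference).

Correctly getting the language instruction by default (user-visible): sessionRecap (old hint removed ✅), sessionTitle (old hint removed ✅).

No blocking issues, no AGENTS.md violations. The override sentence added to getOutputLanguageInstruction() is a reasonable conflict-resolution mechanism.

Test Results

All targeted tests pass on the PR branch: sideQuery 21 tests, classifier 23 tests, DataProcessor 45 tests, TypeScript typecheck clean, ESLint clean, build clean.

Smoke Test

The CLI starts and exits cleanly, with no crashes or runtime errors. The prompt-construction change is verified by unit tests.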

— Qwen Code · qwen3.7-max

qwen-code-ci-bot · 2026-06-18T23:05:42Z

Reflection

This PR went through 8+ review rounds and has been reduced to exactly what it should be: a focused refinement on top of #4636's unconditional language injection. The diff is tight (+55/-4, 10 files), every change earns its place, and the caller categorization is correct.

The core value: internal side queries (classifiers, compression, memory selectors) no longer receive a language instruction they don't need, saving tokens and avoiding potential model confusion. The redundant "Match the dominant language" hints in sessionRecap and sessionTitle are cleaned up now that automatic injection handles it.

The override sentence in getOutputLanguageInstruction() is a small but useful addition — it resolves priority conflicts when legacy language hints still exist in prompts not yet cleaned up.

My independent proposal matched this PR's approach exactly. No simpler path was missed.

The two notes from Stage 1 (stale PR body mentioning respectOutputLanguagePreference and the undocumented override sentence) are cosmetic — they don't affect correctness or merge-readiness. The author can update the body in a follow-up commit or at merge time.

Verdict: Approve. Ship it. ✅

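Chinese notes (translated)

Reflection

This PR went through 8+ review rounds and has been trimmed to the shape it should have: a focused refinement on top of #4636's unconditional language injection. The diff is tight (+55/-4, 10 files), every change is necessary, and the caller categorization is correct.

The core value: internal side queries (classifiers, compression, memory selectors) no longer receive a language instruction they do not need, which saves tokens and avoids model confusion. The redundant "match the dominant language" hints in sessionRecap and sessionTitle have been cleaned up.

The override sentence in getOutputLanguageInstruction() is a reasonable small improvement: it resolves priority conflicts between legacy language hints that have not yet been cleaned up and the new automatic injection.

The independent proposal matches this PR exactly; no simpler path was found.

The two issues raised in Stage 1 (stale PR description, undocumented override sentence) are cosmetic and do not affect correctness or merge-readiness. The author can update the description in a follow-up commit or at merge time.

Verdict: Approve. Ready to merge. ✅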

— Qwen Code · qwen3.7-max

qwen-code-ci-bot

LGTM, looks ready to ship. ✅

fixed

Jerry2003826 marked this pull request as ready for review May 25, 2026 22:07

wenshao reviewed May 26, 2026

pomelo-nwu previously requested changes May 26, 2026

wenshao added the type/bug Something isn't working as expected label May 26, 2026

github-actions Bot mentioned this pull request May 26, 2026

📊 AI CLI 工具社区动态日报 2026-05-26 ivanweng2077/big_model_radar#91

Open

wenshao reviewed May 26, 2026

wenshao mentioned this pull request May 26, 2026

fix(core): bound hard rescue compression retries #4526

Merged

Jerry2003826 force-pushed the codex/fix-side-query-output-language branch from aaf6883 to e0c28da Compare May 26, 2026 09:00

wenshao previously approved these changes May 26, 2026

Jerry2003826 dismissed wenshao’s stale review via 84eec74 May 26, 2026 13:41

wenshao reviewed May 26, 2026

wenshao reviewed May 27, 2026

BZ-D previously approved these changes Jun 1, 2026

Jerry2003826 dismissed BZ-D’s stale review via d286835 June 1, 2026 08:39

wenshao reviewed Jun 1, 2026

Jerry2003826 force-pushed the codex/fix-side-query-output-language branch 5 times, most recently from d8db7cb to 6ae137a Compare June 1, 2026 19:59

xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026

[ide-mode] Add openDiff tool to IDE MCP server (QwenLM#4519)

a031093

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

DragonnZhang reviewed Jun 8, 2026

github-actions Bot mentioned this pull request Jun 10, 2026

📊 AI CLI 工具社区动态日报 2026-06-10 zx0828/big_model_radar#99

Open

wenshao mentioned this pull request Jun 10, 2026

fix(ci): normalize dev launcher path assertions on Windows #4915

Merged

DragonnZhang reviewed Jun 10, 2026

DragonnZhang previously approved these changes Jun 10, 2026

cursor Bot dismissed DragonnZhang’s stale review via 16d1da5 June 14, 2026 18:07

wenshao previously approved these changes Jun 14, 2026

cursor Bot dismissed wenshao’s stale review via dc3cd09 June 14, 2026 18:59

wenshao reviewed Jun 14, 2026

wenshao closed this Jun 14, 2026

wenshao reopened this Jun 14, 2026

cursor Bot force-pushed the codex/fix-side-query-output-language branch from dc3cd09 to bf77c7b Compare June 15, 2026 01:48

wenshao reviewed Jun 15, 2026

DragonnZhang approved these changes Jun 15, 2026

qwen-code-ci-bot approved these changes Jun 18, 2026

wenshao merged commit ecec261 into QwenLM:main Jun 18, 2026
28 checks passed

	clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.());
	clearOutputLanguagePreferenceCache();

-  const systemInstruction = await applyOutputLanguagePreference(
+      const result = await runSideQuery(this.config, {
+        purpose: 'acp-rewrite',
+        model,
+        maxAttempts: 1,
+        systemInstruction: this.prompt,
+        respectOutputLanguagePreference: true,

-  let abortController: AbortController;
+  beforeEach(() => {
+    clearOutputLanguagePreferenceCache();
+    abortController = new AbortController();
+    mockBaseLlmClient = {
+      generateJson: vi.fn(),
+      generateText: vi.fn(),
+    } as unknown as BaseLlmClient;
+    mockConfig = {
+      getBaseLlmClient: vi.fn(() => mockBaseLlmClient),
+      getModel: vi.fn(() => 'qwen3-coder-plus'),
+      getFastModel: vi.fn(() => 'fast-model'),
+      getOutputLanguageFilePath: vi.fn(),
+    } as unknown as Config;
+  });
