fix(core): honor output language in side queries#4519
Conversation
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Missed callers that produce user-visible text
Three additional runSideQuery callers produce user-visible output but were not updated with respectOutputLanguagePreference: true:
packages/core/src/tools/web-fetch.ts:154— web content summaries shown to userspackages/core/src/agents/arena/ArenaManager.ts:1690— agent approach summaries for side-by-side comparisonpackages/core/src/utils/subagentGenerator.ts:129— generated agent name/description visible to users
Consider adding the flag to these call sites for consistency with the four callers already updated.
[Suggestion] Test gaps in sideQuery.test.ts
The new tests cover the happy path (flag=true, valid file with content) but miss three realistic edge cases:
- File does not exist (exercises the
catch {}path atsideQuery.ts:157) - File is empty or whitespace-only (exercises
content.trim() || undefined) getOutputLanguageFilePath()returnsundefinedwithrespectOutputLanguagePreference: true(exercises the early return atsideQuery.ts:152)
— qwen3.7-max via Qwen Code /review
| config: Config, | ||
| ): Promise<string | undefined> { | ||
| const filePath = config.getOutputLanguageFilePath?.(); | ||
| if (!filePath) return undefined; |
There was a problem hiding this comment.
[Suggestion] readOutputLanguagePreference calls readFile on every runSideQuery invocation with no caching. The file path is static for the Config instance lifetime, and generateToolUseSummary fires on every completed tool batch — so the same small file is read from disk dozens of times per session.
Consider memoizing the result (keyed by filePath):
let cachedPreference: { filePath: string; value: string | undefined } | undefined;
async function readOutputLanguagePreference(
config: Config,
): Promise<string | undefined> {
const filePath = config.getOutputLanguageFilePath?.();
if (!filePath) return undefined;
if (cachedPreference?.filePath === filePath) return cachedPreference.value;
try {
const content = await readFile(filePath, 'utf8');
const value = content.trim() || undefined;
cachedPreference = { filePath, value };
return value;
} catch {
cachedPreference = { filePath, value: undefined };
return undefined;
}
}— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Handled in 84eec74. readOutputLanguagePreference now memoizes by file path, and the new regression mutates the same output-language.md after the first side query to verify subsequent calls reuse the cached preference instead of re-reading.
| } as unknown as Config; | ||
| }); | ||
|
|
||
| function writeOutputLanguageFile(content: string): string { |
There was a problem hiding this comment.
[Suggestion] writeOutputLanguageFile is duplicated verbatim across 5 test files (sideQuery.test.ts, sessionRecap.test.ts, sessionTitle.test.ts, toolUseSummary.test.ts, suggestionGenerator.test.ts). Only the temp directory prefix string differs.
Consider extracting to a shared test utility:
// packages/core/src/test-utils/outputLanguage.ts
export function writeOutputLanguageFile(content: string): string {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'qwen-output-lang-'));
const file = path.join(dir, 'output-language.md');
fs.writeFileSync(file, content, 'utf8');
return file;
}— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
I left this one out of the focused bug fix for now. Extracting the duplicated test helper would touch several unrelated test files, while this PR's behavioral changes stay limited to side-query output-language plumbing and its direct coverage.
| { role: 'user', parts: [{ text: RECAP_USER_PROMPT }] }, | ||
| ], | ||
| systemInstruction: RECAP_SYSTEM_PROMPT, | ||
| respectOutputLanguagePreference: true, |
There was a problem hiding this comment.
[Suggestion] RECAP_SYSTEM_PROMPT (line 20) and TITLE_SYSTEM_PROMPT (line 26 in sessionTitle.ts) both contain the instruction "Match the dominant language of the conversation (English or Chinese)." When respectOutputLanguagePreference: true appends a conflicting directive like "You MUST always respond in Chinese", the model receives two contradictory language instructions. The outcome is model-dependent and non-deterministic.
Consider either removing the "Match the dominant language" sentence from these prompts, or appending an explicit override (e.g., "This overrides any prior language instructions in this prompt.").
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Handled in 84eec74. The output-language append now includes an explicit override sentence, so a configured output-language preference supersedes earlier language-selection guidance in prompts like recap/title without editing each caller prompt separately.
pomelo-nwu
left a comment
There was a problem hiding this comment.
Hi @Jerry2003826, thank you for your continued contributions — 9 PRs in a short time is impressive! 🎉
As we review your changes, we'd like to ask you to update each PR to follow the latest PR template on the main branch. The most important section is the Reviewer Test Plan, which significantly accelerates the review and merge process.
Specifically, for each PR please include:
- How to verify — clear reproduction steps so a reviewer can confirm the fix/feature
- Evidence (Before & After) — use the
tmux-real-user-testingskill (or manual tmux capture) to show before/after screenshots of the TUI behavior. Side-by-side evidence makes it much faster for maintainers to validate and merge - Tested on — fill in the OS table (macOS / Windows / Linux)
PRs with a complete Reviewer Test Plan are prioritized for review — without it, review may be delayed.
You can see the full template at: .github/pull_request_template.md
Thanks again for your effort — looking forward to getting these merged! 🚀
中文说明
你好 @Jerry2003826,感谢你的持续贡献——短时间内提交了 9 个 PR,非常高效!🎉
在 review 过程中,我们希望你能按照 main 分支上最新的 PR 模版更新每个 PR 的描述。其中最关键的部分是 Reviewer Test Plan,它能显著加速审核和合并流程。
具体来说,请为每个 PR 补充:
- How to verify — 清晰的复现步骤,让 reviewer 能确认修复/功能的效果
- Evidence (Before & After) — 使用
tmux-real-user-testingskill(或手动 tmux 截取)展示修改前后的 TUI 截图对比,前后对比能让维护者更快地验证和合并 - Tested on — 填写操作系统测试表格(macOS / Windows / Linux)
有完整 Reviewer Test Plan 的 PR 会被优先审核——缺少该部分可能会导致审核延迟。
完整模版见:.github/pull_request_template.md
再次感谢你的付出,期待尽快把这些 PR 合并!🚀
— Qwen Code
|
Updated in a22d233. This addresses the review comments by adding
I also added coverage for the requested Validation run locally on Windows: npm run test --workspace=packages/core -- src/tools/web-fetch.test.ts -t "output language"
npm run test --workspace=packages/core -- src/utils/subagentGenerator.test.ts -t "output language"
npm run test --workspace=packages/core -- src/agents/arena/ArenaManager.test.ts -t "routes all approach summaries"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language path|output language file is blank"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts src/services/sessionTitle.test.ts src/services/sessionRecap.test.ts src/services/toolUseSummary.test.ts src/followup/suggestionGenerator.test.ts src/tools/web-fetch.test.ts src/agents/arena/ArenaManager.test.ts src/utils/subagentGenerator.test.ts
npx prettier --check packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/agents/arena/ArenaManager.test.ts packages/core/src/utils/subagentGenerator.ts packages/core/src/utils/subagentGenerator.test.ts
npx eslint packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/agents/arena/ArenaManager.test.ts packages/core/src/utils/subagentGenerator.ts packages/core/src/utils/subagentGenerator.test.ts
npm run lint --workspace=packages/core
npm run typecheck --workspace=packages/core
git diff --checkNote: when I initially ran several vitest commands in parallel locally on Windows, two runs hit the known |
|
Added aaf6883 for the remaining I also rechecked the three user-visible call sites from the review against the current branch head; they already pass
Validation: npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/sideQuery.test.ts -t "output language"
npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/sideQuery.test.ts
npx eslint packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/utils/subagentGenerator.ts
npx prettier --check packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.ts packages/core/src/agents/arena/ArenaManager.ts packages/core/src/utils/subagentGenerator.ts
npm run lint --workspace=@qwen-code/qwen-code-core
npm run typecheck --workspace=@qwen-code/qwen-code-core
git diff --check |
| ); | ||
| }); | ||
|
|
||
| it('leaves JSON system instruction unchanged when output language path is missing', async () => { |
There was a problem hiding this comment.
[Suggestion] JSON-mode negative test coverage is asymmetric with text mode. Text mode has three negative tests (path undefined, file missing, file blank), but JSON mode has only one (file missing). Also: the test name says "path is missing" when the file at the path is missing — the text-mode tests correctly distinguish these cases. And the bare path.join(os.tmpdir(), 'qwen-side-query-missing.md') is not parallel-safe; the text-mode "file is missing" test at line ~515 correctly uses fs.mkdtempSync for isolation.
Consider adding two JSON-mode tests mirroring text mode, and using mkdtempSync for the existing one:
it('leaves JSON system instruction unchanged when output language path is undefined', async () => {
vi.mocked(mockBaseLlmClient.generateJson).mockResolvedValue({ ok: true });
// mockConfig already returns undefined for getOutputLanguageFilePath
await runSideQuery<{ ok: boolean }>(mockConfig, { /* ... */ respectOutputLanguagePreference: true });
expect(callArg.systemInstruction).toBe('custom JSON side query prompt');
});
it('leaves JSON system instruction unchanged when output language file is missing', async () => {
vi.mocked(mockBaseLlmClient.generateJson).mockResolvedValue({ ok: true });
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'qwen-side-query-missing-'));
vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
path.join(dir, 'output-language.md'),
);
// ...
});
it('leaves JSON system instruction unchanged when output language file is blank', async () => {
vi.mocked(mockBaseLlmClient.generateJson).mockResolvedValue({ ok: true });
vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
writeOutputLanguageFile(' \n\t '),
);
// ...
});— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in e0c28da. JSON mode now mirrors the text-mode negative coverage: undefined output-language path, missing file using an isolated temp dir, and blank file all leave the original JSON systemInstruction unchanged.
| const callArg = mockGenerateContent.mock.calls[0]?.[0]; | ||
| expect(callArg.systemInstruction).toContain( | ||
| 'You MUST always respond in Chinese.', | ||
| ); |
There was a problem hiding this comment.
[Suggestion] This assertion only checks that the language preference is present, but doesn't verify the original systemInstruction is preserved. Compare with existing tests in toolUseSummary.test.ts and sessionRecap.test.ts which assert both:
// Established pattern (e.g. toolUseSummary.test.ts:322-326):
expect(options.systemInstruction).toContain(TOOL_USE_SUMMARY_SYSTEM_PROMPT);
expect(options.systemInstruction).toContain('You MUST always answer in Chinese.');If appendSystemInstructionText had a bug that replaced rather than appended, this test would still pass. Add:
| ); | |
| expect(callArg.systemInstruction).toContain( | |
| 'Extract and summarize the requested information', | |
| ); | |
| expect(callArg.systemInstruction).toContain( | |
| 'You MUST always answer in Chinese.', | |
| ); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in e0c28da. The web-fetch output-language test now asserts both that the original web-fetch system instruction (Extract and summarize the requested information...) is preserved and that the configured language preference is appended.
| const generateJsonCall = (mockClient.generateJson as Mock).mock.calls[0]; | ||
| const callParams = generateJsonCall[0]; | ||
| expect(callParams.systemInstruction).toContain( | ||
| 'You MUST always respond in Chinese.', |
There was a problem hiding this comment.
[Suggestion] Same as web-fetch.test.ts — this only asserts the language preference is present. The established pattern in toolUseSummary.test.ts and sessionRecap.test.ts also verifies the original systemInstruction is preserved:
| 'You MUST always respond in Chinese.', | |
| expect(callParams.systemInstruction).toContain( | |
| 'You are an elite AI agent architect', | |
| ); | |
| expect(callParams.systemInstruction).toContain( | |
| 'You MUST always answer in Chinese.', | |
| ); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in e0c28da. The subagent output-language test now follows the same pattern: it verifies the original You are an elite AI agent architect system prompt remains present and the configured language preference is appended.
aaf6883 to
e0c28da
Compare
|
Updated in e0c28da for the latest review suggestions. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts src/tools/web-fetch.test.ts src/utils/subagentGenerator.test.ts -t "output language|JSON system instruction"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts src/tools/web-fetch.test.ts src/utils/subagentGenerator.test.ts
npx prettier --check packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/utils/subagentGenerator.test.ts
npx eslint packages/core/src/utils/sideQuery.test.ts packages/core/src/tools/web-fetch.test.ts packages/core/src/utils/subagentGenerator.test.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check |
wenshao
left a comment
There was a problem hiding this comment.
R3: All R2 suggestions addressed. No new issues found. ✅ — qwen3.7-max via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
R4 incremental review (delta since R3: sideQuery.ts memoization + override sentence, sideQuery.test.ts new tests). The memoization and override sentence address the remaining R1 Suggestions. One new Suggestion below regarding cache invalidation for long-running sessions.
— qwen3.7-max via Qwen Code /review
| 'User output language preference from output-language.md:'; | ||
| const OUTPUT_LANGUAGE_PREFERENCE_OVERRIDE = | ||
| 'This preference overrides any earlier language-selection rule in this system instruction.'; | ||
| const outputLanguagePreferenceCache = new Map< |
There was a problem hiding this comment.
[Suggestion] outputLanguagePreferenceCache is a module-level Map with no invalidation path. When a user runs /language mid-session, languageCommand.ts rewrites the file and refreshes the main conversation's system instruction — but this cache still holds the old resolved Promise. All 7 respectOutputLanguagePreference: true consumers (session title, recap, tool-use summary, follow-up suggestions, web-fetch, subagent generation, Arena) will serve the stale language rule for the remainder of the process.
The new memoization test (sideQuery.test.ts) correctly asserts this behavior, but it codifies a regression for daemon mode or any long-lived session where the user changes language.
Suggested fix: Export a clearOutputLanguagePreferenceCache() function and call it from the /language command handler (alongside the existing config.refreshHierarchicalMemory() call), or scope the cache to the Config instance lifetime rather than the module:
export function clearOutputLanguagePreferenceCache(): void {
outputLanguagePreferenceCache.clear();
}Then in languageCommand.ts:
import { clearOutputLanguagePreferenceCache } from '../utils/sideQuery.js';
// ...after updateOutputLanguageFile():
clearOutputLanguagePreferenceCache();— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 28aee11. Side-query output-language caching now invalidates on file signature changes, and /language output ... explicitly clears the cached path after rewriting output-language.md. Added regressions for both the side-query refresh and the live /language command cache clear.
wenshao
left a comment
There was a problem hiding this comment.
R5 incremental review (delta since R4: cache invalidation via stat-based signatures + clearOutputLanguagePreferenceCache export). CI 10/10 all pass. 75 tests pass.
Missed caller (out of diff scope): packages/cli/src/ui/commands/summaryCommand.ts:107 calls runSideQuery to generate a user-visible project summary (displayed in TUI and saved to .qwen/PROJECT_SUMMARY.md) but does not set respectOutputLanguagePreference: true. If the user configures output language to Chinese, the project summary would still be generated in English. Consider adding respectOutputLanguagePreference: true to match the PR's stated goal.
— qwen3.7-max via Qwen Code /review
|
|
||
| // Update the rule file with the resolved language | ||
| updateOutputLanguageFile(settingValue); | ||
| clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.()); |
There was a problem hiding this comment.
[Suggestion] Path mismatch between write target and cache invalidation target.
updateOutputLanguageFile(settingValue) writes to the global path (~/.qwen/output-language.md, hardcoded in languageUtils.ts), but clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.()) clears the cache for the config's path, which may be a project-level path (<cwd>/.qwen/output-language.md). When a project-level file exists, the write modifies the global file while side queries re-read the unchanged project-level file — the user's /language command has no visible effect on side-query outputs.
Additionally, when config?.getOutputLanguageFilePath?.() returns undefined (e.g., config lacks the method), clearOutputLanguagePreferenceCache(undefined) fires the clear-all branch, wiping every cached path — including entries from other sessions in daemon mode.
| clearOutputLanguagePreferenceCache(config?.getOutputLanguageFilePath?.()); | |
| clearOutputLanguagePreferenceCache(); |
Using the no-arg version clears all entries as a belt-and-suspenders approach. The stat-based invalidation in readOutputLanguagePreference handles correctness automatically on the next call, so the only cost is one extra readFile per session.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in the current branch. /language output now calls clearOutputLanguagePreferenceCache() without a path after updateOutputLanguageFile(), so project/global path differences cannot leave a stale side-query language preference cached.
| } | ||
| >(); | ||
|
|
||
| export function clearOutputLanguagePreferenceCache(filePath?: string): void { |
There was a problem hiding this comment.
[Suggestion] This new public API has two branches (specific-path deletion and clear-all) but no behavioral unit tests. It is only verified via vi.fn() mock in languageCommand.test.ts, which confirms the function was called but not that it works.
Consider adding tests in sideQuery.test.ts:
- Specific-path deletion: populate cache via
runSideQuery, callclearOutputLanguagePreferenceCache(path), verify next call re-reads the file. - Clear-all: populate cache with entries, call
clearOutputLanguagePreferenceCache(), verify all entries are removed.
Also consider adding a JSDoc note that this function is a performance hint — stat-based invalidation in readOutputLanguagePreference handles correctness automatically.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch. sideQuery.test.ts now exercises both cache-clear branches: specific-path deletion preserves other cached entries, and no-argument clearing refreshes all cached preferences. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.
| ], | ||
| }, | ||
| ], | ||
| respectOutputLanguagePreference: true, |
There was a problem hiding this comment.
[Suggestion] Output language rule is appended twice to this side query. The main session's systemInstruction (extracted at line 99-105 via getGenerationConfig().systemInstruction) already embeds output-language.md content as part of userMemory (loaded through getExtensionContextFilePaths() → loadServerHierarchicalMemory()). Setting respectOutputLanguagePreference: true causes applyOutputLanguagePreference in sideQuery.ts to read the same file and append the language rule a second time.
The model receives the instruction duplicated in every summary generation — redundant tokens and a confusing artifact for anyone debugging system instructions.
Consider either:
- Removing
respectOutputLanguagePreference: truehere since the carried-overchatSystemInstructionalready contains the rule, or - Stripping the output-language section from
chatSystemInstructionbefore passing it, and keeping the explicit flag.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in the current branch. summaryCommand no longer sets respectOutputLanguagePreference: true for the project-summary side query because it carries over the main chat system instruction, which already includes the configured output-language rule through user memory.
| expect(secondCallArg.systemInstruction).not.toContain('Chinese'); | ||
| }); | ||
|
|
||
| it('clears one cached output language preference by file path', async () => { |
There was a problem hiding this comment.
[Suggestion] This test creates only a single file and verifies its cache was cleared, but does not include a negative assertion — it never checks that a different cached file's entry survived the selective clear. If clearOutputLanguagePreferenceCache(file) were accidentally changed to call .clear() instead of .delete(filePath), this test would still pass.
Consider creating a second file, caching it via runSideQuery, then after clearOutputLanguagePreferenceCache(firstFile), running a side query for the second file and asserting it still contains the original (stale) preference — proving its cache was not disturbed. This would make the two new tests properly complementary (one proves selective clear, the other proves blanket clear).
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch. The specific-path cache test now populates two files, clears only the first path, and asserts the second cached preference survives instead of being globally cleared. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.
| signature, | ||
| preference: readFile(filePath, 'utf8') | ||
| .then((content) => content.trim() || undefined) | ||
| .catch(() => undefined), |
There was a problem hiding this comment.
[Suggestion] Transient readFile failure permanently caches undefined until the file's mtime or size changes.
When stat() succeeds but readFile() fails (EMFILE, permission flap, transient IO error), .catch(() => undefined) stores a Promise<undefined> as cached.preference with the current signature. On subsequent calls, stat() returns the same mtimeMs:size, the cache hits, and the language preference is silently lost.
| .catch(() => undefined), | |
| .catch(() => { | |
| outputLanguagePreferenceCache.delete(filePath); | |
| return undefined; | |
| }), |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in the current branch. readOutputLanguagePreference() now deletes the cache entry when readFile() fails for the current signature, so transient read failures do not permanently cache undefined.
Verification Report — PR #4519Reviewer: wenshao 1. Build & Type Check
2. Unit TestsPR-related tests (all pass):
Full core suite: 9438 passed, 1 failed, 3 skipped (344 files). The single failure ( 3. Code ReviewArchitecture (sideQuery.ts)The design is clean and well-layered:
Callers — Correct Opt-in7 side-query callers correctly opt in with
Test CoverageThe new tests thoroughly cover:
4. Issues Found
5. SummaryThis is a well-designed, well-tested fix. The opt-in Recommendation: ✅ Approve. Verified locally on 2026-05-28 |
|
Updated in b86309b for the transient read failure cache issue. If Validation run locally on Windows: npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts
npx prettier --check packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts
npx eslint packages/core/src/utils/sideQuery.ts packages/core/src/utils/sideQuery.test.ts
npm run typecheck --workspace=packages/core |
|
CI note for the latest run: the macOS failure appears unrelated to this PR's side-query change. The failed test is in This PR only changes npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts -t "output language"
npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts
npm run test --workspace=packages/cli -- src/ui/components/messages/AskUserQuestionDialog.test.tsx -t "keeps bare k/j"I tried to rerun the failed workflow, but GitHub rejected it because I do not have repository admin rights. Could a maintainer rerun the failed macOS job? |
BZ-D
left a comment
There was a problem hiding this comment.
LGTM. The respectOutputLanguagePreference plumbing is thorough — correctly wired into session title, recap, tool-use summary, suggestion generator, web fetch, subagent generator, and arena manager. The stat-based cache with mtime+size signature is a good balance of freshness vs performance, and clearOutputLanguagePreferenceCache gives the /language command an explicit flush path. Edge cases (missing file, blank content, transient read failures) are all covered.
|
This PR has merge conflicts with main — please rebase or merge main to resolve before merging. |
|
Merged current Resolution notes:
Validation:
|
| const promptId = options.promptId ?? buildDefaultPromptId(options.purpose); | ||
| const requestConfig = applyThinkingDefault(options.config); | ||
| const systemInstruction = appendSystemInstruction( | ||
| const systemInstruction = await applyOutputLanguagePreference( |
There was a problem hiding this comment.
[Suggestion] Missed caller: packages/cli/src/acp-integration/session/rewrite/LlmRewriter.ts:117 calls runSideQuery without respectOutputLanguagePreference: true. Before this PR, getOutputLanguageInstruction(config) was called unconditionally for ALL side queries, so LlmRewriter received the language rule. After the opt-in gate, it silently loses language enforcement. LlmRewriter produces user-visible ACP progress updates — a user who configured Chinese output will see Chinese titles, recaps, summaries, and subagent specs, but ACP rewrites will come back in whatever language the model picks.
| const systemInstruction = await applyOutputLanguagePreference( | |
| const result = await runSideQuery(this.config, { | |
| purpose: 'acp-rewrite', | |
| model, | |
| maxAttempts: 1, | |
| systemInstruction: this.prompt, | |
| respectOutputLanguagePreference: true, |
— qwen-latest-series-invite-beta-v38 via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch. LlmRewriter now calls runSideQuery with respectOutputLanguagePreference: true, and LlmRewriter.test.ts asserts that option for user-visible rewrites. Verified with npm run test --workspace=packages/cli -- src/acp-integration/session/rewrite/LlmRewriter.test.ts.
| describe('runSideQuery', () => { | ||
| let mockBaseLlmClient: BaseLlmClient; | ||
| let mockConfig: Config; | ||
| let abortController: AbortController; |
There was a problem hiding this comment.
[Suggestion] No negative test for the opt-in gate. Every test in this file sets respectOutputLanguagePreference: true. There is no test verifying that when the flag is omitted (or false) AND a valid output-language.md file exists, the system instruction is passed through unchanged. This is the central behavioral change of the PR — old code always appended, new code is opt-in — yet no test covers the false/omitted path.
A regression that re-introduces unconditional language injection would go undetected.
| let abortController: AbortController; | |
| beforeEach(() => { | |
| clearOutputLanguagePreferenceCache(); | |
| abortController = new AbortController(); | |
| mockBaseLlmClient = { | |
| generateJson: vi.fn(), | |
| generateText: vi.fn(), | |
| } as unknown as BaseLlmClient; | |
| mockConfig = { | |
| getBaseLlmClient: vi.fn(() => mockBaseLlmClient), | |
| getModel: vi.fn(() => 'qwen3-coder-plus'), | |
| getFastModel: vi.fn(() => 'fast-model'), | |
| getOutputLanguageFilePath: vi.fn(), | |
| } as unknown as Config; | |
| }); |
Add a test like:
it('does not append language when respectOutputLanguagePreference is omitted', async () => {
mockTextResult('ok');
vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
writeOutputLanguageFile('You MUST always respond in Chinese.'),
);
await runSideQuery(mockConfig, {
purpose: 'p',
contents: [{ role: 'user', parts: [{ text: 'q' }] }],
abortSignal: abortController.signal,
systemInstruction: 'Summarize the tool batch.',
});
expect(vi.mocked(mockBaseLlmClient.generateText).mock.calls[0][0].systemInstruction)
.toBe('Summarize the tool batch.');
});— qwen-latest-series-invite-beta-v38 via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch. sideQuery.test.ts includes does not append output language preference when opt-in is omitted, which writes a valid output-language.md and verifies the system instruction is unchanged without respectOutputLanguagePreference. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.
| if (systemInstruction === undefined) return outputLanguageInstruction; | ||
| text: string, | ||
| ): string | Part | Part[] | Content { | ||
| if (!systemInstruction) { |
There was a problem hiding this comment.
[Suggestion] appendSystemInstructionText handles five systemInstruction variants (string, Part, Part[], Content, undefined), but the Part[] and Content branches are untested — every test in this PR passes a string or undefined. If a future caller passes a non-string systemInstruction with respectOutputLanguagePreference: true, these three branches (Part[] append, Content.parts append, single-Part wrapping) have no regression protection.
Consider adding at least one test per branch, e.g.:
it('appends to Part[] system instructions', async () => {
mockTextResult('ok');
vi.mocked(mockConfig.getOutputLanguageFilePath).mockReturnValue(
writeOutputLanguageFile('Respond in Chinese.'),
);
await runSideQuery(mockConfig, {
purpose: 'p',
contents: [{ role: 'user', parts: [{ text: 'q' }] }],
abortSignal: abortController.signal,
systemInstruction: [{ text: 'existing prompt' }],
respectOutputLanguagePreference: true,
});
const parts = vi.mocked(mockBaseLlmClient.generateText).mock.calls[0][0].systemInstruction;
expect(parts).toEqual([
{ text: 'existing prompt' },
expect.objectContaining({ text: expect.stringContaining('Respond in Chinese.') }),
]);
});— qwen-latest-series-invite-beta-v38 via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch. sideQuery.test.ts now covers Part[], Content, and single-Part system instructions with respectOutputLanguagePreference: true, verifying the appended preference is preserved in the correct shape. Verified with npm run test --workspace=packages/core -- src/utils/sideQuery.test.ts.
d8db7cb to
6ae137a
Compare
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
DragonnZhang
left a comment
There was a problem hiding this comment.
Review Summary
Verdict: No new findings. This PR has been reviewed exhaustively (8+ prior review rounds, 23 inline comments, 9 issue comments). After reading all 24 changed files in full and cross-referencing every existing comment, I found no high-confidence issues that are not already tracked.
What was checked
- sideQuery.ts core logic:
respectOutputLanguagePreferenceopt-in gate,readOutputLanguagePreferencecaching with stat-based invalidation,appendSystemInstructionTextfor all 5systemInstructionvariants,clearOutputLanguagePreferenceCacheselective and full clear,isJsonOptionstype guard, concurrent-call safety via sharedPromisededuplication. Implementation is correct. - All 11 caller sites: sessionTitle, sessionRecap, toolUseSummary, suggestionGenerator, web-fetch, subagentGenerator, ArenaManager, DataProcessor (both
analyzeSessionandgenerateQualitativeInsights), LlmRewriter, summaryCommand (correctly skipped with explanatory comment). Each user-visible caller opts in; internal/classifier callers do not. - Test coverage: ~512 new lines of tests covering positive opt-in for text and JSON modes, negative paths (missing file, blank file, undefined path),
Part[]/Content/single-Partsystem instruction variants, cache refresh on file change, selective and full cache clear, transient read failure recovery, and opt-in gate negative test. Coverage is thorough. - languageCommand.ts:
clearOutputLanguagePreferenceCache()called afterupdateOutputLanguageFile()with clear-all (no path arg) to handle global-vs-project path differences. Correct. - config.ts:
getOutputLanguageFilePathrelocated for grouping; no behavioral change.
CI status
CI is currently failing (Lint, CodeQL, Test, Post Coverage Comment, Classify PR). Some failures appear environment-related (the macOS AskUserQuestionDialog test failure mentioned by the author). Approve downgraded to Comment per presubmit policy.
Open items from prior reviewers
Several open inline comments remain valid and tracked:
- Path mismatch between
updateOutputLanguageFile(global) andgetOutputLanguageFilePath(project-scoped) — mitigated by clear-all strategy sessionRecap.test.tsreduced from end-to-end to builder-only testingsummaryCommand.tsdouble-injection concern — addressed by inline comment explaining the carried system instruction already includes the language rule
— qwen-code via Qwen Code /review
|
Updated this branch against current The
Local validation on Windows:
Note: full |
DragonnZhang
left a comment
There was a problem hiding this comment.
Incremental Review (R2) — delta since R1
HEAD: 8730d9f06579 (two new commits: merge from origin/main + test: normalize dev launcher paths on Windows)
CI: 13/13 checks pass.
Delta reviewed
scripts/tests/dev.test.js— AddsnormalizePathhelper that converts Windows backslashes to forward slashes before substring assertions. Correct fix:String(filePath).endsWith('node_modules/tsx/dist/cli.mjs')would fail on native Windows paths wherefilePathcontains backslashes. The refactor to destructurespawnMock.mock.calls[0]and assert each field individually is clearer than the prior monolithictoHaveBeenCalledWith.
Verdict
No high-confidence findings in the delta. The Windows path normalization is a clean, well-scoped test fix.
The six stale inline comments from prior reviewers (IDs 3300572305, 3301123073, 3307003468, 3308736776, 3312297210, 3337943627) are all addressed in the current code:
- sessionRecap.ts language override — the appended text includes an explicit override sentence.
- JSON negative coverage — mirrored text-mode negative tests now present.
- languageCommand path mismatch —
clearOutputLanguagePreferenceCache()(no-arg) correctly clears the entire map. - summaryCommand double injection — correctly omits
respectOutputLanguagePreference: truesince the carried-over system instruction already embeds the rule. - Transient readFile caching — the
.catchhandler now deletes the cache entry on read failure. - sessionRecap test refactor — the
generateSessionRecapend-to-end test is retained withrunSideQuerymocked;buildSessionRecapSideQueryOptionsgets separate unit coverage.
— qwen3-coder via Qwen Code /review
DragonnZhang
left a comment
There was a problem hiding this comment.
Review: APPROVE
Scope: 30 files, +1151/-101 — adds opt-in respectOutputLanguagePreference to runSideQuery for user-visible side queries (session title, recap, tool-use summary, suggestions, web-fetch summaries, subagent artifacts, arena summaries, /insight reports, LLM rewriter). Includes stat-based file cache with invalidation on language change. Bundles two small fixes (cronScheduler creation-minute guard, chatCompression hard-threshold formula) from upstream main merges.
CI: All checks pass (Lint, Test x3 platforms, CodeQL).
Findings: No high-confidence bugs, security issues, or critical logic errors identified.
Analysis
- Side-query opt-in coverage: Correctly applied to all 10 user-visible call sites (sessionTitle, sessionRecap, toolUseSummary, suggestionGenerator, web-fetch, subagentGenerator, ArenaManager, LlmRewriter, DataProcessor x2). Intentionally omitted from internal classifiers (permissions/classifier, relevanceSelector, forget, nextSpeakerChecker, chatCompression) and from summaryCommand (which already carries the language rule through user memory).
- Caching: Stat-based signature (
mtimeMs:size) with module-level Map keyed by file path. Race-safe: concurrent reads share the same Promise; readFile failures clean up only the current entry.clearOutputLanguagePreferenceCache()called on language change in languageCommand, clearing all entries to handle global/project path differences. appendSystemInstructionText: Correctly handles all system instruction shapes (string, Part, Part[], Content with parts array, and single Part fallback).- Session recap refactor: Extracted
buildSessionRecapSideQueryOptionsfor testability without behavioral change. - Bundled fixes from main merges: cronScheduler
lastFiredAtinitialization and chatCompressionhardthreshold formula — both sound and already validated by their own PRs.
|
@Jerry2003826 heads up — this PR currently has merge conflicts with Conflicting files:
The rest merges cleanly. Thanks! 中文@Jerry2003826 提个醒 —— 这个 PR 目前和 冲突文件:
其余文件可以自动合并。谢谢! |
Merge upstream/main into the side-query output language branch and reconcile four conflicting files: - languageCommand: use writeOutputLanguageAndRegisterPath from main while keeping clearOutputLanguagePreferenceCache on language change - languageCommand.test: align mocks with config-bound path registration - config: keep get/setOutputLanguageFilePath near getUserMemory and remove the duplicate getter - sessionRecap: keep buildSessionRecapSideQueryOptions opt-in path with upstream debug logging Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>
Verification: honor output language in user-visible side queriesVerdict: PASS for the stated feature — verified end-to-end against a real model. One material caveat: this PR also bundles an undisclosed second change (DeepSeek prompt-cache work) that contradicts the PR body's "No unrelated … provider … changes." That's a merge-hygiene call for you, not a defect in the output-language feature. I built the PR in an isolated worktree (real Claim (my read of the diff): Method: worktree Steps
Sample (live TUI, French throughout): FindingsThe output-language feature works as intended. These are for your merge decision:
Not runtime-exercised: 6 of the 9 user-visible callers (verified by code audit, same 中文版本(点击展开)验证:让用户可见的 side query 遵守输出语言结论:通过(PASS)(针对 PR 声明的功能,已对真实模型端到端验证)。一个重要提醒: 本 PR 还夹带了一个未声明的第二项改动(DeepSeek 提示词缓存相关),这与 PR 描述中「No unrelated … provider … changes(无无关的 provider 改动)」相矛盾。这是合并卫生层面的判断,交给你定夺,并非输出语言功能本身的缺陷。 我在隔离 worktree 中构建本 PR(真实 被验证的声明: 方法: worktree 步骤
实时 TUI(全程法语): 观察输出语言功能符合预期。以下供合并决策参考:
未在运行时验证: 9 个用户可见调用中的 6 个(经代码审计,与已驱动的 3 个机制相同);DeepSeek 专属的工具排序触发(把关于 |
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review
✅ Local Linux verification — author's commands, a real-execution harness, and a live tmux side-query sessionVerified this PR three ways on Linux (the table marks macOS/Linux as CI-only), including the live tmux Before/After evidence requested above. Posting as a merge reference. Setup
1) Author's Reviewer Test Plan reproduced — all green
My earlier review's three missed user-visible callers now all opt in: 2) Real-execution harness against the compiled
|
Minimal-change assessment: this PR overshootsTL;DR: The bug this PR targets (#4494) is already fixed on
|
main (after #4636) |
this PR | |
|---|---|---|
| Mechanism | runSideQuery unconditionally appends output-language.md to every side query |
opt-in: append only when respectOutputLanguagePreference: true |
| Size | ~20 lines | +1095 / −71, 23 files (12 source + 11 test) |
| The 4 features in #4494 (recap / title / tool-use summary / suggestions) | already honor output language | behavior unchanged (just default-on → explicit-on) |
The net behavioral delta is small
After opting in all 10 user-visible callers, the only real change vs main is that ~6–7 internal structured queries stop receiving the append: permissions/classifier (×2), nextSpeakerChecker, relevanceSelector, forget, chatCompression, project-summary.
I checked how those results are consumed — they're constrained by a JSON schema (shouldBlock: boolean, next_speaker: 'user'|'model', ID lists). The unconditional append on main does not break them; at most it renders an internal reasoning free-text field in another language. So the opt-in inversion is a tidiness change, not a correctness fix.
The one genuine improvement
sessionTitle / sessionRecap still carry a Match the dominant language of the conversation line that conflicts with the appended preference, so the model can still fall back to English. This PR removes those lines and adds "This preference overrides any earlier language-selection rule". That is the one defensible user-facing win — and it's ~4 lines, independent of the opt-in flip or the cache.
A new risk this PR introduces
The new cache keys on mtimeMs:size. An external edit to output-language.md of the same byte length within the mtime-granularity window can serve stale content. The clearOutputLanguagePreferenceCache() call in languageCommand only patches the /language path; edits made outside it can still go stale. main re-reads the file every time and never has this problem — this is complexity added to optimize, then more code to patch the hole the optimization opened.
Separately, the opt-in direction is a maintainability regression: any future user-visible side query silently regresses to English unless the author remembers to opt in, whereas main's default-on makes new callers correct automatically.
What the minimal version looks like
If the desired gain is "don't append to internal queries," the minimal path is keep default-on + add an opt-out:
- Add
skipOutputLanguagePreference?: boolean(defaultfalse) torunSideQuery; - Set it on the ~6–7 internal callers;
- Remove the conflicting hint in title/recap and add the "overrides" wording.
That's roughly 8 files / 30–40 lines, no new public API, no cache, no languageCommand change — and it annotates fewer call sites (6–7) than this PR's opt-in (10), while preserving the safe default.
Bottom line: the opt-in + cache redesign is a reasonable long-term design, but #4494 is already fixed by #4636 in ~20 lines. Replacing it with ~1095 lines for marginal net gain plus a new staleness risk overshoots the minimal-change bar. The ~4-line conflicting-hint cleanup is the part worth keeping.
中文版
最小改动评估:这个 PR 超标了
结论速览: 这个 PR 想修的 bug(#4494)已经在 main 上被 #4636 修复(1c48e412,"apply output language to side queries"),#4494 已于 2026-05-31 以 COMPLETED 关闭。所以它不是在修一个未解决的 bug,而是把一个已经能工作的机制重写成约 1095 行的重构;相对 main 的净用户可见收益很边际,还引入了一个 main 没有的缓存陈旧风险。
main 与本 PR 对比
main(#4636 之后) |
本 PR | |
|---|---|---|
| 机制 | runSideQuery 对所有 side query 无条件追加 output-language.md |
opt-in:只有传 respectOutputLanguagePreference: true 才追加 |
| 体量 | ~20 行 | +1095 / −71,23 个文件(12 源文件 + 11 测试) |
| #4494 里的 4 个功能(recap / title / tool-use summary / suggestion) | 已经遵守输出语言 | 行为不变(只是从"默认开"变成"显式开") |
净行为差异其实很小
把 10 个用户可见调用点全部 opt-in 之后,相对 main 唯一真正的变化是:约 6~7 个内部结构化 query 不再被追加语言指令——permissions/classifier(×2)、nextSpeakerChecker、relevanceSelector、forget、chatCompression、project-summary。
我核对了这些返回值的消费方:它们都由 JSON schema 约束(shouldBlock: boolean、next_speaker: 'user'|'model'、ID 列表)。main 上的无条件追加并不会破坏它们,最多让内部的 reasoning 自由文本变成另一种语言。所以 opt-in 翻转改的是「整洁度」,不是「正确性」。
唯一一个真正的改进
sessionTitle / sessionRecap 里仍留着一句 Match the dominant language of the conversation,它和追加进来的语言偏好互相冲突,模型仍可能回退到英文。这个 PR 删掉了那两句、并加上了 "This preference overrides any earlier language-selection rule"。这是它唯一站得住脚的用户可见改进——而且只有约 4 行,与 opt-in 翻转、缓存都无关。
它新引入的风险
新缓存以 mtimeMs:size 作为签名。如果在 mtime 粒度窗口内对 output-language.md 做一次等字节长度的外部编辑,缓存可能发陈旧内容。languageCommand 里的 clearOutputLanguagePreferenceCache() 只覆盖了 /language 这条路径,走它之外的编辑仍可能 stale。main 每次都重新读文件,根本没有这个问题——这是「为优化引入复杂度,再写代码去补优化挖的坑」。
另外,opt-in 这个方向本身是可维护性倒退:以后任何新增的用户可见 side query,只要作者忘了 opt-in 就会静默回退成英文;而 main 的「默认开」让新调用点自动正确。
最小版本应该长什么样
如果想要的收益是「不要给内部 query 追加语言」,最小做法是保持默认开 + 加一个 opt-out:
- 给
runSideQuery加skipOutputLanguagePreference?: boolean(默认false); - 在那 6~7 个内部调用点设置它;
- 删掉 title/recap 里冲突的 hint,并加上 "overrides" 文案。
这样大约 8 个文件 / 30~40 行,不需要新的公共 API、不需要缓存、不需要碰 languageCommand——而且标注的调用点(6~7 个)比本 PR 的 opt-in(10 个)还更少,同时保住了「默认安全」这个属性。
一句话: opt-in + 缓存的重设计作为长期设计是合理的,但 #4494 已被 #4636 用约 20 行修好。用约 1095 行去置换它、净收益边际、还新引入陈旧风险——以「最小改动原则」衡量是超标的。真正值得保留的是删冲突 hint 那约 4 行。
Replace the oversized opt-in refactor with the minimal approach from review feedback: keep main's default-on language append, add skipOutputLanguagePreference for internal structured queries, and remove the conflicting English/Chinese-only hints from session title/recap prompts so the appended preference can take effect. Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>
dc3cd09 to
bf77c7b
Compare
🧪 Local runtime verification (real CLI + mock provider, tmux)This change is prompt-construction routing, so I verified it on the real on-the-wire system instruction: a real TL;DR
Evidence 1 — Real-wire A/B (tmux, mock provider)Captured side-query system instructions from a live session with the preference configured:
The counterfactual is the clincher: I surgically deleted
Evidence 2 — Complete routing audit (every
|
skipOutputLanguagePreference: true (internal — 7) |
default = inject (user-visible — 9) |
|---|---|
permission_classifier_stage1 / stage2 |
session-title, session-recap |
auto-memory-recall, auto-memory-forget-selection |
tool-use-summary, web-fetch |
next-speaker |
arena-approach-summary, subagent-generator |
chat-compression |
prompt-suggestion, insight-session-analysis / -qualitative-generate |
project-summary |
acp-rewrite |
Both non-obvious calls check out:
acp-rewritestays inject — its prompt produces "user-friendly progress updates" (user-visible), so it should honor the language. ✓project-summaryskips — but it passessystemInstruction: chatSystemInstruction(the main session prompt, which already embeds the language rule), so skipping just avoids a duplicate; the summary still respects the language. ✓
Evidence 3 — Mechanism, the new override line, and test teeth
sideQuery.test.ts: 21 pass, including the new "skips output language when skipOutputLanguagePreference is true". Mutation check: makerunSideQueryignore the flag (always inject) → that test fails, so it genuinely pins the behavior.- The PR also adds "This preference overrides any earlier language-selection rule in this system instruction." to the injected block. It appears on the wire when injected (confirmed in the counterfactual), and it's what lets
session-title/session-recapsafely drop their own per-prompt language lines — the centralized rule now explicitly wins over any earlier one.
Evidence 4 — Description ≠ implementation (please fix before merge)
The PR body and its "Reviewer Test Plan" describe a different design than the code:
| PR body says | Actual diff |
|---|---|
opt-in flag respectOutputLanguagePreference: true |
opt-out flag skipOutputLanguagePreference: true (0 occurrences of the body's name) |
DataProcessor.ts opts the /insight queries in; DataProcessor.test.ts added |
DataProcessor.ts unchanged (insight injects by default); no such test |
sessionRecap.test.ts added |
not in the diff |
The commit message ("skip output language on internal side queries") matches the real code; the PR description appears to be from an earlier revision. The implementation is fine — just update the prose/test-plan so history and reviewers aren't misled.
Verdict
✅ Safe to merge once the description is refreshed. The routing is correct end-to-end: internal/structured queries (classifiers, memory selectors, next-speaker, compression, project-summary) no longer get told to answer in the user's language — which is what could corrupt their JSON/enum output (#4494) — while every user-visible summary still honors output-language.md.
🇨🇳 中文版(点击展开)
🧪 本地运行验证(真实 CLI + mock provider,tmux)
这个改动是 prompt 构造/路由,所以我在真实的线上 system instruction 上做了验证:用真实 qwen(基于本 PR 的 core 构建)连到一个会抓取每个 /v1/chat/completions body 的 mock OpenAI server,并配置了一个哨兵输出语言文件(~/.qwen/output-language.md → Klingon_SENTINEL_4519)。然后逐个 side query 检查注入的标记句 "Follow the user-visible output language preference below for this side query." 是否存在。基于 PR head bf77c7bb6。
结论速览
✅ 实现正确、路由正确。 内部/结构化 side query 不再收到输出语言规则;所有用户可见的仍然收到。一个提醒: PR 描述是过时的——它描述的是 opt-in 的
respectOutputLanguagePreference设计和DataProcessor.ts改动,而这些在 diff 里根本不存在(实际代码用的是 opt-out 的skipOutputLanguagePreference)。合并前请更新描述。
证据 1 —— 真实线上 A/B(tmux + mock provider)
在配置了语言偏好的真实会话里抓取的 side-query system instruction:
| Side query (purpose) | 类型 | 线上是否有输出语言标记 |
|---|---|---|
| 主 agent 轮 | 用户回复 | 有(来自主 prompt,不受影响) |
session-recap |
用户可见 | 有 ✓ |
chat-compression |
内部 | 没有 ✓ |
chat-compression(去掉 skip flag) |
反事实 | 又出现了 ⟵ 证明就是这个 flag 的作用 |
反事实是决定性的:我手术式删掉压缩调用点的 skipOutputLanguagePreference: true、重新构建,压缩请求上的标记(以及 PR 新增的 override 行和完整偏好块)又出现了——恢复后又消失。所以正是这个 flag 移除了注入。
session-recap 同时是回归检查:本 PR 删掉了 recap 自己的 "Match the dominant language…" 行,于是 recap 现在依赖集中式注入——线上确认它仍然拿到了。
证据 2 —— 完整路由审计(每一个 runSideQuery 调用点)
skipOutputLanguagePreference: true(内部 — 7) |
默认 = 注入(用户可见 — 9) |
|---|---|
permission_classifier_stage1 / stage2 |
session-title, session-recap |
auto-memory-recall, auto-memory-forget-selection |
tool-use-summary, web-fetch |
next-speaker |
arena-approach-summary, subagent-generator |
chat-compression |
prompt-suggestion, insight-session-analysis / -qualitative-generate |
project-summary |
acp-rewrite |
两个不那么显然的调用点也都没问题:
acp-rewrite保持注入——它的 prompt 产出 "user-friendly progress updates"(用户可见),应当遵守语言。✓project-summary跳过——但它传了systemInstruction: chatSystemInstruction(主会话 prompt,已经内嵌语言规则),所以跳过只是避免重复;摘要仍然遵守语言。✓
证据 3 —— 机制、新增 override 行、测试牙齿
sideQuery.test.ts:21 通过,含新增的 "skips output language when skipOutputLanguagePreference is true"。变异测试: 让runSideQuery忽略该 flag(恒注入)→ 该测试失败,说明它真的钉住了行为。- PR 还在注入块里加了 "This preference overrides any earlier language-selection rule in this system instruction."。注入时它会出现在线上(反事实里确认),并且正是它让
session-title/session-recap能安全删掉各自 prompt 里的语言行——集中式规则现在明确地压过任何更早的规则。
证据 4 —— 描述 ≠ 实现(合并前请修正)
PR 正文和它的 "Reviewer Test Plan" 描述的是和代码不一样的设计:
| PR 正文说 | 实际 diff |
|---|---|
opt-in flag respectOutputLanguagePreference: true |
opt-out flag skipOutputLanguagePreference: true(正文那个名字 0 次出现) |
DataProcessor.ts 让 /insight opt in;新增 DataProcessor.test.ts |
DataProcessor.ts 未改动(insight 默认注入);没有这个测试 |
新增 sessionRecap.test.ts |
不在 diff 里 |
commit message("skip output language on internal side queries")和真实代码一致;PR 描述看起来是早期版本残留。实现没问题——只需更新文字/测试计划,免得误导历史和审阅者。
结论
✅ 更新描述后即可合并。 路由端到端正确:内部/结构化查询(分类器、记忆选择器、next-speaker、压缩、project-summary)不再被要求用用户语言回答——这正是会破坏它们 JSON/enum 输出的原因(#4494)——而每个用户可见的摘要仍然遵守 output-language.md。
Method: real qwen (bundled from this PR's core) + a mock OpenAI server capturing every request body, in tmux; sentinel ~/.qwen/output-language.md; per-side-query marker check; surgical flag-removal counterfactual; complete call-site audit; sideQuery.test.ts + a stub-mutation teeth check. PR head bf77c7bb6.
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. The opt-out flip is clean — caller categorization is correct (internal callers skip, user-visible callers get the preference by default), override sentence resolves manual-language conflicts, and the test covers both inject and skip paths. — qwen3.7-max via Qwen Code /review
DragonnZhang
left a comment
There was a problem hiding this comment.
Well-designed fix: skipOutputLanguagePreference flag prevents the user's output-language.md rule from being injected into internal machine-to-machine side queries (permission classifier, compression, memory recall, next-speaker checker, session title, recap). The flag defaults to false so new user-visible side queries still honor the preference. Removed redundant "match the dominant language" instructions from recap/title prompts. Test coverage validates the skip behavior. CI green. LGTM ✅ — claude-opus-4-6 via Qwen Code /review
|
@qwen-code /triage |
|
Thanks for the PR! Template looks good ✓ — all required sections present. On direction: the bug (#4494) was already fixed on On approach: the current diff (+55/-4 across 10 files) is tight and focused. Each change serves the stated goal. Two minor notes:
Neither is blocking. Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ 方向:底层 bug (#4494) 已被 #4636(2026-05-31 合并)修复——对所有 side query 无条件注入输出语言。本 PR 是在此基础上的优化:为不产生用户可见文本的内部 side query(分类器、压缩、记忆选择器、next-speaker 检查器)增加 方案:当前 diff(+55/-4,10 个文件)紧凑且聚焦,每处改动都服务于既定目标。两个小问题:
两者都不构成阻塞。进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Code ReviewIndependent proposal (before reading diff): add an opt-out flag to The PR matches this proposal exactly. Caller categorization is correct: Correctly skipped (internal, not user-visible):
Correctly getting language by default (user-visible):
No critical blockers. No AGENTS.md violations. The override sentence added to Test ResultsAll targeted tests pass on the PR branch: Smoke Test (tmux)CLI starts and exits cleanly. The help display is from an incorrect flag name in my test, not a PR issue. The prompt-construction change is verified by unit tests (the new test asserts the system instruction does NOT contain the output language instruction when 中文说明代码审查独立方案(看 diff 前):为 PR 与方案完全一致。调用者分类正确: 正确跳过(内部,非用户可见): classifier(权限分类两阶段)、chatCompressionService、forget(记忆选择)、relevanceSelector(自动记忆召回)、nextSpeakerChecker、summaryCommand(使用已含语言偏好的主 agent systemInstruction)。 正确默认获取语言指令(用户可见): sessionRecap(移除旧提示 ✅)、sessionTitle(移除旧提示 ✅)。 无阻塞性问题,无 AGENTS.md 违规。 测试结果所有目标测试在 PR 分支上通过:sideQuery 21 项、classifier 23 项、DataProcessor 45 项、TypeScript 类型检查通过、ESLint 通过、构建通过。 冒烟测试CLI 正常启动和退出,无崩溃或运行时错误。prompt 构造变更由单元测试验证。 — Qwen Code · qwen3.7-max |
ReflectionThis PR went through 8+ review rounds and has been reduced to exactly what it should be: a focused refinement on top of #4636's unconditional language injection. The diff is tight (+55/-4, 10 files), every change earns its place, and the caller categorization is correct. The core value: internal side queries (classifiers, compression, memory selectors) no longer receive a language instruction they don't need, saving tokens and avoiding potential model confusion. The redundant "Match the dominant language" hints in sessionRecap and sessionTitle are cleaned up now that automatic injection handles it. The override sentence in My independent proposal matched this PR's approach exactly. No simpler path was missed. The two notes from Stage 1 (stale PR body mentioning Verdict: Approve. Ship it. ✅ 中文说明反思本 PR 经历了 8+ 轮审查,已精简为应有的形态:在 #4636 无条件语言注入基础上的聚焦优化。diff 紧凑(+55/-4,10 个文件),每处改动都有必要,调用者分类正确。 核心价值:内部 side query(分类器、压缩、记忆选择器)不再接收不需要的语言指令,节省 token 并避免模型混淆。sessionRecap 和 sessionTitle 中冗余的"匹配主要语言"提示已被清理。
独立方案与本 PR 完全一致,未发现更简路径。 Stage 1 提到的两个问题(PR 描述过时、override 语句未说明)是表面问题——不影响正确性或合并就绪度。作者可在后续提交或合并时更新描述。 结论:批准。 可以合并。✅ — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
What this PR does
Honors the configured output language for user-visible side queries without duplicating the language instruction in project-summary prompts.
Why it's needed
Some side-query results are shown directly to users, such as session titles, session recaps, tool-use summaries, suggestions, web-fetch summaries, generated subagent artifacts, arena summaries, and
/insightreports. These should followoutput-language.md. Internal structured side queries and summary prompts that already carry the language instruction should not receive duplicate language rules.Reviewer Test Plan
How to verify
Review the changed
runSideQuerycall sites and confirm only user-visible side queries passrespectOutputLanguagePreference: true, while internal classifier/selector-style calls and project-summary prompts do not. Then run the targeted checks below.npm run test:ci --workspace=packages/core -- src/permissions/classifier.test.ts src/services/sessionRecap.test.ts src/utils/sideQuery.test.ts npm run test --workspace=packages/cli -- src/services/insight/generators/DataProcessor.test.ts npx prettier --check packages/core/src/services/sessionRecap.test.ts packages/cli/src/services/insight/generators/DataProcessor.ts packages/cli/src/services/insight/generators/DataProcessor.test.ts npx eslint packages/core/src/services/sessionRecap.test.ts packages/cli/src/services/insight/generators/DataProcessor.ts packages/cli/src/services/insight/generators/DataProcessor.test.ts npm run typecheck --workspace=packages/core git diff --checkExpected result: the core side-query tests pass, the insight generator tests pass, and the DataProcessor calls for
insight-session-analysisandinsight-qualitative-generateboth includerespectOutputLanguagePreference: true.Evidence (Before & After)
Before: switching
runSideQueryfrom unconditional output-language injection to opt-in could make/insightreports ignoreoutput-language.md, and the session recap path lacked direct coverage thatgenerateSessionRecap()wires the recap side query with output-language opt-in.After:
DataProcessoropts in for both user-visible/insightJSON side queries,DataProcessor.test.tsverifies both call boundaries, andsessionRecap.test.tsverifiesgenerateSessionRecap()calls the recap side query withrespectOutputLanguagePreference: trueand extracts the tagged recap. This is prompt-construction behavior, so TUI screenshots are N/A.Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. Full local
packages/clitypecheck is not listed as a passing local check because this checkout currently reports unrelated generated/package-link errors in serve/acp/channel/web-template areas; the changed files pass targeted tests and eslint, and CI runs the repository matrix.Risk & Scope
Linked Issues
Fixes #4494
中文说明
这个 PR 做了什么
让用户可见的 side query 遵守
output-language.md,同时避免在 project summary prompt 里重复注入语言要求。为什么需要
session title、session recap、tool-use summary、follow-up suggestion、web-fetch summary、subagent artifact、arena summary 和
/insightreport 这些内容会展示给用户,因此应该遵守输出语言偏好。内部 classifier/selector 类 side query 不应该被扩大影响。Reviewer Test Plan
审核时可以检查用户可见的
runSideQuery调用是否传入respectOutputLanguagePreference: true,内部结构化调用是否没有 opt in。已在 Windows 本地验证 core side-query/sessionRecap 组合测试、CLI DataProcessor 测试、changed-file prettier/eslint、core typecheck 和git diff --check。证据
修改前:opt-in 切换后
/insight这类用户可见报告可能漏掉输出语言偏好,session recap 外层调用也缺少直接覆盖。修改后:DataProcessor两个/insightside query 都 opt in,测试覆盖 call boundary;generateSessionRecap()测试确认 recap side query opt in 并提取<recap>内容。该改动是 prompt 构造逻辑,TUI 截图不适用。风险和范围
风险主要在 opt-in 覆盖范围:需要覆盖用户可见 side query,但不扩大到内部 classifier/selector。没有公共 API 或迁移影响。