fix(core): truncate model-facing tool output#4520
Conversation
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] prompt_id not passed to truncateToolOutput — ToolOutputTruncatedEvent is constructed with prompt_id: '' (in truncation.ts:141), but scheduledCall.request.prompt_id is available here. Truncation telemetry events cannot be correlated with specific prompts or turns. Fix: add a promptId parameter to truncateToolOutput and pass it from the scheduler.
— qwen3.7-max via Qwen Code /review
| } | ||
| } | ||
|
|
||
| if (typeof content === 'string') { |
There was a problem hiding this comment.
[Critical] truncateToolOutput failure converts successful tool calls into errors
The await truncateToolOutput(...) call has no local try/catch. It sits inside an outer try whose catch treats any exception as a tool execution failure. If config.getTruncateToolOutputThreshold(), crypto.randomBytes, or logToolOutputTruncated throws, a tool that executed successfully gets reported as failed to the model — with a cryptic error that doesn't mention truncation.
Truncation is a presentation-layer concern and must never cause a successful tool call to appear as an error.
| if (typeof content === 'string') { | |
| if (typeof content === 'string') { | |
| try { | |
| const truncated = await truncateToolOutput( | |
| this.config, | |
| toolName, | |
| content, | |
| ); | |
| content = truncated.content; | |
| contentLength = content.length; | |
| } catch (truncErr) { | |
| // Truncation is best-effort. If it fails, pass the original | |
| // content through unchanged rather than failing the tool call. | |
| console.warn( | |
| `[CoreToolScheduler] truncation failed for ${toolName}:`, | |
| truncErr instanceof Error ? truncErr.message : truncErr, | |
| ); | |
| } | |
| } |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in e3075df. The scheduler now wraps truncation in a local try/catch, logs a warning on truncation failure, and preserves the original successful tool output. Added a regression test that forces threshold lookup to throw and asserts the tool call still completes as success with the original output.
| if (toolResult.error === undefined) { | ||
| let content = toolResult.llmContent; | ||
| const contentLength = | ||
| let contentLength = |
There was a problem hiding this comment.
[Suggestion] contentLength stale after hook/reminder appends when truncation doesn't fire
The const → let change only updates contentLength inside the truncation block. But content is also mutated at two earlier points (PostToolUse hook additionalContext and conditional rules/skill reminders) that do NOT update contentLength. When hooks/reminders append text but the total stays under threshold, contentLength underreports the actual content size.
Fix: add a single contentLength recalculation after ALL content mutations, just before convertToFunctionResponse:
contentLength = typeof content === 'string' ? content.length : undefined;— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in e3075df. contentLength is now recomputed after post-hook context, rule/skill reminders, and truncation handling. Added a regression test where PostToolUse appends context without truncation and asserts contentLength matches the final model-facing content.
| toolName, | ||
| content, | ||
| ); | ||
| content = truncated.content; |
There was a problem hiding this comment.
[Suggestion] truncated.outputFile discarded — user-facing returnDisplay not updated
The scheduler only reads truncated.content and ignores truncated.outputFile. For tools that don't self-truncate (the exact target audience of this safety net), the user's TUI display (returnDisplay) never mentions that truncation occurred or where the full output was saved. Compare with shell.ts which appends "Output too long and was saved to: <path>" to the display message.
Consider surfacing the truncation in returnDisplay:
if (truncated.outputFile) {
toolResult.returnDisplay =
(typeof toolResult.returnDisplay === 'string' ? toolResult.returnDisplay + '\n' : '') +
`Output too long and was saved to: ${truncated.outputFile}`;
}— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in e3075df. When truncation successfully saves the full output and returnDisplay is a string, the scheduler appends Output too long and was saved to: <path>. Added regression coverage for the user-facing display path.
| config: mockConfig, | ||
| onAllToolCallsComplete, | ||
| onToolCallsUpdate, | ||
| getPreferredEditor: () => 'vscode', |
There was a problem hiding this comment.
[Suggestion] No negative test for within-threshold passthrough
The single test only verifies that oversized output IS truncated. A regression that always truncates (e.g., an off-by-one in the threshold comparison, or truncateToolOutput being called with wrong arguments) would pass the existing test undetected.
Add a second test with output under the threshold (e.g., 50 chars with threshold=100) asserting the output equals the original content exactly and does not contain [CONTENT TRUNCATED].
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in e3075df. The new passthrough regression test uses output below the threshold and asserts the model-facing output equals the original string exactly and does not contain [CONTENT TRUNCATED].
pomelo-nwu
left a comment
There was a problem hiding this comment.
Hi @Jerry2003826, thank you for your continued contributions — 9 PRs in a short time is impressive! 🎉
As we review your changes, we'd like to ask you to update each PR to follow the latest PR template on the main branch. The most important section is the Reviewer Test Plan, which significantly accelerates the review and merge process.
Specifically, for each PR please include:
- How to verify — clear reproduction steps so a reviewer can confirm the fix/feature
- Evidence (Before & After) — use the
tmux-real-user-testingskill (or manual tmux capture) to show before/after screenshots of the TUI behavior. Side-by-side evidence makes it much faster for maintainers to validate and merge - Tested on — fill in the OS table (macOS / Windows / Linux)
PRs with a complete Reviewer Test Plan are prioritized for review — without it, review may be delayed.
You can see the full template at: .github/pull_request_template.md
Thanks again for your effort — looking forward to getting these merged! 🚀
中文说明
你好 @Jerry2003826,感谢你的持续贡献——短时间内提交了 9 个 PR,非常高效!🎉
在 review 过程中,我们希望你能按照 main 分支上最新的 PR 模版更新每个 PR 的描述。其中最关键的部分是 Reviewer Test Plan,它能显著加速审核和合并流程。
具体来说,请为每个 PR 补充:
- How to verify — 清晰的复现步骤,让 reviewer 能确认修复/功能的效果
- Evidence (Before & After) — 使用
tmux-real-user-testingskill(或手动 tmux 截取)展示修改前后的 TUI 截图对比,前后对比能让维护者更快地验证和合并 - Tested on — 填写操作系统测试表格(macOS / Windows / Linux)
有完整 Reviewer Test Plan 的 PR 会被优先审核——缺少该部分可能会导致审核延迟。
完整模版见:.github/pull_request_template.md
再次感谢你的付出,期待尽快把这些 PR 合并!🚀
— Qwen Code
61c7865 to
e3075df
Compare
|
Updated in e3075df. This addresses the remaining review items: scheduler now passes Validation run locally on Windows: npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/truncation.test.ts -t "caller prompt id"
npm run test --workspace=@qwen-code/qwen-code-core -- src/core/coreToolScheduler.test.ts -t "tool output truncation"
npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/truncation.test.ts
npm run test --workspace=@qwen-code/qwen-code-core -- src/core/coreToolScheduler.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npm run lint --workspace=@qwen-code/qwen-code-core
npm run typecheck --workspace=@qwen-code/qwen-code-core
git diff --check |
|
The Current code passes Current PR checks are green across lint, CodeQL, macOS, Ubuntu, and Windows. |
|
Updated the PR description to match the latest reviewer template and removed the stale/garbled details block. The code is unchanged from e3075df; current CI is green across lint, CodeQL, macOS, Ubuntu, and Windows. Ready for re-review when you have time. |
|
|
||
| if (typeof content === 'string') { | ||
| try { | ||
| const truncated = await truncateToolOutput( |
There was a problem hiding this comment.
[Suggestion] Truncation runs AFTER hook context and <system-reminder> blocks are appended
At this point, content includes the raw tool output PLUS PostToolUse hook additionalContext, conditional-rule reminders, and skill-activation notices. The head/tail truncation algorithm is structure-blind — if the combined payload exceeds the threshold, it will bisect <system-reminder> XML envelopes, producing malformed markup the model then tries to interpret. Additionally, skill activation side effects (refreshSkills, notifyChangeListeners) have already fired, but the reminder telling the model about the new skill may get truncated away.
Consider moving the truncation block to run immediately after content = toolResult.llmContent (before PostToolUse hooks, conditional rules, and skill activation), so only raw tool output is measured and truncated. Then let hooks and reminders append onto the already-truncated content — they always reach the model intact.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Updated in ad9ee15. I moved the truncation block to run before PostToolUse additionalContext, conditional-rule reminders, and skill-activation reminders are appended. That keeps the head/tail truncator focused on the raw tool output and lets hook/reminder metadata reach the model intact after truncation.
| completedCalls[0].response.responseParts, | ||
| ); | ||
|
|
||
| expect(output).toContain('[CONTENT TRUNCATED]'); |
There was a problem hiding this comment.
[Suggestion] contentLength not asserted when truncation fires
This test triggers truncation and checks the model-facing output, but never reads completedCalls[0].response.contentLength. The new post-truncation contentLength reassignment (line ~2983 in coreToolScheduler.ts — the reason const was changed to let) has zero behavioral verification in the truncation path. Test 4 checks contentLength but only in the no-truncation case.
Consider adding:
expect(completedCalls[0].response.contentLength).toBe(output.length);
expect(completedCalls[0].response.contentLength).toBeLessThan(5000);— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in ad9ee15. The large-output truncation regression now asserts completedCalls[0].response.contentLength === output.length, so the post-truncation recomputation is covered when truncation actually fires.
| expect(completedCalls[0].response.resultDisplay).toEqual( | ||
| expect.stringContaining('Output too long and was saved to:'), | ||
| ); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] No test for hook context + truncation interaction
No test exercises both PostToolUse hook additionalContext AND truncation firing simultaneously. Test 4 uses hooks with threshold: 1000 and a 12-char output (no truncation fires). Test 1 triggers truncation without hooks. The integration boundary — where hook context pushes combined content over the threshold — is untested.
This matters especially if the truncation-ordering suggestion above is not adopted: hook-injected content (e.g., policy reminders) could silently disappear when the combined payload triggers truncation.
Consider adding a test with hooks enabled, output + hook context exceeding the threshold, and asserting the truncation marker is present, contentLength reflects the truncated length, and the full-output file contains the hook context.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added a regression test in ad9ee15: preserves hook context outside truncated tool output. It enables PostToolUse hooks, forces truncation, asserts the truncation marker/contentLength, and verifies the full hook context is appended after the truncated tool output instead of being bisected. Since the implementation now truncates raw tool output before appending hook/reminder metadata, the saved full-output file intentionally captures the raw tool output rather than the later hook context.
| config: Config, | ||
| toolName: string, | ||
| content: string, | ||
| promptId = '', |
There was a problem hiding this comment.
[Suggestion] promptId = '' default masks telemetry gaps in existing callers
shell.ts:2062 and mcp-tool.ts:438 call truncateToolOutput with 3 arguments (no promptId), so their telemetry events carry prompt_id: ''. This defeats the purpose of adding promptId for telemetry correlation — two of the three callers emit truncation events that cannot be traced to a specific prompt.
Consider either making promptId required (so the compiler enforces it), or — if the plan is to consolidate truncation at the scheduler level — removing the tool-level truncateToolOutput calls from shell.ts and mcp-tool.ts entirely. That would also eliminate the double-truncation edge case where scheduler-level truncation runs on already-truncated content after hooks push it back over the threshold.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Documented the fallback in ad9ee15. The scheduler-level path now passes scheduledCall.request.prompt_id, which covers the model-facing history truncation added by this PR. I kept promptId optional for the existing direct tool-level callers because shell/MCP execute() do not receive scheduler request context today, and removing those direct truncation guards would widen this PR by changing behavior for direct tool invocations outside the scheduler. The JSDoc now calls out that scheduler callers should pass the prompt id while direct tool implementations may omit it when they do not have that context.
e3075df to
ad9ee15
Compare
|
Updated in ad9ee15 for the latest review pass. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "preserves hook context outside truncated tool output|truncates large model-facing tool output|reports contentLength after hook context"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] fs.writeFile in truncation.ts:89 lacks { mode: 0o600 } — the saved tool output files inherit the process umask (typically 0o644 on Linux), making them world-readable on shared systems. Tool output frequently contains secrets (env vars, API keys). This PR extends truncation from shell/MCP-only to ALL tools, widening the exposure. Other sensitive-file writers in this codebase use { mode: 0o600 } (see file-token-storage.ts, oauth-token-storage.ts, sessionService.ts). Consider adding { mode: 0o600 } to the writeFile call and { mode: 0o700 } to the mkdir call. Pre-existing issue, but the wider surface area makes it worth addressing in a follow-up.
— qwen3.7-max via Qwen Code /review
| } | ||
| } | ||
|
|
||
| if (typeof content === 'string') { |
There was a problem hiding this comment.
[Critical] Double truncation for shell tool outputs
shell.ts:2062 already calls truncateToolOutput on llmContent before returning to the scheduler. The scheduler then calls it a second time here. truncateAndSaveToFile produces output of approximately threshold + ~330 chars (the header text) and truncateLines + ~8 lines (header + separator), which exceeds both the char threshold and line limit on the second pass — triggering actual re-truncation with the default config (25K chars, 1000 lines).
Consequences:
- Nested truncation headers: the model sees two
[CONTENT TRUNCATED]envelopes - Two temp files: file B (from the second pass) contains the already-truncated content, not the original. File A (from the first pass, containing the real full output) is orphaned — the model-facing message points to B, not A
- Duplicate
returnDisplaymutation: shell.ts:2072 already appends"Output too long and was saved to: <A>", and the scheduler appends a second"Output too long and was saved to: <B>" - Duplicate telemetry:
logToolOutputTruncatedfires twice per tool call
Suggested fix: remove the tool-level truncateToolOutput calls from shell.ts (lines 2061-2076) and let the scheduler be the single truncation point. This aligns with the PR's stated goal of centralizing truncation and eliminates the dual-responsibility. mcp-tool.ts is safe because it returns Part[] (non-string), which the typeof guard skips.
| if (typeof content === 'string') { | |
| if (typeof content === 'string' && !content.startsWith('Tool output was too large')) { | |
| try { | |
| const truncated = await truncateToolOutput( |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 8ef3cb3. Shell no longer calls truncateToolOutput directly, so the scheduler is the single model-facing string-output truncation point. The scheduler also skips content that already starts with TOOL_OUTPUT_TRUNCATED_PREFIX, which protects existing direct truncation callers from being wrapped again.
| if (toolResult.error === undefined) { | ||
| let content = toolResult.llmContent; | ||
| const contentLength = | ||
| let contentLength = |
There was a problem hiding this comment.
[Suggestion] Dead initial contentLength assignment
This let contentLength is always overwritten by the identical expression at line ~2984 before it is ever read. The only early-return path between the two assignments (shouldStop) never reads contentLength.
The landmine: if a future refactor removes the recalculation at line ~2984 (thinking "contentLength is already set at the top"), contentLength silently regresses to the raw tool output size, and downstream consumers see a value that excludes truncation, hook context, and reminders.
| let contentLength = | |
| let contentLength: number | undefined; |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 8ef3cb3. Removed the top-of-block assignment and now declares contentLength only at the final post-mutation calculation before convertToFunctionResponse.
| }); | ||
| }); | ||
|
|
||
| describe('CoreToolScheduler tool output truncation', () => { |
There was a problem hiding this comment.
[Suggestion] No test for the double-truncation scenario
All tests here use LargeOutputTool, which returns raw un-truncated content. The most dangerous production path — shell tool output exceeding the threshold — is precisely the scenario where truncateToolOutput runs twice (once in shell.ts, once in the scheduler), and it has zero coverage.
Consider adding a test where the tool's execute() returns content that already includes the "Tool output was too large" preamble (simulating what shell.ts produces after its own truncation), and verify the scheduler does not produce a second truncation header or a second file write.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in 8ef3cb3. The new regression returns already-truncated content, asserts the model-facing output is unchanged, keeps the existing returnDisplay, and verifies fs.writeFile is not called a second time.
ad9ee15 to
8ef3cb3
Compare
|
Updated in 8ef3cb3 for the latest review pass. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts src/utils/truncation.test.ts -t "already-truncated|line limit exceeded|binding constraint|very long lines|correct file path|path traversal"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npm run test --workspace=packages/core -- src/tools/shell.test.ts -t "appends the hint after command output is assembled"
npm run test --workspace=packages/core -- src/tools/shell.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Stale comment in shell.ts still references truncation I/O (~line 1980)
The comment block near shell.ts:~1980 reads "intentionally BEFORE the post-processing block below (truncation I/O, output-file write)" and "Truncation time is bounded by the temp-dir backend and isn't representative of the command's actual wait." Both references describe behavior removed by this PR — truncation I/O and output-file writes no longer happen in shell.ts. The PR already updated surrounding comments (returnDisplayMessage build order, long-run advisory) but missed this one.
Future readers will waste time looking for a truncation block that no longer exists in shell.ts, or may be misled about what the timing measurement accounts for.
// - Wall-clock duration >= threshold. Measured spawn -> resultPromise
// settle, intentionally BEFORE post-processing (attribution,
// returnDisplay build, hint append). The hint reports how long
// the COMMAND blocked the agent, not how long the tool call
// spent including post-processing.
(This finding is outside the PR diff, so it's posted in the review body rather than as an inline comment.)
— qwen3.7-max via Qwen Code /review
8ef3cb3 to
a3bdc99
Compare
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] fs.writeFile / fs.mkdir failure swallows the underlying error (truncation.ts:104-108)
The catch (_error) block inside truncateAndSaveToFile returns a generic "[Note: Could not save full output to file]" message but discards the actual error. At 3 AM investigating why truncation silently fell back on every tool call (disk full, temp dir deleted, permissions change), there would be no log, no telemetry, and no error details. Since this is a scheduler-level code path every tool hits, a systemic temp-dir problem would cascade silently.
Consider adding debugLogger.warn(\Failed to save truncated tool output to ${outputFile}: ` + (error instanceof Error ? error.message : String(error)))` inside the catch block before returning.
— qwen3.7-max via Qwen Code /review
| content = truncated.content; | ||
| if ( | ||
| truncated.outputFile && | ||
| typeof toolResult.returnDisplay === 'string' |
There was a problem hiding this comment.
[Suggestion] typeof toolResult.returnDisplay === 'string' guard not covered by tests
When truncation fires and returnDisplay is a structured type (FileDiff, TodoResultDisplay, AnsiOutputDisplay, etc.), this typeof guard prevents appending the "Output too long and was saved to: <path>" notice. No scheduler test currently exercises this branch with a non-string returnDisplay — all truncation tests use LargeOutputTool which returns a string returnDisplay.
If a future refactor were to remove this guard and unconditionally concatenate, structured display objects would be corrupted (string concat on an object produces [object Object]). No test would fail.
Consider adding a regression test where LargeOutputTool returns a structured returnDisplay (e.g., { type: 'ansi', content: '...' }) alongside oversized string content, then assert content is truncated but resultDisplay is the original structured object unchanged.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 378c2ac. Added a scheduler regression test with a structured returnDisplay; the model-facing content is truncated while resultDisplay remains the original structured object unchanged.
| await fs.mkdir(projectTempDir, { recursive: true }); | ||
| await fs.writeFile(outputFile, content); | ||
| await fs.mkdir(projectTempDir, { recursive: true, mode: 0o700 }); | ||
| await fs.writeFile(outputFile, content, { mode: 0o600 }); |
There was a problem hiding this comment.
[Suggestion] ToolOutputTruncatedEvent does not carry the saved-file path
result.outputFile is available at the call site below (around line 148) but is not forwarded into ToolOutputTruncatedEvent. When investigating a truncation event in production (e.g., "why did the model behave oddly after truncation?"), there is no way to correlate the event with the actual file that holds the full output.
Consider adding an output_file: string field to ToolOutputTruncatedEvent (in packages/core/src/telemetry/types.ts) and forwarding result.outputFile here, so post-mortem analysis can jump straight to the saved file.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 378c2ac. ToolOutputTruncatedEvent now carries output_file, truncateToolOutput forwards the saved path, and the telemetry logger/QwenLogger coverage was updated.
| } | ||
| } catch (truncationError) { | ||
| debugLogger.warn( | ||
| `Tool output truncation failed for ${toolName}: ` + |
There was a problem hiding this comment.
[Suggestion] Truncation-failure log lacks callId / prompt_id
The warning only includes toolName. When the same tool runs many times in a session (common for Bash or read_file), a truncation-failure warning can't be correlated to the specific invocation that failed.
scheduledCall.callId and scheduledCall.request.prompt_id are both in scope here — including them makes the warning actionable when paging through logs.
| `Tool output truncation failed for ${toolName}: ` + | |
| debugLogger.warn( | |
| `Tool output truncation failed for ${toolName} ` + | |
| `(callId=${scheduledCall.callId}, prompt_id=${scheduledCall.request.prompt_id}): ` + | |
| (truncationError instanceof Error | |
| ? truncationError.message | |
| : String(truncationError)), | |
| ); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 378c2ac. The truncation-failure warning now includes the scheduled request callId and prompt_id; current types store the call id on scheduledCall.request.callId.
a3bdc99 to
97531e2
Compare
|
Updated in 97531e2 for the latest review-body suggestion. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/utils/truncation.test.ts -t "file write errors"
npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npx prettier --check packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npx eslint packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
git diff --check |
Local verification report (Linux, tmux)Verified PR head Environment
Results
Behavior coveredThe new scheduler-level
Diff-level sanity checks (read alongside the tests)
Notes
ConclusionLGTM from a verification standpoint. All commands listed in the PR's reviewer test plan pass locally on Linux; the additional Verification artifacts (logs) retained locally at |
| try { | ||
| await fs.mkdir(projectTempDir, { recursive: true }); | ||
| await fs.writeFile(outputFile, content); | ||
| await fs.mkdir(projectTempDir, { recursive: true, mode: 0o700 }); |
There was a problem hiding this comment.
[Suggestion] mode: 0o700 on fs.mkdir is silently defeated by prior directory creation
logger.ts (line ~181) calls await fs.mkdir(this.qwenDir, { recursive: true }) on the same getProjectTempDir() path early in session initialization — without a restrictive mode. Since fs.mkdir with recursive: true does not modify permissions of existing directories, by the time truncateAndSaveToFile runs, the directory already exists with default permissions (typically 0o755). The 0o700 mode here is effectively dead code.
The file-level 0o600 on writeFile still works correctly for newly-created files, so the immediate data protection is intact. But directory listing by other local users remains possible on shared systems.
Consider one of:
- (a) Add
mode: 0o700where the directory is first created (logger.tsorstorage.ts) - (b) Add
await fs.chmod(projectTempDir, 0o700)after themkdircall here to enforce the mode regardless of who created it - (c) Remove the misleading
modeparameter and add a comment explaining that directory permission hardening is handled elsewhere
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in a90adf9. After mkdir, truncateAndSaveToFile now best-effort chmods the project temp dir to 0o700 so an existing dir with looser permissions is tightened before writing the saved output. chmod failure is logged but does not prevent saving the 0o600 file; added regression coverage for that fallback.
| } | ||
|
|
||
| if ( | ||
| typeof content === 'string' && |
There was a problem hiding this comment.
[Suggestion] Shell-level metadata (long-run hint, attribution warning) is subject to head/tail truncation without protection
Shell.ts appends the long-run advisory hint (~530 chars) and attributionWarning to llmContent before returning to the scheduler. The scheduler's truncation then treats this combined string as raw tool output. Unlike PostToolUse hook context — which is explicitly deferred to after truncation (lines 2901-2905) — tool-level metadata appended inside the tool implementation receives no equivalent protection.
With the default threshold (25,000 chars), the tail budget is generous enough that the hint reliably survives. The practical risk is low. However, a future reduction in default thresholds or a user-configured low threshold could bisect the hint or attribution warning mid-sentence. The old shell test that pinned the hint's survival through truncation was removed and replaced with one that doesn't exercise the truncation interaction.
Consider either:
- (a) Documenting in
shell.tsthat the hint and attribution warning are intentionally subject to scheduler truncation - (b) Having tools expose metadata separately (e.g., on
toolResult) so the scheduler can append it after truncation, similar topostToolUseAdditionalContext
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Handled in a90adf9 with the lower-risk option. I documented in shell.ts that the model-facing copies of the long-run hint and attribution warning are intentionally part of shell llmContent and therefore subject to scheduler truncation, while returnDisplay still keeps the full user-facing metadata. I did not add a separate tool metadata API in this PR to avoid widening the change beyond the truncation fix.
97531e2 to
378c2ac
Compare
|
Updated PR description to the latest template with Reviewer Test Plan, Evidence, Tested on, and the Chinese explanation section. Also pushed 378c2ac for the latest inline feedback:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npm run test --workspace=packages/core -- src/telemetry/loggers.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts
npm run typecheck --workspace=packages/core
git diff --check |
|
CI note: the latest Qwen Code CI run did not reach lint/test execution. The Lint job failed during |
| content, | ||
| ); | ||
| content = truncated.content; | ||
| if (truncated.outputFile && typeof resultDisplay === 'string') { |
There was a problem hiding this comment.
[Suggestion] toolOutputAlreadyTruncated flag is gated on typeof resultDisplay === 'string', but content = truncated.content on line 3174 runs unconditionally. When a tool returns string llmContent that gets truncated with an outputFile, but a non-string resultDisplay (e.g., FileDiff from edit tool, AgentResultDisplay from agent tool), the flag stays false even though content has already been truncated. The combined-content guard at line ~3272 will then re-truncate already-truncated content when PostToolUse context or conditional-rule reminders are appended — the exact double-truncation scenario this commit aims to prevent.
Decouple the flag from the display type check:
| if (truncated.outputFile && typeof resultDisplay === 'string') { | |
| if (truncated.outputFile) { | |
| toolOutputAlreadyTruncated = true; | |
| if (typeof resultDisplay === 'string') { | |
| resultDisplay += | |
| (resultDisplay ? '\n' : '') + | |
| `Output too long and was saved to: ${truncated.outputFile}`; | |
| } | |
| } |
The regression test covers the string-display path but does not exercise a structured returnDisplay with large string llmContent + PostToolUse hook context. Consider adding a test case for that edge case as well.
— qwen3.7-plus via Qwen Code /review
There was a problem hiding this comment.
Handled in the current branch. toolOutputAlreadyTruncated is now set whenever the first truncation pass persists raw string output to an outputFile, independent of whether resultDisplay is a string. The display save notice remains string-only.
Added a regression with structured FileDiff result display plus PostToolUse additional context. It asserts the raw output truncation runs once, the appended context is preserved, and the structured display object is left unchanged.
Validated:
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "structured result display"npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.tsnpx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.tsnpx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts
…-history-truncation # Conflicts: # packages/core/src/core/coreToolScheduler.ts
37a65fe to
efa179a
Compare
DragonnZhang
left a comment
There was a problem hiding this comment.
Review — PR #4520 (commit efa179a)
Direction: Sound. The toolOutputAlreadyTruncated guard added in a811f425d correctly prevents the combined guard from re-truncating content that was already persisted by the first truncation pass. This closes the nested-truncation-envelope concern from the previous review.
CI: all_pass (24/24 checks).
Detected issues: None beyond the extensively-discussed backlog (39 stale inline comments, all addressed by subsequent commits). The rescoped 4-file core change is focused and well-tested.
Incremental since R19 (commit 2ea9955): +8 lines in coreToolScheduler.ts (the toolOutputAlreadyTruncated flag + combined-guard skip), +54 lines of regression tests. No regressions in the diff.
— qwen3-coder via Qwen Code /review
|
Addressed the latest truncation review batch in What changed:
Validated locally:
|
|
Pushed additional review follow-ups in Changes added:
Validation:
One intentional behavior note: I kept structured |
|
@Jerry2003826 heads up — this PR currently has merge conflicts with Conflicting files:
The rest merges cleanly. Thanks! 中文@Jerry2003826 提个醒 —— 这个 PR 目前和 冲突文件:
其余文件可以自动合并。谢谢! |
Merge upstream/main and reconcile the tool-output truncation conflicts: - Adopt main's scheduler truncation pipeline (truncateLlmContent, per-tool budgets, batch offload) and matching test coverage - Keep PR i18n strings for the fork feature gate in zh locales - Remove duplicate/conflicting PR scheduler paths superseded by main Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>
Verification: model-facing tool-output truncation moved into
|
|
@qwen-code /triage |
|
@Jerry2003826 heads up — CI is red on the latest commit ( Root cause is a TypeScript build error in This fails 中文@Jerry2003826 提个醒 —— 最新 commit( 根因是 这会导致 |
Merge-reference follow-up: the feature is already in
|
| File | What it is | State |
|---|---|---|
core/.../telemetry/qwen-logger.ts (+4) |
extra truncation-telemetry fields (call_id, output_file_saved, save_error_*) |
❌ breaks tsc/build — those fields aren't on ToolOutputTruncatedEvent (per the CI-red note above) |
cli/.../i18n/zh.js, zh-TW.js (+2 each) |
a /fork feature-gate translation |
❌ also breaks check-i18n + off-topic (see below) |
core/.../tools/shell.test.ts (-41) |
removes shell-local truncation tests | ✅ fine — redundant now that truncation is in the scheduler in main; shell.test.ts still passes (214) |
A second red check, not just the build
Beyond the tsc break already noted, npm run check-i18n (which runs in PR CI's Lint job) also fails:
❌ Errors:
- Extra key in zh-TW.js (not in en.js): "The /fork command requires the fork feature gate. Set QWEN_CODE_ENABLE_FORK_SUBAGENT=1 to enable it."
- Extra key in zh.js (not in en.js): "...same..."
These zh/zh-TW entries have no en.js source key (confirmed PR-added — main's zh.js lacks it). They're also a /fork message, unrelated to this PR's tool-output-truncation scope — most likely swept in by the merge.
Why this happened
The merge-resolution commit a04b91de5 correctly took main's side for coreToolScheduler.ts / shell.ts / telemetry/types.ts / truncation.ts (which already carry the feature), dropping the now-duplicate move — but it kept the qwen-logger.ts and /fork i18n bits whose supporting types.ts fields and en.js key were on the dropped side. The leftover is internally inconsistent: it can't compile and can't pass i18n.
Recommendation
Close this PR. Its feature is merged; the remainder doesn't build, fails check-i18n, and is partly off-topic. Repairing the build wouldn't make it worth merging — it would just be a telemetry-field add + an unrelated /fork translation + a redundant test removal. If the extra truncation telemetry (call_id + save-error fields) is still wanted, it's a clean, small, focused follow-up: add those fields to ToolOutputTruncatedEvent and the code that sets them, and leave the /fork i18n out.
中文版(Chinese version)
合并参考跟进:功能已在 main —— 建议关闭
接着上面两条 —— 我此前的 PASS 验证(测的是合并前 SHA 81498ed)和 CI 全红的提醒(a04b91de5 上 qwen-logger.ts 的 TS2339 构建错误)—— 这是用当前 head 对当前 main 复查后的状态,以及它指向的合并决策。
结论:本 PR 要加的截断功能已经在 main 里了;而解决合并冲突的提交把 PR 削减成了一个无法构建、且过不了 check-i18n 的小残留。我倾向于关闭它,而不是去修。
功能已合入 main
truncateToolOutput() 与「Tool output was too large and has been truncated.」这条面向模型的路径,现已在 main 的 packages/core/src/core/coreToolScheduler.ts + shell.ts 中(正是我此前报告端到端验证过的行为)。因此本分支相对当前 main 的真实 diff(origin/main...HEAD)只剩 4 个文件、+22/-55 —— 截断迁移本身已不在其中。
这 4 个残留文件是什么
| 文件 | 是什么 | 状态 |
|---|---|---|
core/.../telemetry/qwen-logger.ts (+4) |
额外的截断遥测字段(call_id、output_file_saved、save_error_*) |
❌ 破坏 tsc/构建 —— 这些字段不在 ToolOutputTruncatedEvent 上(见上方 CI 红的提醒) |
cli/.../i18n/zh.js、zh-TW.js(各 +2) |
一条 /fork 功能开关翻译 |
❌ 同时破坏 check-i18n + 跑题(见下) |
core/.../tools/shell.test.ts (-41) |
移除 shell 本地截断测试 | ✅ 没问题 —— 截断已在 main 的 scheduler 中,这些已冗余;shell.test.ts 仍通过(214) |
不止构建一处红
除了已提到的 tsc 错误,npm run check-i18n(运行在 PR CI 的 Lint job 中)也失败:
❌ Errors:
- zh-TW.js 中多出的 key(不在 en.js):"The /fork command requires the fork feature gate. ..."
- zh.js 中多出的 key(不在 en.js):"...同上..."
这两个 zh/zh-TW 条目在 en.js 中没有源 key(确认为 PR 新增 —— main 的 zh.js 没有)。它们还是 /fork 消息,与本 PR 的工具输出截断范围无关 —— 很可能是 merge 时带进来的。
为什么会这样
解决合并冲突的提交 a04b91de5 对 coreToolScheduler.ts / shell.ts / telemetry/types.ts / truncation.ts 正确地取了 main 那一侧(其已携带该功能),丢弃了重复的迁移 —— 但保留了依赖被丢弃侧的 types.ts 字段与 en.js key 的 qwen-logger.ts 与 /fork i18n。残留因而自相矛盾:既不能编译,也过不了 i18n。
建议
关闭本 PR。 其功能已合并;剩余部分不能构建、过不了 check-i18n、且部分跑题。即便修好构建也不值得合 —— 那只是「加一组遥测字段 + 一条无关的 /fork 翻译 + 删一批冗余测试」。如果仍想要那批额外的截断遥测(call_id + save-error 字段),可另开一个干净、聚焦的小后续:把这些字段加到 ToolOutputTruncatedEvent 以及填充它们的代码里,并去掉 /fork i18n。
|
Thanks for the PR @Jerry2003826 — and for the persistence through 22 commits of review iteration. Template: All required sections present ✓. Minor: the Direction: Clearly aligned. Unbounded tool outputs overflowing context tokens is a real user problem (#4049). Centralising truncation at the scheduler level so all string-returning tools benefit — not just shell — is the right architectural move. Approach — important context about the current diff: The PR has been through extensive iteration, and much of the original work (scheduler-level truncation in
Two concerns at this stage:
Blocking on the CI fix. Once that's green I'll proceed to code review and testing. 中文说明感谢 @Jerry2003826 的持续投入(22 次提交的迭代)。 模板: 必填部分齐全 ✓,缺少 方向: 完全对齐。工具输出无界增长导致上下文溢出是真实问题(#4049),将截断逻辑上移到 scheduler 使所有字符串返回工具受益,架构方向正确。 方案——关于当前 diff 的重要说明: 经过大量迭代,原始工作的大部分(
两个问题:
CI 修复后进入代码审查和测试阶段。 — Qwen Code · qwen3.7-max |
Follow review follow-up for PR QwenLM#4520 after the merge-resolution commit: - Extend ToolOutputTruncatedEvent with call_id, output_file_saved, and optional save-error metadata expected by qwen-logger - Pass callId from CoreToolScheduler through truncateLlmContent and log truncation whenever model-facing content is actually bounded - Remove off-topic /fork i18n keys that broke check-i18n Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>
2a. Code ReviewIndependent proposal (before reading diff): Given the goal of "truncate model-facing tool output at the scheduler level," I would: (1) add truncation logic to Comparison with the diff: Most of this work is already on main (scheduler truncation in
Notable omission —
2b. Real-Scenario Testing⛔ Cannot run — build is broken. The TypeScript compilation errors in Previous verification rounds (by @wenshao on June 1 and June 14) showed the core scheduler truncation working end-to-end with a real model. Those verifications were against earlier commits where the build was green. The current head ( Summary of blockers:
中文说明2a. 代码审查独立方案(读 diff 前): 在 与 diff 对比: 大部分工作已在 main 上。剩余 diff 逐文件评估:
重要遗漏: 2b. 真实场景测试⛔ 无法运行——构建失败。 阻塞项:
— Qwen Code · qwen3.7-max |
ReflectionStepping back, this PR has had an unusually long journey — 22 commits, multiple rounds of review from @wenshao and @LaZzyMan, and several rewrites. The core idea (scheduler-level truncation for all string-returning tools) is sound, and the good news is that most of the actual work is already on main: The current net diff (+22/−55 across 4 files) is what remains after all that iteration — and it's where the problems are:
The contributor has shown strong commitment and responsiveness throughout this process. The fixes needed are small and mechanical. But the PR can't merge while CI is red. Verdict: Requesting changes. 中文说明反思这个 PR 经历了漫长的迭代(22 次提交、多轮审查),核心思路(scheduler 级截断)是正确的,而且大部分工作已在 main 上。 当前 net diff(+22/−55,4 个文件)是迭代后的残余,问题出在这里:
贡献者展现了很强的投入和响应能力,修复项都很小。但 CI 红灯时无法合入。 结论:请求修改。 — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Needs a few small fixes before this can land — see my notes above. The core truncation work is solid and already on main; the remaining diff just needs the types.ts update, i18n cleanup, and a CI-green run. 🙏
Re-verification after the fix push — both issues resolved, now green ✅Update to my previous note (which recommended closing because the head Verdict: the build + i18n breakage is fixed; the PR is now a small, coherent, green, on-topic residual (truncation-telemetry enrichment + a test cleanup) on top of the already-merged feature. No longer a "close it" — it's mergeable; just update the stale description. Both prior blockers are resolved
Confirmed the fix is load-bearing: reverting just the four What the PR now contributes (net diff vs current
|
之前(a04b91de5) |
现在(7481a831e7) |
|---|---|
❌ 构建红 —— 4× TS2339(qwen-logger 读了 ToolOutputTruncatedEvent 上不存在的字段) |
✅ npm run build / typecheck 绿 —— types.ts 现已声明 call_id? / output_file_saved / save_error_code? / save_error_message?,构造函数也会赋值 |
❌ check-i18n 红 —— 孤儿 /fork key(不在 en.js) |
✅ check-i18n exit 0 —— 跑题的 /fork zh/zh-TW 条目已移除 |
确认该修复是承重的:仅回退这四个 types.ts 字段声明,就会重现 8 个 TS2339 错误。
本 PR 现在贡献了什么(相对当前 main 的净 diff,6 文件 +50/−57)
截断迁移本身已在 main(我此前用真实 CLI/tmux 端到端验证过)。本分支现在为既有的截断事件添加了一个干净、切题的可观测性增强:
types.ts(+12)——ToolOutputTruncatedEvent新增call_id、output_file_saved、save_error_code、save_error_message。truncation.ts(+6)/coreToolScheduler.ts(+8)—— 填充它们(outputFileSaved: Boolean(result.outputFile)、callId,以及临时文件写入失败时的 save-error code/message)。qwen-logger.ts(+4)—— 在截断遥测快照中输出它们。loggers.test.ts(+8)—— 断言这些字段,包括保存失败路径(output_file_saved: false、save_error_code: 'EACCES'、save_error_message: 'permission denied')。shell.test.ts(−41)—— 移除 shell 本地截断测试,在main中截断已归入 scheduler,这些已冗余。
测试 + 检查(7481a831e7)
npm run build/typecheck --workspace=packages/core:exit 0npm run check-i18n:exit 0vitestloggers + coreToolScheduler + shell:467 通过
关于 CI-bot 的审查(它评的是修复前的提交)
上方 bot 的 stage-2/3 评论(「⛔ 构建挂掉」「请求修改」)是针对修复前的 head a04b91de5 生成的 —— 修复 7481a831e7 于 18:23 推送,恰在这些评论之前,但它们没采纳。它列的三个构建/i18n 阻塞项(types.ts 字段、移除 /fork i18n、CI 转绿)正是 7481a831e7 所做的,因此在当前 head 已全部解决。
它唯一仍开放的问题 —— shell.ts 双重截断(main 上 shell.ts 仍调用 truncateToolOutput,即 shell 先截断、scheduler 再截断一次)—— 确实存在,但不在本 PR 范围内:本 diff 未改动 shell.ts 源码(shell.test.ts 的改动是纯测试),且这种分层是 main 既有行为。我此前的真实模型运行中,分层结果端到端仍可用(shell 限界流式缓冲;scheduler 限界最终面向模型的字符串)。是否合并为单一路径,是另一项独立清理,不属于本次遥测后续。
一处不阻塞的小问题
PR 描述仍写成原始的全范围改动(把截断迁移到 scheduler、Part[] 处理等),其中大部分已在 main。建议精简到当前范围 —— 「为工具输出截断事件新增 call_id + 保存状态/错误遥测;删除已冗余的 shell 本地截断测试」 —— 让评审看清这实际改了什么。(CI-bot 也提示了描述漂移和缺少 中文说明。)
总体:我上一条的顾虑均已解决、CI 全绿。这现在是已合入截断功能之上的一个合理小后续 —— 刷新描述后即可合并。
|
Thanks for all the work here, @Jerry2003826 — and for the persistence through The core of this PR — moving model-facing tool-output truncation into That telemetry addition is reasonable and CI is green, but since the feature Thanks again for pushing this forward; the end-to-end behavior you were after is |
What this PR does
Moves model-facing string tool-output truncation from the shell tool into
CoreToolScheduler, so any tool that returns stringllmContentcan be bounded before the result is recorded into conversation history.The PR intentionally keeps the scope narrow:
truncateToolOutput()helper and temp-file behavior;Part[]outputs on the existing path.Why it's needed
Unbounded tool results can overflow context tokens and make the session unable to continue. Shell output already had a truncation helper, but other string-returning tools could still place large outputs directly into model history.
Reviewer Test Plan
How to verify
Evidence (Before & After)
Before: string outputs from non-shell tools could enter model history without passing through the existing tool-output truncation helper.
After:
CoreToolSchedulerinvokestruncateToolOutput()for stringllmContentbefore converting the result into a function response. Tests cover large string output truncation, PostToolUse context placement after raw output truncation, shell long-run hint behavior after removing shell-local truncation, and non-stringPart[]passthrough.This is model-facing history behavior, so TUI screenshots are N/A.
Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included because the behavior is core scheduler/history logic rather than a visible TUI state.
Risk & Scope
Part[]output remains unchanged.ToolResultAPI changes, split-budget hook-context truncation, UI changes, and unrelated refactors.Linked Issues
Fixes #4049