fix(core): truncate model-facing tool output#4520

Closed
Jerry2003826 wants to merge 23 commits into
QwenLM:mainfrom
Jerry2003826:codex/fix-tool-output-history-truncation
Jerry2003826 commented May 25, 2026

Contributor

What this PR does

Moves model-facing string tool-output truncation from the shell tool into CoreToolScheduler, so any tool that returns string llmContent can be bounded before the result is recorded into conversation history.

The PR intentionally keeps the scope narrow:

  • reuses the existing truncateToolOutput() helper and temp-file behavior;
  • removes shell-local truncation so scheduler is the single string-output path;
  • appends PostToolUse hook context after raw output truncation so hook-injected metadata is not split by the head/tail truncator;
  • leaves non-string Part[] outputs on the existing path.
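The scope above can be sketched roughly as follows. This is a minimal illustration, not the repository's truncateToolOutput(): the function names, the simplified Part type, and the 20/80 head/tail split are assumptions made for the example, and the temp-file write is left out. Only the [CONTENT TRUNCATED] marker and the string-versus-Part[] distinction come from this PR.

```typescript
// Hypothetical, simplified stand-in for the structured part type.
type Part = { text?: string };
type LlmContent = string | Part[];

const TRUNCATION_MARKER = '\n... [CONTENT TRUNCATED] ...\n';

// Keep the head and tail of an oversized string (the split ratio is an assumption).
function truncateHeadTail(text: string, threshold: number): string {
  if (text.length <= threshold) {
    return text; // within budget: pass through unchanged
  }
  const headLength = Math.floor(threshold / 5);
  const tailLength = threshold - headLength;
  return (
    text.slice(0, headLength) +
    TRUNCATION_MARKER +
    text.slice(text.length - tailLength)
  );
}

// Only string content is bounded; structured Part[] output stays on its existing path.
function boundModelFacingContent(
  content: LlmContent,
  threshold: number,
): LlmContent {
  return typeof content === 'string'
    ? truncateHeadTail(content, threshold)
    : content;
}
```

Keeping the decision in one function mirrors the idea of a single string-output path: every string result passes through the same bound before it is recorded.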

Why it's needed

Unbounded tool results can overflow context tokens and make the session unable to continue. Shell output already had a truncation helper, but other string-returning tools could still place large outputs directly into model history.

Reviewer Test Plan

How to verify

npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "model-facing output truncation"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts src/tools/shell.test.ts -t "model-facing output truncation|appends the hint after command output is assembled"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts src/tools/shell.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npm run typecheck --workspace=packages/core

Evidence (Before & After)

Before: string outputs from non-shell tools could enter model history without passing through the existing tool-output truncation helper.

After: CoreToolScheduler invokes truncateToolOutput() for string llmContent before converting the result into a function response. Tests cover large string output truncation, PostToolUse context placement after raw output truncation, shell long-run hint behavior after removing shell-local truncation, and non-string Part[] passthrough.

This is model-facing history behavior, so TUI screenshots are N/A.

Tested on

OS      | Status
macOS   | GitHub Actions passed
Windows | Tested locally + GitHub Actions passed
Linux   | GitHub Actions passed

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included because the behavior is core scheduler/history logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: the scheduler only truncates string model-facing content; structured Part[] output remains unchanged.
  • Out of scope: new telemetry events, temp-file permission hardening, ToolResult API changes, split-budget hook-context truncation, UI changes, and unrelated refactors.
  • Breaking changes / migration notes: none expected.

Linked Issues

Fixes #4049

@Jerry2003826 Jerry2003826 marked this pull request as ready for review May 25, 2026 22:38

@wenshao wenshao left a comment

Collaborator

[Suggestion] prompt_id not passed to truncateToolOutput

ToolOutputTruncatedEvent is constructed with prompt_id: '' (in truncation.ts:141), even though scheduledCall.request.prompt_id is available here. As a result, truncation telemetry events cannot be correlated with specific prompts or turns. Fix: add a promptId parameter to truncateToolOutput and pass it from the scheduler.

— qwen3.7-max via Qwen Code /review

}
}

if (typeof content === 'string') {

Collaborator

[Critical] truncateToolOutput failure converts successful tool calls into errors

The await truncateToolOutput(...) call has no local try/catch. It sits inside an outer try whose catch treats any exception as a tool execution failure. If config.getTruncateToolOutputThreshold(), crypto.randomBytes, or logToolOutputTruncated throws, a tool that executed successfully gets reported as failed to the model — with a cryptic error that doesn't mention truncation.

Truncation is a presentation-layer concern and must never cause a successful tool call to appear as an error.

Suggested change
- if (typeof content === 'string') {
+ if (typeof content === 'string') {
+   try {
+     const truncated = await truncateToolOutput(
+       this.config,
+       toolName,
+       content,
+     );
+     content = truncated.content;
+     contentLength = content.length;
+   } catch (truncErr) {
+     // Truncation is best-effort. If it fails, pass the original
+     // content through unchanged rather than failing the tool call.
+     console.warn(
+       `[CoreToolScheduler] truncation failed for ${toolName}:`,
+       truncErr instanceof Error ? truncErr.message : truncErr,
+     );
+   }
+ }

— qwen3.7-max via Qwen Code /review

Contributor Author

Fixed in e3075df. The scheduler now wraps truncation in a local try/catch, logs a warning on truncation failure, and preserves the original successful tool output. Added a regression test that forces threshold lookup to throw and asserts the tool call still completes as success with the original output.
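The regression described above could look roughly like this. It is a simplified, synchronous sketch: the real helper is awaited, and truncateBestEffort, its parameters, and the injected warn callback are hypothetical names used only for illustration.

```typescript
// Best-effort wrapper: a failure inside the truncation step must not turn a
// successful tool call into an error, so the original content is returned.
function truncateBestEffort(
  content: string,
  truncate: (value: string) => string, // stand-in for the real truncation helper
  warn: (message: string) => void, // stand-in for the logger
): string {
  try {
    return truncate(content);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`truncation failed: ${reason}`);
    return content; // preserve the successful tool output unchanged
  }
}

// Simulates the threshold lookup throwing, as in the regression test.
function failingTruncate(_value: string): string {
  throw new Error('threshold unavailable');
}
```

Calling truncateBestEffort('tool output', failingTruncate, console.warn) returns the original string and emits one warning, which is the behavior the regression test asserts.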

  if (toolResult.error === undefined) {
    let content = toolResult.llmContent;
-   const contentLength =
+   let contentLength =

Collaborator

[Suggestion] contentLength stale after hook/reminder appends when truncation doesn't fire

The const-to-let change only updates contentLength inside the truncation block. But content is also mutated at two earlier points (PostToolUse hook additionalContext and conditional rules/skill reminders) that do NOT update contentLength. When hooks/reminders append text but the total stays under threshold, contentLength underreports the actual content size.

Fix: add a single contentLength recalculation after ALL content mutations, just before convertToFunctionResponse:

contentLength = typeof content === 'string' ? content.length : undefined;

— qwen3.7-max via Qwen Code /review

Contributor Author

Fixed in e3075df. contentLength is now recomputed after post-hook context, rule/skill reminders, and truncation handling. Added a regression test where PostToolUse appends context without truncation and asserts contentLength matches the final model-facing content.

toolName,
content,
);
content = truncated.content;

Collaborator

[Suggestion] truncated.outputFile discarded — user-facing returnDisplay not updated

The scheduler only reads truncated.content and ignores truncated.outputFile. For tools that don't self-truncate (the exact target audience of this safety net), the user's TUI display (returnDisplay) never mentions that truncation occurred or where the full output was saved. Compare with shell.ts which appends "Output too long and was saved to: <path>" to the display message.

Consider surfacing the truncation in returnDisplay:

if (truncated.outputFile) {
  toolResult.returnDisplay =
    (typeof toolResult.returnDisplay === 'string' ? toolResult.returnDisplay + '\n' : '') +
    `Output too long and was saved to: ${truncated.outputFile}`;
}

— qwen3.7-max via Qwen Code /review

Contributor Author

Fixed in e3075df. When truncation successfully saves the full output and returnDisplay is a string, the scheduler appends Output too long and was saved to: <path>. Added regression coverage for the user-facing display path.
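A minimal sketch of the display update described above, assuming a simplified ReturnDisplay union. The StructuredDisplay shape and the function name withSavedOutputNotice are hypothetical; only the notice text and the string-only rule come from this thread.

```typescript
// Hypothetical, simplified display types for illustration only.
type StructuredDisplay = { type: string; content: string };
type ReturnDisplay = string | StructuredDisplay;

// Append the saved-file notice only when a file was written AND the display is
// a plain string; structured displays are returned untouched.
function withSavedOutputNotice(
  returnDisplay: ReturnDisplay,
  outputFile?: string,
): ReturnDisplay {
  if (outputFile && typeof returnDisplay === 'string') {
    return `${returnDisplay}\nOutput too long and was saved to: ${outputFile}`;
  }
  return returnDisplay;
}
```

Guarding on typeof avoids string-concatenating onto an object, which would otherwise render as [object Object] in the display.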

config: mockConfig,
onAllToolCallsComplete,
onToolCallsUpdate,
getPreferredEditor: () => 'vscode',

Collaborator

[Suggestion] No negative test for within-threshold passthrough

The single test only verifies that oversized output IS truncated. A regression that always truncates (e.g., an off-by-one in the threshold comparison, or truncateToolOutput being called with wrong arguments) would pass the existing test undetected.

Add a second test with output under the threshold (e.g., 50 chars with threshold=100) asserting the output equals the original content exactly and does not contain [CONTENT TRUNCATED].

— qwen3.7-max via Qwen Code /review

Contributor Author

Added in e3075df. The new passthrough regression test uses output below the threshold and asserts the model-facing output equals the original string exactly and does not contain [CONTENT TRUNCATED].

@pomelo-nwu pomelo-nwu left a comment

Collaborator

Hi @Jerry2003826, thank you for your continued contributions — 9 PRs in a short time is impressive! 🎉

As we review your changes, we'd like to ask you to update each PR to follow the latest PR template on the main branch. The most important section is the Reviewer Test Plan, which significantly accelerates the review and merge process.

Specifically, for each PR please include:

  • How to verify — clear reproduction steps so a reviewer can confirm the fix/feature
  • Evidence (Before & After) — use the tmux-real-user-testing skill (or manual tmux capture) to show before/after screenshots of the TUI behavior. Side-by-side evidence makes it much faster for maintainers to validate and merge
  • Tested on — fill in the OS table (macOS / Windows / Linux)

PRs with a complete Reviewer Test Plan are prioritized for review — without it, review may be delayed.

You can see the full template at: .github/pull_request_template.md

Thanks again for your effort — looking forward to getting these merged! 🚀

— Qwen Code

@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from 61c7865 to e3075df Compare May 26, 2026 01:50
@Jerry2003826

Contributor Author

Updated in e3075df. This addresses the remaining review items: scheduler now passes scheduledCall.request.prompt_id into truncateToolOutput, truncation failures are best-effort and preserve successful tool calls, contentLength is recomputed after final content mutations, string returnDisplay includes the saved full-output path, and small-output passthrough is covered. I also updated the PR description to the latest template with Reviewer Test Plan / Evidence / Tested on sections.

Validation run locally on Windows:

npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/truncation.test.ts -t "caller prompt id"
npm run test --workspace=@qwen-code/qwen-code-core -- src/core/coreToolScheduler.test.ts -t "tool output truncation"
npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/truncation.test.ts
npm run test --workspace=@qwen-code/qwen-code-core -- src/core/coreToolScheduler.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npm run lint --workspace=@qwen-code/qwen-code-core
npm run typecheck --workspace=@qwen-code/qwen-code-core
git diff --check

@wenshao wenshao added the type/bug Something isn't working as expected label May 26, 2026
@Jerry2003826

Contributor Author

The prompt_id telemetry correlation issue is already fixed in e3075df.

Current code passes scheduledCall.request.prompt_id into truncateToolOutput(...), and truncateToolOutput forwards that value into ToolOutputTruncatedEvent. I also added truncation.test.ts coverage: should log truncation telemetry with the caller prompt id.

Current PR checks are green across lint, CodeQL, macOS, Ubuntu, and Windows.

@Jerry2003826

Contributor Author

Updated the PR description to match the latest reviewer template and removed the stale/garbled details block. The code is unchanged from e3075df; current CI is green across lint, CodeQL, macOS, Ubuntu, and Windows. Ready for re-review when you have time.


if (typeof content === 'string') {
try {
const truncated = await truncateToolOutput(

Collaborator

[Suggestion] Truncation runs AFTER hook context and <system-reminder> blocks are appended

At this point, content includes the raw tool output PLUS PostToolUse hook additionalContext, conditional-rule reminders, and skill-activation notices. The head/tail truncation algorithm is structure-blind — if the combined payload exceeds the threshold, it will bisect <system-reminder> XML envelopes, producing malformed markup the model then tries to interpret. Additionally, skill activation side effects (refreshSkills, notifyChangeListeners) have already fired, but the reminder telling the model about the new skill may get truncated away.

Consider moving the truncation block to run immediately after content = toolResult.llmContent (before PostToolUse hooks, conditional rules, and skill activation), so only raw tool output is measured and truncated. Then let hooks and reminders append onto the already-truncated content — they always reach the model intact.

— qwen3.7-max via Qwen Code /review

Contributor Author

Updated in ad9ee15. I moved the truncation block to run before PostToolUse additionalContext, conditional-rule reminders, and skill-activation reminders are appended. That keeps the head/tail truncator focused on the raw tool output and lets hook/reminder metadata reach the model intact after truncation.
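The ordering described above can be sketched as follows. The function names and the even head/tail split are assumptions made for the example; the point is only the sequence: truncate the raw output first, append hook context and reminders afterwards, and compute contentLength last.

```typescript
const CONTENT_TRUNCATED = '[CONTENT TRUNCATED]';

// Structure-blind head/tail cut, applied to raw tool output only.
function truncateRawOutput(raw: string, threshold: number): string {
  if (raw.length <= threshold) {
    return raw;
  }
  const half = Math.floor(threshold / 2);
  return `${raw.slice(0, half)}\n${CONTENT_TRUNCATED}\n${raw.slice(raw.length - half)}`;
}

// Hook context and reminders are appended AFTER truncation, so envelopes such
// as <system-reminder> blocks are never bisected by the truncator.
function buildModelFacingContent(
  raw: string,
  appendedContext: string[],
  threshold: number,
): { content: string; contentLength: number } {
  let content = truncateRawOutput(raw, threshold);
  for (const extra of appendedContext) {
    content += `\n\n${extra}`;
  }
  // Measured once, after every mutation, so it matches what the model sees.
  return { content, contentLength: content.length };
}
```

One consequence of this order is that the combined content may exceed the threshold by the size of the appended metadata, which is the tradeoff accepted in exchange for keeping that metadata intact.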

completedCalls[0].response.responseParts,
);

expect(output).toContain('[CONTENT TRUNCATED]');

Collaborator

[Suggestion] contentLength not asserted when truncation fires

This test triggers truncation and checks the model-facing output, but never reads completedCalls[0].response.contentLength. The new post-truncation contentLength reassignment (line ~2983 in coreToolScheduler.ts — the reason const was changed to let) has zero behavioral verification in the truncation path. Test 4 checks contentLength but only in the no-truncation case.

Consider adding:

expect(completedCalls[0].response.contentLength).toBe(output.length);
expect(completedCalls[0].response.contentLength).toBeLessThan(5000);

— qwen3.7-max via Qwen Code /review

Contributor Author

Added in ad9ee15. The large-output truncation regression now asserts completedCalls[0].response.contentLength === output.length, so the post-truncation recomputation is covered when truncation actually fires.

expect(completedCalls[0].response.resultDisplay).toEqual(
expect.stringContaining('Output too long and was saved to:'),
);
});

Collaborator

[Suggestion] No test for hook context + truncation interaction

No test exercises both PostToolUse hook additionalContext AND truncation firing simultaneously. Test 4 uses hooks with threshold: 1000 and a 12-char output (no truncation fires). Test 1 triggers truncation without hooks. The integration boundary — where hook context pushes combined content over the threshold — is untested.

This matters especially if the truncation-ordering suggestion above is not adopted: hook-injected content (e.g., policy reminders) could silently disappear when the combined payload triggers truncation.

Consider adding a test with hooks enabled, output + hook context exceeding the threshold, and asserting the truncation marker is present, contentLength reflects the truncated length, and the full-output file contains the hook context.

— qwen3.7-max via Qwen Code /review

Contributor Author

Added a regression test in ad9ee15: preserves hook context outside truncated tool output. It enables PostToolUse hooks, forces truncation, asserts the truncation marker/contentLength, and verifies the full hook context is appended after the truncated tool output instead of being bisected. Since the implementation now truncates raw tool output before appending hook/reminder metadata, the saved full-output file intentionally captures the raw tool output rather than the later hook context.

Comment thread packages/core/src/utils/truncation.ts Outdated
config: Config,
toolName: string,
content: string,
promptId = '',

Collaborator

[Suggestion] promptId = '' default masks telemetry gaps in existing callers

shell.ts:2062 and mcp-tool.ts:438 call truncateToolOutput with 3 arguments (no promptId), so their telemetry events carry prompt_id: ''. This defeats the purpose of adding promptId for telemetry correlation — two of the three callers emit truncation events that cannot be traced to a specific prompt.

Consider either making promptId required (so the compiler enforces it), or — if the plan is to consolidate truncation at the scheduler level — removing the tool-level truncateToolOutput calls from shell.ts and mcp-tool.ts entirely. That would also eliminate the double-truncation edge case where scheduler-level truncation runs on already-truncated content after hooks push it back over the threshold.

— qwen3.7-max via Qwen Code /review

Contributor Author

Documented the fallback in ad9ee15. The scheduler-level path now passes scheduledCall.request.prompt_id, which covers the model-facing history truncation added by this PR. I kept promptId optional for the existing direct tool-level callers because shell/MCP execute() do not receive scheduler request context today, and removing those direct truncation guards would widen this PR by changing behavior for direct tool invocations outside the scheduler. The JSDoc now calls out that scheduler callers should pass the prompt id while direct tool implementations may omit it when they do not have that context.
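The optional-parameter tradeoff discussed above can be illustrated with a small sketch. The event shape, the in-memory log, and the tool names are hypothetical; only the prompt_id field and the empty-string default are taken from this thread.

```typescript
// Hypothetical, pared-down telemetry event; the real event carries more fields.
interface TruncationEventSketch {
  tool_name: string;
  prompt_id: string;
}

const loggedTruncationEvents: TruncationEventSketch[] = [];

// promptId defaults to '' so callers without scheduler context still compile,
// at the cost of emitting events that cannot be tied to a prompt.
function logTruncationSketch(toolName: string, promptId = ''): void {
  loggedTruncationEvents.push({ tool_name: toolName, prompt_id: promptId });
}

// Scheduler path: the prompt id is known and forwarded.
logTruncationSketch('read_file', 'prompt-123');
// Direct tool-level caller: no request context, so the default applies.
logTruncationSketch('mcp_tool');
```

Making promptId required would let the compiler flag the second call, which is the alternative the reviewer proposed.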

@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from e3075df to ad9ee15 Compare May 26, 2026 08:53
@Jerry2003826

Contributor Author

Updated in ad9ee15 for the latest review pass.

Changes made:

  • Moved scheduler-level truncation before PostToolUse additionalContext, conditional-rule reminders, and skill-activation reminders are appended, so structure-blind head/tail truncation only applies to raw tool output.
  • Added coverage that hook context remains intact after truncation and contentLength reflects the final model-facing output.
  • Added the missing contentLength assertion to the truncation regression.
  • Documented why promptId remains optional for direct shell/MCP tool callers that do not receive scheduler prompt context, while the scheduler path passes scheduledCall.request.prompt_id.

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "preserves hook context outside truncated tool output|truncates large model-facing tool output|reports contentLength after hook context"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check

@wenshao wenshao left a comment

Collaborator

[Suggestion] fs.writeFile in truncation.ts:89 lacks { mode: 0o600 } — the saved tool output files inherit the process umask (typically 0o644 on Linux), making them world-readable on shared systems. Tool output frequently contains secrets (env vars, API keys). This PR extends truncation from shell/MCP-only to ALL tools, widening the exposure. Other sensitive-file writers in this codebase use { mode: 0o600 } (see file-token-storage.ts, oauth-token-storage.ts, sessionService.ts). Consider adding { mode: 0o600 } to the writeFile call and { mode: 0o700 } to the mkdir call. Pre-existing issue, but the wider surface area makes it worth addressing in a follow-up.

— qwen3.7-max via Qwen Code /review
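As a rough illustration of the permission suggestion above, the sketch below uses Node's synchronous fs calls with explicit modes. The function name, directory name, and file name are hypothetical, and the actual helper in truncation.ts is asynchronous.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Write saved tool output so that only the owning user can read it.
function saveFullOutputPrivately(
  baseDir: string,
  fileName: string,
  content: string,
): string {
  const dir = path.join(baseDir, 'tool-output');
  // 0o700: owner-only access to the directory. The mode applies only when the
  // directory is created by this call and is still filtered by the umask.
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  const file = path.join(dir, fileName);
  // 0o600: owner read/write only. An already existing file keeps its old mode.
  fs.writeFileSync(file, content, { mode: 0o600 });
  return file;
}
```

On Windows these POSIX mode bits are largely ignored, so the hardening mainly matters on Linux and macOS hosts.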

}
}

if (typeof content === 'string') {

Collaborator

[Critical] Double truncation for shell tool outputs

shell.ts:2062 already calls truncateToolOutput on llmContent before returning to the scheduler. The scheduler then calls it a second time here. truncateAndSaveToFile produces output of approximately threshold + ~330 chars (the header text) and truncateLines + ~8 lines (header + separator), which exceeds both the char threshold and line limit on the second pass — triggering actual re-truncation with the default config (25K chars, 1000 lines).

Consequences:

  • Nested truncation headers: the model sees two [CONTENT TRUNCATED] envelopes
  • Two temp files: file B (from the second pass) contains the already-truncated content, not the original. File A (from the first pass, containing the real full output) is orphaned — the model-facing message points to B, not A
  • Duplicate returnDisplay mutation: shell.ts:2072 already appends "Output too long and was saved to: <A>", and the scheduler appends a second "Output too long and was saved to: <B>"
  • Duplicate telemetry: logToolOutputTruncated fires twice per tool call

Suggested fix: remove the tool-level truncateToolOutput calls from shell.ts (lines 2061-2076) and let the scheduler be the single truncation point. This aligns with the PR's stated goal of centralizing truncation and eliminates the dual-responsibility. mcp-tool.ts is safe because it returns Part[] (non-string), which the typeof guard skips.

Suggested change
- if (typeof content === 'string') {
+ if (typeof content === 'string' && !content.startsWith('Tool output was too large')) {
+   try {
+     const truncated = await truncateToolOutput(

— qwen3.7-max via Qwen Code /review

Contributor Author

Fixed in 8ef3cb3. Shell no longer calls truncateToolOutput directly, so the scheduler is the single model-facing string-output truncation point. The scheduler also skips content that already starts with TOOL_OUTPUT_TRUNCATED_PREFIX, which protects existing direct truncation callers from being wrapped again.
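The guard described above can be sketched like this. The prefix value is taken from the reviewer's suggestion; the rest of the header wording, the function names, and the counter that stands in for temp-file writes are assumptions made for illustration.

```typescript
const TOOL_OUTPUT_TRUNCATED_PREFIX = 'Tool output was too large';

// Counts simulated temp-file writes so duplicate saves are visible.
let simulatedFileWrites = 0;

// Simplified truncation pass: the result is header + tail, so it is LONGER
// than the threshold and would be truncated again by an unguarded second pass.
function truncateOnce(content: string, threshold: number): string {
  if (content.length <= threshold) {
    return content;
  }
  simulatedFileWrites += 1;
  const header = `${TOOL_OUTPUT_TRUNCATED_PREFIX} and has been truncated.\n[CONTENT TRUNCATED]\n`;
  return header + content.slice(content.length - threshold);
}

// Scheduler-side pass: skip content that an earlier caller already wrapped.
function schedulerTruncate(content: string, threshold: number): string {
  if (content.startsWith(TOOL_OUTPUT_TRUNCATED_PREFIX)) {
    return content;
  }
  return truncateOnce(content, threshold);
}
```

A prefix check is a heuristic: a tool whose genuine output happens to begin with the same words would also be skipped, which is one reason a single truncation point is the more robust half of the fix.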

  if (toolResult.error === undefined) {
    let content = toolResult.llmContent;
-   const contentLength =
+   let contentLength =

Collaborator

[Suggestion] Dead initial contentLength assignment

This let contentLength is always overwritten by the identical expression at line ~2984 before it is ever read. The only early-return path between the two assignments (shouldStop) never reads contentLength.

The landmine: if a future refactor removes the recalculation at line ~2984 (thinking "contentLength is already set at the top"), contentLength silently regresses to the raw tool output size, and downstream consumers see a value that excludes truncation, hook context, and reminders.

Suggested change
- let contentLength =
+ let contentLength: number | undefined;

— qwen3.7-max via Qwen Code /review

Contributor Author

Fixed in 8ef3cb3. Removed the top-of-block assignment and now declares contentLength only at the final post-mutation calculation before convertToFunctionResponse.

});
});

describe('CoreToolScheduler tool output truncation', () => {

Collaborator

[Suggestion] No test for the double-truncation scenario

All tests here use LargeOutputTool, which returns raw un-truncated content. The most dangerous production path — shell tool output exceeding the threshold — is precisely the scenario where truncateToolOutput runs twice (once in shell.ts, once in the scheduler), and it has zero coverage.

Consider adding a test where the tool's execute() returns content that already includes the "Tool output was too large" preamble (simulating what shell.ts produces after its own truncation), and verify the scheduler does not produce a second truncation header or a second file write.

— qwen3.7-max via Qwen Code /review

Contributor Author

Added in 8ef3cb3. The new regression returns already-truncated content, asserts the model-facing output is unchanged, keeps the existing returnDisplay, and verifies fs.writeFile is not called a second time.

@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from ad9ee15 to 8ef3cb3 Compare May 26, 2026 10:01
@Jerry2003826

Contributor Author

Updated in 8ef3cb3 for the latest review pass.

Changes made:

  • Removed shell-level truncateToolOutput so the scheduler is the single string-output truncation point.
  • Added an already-truncated guard in the scheduler to avoid nested truncation headers, duplicate temp files, duplicate display updates, and duplicate telemetry.
  • Tightened saved full-output temp permissions with directory mode 0o700 and file mode 0o600.
  • Removed the dead initial contentLength assignment; contentLength is now calculated once after final content mutations.
  • Updated the shell long-run hint test now that truncation is scheduler-owned.

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts src/utils/truncation.test.ts -t "already-truncated|line limit exceeded|binding constraint|very long lines|correct file path|path traversal"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npm run test --workspace=packages/core -- src/tools/shell.test.ts -t "appends the hint after command output is assembled"
npm run test --workspace=packages/core -- src/tools/shell.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check

@wenshao wenshao left a comment

Collaborator

[Suggestion] Stale comment in shell.ts still references truncation I/O (~line 1980)

The comment block near shell.ts:~1980 reads "intentionally BEFORE the post-processing block below (truncation I/O, output-file write)" and "Truncation time is bounded by the temp-dir backend and isn't representative of the command's actual wait." Both references describe behavior removed by this PR — truncation I/O and output-file writes no longer happen in shell.ts. The PR already updated surrounding comments (returnDisplayMessage build order, long-run advisory) but missed this one.

Future readers will waste time looking for a truncation block that no longer exists in shell.ts, or may be misled about what the timing measurement accounts for.

//   - Wall-clock duration >= threshold. Measured spawn -> resultPromise
//     settle, intentionally BEFORE post-processing (attribution,
//     returnDisplay build, hint append). The hint reports how long
//     the COMMAND blocked the agent, not how long the tool call
//     spent including post-processing.

(This finding is outside the PR diff, so it's posted in the review body rather than as an inline comment.)

— qwen3.7-max via Qwen Code /review

@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from 8ef3cb3 to a3bdc99 Compare May 26, 2026 10:41

@wenshao wenshao left a comment

Collaborator

[Suggestion] fs.writeFile / fs.mkdir failure swallows the underlying error (truncation.ts:104-108)

The catch (_error) block inside truncateAndSaveToFile returns a generic "[Note: Could not save full output to file]" message but discards the actual error. At 3 AM investigating why truncation silently fell back on every tool call (disk full, temp dir deleted, permissions change), there would be no log, no telemetry, and no error details. Since this is a scheduler-level code path every tool hits, a systemic temp-dir problem would cascade silently.

Consider adding debugLogger.warn(`Failed to save truncated tool output to ${outputFile}: ` + (error instanceof Error ? error.message : String(error))) inside the catch block before returning.

— qwen3.7-max via Qwen Code /review
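
A minimal sketch of the suggested pattern, for illustration only: log the underlying error, then keep the graceful fallback. The names saveFullOutput, WriteFile, Warn, and SaveOutcome are hypothetical, and the writer is injected so the sketch runs without touching disk; the real truncateAndSaveToFile uses fs.mkdir / fs.writeFile and debugLogger.

```typescript
// Hypothetical injected dependencies so the sketch runs without touching disk.
type WriteFile = (path: string, content: string) => void;
type Warn = (message: string) => void;

interface SaveOutcome {
  outputFile?: string; // set only when the full output was persisted
  note: string;
}

// Assumed shape, not the actual truncateAndSaveToFile implementation.
function saveFullOutput(
  outputFile: string,
  content: string,
  writeFile: WriteFile,
  warn: Warn,
): SaveOutcome {
  try {
    writeFile(outputFile, content);
    return { outputFile, note: `Full output saved to: ${outputFile}` };
  } catch (error) {
    // Surface the underlying cause instead of discarding it.
    warn(
      `Failed to save truncated tool output to ${outputFile}: ` +
        (error instanceof Error ? error.message : String(error)),
    );
    // Fallback message taken from the review comment above.
    return { note: '[Note: Could not save full output to file]' };
  }
}

// Simulate a full disk and collect warnings in memory.
const warnings: string[] = [];
const failingWrite: WriteFile = () => {
  throw new Error('ENOSPC: no space left on device');
};
const outcome = saveFullOutput(
  '/tmp/demo.output',
  'x'.repeat(10),
  failingWrite,
  (message) => warnings.push(message),
);
```

The caller still receives the fallback note, but the warning now records both the target path and the reason the write failed.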

content = truncated.content;
if (
truncated.outputFile &&
typeof toolResult.returnDisplay === 'string'

Collaborator

[Suggestion] typeof toolResult.returnDisplay === 'string' guard not covered by tests

When truncation fires and returnDisplay is a structured type (FileDiff, TodoResultDisplay, AnsiOutputDisplay, etc.), this typeof guard prevents appending the "Output too long and was saved to: <path>" notice. No scheduler test currently exercises this branch with a non-string returnDisplay — all truncation tests use LargeOutputTool which returns a string returnDisplay.

If a future refactor were to remove this guard and unconditionally concatenate, structured display objects would be corrupted (string concat on an object produces [object Object]). No test would fail.

Consider adding a regression test where LargeOutputTool returns a structured returnDisplay (e.g., { type: 'ansi', content: '...' }) alongside oversized string content, then assert content is truncated but resultDisplay is the original structured object unchanged.

— qwen3.7-max via Qwen Code /review
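
To make the risk concrete, here is a small sketch of the guard and of what unguarded concatenation would do. StructuredDisplay and appendSavedNotice are hypothetical stand-ins, not the scheduler's real types; only the notice wording comes from the thread.

```typescript
// Hypothetical stand-in for structured displays such as FileDiff or AnsiOutputDisplay.
type StructuredDisplay = { type: string; content: string };
type ReturnDisplay = string | StructuredDisplay;

// Append the saved-file notice only to string displays; pass structured ones through.
function appendSavedNotice(display: ReturnDisplay, outputFile: string): ReturnDisplay {
  if (typeof display !== 'string') {
    return display;
  }
  return (
    display +
    (display ? '\n' : '') +
    `Output too long and was saved to: ${outputFile}`
  );
}

const structured: StructuredDisplay = { type: 'ansi', content: '...' };

// What an unconditional concatenation would produce for a structured display.
const unguarded = String(structured) + '\nOutput too long and was saved to: /tmp/x.output';

const guardedStructured = appendSavedNotice(structured, '/tmp/x.output');
const guardedString = appendSavedNotice('done', '/tmp/x.output');
```

A regression test along these lines would assert that guardedStructured is the same object that was passed in.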

Contributor Author

Fixed in 378c2ac. Added a scheduler regression test with a structured returnDisplay; the model-facing content is truncated while resultDisplay remains the original structured object unchanged.

Comment thread packages/core/src/utils/truncation.ts Outdated
- await fs.mkdir(projectTempDir, { recursive: true });
- await fs.writeFile(outputFile, content);
+ await fs.mkdir(projectTempDir, { recursive: true, mode: 0o700 });
+ await fs.writeFile(outputFile, content, { mode: 0o600 });

Collaborator

[Suggestion] ToolOutputTruncatedEvent does not carry the saved-file path

result.outputFile is available at the call site below (around line 148) but is not forwarded into ToolOutputTruncatedEvent. When investigating a truncation event in production (e.g., "why did the model behave oddly after truncation?"), there is no way to correlate the event with the actual file that holds the full output.

Consider adding an output_file: string field to ToolOutputTruncatedEvent (in packages/core/src/telemetry/types.ts) and forwarding result.outputFile here, so post-mortem analysis can jump straight to the saved file.

— qwen3.7-max via Qwen Code /review
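
A rough sketch of what forwarding the path could look like. Only output_file, prompt_id, and the tool_output_truncated event name appear in this thread; the other field names and the buildTruncatedEvent helper are assumptions made for illustration and may not match packages/core/src/telemetry/types.ts.

```typescript
// Assumed result shape of the truncation helper.
interface TruncationResult {
  content: string;
  outputFile?: string;
}

// Hypothetical event shape; only output_file and prompt_id are named in the thread.
interface ToolOutputTruncatedEventSketch {
  'event.name': 'tool_output_truncated';
  tool_name: string;
  prompt_id: string;
  original_content_length: number;
  truncated_content_length: number;
  output_file?: string;
}

function buildTruncatedEvent(
  toolName: string,
  promptId: string,
  originalContent: string,
  result: TruncationResult,
): ToolOutputTruncatedEventSketch {
  return {
    'event.name': 'tool_output_truncated',
    tool_name: toolName,
    prompt_id: promptId,
    original_content_length: originalContent.length,
    truncated_content_length: result.content.length,
    // Forward the saved path so post-mortem analysis can find the file.
    output_file: result.outputFile,
  };
}

const event = buildTruncatedEvent('read_file', 'prompt-1', 'x'.repeat(100), {
  content: 'x'.repeat(10),
  outputFile: '/tmp/read_file_demo.output',
});
```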

Contributor Author

Fixed in 378c2ac. ToolOutputTruncatedEvent now carries output_file, truncateToolOutput forwards the saved path, and the telemetry logger/QwenLogger coverage was updated.

}
} catch (truncationError) {
debugLogger.warn(
`Tool output truncation failed for ${toolName}: ` +

Collaborator

[Suggestion] Truncation-failure log lacks callId / prompt_id

The warning only includes toolName. When the same tool runs many times in a session (common for Bash or read_file), a truncation-failure warning can't be correlated to the specific invocation that failed.

scheduledCall.callId and scheduledCall.request.prompt_id are both in scope here — including them makes the warning actionable when paging through logs.

Suggested change
- `Tool output truncation failed for ${toolName}: ` +
+ debugLogger.warn(
+ `Tool output truncation failed for ${toolName} ` +
+ `(callId=${scheduledCall.callId}, prompt_id=${scheduledCall.request.prompt_id}): ` +
+ (truncationError instanceof Error
+ ? truncationError.message
+ : String(truncationError)),
+ );

— qwen3.7-max via Qwen Code /review

Contributor Author

Fixed in 378c2ac. The truncation-failure warning now includes the scheduled request callId and prompt_id; current types store the call id on scheduledCall.request.callId.

@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from a3bdc99 to 97531e2 Compare May 26, 2026 11:57
@Jerry2003826

Contributor Author

Updated in 97531e2 for the latest review-body suggestion.

Changes made:

  • Added a TRUNCATION debug warning when saving the full truncated output file fails, including the target output path and underlying error message.
  • Kept the existing graceful fallback behavior unchanged: the tool still returns truncated content with [Note: Could not save full output to file].
  • Added regression coverage for the warning path.

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/utils/truncation.test.ts -t "file write errors"
npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npx prettier --check packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npx eslint packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
git diff --check

@wenshao

wenshao commented May 26, 2026

Collaborator

Local verification report (Linux, tmux)

Verified PR head 97531e2eb (fix(core): harden tool output truncation handling, on top of 465d240a6 fix(core): truncate model-facing tool output) in an isolated worktree against main on Linux (Node v22.22.2, npm 10.9.7) inside a dedicated tmux session (tmux: pr4520). All commands from the PR's "How to verify" section were run plus shell.test.ts (the PR removes the local truncation block from shell.ts).

Environment

| Item | Value |
| --- | --- |
| OS | Linux (Debian 13, kernel 6.12) |
| Node | v22.22.2 |
| npm | 10.9.7 |
| Worktree | pr-4520 @ 97531e2eb |
| Test runner | vitest 3.2.4 (workspace @qwen-code/qwen-code-core) |

Results

| Command | Outcome |
| --- | --- |
| npm install | OK (1374 packages, prepare/build/bundle ran) |
| truncation.test.ts -t "caller prompt id" | 1 passed, 10 skipped (11 total) |
| coreToolScheduler.test.ts -t "tool output truncation" | 7 passed, 163 skipped (170 total) |
| truncation.test.ts (full) | 11 / 11 passed |
| coreToolScheduler.test.ts (full) | 170 / 170 passed (~5.4s) |
| shell.test.ts (full, additional; refactor surface) | 194 / 194 passed |
| prettier --check (6 PR files) | clean |
| eslint (6 PR files) | 0 errors / 0 warnings (exit 0) |
| npm run lint --workspace=@qwen-code/qwen-code-core | exit 0 |
| npm run typecheck --workspace=@qwen-code/qwen-code-core (tsc --noEmit) | exit 0 |
| git diff --check (PR files) | no whitespace errors |

Behavior covered

The new scheduler-level describe('CoreToolScheduler tool output truncation') block (test file lines 7253–end) covers the full set of behaviors claimed by the PR description, all observed passing:

  • truncates large model-facing tool output before it enters history — exercises the central truncation path with prompt_id plumbed through.
  • does not fail a successful tool call when output truncation fails — confirms the new best-effort try { ... } catch (truncationError) handler around truncateToolOutput: a truncation/I-O failure no longer demotes a successful tool call to an error.
  • reports contentLength after hook context is appended without truncation — verifies the moved contentLength recomputation lands after hook/reminder/truncation mutations (matches the relocation seen in the diff).
  • preserves hook context outside truncated tool output — verifies the new "defer postToolUseAdditionalContext append until after truncation" ordering, so a structure-blind head/tail truncator cannot bisect hook-injected metadata.
  • does not truncate already-truncated tool output again — verifies the TOOL_OUTPUT_TRUNCATED_PREFIX guard against double-truncation. (The exported constant is now shared between truncation.ts and coreToolScheduler.ts.)
  • adds the saved full output path to string returnDisplay when truncating — verifies the user-facing recovery path: the saved temp-file path is appended to a string returnDisplay so the user can read_file the full output.

truncation.test.ts adds should log truncation telemetry with the caller prompt id covering the new promptId parameter into ToolOutputTruncatedEvent. Pre-existing tests in the file (within-threshold passthrough, write-error graceful path, sanitized filename, etc.) all continue to pass.

shell.test.ts still passes 194/194 after the local truncation block is removed from shell.ts, confirming the shell-side refactor does not regress shell behavior.

Diff-level sanity checks (read alongside the tests)

  • Truncation responsibility is centralized in CoreToolScheduler.processCompleted... — applies to all string-producing tools, not just shell.
  • Re-entrancy guard: the scheduler skips truncation when content.startsWith(TOOL_OUTPUT_TRUNCATED_PREFIX), preventing already-truncated content from being trimmed again on a re-emit path.
  • Hook ordering: postToolUseAdditionalContext is captured during the hook step but only appended after the truncation step, so the truncator only operates on raw tool output.
  • Best-effort error handling: truncation failures are logged via the debug logger and the call falls through with the original content intact.
  • Tighter temp-file permissions: mkdir(..., { mode: 0o700 }), writeFile(..., { mode: 0o600 }) — saved-output files are now mode-restricted by default.
  • Telemetry: ToolOutputTruncatedEvent carries the originating tool request's prompt_id, enabling correlation with the rest of the scheduled-call telemetry.
  • shell.ts no longer imports or calls truncateToolOutput and the long-run advisory append comment is updated to reflect that command output assembly no longer interacts with truncation.
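
The re-entrancy guard in the list above can be sketched roughly as follows. The value assigned to TOOL_OUTPUT_TRUNCATED_PREFIX is an assumption based on the envelope wording quoted elsewhere in this thread, and toyTruncate / truncateOnce are hypothetical stand-ins for truncateToolOutput and the scheduler path.

```typescript
// Assumed value; the real constant lives in truncation.ts.
const TOOL_OUTPUT_TRUNCATED_PREFIX = 'Tool output was too large and has been truncated.';

let truncateCalls = 0;

// Toy head/tail truncator that marks its output with the sentinel prefix.
function toyTruncate(content: string): string {
  truncateCalls += 1;
  const head = content.slice(0, 5);
  const tail = content.slice(content.length - 5);
  return `${TOOL_OUTPUT_TRUNCATED_PREFIX}\n${head}\n...\n${tail}`;
}

// Skip truncation when the content already carries the sentinel prefix.
function truncateOnce(content: string, threshold: number): string {
  if (content.length <= threshold) {
    return content;
  }
  if (content.startsWith(TOOL_OUTPUT_TRUNCATED_PREFIX)) {
    return content;
  }
  return toyTruncate(content);
}

// The envelope itself is longer than the tiny demo threshold, so without the
// prefix check the second pass would wrap it in another envelope.
const firstPass = truncateOnce('a'.repeat(100), 20);
const secondPass = truncateOnce(firstPass, 20);
```

Because the guard keys off the literal prefix, changing that string for UX reasons would silently disable the check, which is why a shared exported constant is the safer choice.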

Notes

  • PR description's "Tested on" table marks Linux as "GitHub Actions passed" without a local Linux check — this report adds Linux-local coverage.
  • No UI surface changed; no manual TUI run was required.

Conclusion

LGTM from a verification standpoint. All commands listed in the PR's reviewer test plan pass locally on Linux; the additional shell.test.ts run (added because the PR removes the in-shell truncation block) also passes. No extra failures or warnings observed outside the documented behavior. Safe to merge from this PR's own scope of changes.

Verification artifacts (logs) retained locally at /tmp/pr4520-{install,test-prompt-id,test-trunc-scheduler,test-trunc-full,test-cts-full,test-shell,prettier,eslint,lint,typecheck,diff-check}.log.

wenshao previously approved these changes May 26, 2026
Comment thread packages/core/src/utils/truncation.ts Outdated
  try {
- await fs.mkdir(projectTempDir, { recursive: true });
- await fs.writeFile(outputFile, content);
+ await fs.mkdir(projectTempDir, { recursive: true, mode: 0o700 });

Collaborator

[Suggestion] mode: 0o700 on fs.mkdir is silently defeated by prior directory creation

logger.ts (line ~181) calls await fs.mkdir(this.qwenDir, { recursive: true }) on the same getProjectTempDir() path early in session initialization — without a restrictive mode. Since fs.mkdir with recursive: true does not modify permissions of existing directories, by the time truncateAndSaveToFile runs, the directory already exists with default permissions (typically 0o755). The 0o700 mode here is effectively dead code.

The file-level 0o600 on writeFile still works correctly for newly-created files, so the immediate data protection is intact. But directory listing by other local users remains possible on shared systems.

Consider one of:

  • (a) Add mode: 0o700 where the directory is first created (logger.ts or storage.ts)
  • (b) Add await fs.chmod(projectTempDir, 0o700) after the mkdir call here to enforce the mode regardless of who created it
  • (c) Remove the misleading mode parameter and add a comment explaining that directory permission hardening is handled elsewhere

— qwen3.7-max via Qwen Code /review
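
The behavior described above can be reproduced with a short script on a POSIX system (permission bits are reported differently on Windows). It is a sketch of option (b), using the synchronous fs API for brevity; ensurePrivateDir and modeOf are hypothetical helper names.

```typescript
import { chmodSync, mkdirSync, mkdtempSync, statSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Permission bits of a path (meaningful on POSIX only).
function modeOf(path: string): number {
  return statSync(path).mode & 0o777;
}

// Hypothetical helper for option (b): create if needed, then tighten the mode.
function ensurePrivateDir(dir: string): void {
  mkdirSync(dir, { recursive: true, mode: 0o700 });
  try {
    chmodSync(dir, 0o700);
  } catch {
    // Best effort: a chmod failure should not block saving the 0o600 file.
  }
}

// Simulate an earlier caller that created the directory with looser permissions.
const base = mkdtempSync(join(tmpdir(), 'trunc-demo-'));
const projectTempDir = join(base, 'project');
mkdirSync(projectTempDir);
chmodSync(projectTempDir, 0o755);

// mkdir with recursive: true succeeds on an existing directory
// but does not change its mode.
mkdirSync(projectTempDir, { recursive: true, mode: 0o700 });
const modeAfterMkdir = modeOf(projectTempDir);

// An explicit chmod enforces the mode regardless of who created the directory.
ensurePrivateDir(projectTempDir);
const modeAfterChmod = modeOf(projectTempDir);
```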

Contributor Author

Fixed in a90adf9. After mkdir, truncateAndSaveToFile now best-effort chmods the project temp dir to 0o700 so an existing dir with looser permissions is tightened before writing the saved output. chmod failure is logged but does not prevent saving the 0o600 file; added regression coverage for that fallback.

}

if (
typeof content === 'string' &&

Collaborator

[Suggestion] Shell-level metadata (long-run hint, attribution warning) is subject to head/tail truncation without protection

Shell.ts appends the long-run advisory hint (~530 chars) and attributionWarning to llmContent before returning to the scheduler. The scheduler's truncation then treats this combined string as raw tool output. Unlike PostToolUse hook context — which is explicitly deferred to after truncation (lines 2901-2905) — tool-level metadata appended inside the tool implementation receives no equivalent protection.

With the default threshold (25,000 chars), the tail budget is generous enough that the hint reliably survives. The practical risk is low. However, a future reduction in default thresholds or a user-configured low threshold could bisect the hint or attribution warning mid-sentence. The old shell test that pinned the hint's survival through truncation was removed and replaced with one that doesn't exercise the truncation interaction.

Consider either:

  • (a) Documenting in shell.ts that the hint and attribution warning are intentionally subject to scheduler truncation
  • (b) Having tools expose metadata separately (e.g., on toolResult) so the scheduler can append it after truncation, similar to postToolUseAdditionalContext

— qwen3.7-max via Qwen Code /review
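
Option (b) might look something like the sketch below. The trailingMetadata field, ToolResultSketch, and both helper functions are hypothetical; no such API exists in the PR, and the head/tail budgets are demo values.

```typescript
// Hypothetical result shape: metadata travels separately from the raw output.
interface ToolResultSketch {
  llmContent: string;
  trailingMetadata?: string;
}

// Toy head/tail truncator with character budgets.
function headTail(content: string, headChars: number, tailChars: number): string {
  if (content.length <= headChars + tailChars) {
    return content;
  }
  const head = content.slice(0, headChars);
  const tail = content.slice(content.length - tailChars);
  return `${head}\n... [truncated] ...\n${tail}`;
}

// Truncate only the raw output, then append the metadata untouched.
function buildModelContent(
  result: ToolResultSketch,
  headChars: number,
  tailChars: number,
): string {
  const body = headTail(result.llmContent, headChars, tailChars);
  return result.trailingMetadata ? `${body}\n${result.trailingMetadata}` : body;
}

const hint = 'Hint: this command blocked the agent for a long time.';
const modelContent = buildModelContent(
  { llmContent: 'x'.repeat(1000), trailingMetadata: hint },
  20,
  20,
);
```

With this split the hint survives intact even under a very small budget, at the cost of widening the tool result contract.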

Contributor Author

Handled in a90adf9 with the lower-risk option. I documented in shell.ts that the model-facing copies of the long-run hint and attribution warning are intentionally part of shell llmContent and therefore subject to scheduler truncation, while returnDisplay still keeps the full user-facing metadata. I did not add a separate tool metadata API in this PR to avoid widening the change beyond the truncation fix.

@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from 97531e2 to 378c2ac Compare May 26, 2026 12:31
@Jerry2003826

Jerry2003826 commented May 26, 2026

Contributor Author

Updated PR description to the latest template with Reviewer Test Plan, Evidence, Tested on, and the Chinese explanation section.

Also pushed 378c2ac for the latest inline feedback:

  • added structured returnDisplay regression coverage;
  • forwarded the saved truncation file path into ToolOutputTruncatedEvent.output_file and logger snapshots;
  • included callId and prompt_id in truncation-failure diagnostics.

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npm run test --workspace=packages/core -- src/telemetry/loggers.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts
npm run typecheck --workspace=packages/core
git diff --check

@Jerry2003826

Contributor Author

CI note: the latest Qwen Code CI run did not reach lint/test execution. The Lint job failed during actions/checkout with a GitHub 403, and the OS test jobs failed while downloading dorny/test-reporter before checkout/test steps. I do not have repository permissions to rerun the workflow; local validation commands in the comment above passed on this branch head.

@Jerry2003826 Jerry2003826 dismissed stale reviews from wenshao and BZ-D via a811f42 June 8, 2026 09:06
content,
);
content = truncated.content;
if (truncated.outputFile && typeof resultDisplay === 'string') {

Collaborator

[Suggestion] toolOutputAlreadyTruncated flag is gated on typeof resultDisplay === 'string', but content = truncated.content on line 3174 runs unconditionally. When a tool returns string llmContent that gets truncated with an outputFile, but a non-string resultDisplay (e.g., FileDiff from edit tool, AgentResultDisplay from agent tool), the flag stays false even though content has already been truncated. The combined-content guard at line ~3272 will then re-truncate already-truncated content when PostToolUse context or conditional-rule reminders are appended — the exact double-truncation scenario this commit aims to prevent.

Decouple the flag from the display type check:

Suggested change
- if (truncated.outputFile && typeof resultDisplay === 'string') {
+ if (truncated.outputFile) {
+ toolOutputAlreadyTruncated = true;
+ if (typeof resultDisplay === 'string') {
+ resultDisplay +=
+ (resultDisplay ? '\n' : '') +
+ `Output too long and was saved to: ${truncated.outputFile}`;
+ }
+ }

The regression test covers the string-display path but does not exercise a structured returnDisplay with large string llmContent + PostToolUse hook context. Consider adding a test case for that edge case as well.

— qwen3.7-plus via Qwen Code /review

Contributor Author

Handled in the current branch. toolOutputAlreadyTruncated is now set whenever the first truncation pass persists raw string output to an outputFile, independent of whether resultDisplay is a string. The display save notice remains string-only.

Added a regression with structured FileDiff result display plus PostToolUse additional context. It asserts the raw output truncation runs once, the appended context is preserved, and the structured display object is left unchanged.

Validated:

  • npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "structured result display"
  • npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
  • npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts
  • npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts

…-history-truncation

# Conflicts:
#	packages/core/src/core/coreToolScheduler.ts
@Jerry2003826 Jerry2003826 force-pushed the codex/fix-tool-output-history-truncation branch from 37a65fe to efa179a Compare June 9, 2026 08:00
wenshao previously approved these changes Jun 9, 2026

@DragonnZhang DragonnZhang left a comment

Collaborator

Review — PR #4520 (commit efa179a)

Direction: Sound. The toolOutputAlreadyTruncated guard added in a811f425d correctly prevents the combined guard from re-truncating content that was already persisted by the first truncation pass. This closes the nested-truncation-envelope concern from the previous review.

CI: all_pass (24/24 checks).

Detected issues: None beyond the extensively-discussed backlog (39 stale inline comments, all addressed by subsequent commits). The rescoped 4-file core change is focused and well-tested.

Incremental since R19 (commit 2ea9955): +8 lines in coreToolScheduler.ts (the toolOutputAlreadyTruncated flag + combined-guard skip), +54 lines of regression tests. No regressions in the diff.

— qwen3-coder via Qwen Code /review

@Jerry2003826

Contributor Author

Addressed the latest truncation review batch in fafe8ed93.

What changed:

  • Truncate pure text Part[] outputs before they enter model-facing history, while keeping non-text/multimodal Part[] on the existing path.
  • Added a shared scheduler truncation helper with in-memory fallback when persistence/truncation unexpectedly fails.
  • Covered the combined PostToolUse/context path with the split budget fallback, so fallback does not use the full raw-output budget.
  • Sanitized generated output filenames for Windows-invalid characters.
  • Extended tool_output_truncated telemetry with call_id, output_file_saved, and sanitized save error metadata.
  • Added direct utility tests for in-memory truncation, formatting, filename sanitization, and telemetry error redaction.
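
For readers unfamiliar with the Windows restriction, filename sanitization of this kind can be sketched as below. sanitizeFileName is a hypothetical name and the replacement character is an assumption; the character set is the one Windows rejects in file names (< > : " / \ | ? * and control characters 0 to 31).

```typescript
// Replace characters that Windows rejects in file names with an underscore.
// Hypothetical helper; the PR's actual sanitizer may differ.
function sanitizeFileName(name: string): string {
  return name.replace(/[<>:"\/\\|?*\u0000-\u001f]/g, '_');
}

const sanitized = sanitizeFileName('mcp:server/tool?name');
const untouched = sanitizeFileName('run_shell_command');
```

Tool names that already consist of safe characters are unaffected, so existing output file names such as run_shell_command_*.output would keep their form.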

Validated locally:

  • npm run test --workspace=packages/core -- src/utils/truncation.test.ts src/core/coreToolScheduler.test.ts src/telemetry/loggers.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts

@Jerry2003826

Contributor Author

Pushed additional review follow-ups in f949b3379 and 81498ed70.

Changes added:

  • Documented TOOL_OUTPUT_TRUNCATED_PREFIX as the scheduler sentinel to avoid accidental UX-only edits breaking double-truncation detection.
  • Added a serialized-size estimate to the non-text content truncation skip debug log.
  • Parameterized the truncation envelope text for custom contentLabel values instead of hardcoding output in the body.
  • Added telemetry for unexpected truncateToolOutput() failures that fall back to in-memory truncation, including call id, save failure metadata, and sanitized error text.
  • Added/updated regression assertions for the new label-aware envelope and non-text skip debug log.

Validation:

  • npm run test --workspace=packages/core -- src/utils/truncation.test.ts src/core/coreToolScheduler.test.ts src/telemetry/loggers.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts

One intentional behavior note: I kept structured resultDisplay values intact rather than converting them into strings to append the saved-output notice. The model-facing content still carries the truncation envelope, and string/undefined displays get the notice; preserving structured displays avoids breaking file diff / todo / agent result rendering.

@wenshao

wenshao commented Jun 14, 2026

Collaborator

@Jerry2003826 heads up — this PR currently has merge conflicts with main and can't be merged as-is. Could you merge main in (or rebase) and resolve them when you get a chance?

Conflicting files:

  • packages/cli/src/i18n/locales/zh-TW.js
  • packages/cli/src/i18n/locales/zh.js
  • packages/core/src/core/coreToolScheduler.ts
  • packages/core/src/tools/shell.ts
  • packages/core/src/utils/truncation.test.ts
  • packages/core/src/utils/truncation.ts

The rest merges cleanly. Thanks!


Merge upstream/main and reconcile the tool-output truncation conflicts:

- Adopt main's scheduler truncation pipeline (truncateLlmContent,
  per-tool budgets, batch offload) and matching test coverage
- Keep PR i18n strings for the fork feature gate in zh locales
- Remove duplicate/conflicting PR scheduler paths superseded by main

Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>
@wenshao

wenshao commented Jun 14, 2026

Collaborator

Verification: model-facing tool-output truncation moved into CoreToolScheduler

Verdict: PASS (with three noted observations — all pre-existing or edge-case, none blocking).

I built this PR in an isolated worktree (real npm ci + build, no symlinked node_modules) and drove the real CLI — both headless and the interactive TUI under tmux — against a real model (DeepSeek, OpenAI-compatible) routed through a logging proxy that records exactly what bytes reach the model. So the evidence below is what the model actually received, not a unit test.

Claim (my read of the diff): truncation of model-facing tool output is moved out of shell.ts into CoreToolScheduler.truncateModelFacingToolContent(...), so any string-returning tool's llmContent is bounded before it is recorded into conversation history. Over-threshold output is replaced with a head/tail envelope + a Tool output was too large and has been truncated. marker + a pointer to a temp file holding the full result; under-threshold output passes through untouched.

Method: worktree .qwen/tmp/review-pr-4520 (fetched SHA 81498ed, merge-base 27b056b); isolated HOME; OPENAI_BASE_URL → logging proxy → deepseek-chat. Model requests captured to proxy.log; scheduler temp files inspected on disk.

Steps

  1. Large shell output, headless. run_shell_command: seq 1 200000 (1.27 MB raw stdout). The tool result recorded for the model was a 7,622-char truncated envelope; the full request to the model was ≈101 KB, not 1.27 MB. The model still answered the question (last line = 200000) from the envelope's tail. Full result saved to ~/.qwen/tmp/<proj>/run_shell_command_*.output.

  2. 🔍 Under-threshold probe. run_shell_command: seq 1 50 (~140 bytes). grep -c 'Tool output was too large' over all model requests = 0; no temp file written. Sub-threshold output passes through untruncated, as intended.

  3. Generalization to a non-shell tool. read_file on a 48,040-char file. The model-facing result carried the truncation marker and a pointer to a read_file_*.output temp file (a non-shell tool's output saved by the scheduler). This is the headline new behavior: truncation is no longer shell-only. Model answered correctly (1 distinct character).

  4. Interactive TUI under tmux. seq 1 100000. The live TUI rendered ... first 10027 lines hidden ... and Output too long and was saved to: <temp file>. The model recognized the truncation and followed the pointer, running tail/grep on the saved file, then answered 100000. The marker + saved-file mechanism works end-to-end through the real UI.

Sample (live TUI frame, Step 4):

│ ✓  Shell seq 1 100000 (Run seq 1 to 100000)
│    ... first 10027 lines hidden ...
│    99688
│    Output too long and was saved to: ~/.qwen/tmp/<proj>/run_shell_command_3b5852ac287f.output
✦ The output was truncated, but I can read the end of the saved output file to find the last line.
│ ✓  Shell tail -1 .../run_shell_command_3b5852ac287f.output
...
✦ The very last line of the output is 100000.

Envelope head the model receives (Step 1/3):

Tool output was too large and has been truncated.
The full tool output has been saved to: ~/.qwen/tmp/<proj>/<tool>_<hash>.output
To read the complete tool output, use the read_file tool with the absolute file path above.
The truncated tool output below shows the beginning and end of the output...

Findings

The core change works as claimed; these are observations from running it, for your judgement at merge time — none block the PR:

  • ⚠️ "The full tool output has been saved to <file>" over-promises. The scheduler can only save what the tool handed it, after that tool's own truncation. For read_file the saved file literally begins Showing lines 1-21 of 41 total lines. and ends ... [truncated] — i.e. it is read_file's 21-of-41-line view, not the full file. For run_shell_command it is the shell tool's last-~10k-line streaming buffer. So a model that follows the pointer and read_files the temp file still won't recover the true full output; it would need to re-invoke the original tool with paging. Pre-existing layering (the tool truncates before the scheduler), but the new wording makes a "full output" promise the saved file doesn't always keep.

  • ⚠️ Near-threshold, the "truncated" envelope can be larger than its input. In Step 3 the scheduler received 25,059 chars and emitted a 25,505-char envelope (char-mode keeps ~5 KB head + ~20 KB tail ≈ the whole input, then prepends ~0.5 KB of marker/boilerplate) — so it removed ~59 chars from the middle, added ~505, and still spent a temp-file write. Just over TRUNCATE_TOOL_OUTPUT_THRESHOLD (25,000), truncation costs more than it saves. A minimum-savings guard (skip unless it drops at least the boilerplate's worth) would avoid the boundary case. The large case (Step 1: 70 KB buffer → 7.6 KB envelope, line-mode) behaves exactly as intended.

  • The shell streaming buffer's tail-preservation is timing-dependent: Step 1 (seq 1 200000) kept the true last line 200000, but Step 4 (seq 1 100000) the saved file was cut at 99688 (the model noticed and re-ran seq … | tail -1). This is pre-existing shell.ts behavior — git diff <merge-base>..HEAD -- packages/core/src/tools/shell.ts | grep -E 'maxLines|slice\(-|MAX_' returns nothing — not introduced here, but it compounds the first finding.
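
The minimum-savings guard mentioned in the second finding could be sketched as follows. The 25,000 threshold and the roughly 5 KB head / 20 KB tail split come from the observations above; the function name, the stand-in header, and the exact guard condition are assumptions, not code from the PR.

```typescript
const THRESHOLD = 25_000; // TRUNCATE_TOOL_OUTPUT_THRESHOLD as reported above
const HEAD_CHARS = 5_000; // approximate char-mode head budget
const TAIL_CHARS = 20_000; // approximate char-mode tail budget
const MARKER = '\n...\n';

// Stand-in for the ~0.5 KB of envelope boilerplate.
const header = 'H'.repeat(505);

// Truncate only when doing so actually makes the content smaller.
function truncateWithMinimumSavings(content: string, envelopeHeader: string): string {
  if (content.length <= THRESHOLD) {
    return content;
  }
  const removed = content.length - (HEAD_CHARS + TAIL_CHARS);
  const overhead = envelopeHeader.length + MARKER.length;
  if (removed <= overhead) {
    // The envelope would be at least as large as the input: skip.
    return content;
  }
  return (
    envelopeHeader +
    content.slice(0, HEAD_CHARS) +
    MARKER +
    content.slice(content.length - TAIL_CHARS)
  );
}

// Boundary case from Step 3: 25,059 chars would lose 59 and gain about 505.
const nearThreshold = truncateWithMinimumSavings('a'.repeat(25_059), header);
// Large case: truncation clearly pays off.
const large = truncateWithMinimumSavings('b'.repeat(100_000), header);
```

One trade-off of this approach is that content slightly above the threshold is passed through unmodified, so the effective ceiling becomes the threshold plus the envelope overhead.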

Not runtime-exercised: the "defer PostToolUse hook context until after raw-string truncation so hook metadata isn't bisected" path needs a configured PostToolUse hook to observe; I relied on the diff + the PR's unit tests for that one. Everything else above was driven through the real app.

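Chinese version

Verification: moving model-facing tool-output truncation into CoreToolScheduler

Verdict: PASS, with three observations (all pre-existing behavior or boundary cases; none blocks the merge).

I built this PR in an isolated worktree (a real npm ci plus build, no symlinked node_modules) and drove the real CLI, both in headless mode and as an interactive TUI under tmux, against a real model (DeepSeek, OpenAI-compatible API), capturing the exact bytes that reached the model through a recording proxy. The evidence below is therefore what the model actually received, not unit tests.

Claim under verification (my reading of the diff): model-facing tool-output truncation moves from shell.ts into CoreToolScheduler.truncateModelFacingToolContent(...), so that the llmContent of any string-returning tool is bounded before it is written into conversation history. Over-threshold output is replaced with a head/tail digest, the marker Tool output was too large and has been truncated., and a pointer to the temp file holding the full result; under-threshold output passes through unchanged.

Method: worktree .qwen/tmp/review-pr-4520 (fetched SHA 81498ed, merge-base 27b056b); isolated HOME; OPENAI_BASE_URL → recording proxy → deepseek-chat. Model requests were logged to proxy.log, and the temp files produced by the scheduler were inspected on disk.

Steps

  1. Large shell output (headless): seq 1 200000 (raw stdout 1.27 MB). The tool result recorded for the model was a 7,622-char truncation envelope; the full request sent to the model was about 101 KB rather than 1.27 MB. The model still answered correctly from the envelope's tail (last line = 200000). The full result was saved to run_shell_command_*.output.

  2. 🔍 Under-threshold probe: seq 1 50 (about 140 bytes). Across all model requests, grep -c 'Tool output was too large' = 0, and no temp file was created. Under-threshold output passes through unchanged, as expected.

  3. Generalization to a non-shell tool: read_file on a 48,040-char file. The model-facing result carried the truncation marker and pointed to a read_file_*.output temp file (non-shell tool output saved by the scheduler). This is the PR's core new behavior: truncation is no longer shell-only. The model answered correctly (1 distinct character).

  4. Interactive TUI under tmux: seq 1 100000. The live TUI rendered ... first 10027 lines hidden ... and Output too long and was saved to: <temp file>. The model recognized the truncation and followed the pointer, running tail/grep on the saved file, and finally answered 100000. The marker plus saved-file mechanism works end-to-end in the real UI.

Live TUI capture (Step 4):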

│ ✓  Shell seq 1 100000 (Run seq 1 to 100000)
│    ... first 10027 lines hidden ...
│    99688
│    Output too long and was saved to: ~/.qwen/tmp/<proj>/run_shell_command_3b5852ac287f.output
✦ The output was truncated, but I can read the end of the saved output file to find the last line.
│ ✓  Shell tail -1 .../run_shell_command_3b5852ac287f.output
✦ The very last line of the output is 100000.

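Observations

The core change matches the claim; the following are runtime observations for consideration at merge time, none of which blocks the PR:

  • ⚠️ "The full tool output has been saved to <file>" overstates what is saved. The scheduler can only save what the tool hands it, i.e. the content after the tool's own truncation. For read_file, the saved file begins with Showing lines 1-21 of 41 total lines. and ends with ... [truncated]: it is read_file's 21-of-41-lines view, not the full file. For run_shell_command, it is the shell tool's streaming buffer of roughly the last 10,000 lines. So even if the model follows the pointer and calls read_file on the temp file, it still does not get the truly full output and has to re-invoke the original tool with pagination. This is the pre-existing layered design (the tool truncates first, the scheduler post-processes), but the new wording makes a "full output" promise that the saved file cannot always keep.

  • ⚠️ Near the threshold, the "truncated" envelope can be larger than its input. In Step 3 the scheduler received 25,059 chars and emitted a 25,505-char envelope (char-mode keeps ~5 KB head + ~20 KB tail ≈ the whole input, then prepends ~0.5 KB of marker/boilerplate), so it removed only ~59 chars from the middle, added ~505, and still spent a temp-file write. Just over TRUNCATE_TOOL_OUTPUT_THRESHOLD (25,000), truncation costs more than it saves. A minimum-savings guard (skip truncation unless it saves at least the boilerplate's worth) would avoid this boundary case. The large case (Step 1: 70 KB buffer → 7.6 KB envelope, line-mode) behaves exactly as intended.

  • The shell streaming buffer's tail preservation is timing-dependent: Step 1 (seq 1 200000) kept the true last line, 200000, but in Step 4 (seq 1 100000) the saved file was cut at 99688 (the model noticed and re-ran seq … | tail -1). This is pre-existing shell.ts behavior: git diff <merge-base>..HEAD -- packages/core/src/tools/shell.ts | grep -E 'maxLines|slice\(-|MAX_' returns no matches. It was not introduced by this PR, but it amplifies the first observation.

Not runtime-verified: the "defer PostToolUse hook context until after raw-string truncation so hook-injected metadata isn't bisected" path needs a configured PostToolUse hook to observe; for that one I relied on the diff and the PR's own unit tests. Everything else was verified by driving the real app.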

@wenshao

wenshao commented Jun 14, 2026

Collaborator

@qwen-code /triage

wenshao
wenshao previously approved these changes Jun 14, 2026
@wenshao

wenshao commented Jun 14, 2026

Collaborator

@Jerry2003826 heads up — CI is red on the latest commit (a04b91de5): Lint and Test (macOS / Ubuntu / Windows) all fail.

Root cause is a TypeScript build error in packages/core/src/telemetry/qwen-logger/qwen-logger.ts — three properties don't exist on ToolOutputTruncatedEvent:

src/telemetry/qwen-logger/qwen-logger.ts(598,24): error TS2339: Property 'call_id' does not exist on type 'ToolOutputTruncatedEvent'.
src/telemetry/qwen-logger/qwen-logger.ts(603,34): error TS2339: Property 'output_file_saved' does not exist on type 'ToolOutputTruncatedEvent'.
src/telemetry/qwen-logger/qwen-logger.ts(604,32): error TS2339: Property 'save_error_code' does not exist on type 'ToolOutputTruncatedEvent'.

This fails tsc --build for packages/core (npm run build --workspace=packages/core → exit 1), and since Lint and every Test job build core first, all four checks cascade to red. It looks like the ToolOutputTruncatedEvent type and its usage drifted apart in the merge-main commit a04b91de5. Aligning the event type's fields with the logger (add the three fields to the type, or update the logger to the current field names) should clear all four checks at once.
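
As a rough illustration of the first option (add the fields to the type), the sketch below declares the three properties named in the TS2339 errors and has the logger read only declared fields. The interface name, the tool_name field, and the choice of which fields are optional are assumptions made for the example; the real ToolOutputTruncatedEvent in types.ts is not shown in this thread.

```typescript
// Hypothetical sketch: field names come from the TS2339 errors above;
// optionality and everything else is assumed for illustration.
interface TruncationEventSketch {
  tool_name: string; // assumed pre-existing field
  call_id?: string;
  output_file_saved: boolean;
  save_error_code?: string;
}

// Logger side: builds the snapshot from declared fields only, so tsc
// has nothing to flag as a missing property.
function toSnapshot(event: TruncationEventSketch): Record<string, unknown> {
  return {
    tool_name: event.tool_name,
    call_id: event.call_id,
    output_file_saved: event.output_file_saved,
    save_error_code: event.save_error_code,
  };
}
```

Keeping the type and the logger in the same commit is what prevents this class of drift: if either side is dropped during a merge, the compiler reports it immediately.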


@wenshao

wenshao commented Jun 14, 2026

Collaborator

Merge-reference follow-up: the feature is already in main — recommend closing

Building on the two notes above — my earlier PASS verification (which tested the pre-merge SHA 81498ed) and the CI-red note (the qwen-logger.ts TS2339 build break on a04b91de5) — here's the state after re-checking the current head against current main, and the merge decision it points to.

Bottom line: the truncation feature this PR adds is already in main, and the merge-resolution commit reduced the PR to a small residual that doesn't build and fails check-i18n. I'd close it rather than repair it.

The feature already landed in main

truncateToolOutput() and the "Tool output was too large and has been truncated." model-facing path now live in main's packages/core/src/core/coreToolScheduler.ts + shell.ts (that's the behavior my earlier report verified end-to-end). So the real diff of this branch vs current main (origin/main...HEAD) is only 4 files, +22/-55 — the truncation move itself is no longer part of it.

What those 4 residual files are

File | What it is | State
core/.../telemetry/qwen-logger.ts (+4) | extra truncation-telemetry fields (call_id, output_file_saved, save_error_*) | breaks tsc/build: those fields aren't on ToolOutputTruncatedEvent (per the CI-red note above)
cli/.../i18n/zh.js, zh-TW.js (+2 each) | a /fork feature-gate translation | also breaks check-i18n + off-topic (see below)
core/.../tools/shell.test.ts (-41) | removes shell-local truncation tests | ✅ fine: redundant now that truncation is in the scheduler in main; shell.test.ts still passes (214)

A second red check, not just the build

Beyond the tsc break already noted, npm run check-i18n (which runs in PR CI's Lint job) also fails:

❌ Errors:
  - Extra key in zh-TW.js (not in en.js): "The /fork command requires the fork feature gate. Set QWEN_CODE_ENABLE_FORK_SUBAGENT=1 to enable it."
  - Extra key in zh.js  (not in en.js): "...same..."

These zh/zh-TW entries have no en.js source key (confirmed PR-added — main's zh.js lacks it). They're also a /fork message, unrelated to this PR's tool-output-truncation scope — most likely swept in by the merge.
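
The kind of check that produces an "Extra key" error can be sketched in a few lines. This is not the repository's check-i18n script, whose implementation is not shown here; it is a hypothetical minimal version that assumes each locale file exports a flat key-to-string map keyed by the English source string.

```typescript
// Hypothetical sketch of an "extra key" check; not the repo's check-i18n script.
type Catalog = Record<string, string>;

// Returns keys present in a locale catalog but absent from the source (en) catalog.
function findExtraKeys(source: Catalog, locale: Catalog): string[] {
  return Object.keys(locale).filter(
    (key) => !Object.prototype.hasOwnProperty.call(source, key),
  );
}

// Usage: an orphaned translation is reported, a matched one is not.
const en: Catalog = { Hello: "Hello" };
const zh: Catalog = { Hello: "你好", "Orphan key": "孤儿条目" };
const extras = findExtraKeys(en, zh);
```

Under this model, a translation added without its en.js source key is always an error, which is why the orphaned /fork entries fail the check regardless of their content.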

Why this happened

The merge-resolution commit a04b91de5 correctly took main's side for coreToolScheduler.ts / shell.ts / telemetry/types.ts / truncation.ts (which already carry the feature), dropping the now-duplicate move — but it kept the qwen-logger.ts and /fork i18n bits whose supporting types.ts fields and en.js key were on the dropped side. The leftover is internally inconsistent: it can't compile and can't pass i18n.

Recommendation

Close this PR. Its feature is merged; the remainder doesn't build, fails check-i18n, and is partly off-topic. Repairing the build wouldn't make it worth merging — it would just be a telemetry-field add + an unrelated /fork translation + a redundant test removal. If the extra truncation telemetry (call_id + save-error fields) is still wanted, it's a clean, small, focused follow-up: add those fields to ToolOutputTruncatedEvent and the code that sets them, and leave the /fork i18n out.


@qwen-code-ci-bot

Collaborator

Thanks for the PR @Jerry2003826 — and for the persistence through 22 commits of review iteration.

Template: All required sections present ✓. Minor: the <details>中文说明</details> block from the template is missing — please add it on the next push.

Direction: Clearly aligned. Unbounded tool outputs overflowing context tokens is a real user problem (#4049). Centralising truncation at the scheduler level so all string-returning tools benefit — not just shell — is the right architectural move.

Approach — important context about the current diff:

The PR has been through extensive iteration, and much of the original work (scheduler-level truncation in coreToolScheduler.ts, truncation.ts updates, test coverage) is already on main. The current net diff is only +22/−55 across 4 files:

  • shell.test.ts (+14/−55): simplifies a mock-based truncation test — fine since truncation responsibility moved to scheduler
  • qwen-logger.ts (+4): adds telemetry fields (call_id, output_file_saved, save_error_code, save_error_message)
  • zh.js / zh-TW.js (+2 each): i18n translations for a fork feature gate string

Three concerns at this stage:

  1. ⛔ CI is broken on all platforms. qwen-logger.ts references event.call_id, event.output_file_saved, event.save_error_code, event.save_error_message — but ToolOutputTruncatedEvent (in types.ts) doesn't declare those fields. The merge-conflict resolution (a04b91de5) appears to have lost the types.ts update. Build fails with 4 TS2339 errors.

  2. i18n changes are out of scope. The fork feature gate translations in zh.js/zh-TW.js have nothing to do with truncation. Likely picked up from the fork's branch history during a rebase. These should be in a separate PR.

  3. PR description vs. actual diff. The body still describes the original full-scope change (moving truncation to scheduler, Part[] support, etc.), but most of that is already merged. The description should be updated to reflect what this PR currently changes, so reviewers know what they're actually looking at.

Blocking on the CI fix. Once that's green I'll proceed to code review and testing.


Qwen Code · qwen3.7-max

Follow-up to the review of PR QwenLM#4520 after the merge-resolution commit:

- Extend ToolOutputTruncatedEvent with call_id, output_file_saved, and
  optional save-error metadata expected by qwen-logger
- Pass callId from CoreToolScheduler through truncateLlmContent and log
  truncation whenever model-facing content is actually bounded
- Remove off-topic /fork i18n keys that broke check-i18n

Co-authored-by: JerryLee <Jerry2003826@users.noreply.github.com>
@qwen-code-ci-bot

Collaborator

2a. Code Review

Independent proposal (before reading diff):

Given the goal of "truncate model-facing tool output at the scheduler level," I would: (1) add truncation logic to CoreToolScheduler for string llmContent before it enters conversation history, (2) remove the duplicate truncation from shell.ts, (3) update ToolOutputTruncatedEvent with any new telemetry fields, and (4) adjust tests to cover the new scheduler path.

Comparison with the diff:

Most of this work is already on main (scheduler truncation in coreToolScheduler.ts exists, tests in coreToolScheduler.test.ts are comprehensive). The remaining diff is small and I'll assess each file:

shell.test.ts (+14/−55) — ✓ Reasonable. Replaces a complex mock-based test (which pinned truncation ordering at the shell level) with a simpler assertion that the long-run hint appears after the command output. Makes sense since truncation responsibility moved to the scheduler. The old mock was testing an implementation detail that no longer exists at the shell level.

qwen-logger.ts (+4) — ⛔ Broken. The diff adds call_id, output_file_saved, save_error_code, save_error_message to the snapshots JSON, but the ToolOutputTruncatedEvent type in types.ts is not updated. The merge-conflict resolution commit (a04b91de5) appears to have dropped the types.ts change. This causes TS2339 errors on all 4 new field references and breaks the build on every platform.

zh.js / zh-TW.js (+2 each) — ⚠️ Out of scope. These add i18n translations for 'The /fork command requires the fork feature gate...' — a string completely unrelated to tool output truncation. This was likely picked up from the fork branch's history during a rebase. It should be reverted from this PR and submitted separately.

Notable omission — shell.ts still truncates on main:

shell.ts on main still imports and calls truncateToolOutput (4 references at lines 27, 2030, 2165, 2189). Combined with the scheduler-level truncation that's already on main, this means double truncation — the shell output gets truncated in shell.ts, and then the already-truncated result gets run through the scheduler's truncation again. The PR's stated goal was to remove shell-local truncation so the scheduler is the single path, but the current diff doesn't touch shell.ts at all. This needs to be addressed — either the shell-level truncation was already removed in a separate merge (in which case the diff is correct), or it needs to be removed in this PR.
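
Whether a second pass actually changes anything depends on how the two limits relate, which a toy bounding function can illustrate. The sketch below is an assumption-laden stand-in, not truncateToolOutput itself: bound is a hypothetical helper whose output never exceeds its limit (for limits larger than the marker), so a second pass at the same limit leaves the text unchanged while a tighter second limit cuts again.

```typescript
// Hypothetical stand-in for a truncation helper; not the repo's truncateToolOutput.
// Assumes limit is larger than the marker length.
function bound(text: string, limit: number): string {
  if (text.length <= limit) return text;
  const marker = "\n[truncated]\n";
  const keep = limit - marker.length; // characters of original text to keep
  const head = Math.floor(keep / 5);
  const tail = keep - head;
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}

const raw = "z".repeat(1000);
const shellPass = bound(raw, 100); // first pass, e.g. at the tool level
const sameLimit = bound(shellPass, 100); // second pass, same limit: unchanged
const tighter = bound(shellPass, 50); // second pass, smaller limit: cut again
```

If the real helper adds boilerplate on top of its budget, as the near-threshold observation earlier in the thread suggests, the first-pass result can exceed the second-pass threshold and be truncated twice, which is the case worth confirming in shell.ts.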

2b. Real-Scenario Testing

⛔ Cannot run — build is broken. The TypeScript compilation errors in qwen-logger.ts prevent npm run build from succeeding, which blocks both npm run dev and any tmux-based real-user testing. The workflow requires this step, but there are no workarounds when the build itself fails.

Previous verification rounds (by @wenshao on June 1 and June 14) showed the core scheduler truncation working end-to-end with a real model. Those verifications were against earlier commits where the build was green. The current head (a04b91de5) needs the types.ts fix before testing can resume.

Summary of blockers:

  1. Fix ToolOutputTruncatedEvent in types.ts — add the 4 missing fields and wire them through the constructor
  2. Remove the out-of-scope i18n changes (zh.js, zh-TW.js)
  3. Confirm whether shell.ts truncation cleanup is already merged or needs to be added
  4. Re-run CI green before real-scenario testing can proceed

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Collaborator

Reflection

Stepping back, this PR has had an unusually long journey — 22 commits, multiple rounds of review from @wenshao and @LaZzyMan, and several rewrites. The core idea (scheduler-level truncation for all string-returning tools) is sound, and the good news is that most of the actual work is already on main: coreToolScheduler.ts truncates string llmContent via truncateToolOutput, the test coverage in coreToolScheduler.test.ts is comprehensive, and previous real-world verifications showed it working end-to-end.

The current net diff (+22/−55 across 4 files) is what remains after all that iteration — and it's where the problems are:

  1. The build is broken. There are four TypeScript errors because ToolOutputTruncatedEvent in types.ts was never updated to include the new telemetry fields that qwen-logger.ts now references. This looks like a merge-conflict resolution casualty: the types.ts change was likely in an earlier commit but got lost in the merge-resolution commit a04b91de5.

  2. Unrelated i18n changes slipped in. The fork feature gate translations in zh.js/zh-TW.js have nothing to do with truncation. These came from the fork branch's history, not from this PR's intent.

  3. The PR description overshoots the diff. The body describes the original full-scope change, but a reviewer reading it today would be confused — the actual changes are a test simplification and 4 telemetry fields.

  4. The double-truncation question is unresolved. shell.ts on main still calls truncateToolOutput with a comment claiming the scheduler pass becomes a no-op. If that's true by design, the comment should be clear about it. If shell-level truncation was supposed to be removed, it's missing from this diff.

The contributor has shown strong commitment and responsiveness throughout this process. The fixes needed are small and mechanical. But the PR can't merge while CI is red.

Verdict: Requesting changes.


Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Collaborator


Needs a few small fixes before this can land — see my notes above. The core truncation work is solid and already on main; the remaining diff just needs the types.ts update, i18n cleanup, and a CI-green run. 🙏

@wenshao

wenshao commented Jun 14, 2026

Collaborator

Re-verification after the fix push — both issues resolved, now green ✅

Update to my previous note (which recommended closing because the head a04b91de5 didn't build and failed check-i18n). The new commit 7481a831e7 "align truncation telemetry with logger fields" fixes both. Re-verified at that head on Linux (Node 22.22.2).

Verdict: the build + i18n breakage is fixed; the PR is now a small, coherent, green, on-topic residual (truncation-telemetry enrichment + a test cleanup) on top of the already-merged feature. No longer a "close it" — it's mergeable; just update the stale description.

Both prior blockers are resolved

Was (at a04b91de5) | Now (at 7481a831e7)
❌ build red: 4× TS2339 (qwen-logger reads fields not on ToolOutputTruncatedEvent) | npm run build / typecheck green: types.ts now declares call_id? / output_file_saved / save_error_code? / save_error_message? and the constructor sets them
❌ check-i18n red: orphan /fork keys (not in en.js) | check-i18n exit 0: the off-topic /fork zh/zh-TW entries were removed

Confirmed the fix is load-bearing: reverting just the four types.ts field declarations reproduces the 8 TS2339 errors.

What the PR now contributes (net diff vs current main, 6 files +50/−57)

The headline truncation move is already in main (verified end-to-end in my earlier real-CLI/tmux run). This branch now adds a clean, on-topic observability enrichment of the existing truncation event:

  • types.ts (+12) — ToolOutputTruncatedEvent gains call_id, output_file_saved, save_error_code, save_error_message.
  • truncation.ts (+6) / coreToolScheduler.ts (+8) — populate them (outputFileSaved: Boolean(result.outputFile), callId, and the save-error code/message when the temp-file write fails).
  • qwen-logger.ts (+4) — emit them in the truncation telemetry snapshot.
  • loggers.test.ts (+8) — asserts the fields, including the save-failure path (output_file_saved: false, save_error_code: 'EACCES', save_error_message: 'permission denied').
  • shell.test.ts (−41) — removes shell-local truncation tests, now redundant since truncation lives in the scheduler in main.
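
How the save-failure fields might be populated can be sketched as below. This is a hedged illustration, not the code in truncation.ts: saveFullOutput and toTelemetry are hypothetical names, and the writer is injected as a parameter (an assumption made so the failure path can be exercised without touching the file system).

```typescript
// Hypothetical sketch; helper names and shapes are illustrative only.
interface SaveOutcome {
  outputFile?: string;
  saveErrorCode?: string;
  saveErrorMessage?: string;
}

// Attempts the temp-file write and records why it failed, if it did.
function saveFullOutput(
  write: (path: string, data: string) => void,
  path: string,
  data: string,
): SaveOutcome {
  try {
    write(path, data);
    return { outputFile: path };
  } catch (err) {
    const e = (typeof err === "object" && err !== null ? err : {}) as {
      code?: unknown;
      message?: unknown;
    };
    return {
      saveErrorCode: typeof e.code === "string" ? e.code : undefined,
      saveErrorMessage: typeof e.message === "string" ? e.message : String(err),
    };
  }
}

// Maps the outcome onto the telemetry field names listed above.
function toTelemetry(callId: string, outcome: SaveOutcome) {
  return {
    call_id: callId,
    output_file_saved: Boolean(outcome.outputFile),
    save_error_code: outcome.saveErrorCode,
    save_error_message: outcome.saveErrorMessage,
  };
}
```

With a writer that throws an error carrying code 'EACCES' and message 'permission denied', this yields the same field values the loggers.test.ts assertion above describes.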

Tests + checks (at 7481a831e7)

  • npm run build / typecheck --workspace=packages/core: exit 0
  • npm run check-i18n: exit 0
  • vitest loggers + coreToolScheduler + shell: 467 passed

On the CI-bot's review (it's evaluating the pre-fix commit)

The bot's stage-2/3 comments above ("⛔ build broken", "Requesting changes") were generated against the pre-fix head a04b91de5 — the fix 7481a831e7 was pushed at 18:23, just before those comments, and they didn't pick it up. Its three build/i18n blockers (types.ts fields, drop /fork i18n, CI green) are exactly what 7481a831e7 does, so they're already resolved at the current head.

Its one still-open question — shell.ts double truncation (shell.ts on main still calls truncateToolOutput, so shell truncates and then the scheduler truncates again) — is real but out of scope for this PR: the diff doesn't touch shell.ts source (the shell.test.ts change is test-only), and the layering is pre-existing main behavior. In my earlier real-model run the layered result was still usable end-to-end (shell bounded the streaming buffer; the scheduler bounded the final model-facing string). Whether to collapse it to a single path is a separate cleanup, not part of this telemetry follow-up.

One non-blocking nit

The PR description still reads as the original full-scope change (moving truncation into the scheduler, Part[] handling, etc.), most of which is now in main. Worth trimming it to the current scope — "add call_id + save-status/error telemetry to the tool-output-truncation event; drop the now-redundant shell-local truncation tests" — so reviewers see what this actually changes. (The CI-bot also flagged the description drift and a missing 中文说明 block.)

Net: the concerns from my last note are addressed and CI is green. This is now a reasonable small follow-up to the merged truncation feature — mergeable after a description refresh.


@LaZzyMan

Collaborator

Thanks for all the work here, @Jerry2003826, and for the persistence through 23 commits and several rounds of review.

The core of this PR, moving model-facing tool-output truncation into CoreToolScheduler, has since landed in main via #4880, which generalized it into a layered model (single-result truncation + per-message budget + per-tool limits). After the rebase, what remains here is a focused truncation-telemetry enrichment (call_id, output_file_saved, save_error_*) plus a redundant-test cleanup.

That telemetry addition is reasonable and CI is green, but since the feature itself is now in main, I'm closing this PR to keep the queue focused. If the extra observability is still useful, a small standalone PR that just adds those fields to ToolOutputTruncatedEvent and the code that populates them would be very welcome; it'd be a clean, easy review.

Thanks again for pushing this forward; the end-to-end behavior you were after is shipping in main.


Labels

type/bug Something isn't working as expected


Development

Successfully merging this pull request may close these issues.

[Bug] Tool output is not truncated, causing context token overflow and leaving the session unable to continue

9 participants