Skip to content

fix(core): compress when usage metadata is missing#4528

Merged
wenshao merged 9 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-compression-missing-usage
Jun 14, 2026
Merged

fix(core): compress when usage metadata is missing#4528
wenshao merged 9 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-compression-missing-usage

Conversation

@Jerry2003826

@Jerry2003826 Jerry2003826 commented May 26, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Allows chat compression to proceed safely when provider usage metadata is missing, while rejecting inflated local token deltas that would make the compression result unsafe.

Why it's needed

Some model responses omit usage metadata. The compression path still needs a conservative fallback instead of silently skipping or accepting an unsafe inflated estimate.

Reviewer Test Plan

How to verify

Run: npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized"; npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts; npx eslint packages/core/src/services/chatCompressionService.test.ts; npx prettier --check packages/core/src/services/chatCompressionService.test.ts; npm run typecheck --workspace=packages/core.

Evidence (Before & After)

Before: missing usage metadata could leave the compression path without coverage for inflated local estimates. After: regression tests cover missing-usage compression, inflated fallback rejection, and cap-sized safe fallback. This is core service behavior, so TUI screenshots are N/A.

Tested on

OS Status
macOS CI only
Windows Tested locally
Linux CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: The fallback remains conservative and only affects compression decisions when provider usage metadata is absent.
  • Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
  • Breaking changes / migration notes: None expected.

Linked Issues

Fixes #3282

中文说明

这个 PR 做了什么

在 provider 没有返回 usage metadata 时,仍然允许压缩走安全的保守路径,并拒绝膨胀的本地 token 估算。

为什么需要

部分模型响应没有 usage metadata,压缩逻辑需要有可靠 fallback,不能跳过或接受不安全估算。

Reviewer Test Plan

本地在 Windows 跑了 chatCompressionService 定向测试、完整文件测试、eslint、prettier check 和 core typecheck;macOS/Linux 依赖 GitHub CI。

风险和范围

只影响缺失 usage metadata 时的压缩判断,不改压缩公共接口。

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4531.

Key shared functions:

  • compresspackages/core/src/services/chatCompressionService.ts
  • estimateContentTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4531.

Key shared functions:

  • compresspackages/core/src/services/chatCompressionService.ts
  • estimateContentTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (pre-existing code, not on diff line): The output-truncation guard at chatCompressionService.ts:560 (compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) is a no-op when usage is undefined, because compressionOutputTokenCount is also undefined. A usage-omitting provider that returns a max-length truncated summary will have it accepted unconditionally by the new fallback path. Consider logging a warning when the fallback path is used and the truncation guard cannot fire, so the risk is at least observable.

— qwen3.7-max via Qwen Code /review

(compressionInputTokenCount - 1000) +
compressionOutputTokenCount,
);
} else {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] estimateContentTokens(extraHistory) produces a char/4 lower-bound of only the visible post-compression messages (summary + ack + historyToKeep). But originalTokenCount is API-reported and includes system prompt + tool definitions (~15–25K tokens of invisible context). These are on fundamentally different scales.

Downstream, geminiChat.ts:1362 stores newTokenCount as lastPromptTokenCount. On the next send, estimatePromptTokens() takes the lastPromptTokenCount > 0 branch and returns lastPromptTokenCount + estimateContentTokens([userMessage]) — missing the system-prompt-and-tools delta entirely. The auto-compaction gate then undercounts by 15–25K tokens until the next API response corrects it. For providers that consistently omit usage, the undercount persists across compression cycles.

Additionally, the newTokenCount > originalTokenCount (INFLATED_TOKEN_COUNT) guard becomes effectively dead code: the char/4 estimate of visible content will almost always be less than the API-reported full-prompt count, so even a pathological summary that makes the history larger would pass this check.

Suggested fix — estimate the delta rather than the absolute post-compression size, so newTokenCount stays on the same scale as originalTokenCount:

} else {
  // Preserve the API-reported baseline (system prompt + tools) by
  // estimating the reduction from content replacement rather than
  // the absolute post-compression size.
  const estimatedRemoved = estimateContentTokens(
    historyToCompress,
    slimmingConfig.imageTokenEstimate,
  );
  const syntheticMessages = extraHistory.slice(
    0,
    extraHistory.length - historyToKeep.length,
  );
  const estimatedAdded = estimateContentTokens(
    syntheticMessages,
    slimmingConfig.imageTokenEstimate,
  );
  newTokenCount = Math.max(
    0,
    originalTokenCount - estimatedRemoved + estimatedAdded,
  );
  canCalculateNewTokenCount = newTokenCount > 0;
}

This mirrors the primary path's structure (originalTokenCount - removed + added) and keeps both sides of the INFLATED guard comparison on the same measurement scale.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the current branch. The missing-usage path no longer replaces the API-reported total with a visible-history-only estimate. It computes the visible delta only, preserves the API-reported non-visible remainder (originalTokenCount - estimatedOriginalVisibleTokenCount), and adds the estimated post-compression visible history back on top.

Coverage:

  • should use estimated token count if usage metadata is missing

Validated:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

newTokenCount = Math.max(
0,
originalTokenCount -
(compressionInputTokenCount - 1000) +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This fallback branch silently switches measurement methodology with no log trace. Other paths in this file emit debug/warn messages (e.g., lines 494, 567). A 3 AM operator sees a chat_compression telemetry event with tokens_after populated but compression_input_token_count / compression_output_token_count absent — there is no breadcrumb explaining why.

Suggested change
(compressionInputTokenCount - 1000) +
} else {
config.getDebugLogger().debug(
`[chat-compression] usage metadata omitted by provider; ` +
`falling back to local token estimation.`,
);
newTokenCount = estimateContentTokens(

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6020b735b: the missing-usage fallback path now has an explicit debug breadcrumb. The test captures getDebugLogger().debug and asserts the usage metadata missing / preserved non-visible remainder message is emitted.

expect(result.info.originalTokenCount).toBe(800);
expect(result.info.newTokenCount).toBe(800);
expect(result.newHistory).toBeNull();
expect(result.info.newTokenCount).toBeLessThan(800);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] toBeLessThan(800) is too loose to verify the estimation fallback actually ran. With the test's tiny history ("Summary" + "Got it..." ack + a few kept messages ≈ 57 chars), estimateContentTokens returns approximately Math.ceil(57/4) = 15 tokens. The upper bound of 800 would pass even if the fallback branch were never entered and newTokenCount were accidentally close to originalTokenCount.

Consider a tighter assertion that actually proves the estimator ran:

Suggested change
expect(result.info.newTokenCount).toBeLessThan(800);
expect(result.info.newTokenCount).toBeGreaterThan(0);
expect(result.info.newTokenCount).toBeLessThan(50);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6020b735b with a tighter assertion. I did not use <50 here because the current fallback intentionally preserves the API-reported non-visible remainder (5000 - 4000 = 1000) and only replaces the visible-history delta. The test now asserts newTokenCount > 1000 and < 1100, which verifies the fallback estimator ran while preserving the non-visible remainder.

expect(result.info.newTokenCount).toBeLessThan(5_000);
expect(result.newHistory).not.toBeNull();
expect(result.newHistory![0].parts![0].text).toBe('Summary');
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Missing test coverage for the delta-based estimation path producing COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. All existing INFLATED tests (lines 959 and 1171) use explicit promptTokenCount/candidatesTokenCount in mock usage, exercising the primary formula. No test covers the scenario where delta-based estimation (estimatedNewContentTokenCount > estimatedOriginalContentTokenCount) triggers the INFLATED guard — e.g., when the side-query returns a verbose summary larger than the compressed input in estimated terms. This branch at chatCompressionService.ts:723 is reachable via the delta path but untested.

Consider adding a test that constructs a small historyForSplit with a verbose summary (e.g., text: 'x'.repeat(40_000)) and usage: undefined, asserting COMPRESSION_FAILED_INFLATED_TOKEN_COUNT and newHistory === null.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in the current branch by should reject inflated local delta if usage metadata is missing. That test sets usage: undefined, exercises the local delta estimation path, and asserts COMPRESSION_FAILED_INFLATED_TOKEN_COUNT with newHistory === null when the estimated post-compression count is inflated.

Validated:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

wenshao
wenshao previously approved these changes May 27, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All R1-R2 findings addressed. Delta-based estimation approach is sound, tests cover the key scenarios (happy path, inflated rejection, cap guard). CI green (8/8). — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 28, 2026

Copy link
Copy Markdown
Collaborator

PR #4528 Verification Report

Branch: Jiarui/fix-compression-missing-usage
Test Environment: macOS Darwin 25.4.0

Build & Type Check

  • npm run buildPASSED
  • npx tsc --noEmitPASSED

Unit Tests

  • npx vitest run packages/core/src/services/chatCompressionService.test.ts71 tests passed

New/Modified Test Coverage

  1. Missing-usage compression succeeds with local estimation — when provider omits usage metadata, estimateContentTokens() fallback produces valid token counts and compression completes with COMPRESSED status
  2. Inflated local delta rejected — 40K char summary on 800-token input correctly triggers COMPRESSION_FAILED_OUTPUT_TRUNCATED, preventing runaway estimation
  3. Cap-sized summaries rejected — summaries hitting the output cap are properly rejected even when usage metadata is missing

Code Review Observations

  • Line ~546: compressionOutputTokenCount fallback uses estimateContentTokens(summary) with a warning log when usage metadata is absent — clean degradation path
  • Line ~659: New else branch for newTokenCount calculation estimates visible token delta locally while preserving the API-reported non-visible remainder (system/tool/prompt tokens) — avoids double-counting
  • Warning logs provide observability for when the fallback path activates
  • Guard rails (inflated delta rejection, cap-sized rejection) prevent local estimation from silently producing bad results

Verdict

Ready to merge. The fallback logic is well-bounded with proper rejection guards, and all 71 tests pass including the 3 new coverage scenarios.

— wenshao

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR 整体逻辑合理:delta-based 估算保留了 API 侧不可见 token 开销,仅对可见对话部分做本地 char/4 估算,方向安全(保守)。truncation guard 的本地补充也正确。发现一个会导致编译失败的 merge conflict。

);
} else {
const estimatedOriginalVisibleTokenCount = estimateContentTokens(
historyForSplit,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical — 合并冲突] historyForSplit 在当前 main 分支上已不存在。PR #4599 将 tail-preservation 压缩重构为 summary + restoration attachments 模式后,该变量被移除。GitHub 已将本 PR 标记为 CONFLICTING。

需要 rebase 到最新 main。rebase 后此处应改用 curatedHistory(全量对话历史),或根据新代码结构中 historyForCompose(已剥离 orphaned functionCall 的版本)来估算原始可见 token。语义上 curatedHistory 更合适,因为 token 估算需要覆盖压缩前的完整可见内容。

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 299e3d7.

I merged current origin/main into the PR branch, resolved the chatCompressionService.ts conflict, and changed the missing-usage fallback to estimate the original visible token baseline from curatedHistory instead of the removed historyForSplit path. The merge also keeps main's post-compact attachment token adjustment on the usage-metadata path.

Validation:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|missing usage|inflated|cap"
  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npm run typecheck --workspace=packages/core

…n-missing-usage

# Conflicts:
#	packages/core/src/services/chatCompressionService.ts
if (compressionOutputTokenCount === undefined && !isSummaryEmpty) {
compressionOutputTokenCount = estimateContentTokens(
[{ role: 'model', parts: [{ text: summary }] }],
slimmingConfig.imageTokenEstimate,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

estimateContentTokens uses a fixed ~4 chars/token ratio (CHARS_PER_TOKEN). For CJK text the actual ratio is closer to ~0.67 chars/token (each CJK character ≈ 1.5 tokens). A CJK summary truncated at COMPACT_MAX_OUTPUT_TOKENS tokens would have only ~COMPACT_MAX_OUTPUT_TOKENS × 0.67 characters, yielding an estimate of ~COMPACT_MAX_OUTPUT_TOKENS × 0.17 — well below the threshold, so the truncation guard on line 413 would not fire.

This only affects providers that omit usage metadata AND produce CJK output, but for those providers a truncated summary could silently corrupt the compressed history.

One possible mitigation: supplement the char-based estimate with a length check against COMPACT_MAX_OUTPUT_TOKENS × CHARS_PER_TOKEN chars (the maximum the model could have written), or apply a safety margin.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the current branch. The output-cap guard now uses a CJK-aware local estimate for summary output when provider usage is missing, so CJK summaries near the output-token cap are rejected as COMPRESSION_FAILED_OUTPUT_TRUNCATED instead of passing through via char/4 underestimation.

Coverage:

  • should reject CJK cap-sized summaries when usage metadata is missing

I also pushed 93b07ea08 to write that fixture as \u4e00, avoiding non-ASCII display/encoding noise while keeping the same CJK token-estimation case.

Validated:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体逻辑清晰,fallback 路径设计合理。留了一个 inline comment 关于 CJK 场景下 estimateContentTokens 的 char/token 比率偏差 —— 在无 usage metadata 的 provider + CJK 输出场景下,truncation guard 可能漏判。不 block,供参考。

BZ-D
BZ-D previously approved these changes Jun 1, 2026
wenshao
wenshao previously approved these changes Jun 1, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found. All R1-R3 findings addressed. The missing-usage fallback chain (totalTokenCount delta → local estimate → undefined) and the delta-based token math with non-visible remainder preservation are well-designed. Merge conflict resolution (historyForSplit → curatedHistory) is correct. LGTM ✅ — qwen-latest-series-invite-beta-v38 via Qwen Code /review

BZ-D
BZ-D previously approved these changes Jun 2, 2026

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the diff and ran the targeted compression tests; no blocking issues found.

…n-missing-usage

# Conflicts:
#	packages/core/src/services/chatCompressionService.ts
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Merged latest origin/main in b39d03b8b and cleared the compression-service merge conflict.

Conflict resolution:

  • Kept this PR's local fallback path for missing compression usage metadata, including the CJK-aware estimateSummaryOutputTokens() helper.
  • Kept upstream's new MAX_HOOK_INSTRUCTIONS_CHARS cap and buildStateReminderParts import/path.
  • The two blocks are independent, so the resolution preserves both behaviors.

Validation on Windows:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "missing usage|usage metadata|CJK|local estimate|fallback|output cap|cap-sized|inflated"
  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npm run typecheck --workspace=packages/core
  • git diff --check

wenshao
wenshao previously approved these changes Jun 4, 2026
DragonnZhang
DragonnZhang previously approved these changes Jun 8, 2026

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Verdict: APPROVE

No new findings. The fallback chain for missing usage metadata is well-designed and conservative throughout.

What was reviewed

  • chatCompressionService.ts: New estimateSummaryOutputTokens function with CJK awareness, local estimation fallback in the token math else branch, and the integration of local estimates into the existing cap guard and inflation guard.
  • chatCompressionService.test.ts: Four new/updated tests covering happy path, inflated delta rejection, cap-sized summary rejection, and CJK cap-sized summary rejection — all without usage metadata.

Assessment

  • The fallback chain (totalTokenCount delta → local char-based estimate → undefined) is sound and conservative.
  • The non-visible remainder preservation (originalTokenCount - estimatedOriginalVisibleTokenCount) correctly prevents the local estimator from replacing the API-authoritative total with a visible-history-only underestimate.
  • The CJK-aware output estimate takes max(generic, cjkAware), which is the safe (higher) direction — reducing false negatives in the cap guard.
  • The inflation guard (newTokenCount > originalTokenCount) correctly catches oversized summaries even without provider usage data.
  • Tests are comprehensive and validate both success and failure paths under missing usage metadata.
  • Deterministic checks (tsc, eslint): 0 findings.
  • CI: 14/14 checks passing.

— qwen-code via Qwen Code /review

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Addressed the review suggestions in 6020b735b.

Updates:

  • The missing-usage fallback test now asserts the post-compression estimate is tightly bounded (> 1000 and < 1100) instead of only checking that it is below the original 5000 tokens.
  • The same test now captures getDebugLogger().debug and asserts the fallback trace is emitted, including the preserved API-reported non-visible remainder.

Validated locally:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/services/chatCompressionService.test.ts

@wenshao

wenshao commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Thanks for the PR, @Jerry2003826!

Template looks good ✓ — all required sections present, bilingual, linked issue.

On direction: this is a clear-cut bug fix for a 100% reproducible failure. MiniMax-2.7 (and likely other OpenAI-compatible providers) omit usage on the compression side-query, which currently hard-fails compression with a token-count error. That's a real user-facing regression for anyone on those providers. Well within scope — core compression logic should degrade gracefully when optional metadata is absent. Issue #3282 confirms the report.

On approach: scope is tight — 2 files, ~270 lines, focused entirely on the missing-metadata path. The two-part fallback (output token estimation + non-visible remainder preservation) is the right shape. No scope creep, no speculative features. Clean.

Moving on to code review. 🔍

中文说明

感谢 @Jerry2003826 的贡献!

模板完整 ✓ — 所有必要章节齐全,中英双语,关联了 issue。

方向:这是一个 100% 可复现的 bug 修复。MiniMax-2.7(以及其他 OpenAI 兼容 provider)在压缩 side-query 上省略了 usage 元数据,导致压缩直接以 token 计数错误而失败。这对使用这些 provider 的用户来说是真实的用户侧问题。修复范围合理——核心压缩逻辑在可选元数据缺失时应该优雅降级。Issue #3282 确认了这个报告。

方案:范围紧凑——2 个文件,约 270 行,完全聚焦于缺失元数据的处理路径。两部分 fallback(输出 token 估算 + 非可见部分保留)是正确的思路。没有范围蔓延,没有投机性功能。干净。

进入代码审查 🔍

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Code Review

Independent proposal (before reading the diff): when a provider omits usage metadata on the compression side-query, fall back to local char-based token estimation — the same estimator used by the auto-compaction gate. Keep it conservative, and reject estimates that would inflate the token count. That's roughly what the PR does.

The PR's actual approach goes one step further with a non-visible remainder preservation technique: instead of estimating the total post-compression count from scratch (which would miss ~15-20K of system prompt + tool definitions), it computes nonVisibleTokens = originalCount - estimatedVisibleBefore, then newCount = nonVisibleTokens + estimatedVisibleAfter. This is clever and more accurate than a naive full-estimate fallback.

CJK-aware output estimation (estimateSummaryOutputTokens) is a nice touch — CJK characters tokenize at ~1.5× the rate of Latin text, so a plain char/4 estimate would significantly under-count CJK summaries. The multiplier is a reasonable heuristic.

No blockers found. Specific observations:

  • The cap guard (>= COMPACT_MAX_OUTPUT_TOKENS) and inflation guard (newTokenCount > originalTokenCount) naturally extend to estimated values — guard rails preserved without extra code.
  • The fallback only activates when compressionOutputTokenCount === undefined, so existing behavior with metadata-present providers is completely untouched.
  • Comment cleanup in the else branch is a nice readability improvement (the old version had awkward mid-line breaks).

Test Verification

$ cd packages/core && npx vitest run src/services/chatCompressionService.test.ts

 ✓ src/services/chatCompressionService.test.ts (70 tests) 94ms

 Test Files  1 passed (1)
      Tests  70 passed (70)
   Duration  6.94s

New tests cover all three fallback paths:

  • Missing-usage happy path: compression succeeds with local estimation, token count bounded 1000-1100 ✓
  • Inflated delta rejection: 40K char summary on 800-token input → COMPRESSION_FAILED_INFLATED_TOKEN_COUNT
  • Cap-sized rejection (Latin): COMPACT_MAX_OUTPUT_TOKENS * 4 char summary → COMPRESSION_FAILED_OUTPUT_TRUNCATED
  • Cap-sized rejection (CJK): CJK-heavy summary hitting cap → COMPRESSION_FAILED_OUTPUT_TRUNCATED
$ tsc --noEmit  (exit 0, no errors)
$ eslint chatCompressionService.ts chatCompressionService.test.ts  (exit 0, no issues)

Real-Scenario Testing

N/A for this PR — the behavior change is internal compression service logic triggered only by providers that omit usage metadata (e.g. MiniMax-2.7). Reproducing the original bug requires that specific provider's API. The unit tests comprehensively verify all fallback paths and guard rails.

中文说明

代码审查

独立方案(读 diff 前):当 provider 在压缩 side-query 上省略 usage 元数据时,回退到本地 char-based token 估算——与自动压缩门控使用的同一个估算器。保守处理,拒绝会导致 token 计数膨胀的估算。PR 的做法大致如此。

PR 的实际方案更进一步,使用了非可见部分保留技术:不是从零估算压缩后的总量(会漏掉 ~15-20K 的系统提示 + 工具定义),而是计算 nonVisibleTokens = originalCount - estimatedVisibleBefore,然后 newCount = nonVisibleTokens + estimatedVisibleAfter。这比简单的全量估算回退更准确。

CJK 感知的输出估算(estimateSummaryOutputTokens)是一个很好的补充——CJK 字符的 token 化率约为拉丁文本的 1.5 倍,单纯的 char/4 估算会显著低估 CJK 摘要。乘数是一个合理的启发式值。

未发现阻塞问题。 具体观察:

  • Cap 守卫(>= COMPACT_MAX_OUTPUT_TOKENS)和膨胀守卫(newTokenCount > originalTokenCount)自然扩展到估算值——无需额外代码即可保留防护。
  • Fallback 仅在 compressionOutputTokenCount === undefined 时激活,因此有元数据的 provider 行为完全不受影响。

测试验证

70 个测试全部通过,包括 4 个新测试覆盖所有 fallback 路径和守卫。TypeScript 类型检查和 ESLint 均无问题。

真实场景测试

N/A——行为变化是内部压缩服务逻辑,仅在省略 usage 元数据的 provider(如 MiniMax-2.7)上触发。复现原始 bug 需要该特定 provider 的 API。单元测试全面验证了所有 fallback 路径和防护守卫。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Stepping back: this is a well-executed bug fix that solves a real, reproducible problem. The MiniMax-2.7 user report in #3282 is clear — compression fails 100% of the time because the provider omits usage metadata. The fix is conservative (only activates when metadata is genuinely missing), well-guarded (cap and inflation checks still fire on estimated values), and accurately preserves the non-visible token overhead that a naive full-estimate would lose.

My independent proposal was simpler — just estimate everything locally. The PR's non-visible remainder technique is genuinely better: it anchors the system prompt + tool definition cost to the API-reported total and only estimates the delta. That's the right call.

Code is clean, tests cover all paths, typecheck and lint are green. Nothing to nitpick.

Approving. ✅

中文说明

总结:这是一个执行良好的 bug 修复,解决了一个真实的、可复现的问题。#3282 中的 MiniMax-2.7 用户报告很清楚——因为 provider 省略了 usage 元数据,压缩 100% 失败。修复是保守的(仅在元数据确实缺失时激活)、有良好防护的(cap 和膨胀检查在估算值上仍然触发),并且准确保留了朴素全量估算会丢失的非可见 token 开销。

我的独立方案更简单——全部本地估算。PR 的非可见部分保留技术确实更好:将系统提示 + 工具定义的开销锚定在 API 报告的总数上,只估算增量。这是正确的选择。

代码干净,测试覆盖所有路径,类型检查和 lint 均通过。没有可挑剔的地方。

批准。✅

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

@wenshao

wenshao commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

✅ Fresh local Linux verification — author's commands, an independent math harness, and a live /compress session

The PR table lists Linux as CI only, so I re-verified end-to-end on Linux: reproduced the author's commands, independently checked the fix's math against the compiled bundle, and ran a real /compress under tmux. Posting as an updated merge reference.

Setup

  • Branch Jiarui/fix-compression-missing-usage @ 6020b73 (fresh git worktree + npm install + npm run build), CLI v0.17.1
  • Platform 🐧 Linux / Node v22.22.2

1) Author's Reviewer Test Plan reproduced — all green

Check Result
vitest …chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized" 4 passed
vitest …chatCompressionService.test.ts (full file) 70 passed
eslint (test + impl) ✅ clean
prettier --check (test + impl) ✅ "All matched files use Prettier code style!"
tsc --noEmit -p packages/core ✅ no errors
npm run build ✅ exit 0

The four tests that exercise the actual compress() against a usage-omitting side-query response ({text, usage: undefined} — the real bug scenario) all pass:

  • should use estimated token count if usage metadata is missingCOMPRESSED
  • should reject inflated local delta if usage metadata is missingCOMPRESSION_FAILED_INFLATED_TOKEN_COUNT
  • should reject cap-sized summaries even if usage metadata is missingCOMPRESSION_FAILED_OUTPUT_TRUNCATED
  • should reject CJK cap-sized summaries when usage metadata is missing → rejected

2) Independent math harness against the compiled bundle — 10/10

To check the fix's math without reusing the PR's own vitest mocks, I imported the built estimateContentTokens / COMPACT_MAX_OUTPUT_TOKENS from dist and reproduced the two formulae the fix introduces:

A1. estimated visible tokens of 4×4000-char history = 4000                    PASS
A2. preserved non-visible remainder = 1000  (matches the debug log asserted)  PASS
A3. resulting newTokenCount ∈ (1000,1100) — matches the test's assertion      PASS (1004)
B1. CJK-aware estimate >= generic estimate (conservative/higher direction)    PASS (3000 vs 500)
B3. ASCII summary uses generic estimate (chars/4)                             PASS (500)
C1. CJK cap fixture estimate >= COMPACT_MAX_OUTPUT_TOKENS  => rejected         PASS (20001 ≥ 20000)
C2. generic-only estimate WOULD slip past the cap (the bug the fix prevents)  PASS (3334 < 20000)
D1. 40000-char summary (10000 tok) > original 800 → inflation guard fires     PASS
==== 10 passed, 0 failed ====

This independently confirms the design:

  • The non-visible-remainder fallback (originalTokenCount − estimatedVisible preserved, only the visible delta re-estimated) reproduces the exact numbers the test asserts (1000 remainder, ~1004 result), so missing usage cannot replace the authoritative total with a visible-only underestimate.
  • The CJK-aware cap guard takes max(generic, cjkAware); the test's CJK cap fixture estimates 20001 ≥ 20000 and is rejected, whereas the generic-only estimate would be 3334 and slip past the cap — i.e., the harness directly demonstrates the CJK under-count bug this guard closes (the concern raised earlier in review).

3) Live tmux /compress session (--debug) — works end-to-end, no regression

Built up a real ~18.7k-token conversation with glm-5.1, then ran /compress:

✦ Chat history compressed from 18729 to 17858 tokens.
  • ✅ Real end-to-end compression succeeds on Linux; 0 compression errors in the debug log.
  • The live provider returned usage metadata, so this run took the normal (non-fallback) path — confirming the change is non-regressive for usage-reporting providers. The new fallback fires only when a provider omits usage on the side-query (the PR cites MiniMax-2.7); that branch is covered by §1 (real compress() over a usage-omitting response) and §2 (the math), since a usage-omitting provider can't be forced deterministically through a live transport.

Verdict

LGTM — recommend merge. (Consistent with my earlier approval; this adds Linux coverage the table was missing.) The missing-usage fallback chain is conservative and correct: it preserves the API-reported non-visible remainder, only locally re-estimates the visible delta, and the CJK-aware cap/inflation guards correctly reject unsafe summaries — verified against real shipped code, the compiled bundle's math, and a live session. Tests/lint/typecheck/build are all clean on Linux.

中文版(点击展开)

✅ 全新本地 Linux 验证 —— 复现作者命令、对编译产物做独立数学校验、并跑真实 /compress 会话

PR 表格里 Linux 标注为 CI only,所以我在 Linux 上端到端重新验证:复现作者命令、对编译后产物独立校验该修复的数学、并在 tmux 里跑了真实 /compress。作为更新后的合并参考。

环境

  • 分支 Jiarui/fix-compression-missing-usage @ 6020b73(全新 worktree + npm install + npm run build),CLI v0.17.1
  • 平台 🐧 Linux / Node v22.22.2

1) 复现作者 Reviewer Test Plan —— 全绿

  • 定向测试(-t "usage metadata|inflated local delta|cap-sized"4 passed;完整文件 70 passed
  • eslint ✅ 干净、prettier --check ✅ 全部符合、tsc --noEmit -p packages/core ✅ 无错误、npm run build ✅。
  • 四个直接用 usage: undefined(真实 bug 场景)驱动 compress() 的测试全部通过:估算成功压缩 / 拒绝膨胀 delta / 拒绝 cap-sized 摘要 / 拒绝 CJK cap-sized 摘要。

2) 针对编译产物的独立数学 harness —— 10/10

不复用 PR 自带 mock,从 dist 导入 estimateContentTokens / COMPACT_MAX_OUTPUT_TOKENS,复刻该修复引入的两段公式:

  • non-visible remainder fallback:4×4000 字符历史的可见估算=4000、保留的不可见余量=1000、最终 newTokenCount≈1004 —— 与测试断言完全一致,说明缺失 usage 不会用“仅可见历史”的低估替换权威总量。
  • CJK-aware cap guard:取 max(generic, cjkAware);测试的 CJK cap fixture 估算 20001 ≥ 20000 被拒,而仅用 generic 估算会是 3334漏过 cap —— harness 直接复现了该 guard 所修复的 CJK 低估 bug(此前 review 提出的关注点)。
  • 膨胀 guard:40000 字符摘要(≈10000 tok) > 原始 800 → 触发拒绝。
==== 10 passed, 0 failed ====

3) tmux /compress 实时会话(--debug)—— 端到端可用、无回归

glm-5.1 构造约 18.7k token 的真实对话后执行 /compress

✦ Chat history compressed from 18729 to 17858 tokens.
  • ✅ Linux 上真实端到端压缩成功,debug 日志 0 压缩错误。
  • 实时 provider 返回了 usage metadata,所以本次走正常(非 fallback)路径 —— 证明该改动对会返回 usage 的 provider 无回归。新 fallback 仅在 provider 在 side-query 上省略 usage 时触发(PR 提到 MiniMax-2.7);该分支由 §1(真实 compress() + usage 缺失响应)与 §2(数学)覆盖,因为无法通过真实传输确定性地强制某 provider 省略 usage。

结论

LGTM — 建议合并。(与我此前的 approve 一致;本次补上了表格缺失的 Linux 覆盖。)缺失 usage 的 fallback 链保守且正确:保留 API 报告的不可见余量、仅本地重估可见 delta,CJK-aware 的 cap/膨胀 guard 能正确拒绝不安全摘要 —— 已对真实发布代码、编译产物的数学、以及实时会话三方面验证。Linux 上 tests/lint/typecheck/build 全绿。

@wenshao wenshao merged commit d979e3d into QwenLM:main Jun 14, 2026
43 checks passed
doudouOUC pushed a commit that referenced this pull request Jun 15, 2026
* fix(core): compress when usage metadata is missing

* fix(core): harden missing compression usage fallback

* test(core): cover missing-usage inflated compression fallback

* fix(core): clarify missing usage compression fallback

* test(core): align compression fallback assertion with summary trailer

* test(core): use ascii cjk fixture escape

* test(core): tighten missing-usage compression assertions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

使用 MiniMax-2.7 模型,上下文压缩报错

6 participants