fix(core): compress when usage metadata is missing#4528
Conversation
|
This PR shares code with PR #4525, PR #4526, and PR #4531. Key shared functions:
Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
|
This PR shares code with PR #4525, PR #4526, and PR #4531. Key shared functions:
Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
wenshao
left a comment
There was a problem hiding this comment.
Suggestion (pre-existing code, not on diff line): The output-truncation guard at chatCompressionService.ts:560 (compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) is a no-op when usage is undefined, because compressionOutputTokenCount is also undefined. A usage-omitting provider that returns a max-length truncated summary will have it accepted unconditionally by the new fallback path. Consider logging a warning when the fallback path is used and the truncation guard cannot fire, so the risk is at least observable.
— qwen3.7-max via Qwen Code /review
| (compressionInputTokenCount - 1000) + | ||
| compressionOutputTokenCount, | ||
| ); | ||
| } else { |
There was a problem hiding this comment.
[Critical] estimateContentTokens(extraHistory) produces a char/4 lower-bound of only the visible post-compression messages (summary + ack + historyToKeep). But originalTokenCount is API-reported and includes system prompt + tool definitions (~15–25K tokens of invisible context). These are on fundamentally different scales.
Downstream, geminiChat.ts:1362 stores newTokenCount as lastPromptTokenCount. On the next send, estimatePromptTokens() takes the lastPromptTokenCount > 0 branch and returns lastPromptTokenCount + estimateContentTokens([userMessage]) — missing the system-prompt-and-tools delta entirely. The auto-compaction gate then undercounts by 15–25K tokens until the next API response corrects it. For providers that consistently omit usage, the undercount persists across compression cycles.
Additionally, the newTokenCount > originalTokenCount (INFLATED_TOKEN_COUNT) guard becomes effectively dead code: the char/4 estimate of visible content will almost always be less than the API-reported full-prompt count, so even a pathological summary that makes the history larger would pass this check.
Suggested fix — estimate the delta rather than the absolute post-compression size, so newTokenCount stays on the same scale as originalTokenCount:
} else {
// Preserve the API-reported baseline (system prompt + tools) by
// estimating the reduction from content replacement rather than
// the absolute post-compression size.
const estimatedRemoved = estimateContentTokens(
historyToCompress,
slimmingConfig.imageTokenEstimate,
);
const syntheticMessages = extraHistory.slice(
0,
extraHistory.length - historyToKeep.length,
);
const estimatedAdded = estimateContentTokens(
syntheticMessages,
slimmingConfig.imageTokenEstimate,
);
newTokenCount = Math.max(
0,
originalTokenCount - estimatedRemoved + estimatedAdded,
);
canCalculateNewTokenCount = newTokenCount > 0;
}This mirrors the primary path's structure (originalTokenCount - removed + added) and keeps both sides of the INFLATED guard comparison on the same measurement scale.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in the current branch. The missing-usage path no longer replaces the API-reported total with a visible-history-only estimate. It computes the visible delta only, preserves the API-reported non-visible remainder (originalTokenCount - estimatedOriginalVisibleTokenCount), and adds the estimated post-compression visible history back on top.
Coverage:
should use estimated token count if usage metadata is missing
Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
| newTokenCount = Math.max( | ||
| 0, | ||
| originalTokenCount - | ||
| (compressionInputTokenCount - 1000) + |
There was a problem hiding this comment.
[Suggestion] This fallback branch silently switches measurement methodology with no log trace. Other paths in this file emit debug/warn messages (e.g., lines 494, 567). A 3 AM operator sees a chat_compression telemetry event with tokens_after populated but compression_input_token_count / compression_output_token_count absent — there is no breadcrumb explaining why.
| (compressionInputTokenCount - 1000) + | |
| } else { | |
| config.getDebugLogger().debug( | |
| `[chat-compression] usage metadata omitted by provider; ` + | |
| `falling back to local token estimation.`, | |
| ); | |
| newTokenCount = estimateContentTokens( |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in 6020b735b: the missing-usage fallback path now has an explicit debug breadcrumb. The test captures getDebugLogger().debug and asserts the usage metadata missing / preserved non-visible remainder message is emitted.
| expect(result.info.originalTokenCount).toBe(800); | ||
| expect(result.info.newTokenCount).toBe(800); | ||
| expect(result.newHistory).toBeNull(); | ||
| expect(result.info.newTokenCount).toBeLessThan(800); |
There was a problem hiding this comment.
[Suggestion] toBeLessThan(800) is too loose to verify the estimation fallback actually ran. With the test's tiny history ("Summary" + "Got it..." ack + a few kept messages ≈ 57 chars), estimateContentTokens returns approximately Math.ceil(57/4) = 15 tokens. The upper bound of 800 would pass even if the fallback branch were never entered and newTokenCount were accidentally close to originalTokenCount.
Consider a tighter assertion that actually proves the estimator ran:
| expect(result.info.newTokenCount).toBeLessThan(800); | |
| expect(result.info.newTokenCount).toBeGreaterThan(0); | |
| expect(result.info.newTokenCount).toBeLessThan(50); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in 6020b735b with a tighter assertion. I did not use <50 here because the current fallback intentionally preserves the API-reported non-visible remainder (5000 - 4000 = 1000) and only replaces the visible-history delta. The test now asserts newTokenCount > 1000 and < 1100, which verifies the fallback estimator ran while preserving the non-visible remainder.
| expect(result.info.newTokenCount).toBeLessThan(5_000); | ||
| expect(result.newHistory).not.toBeNull(); | ||
| expect(result.newHistory![0].parts![0].text).toBe('Summary'); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] Missing test coverage for the delta-based estimation path producing COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. All existing INFLATED tests (lines 959 and 1171) use explicit promptTokenCount/candidatesTokenCount in mock usage, exercising the primary formula. No test covers the scenario where delta-based estimation (estimatedNewContentTokenCount > estimatedOriginalContentTokenCount) triggers the INFLATED guard — e.g., when the side-query returns a verbose summary larger than the compressed input in estimated terms. This branch at chatCompressionService.ts:723 is reachable via the delta path but untested.
Consider adding a test that constructs a small historyForSplit with a verbose summary (e.g., text: 'x'.repeat(40_000)) and usage: undefined, asserting COMPRESSION_FAILED_INFLATED_TOKEN_COUNT and newHistory === null.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch by should reject inflated local delta if usage metadata is missing. That test sets usage: undefined, exercises the local delta estimation path, and asserts COMPRESSION_FAILED_INFLATED_TOKEN_COUNT with newHistory === null when the estimated post-compression count is inflated.
Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
wenshao
left a comment
There was a problem hiding this comment.
All R1-R2 findings addressed. Delta-based estimation approach is sound, tests cover the key scenarios (happy path, inflated rejection, cap guard). CI green (8/8). — qwen3.7-max via Qwen Code /review
PR #4528 Verification ReportBranch: Build & Type Check
Unit Tests
New/Modified Test Coverage
Code Review Observations
VerdictReady to merge. The fallback logic is well-bounded with proper rejection guards, and all 71 tests pass including the 3 new coverage scenarios. — wenshao |
BZ-D
left a comment
There was a problem hiding this comment.
PR 整体逻辑合理:delta-based 估算保留了 API 侧不可见 token 开销,仅对可见对话部分做本地 char/4 估算,方向安全(保守)。truncation guard 的本地补充也正确。发现一个会导致编译失败的 merge conflict。
| ); | ||
| } else { | ||
| const estimatedOriginalVisibleTokenCount = estimateContentTokens( | ||
| historyForSplit, |
There was a problem hiding this comment.
[Critical — 合并冲突] historyForSplit 在当前 main 分支上已不存在。PR #4599 将 tail-preservation 压缩重构为 summary + restoration attachments 模式后,该变量被移除。GitHub 已将本 PR 标记为 CONFLICTING。
需要 rebase 到最新 main。rebase 后此处应改用 curatedHistory(全量对话历史),或根据新代码结构中 historyForCompose(已剥离 orphaned functionCall 的版本)来估算原始可见 token。语义上 curatedHistory 更合适,因为 token 估算需要覆盖压缩前的完整可见内容。
There was a problem hiding this comment.
Fixed in 299e3d7.
I merged current origin/main into the PR branch, resolved the chatCompressionService.ts conflict, and changed the missing-usage fallback to estimate the original visible token baseline from curatedHistory instead of the removed historyForSplit path. The merge also keeps main's post-compact attachment token adjustment on the usage-metadata path.
Validation:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|missing usage|inflated|cap"npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpm run typecheck --workspace=packages/core
…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts
| if (compressionOutputTokenCount === undefined && !isSummaryEmpty) { | ||
| compressionOutputTokenCount = estimateContentTokens( | ||
| [{ role: 'model', parts: [{ text: summary }] }], | ||
| slimmingConfig.imageTokenEstimate, |
There was a problem hiding this comment.
estimateContentTokens uses a fixed ~4 chars/token ratio (CHARS_PER_TOKEN). For CJK text the actual ratio is closer to ~0.67 chars/token (each CJK character ≈ 1.5 tokens). A CJK summary truncated at COMPACT_MAX_OUTPUT_TOKENS tokens would have only ~COMPACT_MAX_OUTPUT_TOKENS × 0.67 characters, yielding an estimate of ~COMPACT_MAX_OUTPUT_TOKENS × 0.17 — well below the threshold, so the truncation guard on line 413 would not fire.
This only affects providers that omit usage metadata AND produce CJK output, but for those providers a truncated summary could silently corrupt the compressed history.
One possible mitigation: supplement the char-based estimate with a length check against COMPACT_MAX_OUTPUT_TOKENS × CHARS_PER_TOKEN chars (the maximum the model could have written), or apply a safety margin.
There was a problem hiding this comment.
Addressed in the current branch. The output-cap guard now uses a CJK-aware local estimate for summary output when provider usage is missing, so CJK summaries near the output-token cap are rejected as COMPRESSION_FAILED_OUTPUT_TRUNCATED instead of passing through via char/4 underestimation.
Coverage:
should reject CJK cap-sized summaries when usage metadata is missing
I also pushed 93b07ea08 to write that fixture as \u4e00, avoiding non-ASCII display/encoding noise while keeping the same CJK token-estimation case.
Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
BZ-D
left a comment
There was a problem hiding this comment.
整体逻辑清晰,fallback 路径设计合理。留了一个 inline comment 关于 CJK 场景下 estimateContentTokens 的 char/token 比率偏差 —— 在无 usage metadata 的 provider + CJK 输出场景下,truncation guard 可能漏判。不 block,供参考。
wenshao
left a comment
There was a problem hiding this comment.
No new issues found. All R1-R3 findings addressed. The missing-usage fallback chain (totalTokenCount delta → local estimate → undefined) and the delta-based token math with non-visible remainder preservation are well-designed. Merge conflict resolution (historyForSplit → curatedHistory) is correct. LGTM ✅ — qwen-latest-series-invite-beta-v38 via Qwen Code /review
299e3d7 to
9facc14
Compare
BZ-D
left a comment
There was a problem hiding this comment.
Reviewed the diff and ran the targeted compression tests; no blocking issues found.
…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts
|
Merged latest Conflict resolution:
Validation on Windows:
|
DragonnZhang
left a comment
There was a problem hiding this comment.
Review Summary
Verdict: APPROVE
No new findings. The fallback chain for missing usage metadata is well-designed and conservative throughout.
What was reviewed
chatCompressionService.ts: NewestimateSummaryOutputTokensfunction with CJK awareness, local estimation fallback in the token mathelsebranch, and the integration of local estimates into the existing cap guard and inflation guard.chatCompressionService.test.ts: Four new/updated tests covering happy path, inflated delta rejection, cap-sized summary rejection, and CJK cap-sized summary rejection — all without usage metadata.
Assessment
- The fallback chain (
totalTokenCountdelta → local char-based estimate →undefined) is sound and conservative. - The non-visible remainder preservation (
originalTokenCount - estimatedOriginalVisibleTokenCount) correctly prevents the local estimator from replacing the API-authoritative total with a visible-history-only underestimate. - The CJK-aware output estimate takes
max(generic, cjkAware), which is the safe (higher) direction — reducing false negatives in the cap guard. - The inflation guard (
newTokenCount > originalTokenCount) correctly catches oversized summaries even without provider usage data. - Tests are comprehensive and validate both success and failure paths under missing usage metadata.
- Deterministic checks (tsc, eslint): 0 findings.
- CI: 14/14 checks passing.
— qwen-code via Qwen Code /review
|
Addressed the review suggestions in Updates:
Validated locally:
|
|
@qwen-code /triage |
|
Thanks for the PR, @Jerry2003826! Template looks good ✓ — all required sections present, bilingual, linked issue. On direction: this is a clear-cut bug fix for a 100% reproducible failure. MiniMax-2.7 (and likely other OpenAI-compatible providers) omit On approach: scope is tight — 2 files, ~270 lines, focused entirely on the missing-metadata path. The two-part fallback (output token estimation + non-visible remainder preservation) is the right shape. No scope creep, no speculative features. Clean. Moving on to code review. 🔍 中文说明感谢 @Jerry2003826 的贡献! 模板完整 ✓ — 所有必要章节齐全,中英双语,关联了 issue。 方向:这是一个 100% 可复现的 bug 修复。MiniMax-2.7(以及其他 OpenAI 兼容 provider)在压缩 side-query 上省略了 方案:范围紧凑——2 个文件,约 270 行,完全聚焦于缺失元数据的处理路径。两部分 fallback(输出 token 估算 + 非可见部分保留)是正确的思路。没有范围蔓延,没有投机性功能。干净。 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Code ReviewIndependent proposal (before reading the diff): when a provider omits usage metadata on the compression side-query, fall back to local char-based token estimation — the same estimator used by the auto-compaction gate. Keep it conservative, and reject estimates that would inflate the token count. That's roughly what the PR does. The PR's actual approach goes one step further with a non-visible remainder preservation technique: instead of estimating the total post-compression count from scratch (which would miss ~15-20K of system prompt + tool definitions), it computes CJK-aware output estimation ( No blockers found. Specific observations:
Test VerificationNew tests cover all three fallback paths:
Real-Scenario TestingN/A for this PR — the behavior change is internal compression service logic triggered only by providers that omit usage metadata (e.g. MiniMax-2.7). Reproducing the original bug requires that specific provider's API. The unit tests comprehensively verify all fallback paths and guard rails. 中文说明代码审查独立方案(读 diff 前):当 provider 在压缩 side-query 上省略 usage 元数据时,回退到本地 char-based token 估算——与自动压缩门控使用的同一个估算器。保守处理,拒绝会导致 token 计数膨胀的估算。PR 的做法大致如此。 PR 的实际方案更进一步,使用了非可见部分保留技术:不是从零估算压缩后的总量(会漏掉 ~15-20K 的系统提示 + 工具定义),而是计算 CJK 感知的输出估算( 未发现阻塞问题。 具体观察:
测试验证70 个测试全部通过,包括 4 个新测试覆盖所有 fallback 路径和守卫。TypeScript 类型检查和 ESLint 均无问题。 真实场景测试N/A——行为变化是内部压缩服务逻辑,仅在省略 usage 元数据的 provider(如 MiniMax-2.7)上触发。复现原始 bug 需要该特定 provider 的 API。单元测试全面验证了所有 fallback 路径和防护守卫。 — Qwen Code · qwen3.7-max |
|
Stepping back: this is a well-executed bug fix that solves a real, reproducible problem. The MiniMax-2.7 user report in #3282 is clear — compression fails 100% of the time because the provider omits usage metadata. The fix is conservative (only activates when metadata is genuinely missing), well-guarded (cap and inflation checks still fire on estimated values), and accurately preserves the non-visible token overhead that a naive full-estimate would lose. My independent proposal was simpler — just estimate everything locally. The PR's non-visible remainder technique is genuinely better: it anchors the system prompt + tool definition cost to the API-reported total and only estimates the delta. That's the right call. Code is clean, tests cover all paths, typecheck and lint are green. Nothing to nitpick. Approving. ✅ 中文说明总结:这是一个执行良好的 bug 修复,解决了一个真实的、可复现的问题。#3282 中的 MiniMax-2.7 用户报告很清楚——因为 provider 省略了 usage 元数据,压缩 100% 失败。修复是保守的(仅在元数据确实缺失时激活)、有良好防护的(cap 和膨胀检查在估算值上仍然触发),并且准确保留了朴素全量估算会丢失的非可见 token 开销。 我的独立方案更简单——全部本地估算。PR 的非可见部分保留技术确实更好:将系统提示 + 工具定义的开销锚定在 API 报告的总数上,只估算增量。这是正确的选择。 代码干净,测试覆盖所有路径,类型检查和 lint 均通过。没有可挑剔的地方。 批准。✅ — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
✅ Fresh local Linux verification — author's commands, an independent math harness, and a live
|
| Check | Result |
|---|---|
vitest …chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized" |
4 passed |
vitest …chatCompressionService.test.ts (full file) |
70 passed |
eslint (test + impl) |
✅ clean |
prettier --check (test + impl) |
✅ "All matched files use Prettier code style!" |
tsc --noEmit -p packages/core |
✅ no errors |
npm run build |
✅ exit 0 |
The four tests that exercise the actual compress() against a usage-omitting side-query response ({text, usage: undefined} — the real bug scenario) all pass:
should use estimated token count if usage metadata is missing→COMPRESSEDshould reject inflated local delta if usage metadata is missing→COMPRESSION_FAILED_INFLATED_TOKEN_COUNTshould reject cap-sized summaries even if usage metadata is missing→COMPRESSION_FAILED_OUTPUT_TRUNCATEDshould reject CJK cap-sized summaries when usage metadata is missing→ rejected
2) Independent math harness against the compiled bundle — 10/10
To check the fix's math without reusing the PR's own vitest mocks, I imported the built estimateContentTokens / COMPACT_MAX_OUTPUT_TOKENS from dist and reproduced the two formulae the fix introduces:
A1. estimated visible tokens of 4×4000-char history = 4000 PASS
A2. preserved non-visible remainder = 1000 (matches the debug log asserted) PASS
A3. resulting newTokenCount ∈ (1000,1100) — matches the test's assertion PASS (1004)
B1. CJK-aware estimate >= generic estimate (conservative/higher direction) PASS (3000 vs 500)
B3. ASCII summary uses generic estimate (chars/4) PASS (500)
C1. CJK cap fixture estimate >= COMPACT_MAX_OUTPUT_TOKENS => rejected PASS (20001 ≥ 20000)
C2. generic-only estimate WOULD slip past the cap (the bug the fix prevents) PASS (3334 < 20000)
D1. 40000-char summary (10000 tok) > original 800 → inflation guard fires PASS
==== 10 passed, 0 failed ====
This independently confirms the design:
- The non-visible-remainder fallback (
originalTokenCount − estimatedVisiblepreserved, only the visible delta re-estimated) reproduces the exact numbers the test asserts (1000 remainder, ~1004 result), so missing usage cannot replace the authoritative total with a visible-only underestimate. - The CJK-aware cap guard takes
max(generic, cjkAware); the test's CJK cap fixture estimates20001 ≥ 20000and is rejected, whereas the generic-only estimate would be3334and slip past the cap — i.e., the harness directly demonstrates the CJK under-count bug this guard closes (the concern raised earlier in review).
3) Live tmux /compress session (--debug) — works end-to-end, no regression
Built up a real ~18.7k-token conversation with glm-5.1, then ran /compress:
✦ Chat history compressed from 18729 to 17858 tokens.
- ✅ Real end-to-end compression succeeds on Linux;
0compression errors in the debug log. - The live provider returned usage metadata, so this run took the normal (non-fallback) path — confirming the change is non-regressive for usage-reporting providers. The new fallback fires only when a provider omits usage on the side-query (the PR cites MiniMax-2.7); that branch is covered by §1 (real
compress()over a usage-omitting response) and §2 (the math), since a usage-omitting provider can't be forced deterministically through a live transport.
Verdict
LGTM — recommend merge. (Consistent with my earlier approval; this adds Linux coverage the table was missing.) The missing-usage fallback chain is conservative and correct: it preserves the API-reported non-visible remainder, only locally re-estimates the visible delta, and the CJK-aware cap/inflation guards correctly reject unsafe summaries — verified against real shipped code, the compiled bundle's math, and a live session. Tests/lint/typecheck/build are all clean on Linux.
中文版(点击展开)
✅ 全新本地 Linux 验证 —— 复现作者命令、对编译产物做独立数学校验、并跑真实 /compress 会话
PR 表格里 Linux 标注为 CI only,所以我在 Linux 上端到端重新验证:复现作者命令、对编译后产物独立校验该修复的数学、并在 tmux 里跑了真实 /compress。作为更新后的合并参考。
环境
- 分支
Jiarui/fix-compression-missing-usage@6020b73(全新 worktree +npm install+npm run build),CLIv0.17.1 - 平台 🐧 Linux / Node
v22.22.2
1) 复现作者 Reviewer Test Plan —— 全绿
- 定向测试(
-t "usage metadata|inflated local delta|cap-sized")4 passed;完整文件 70 passed; eslint✅ 干净、prettier --check✅ 全部符合、tsc --noEmit -p packages/core✅ 无错误、npm run build✅。- 四个直接用
usage: undefined(真实 bug 场景)驱动compress()的测试全部通过:估算成功压缩 / 拒绝膨胀 delta / 拒绝 cap-sized 摘要 / 拒绝 CJK cap-sized 摘要。
2) 针对编译产物的独立数学 harness —— 10/10
不复用 PR 自带 mock,从 dist 导入 estimateContentTokens / COMPACT_MAX_OUTPUT_TOKENS,复刻该修复引入的两段公式:
- non-visible remainder fallback:4×4000 字符历史的可见估算=4000、保留的不可见余量=1000、最终 newTokenCount≈1004 —— 与测试断言完全一致,说明缺失 usage 不会用“仅可见历史”的低估替换权威总量。
- CJK-aware cap guard:取
max(generic, cjkAware);测试的 CJK cap fixture 估算20001 ≥ 20000被拒,而仅用 generic 估算会是3334、漏过 cap —— harness 直接复现了该 guard 所修复的 CJK 低估 bug(此前 review 提出的关注点)。 - 膨胀 guard:40000 字符摘要(≈10000 tok) > 原始 800 → 触发拒绝。
==== 10 passed, 0 failed ====
3) tmux /compress 实时会话(--debug)—— 端到端可用、无回归
用 glm-5.1 构造约 18.7k token 的真实对话后执行 /compress:
✦ Chat history compressed from 18729 to 17858 tokens.
- ✅ Linux 上真实端到端压缩成功,debug 日志
0压缩错误。 - 实时 provider 返回了 usage metadata,所以本次走正常(非 fallback)路径 —— 证明该改动对会返回 usage 的 provider 无回归。新 fallback 仅在 provider 在 side-query 上省略 usage 时触发(PR 提到 MiniMax-2.7);该分支由 §1(真实
compress()+ usage 缺失响应)与 §2(数学)覆盖,因为无法通过真实传输确定性地强制某 provider 省略 usage。
结论
LGTM — 建议合并。(与我此前的 approve 一致;本次补上了表格缺失的 Linux 覆盖。)缺失 usage 的 fallback 链保守且正确:保留 API 报告的不可见余量、仅本地重估可见 delta,CJK-aware 的 cap/膨胀 guard 能正确拒绝不安全摘要 —— 已对真实发布代码、编译产物的数学、以及实时会话三方面验证。Linux 上 tests/lint/typecheck/build 全绿。
* fix(core): compress when usage metadata is missing * fix(core): harden missing compression usage fallback * test(core): cover missing-usage inflated compression fallback * fix(core): clarify missing usage compression fallback * test(core): align compression fallback assertion with summary trailer * test(core): use ascii cjk fixture escape * test(core): tighten missing-usage compression assertions
What this PR does
Allows chat compression to proceed safely when provider usage metadata is missing, while rejecting inflated local token deltas that would make the compression result unsafe.
Why it's needed
Some model responses omit usage metadata. The compression path still needs a conservative fallback instead of silently skipping or accepting an unsafe inflated estimate.
Reviewer Test Plan
How to verify
Run: npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized"; npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts; npx eslint packages/core/src/services/chatCompressionService.test.ts; npx prettier --check packages/core/src/services/chatCompressionService.test.ts; npm run typecheck --workspace=packages/core.
Evidence (Before & After)
Before: missing usage metadata could leave the compression path without coverage for inflated local estimates. After: regression tests cover missing-usage compression, inflated fallback rejection, and cap-sized safe fallback. This is core service behavior, so TUI screenshots are N/A.
Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.
Risk & Scope
Linked Issues
Fixes #3282
中文说明
这个 PR 做了什么
在 provider 没有返回 usage metadata 时,仍然允许压缩走安全的保守路径,并拒绝膨胀的本地 token 估算。
为什么需要
部分模型响应没有 usage metadata,压缩逻辑需要有可靠 fallback,不能跳过或接受不安全估算。
Reviewer Test Plan
本地在 Windows 跑了 chatCompressionService 定向测试、完整文件测试、eslint、prettier check 和 core typecheck;macOS/Linux 依赖 GitHub CI。
风险和范围
只影响缺失 usage metadata 时的压缩判断,不改压缩公共接口。