-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix(core): compress when usage metadata is missing #4528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
0e4ea8d
12769de
3f39d81
1680110
79b64b9
9facc14
b39d03b
93b07ea
6020b73
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,7 +25,11 @@ import { | |
| resolveSlimmingConfig, | ||
| slimCompactionInput, | ||
| } from './compactionInputSlimming.js'; | ||
| import { CHARS_PER_TOKEN, estimatePromptTokens } from './tokenEstimation.js'; | ||
| import { | ||
| CHARS_PER_TOKEN, | ||
| estimateContentTokens, | ||
| estimatePromptTokens, | ||
| } from './tokenEstimation.js'; | ||
| import { | ||
| buildStateReminderParts, | ||
| composePostCompactHistory, | ||
|
|
@@ -91,6 +95,30 @@ export const HARD_BUFFER = 3_000; | |
| */ | ||
| export const MAX_CONSECUTIVE_FAILURES = 3; | ||
|
|
||
| const CJK_CHAR_TOKEN_MULTIPLIER = 1.5; | ||
| const CJK_CHAR_PATTERN = | ||
| /[\u3040-\u30ff\u3400-\u9fff\uf900-\ufaff\uac00-\ud7af]/g; | ||
|
|
||
| function estimateSummaryOutputTokens( | ||
| summary: string, | ||
| imageTokenEstimate: number, | ||
| ): number { | ||
| const genericEstimate = estimateContentTokens( | ||
| [{ role: 'model', parts: [{ text: summary }] }], | ||
| imageTokenEstimate, | ||
| ); | ||
| const cjkCharCount = summary.match(CJK_CHAR_PATTERN)?.length ?? 0; | ||
| if (cjkCharCount === 0) { | ||
| return genericEstimate; | ||
| } | ||
|
|
||
| const nonCjkCharCount = Math.max(0, summary.length - cjkCharCount); | ||
| const cjkAwareEstimate = | ||
| Math.ceil(nonCjkCharCount / CHARS_PER_TOKEN) + | ||
| Math.ceil(cjkCharCount * CJK_CHAR_TOKEN_MULTIPLIER); | ||
| return Math.max(genericEstimate, cjkAwareEstimate); | ||
| } | ||
|
|
||
| /** | ||
| * Hard cap on the PreCompact hook's `additionalContext` once it is merged | ||
| * into the side-query system prompt. The user-supplied `/compress` text is | ||
|
|
@@ -497,6 +525,19 @@ export class ChatCompressionService { | |
| compressionUsageMetadata.totalTokenCount - compressionInputTokenCount, | ||
| ); | ||
| } | ||
| if (compressionOutputTokenCount === undefined && !isSummaryEmpty) { | ||
| compressionOutputTokenCount = estimateSummaryOutputTokens( | ||
| summary, | ||
| slimmingConfig.imageTokenEstimate, | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This only affects providers that omit usage metadata AND produce CJK output, but for those providers a truncated summary could silently corrupt the compressed history. One possible mitigation: supplement the char-based estimate with a length check against
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed in the current branch. The output-cap guard now uses a CJK-aware local estimate for summary output when provider usage is missing, so CJK summaries near the output-token cap are rejected as Coverage:
I also pushed Validated:
|
||
| ); | ||
| config | ||
| .getDebugLogger() | ||
| .warn( | ||
| `[chat-compression] compression side-query omitted usage metadata; ` + | ||
| `using local estimate for summary output token count ` + | ||
| `(${compressionOutputTokenCount}).`, | ||
| ); | ||
| } | ||
|
|
||
| // Defensive guard: if the side-query hit COMPACT_MAX_OUTPUT_TOKENS, the | ||
| // summary is likely truncated mid-content and unsafe to persist. Drop it | ||
|
|
@@ -624,7 +665,12 @@ export class ChatCompressionService { | |
| ]; | ||
| } | ||
|
|
||
| // Best-effort token math using *only* model-reported token counts. | ||
| // Best-effort token math using model-reported token counts when | ||
| // available. Some OpenAI-compatible providers omit usage for the | ||
| // compression side-query; in that case, fall back to the same local | ||
| // content estimator used by the auto-compaction gate so a valid summary | ||
| // can still shrink the history instead of failing with a token-count | ||
| // error. | ||
| // | ||
| // Note: compressionInputTokenCount includes the entire compression | ||
| // system prompt (the <state_snapshot> instructions, ~900 tokens) PLUS | ||
|
|
@@ -654,12 +700,9 @@ export class ChatCompressionService { | |
| // The composer injects file-restoration blocks (up to | ||
| // maxRecentFiles × 5K tokens) and an image-restoration block (up to | ||
| // maxRecentImages images) that are NOT in | ||
| // compressionOutputTokenCount. Estimate their | ||
| // cost locally so the inflation guard below | ||
| // (newTokenCount > originalTokenCount) actually fires when | ||
| // attachments dominate the post-compact size, and so | ||
| // `lastPromptTokenCount` doesn't under-report the next auto- | ||
| // compaction cheap-gate input (Finding 1). | ||
| // compressionOutputTokenCount. Estimate their cost locally so the | ||
| // inflation guard below fires when attachments dominate the | ||
| // post-compact size. | ||
| const restorationChars = extraHistory | ||
| .slice(2) // skip [summary, model ack] | ||
| .reduce( | ||
|
|
@@ -668,6 +711,41 @@ export class ChatCompressionService { | |
| 0, | ||
| ); | ||
| newTokenCount += Math.ceil(restorationChars / CHARS_PER_TOKEN); | ||
| } else { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [Critical] Downstream, Additionally, the Suggested fix — estimate the delta rather than the absolute post-compression size, so } else {
// Preserve the API-reported baseline (system prompt + tools) by
// estimating the reduction from content replacement rather than
// the absolute post-compression size.
const estimatedRemoved = estimateContentTokens(
historyToCompress,
slimmingConfig.imageTokenEstimate,
);
const syntheticMessages = extraHistory.slice(
0,
extraHistory.length - historyToKeep.length,
);
const estimatedAdded = estimateContentTokens(
syntheticMessages,
slimmingConfig.imageTokenEstimate,
);
newTokenCount = Math.max(
0,
originalTokenCount - estimatedRemoved + estimatedAdded,
);
canCalculateNewTokenCount = newTokenCount > 0;
}This mirrors the primary path's structure ( — qwen3.7-max via Qwen Code /review
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed in the current branch. The missing-usage path no longer replaces the API-reported total with a visible-history-only estimate. It computes the visible delta only, preserves the API-reported non-visible remainder ( Coverage:
Validated:
|
||
| const estimatedOriginalVisibleTokenCount = estimateContentTokens( | ||
| curatedHistory, | ||
| slimmingConfig.imageTokenEstimate, | ||
| ); | ||
| const estimatedNewVisibleTokenCount = estimateContentTokens( | ||
| extraHistory, | ||
| slimmingConfig.imageTokenEstimate, | ||
| ); | ||
| if ( | ||
| estimatedOriginalVisibleTokenCount > 0 && | ||
| estimatedNewVisibleTokenCount > 0 | ||
| ) { | ||
| const estimatedNonVisibleTokenCount = Math.max( | ||
| 0, | ||
| originalTokenCount - estimatedOriginalVisibleTokenCount, | ||
| ); | ||
| // Keep the API-reported system/tool/prompt remainder intact. The | ||
| // local estimator is only used for the visible conversation delta, so | ||
| // missing usage metadata cannot replace the authoritative total with | ||
| // a much smaller visible-history-only estimate. | ||
| newTokenCount = | ||
| estimatedNonVisibleTokenCount + estimatedNewVisibleTokenCount; | ||
| canCalculateNewTokenCount = true; | ||
| config | ||
| .getDebugLogger() | ||
| .debug( | ||
| `[chat-compression] usage metadata missing; estimated ` + | ||
| `post-compression token count by preserving the ` + | ||
| `API-reported non-visible remainder ` + | ||
| `(${estimatedNonVisibleTokenCount}) and replacing the ` + | ||
| `visible-history estimate (${estimatedOriginalVisibleTokenCount} -> ` + | ||
| `${estimatedNewVisibleTokenCount}).`, | ||
| ); | ||
| } | ||
| } | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Suggestion] Missing test coverage for the delta-based estimation path producing
COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. All existing INFLATED tests (lines 959 and 1171) use explicitpromptTokenCount/candidatesTokenCountin mock usage, exercising the primary formula. No test covers the scenario where delta-based estimation (estimatedNewContentTokenCount > estimatedOriginalContentTokenCount) triggers the INFLATED guard — e.g., when the side-query returns a verbose summary larger than the compressed input in estimated terms. This branch atchatCompressionService.ts:723is reachable via the delta path but untested.Consider adding a test that constructs a small
historyForSplitwith a verbose summary (e.g.,text: 'x'.repeat(40_000)) andusage: undefined, assertingCOMPRESSION_FAILED_INFLATED_TOKEN_COUNTandnewHistory === null.— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Covered in the current branch by
should reject inflated local delta if usage metadata is missing. That test setsusage: undefined, exercises the local delta estimation path, and assertsCOMPRESSION_FAILED_INFLATED_TOKEN_COUNTwithnewHistory === nullwhen the estimated post-compression count is inflated.Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts