feat(cli): add /compress-fast command for no-LLM rule-based context compression#4892
ZijianZhang989 wants to merge 2 commits into
…ompression (#4264)

Introduce /compress-fast, a fast rule-based alternative to /compress that skips the LLM side-query entirely. Two modes:
- --tool-calls (default): strips thinking blocks + force-runs microcompaction to clear old tool results and media. Reuses surgical FileReadCache disarm from the existing microcompaction path.
- --keep-last: keeps only the last user message and last model reply. Aggressively trims context and rebuilds via startChat().

Core changes:
- microcompact.ts: add { force: true } option to skip time trigger
- geminiChat.ts: export stripThoughtPartsFromContent, add compressFast() method
- client.ts: add tryCompressChatFast() with mode-aware post-processing and FileReadCache handling

Closes #4264
…ehavior

--keep-last had a role-detection bug (functionResponse mistaken for user text), required a slow startChat() session rebuild, and lost all deferred tools. Users can achieve an equivalent result with /clear. The single tool-calls path better matches the 'fast' positioning.
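The role-detection bug mentioned in the second commit message can be illustrated with a small sketch. It assumes, as the commit message implies, that tool results are stored in turns whose role is user, so checking the role alone is not enough to find the last message a person actually typed. The types and the helpers isUserTextTurn and findLastUserTextIndex are hypothetical stand-ins written for this illustration, not code from the PR.

```typescript
// Minimal local stand-ins for chat history types; the project's real types may differ.
interface Part {
  text?: string;
  thought?: boolean;
  functionCall?: { name: string; args?: Record<string, unknown> };
  functionResponse?: { name: string; response?: Record<string, unknown> };
}

interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

// Hypothetical helper: true only for a turn that carries typed user text.
// Assumption: tool results also use role "user", so the role alone is ambiguous.
function isUserTextTurn(content: Content): boolean {
  if (content.role !== 'user') return false;
  const hasFunctionResponse = content.parts.some(
    (p) => p.functionResponse !== undefined,
  );
  const hasText = content.parts.some(
    (p) => typeof p.text === 'string' && p.text.length > 0,
  );
  return hasText && !hasFunctionResponse;
}

// Hypothetical helper: index of the last genuine user message, or -1 if none.
function findLastUserTextIndex(history: Content[]): number {
  for (let i = history.length - 1; i >= 0; i--) {
    if (isUserTextTurn(history[i])) return i;
  }
  return -1;
}

const sampleHistory: Content[] = [
  { role: 'user', parts: [{ text: 'Read package.json' }] },
  { role: 'model', parts: [{ functionCall: { name: 'read_file' } }] },
  {
    role: 'user',
    parts: [{ functionResponse: { name: 'read_file', response: { output: '{}' } } }],
  },
  { role: 'model', parts: [{ text: 'The file is empty.' }] },
];

console.log(findLastUserTextIndex(sampleHistory)); // prints 0
```

A check based only on role === 'user' would have picked index 2 (the tool result) here, which is the kind of mistake the commit message describes.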
Thanks for the PR!

Template looks good ✓: all required sections filled in, bilingual, linked issue.

On direction:

On approach: scope feels right. The two-step recipe (force microcompaction + strip thought parts) is minimal and composes well with existing primitives

One observation worth noting (not a blocker):

Moving on to code review. 🔍

Chinese description (中文说明, translated)

Thanks for the contribution!

The template is complete ✓: all sections are filled in, bilingual, with a linked issue.

Direction:

Approach: the scope is reasonable. The two-step combination (force microcompaction + strip thought parts) is minimal and composable, reusing

One observation (non-blocking):

Moving on to code review 🔍

Qwen Code · qwen3.7-max
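Code Review

The implementation is clean and well-disciplined. Key observations:

One minor observation:

Unit Tests

All 223 tests pass across the four affected files:

Real-Scenario Testing (tmux)

Ran

All scenarios PASS. The feature works as described: instant compression, correct NOOP detection, model continuity preserved, existing

Chinese description (中文说明, translated)

Code Review

The implementation is clean and disciplined. Key observations:

One minor observation:

Unit Tests

All 223 tests across the four affected files pass:

Real-Scenario Testing (tmux)

Ran with qwen3.7-max

All scenarios pass ✅. The feature works as described: instant compression, correct NOOP detection, model continuity preserved, existing

Qwen Code · qwen3.7-max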
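Reflection

Stepping back: this PR does exactly what it says, and does it well.

My independent proposal would have been nearly identical: add a

The real-scenario test tells the story: 22,055 → 2,868 tokens in zero time, the model still knows what we talked about, and the second run correctly says "nothing left to compress." The existing

The code is straightforward: it reuses existing primitives, mirrors established patterns, and adds no speculative abstractions. The test coverage is thorough (unit tests for every new path, plus the tmux smoke test confirming end-to-end behavior). The only minor wart is boilerplate duplication between the two compress commands, which is acceptable for two consumers with different internals.

This solves a real user problem (context management is a top friction point), ships cleanly, and doesn't take on unnecessary risk. Ready to merge.

Chinese description (中文说明, translated)

Stepping back: this PR does exactly what it promised, and does it well.

My independent proposal would have been almost identical: add a

The real-scenario test says it all: 22,055 → 2,868 tokens with zero latency, the model still knows what we discussed, and the second run correctly reports "nothing left to compress". The existing

The code is straightforward: it reuses existing primitives, mirrors established patterns, and has no speculative abstractions. Test coverage is comprehensive (unit tests for every new path, plus a tmux smoke test confirming end-to-end behavior). The only small blemish is the boilerplate duplication between the two compress commands, which is acceptable for two consumers with different internal logic.

This solves a real user pain point (context management is a top friction point), ships cleanly, and takes on no unnecessary risk. Ready to merge.

Qwen Code · qwen3.7-max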
qwen-code-ci-bot left a comment
LGTM, looks ready to ship. ✅
What this PR does
Adds /compress-fast, a new slash command that compresses conversation context without any LLM side-query. It combines two rule-based steps: (1) forcing microcompaction to clear old tool results and media parts while keeping the most recent N, and (2) stripping thought parts from all model turns. The result is a significantly smaller history, typically freeing thousands of tokens, at zero API latency.

A chat_compression checkpoint is written to JSONL so --resume works exactly as it does after /compress.

Why it's needed

/compress relies on an LLM side-query (~2-5s, ~30K tokens) to summarise history. For local model deployments and users who just want quick space reclamation, this is too slow. /compress-fast is entirely rule-based: no API call, no token cost, instant feedback. It complements /compress: use /compress-fast when you need space right now, and /compress when you want semantic summary quality.

Resolves #4264.
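The two rule-based steps listed under "What this PR does" can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the Part and Content types are local stand-ins, the function names stripThoughtParts, clearOldToolResults, and compressFastSketch are hypothetical, the placeholder text is invented, media parts are not modeled, and dropping model turns that become empty is a choice made only for this sketch.

```typescript
// Local stand-in types; the project's real history types may differ.
interface Part {
  text?: string;
  thought?: boolean;
  functionResponse?: { name: string; response: Record<string, unknown> };
}

interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

// Hypothetical placeholder written in place of a cleared tool result.
const CLEARED_PLACEHOLDER = '[old tool result cleared]';

// Step 2 in the description: remove thought parts from model turns.
// Assumption: turns left with no parts are dropped.
function stripThoughtParts(history: Content[]): Content[] {
  return history
    .map((c) =>
      c.role === 'model'
        ? { ...c, parts: c.parts.filter((p) => !p.thought) }
        : c,
    )
    .filter((c) => c.parts.length > 0);
}

// Step 1 in the description: keep the most recent `keepRecent` tool results
// and replace the payload of every older one with a short placeholder.
function clearOldToolResults(history: Content[], keepRecent: number): Content[] {
  let remaining = keepRecent;
  const result = [...history];
  // Walk backwards so the newest tool results are the ones preserved.
  for (let i = result.length - 1; i >= 0; i--) {
    const content = result[i];
    const newParts = [...content.parts];
    for (let j = newParts.length - 1; j >= 0; j--) {
      const fr = newParts[j].functionResponse;
      if (!fr) continue;
      if (remaining > 0) {
        remaining--;
        continue;
      }
      newParts[j] = {
        functionResponse: { name: fr.name, response: { output: CLEARED_PLACEHOLDER } },
      };
    }
    result[i] = { ...content, parts: newParts };
  }
  return result;
}

// Both steps combined; no model call is involved at any point.
function compressFastSketch(history: Content[], keepRecent = 1): Content[] {
  return clearOldToolResults(stripThoughtParts(history), keepRecent);
}

const demo: Content[] = [
  { role: 'user', parts: [{ text: 'List files, then read the README' }] },
  { role: 'model', parts: [{ text: 'planning...', thought: true }, { text: 'Listing files.' }] },
  { role: 'user', parts: [{ functionResponse: { name: 'ls', response: { output: 'a.ts b.ts' } } }] },
  { role: 'model', parts: [{ text: 'Now reading the README.' }] },
  { role: 'user', parts: [{ functionResponse: { name: 'read_file', response: { output: '# README' } } }] },
  { role: 'model', parts: [{ text: 'Done.' }] },
];

const compressed = compressFastSketch(demo, 1);
console.log(JSON.stringify(compressed, null, 2));
```

Because both steps are pure transformations over the history array, running the sketch a second time on its own output changes nothing, which is consistent with the NOOP behavior the review reports for a repeated /compress-fast.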
Reviewer Test Plan
How to verify
    # Unit tests
    npx vitest run \
      packages/core/src/services/microcompaction/microcompact.test.ts \
      packages/core/src/core/geminiChat.test.ts \
      packages/cli/src/ui/commands/compressFastCommand.test.ts \
      packages/cli/src/services/BuiltinCommandLoader.test.ts

Manual smoke test in interactive mode:
Evidence (Before & After)
TUI change: a new COMPRESSION history item appears after running /compress-fast, showing the token reduction (e.g. 15,432 → 8,210). This is identical UX to /compress.

Non-UI artifacts: the JSONL transcript gains a chat_compression record with compressionStatus: COMPRESSED and triggerReason: manual, matching the /compress checkpoint format.

Tested on

Environment (optional)

Local: npm run dev on macOS, Node 22.

Risk & Scope
/compress is available if deeper summarization is needed.

estimateContentTokens may have edge cases. The command intentionally does NOT rebuild the session via startChat(): deferred tools survive, unlike a /clear. This is both a feature (fast, preserves state) and a limitation (does not reclaim system prompt tokens).

microcompactHistory() gains an optional { force: true } parameter that existing callers don't pass. stripThoughtPartsFromContent remains module-private.

Linked Issues
Closes #4264
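The backward-compatibility point under Risk & Scope, an optional { force: true } parameter that existing callers do not pass, can be sketched as below. The source says only that force skips a time trigger; this sketch assumes that trigger is an idle-time threshold, and the names MicrocompactOptions, TriggerInput, and shouldMicrocompact are hypothetical, not the signature of microcompactHistory().

```typescript
// Hypothetical options bag; only the `force` flag is taken from the PR text.
interface MicrocompactOptions {
  force?: boolean;
}

// Assumed inputs for a time-based trigger (not specified in the source).
interface TriggerInput {
  lastActivityMs: number;
  nowMs: number;
  idleThresholdMs: number;
}

// Decide whether microcompaction should run.
// Callers that omit `options` get the time-based behavior unchanged.
function shouldMicrocompact(
  input: TriggerInput,
  options: MicrocompactOptions = {},
): boolean {
  if (options.force) return true; // manual command path: skip the time check
  return input.nowMs - input.lastActivityMs >= input.idleThresholdMs;
}

const recentActivity: TriggerInput = {
  lastActivityMs: 1_000,
  nowMs: 2_000,
  idleThresholdMs: 60_000,
};

console.log(shouldMicrocompact(recentActivity)); // prints false
console.log(shouldMicrocompact(recentActivity, { force: true })); // prints true
```

Making the flag optional with a default of "not forced" is what lets existing call sites compile and behave as before, while the new command opts in explicitly.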
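Chinese Description (中文说明, translated)

What this PR does

Adds the /compress-fast slash command, which compresses the conversation context without issuing any LLM side-query. It combines two rule-based steps: (1) forcing microcompaction to clear old tool results and media content while keeping the most recent N, and (2) stripping thought parts from all model replies. The result is a significantly smaller history token count at zero API latency.

A chat_compression checkpoint is written to JSONL, and --resume behaves exactly as it does after /compress.

Why it's needed

/compress relies on an LLM side-query to generate a summary (about 2-5 seconds, consuming about 30K tokens). That is too slow for local model deployments or for users who only want to free space quickly. /compress-fast is purely rule-driven: no API call, no token cost, instant response. It complements /compress: use /compress-fast when space must be freed immediately, and /compress when semantic summary quality matters.

Resolves #4264.

Reviewer Test Plan

(The test steps are the same as above and are omitted here for readability.)

Risk & Scope

/compress.

estimateContentTokens may have edge cases. The command intentionally does not rebuild the session via startChat(): deferred tools are preserved, unlike /clear. This is both an advantage (fast, preserves state) and a limitation (system prompt tokens cannot be reclaimed).

microcompactHistory() gains an optional { force: true } parameter that existing callers do not pass. stripThoughtPartsFromContent remains module-private.

Linked Issues

Closes #4264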