feat(core): persist file history snapshots for cross-session /rewind (T2.1)#4897
Conversation
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Hi @doudouOUC — thanks for working on file-history snapshot persistence, this is a meaningful feature! 👋
However, this PR doesn't follow our PR template. A few sections are missing or renamed:
- "What this PR does" and "Why it's needed" — your
## Summarycovers what, but the why (motivation, user-facing benefit) should be its own section. - "Reviewer Test Plan" — this is the most important missing piece. Reviewers need
How to verifysteps,Evidence (Before & After)(tmux output or screenshots showing/rewindworking across session resume), and theTested onOS table. Without this, review may be significantly delayed. - "Risk & Scope" — your "Known Limitations" section is useful, but the template asks for main risk/tradeoff, what's out of scope, and breaking changes in a structured format.
- "Linked Issues" — mention of PR #4253 is inline but there's no dedicated section with
Closes #N/Fixes #N. - "中文说明" — the bilingual
<details>block is missing.
Could you reformat the PR body to match the template? The content is good — it just needs to be in the right structure so reviewers can evaluate it efficiently. 🙏
— Qwen Code · qwen3.7-max
There was a problem hiding this comment.
Pull request overview
This PR adds persistent file-history snapshot recording to session JSONL logs so /rewind can function across session resume, including restoring the snapshot chain on load and copying backup files when sessions are forked. It also introduces a stable session_resume capability tag while keeping the prior unstable alias.
Changes:
- Persist
FileHistorySnapshotdata asfile_history_snapshotsystem records, and restore snapshots duringSessionService.loadSession()/Config.getFileHistoryService()on resume. - Copy file-history backup files on session fork (hard-link with copy fallback) and validate restored backups with bounded parallel stats.
- Add
session_resumecapability tag (aliasingunstable_session_resume) and update CLI/ACP call sites to pass surviving snapshots during rewind.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/services/sessionService.ts | Extract persisted snapshot records on resume; copy backup files on session fork. |
| packages/core/src/services/fileHistoryService.ts | Export snapshot constants; add snapshot (de)serialization and restored-backup validation. |
| packages/core/src/services/chatRecordingService.ts | Add file_history_snapshot record subtype and snapshot recording APIs; re-record snapshots after rewind. |
| packages/core/src/core/client.ts | Record latest snapshot to JSONL after makeSnapshot() on user turns. |
| packages/core/src/config/config.ts | Restore snapshots into FileHistoryService and kick off validation on resume. |
| packages/cli/src/ui/AppContainer.tsx | Pass surviving snapshots into rewindRecording() during UI rewind. |
| packages/cli/src/serve/server.test.ts | Update expected capabilities to include session_resume. |
| packages/cli/src/serve/capabilities.ts | Register session_resume and keep unstable_session_resume as deprecated alias. |
| packages/cli/src/acp-integration/session/Session.ts | Pass surviving snapshots into rewindRecording() during ACP rewind. |
| packages/cli/src/acp-integration/session/Session.test.ts | Mock getFileHistoryService().getSnapshots() for updated rewind signature usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks — reformatted the PR body to match the template. Added: What/Why sections, Risk & Scope, Linked Issues, Reviewer Test Plan with verification steps, Tested-on table, and 中文说明 details block. |
|
Review round summary (Copilot)
Resolved 4/4 threads. |
wenshao
left a comment
There was a problem hiding this comment.
Additional notes (not on diff lines)
[Suggestion] packages/webui/src/daemon/session/DaemonSessionProvider.tsx:622 — console.warn('[DaemonSessionProvider] SSE stream ended') fires on every normal SSE close. Other console.warn calls in this file are reserved for actual failures. Consider removing or using a debug logger.
(Posted as body because this file is not in the GitHub PR diff.)
— qwen3.7-max via Qwen Code /review
|
Review round summary (wenshao R1)
Resolved: 3 fixed + 2 pushed back + 1 deferred = all threads replied. |
|
Review round summary (wenshao R2)
Resolved 3/3 threads. |
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
1 similar comment
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
Local runtime verification reportVerified locally on Linux x64 / Node v22.22.2: PR head TL;DR
Verified working (PASS)
Finding 1 — daemon sessions never make snapshots (Reviewer Test Plan §4 cannot pass)
Observed on the live PR daemon:
To be fair, this gating predates the PR — the same condition already leaves the #4820 in-process daemon rewind endpoints with an empty snapshot list. But the PR's stated goal (unblocking T2.1 daemon resume + Finding 2 — TUI post-resume promptId collision corrupts the persisted chain (new)TUI prompt ids are A rewind targeting "turn 0" in that session silently restores v3 (the newest content) instead of the original — a silent wrong-content restore, worse than an honest failure. Pre-PR the collision existed only in memory; with persistence it becomes durable and survives every later resume. Suggested fix: seed the TUI prompt counter from the resumed conversation (mirror Finding 3 — "last turn may be incomplete" is actually per-file-first-editA file's pre-first-edit backup (v1, captured by Finding 4 — no user-reachable surface performs cross-session file restore yet
Unit-test attribution
Suggested pre-merge actions
If 1–2 are tracked as immediate follow-ups instead, the persistence layer itself is sound and mergeable on its own merits — but the PR description should then stop implying the daemon E2E (§4) currently works. Repro notes
|
|
Fixes for verification findings (wenshao runtime report)
|
| const survivingSnapshots = fileHistoryService | ||
| .getSnapshots() | ||
| .slice(0, targetTurnIndex + 1); | ||
|
|
There was a problem hiding this comment.
[Suggestion] restoreFromSnapshots(survivingSnapshots) replaces in-memory state but never calls cleanupOrphanedBackups for the removed snapshots' backup files. The TUI path uses fileHistoryService.rewind() which calls cleanupOrphanedBackups(removed) explicitly. Over repeated ACP rewinds in long-lived daemon sessions, orphaned backup files accumulate on disk.
Consider exposing cleanupOrphanedBackups and calling it here after restoreFromSnapshots, or having restoreFromSnapshots internally detect and clean up orphans:
const allSnapshots = fileHistoryService.getSnapshots();
const survivingSnapshots = allSnapshots.slice(0, targetTurnIndex + 1);
const removed = allSnapshots.slice(targetTurnIndex + 1);
fileHistoryService.restoreFromSnapshots(survivingSnapshots);
fileHistoryService.cleanupOrphanedBackups(removed);— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Already addressed — orphaned backup cleanup deferred in wenshao review. Bounded by MAX_SNAPSHOTS, cleaned up by subsequent makeSnapshot calls.
| // Snapshot file state before this turn (mirrors the makeSnapshot | ||
| // block in GeminiClient.sendMessageStream). Placed after | ||
| // slash-command and hook early-returns so locally handled commands | ||
| // don't create phantom snapshots that desync the snapshot index. |
There was a problem hiding this comment.
[Suggestion] This makeSnapshot + recordFileHistorySnapshot block (lines 1012–1028) is nearly identical to the one in packages/core/src/core/client.ts:1624-1643 — same 15-line structure with identical nested try/catch and identical error messages. The comment above even acknowledges it "mirrors the makeSnapshot block in GeminiClient.sendMessageStream." The two sites already differ slightly (this one caches the service in a local, client.ts calls the getter twice), and they will silently diverge further over time.
Consider extracting a shared helper:
async function snapshotAndRecordFileHistory(
config: Config,
promptId: string,
): Promise<void> {
try {
const fhs = config.getFileHistoryService();
await fhs.makeSnapshot(promptId);
try {
const latest = fhs.getSnapshots().at(-1);
if (latest) {
config.getChatRecordingService()?.recordFileHistorySnapshot(latest);
}
} catch (e) {
debugLogger.error(`FileHistory: recordSnapshot failed: ${e}`);
}
} catch (e) {
debugLogger.error(`FileHistory: makeSnapshot failed: ${e}`);
}
}— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Already addressed — two call sites with different contexts don't justify extraction. Addressed in multiple prior review rounds.
… /rewind (T2.1) File history snapshots were purely in-memory — lost on process exit, making /rewind unusable after session resume. This adds JSONL persistence so restored sessions can rewind to any pre-resume turn. Key changes: - Serialize/deserialize FileHistorySnapshot to/from JSONL system records - Record each snapshot after makeSnapshot succeeds (incremental per turn) - Re-record surviving snapshots after rewind (full batch on active branch) - Parse file_history_snapshot records in sessionService.loadSession() - Restore snapshot chain in config.getFileHistoryService() on resume - Copy backup files on session fork (hard link with copy fallback) - Add session_resume capability tag (stable alias for unstable_session_resume) - validateRestoredSnapshots with dedup + batched parallel stat 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Change snapshot dedup from first-wins to last-wins so rewind batch records (which contain the most up-to-date snapshot state) override earlier incremental records for the same promptId. - Guard validateRestoredSnapshots behind isEnabled() to skip I/O when file checkpointing is disabled. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Session.ts: slice snapshots to targetTurnIndex+1 to exclude turns being discarded (fixes ghost-snapshot persistence) - AppContainer.tsx: only pass snapshots when file restore succeeded (avoids writing un-truncated snapshots for conversation-only rewind) - Session.test.ts: update rewindRecording assertion to expect 3 args 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…ialize warning - AppContainer.tsx: use .slice(0, targetTurnIndex + 1) to match Session.ts behavior (prevents ghost snapshots for conversation-only rewind) - sessionService.ts: log warning instead of silent continue on malformed file_history_snapshot deserialization 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Two fixes from wenshao's local verification: 1. ACP daemon sessions had fileCheckpointingEnabled=false because stdin is a pipe (non-TTY). Add enableFileCheckpointing() to Config and call it in acpAgent.newSessionConfig so daemon /rewind works. 2. TUI prompt counter restarted at 0 on --resume, colliding with restored snapshot promptIds and corrupting the chain via last-wins dedup. Seed promptCount from the resumed conversation's user turn count so new promptIds don't overlap. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
ACP sessions drive the chat through Session.prompt → GeminiChat, bypassing GeminiClient.sendMessageStream where makeSnapshot lives. This meant daemon-created sessions never produced file history snapshots, leaving /rewind non-functional. Add makeSnapshot + recordFileHistorySnapshot at the start of each ACP prompt turn (mirroring client.ts:1488), using the existing sessionId########turn promptId format. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Locally handled slash commands (/help, /memory, etc.) previously created file history snapshots even though no model turn was added. This caused the snapshot index to drift from the real user turn count, breaking rewind in web-shell sessions that use slash commands frequently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove stale line number from comment (Session.ts) - Single cast instead of double cast for systemPayload (sessionService.ts) - Guard mkdir in copyFileHistoryBackups to prevent fork failure (sessionService.ts) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add makeSnapshot/rewind to FileHistoryService mock in Session.test.ts - Invalidate cached FileHistoryService on enableFileCheckpointing() to prevent stale enabled=false if service was lazily created first - Move copyFileHistoryBackups above class JSDoc to fix association Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ileHistorySnapshot - Call restoreFromSnapshots(survivingSnapshots) in ACP rewindToTurn to prevent phantom snapshots from accumulating in the in-memory array after conversation-only rewind - Simplify recordFileHistorySnapshot to delegate to the batch variant Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7169461 to
714ce9d
Compare
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
No high-confidence issues found. Build passes, 193 unit tests pass, lint clean. Two low-probability edge cases noted for human review (ACP orphan cleanup, snapshot/turn index coupling). LGTM! ✅ — qwen3.7-max via Qwen Code /review
Local verification report — real build + runtime E2EI built this PR at its head ( Verdict: PASS for the primary goal (daemon ✅ What works (verified at runtime)1. Daemon / ACP cross-session The rewind also re-recorded the surviving snapshots on the active branch (JSONL gained a 2.
3. Snapshot persistence + self-healing. Snapshots are serialized to JSONL as 4. Base build lacks the machinery (confirms the PR is what adds it):
|
|
Thanks for the thorough runtime E2E — the daemon cross-session rewind verification is exactly what we needed to confirm T2.1. Re the TUI This is outside the scope of this PR ( |
…(T2.1) (#4897) * feat(core): persist file history snapshots to JSONL for cross-session /rewind (T2.1) File history snapshots were purely in-memory — lost on process exit, making /rewind unusable after session resume. This adds JSONL persistence so restored sessions can rewind to any pre-resume turn. Key changes: - Serialize/deserialize FileHistorySnapshot to/from JSONL system records - Record each snapshot after makeSnapshot succeeds (incremental per turn) - Re-record surviving snapshots after rewind (full batch on active branch) - Parse file_history_snapshot records in sessionService.loadSession() - Restore snapshot chain in config.getFileHistoryService() on resume - Copy backup files on session fork (hard link with copy fallback) - Add session_resume capability tag (stable alias for unstable_session_resume) - validateRestoredSnapshots with dedup + batched parallel stat 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: address Copilot review — last-wins dedup + isEnabled guard - Change snapshot dedup from first-wins to last-wins so rewind batch records (which contain the most up-to-date snapshot state) override earlier incremental records for the same promptId. - Guard validateRestoredSnapshots behind isEnabled() to skip I/O when file checkpointing is disabled. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: address wenshao review — ghost snapshots + test assertion - Session.ts: slice snapshots to targetTurnIndex+1 to exclude turns being discarded (fixes ghost-snapshot persistence) - AppContainer.tsx: only pass snapshots when file restore succeeded (avoids writing un-truncated snapshots for conversation-only rewind) - Session.test.ts: update rewindRecording assertion to expect 3 args 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: address wenshao R2 — slice snapshots in AppContainer + add deserialize warning - AppContainer.tsx: use .slice(0, targetTurnIndex + 1) to match Session.ts behavior (prevents ghost snapshots for conversation-only rewind) - sessionService.ts: log warning instead of silent continue on malformed file_history_snapshot deserialization 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: enable daemon file checkpointing + seed TUI promptCount on resume Two fixes from wenshao's local verification: 1. ACP daemon sessions had fileCheckpointingEnabled=false because stdin is a pipe (non-TTY). Add enableFileCheckpointing() to Config and call it in acpAgent.newSessionConfig so daemon /rewind works. 2. TUI prompt counter restarted at 0 on --resume, colliding with restored snapshot promptIds and corrupting the chain via last-wins dedup. Seed promptCount from the resumed conversation's user turn count so new promptIds don't overlap. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: add makeSnapshot + recordFileHistorySnapshot to ACP prompt path ACP sessions drive the chat through Session.prompt → GeminiChat, bypassing GeminiClient.sendMessageStream where makeSnapshot lives. This meant daemon-created sessions never produced file history snapshots, leaving /rewind non-functional. Add makeSnapshot + recordFileHistorySnapshot at the start of each ACP prompt turn (mirroring client.ts:1488), using the existing sessionId########turn promptId format. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: add session_resume to integration test capability assertion 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix: add debug logging to ACP makeSnapshot catch blocks 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(acp): move makeSnapshot after slash-command and hook checks Locally handled slash commands (/help, /memory, etc.) previously created file history snapshots even though no model turn was added. This caused the snapshot index to drift from the real user turn count, breaking rewind in web-shell sessions that use slash commands frequently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address qwen-code-ci-bot review round 5 - Remove stale line number from comment (Session.ts) - Single cast instead of double cast for systemPayload (sessionService.ts) - Guard mkdir in copyFileHistoryBackups to prevent fork failure (sessionService.ts) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address wenshao review — mock, ordering guard, JSDoc placement - Add makeSnapshot/rewind to FileHistoryService mock in Session.test.ts - Invalidate cached FileHistoryService on enableFileCheckpointing() to prevent stale enabled=false if service was lazily created first - Move copyFileHistoryBackups above class JSDoc to fix association Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: truncate in-memory snapshots on ACP rewind + deduplicate recordFileHistorySnapshot - Call restoreFromSnapshots(survivingSnapshots) in ACP rewindToTurn to prevent phantom snapshots from accumulating in the in-memory array after conversation-only rewind - Simplify recordFileHistorySnapshot to delegate to the batch variant Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add restoreFromSnapshots to FileHistoryService mock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What this PR does
Persists
FileHistorySnapshotto JSONL asfile_history_snapshotsystem records, enabling/rewindto work across session resume. Previously, file history snapshots were purely in-memory and lost on process exit.Why it's needed
T2.1 (
loadSession/resumegraduation fromunstable_) is blocked because/rewinddoesn't work after session resume — the snapshot chain is empty. This PR closes that gap, making the resume feature reliable enough to drop theunstable_prefix.Replaces the stalled PR #4253 with a fresh implementation.
Key changes
FileHistorySnapshotto/from JSONL system recordsmakeSnapshotsucceeds (incremental per turn)file_history_snapshotrecords insessionService.loadSession()config.getFileHistoryService()on resumesession_resumecapability tag (stable alias forunstable_session_resume)validateRestoredSnapshotswith dedup + batched parallel statenableFileCheckpointing()in Config — daemon/ACP sessions run with piped stdin (non-TTY), sofileCheckpointingEnableddefaulted tofalse; explicitly enabled viaacpAgent.newSessionConfigseedPromptCount()in SessionContext — on TUI--continueresume, the prompt counter restarted at 0, causing promptId collisions with pre-resume turns; now seeded from resumed conversation user turn countSession.promptnow callsmakeSnapshot+recordFileHistorySnapshot— daemon-created sessions previously never produced file history snapshotsRisk & Scope
Main risk: Last turn's snapshot may be incomplete (serialized before
trackEditmutates it). Self-healing for all but the very last turn before exit.Out of scope (follow-up):
promptIdrewriting (forked snapshots carry source session's promptId prefix)Breaking changes: None. Old sessions without
file_history_snapshotrecords resume normally (graceful degradation).Linked Issues
Reviewer Test Plan
How to verify
Tested on
中文说明
这个 PR 做了什么
将
FileHistorySnapshot持久化到 JSONL 的file_history_snapshot系统记录中,使/rewind能跨 session resume 工作。此前快照完全在内存中,进程退出即丢失。为什么需要
T2.1(
loadSession/resume从unstable_毕业)被阻塞,因为 resume 后/rewind不工作——快照链为空。本 PR 修复了这个核心差距。替代已停滞 3 周的 PR #4253。
主要变更
makeSnapshot成功后录制快照(增量写入)loadSession解析file_history_snapshot记录并恢复快照链session_resume稳定 capability tagvalidateRestoredSnapshots去重 + 批量并行 statenableFileCheckpointing()— 修复 daemon/ACP 会话因 piped stdin 导致文件检查点被禁用的问题seedPromptCount()— 修复 TUI--continue恢复后 promptId 计数器归零导致的冲突Session.prompt新增makeSnapshot调用 — 修复 daemon 创建的会话从不产生快照的问题已知限制(后续 PR)
promptId包含源 session ID,可能导致不匹配🤖 Generated with Qwen Code