Summary
The daemon's anti-entropy session finalization pipeline can overwrite good LLM-generated summaries with garbage, and assigns maximum quality scores to unparsable output. Discovered while investigating #445 (dirty team context state from uncommitted session-facts files).
How We Found This
While debugging why the team context repo had 29 staged and 19 untracked files in memory/.session-facts/ (#445), we traced the root cause to a mass re-extraction triggered by changed summary.json content hashes. The hashes changed because the daemon re-finalized ~58 sessions and overwrote their summaries.
What Happened
On Apr 3, a chain of events on a remote machine:
- ox doctor: recover sessions (ledger commit f2eb49d, 21:01) restored 302 files from LFS stubs and set .needs-summary markers — including on sessions that already had good summaries from the original ox session stop hook
- Daemon anti-entropy detected the markers → enqueued 58 sessions for re-summarization via ClaudeRunner
- ClaudeRunner spawned fresh Claude Code CLI processes to summarize from raw.jsonl alone
- Some sessions got proper re-summaries; others got agentic planning text ("I need permission to run commands and write files...")
- 58 "finalize session" commits landed in the ledger between 21:41 and 22:37
Example: session 2026-02-13T14-56-ajit-OxmoZK
- Before (from the original ox session stop): proper title, summary, key_actions, aha_moments, sageox_insights
- After (from the daemon re-finalize, commit 3cffb8c): empty title, raw LLM planning text in the summary field, quality_score: 1, score_reason: "unparsable LLM output, defaulting to upload"
Bugs
Bug A: Anti-entropy overwrites good summaries without comparison
File: internal/daemon/agentwork/session_finalize.go:606-631
ProcessResult unconditionally writes new artifacts. It never checks whether the existing summary.json is better than the new one. A session with a high-quality summary from the original stop hook gets silently overwritten by a potentially worse daemon re-summarization.
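A guard along these lines would prevent the regression. This is a minimal sketch, assuming a hypothetical SummaryArtifact type carrying the quality score seen in the corrupted output; the real ProcessResult code shapes its artifacts differently:

```go
package main

import "fmt"

// SummaryArtifact is a simplified stand-in for the parsed summary.json.
type SummaryArtifact struct {
	Title        string
	Summary      string
	QualityScore float64
}

// shouldOverwrite reports whether a newly generated summary may replace
// the existing one. It keeps the existing artifact whenever it scores at
// least as well, so a re-finalization can never regress quality.
func shouldOverwrite(existing, fresh *SummaryArtifact) bool {
	if existing == nil {
		return true // nothing on disk yet, safe to write
	}
	return fresh.QualityScore > existing.QualityScore
}

func main() {
	old := &SummaryArtifact{Title: "Fix LFS cascade", QualityScore: 0.9}
	bad := &SummaryArtifact{Summary: "I need permission to run commands...", QualityScore: 0.0}
	fmt.Println(shouldOverwrite(old, bad)) // the high-quality original is kept
}
```

On ties the existing summary wins, which biases the daemon toward leaving already-finalized sessions alone.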
Bug B: QualityScore: 1.0 for unparsable LLM output
File: internal/daemon/agentwork/session_finalize.go:624-628
```go
if parseErr != nil {
	summaryResp = &session.SummarizeResponse{
		Summary:      llmOutput, // raw agent planning text
		QualityScore: 1.0,       // MAXIMUM score — backwards
		ScoreReason:  "unparsable LLM output, defaulting to upload",
	}
}
```
Unparsable output gets max quality score, ensuring garbage passes all quality gates including the discard check at line 636. Should be 0.0 or below the discard threshold.
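A corrected fallback would score unparsable output at the bottom of the scale instead. This is a sketch reusing the SummarizeResponse fields from the snippet above, with the parse step simplified to a plain json.Unmarshal; the real parsing logic may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SummarizeResponse mirrors the fields shown in the buggy snippet.
type SummarizeResponse struct {
	Summary      string
	QualityScore float64
	ScoreReason  string
}

// fallbackResponse builds the response used when LLM output cannot be
// parsed. Scoring it 0.0 (instead of 1.0) lets the downstream discard
// check reject the garbage rather than wave it through.
func fallbackResponse(llmOutput []byte) *SummarizeResponse {
	var resp SummarizeResponse
	if err := json.Unmarshal(llmOutput, &resp); err != nil {
		return &SummarizeResponse{
			Summary:      string(llmOutput),
			QualityScore: 0.0, // minimum score: unparsable output must not pass quality gates
			ScoreReason:  "unparsable LLM output, scored for discard",
		}
	}
	return &resp
}

func main() {
	r := fallbackResponse([]byte("I need permission to run commands..."))
	fmt.Println(r.QualityScore, r.ScoreReason)
}
```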
Bug C: .needs-summary marker set too broadly
PR #428 added .needs-summary detection for sessions with stub artifacts. But ox doctor set this marker on sessions that already had real LLM-generated summaries — not just stubs. The marker should check whether the existing summary.json is substantive before triggering re-finalization.
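One possible shape for that substantive-summary check, assuming a stats-only stub leaves the LLM-written fields empty (the field names here are illustrative, not the actual schema):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// isSubstantive reports whether an existing summary.json already holds
// real LLM-generated content, in which case the .needs-summary marker
// should NOT be set. A stats-only stub has empty title/summary fields.
func isSubstantive(summaryJSON []byte) bool {
	var s struct {
		Title   string `json:"title"`
		Summary string `json:"summary"`
	}
	if err := json.Unmarshal(summaryJSON, &s); err != nil {
		return false // unreadable: safe to re-finalize
	}
	return strings.TrimSpace(s.Title) != "" && strings.TrimSpace(s.Summary) != ""
}

func main() {
	stub := []byte(`{"title":"","summary":"","stats":{"turns":12}}`)
	full := []byte(`{"title":"Fix LFS cascade","summary":"Traced the stub restore..."}`)
	fmt.Println(isSubstantive(stub), isSubstantive(full))
}
```

With this check in place, ox doctor would only mark the stats-only stubs it actually restored, not sessions with real summaries.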
Downstream Impact
The corrupted summaries changed content hashes for 58 sessions, which triggered mass re-extraction in the distill pipeline (see #445 for that side of the cascade).
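The cascade follows mechanically from content-addressed change detection: any byte-level rewrite of summary.json, even a corrupting one, produces a new hash and re-queues the session. A sketch of that trigger, assuming a SHA-256 content hash (the distill pipeline's actual hashing scheme isn't shown here):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns a hex digest of summary.json bytes. A pipeline
// keyed on this value cannot tell a quality improvement from a
// corruption: any change re-triggers extraction.
func contentHash(summaryJSON []byte) string {
	sum := sha256.Sum256(summaryJSON)
	return hex.EncodeToString(sum[:])
}

func main() {
	before := contentHash([]byte(`{"title":"Fix LFS cascade"}`))
	after := contentHash([]byte(`{"title":"","summary":"I need permission..."}`))
	fmt.Println(before != after) // changed hash → session re-queued for extraction
}
```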
Related
- fix(session): skip LFS stubs in detect loop, re-summarize stub artifacts (introduced the .needs-summary marker detection)
- fix(session): prevent LFS cascade-blocking and cache data loss on push failure
Suggested Fixes
- Guard against summary regression — before overwriting summary.json, compare the existing summary's quality against the new one and keep the better summary.
- Fix QualityScore for unparsable output — set it to 0.0 or below the discard threshold, not 1.0.
- Narrow .needs-summary scope — only set the marker when the existing summary is a stub (stats-only), not when it has real LLM content.