Daemon anti-entropy re-finalization overwrites good summaries with garbage at QualityScore 1.0 #444

@galexy

Summary

The daemon's anti-entropy session finalization pipeline can overwrite good LLM-generated summaries with garbage, and assigns maximum quality scores to unparsable output. Discovered while investigating #445 (dirty team context state from uncommitted session-facts files).

How We Found This

While debugging why the team context repo had 29 staged and 19 untracked files in memory/.session-facts/ (#445), we traced the root cause to a mass re-extraction triggered by changed summary.json content hashes. The hashes changed because the daemon re-finalized ~58 sessions and overwrote their summaries.

What Happened

On Apr 3, the following chain of events occurred on a remote machine:

  1. ox doctor: recover sessions (ledger commit f2eb49d, 21:01) restored 302 files from LFS stubs and set .needs-summary markers — including on sessions that already had good summaries from the original ox session stop hook
  2. Daemon anti-entropy detected markers → enqueued 58 sessions for re-summarization via ClaudeRunner
  3. ClaudeRunner spawned fresh Claude Code CLI processes to summarize from raw.jsonl alone
  4. Some sessions got proper re-summaries; others got agentic planning text ("I need permission to run commands and write files...")
  5. 58 "finalize session" commits landed in the ledger between 21:41–22:37

Example: session 2026-02-13T14-56-ajit-OxmoZK

Before (from original ox session stop): proper title, summary, key_actions, aha_moments, sageox_insights

After (from daemon re-finalize, commit 3cffb8c): empty title, raw LLM planning text in summary field, quality_score: 1, score_reason: "unparsable LLM output, defaulting to upload"

Bugs

Bug A: Anti-entropy overwrites good summaries without comparison

File: internal/daemon/agentwork/session_finalize.go:606-631

ProcessResult unconditionally writes new artifacts. It never checks whether the existing summary.json is better than the new one. A session with a high-quality summary from the original stop hook gets silently overwritten by a potentially worse daemon re-summarization.

Bug B: QualityScore: 1.0 for unparsable LLM output

File: internal/daemon/agentwork/session_finalize.go:624-628

if parseErr != nil {
    summaryResp = &session.SummarizeResponse{
        Summary:      llmOutput,      // raw agent planning text
        QualityScore: 1.0,            // MAXIMUM score — backwards
        ScoreReason:  "unparsable LLM output, defaulting to upload",
    }
}

Unparsable output gets the maximum quality score, so garbage passes every quality gate, including the discard check at line 636. The fallback score should be 0.0, or at least below the discard threshold.
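
A minimal sketch of the corrected fallback follows. It is self-contained, so the local `SummarizeResponse` type, the `parseOrFallback` and `shouldDiscard` helpers, and the `discardThreshold` value are stand-ins chosen for illustration; the real parsing logic and threshold in session_finalize.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SummarizeResponse is a local stand-in for session.SummarizeResponse;
// only the fields quoted in this issue are modeled.
type SummarizeResponse struct {
	Title        string  `json:"title"`
	Summary      string  `json:"summary"`
	QualityScore float64 `json:"quality_score"`
	ScoreReason  string  `json:"score_reason"`
}

// discardThreshold is a hypothetical value; the real one is not shown here.
const discardThreshold = 0.3

// parseOrFallback decodes the LLM output as JSON. On failure it keeps the raw
// text for debugging but scores it 0.0 so it cannot pass a quality gate.
func parseOrFallback(llmOutput string) *SummarizeResponse {
	var resp SummarizeResponse
	if err := json.Unmarshal([]byte(llmOutput), &resp); err != nil {
		return &SummarizeResponse{
			Summary:      llmOutput,
			QualityScore: 0.0, // minimum score for unparsable output
			ScoreReason:  "unparsable LLM output",
		}
	}
	return &resp
}

// shouldDiscard models a discard check: anything below the threshold is dropped.
func shouldDiscard(r *SummarizeResponse) bool {
	return r.QualityScore < discardThreshold
}

func main() {
	r := parseOrFallback("I need permission to run commands and write files...")
	fmt.Println(r.QualityScore, shouldDiscard(r)) // 0 true
}
```

With this shape, the planning text from the example session would be flagged for discard instead of being written as a top-scored summary.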

Bug C: .needs-summary marker set too broadly

PR #428 added .needs-summary detection for sessions with stub artifacts. But ox doctor set this marker on sessions that already had real LLM-generated summaries, not just stubs. The code that sets the marker should check whether the existing summary.json is substantive before triggering re-finalization.
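
The substantive-summary check could look roughly like the sketch below. The `needsSummary` helper and its rule (a summary counts as real only if both `title` and `summary` are non-empty) are assumptions; the actual stub format written by ox doctor is not shown in this issue, and the `message_count` field in the usage example is invented purely to stand in for a stats-only stub.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// summaryDoc models only the fields used to tell a stub from a real summary.
type summaryDoc struct {
	Title   string `json:"title"`
	Summary string `json:"summary"`
}

// needsSummary (hypothetical helper) returns true when the summary.json
// contents are missing, unreadable, or lack LLM-written text.
func needsSummary(raw []byte) bool {
	var doc summaryDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return true // empty or corrupt file: re-summarize
	}
	hasText := strings.TrimSpace(doc.Title) != "" && strings.TrimSpace(doc.Summary) != ""
	return !hasText
}

func main() {
	stub := []byte(`{"message_count": 12}`) // invented stats-only stub
	realSummary := []byte(`{"title": "Fix login flow", "summary": "Reworked auth."}`)
	fmt.Println(needsSummary(stub), needsSummary(realSummary)) // true false
}
```

A recovery command would then set the marker only when `needsSummary` returns true, leaving sessions with real summaries untouched.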

Downstream Impact

The corrupted summaries changed content hashes for 58 sessions, which triggered mass re-extraction in the distill pipeline (see #445 for that side of the cascade).

Suggested Fixes

  1. Guard against summary regression: before overwriting summary.json, compare the new summary's quality against the existing one and keep the better of the two.
  2. Fix QualityScore for unparsable output — set to 0.0 or below discard threshold, not 1.0
  3. Narrow .needs-summary scope — only set marker when existing summary is a stub (stats-only), not when it has real LLM content
