Summary
The distill extraction pipeline can leave the team context repo in a dirty state with staged-but-uncommitted and untracked files in memory/.session-facts/, preventing git pull --rebase from working. The daemon's --autostash preserves this dirty state indefinitely.
Symptoms
ox status shows:
```
Team SageOx
Status ⚠ 29 uncommitted (just now)
```
In the team context repo:
- 29 files staged as "modified" under "Changes to be committed"
- 19 untracked files
- git pull fails: error: cannot pull with rebase: Your index contains uncommitted changes
- All affected files are in memory/.session-facts/
Root Cause
Bug 1: Empty marker files never committed (distill_sessions.go:86-100)
When sessionSummaryToFacts() returns 0 facts, a marker file is written to disk for dedup purposes but commitMemoryFile() is never called:
```go
if len(extractedFacts) == 0 {
	markerFile := filepath.Join("memory", ".session-facts", s.Date, s.DirName+".jsonl")
	if err := facts.WriteFacts(filepath.Join(tc.Path, markerFile), markerHeader, nil); err != nil {
		slog.Warn(...)
	}
	continue // ← skips git add + git commit
}
```
These accumulate as untracked files.
Bug 2: Missing git reset in commit failure cleanup (distill_sessions.go:120-126, distill_write.go:209-233)
commitMemoryFile() runs git add --sparse followed by git commit. If the commit fails (e.g., due to lock contention with the daemon), the error handler removes the file from disk but never unstages it:
```go
if err := commitMemoryFile(tc.Path, factFile, ...); err != nil {
	slog.Warn("failed to commit session facts", ...)
	if removeErr := os.Remove(fullPath); removeErr != nil {
		slog.Warn(...)
	}
	continue // file removed from disk, but still staged in git index
}
```
For discussion facts (distill.go:891-893), the error handler doesn't even attempt os.Remove.
Bug 3: No lookback window on extraction (beads: ox-kq8)
extractSessionFacts and extractDiscussionFacts scan ALL sessions/discussions regardless of age. The 7-day lookback only applies to the downstream daily distillation synthesis, not the extraction step. This means any event that invalidates content hashes (summary regeneration, schema changes) triggers re-extraction across the entire history.
scanPendingSessions (distill_sessions.go:169) takes no since parameter — it enumerates every entry in ledger/sessions/. GitHub extraction is partially mitigated via inferGitHubQueryHighWater (7-day default fallback).
Bug 4: Daemon preserves dirty state indefinitely (sync_managed.go:215)
The daemon pulls with git pull --rebase --autostash, which stashes staged changes before rebase and restores them after. Once files are staged-but-uncommitted, every sync cycle faithfully preserves that dirty state.
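One possible daemon-side guard (not among the suggested fixes; offered only as a hypothetical) is to detect this stuck state before pulling instead of letting --autostash carry it forward. The sketch classifies lines of git status --porcelain --untracked-files=all (v1 format, "XY PATH") under memory/.session-facts/. It does not handle quoted paths or "old -> new" rename entries.

```go
package main

import (
	"bufio"
	"strings"
)

// dirtySessionFacts parses porcelain v1 status output and returns paths under
// memory/.session-facts/ that are staged (X column set) or untracked ("??").
func dirtySessionFacts(porcelain string) (staged, untracked []string) {
	const prefix = "memory/.session-facts/"
	sc := bufio.NewScanner(strings.NewReader(porcelain))
	for sc.Scan() {
		line := sc.Text()
		if len(line) < 4 {
			continue
		}
		// Columns 0-1 are the XY status, column 2 is a space, the path follows.
		xy, path := line[:2], line[3:]
		if !strings.HasPrefix(path, prefix) {
			continue
		}
		switch {
		case xy == "??":
			untracked = append(untracked, path)
		case xy[0] != ' ':
			staged = append(staged, path)
		}
	}
	return staged, untracked
}
```

If either slice is non-empty, the daemon could report it through ox status or run the Bug 2 cleanup for those paths before pulling, rather than stashing and restoring the same dirty state every cycle.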
Bug 5: Session facts use deterministic filenames (beads: ox-cs2)
Session fact files use {sessionDirName}.jsonl (distill_sessions.go:112), which is deterministic and overwritable — same vulnerability as #439 and #441 for GitHub data.
What Triggered It
In this specific incident, the daemon's anti-entropy re-finalized ~58 sessions and changed their summary.json content (see #444 for details on that bug). The changed hashes triggered mass re-extraction via the no-lookback extraction pipeline, which then hit the commit/staging bugs above.
Related
- ox-kq8 (no lookback window)
- ox-cs2 (deterministic filenames)
Suggested Fixes
- Commit empty marker files: call commitMemoryFile() for 0-fact markers so they're tracked
- Add git reset HEAD <file> to commit failure cleanup: unstage files when git commit fails
- Add a lookback window to extraction: default 30 days for extractSessionFacts and extractDiscussionFacts
- UUID7 filenames for session facts: per the pattern in docs/ai/specs/github-facts-per-day-fix.md