Distill pipeline leaves uncommitted/untracked session-facts files, blocking team context sync #445

@galexy

Summary

The distill extraction pipeline can leave the team context repo in a dirty state with staged-but-uncommitted and untracked files in memory/.session-facts/, preventing git pull --rebase from working. The daemon's --autostash preserves this dirty state indefinitely.

Symptoms

ox status shows:

Team                SageOx
  Status            ⚠ 29 uncommitted (just now)

In the team context repo:

  • 29 files staged as "modified" under Changes to be committed
  • 19 untracked files
  • git pull fails: error: cannot pull with rebase: Your index contains uncommitted changes
  • All affected files are in memory/.session-facts/

Root Cause

Bug 1: Empty marker files never committed (distill_sessions.go:86-100)

When sessionSummaryToFacts() returns 0 facts, a marker file is written to disk for dedup purposes but commitMemoryFile() is never called:

if len(extractedFacts) == 0 {
    markerFile := filepath.Join("memory", ".session-facts", s.Date, s.DirName+".jsonl")
    if err := facts.WriteFacts(filepath.Join(tc.Path, markerFile), markerHeader, nil); err != nil {
        slog.Warn(...)
    }
    continue  // ← skips git add + git commit
}

These accumulate as untracked files.

Bug 2: Missing git reset in commit failure cleanup (distill_sessions.go:120-126, distill_write.go:209-233)

commitMemoryFile() runs git add --sparse, then git commit. If the commit fails (e.g., lock contention with the daemon), the error handler removes the file from disk but never unstages it:

if err := commitMemoryFile(tc.Path, factFile, ...); err != nil {
    slog.Warn("failed to commit session facts", ...)
    if removeErr := os.Remove(fullPath); removeErr != nil {
        slog.Warn(...)
    }
    continue  // file removed from disk, but still staged in git index
}

For discussion facts (distill.go:891-893), the error handler doesn't even attempt os.Remove.

Bug 3: No lookback window on extraction (beads: ox-kq8)

extractSessionFacts and extractDiscussionFacts scan ALL sessions/discussions regardless of age. The 7-day lookback only applies to the downstream daily distillation synthesis, not the extraction step. This means any event that invalidates content hashes (summary regeneration, schema changes) triggers re-extraction across the entire history.

scanPendingSessions (distill_sessions.go:169) takes no since parameter — it enumerates every entry in ledger/sessions/. GitHub extraction is partially mitigated via inferGitHubQueryHighWater (7-day default fallback).

Bug 4: Daemon preserves dirty state indefinitely (sync_managed.go:215)

The daemon pulls with git pull --rebase --autostash, which stashes local changes (staged ones included) before the rebase and reapplies them afterward. Once files are staged but uncommitted, every sync cycle faithfully preserves that dirty state.

Bug 5: Session facts use deterministic filenames (beads: ox-cs2)

Session fact files use {sessionDirName}.jsonl (distill_sessions.go:112), which is deterministic and overwritable — same vulnerability as #439 and #441 for GitHub data.

What Triggered It

In this specific incident, the daemon's anti-entropy re-finalized ~58 sessions and changed their summary.json content (see #444 for details on that bug). The changed hashes triggered mass re-extraction via the no-lookback extraction pipeline, which then hit the commit/staging bugs above.

Related

Suggested Fixes

  1. Commit empty marker files — call commitMemoryFile() for 0-fact markers so they're tracked
  2. Add git reset HEAD <file> to commit failure cleanup — unstage files when git commit fails
  3. Add lookback window to extraction — default 30 days for extractSessionFacts and extractDiscussionFacts
  4. UUID7 filenames for session facts — per the pattern in docs/ai/specs/github-facts-per-day-fix.md
