
perf: reduce memory and network cost for checkpoint metadata#680

Draft
pjbgf wants to merge 3 commits into main from pjbgf/jsonl-data2

Conversation


@pjbgf pjbgf commented Mar 11, 2026

Summary

This PR focuses on two key optimisations:

1. Reduce memory allocations when reading checkpoint metadata

  • Streaming json.Decoder replaces json.Unmarshal for reading metadata.json files, so the full JSON document is no longer loaded into memory as an intermediate []byte
  • Lite structs (sessionMetadataLite, checkpointSummaryLite) replace full CommittedMetadata/CheckpointSummary when only a few fields are needed — avoids allocating large unused nested objects (Summary, InitialAttribution, TokenUsage, etc.)
  • Shared helpers (decodeSummaryLiteFromTree, decodeSessionMetadataLite) deduplicate the metadata reading logic between ListCheckpoints and ReadCheckpointMetadata

2. Reduce network cost for pulling checkpoint data from remote

  • Shallow treeless fetch (--depth=1 --filter=blob:none) downloads only the tip commit and tree objects for the metadata branch — no blobs, no history
  • On-demand blob resolution via BlobResolver uses go-git's HasEncodedObject to check the local object store (loose objects + packfile indices) before fetching anything
  • Targeted blob fetch via git fetch-pack requests only the specific full.jsonl blob hashes that are missing locally, with
    automatic fallback to a full branch fetch if the server doesn't support allow-reachable-sha1-in-want
  • CollectTranscriptBlobHashes walks tree entries to discover full.jsonl blob hashes (including chunked transcripts) without reading blob content — works after a treeless fetch where blobs aren't local

Resume flow (before → after)

Before: git fetch origin entire/checkpoints/v1 → downloads entire branch (all commits, trees, and every blob across all
checkpoints)

After:

  1. git fetch --depth=1 --filter=blob:none → tip commit + trees only
  2. go-git tree walk → find needed full.jsonl blob hashes
  3. HasEncodedObject → skip blobs already in local store
  4. git fetch-pack <url> <hash>... → fetch only the missing blobs
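Step 3 of this flow reduces to a simple filter. In this sketch the predicate stands in for go-git's HasEncodedObject and plain strings stand in for plumbing.Hash values (both simplifications, not the PR's actual types):

```go
package main

import "fmt"

// missingBlobs keeps only the hashes a checkpoint needs that are not already
// in the local object store; the remainder is what step 4 passes to
// `git fetch-pack`. hasLocal plays the role of go-git's HasEncodedObject.
func missingBlobs(needed []string, hasLocal func(hash string) bool) []string {
	var missing []string
	for _, h := range needed {
		if !hasLocal(h) {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	local := map[string]bool{"aaa": true}
	needed := []string{"aaa", "bbb", "ccc"}
	// Only "bbb" and "ccc" still need fetching.
	fmt.Println(missingBlobs(needed, func(h string) bool { return local[h] }))
}
```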

Test plan

  • BlobResolver unit tests: HasBlob present/missing, ReadBlob present/missing
  • CollectTranscriptBlobHashes: single session, multi-session, chunked, nonexistent checkpoint
  • Existing ListCheckpoints and ReadCheckpointMetadata tests still pass with lite structs
  • Full mise run lint clean
  • Manual E2E: entire resume on a branch with remote-only checkpoint data

pjbgf and others added 3 commits March 11, 2026 10:27
…data

Avoid allocating large unused fields (Summary, InitialAttribution,
TokenUsage) when reading checkpoint metadata by introducing minimal
sessionMetadataLite and checkpointSummaryLite structs with streaming
json.Decoder instead of json.Unmarshal into full types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: f64fee85be5f
Add treeless fetch (--filter=blob:none) for the metadata branch so only
commit and tree objects are downloaded initially. Transcript blobs
(full.jsonl) are then fetched individually via git fetch-pack using their
SHA-1 hashes, skipping any that already exist in the local object store.

- BlobResolver: go-git HasEncodedObject checks for local blob existence
- CollectTranscriptBlobHashes: walks tree entries to find full.jsonl blob
  hashes without reading blob content (works after treeless fetch)
- FetchMetadataTreeOnly: git CLI fetch with --filter=blob:none
- FetchBlobsByHash: git fetch-pack for specific objects, with full-fetch
  fallback
- resume.go: ensureTranscriptBlobs wired before RestoreLogsOnly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 48e91bc15f41
Add --depth=1 to the filtered fetch so only the tip commit and its tree
objects are downloaded, skipping the entire branch history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 564aea6f7ca5
Copilot AI review requested due to automatic review settings March 11, 2026 12:19

Copilot AI left a comment


Pull request overview

This PR optimizes entire resume and checkpoint metadata handling by reducing memory allocations when decoding checkpoint metadata and by reducing network transfer when resuming from remote-only checkpoint data.

Changes:

  • Switch metadata reads to streaming json.Decoder + “lite” structs for ListCheckpoints / ReadCheckpointMetadata.
  • Add a “treeless” metadata-branch fetch (--depth=1 --filter=blob:none) and best-effort prefetching of only required transcript blobs.
  • Introduce BlobResolver + tree-walk collection of transcript blob hashes (including chunked transcripts) and associated unit tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmd/entire/cli/strategy/common_helpers_test.go Adds a preserved “full deserialize” test helper for checkpoint metadata parsing.
cmd/entire/cli/strategy/common.go Refactors checkpoint listing/reading to stream JSON decoding and use minimal structs.
cmd/entire/cli/resume.go Attempts treeless fetch first and adds best-effort transcript blob prefetch before restore.
cmd/entire/cli/git_operations.go Adds treeless metadata fetch and a targeted blob fetch helper.
cmd/entire/cli/checkpoint/blob_resolver.go Adds local-object-store blob existence/read + transcript blob hash collection via tree entries.
cmd/entire/cli/checkpoint/blob_resolver_test.go Adds unit tests for blob resolution and transcript hash collection.

Comment on lines +261 to +268
		// If we can't get the tree, blobs can't be checked — caller will handle the error
		return nil //nolint:nilerr // Tree unavailable means no sparse fetch possible
	}

	refs, err := checkpoint.CollectTranscriptBlobHashes(metadataTree, checkpointID)
	if err != nil {
		// Tree navigation failed — not fatal, ReadSessionContent will fail with a clear error
		return nil //nolint:nilerr // Best-effort blob prefetch

Copilot AI Mar 11, 2026


ensureTranscriptBlobs returns nil on failures to load the metadata tree or to collect transcript blob hashes, but the caller only emits a warning when a non-nil error is returned. This means important failures are silently swallowed and the "restore may fail" warning will never trigger for these cases. Consider returning a wrapped error for these failure paths (still treated as best-effort by the caller), or logging inside ensureTranscriptBlobs so the reason for skipping the sparse fetch is visible.

Suggested change

-		// If we can't get the tree, blobs can't be checked — caller will handle the error
-		return nil //nolint:nilerr // Tree unavailable means no sparse fetch possible
-	}
-	refs, err := checkpoint.CollectTranscriptBlobHashes(metadataTree, checkpointID)
-	if err != nil {
-		// Tree navigation failed — not fatal, ReadSessionContent will fail with a clear error
-		return nil //nolint:nilerr // Best-effort blob prefetch
+		// If we can't get the tree, blobs can't be checked — treat as best-effort and let caller warn.
+		return fmt.Errorf("unable to load metadata tree for sparse transcript blob fetch: %w", err)
+	}
+	refs, err := checkpoint.CollectTranscriptBlobHashes(metadataTree, checkpointID)
+	if err != nil {
+		// Tree navigation failed — not fatal; this only disables sparse blob prefetch for this checkpoint.
+		return fmt.Errorf("unable to collect transcript blob hashes for checkpoint %s: %w", checkpointID, err)
Comment on lines +101 to +129
	// Collect transcript blob hashes from tree entries.
	// tree.Entries contains the direct children — no blob reads needed.
	var hasBaseFile bool
	var chunkEntries []object.TreeEntry

	for _, entry := range sessionTree.Entries {
		if entry.Name == paths.TranscriptFileName || entry.Name == paths.TranscriptFileNameLegacy {
			hasBaseFile = true
			refs = append(refs, TranscriptBlobRef{
				SessionIndex: i,
				Hash:         entry.Hash,
				Path:         sessionDir + "/" + entry.Name,
			})
		}
		// Check for chunk files (full.jsonl.001, full.jsonl.002, etc.)
		if strings.HasPrefix(entry.Name, paths.TranscriptFileName+".") {
			idx := agent.ParseChunkIndex(entry.Name, paths.TranscriptFileName)
			if idx > 0 {
				chunkEntries = append(chunkEntries, entry)
			}
		}
	}

	// If chunks exist and the base file is also a chunk (chunk 0), it's already collected.
	// Add the numbered chunk entries.
	if len(chunkEntries) > 0 {
		// If there was no base file, that's OK — chunks stand alone
		_ = hasBaseFile
		for _, entry := range chunkEntries {

Copilot AI Mar 11, 2026


CollectTranscriptBlobHashes tracks hasBaseFile but then discards it (_ = hasBaseFile). This adds noise and makes the chunk-handling logic harder to read. Consider removing hasBaseFile entirely here, or using it to make the chunk collection rules explicit (e.g., include base file as chunk 0 only when present).
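For illustration, a minimal stand-in for agent.ParseChunkIndex (the real signature and rules may differ) makes the chunk-detection rule in the excerpt concrete: only names of the form base + "." + positive number count as numbered chunks, so the base file and unrelated suffixes are never mistaken for chunks.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseChunkIndex is a hypothetical sketch of agent.ParseChunkIndex:
// "full.jsonl.002" with base "full.jsonl" yields 2; the base file itself,
// non-numeric suffixes, and zero/negative indices all yield 0.
func parseChunkIndex(name, base string) int {
	suffix, ok := strings.CutPrefix(name, base+".")
	if !ok {
		return 0
	}
	n, err := strconv.Atoi(suffix)
	if err != nil || n <= 0 {
		return 0
	}
	return n
}

func main() {
	for _, name := range []string{"full.jsonl", "full.jsonl.001", "full.jsonl.010", "full.jsonl.bak"} {
		fmt.Println(name, "->", parseChunkIndex(name, "full.jsonl"))
	}
}
```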

Comment on lines +440 to +475
// FetchBlobsByHash fetches specific blob objects from the remote by their SHA-1 hashes.
// Uses git fetch-pack to request individual objects, which works when the server
// supports allow-reachable-sha1-in-want (GitHub, GitLab, Bitbucket all do).
//
// If fetch-pack fails (e.g., old server without SHA-in-want support), falls back
// to a full metadata branch fetch.
func FetchBlobsByHash(ctx context.Context, hashes []plumbing.Hash) error {
	if len(hashes) == 0 {
		return nil
	}

	ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()

	// Get remote URL for fetch-pack
	urlCmd := exec.CommandContext(ctx, "git", "remote", "get-url", "origin")
	urlOutput, err := urlCmd.Output()
	if err != nil {
		return fmt.Errorf("failed to get remote URL: %w", err)
	}
	remoteURL := strings.TrimSpace(string(urlOutput))

	// Build fetch-pack args with blob hashes
	args := []string{"fetch-pack", "--thin", "--no-progress", remoteURL}
	for _, h := range hashes {
		args = append(args, h.String())
	}

	fetchCmd := exec.CommandContext(ctx, "git", args...)
	if output, fetchErr := fetchCmd.CombinedOutput(); fetchErr != nil {
		// Fallback: full metadata branch fetch (pack negotiation skips already-local objects)
		if fallbackErr := FetchMetadataBranch(ctx); fallbackErr != nil {
			return fmt.Errorf("fetch-pack failed (%s: %w) and fallback fetch also failed: %w",
				strings.TrimSpace(string(output)), fetchErr, fallbackErr)
		}
	}

Copilot AI Mar 11, 2026


FetchBlobsByHash runs git fetch-pack and captures CombinedOutput(). git fetch-pack produces a pack stream on stdout and does not install the objects into the local object database by itself; capturing stdout also risks buffering large pack data in memory. As written, this can both fail to make the requested blobs available locally and cause a large memory spike. Consider wiring fetch-pack to git index-pack (streaming stdout directly, avoiding buffering) or switching to a higher-level git fetch mechanism that actually writes the objects into .git/objects.
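A sketch of the suggested fix, building only the two argv lists (the helper name and exact flag combination are assumptions, not code from this PR): `git fetch-pack` stdout would be piped straight into `git index-pack --stdin --fix-thin`, so the pack is written into .git/objects without buffering it in memory; --fix-thin completes thin packs that reference already-local objects.

```go
package main

import "fmt"

// blobFetchPipeline returns the argv for the producer and consumer halves of
// the pipeline the review comment describes. Actually running it would use
// exec.Command for both, with the fetch command's StdoutPipe() wired to the
// index-pack command's Stdin, so pack data streams instead of accumulating
// in a CombinedOutput buffer.
func blobFetchPipeline(remoteURL string, hashes []string) (fetchArgs, indexArgs []string) {
	fetchArgs = []string{"git", "fetch-pack", "--thin", "--no-progress", remoteURL}
	fetchArgs = append(fetchArgs, hashes...)
	indexArgs = []string{"git", "index-pack", "--stdin", "--fix-thin"}
	return fetchArgs, indexArgs
}

func main() {
	f, i := blobFetchPipeline("https://example.com/repo.git", []string{"deadbeef"})
	fmt.Println(f)
	fmt.Println(i)
}
```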
