perf(#1433): sparse-cone consumer materialization for subdir deps#1436
Conversation
Add sparse-cone to both consumer-materialization paths: - bare_cache.materialize_from_bare() gains sparse_paths kwarg - GitCache.get_checkout/_create_checkout gain sparse_paths kwarg; on-disk layout becomes checkouts_v1/<shard>/<sha>/<variant>/ where variant is 'full' or 'sparse-<hash16>'. Wire subdir_path into both call sites in download_subdirectory_package so subdir deps only materialize the requested top-level dir(s) in the consumer working tree. Add verbose-only InstallLogger methods for perf diagnostics: subdir_download_start, bare_clone_strategy, materialize_result, tier_summary. Emit at appropriate flow points. Add perf harness at scripts/perf/measure_dotnet_skills.py. Add unit tests for variant-key derivation, sparse layout, full/sparse coexistence, and per-call sparse paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use contextlib.suppress in resolve.py for tier_summary fallback (ruff SIM105). - Format new test files (ruff format pass). - Drop unused json import in test_git_cache_sparse.py; drop unused pytest import in test_bare_cache_sparse.py. - Harness _wipe_cache_root: re-grant 0o700 perms before rmtree so stale read-only entries from earlier APM runs don't fail the wipe. - Update test_git_cache_branch_coverage + test_git_cache_phase3w4 to pre-create checkout_dir at <sha>/full/ for the new layout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves apm install performance for subdirectory dependencies from large monorepos by adding sparse-cone checkout support to both consumer materialization paths (persistent GitCache and bare_cache.materialize_from_bare). It also introduces verbose-only perf diagnostics, updates cache checkout layout to include per-variant subdirectories, and adds unit tests plus a perf measurement harness.
Changes:
- Add
sparse_pathsplumbing and sparse-checkout setup to bothGitCachecheckout creation andmaterialize_from_bare. - Change on-disk checkout layout to
checkouts_v1/<shard>/<sha>/<variant>/to allow full and sparse variants to coexist. - Add verbose perf logging hooks and new unit tests covering sparse behavior and variant-key determinism.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_git_cache_phase3w4.py | Updates expected checkout path to include the full/ variant directory. |
| tests/unit/test_git_cache_branch_coverage.py | Same checkout layout expectation update for branch coverage tests. |
| tests/unit/deps/test_bare_cache_sparse.py | New tests validating sparse-cone behavior in materialize_from_bare. |
| tests/unit/cache/test_proxy_compat.py | Updates fixture checkout path to include full/. |
| tests/unit/cache/test_git_cache.py | Updates multiple tests for new .../<sha>/full/ layout. |
| tests/unit/cache/test_git_cache_sparse.py | New tests for sparse variants, layout, and _variant_key determinism. |
| src/apm_cli/install/phases/resolve.py | Wires an install logger into the downloader and emits resolver tier stats in verbose mode. |
| src/apm_cli/deps/github_downloader.py | Threads sparse_paths through persistent/shared-bare paths and adds verbose perf metrics helpers. |
| src/apm_cli/deps/bare_cache.py | Adds sparse_paths support to consumer materialization (sparse-checkout before checkout). |
| src/apm_cli/core/command_logger.py | Adds verbose-only perf logging methods for subdir downloads and resolver tier summaries. |
| src/apm_cli/cache/git_cache.py | Implements variant-keyed checkout directories and sparse-cone materialization support. |
| scripts/perf/measure_dotnet_skills.py | Adds a perf harness script to measure cold/warm install improvements and cache footprint. |
Copilot's findings
- Files reviewed: 12/12 changed files
- Comments generated: 6
| dep_name: str, | ||
| cache_state: str, | ||
| sha_short: str = "", | ||
| sparse_paths: list | None = None, |
There was a problem hiding this comment.
Addressed in 324e08c: tightened to list[str] | None.
| Names the dep, the bare-cache state (``cold`` / ``warm`` / | ||
| ``persistent``), the resolved SHA (short), and the sparse paths | ||
| being requested. Surfaces enough state to diagnose a perf | ||
| regression from one log line. |
There was a problem hiding this comment.
Addressed in 324e08c: expanded docstring to include shared-bare (chose to widen the contract rather than narrow the call sites since the new vocab is informative).
| def tier_summary(self, stats: dict): | ||
| """Log the tiered ref resolver hit counts (verbose only). | ||
|
|
||
| Emitted at the end of the resolve phase so the reader can see | ||
| how many ref->SHA lookups hit each tier (L0 per-run cache, | ||
| L1 commits API, L2 bare rev-parse, L3 legacy clone) without | ||
| wiring a debugger. A run dominated by L3 is the canonical | ||
| signal that ref-resolution is paying full clone cost. | ||
| """ | ||
| if not self.verbose or not stats: | ||
| return | ||
| non_zero = {k: v for k, v in stats.items() if v} | ||
| if not non_zero: | ||
| return | ||
| parts = " ".join(f"{k}={v}" for k, v in non_zero.items()) | ||
| _rich_echo(f" [perf] ref-resolver tiers: {parts}", color="dim") |
There was a problem hiding this comment.
Addressed in 324e08c: tightened to dict[str, int].
| ref = dep_ref.reference # None if not specified | ||
| subdir_path = dep_ref.virtual_path | ||
| _perf_logger = getattr(self, "install_logger", None) | ||
| _dep_display = str(dep_ref) |
There was a problem hiding this comment.
Rejected with reasoning. DependencyReference does have a custom __str__ (see src/apm_cli/models/dependency/reference.py:1593) that yields the canonical host/owner/repo/path#ref form. The actual verbose log output captured in the PR body shows this is what users see: [perf] subdir github.com/dotnet/skills/plugins/dotnet-diag/skills/analyzing-dotnet-performance#main cache=persistent. No raw dataclass repr leaks into the log. Leaving as is.
| the first 16 hex chars of sha256(json.dumps(sorted(paths))). | ||
| Two consumers requesting the same set of paths share a shard; | ||
| different sets get separate shards. We do NOT promote a full | ||
| checkout to also satisfy a sparse subset -- that complicates | ||
| eviction for negligible benefit (each sparse shard is ~subdir | ||
| size, so duplication cost is small). | ||
| """ | ||
| if not sparse_paths: | ||
| return "full" | ||
| payload = json.dumps(sorted(sparse_paths), separators=(",", ":")) |
There was a problem hiding this comment.
Addressed in 324e08c: now hashes sorted(set(sparse_paths)). Added an inline comment about the set semantics.
| def _dir_size_bytes(path: Path) -> int: | ||
| """Return the total on-disk size of a directory tree in bytes. | ||
|
|
||
| Best-effort: silently skips files that disappear or cannot be | ||
| stat-ed mid-walk (e.g. transient .git lock files). Used only for | ||
| verbose-mode perf diagnostics (#1433) -- never gates behavior. | ||
| """ | ||
| total = 0 | ||
| try: | ||
| for dirpath, _dirnames, filenames in os.walk(str(path)): | ||
| for fname in filenames: | ||
| fpath = os.path.join(dirpath, fname) | ||
| try: | ||
| total += os.path.getsize(fpath) | ||
| except OSError: | ||
| continue | ||
| except OSError: | ||
| return 0 | ||
| return total |
There was a problem hiding this comment.
Addressed in 324e08c: switched to os.lstat(fpath).st_size and noted in the docstring the alignment with cache.git_cache._dir_size.
Wire bare cache as a promisor (--filter=blob:none) for sparse callers, and resolve subdir-dep refs through the tiered resolver before calling GitCache.get_checkout (skips redundant ls-remote). Combined with the existing sparse-cone consumer materialization: | metric | main | this commit | delta | | ---------------------- | -------------- | --------------- | ----------- | | cold install wall time | 172-284s | 3.74-3.83s | ~45-75x | | bare cache on disk | ~1.7 GB | 4.6 MB | ~370x | | checkouts shard | 12.9 MB | 0.47 MB | ~27x | | apm_modules delivered | 42.5 KB | 42.5 KB | identical | Cache layer (GitCache, BareCache) is the single hook point, so apm install, update, and run all benefit transparently. New performance-expert APM persona under .apm/agents/ encodes the package-manager perf playbook for future review activations. Refs #1433. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- command_logger.subdir_download_start: tighten sparse_paths type to list[str] | None, expand docstring cache_state vocabulary to include shared-bare (matches call sites). - command_logger.tier_summary: tighten stats type to dict[str, int] to match TieredRefResolver.stats. - git_cache._variant_key: dedupe sparse_paths with set() before sort so [a,a] and [a] collapse to the same shard (the documented 'set of paths' semantics). - github_downloader._dir_size_bytes: use os.lstat instead of os.path.getsize so symlinks do not pull bytes from outside the checkout and match cache.git_cache._dir_size convention. Rejected with reasoning: the comment that DependencyReference is a dataclass without __str__ is incorrect -- it has a custom __str__ at models/dependency/reference.py:1593 that yields the canonical 'host/owner/repo/path#ref' form already visible in the verbose log output captured in the PR body. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the performance-expert persona to the advisory review panel as a
third conditional activation alongside auth-expert and doc-writer.
Activation: PR touches cache/, deps/, install/{phases,pipeline,resolve},
scripts/perf/, perf-instrumentation logs, or PR description claims a
performance win (speedup, latency, bytes-on-disk). Fallback self-check
covers hot-path changes the file-globs miss.
Roster invariant honored: frontmatter description, roster table,
conditional rules section, execution checklist fan-out list, and
panelist-return-schema enum all agree on the new persona slug.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 1 | 2 | Cache-layer sparse-cone is well-layered; variant-key and partial-bare taxonomy compose cleanly. One recommended extraction for duplicated sparse-checkout subprocess logic across two modules. |
| Cli Logging Expert | 0 | 0 | 5 | Verbose perf lines use unregistered prefix and inconsistent formatting; partial-clone fallback warning hidden from default users. |
| Devx Ux Expert | 0 | 0 | 4 | Install/update/run share the cache path; consumer payload byte-identical; verbose gating clean. Failure-mode wording could be more actionable for end users. |
| Supply Chain Security Expert | 0 | 0 | 3 | Wire protocol unchanged on non-partial path; auth env propagation consistent; sparse_paths validated upstream. Three nit-level defense-in-depth observations. |
| Oss Growth Hacker | 0 | 3 | 2 | Headline-worthy perf win (~45-75x cold install, ~370x smaller bare cache) but no CHANGELOG / release-note artifact ships with it; the deterministic byte ratio should lead all downstream framing. |
| Auth Expert | 0 | 0 | 1 | Auth env propagation through new partial-clone + promisor path is consistent. One nit on cache-hit re-fetch trust contract. |
| Doc Writer | 0 | 2 | 1 | Persona prose and panel roster are internally consistent and on-voice; two user-facing doc surfaces drift on this PR (CHANGELOG omits the ~45-75x install speedup; cache.md layout block predates the new variant subdir + __p partial-bare flavor). |
| Test Coverage Expert | 0 | 2 | 0 | Core sparse-cone determinism and coexistence well-tested (18/18 pass at unit tier with real git fixtures). Two recommended gaps: partial-clone fallback branch has no regression trap; tiered-SHA-before-get_checkout threading lacks an assertion that locked_sha + sparse_paths propagate to the cache. |
| Performance Expert | 0 | 4 | 2 | Three composable techniques (sparse-cone variant cache, partial-bare promisor, tiered SHA threading) are correctly layered at GitCache so every command benefits. Cache key is honestly variant-keyed (sorted+deduped+canonical-JSON sha256[:16]); per-variant locking avoids unnecessary serialization. Headline byte-count claims (~370x smaller bare, ~27x smaller checkouts shard, byte-identical payload) are deterministic; ~45-75x wall-time claim leans on N=1 harness. One cheap wire-cost win still on the table (--no-tags); one observability gap (install_logger not threaded into GitCache cold-clone path). |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 5 follow-ups
- [Oss Growth Hacker + Doc Writer] Add CHANGELOG.md [Unreleased] entry for the ~45-75x cold-install speedup closing [PERF] materialize_from_bare does full checkout for subdir deps -- enable sparse-cone (78 MB -> <1 MB, ~10x download) #1433 -- Convergent finding from two independent panelists. A headline-grade perf win without a changelog entry is invisible to users reading release notes and blocks the growth narrative.
- [Performance Expert] Add --no-tags to partial bare clone invocation -- One-liner with zero downside on APM's workload (consumer never needs tags). Avoids fetching tag refs + tagged objects that blob:none does not filter on large historical repos.
- [Performance Expert + Cli Logging Expert] Thread install_logger into GitCache cold-clone path and surface bare_clone_strategy on persistent-cache cold-miss -- The dominant code path (persistent cache cold miss) currently fires no [perf] line and the partial-clone fallback warning is invisible at default verbosity. Three panelists independently flagged this observability gap.
- [Test Coverage Expert] Add regression-trap test for filter-rejection fallback (partial -> full clone retry) -- The fallback branch has no automated assertion. While failure means slower-not-broken, the performance-expert independently noted __p name mis-classification after fallback -- a test would catch both issues.
- [Doc Writer] Update docs/reference/cli/cache.md layout block to document subdir and __p partial-bare flavor -- New cache layout artifacts are user-observable (visible in ~/.apm/cache/); the reference docs predate this PR's structural additions.
Architecture
classDiagram
direction LR
class GitCache {
<<Facade>>
+get_checkout(url, ref, locked_sha, env, sparse_paths) Path
-_resolve_sha(url, ref, locked_sha, env) str
-_ensure_bare_repo(url, shard_key, sha, env, partial) Path
-_create_checkout(url, shard_key, sha, env, sparse_paths, promisor_url) Path
-_bare_has_sha(bare_dir, sha, env) bool
-_fetch_into_bare_locked(bare_dir, url, sha, env) None
}
class BareCache {
<<Module>>
+materialize_from_bare(bare_path, consumer_dir, ref, env, known_sha, sparse_paths) str
}
class GitHubPackageDownloader {
<<Facade>>
+download_subdirectory_package(dep_ref, ...)
+download_full_repo_package(dep_ref, ...)
+install_logger InstallLogger
+persistent_git_cache GitCache
}
class TieredRefResolver {
<<Chain of Responsibility>>
+resolve(dep_ref) ResolvedRef
+stats dict
}
class InstallLogger {
+subdir_download_start(dep_name, cache_state, sha_short, sparse_paths)
+bare_clone_strategy(strategy, elapsed_ms)
+materialize_result(sparse_applied, consumer_size_bytes)
+tier_summary(stats)
}
class ResolvePhase {
+run(ctx) None
}
GitHubPackageDownloader *-- GitCache : persistent_git_cache
GitHubPackageDownloader *-- TieredRefResolver : _tiered_resolver
GitHubPackageDownloader ..> BareCache : _materialize_from_bare delegates
GitHubPackageDownloader o-- InstallLogger : install_logger
ResolvePhase ..> GitHubPackageDownloader : wires cache + logger + resolver
GitCache ..> BareCache : parallel path
class GitCache:::touched
class BareCache:::touched
class GitHubPackageDownloader:::touched
class InstallLogger:::touched
class ResolvePhase:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A[apm install] --> B[InstallPipeline.run]
B --> C[ResolvePhase.run]
C -->|attach| D[downloader.install_logger]
C -->|attach| E[downloader._tiered_resolver]
C --> F[download_subdirectory_package]
F --> G{persistent_git_cache?}
G -->|yes| H[resolve_git_reference: L0 to L3]
H --> I[GitCache.get_checkout sparse_paths locked_sha]
I --> J{checkout hit?}
J -->|hit| K[return cached Path]
J -->|miss| L[_ensure_bare_repo partial=True]
L --> M{server accepts filter?}
M -->|no| N[fallback: full bare clone]
M -->|yes| O[atomic_land db_v1 shard__p]
N --> O
O --> P[_create_checkout local shared no-checkout]
P --> Q[git config promisor + url]
Q --> R[sparse-checkout init cone + set]
R --> S[git checkout sha lazy blob fetch]
S --> T[atomic_land checkouts_v1 shard sha variant]
T --> K
K --> V[copytree to apm_modules]
Recommendation
Land this PR -- the architecture is clean, unit coverage is strong, and the perf win is real and deterministic at the byte level. Track the CHANGELOG entry, --no-tags one-liner, and observability threading as immediate follow-ups in #1433's wake; the harness hardening and cache.md drift can ride the next polish pass.
Full per-persona findings
Python Architect
-
[recommended] Duplicated sparse-checkout subprocess logic in git_cache.py and bare_cache.py should be extracted to a shared helper at
src/apm_cli/cache/git_cache.py:535
Identical 2-step subprocess pattern (sparse-checkout init --cone + set ) is copy-pasted in git_cache.py:_create_checkout (~535-557) and bare_cache.py:materialize_from_bare (~650-667). Two call sites with identical subprocess orchestration crosses the extraction threshold; a utils/git_sparse.py helper gives a single place to evolve sparse semantics.
Suggested: Extract apply_sparse_cone(git_exe, repo_dir, paths, env, timeout=30) -> None to src/apm_cli/utils/git_sparse.py. -
[nit] install_logger attached via duck-typed assignment rather than typed constructor param at
src/apm_cli/install/phases/resolve.py:134
downloader.install_logger = ctx.logger sets an undeclared field. Consistent with existing _tiered_resolver pattern so not blocking, but both fields would benefit from Optional[InstallLogger] typing for IDE/mypy discoverability. -
[nit] _dir_size_bytes duplicated between github_downloader.py and scripts/perf/measure_dotnet_skills.py at
src/apm_cli/deps/github_downloader.py:96
Acceptable today (scripts are standalone) but flag for consolidation into utils/file_ops.py if a third call site emerges.
Cli Logging Expert
-
[nit] [perf] prefix is not registered in STATUS_SYMBOLS at
src/apm_cli/core/command_logger.py:403
STATUS_SYMBOLS is the canonical registry. Ad-hoc [perf] bypasses _rich_echo symbol validation. Use [i] info with 'perf:' qualifier or register the key.
Suggested: Register STATUS_SYMBOLS['perf'] or rewrite as_rich_echo('[i] perf: ...', color='dim'). -
[nit] Partial-clone fallback warning only visible via Python logger (verbose), but user-actionable on cold subdir installs at
src/apm_cli/cache/git_cache.py:352
Fallback turns cold install from ~3s to 50-260s. At default verbosity user sees no explanation. _rich_warning at default level would name the cause and suggest checking server version.
Suggested: Add_rich_warning('Partial clone unsupported by server -- falling back to full clone (slower)')before retry. -
[nit] cache=persistent / cache=shared-bare is internal jargon at
src/apm_cli/core/command_logger.py:403
These describe implementation, not outcome. 'hit(persistent)' or 'miss(clone)' tells the reader whether work was avoided. -
[nit] size=0.47MB lacks space before unit; inconsistent with took=3.2 ms elsewhere at
src/apm_cli/core/command_logger.py:436
Project consistency favors space-before-unit. Pick one and apply uniformly. -
[nit] tier_summary call guarded by broad contextlib.suppress hides signature drift bugs at
src/apm_cli/install/phases/resolve.py:531
suppress(AttributeError, TypeError) swallows signature-drift errors silently. hasattr check is explicit and won't mask real bugs inside tier_summary.
Suggested: Replace contextlib.suppress withif hasattr(ctx.logger, 'tier_summary'):.
Devx Ux Expert
-
[nit] Sparse-checkout failures propagate as raw subprocess.CalledProcessError without a user-facing wrapper at
src/apm_cli/cache/git_cache.py
When sparse-checkout init or set fails, the user sees git's stderr without context. A wrapper like 'Sparse checkout failed for -- ensure the path exists in the repository at ' aligns with the failure-mode-is-the-product principle. -
[nit] Same raw-error issue in bare_cache.materialize_from_bare path at
src/apm_cli/deps/bare_cache.py
Mirror the wrapper there too; both paths share the failure-mode contract. -
[nit] Partial-clone fallback warning uses _log.warning (Python logger), invisible at default CLI verbosity at
src/apm_cli/cache/git_cache.py
Users on old servers hit this path and experience the slowdown with no CLI-level signal. (Overlaps with cli-logging-expert finding 2.) -
[nit] 'Failed to clone (partial fallback also failed)' message is operator-precise but raw for end users at
src/apm_cli/cache/git_cache.py
Suggest: 'Could not clone . Tried partial clone and full clone; both failed. Check network connectivity and repository access.'
Supply Chain Security Expert
-
[nit] Add explicit ASCII-printable/no-whitespace assertion on sparse_paths at the _create_checkout boundary at
src/apm_cli/cache/git_cache.py:550
validate_path_segments blocks .. at DependencyReference construction, and subprocess list-mode prevents shell injection. A defense-in-depth assertion on each sparse_paths element at the cache boundary would catch a future caller that bypasses construction (e.g. an adversarial lockfile entry with NUL bytes). -
[nit] Promisor URL persists in consumer .git/config; trust model around later git operations should be documented at
src/apm_cli/cache/git_cache.py:518
remote.origin.url is credential-free (correct), and auth comes from inherited subprocess env. But future APM operations on the cached checkout that DO NOT pass env would use ambient credentials. Document the trust boundary or add a .git/config annotation. -
[nit] _fetch_into_bare_locked detects partial-flavor by directory name suffix rather than git config probe at
src/apm_cli/cache/git_cache.py:639
endswith(_PARTIAL_BARE_SUFFIX) is fragile to future directory renames. Agit config --get extensions.partialCloneprobe would be more robust, though the current convention is documented.
Oss Growth Hacker
-
[recommended] No CHANGELOG entry for a headline-grade performance improvement at
CHANGELOG.md
A 45-75x cold install speedup and 370x smaller bare cache is the strongest conversion story APM has shipped in months -- directly addresses the 'gave up on cold install' cohort. Without an [Unreleased] entry, it will not surface in release notes or social copy. CHANGELOG is the bridge between merged code and external storytelling.
Suggested: Under [Unreleased] > ### Performance: 'Subdir dependencies from large monorepos (e.g. dotnet/skills) now install ~45-75x faster cold and use ~370x less cache disk via sparse-cone materialization + partial bare clones. (perf(#1433): sparse-cone consumer materialization for subdir deps #1436, closes [PERF] materialize_from_bare does full checkout for subdir deps -- enable sparse-cone (78 MB -> <1 MB, ~10x download) #1433)' -
[recommended] Surface the repostable one-liner ('42 KB from 1.7 GB in 3.8s') in release-bound copy
The headline test 'what can a reader repost?' has a perfect answer here that does not appear verbatim anywhere a non-contributor will see it. Include in CHANGELOG/release notes so every downstream artifact can quote it directly.
Suggested: Add to release copy: 'apm install now delivers 42 KB from a 1.7 GB monorepo in 3.8s cold (was 172-284s).' -
[recommended] Lead with the deterministic byte ratio (370x) over the jitter-bound wall-time range (45-75x)
The wide wall-time range invites 'I only got 20x' replies that erode trust. The byte ratio is unjitterable proof. Frame: '370x smaller cache footprint; 45-75x faster cold install depending on network'.
Suggested: Reorder headline to byte ratio first, time ratio second. -
[nit] Wider applicability beyond dotnet/skills not mentioned
Same fix benefits any subdir dep from a large repo (vscode extensions, anthropics/skills, community plugin aggregators). Naming 1-2 others expands perceived audience from '.NET devs' to 'anyone with monorepo deps'. -
[nit] No docs or blog hook filed
A short 'How APM handles large monorepo deps' docs section or blog post would compound the story beyond release notes. Not blocking.
Auth Expert
- [nit] Promisor URL in cached consumer .git/config is credential-free, but trust contract for hypothetical future re-fetch is not asserted by a test at
src/apm_cli/cache/git_cache.py:508
remote.origin.url is set to the bare upstream URL (no token), and auth comes from inherited subprocess env. Today the cache-hit path returns before any git operation runs, so no exposure exists. But a regression-trap test asserting the cached promisor URL never contains credentials would protect that invariant.
Doc Writer
-
[recommended] CHANGELOG.md [Unreleased] has no entry for the ~45-75x cold-install speedup (closes [PERF] materialize_from_bare does full checkout for subdir deps -- enable sparse-cone (78 MB -> <1 MB, ~10x download) #1433) at
CHANGELOG.md
Repo follows Keep a Changelog; [Unreleased] already records smaller items. A measured 172-284s -> 3.8s cold install on a real subdir dep is the largest user-visible install behavior change in the unreleased window and reduces bare-cache disk ~370x. Omitting means next release notes silently ship the win.
Suggested: Add under [Unreleased] > ### Changed (or ### Performance): 'Cold apm install for subdirectory git dependencies is ~45-75x faster on large monorepos. GitCache now uses partial bare clones (--filter=blob:none) with promisor remotes plus sparse-cone consumer materialization, and threads the tiered-resolver SHA through to skip a redundant ls-remote. Bare-cache disk usage drops ~370x on the validated workload (dotnet/skills). No lockfile schema or CLI surface change. (perf(#1433): sparse-cone consumer materialization for subdir deps #1436, closes [PERF] materialize_from_bare does full checkout for subdir deps -- enable sparse-cone (78 MB -> <1 MB, ~10x download) #1433)' -
[recommended] docs/reference/cli/cache.md cache-layout block is stale: new subdir and __p partial-bare flavor are user-observable but undocumented at
docs/src/content/docs/reference/cli/cache.md:117
Users runningls ~/Library/Caches/apm/git/after this PR see (a) __p/ sibling under db_v1/ and (b) new / level under checkouts_v1/// containing full or sparse-. PR body names both layouts as load-bearing for the win. Stale layout docs are a factual error users will hit when debugging cache size.
Suggested: Update layout block to show __p (partial bare) under db_v1/ and / (full | sparse-) under checkouts_v1///. Keep addition under ~10 lines net. -
[nit] performance-expert.agent.md voice, structure, ASCII discipline match existing reviewer-persona pattern; no rewrites needed at
.apm/agents/performance-expert.agent.md
Frontmatter, section order, lane-policing, ASCII-only constraint, and citation pattern all mirror auth-expert.agent.md and supply-chain-security-expert.agent.md. Optional polish: cite function name (verify_checkout_sha) rather thangit_cache.py:126line number to make the persona durable across refactors.
Test Coverage Expert
-
[recommended] Partial-clone fallback branch (filter rejection -> retry without filter) has no test at
tests/unit/cache/test_git_cache_sparse.py
git_cache.py:342-389 implements a fallback: when --filter=blob:none fails, retry without filter. User-observable degradation path (cold install jumps from ~3s to baseline). grep on tests/ for 'partial.*fallback', 'filter.*reject', '_ensure_bare_repo.*partial=True' returns no match. Future refactor breaking the retry would manifest as hard crash instead of graceful degradation.
Suggested: Add TestPartialBareFlavor::test_partial_clone_fallback_to_full_on_server_rejection: patch subprocess.run side_effect to raise CalledProcessError on first clone, assert retry without --filter succeeds, bare_dir populated.
Proof (missing at):tests/unit/cache/test_git_cache_sparse.py::TestPartialBareFlavor::test_partial_clone_fallback_to_full_on_server_rejection-- proves: When a git server rejects --filter=blob:none, apm install still completes (degraded to full bare) instead of crashing.
assert (bare_root / partial_shard).is_dir() and not partial_clone_used_on_retry -
[recommended] Tiered SHA threading into persistent cache (locked_sha + sparse_paths kwargs) not asserted at the call site at
tests/integration/test_github_downloader_phase3w4.py:404
Diff adds resolve_git_reference() BEFORE get_checkout() and passes result as locked_sha + sparse_paths=[subdir_path]. Existing test_persistent_cache_hit_used uses MagicMock and calls .assert_called() but does NOT inspect .call_args. If a future refactor drops locked_sha, cache falls back to internal ls-remote (one extra RTT per dep) with no test failing.
Suggested: Extend test_persistent_cache_hit_used: assert call_kwargs.get('locked_sha') is not None and call_kwargs.get('sparse_paths') == ['packages/my-pkg'].
Proof (missing at):tests/integration/test_github_downloader_phase3w4.py::TestDownloadSubdirectoryPersistentCache::test_persistent_cache_receives_locked_sha_and_sparse_paths-- proves: Tiered SHA resolution feeds the resolved commit into the cache so it skips redundant ls-remote on every subdir dep install.
assert call_kwargs['locked_sha'] is not None; assert call_kwargs['sparse_paths'] == ['packages/my-pkg']
Performance Expert
-
[recommended] Perf harness records N=1 cold + N=1 warm samples; cannot sustain the 172-284s / 3.74-3.83s ranges from a single invocation at
scripts/perf/measure_dotnet_skills.py:171
scripts/perf/measure_dotnet_skills.py:160-180 invokes _measure exactly once per phase. The cited 1.7x range on cold and tighter range post-fix must come from manual repeated runs the harness does not enforce. At N=1 the wall-time delta is dominated by network jitter. Deterministic byte counts already carry the headline; wall-time deserves the same rigor.
Suggested: Run N>=3 cold + N>=3 warm samples; report median + min + max + stdev. Add round-trip count via GIT_TRACE_PACKET=1 grep and bytes-on-wire via GIT_TRACE_CURL=1 -- both lower-noise than wall time. -
[recommended] Verbose [perf] observability gap: bare_clone_strategy never fires on the persistent-cache path (the dominant code path) at
src/apm_cli/cache/git_cache.py:268
github_downloader.py:1170-1184 (persistent-cache hit path) emits only subdir_download_start + materialize_result; the shared-bare path at :1225-1238 emits bare_clone_strategy. When persistent cache is active and a cold miss occurs, GitCache._ensure_bare_repo(partial=True) performs the most expensive op in the pipeline (network clone with --filter=blob:none) and emits ZERO verbose output. User sees cache=persistent identically for a 4ms warm hit and a 3.5s cold partial clone. This is exactly the dimension tier_summary is meant to provide.
Suggested: Thread install_logger into GitCache.get_checkout (or pass a perf-callback) so _ensure_bare_repo emits 'bare clone strategy=--filter=blob:none took=Xms' on cold miss, and subdir_download_start distinguishes persistent-warm vs persistent-cold. -
[recommended] Partial bare clone is missing --no-tags; tag refs + tagged objects are not filtered by blob:none on large historical repos at
src/apm_cli/cache/git_cache.py:320
git_cache.py:320-327 builds clone_args without --no-tags. --filter=blob:none filters BLOBS but git still downloads every tag tip + reachable commit+tree from refs/tags/*. On monorepos with many tagged releases (dotnet/skills harness target), tag objects can be a meaningful fraction of partial-clone wire cost. APM never uses tag refs after the SHA is pinned (lockfile holds resolved SHA; TieredRefResolver resolves via commits API). Zero behavioral downside.
Suggested: Add --no-tags to clone_args and to the partial-flavor fetch_args at ~:635. One-liner; additional wire savings most pronounced on the exact long-tail monorepos this PR exists to optimize. -
[recommended] Partial-bare flavor keeps __p name after server-rejection fallback turns it into a full clone; future eviction logic will mis-classify at
src/apm_cli/cache/git_cache.py:340
git_cache.py:340-388: when --filter=blob:none is rejected, code retries with plain --bare but keeps bare_dir at __p. _fetch_into_bare_locked at ~:638 detects partial-ness purely by the __p suffix. Subsequent fetches against a de-facto-full bare keep passing --filter=blob:none. Listed PR follow-up (eviction policy) should also detect this case (a __p bare whose object DB exceeds a sanity threshold relative to its commits+trees). -
[nit] Three sequential git config subprocesses for promisor setup; fork+exec overhead is trivially avoidable at
src/apm_cli/cache/git_cache.py:517
git_cache.py:517-528 issues three separate subprocess.run(...) for remote.origin.url, .promisor, .partialclonefilter. Each is fork+exec (~10-30ms macOS, more on Windows). Cheaper shapes: append a single [remote 'origin'] block to .git/config atomically. ~1-3% of cold subdir wall time. -
[nit] materialize_result.consumer_size_bytes includes .git/ overhead; metric name implies delivered payload at
src/apm_cli/deps/github_downloader.py:1184
github_downloader.py:1184 reports _dir_size_bytes(_persistent_checkout). The persistent_checkout is the full shard (working tree + .git/, including partial pack index). User sees 'size=0.47MB' attributed to materialization but a meaningful fraction is git metadata. For sparse_applied=True the proper signal is size of cone excluding .git/, or rename to checkout_dir_size_mb.
This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.
Address apm-review-panel recommended_followups on PR #1436 in three disciplined buckets so the panel's substantive feedback lands in the same PR rather than as orphan follow-ups. Python architect -- extract apply_sparse_cone() helper: - New src/apm_cli/utils/git_sparse.py: single helper for 'sparse-checkout init --cone' + 'sparse-checkout set <paths>'. - Both call sites (cache/git_cache.py persistent path; deps/bare_cache.py shared-bare path) now delegate. Future sparse-checkout evolution (timeouts, --no-sparse-index) lands in one place; the R0801 pylint guard stays clean and won't drift on the next near-duplicate addition. CLI logging UX -- address all five logging nits: - Replace literal '[perf]' prefix (not in STATUS_SYMBOLS) with '[i] perf:' so the four new verbose lines use the existing info glyph instead of inventing prefix vocabulary. Keeps the perf: semantic grouping cue without growing the symbol registry. - Add missing space in size literal: 'size=0.47MB' -> 'size=0.47 MB'. - Partial-clone fallback warning now uses _rich_warning(...) instead of the silent _log.warning(...) channel, so the user sees the cause of any pre-2.20 GHE / old Gerrit slowdown at default verbosity. - Replace contextlib.suppress(AttributeError, TypeError) on the tier_summary call with an explicit hasattr() check (drops the contextlib import). Intent is now legible to the next reader. - Relabel persistent-cache state as 'persistent-hit' so the hit semantics are explicit at the call site. Test coverage -- add regression-trap tests: - New test_partial_clone_fallback_to_full_on_server_rejection: patches subprocess.run to reject --filter=blob:none on the first call and asserts the retry without --filter succeeds, defending the documented fallback (CHANGELOG line) against silent regression. - Extend test_persistent_cache_hit_used with call_args.kwargs assertions on sparse_paths and locked_sha so the variant-keyed shard plumbing from #1433 is locked down at the integration boundary. Doc drift -- add CHANGELOG [Unreleased] > Performance entry summarizing the speedup, the layered techniques (partial bare + promisor + sparse- cone + threaded SHA), and the documented filter fallback. Lint chain silent: ruff check, ruff format --check, pylint R0801 (no new duplication), lint-auth-signals.sh. Targeted pytest: tests/unit/cache + tests/unit/deps + tests/unit/install + the touched integration file all pass (44 + 1109 + 1388 collected). Refs #1436 #1433 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two remaining recommendations from the apm-review-panel folded into this PR so the changeset matches the panel's documented expectations. --no-tags on partial bare clone (cache/git_cache.py): Skip fetching tag objects on the perf-fixed path. Release tags can sum to MBs on monorepos (e.g. dotnet/skills) and the cache resolves via SHAs, never via tag refs. Applied to both the initial partial clone and the no-filter fallback retry so the saving is preserved when an old server rejects --filter=blob:none. Scoped to git_cache.py only: bare_cache.py's --branch <ref> path can legitimately receive a tag pin and is left untouched. docs/reference/cli/cache.md layout drift: Document the variant-keyed checkout layout (full/, sparse-<hash>/) and the partial bare suffix (<shard>__p) so the page no longer lies about a pre-#1433 directory shape. Explains why the variant suffix exists (two consumers of the same SHA can want different subdirs). Lint silent (ruff check + format check). Targeted pytest: 44 passes on tests/unit/cache + the touched integration file. Refs #1436 #1433 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o CI integration gate Add two new scaling guards for the sparse-cone materialisation code paths introduced in #1436: - TestVariantKeyScaling: guards _variant_key() stays O(n log n) in sparse path count (threshold < 15, measured ~9x), with canonicalisation correctness assertions (order-insensitive, duplicate-insensitive, deterministic). - TestVariantLookupScaling: guards that checkout variant resolution (is_dir check) stays O(1) regardless of sibling variant count (threshold < 5, measured ~1x). Wire scaling guards into ci-integration.yml as a dedicated job that runs in parallel with the build (no binary, secrets, or network needed). The fan-in 'Integration Tests (Linux)' gate now requires both integration shards and scaling guards to pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TL;DR
Cold install of
apm installagainst a subdir dep from a large monorepo (github.com/dotnet/skills, 1.7 GB upstream) now completes in ~3.8s instead of the original 172-284s observed onmain-- a ~45-75x cold-install speedup, validated end-to-end against the live repo by the harness inscripts/perf/measure_dotnet_skills.py.Closes #1433.
What changed (architecture)
The fix is layered at the persistent cache, so every command that resolves git deps (
apm install,apm update,apm run, and any future command) benefits automatically. No command-specific code paths.Three composable techniques, all at the cache layer
Sparse-cone consumer materialization (
GitCache._create_checkout,bare_cache.materialize_from_bare). Whensparse_pathsis set, only the requested top-level directories are written to the consumer's working tree. Shard layout:checkouts_v1/<shard>/<sha>/<variant>/where<variant>isfullorsparse-<hash16>. Full and sparse variants of the same SHA coexist.Partial bare clone (
--filter=blob:none) + promisor consumer (GitCache._ensure_bare_repo,_create_checkout). The bare cache for sparse callers lives atdb_v1/<shard>__p/and is cloned with--filter=blob:none, so it carries commits + trees only (~3% of the upstream size on dotnet/skills). The consumer then has itsremote.originconfigured as a promisor pointing at the real upstream URL withpartialclonefilter=blob:none. Whengit checkoutruns against the sparse cone, only the blobs reachable from the cone are lazy-fetched. Falls back transparently to a full bare clone if the server rejects filter v2.Tiered ref resolution threaded into the cache (
github_downloader.download_subdirectory_package). Subdir deps now resolve the SHA via the attachedTieredRefResolver(L0 in-memory → L1 forge API → L2 bare rev-parse → L3 ls-remote) BEFORE callingGitCache.get_checkout, and passlocked_sha=<resolved>. The cache's internalls-remoteis skipped on every call where a tier hit succeeds. Saves 1 network round-trip per subdir dep per call.Validation (live, against github.com/dotnet/skills)
Captured by
scripts/perf/measure_dotnet_skills.py-- self-contained harness that wipes ALL APM caches, installs once cold, retains caches, installs again warm, reports a comparison table.apm_modules/.../<dep>/The cold/warm wall-time numbers carry network-jitter noise (sample size 2-3 per phase). The deterministic byte counts are the headline -- bare ~370x smaller, checkouts shard ~27x smaller, consumer payload byte-identical. The cold wall-time win is real and reproducible across runs because the new floor is local-FS + small network rather than 1.7 GB-wire-bound.
Verbose-mode log output (cold run)
All
[perf]lines are verbose-only and routed throughInstallLogger; default install output is unchanged.Why this was slow before
The pre-fix cold-install path for a subdir dep:
main-> SHA (~0.5s, fine).GitCache._ensure_bare_repodoesgit clone --bare <url>-- 1.7 GB transferred over the wire even though the consumer wanted a 42 KB subdirectory. Dominant cost: 50-260s depending on network.GitCache._create_checkoutdoesgit clone --local --shared <bare> <consumer>andgit checkout <sha>-- materializes the full 78 MB working tree for a consumer that needs 42 KB.download_subdirectory_packagecopytrees only the requested subdir out of the 78 MB tree intoapm_modules/.Two pathologies:
How the techniques close each gap (mapped to the cost model)
A package manager's cold wall-time decomposes into four phases. Each technique attacks the dominant phase for this workload.
ls-remoteignoring the already-resolved tiered SHAThe fetch phase was >80% of cold wall-time. Cutting it from 1.7 GB to ~5 MB is the headline win and explains the ~45-75x ratio.
Architectural consistency / pervasiveness
classDiagram class apm_install class apm_update class apm_run_resolve class InstallPipeline { +run() } class ResolvePhase { +run() } class GitHubPackageDownloader { +download_subdirectory_package() +download_full_repo_package() -resolve_git_reference() } class TieredRefResolver { +resolve() -L0 PerRunRefCache -L1 commits API -L2 bare rev-parse -L3 ls-remote } class GitCache { +get_checkout(url, ref, locked_sha, sparse_paths) -_ensure_bare_repo(partial) -_create_checkout(sparse_paths, promisor_url) -_variant_key() } class BareCache { +materialize_from_bare(sparse_paths) } apm_install --> InstallPipeline apm_update --> InstallPipeline apm_run_resolve --> InstallPipeline InstallPipeline --> ResolvePhase ResolvePhase --> GitHubPackageDownloader : attaches install_logger + tiered_resolver GitHubPackageDownloader --> TieredRefResolver : resolve_git_reference() GitHubPackageDownloader --> GitCache : get_checkout(locked_sha, sparse_paths) GitHubPackageDownloader --> BareCache : materialize_from_bare(sparse_paths) [shared-bare path] GitCache --> BareCache : independent persistent cacheThe three techniques land at
GitCache(techniques 1 + 2) andTieredRefResolver(technique 3 -- usage threading). Every command that resolves a dep --install,update,run(when it triggers resolve), and any future command -- shares this exact code path. There is NO command-specific perf logic. Attestation: I verified this by:installcallsInstallPipeline.run()->ResolvePhase->GitHubPackageDownloader.download_subdirectory_package()->GitCache.get_checkout(sparse_paths=[subdir]).updatereuses the sameInstallPipeline(seesrc/apm_cli/cli.pyupdatecommand ->install_workflow).runtriggers resolve through the same pipeline when deps are stale.All three commands benefit from the partial bare + tiered SHA + sparse cone without any per-command code change. The cache is shared across projects and across users on the same machine (per-user
~/Library/Caches/apm/git/or~/.cache/apm/git/).Implementation shape: A (cache-layer hook)
I evaluated three shapes:
GitCache.get_checkoutandmaterialize_from_bare. Smallest diff, lowest blast radius. Picked.A wins because: (1) every existing caller still works with
sparse_paths=None; (2) the cache key variant taxonomy honestly encodes the new dimension (sparse_paths and partial-bare flavor are both first-class shard keys, not hidden state); (3) the bare-shared invariant is preserved per flavor -- sparse consumers share the__pbare; full consumers share the legacy bare; bare URL is never keyed by (url, subdir).Trade-offs
--filter=blob:none(old Gerrit, pre-2.20 GHE),_ensure_bare_reporetries once without the filter. The resulting bare is full; future sparse callers find all blobs locally and skip lazy fetch -- behavior degrades to today's baseline, no behavior change for the user. Logged at WARNING with a sanitized URL for operator visibility.git checkout. On a slow network this adds 0.5-2s. The harness confirms total cold wall-time is still 3.8s on a typical link, so the lazy fetch is bounded by the size of the sparse cone, not the full repo.apm_modulesdelivery. Today:copytreefrom the cached checkout intoapm_modules/. At 42 KB this is ~5ms; not worth the cross-volume / Windows compatibility cost in this PR. Filed as follow-up.What stayed off-limits
--filter=blob:noneon bare clone (clean, vanillagit clone --baresemantics)._scrub_bare_remote_urlis untouched. The partial-bare path ingit_cache.pyuses a canonical no-token URL throughout (the call site atgithub_downloader.py:1122buildshttps://<host>/<owner>/<repo>without credentials; auth lives in env). Bare retains a clean URL for promisor use.apm.lock.yamlwriters and readers are unchanged._try_sparse_checkout(dead-code reference function ingithub_downloader.py) kept as-is per issue [PERF] materialize_from_bare does full checkout for subdir deps -- enable sparse-cone (78 MB -> <1 MB, ~10x download) #1433 spec.How to test
Follow-ups (separate issues)
apm_modulesdelivery.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com