Dedup target list in ProjectGraph.ExpandDefaultTargets to prevent graph explosion#13855
Conversation
…ph explosion ProjectGraph.ExpandDefaultTargets now unconditionally dedupes its output via a hybrid fast/slow path. The downstream BFS cross-products entry-targets against matching PRT items, so any N-duplicate entry list at one hop becomes ~N^2 propagations at the next; over BFS depth D this is N^D edges. Duplicates arise from PRT marker double-emission and from explicit literal duplicates in Targets metadata. ExpandDefaultTargets has a zero-allocation fast path (inline O(n^2) scan for n <= 8) returning the input unchanged when no marker or duplicate is found, and a HashSet-backed slow path otherwise. Dedup is OrdinalIgnoreCase, first-occurrence wins, matching the existing post-BFS dedup at the public GetTargetLists boundary -- so no consumer-observable behavior changes and no ChangeWave is needed. BFS hot path moved off ImmutableList<string>/ImmutableList<TargetSpecification>: ProjectGraphBuildRequest.RequestedTargets, ExpandDefaultTargets, TargetsToPropagate.FromProjectAndEntryTargets, and GetApplicableTargetsForReference all flow string[] end-to-end. TargetsToPropagate collapses two ImmutableLists to one flat TargetSpecification[] + outer-build count. LINQ removed from the per-edge loop in FromProjectAndEntryTargets. ImmutableList<string> is retained only at the public GetTargetLists boundary and in targetLists[node] where the AddRange chain actually derives each version from the prior. 11 new tests (22 with TFMs) cover dedup behavior plus end-to-end GetTargetLists smoke at depth 12 with the duplicate-marker shape and depth 6 with the common single-marker shape. End-to-end GetTargetLists(["Build"]) benchmark on realistic .NET-shape graphs: 19-51% faster and 24-31% lower allocation across small/medium/large trees and linear chains; the pathological duplicate-marker shape is 187x faster and 172x lower allocation at 50 nodes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Wrap unit tests in TestEnvironment.Create(_output) to match the in-file end-to-end test pattern and the sibling ProjectGraph_Tests convention, ensuring evaluation state from in-memory Project instances doesn't leak across tests via the global ProjectCollection (Copilot bot suggestion). - Defer the empty-list allocation in FromProjectAndEntryTargets until the first target is actually appended, avoiding an empty List<TargetSpecification> allocation when a matched PRT item has empty Targets metadata (Copilot bot suggestion). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
|
❌ Expert Code Review (command) failed. Please review the logs for details. |
|
/review |
|
❌ Expert Code Review (command) failed. Please review the logs for details. |
There was a problem hiding this comment.
Test Coverage — Issues Found
Four concrete gaps; one non-issue clarification.
ISSUE 1 — InlineScanThreshold=8 boundary and slow-path same-reference are untested
Q1 + Q2 together.
NoMarkerNoDuplicates_ReturnsSameInstance uses a 3-element input (well inside the fast path). Two adjacent cases are dark:
| Scenario | Count | Expected |
|---|---|---|
| 8 elements, no markers, no dups | ≤ threshold → fast path | returns same reference, zero alloc |
| 9 elements, no markers, no dups | > threshold → slow path | returns same reference, but allocates a HashSet |
| 9 elements, duplicate at [0] and [8] | > threshold → slow path dedup | returns distinct array |
Concrete failing scenario for Q1: An 8-element input ["A","B","C","D","E","F","G","H"] with a duplicate at index 7 ("A" again) hits the tail of the fast-path O(n2) scan and must correctly fall through to ExpandDefaultTargetsSlow. No test verifies this.
Concrete failing scenario for Q2: A 9-element input ["A","B","C","D","E","F","G","H","I"] (no markers, no dups) goes through the slow path. The lazy-buffer (result) stays null so result?.ToArray() ?? targets returns targets unchanged — same-reference. But a HashSet is silently allocated. The comment in the code explicitly calls this out ("if the input turns out to be marker-and-dup-free... we never allocate the result List<string>, only the HashSet") yet there is no test verifying (a) same reference is returned and (b) the HashSet allocation is acceptable and not regressed.
ISSUE 2 — ProjectReferenceTargetsOrDefaultTargetsMarker + empty Targets metadata (fall-through to defaultTargets) is untested
Q3.
The slow path has two branches for ProjectReferenceTargetsOrDefaultTargetsMarker:
string targetsString = graphEdge.GetMetadataValue(ItemMetadataNames.ProjectReferenceTargetsMetadataName);
if (string.IsNullOrEmpty(targetsString))
{
// fall through to defaultTargets
foreach (string defaultTarget in defaultTargets) ...
}
else
{
foreach (string expandedTarget in ExpressionShredder.SplitSemiColonSeparatedList(targetsString)) ...
}Dedupes_PRTOrDefaultMarker_WhenTargetsMetadataDuplicatesExpansion always passes "A;B;A" metadata, so it only covers the else branch. The if branch (empty Targets metadata → expand to defaultTargets) is unexercised.
Concrete failing scenario: CreateEdge() (no Targets metadata) + input = [ProjectReferenceTargetsOrDefaultTargetsMarker] + defaultTargets = ["A","B"]. The method should return ["A","B"] by falling through to defaultTargets. No test covers this, meaning a regression in the if branch would go undetected.
ISSUE 3 — No test mixes DefaultTargetsMarker and ProjectReferenceTargetsOrDefaultTargetsMarker in the same input
Q4.
Each marker type is tested in isolation. The dedup logic handles both markers in the same loop body, and their outputs must be jointly deduped. A cross-marker duplicate (e.g., DefaultTargetsMarker expands to ["Build"] and ProjectReferenceTargetsOrDefaultTargetsMarker also expands to ["Build"]) exercises the seen.Add interaction between both branches.
Concrete failing scenario:
input = [DefaultTargetsMarker, ProjectReferenceTargetsOrDefaultTargetsMarker]
defaultTargets = ["Build"]
edge Targets = "Build;Publish"
Expected: ["Build","Publish"] — the "Build" from DefaultTargetsMarker is added first; "Build" from the PRT marker is suppressed by seen; "Publish" is added. Not tested.
ISSUE 4 — Refactored GetApplicableTargetsForReference with SkipNonexistentTargets=true is not covered by the new test file
Q5.
GetApplicableTargetsForReference was materially rewritten from LINQ (Where + Select + ToImmutableList) to a manual pre-allocated array with writeIndex and Array.Resize. The new code has three distinct result shapes:
writeIndex == end→ return full pre-sized array (no resize)writeIndex == 0→ return[]0 < writeIndex < end→Array.Resize
The SkipNonexistentTargets path exercises the writeIndex < end case. The new test file adds no tests for any of these shapes. Existing coverage in other test files (e.g., ProjectGraph_Tests) may still catch regressions, but the PR touches this code path and does not add targeted regression tests for it.
Non-issue — _output field is used (Q7)
_output is passed to TestEnvironment.Create(_output) in every test method. It is not unused.
Informational — No explicit timing assertion (Q6)
GetTargetLists_DuplicateMarkerPRT_StaysBoundedAcrossChain at depth=12 is an implicit performance regression test: without the fix, the BFS would not finish in any reasonable wall-clock time. However it contains no Stopwatch assertion. This is acceptable for a correctness test suite (explicit timing tests are flaky in CI), but it means a future regression to O(N^depth) would manifest only as a timeout rather than a clean assertion failure. Low severity.
Generated by Expert Code Review (command) for issue #13855 · sonnet46 9.6M
Comment /review to run again
Context
Microsoft.Build.Graph.ProjectGraphbuilds the static graph by walking project references breadth-first. For each edge it expandsProjectReferenceTargets(PRT) items to decide which targets to propagate, including two markers:.default→ the referenced project'sDefaultTargets.projectReferenceTargetsOrDefaultTargets→ the entry-point PRT value if set, else.defaultIf marker expansion produces a target that also appears literally in the same
Targetsmetadata — or if the materializedTargetsmetadata itself contains duplicates — the per-edge target list grows. The downstreamProjectInterpretation.TargetsToPropagate.FromProjectAndEntryTargetscross-products each entry-target against every matching PRT, so an N-duplicate entry list at one hop produces ~N² propagations at the next. Across BFS depth D this becomes ~N^D, burning gigabytes and minutes on graphs of only a few dozen nodes.The reproducer that surfaced this in a large internal codebase was two SDK targets files both prepending to the same property and both emitting
<ProjectReferenceTargets Include="Build" Targets="$(...)"/>. The second emitter snapshotted a property value that already contained the marker, so the materializedTargetshad the marker more than once and the explosion took off from hop 1. The authoring-side double-emission could also be fixed where it originates, but this PR is the engine-side guard rail so any future recurrence (or any literalTargets="Build;Build"-style authoring quirk) can't take the graph down.Changes Made
The fix (
src/Build/Graph/ProjectGraph.cs)ExpandDefaultTargetsnow dedupes its output unconditionally, with a hybrid fast/slow shape:ExpandDefaultTargetsSlow): oneHashSet<string>sized to count plus a lazily-allocatedList<string>buffer. Single pass, expands markers in place, dedupes via the set. Used for n > 8 or when the fast scan flagged anything.Dedup is
OrdinalIgnoreCase, first-occurrence wins.Why this is behavior-preserving (no ChangeWave)
GetTargetListsalready collapses each per-node final target list toOrdinalIgnoreCase-unique entries viaImmutableHashSet+ImmutableList.AddRange, andExpandDefaultTargetsis only called fromGetTargetLists. Adding inner dedup only changes BFS internal state (encounteredEdgesset membership, per-edgerequestedTargetslist size) — never the public return value. No new public API, no new warnings or errors, no consumer-observable change.Supporting refactor (justification below)
The dedup fix on its own is small. The PR also takes the BFS hot path off
ImmutableList<string>, which is the dominant cost once the explosion is gone:ProjectGraphBuildRequest.RequestedTargets:ImmutableList<string>→string[].Equals/GetHashCodeuse.Length+ indexer, no virtual dispatch or AVL traversal.ProjectGraph.cs:string[]cascade replacesImmutableList<string>/IReadOnlyList<string>.ImmutableList<string>is kept at the publicGetTargetListsboundary and intargetLists[node], whereAddRangeactually derives each version from the prior.ProjectInterpretation.TargetsToPropagate: signature widened tostring[];_outerBuildTargets+_allTargets(twoImmutableList<TargetSpecification>) collapsed to a singleTargetSpecification[] _allTargets+int _outerBuildTargetCount— one allocation/copy per source list instead of three+two.Where(...).Select(...).ToArray()+AddRange→ directforeachover theSemiColonTokenizerstruct fromExpressionShredder.SplitSemiColonSeparatedList, appending via areflocal. Drops theWhereSelectArrayIteratorstate machine, an intermediateTargetSpecification[], and tokenizer boxing.GetApplicableTargetsForReferencereturnsstring[]directly, sized for the no-skip common case andArray.Resized only when something is skipped. Drops LINQ state machine +LargeArrayBuilderdoubling copies.Why drop
ImmutableList<T>here at all: the original draft usedImmutableList<string>.Builderfor the dedup buffer. In review it became clear that AVL-treeImmutableList<T>is materially more expensive per element thanList<T>/string[], and the structural-sharing benefit doesn't apply on this path — every per-edgeapplicableTargets/expandedTargetsis built fresh from rawProjectReferenceTargetsitems, never as an.Add/.Removederivative of a common ancestor, so the trees share zero internal nodes.Testing
New file:
src/Build.UnitTests/Graph/ProjectGraph_ExpandDefaultTargetsDedup_Tests.cs— 11 tests, covering:.projectReferenceTargetsOrDefaultTargetswith literal duplicates is deduped.Build;Build;Build) are deduped.DefaultTargets.GetTargetListssmoke at depth 12 with the duplicate-marker shape — confirms result size stays bounded.GetTargetListssanity at depth 6 for the common single-marker case.All 11 pass on
net10.0andnet48(22/22). The rest ofMicrosoft.Build.Graph.UnitTestsis unchanged by this PR; the 6 pre-existing failures (3 tests × 2 TFMs) are all[ActiveIssue("https://github.com/dotnet/msbuild/issues/4368")].Performance evidence (why the refactor scope is justified)
End-to-end
ProjectGraph.GetTargetLists(["Build"])via BenchmarkDotNet on .NET 10.0.8, X64 RyuJIT AVX2,--job short. The graph is built once in[GlobalSetup]so the measured op is purely the BFS hot path. Each project carries<ProjectReferenceTargets Include="Build" Targets=".projectReferenceTargetsOrDefaultTargets;GetNativeManifest;_GetCopyToOutputDirectoryItemsFromThisProject"/>to mirror the realistic shape produced byMicrosoft.Common.CurrentVersion.targets. V1 = upstreammainat the PR base; the same benchmark DLL is rebuilt against each engine.Every realistic shape is 19–51% faster and 24–31% lower allocation. The pathological shape — which is what would otherwise OOM/hang on a large graph — is 187× faster and 172× lower allocation at just 50 nodes; because the growth is geometric in BFS depth, the gap widens rapidly past that.
Benchmark source is kept out of this PR to keep the diff focused; happy to upstream it separately if maintainers want it in
ref/ordocumentation/.Notes
ChangeWaves.mdentry — there is no observable behavior change at theGetTargetListsboundary, only reduced internal work.Strings.resxchanges.ProjectGraph.cs+137/-33,ProjectInterpretation.cs+78/-33, new test file +268/-0.Targetsmetadata — is now bounded.