Dedup target list in ProjectGraph.ExpandDefaultTargets to prevent graph explosion by dfederm · Pull Request #13855 · dotnet/msbuild

dfederm · 2026-05-23T00:18:40Z

Context

Microsoft.Build.Graph.ProjectGraph builds the static graph by walking project references breadth-first. For each edge it expands ProjectReferenceTargets (PRT) items to decide which targets to propagate; their Targets metadata may contain two markers:

.default → the referenced project's DefaultTargets
.projectReferenceTargetsOrDefaultTargets → the entry-point PRT value if set, else .default

If marker expansion produces a target that also appears literally in the same Targets metadata, or if the materialized Targets metadata itself contains duplicates, the per-edge target list ends up with duplicate entries. The downstream ProjectInterpretation.TargetsToPropagate.FromProjectAndEntryTargets cross-products each entry-target against every matching PRT, so an N-duplicate entry list at one hop produces ~N² propagations at the next. Across BFS depth D this becomes ~N^D, burning gigabytes and minutes on graphs of only a few dozen nodes.
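To make the ~N^D claim concrete, here is a toy model in C#; it is not MSBuild code. It assumes, as a simplification, that every entry target matches the same PRT item (Include="Build") whose Targets expand to "Build;Build" (the duplicated-marker shape), and it counts the per-edge request list at each hop with and without dedup. `GrowthModel`, `Propagate` and `LengthsPerHop` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of per-hop growth; not MSBuild code. Simplification: every entry
// target matches the same PRT item, so each entry contributes one full copy
// of that item's expanded Targets list.
static class GrowthModel
{
    // Cross product of entries x PRT targets (hypothetical stand-in for
    // TargetsToPropagate.FromProjectAndEntryTargets).
    public static List<string> Propagate(IReadOnlyList<string> entryTargets, IReadOnlyList<string> prtTargets)
    {
        var next = new List<string>(entryTargets.Count * prtTargets.Count);
        for (int i = 0; i < entryTargets.Count; i++)
        {
            next.AddRange(prtTargets);
        }
        return next;
    }

    // Length of the requested-target list after each hop, starting from ["Build"].
    public static int[] LengthsPerHop(IReadOnlyList<string> prtTargets, int depth, bool dedup)
    {
        var lengths = new int[depth];
        IReadOnlyList<string> current = new[] { "Build" };
        for (int hop = 0; hop < depth; hop++)
        {
            List<string> next = Propagate(current, prtTargets);
            // With dedup, case-insensitive duplicates are removed before the next hop.
            current = dedup ? next.Distinct(StringComparer.OrdinalIgnoreCase).ToList() : next;
            lengths[hop] = current.Count;
        }
        return lengths;
    }
}
```

With N = 2 duplicates the undeduped list doubles every hop (2, 4, 8, 16 over four hops), while the deduped list stays at a single entry.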

The reproducer that surfaced this in a large internal codebase was two SDK targets files both prepending to the same property and both emitting <ProjectReferenceTargets Include="Build" Targets="$(...)"/>. The second emitter snapshotted a property value that already contained the marker, so the materialized Targets had the marker more than once and the explosion took off from hop 1. The authoring-side double-emission could also be fixed where it originates, but this PR is the engine-side guard rail so any future recurrence (or any literal Targets="Build;Build"-style authoring quirk) can't take the graph down.

Changes Made

The fix (`src/Build/Graph/ProjectGraph.cs`)

ExpandDefaultTargets now dedupes its output unconditionally, with a hybrid fast/slow shape:

Fast path for n ≤ 8 (the dominant BFS-hop size): inline O(n²) scan. If no marker is present and no duplicate is found, returns the input array unchanged with zero allocation.
Slow path (ExpandDefaultTargetsSlow): one HashSet<string> sized to the input length plus a lazily allocated List<string> buffer. Single pass: expands markers in place and dedupes via the set. Used for n > 8 or when the fast scan flagged a marker or duplicate.

Dedup is OrdinalIgnoreCase, first-occurrence wins.
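A minimal sketch of the hybrid shape, under stated assumptions: the class, the `Expand` signature, and passing the `.projectReferenceTargetsOrDefaultTargets` expansion in already resolved are hypothetical, and marker matching is assumed to be case-insensitive. The marker strings, the n ≤ 8 threshold, the same-instance return on clean input, and OrdinalIgnoreCase first-occurrence dedup come from the description above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch of a hybrid fast/slow expand-and-dedup. Marker strings,
// the n <= 8 threshold and the OrdinalIgnoreCase first-occurrence rule follow
// the PR description; names and signature are illustrative only.
static class ExpandDedupSketch
{
    public const string DefaultTargetsMarker = ".default";
    public const string PrtOrDefaultTargetsMarker = ".projectReferenceTargetsOrDefaultTargets";
    private const int InlineScanThreshold = 8;

    // prtOrDefault is the already-resolved expansion of the second marker (assumption).
    public static string[] Expand(string[] targets, string[] defaultTargets, string[] prtOrDefault)
    {
        // Fast path: small, clean input comes back as the same instance, no allocation.
        if (targets.Length <= InlineScanThreshold && IsCleanInline(targets))
        {
            return targets;
        }
        return ExpandSlow(targets, defaultTargets, prtOrDefault);
    }

    // Assumption: markers are matched case-insensitively.
    private static bool IsMarker(string t) =>
        string.Equals(t, DefaultTargetsMarker, StringComparison.OrdinalIgnoreCase) ||
        string.Equals(t, PrtOrDefaultTargetsMarker, StringComparison.OrdinalIgnoreCase);

    // O(n^2) pairwise scan: true when there is no marker and no duplicate.
    private static bool IsCleanInline(string[] targets)
    {
        for (int i = 0; i < targets.Length; i++)
        {
            if (IsMarker(targets[i])) return false;
            for (int j = 0; j < i; j++)
            {
                if (string.Equals(targets[i], targets[j], StringComparison.OrdinalIgnoreCase)) return false;
            }
        }
        return true;
    }

    private static string[] ExpandSlow(string[] targets, string[] defaultTargets, string[] prtOrDefault)
    {
        var seen = new HashSet<string>(targets.Length, StringComparer.OrdinalIgnoreCase);
        List<string>? result = null; // allocated lazily, on the first marker or duplicate

        for (int i = 0; i < targets.Length; i++)
        {
            string t = targets[i];
            if (IsMarker(t))
            {
                result ??= CopyPrefix(targets, i);
                string[] expansion = string.Equals(t, DefaultTargetsMarker, StringComparison.OrdinalIgnoreCase)
                    ? defaultTargets
                    : prtOrDefault;
                foreach (string e in expansion)
                {
                    if (seen.Add(e)) result.Add(e);
                }
            }
            else if (seen.Add(t))
            {
                result?.Add(t); // while result is null, the input is still reproduced as-is
            }
            else
            {
                result ??= CopyPrefix(targets, i); // duplicate: keep the prefix, drop this entry
            }
        }
        return result?.ToArray() ?? targets;
    }

    // Everything before index `count` was a unique non-marker, so it is copied verbatim.
    private static List<string> CopyPrefix(string[] targets, int count)
    {
        var list = new List<string>(targets.Length);
        for (int k = 0; k < count; k++) list.Add(targets[k]);
        return list;
    }
}
```

Deferring `CopyPrefix` until the first marker or duplicate is what lets a marker-free, duplicate-free input of any length come back as the original array; on the slow path only the `HashSet` is paid for.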

Why this is behavior-preserving (no ChangeWave)

GetTargetLists already collapses each per-node final target list to OrdinalIgnoreCase-unique entries via ImmutableHashSet + ImmutableList.AddRange, and ExpandDefaultTargets is only called from GetTargetLists. Adding inner dedup only changes BFS internal state (encounteredEdges set membership, per-edge requestedTargets list size) — never the public return value. No new public API, no new warnings or errors, no consumer-observable change.
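The argument can be checked on a simplified model: first-occurrence dedup keeps exactly one copy of each distinct (case-insensitive) target, so deduping an edge's list before a cross-product step cannot change the distinct list after it. In this sketch `Propagate` is a hypothetical stand-in that ignores PRT Include matching, and `FirstOccurrenceDedup` plays both roles: the new inner dedup and the existing per-node collapse (which the PR says is done with ImmutableHashSet + ImmutableList.AddRange).

```csharp
using System;
using System.Collections.Generic;

// Simplified model for the behavior-preservation argument; not MSBuild code.
static class DedupInvariance
{
    // Keeps the first occurrence of each OrdinalIgnoreCase-distinct value, in order.
    public static List<string> FirstOccurrenceDedup(IEnumerable<string> items)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (string item in items)
        {
            if (seen.Add(item)) result.Add(item);
        }
        return result;
    }

    // Hypothetical cross-product step that ignores PRT Include matching:
    // every entry contributes one copy of prtTargets.
    public static List<string> Propagate(IReadOnlyList<string> entries, IReadOnlyList<string> prtTargets)
    {
        var next = new List<string>();
        for (int i = 0; i < entries.Count; i++) next.AddRange(prtTargets);
        return next;
    }
}
```

Both pipelines end in the same final list; the only difference is how many intermediate copies `Propagate` builds, which is the BFS-internal cost the PR removes.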

Supporting refactor (justification below)

The dedup fix on its own is small. The PR also takes the BFS hot path off ImmutableList<string>, which is the dominant cost once the explosion is gone:

ProjectGraphBuildRequest.RequestedTargets: ImmutableList<string> → string[]. Equals/GetHashCode use .Length + indexer, no virtual dispatch or AVL traversal.
BFS working-set types throughout ProjectGraph.cs: string[] cascade replaces ImmutableList<string> / IReadOnlyList<string>. ImmutableList<string> is kept at the public GetTargetLists boundary and in targetLists[node], where AddRange actually derives each version from the prior.
ProjectInterpretation.TargetsToPropagate: signature widened to string[]; _outerBuildTargets+_allTargets (two ImmutableList<TargetSpecification>) collapsed to a single TargetSpecification[] _allTargets + int _outerBuildTargetCount — one allocation/copy per source list instead of three+two.
PRT-emission loop: Where(...).Select(...).ToArray() + AddRange → direct foreach over the SemiColonTokenizer struct from ExpressionShredder.SplitSemiColonSeparatedList, appending via a ref local. Drops the WhereSelectArrayIterator state machine, an intermediate TargetSpecification[], and tokenizer boxing.
GetApplicableTargetsForReference returns string[] directly, sized for the no-skip common case and shrunk with Array.Resize only when something is skipped. Drops the LINQ state machine and LargeArrayBuilder doubling copies.
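A sketch of a content-compared request key over `string[]`, in the spirit of the `ProjectGraphBuildRequest.RequestedTargets` change. The type name, the project-path field, and the OrdinalIgnoreCase comparisons are assumptions for illustration, not the MSBuild source.

```csharp
#nullable enable
using System;

// Hypothetical request key compared by content over a string[]. Field names
// and the OrdinalIgnoreCase comparisons are illustrative assumptions.
readonly struct BuildRequestKey : IEquatable<BuildRequestKey>
{
    public BuildRequestKey(string projectPath, string[] requestedTargets)
    {
        ProjectPath = projectPath;
        RequestedTargets = requestedTargets;
    }

    public string ProjectPath { get; }
    public string[] RequestedTargets { get; }

    public bool Equals(BuildRequestKey other)
    {
        if (!string.Equals(ProjectPath, other.ProjectPath, StringComparison.OrdinalIgnoreCase)) return false;
        string[] a = RequestedTargets;
        string[] b = other.RequestedTargets;
        if (a.Length != b.Length) return false;
        // Plain array indexing: no interface dispatch, no tree traversal.
        for (int i = 0; i < a.Length; i++)
        {
            if (!string.Equals(a[i], b[i], StringComparison.OrdinalIgnoreCase)) return false;
        }
        return true;
    }

    public override bool Equals(object? obj) => obj is BuildRequestKey other && Equals(other);

    // Folds the same elements, in the same order and with the same comparer, as Equals.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(ProjectPath, StringComparer.OrdinalIgnoreCase);
        foreach (string t in RequestedTargets) hash.Add(t, StringComparer.OrdinalIgnoreCase);
        return hash.ToHashCode();
    }
}
```

Because equality walks the array by index and the hash folds the same elements in the same order, two requests built from separate arrays with equal contents collide in a visited-edge set, as they should.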

Why drop ImmutableList<T> here at all: the original draft used ImmutableList<string>.Builder for the dedup buffer. In review it became clear that AVL-tree ImmutableList<T> is materially more expensive per element than List<T>/string[], and the structural-sharing benefit doesn't apply on this path — every per-edge applicableTargets/expandedTargets is built fresh from raw ProjectReferenceTargets items, never as an .Add/.Remove derivative of a common ancestor, so the trees share zero internal nodes.
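The `TargetsToPropagate` change can be pictured as one array with a prefix count. This sketch assumes the outer-build targets occupy the first `_outerBuildTargetCount` slots of `_allTargets`; `TargetSpec` and `FlatTargets` are hypothetical simplifications, not the real types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplification of TargetSpecification.
readonly record struct TargetSpec(string Target, bool SkipIfNonexistent);

// One array, outer-build targets first; the count marks that prefix (assumed layout).
sealed class FlatTargets
{
    private readonly TargetSpec[] _allTargets;
    private readonly int _outerBuildTargetCount;

    public FlatTargets(IReadOnlyList<TargetSpec> outerBuild, IReadOnlyList<TargetSpec> others)
    {
        _allTargets = new TargetSpec[outerBuild.Count + others.Count]; // the only array allocated
        for (int i = 0; i < outerBuild.Count; i++) _allTargets[i] = outerBuild[i];
        for (int i = 0; i < others.Count; i++) _allTargets[outerBuild.Count + i] = others[i];
        _outerBuildTargetCount = outerBuild.Count;
    }

    // Both views are slices over the same array; neither copies.
    public ReadOnlySpan<TargetSpec> OuterBuildTargets => _allTargets.AsSpan(0, _outerBuildTargetCount);
    public ReadOnlySpan<TargetSpec> AllTargets => _allTargets;
}
```

Compared with two separate immutable lists, the outer-build view costs no extra storage: it is just a shorter window over the same array.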

Testing

New file: src/Build.UnitTests/Graph/ProjectGraph_ExpandDefaultTargetsDedup_Tests.cs — 11 tests, covering:

Marker expansion producing a duplicate of an entry literal is deduped.
.projectReferenceTargetsOrDefaultTargets with literal duplicates is deduped.
Explicit non-marker duplicates (Build;Build;Build) are deduped.
Case-insensitive dedup, first occurrence wins.
Clean input returns the same instance (reference identity asserts the zero-allocation fast path).
Marker present, no duplicates produced.
Marker expands to empty DefaultTargets.
Every entry collapses to a singleton.
First-occurrence order preserved for mixed literal+marker+literal inputs.
End-to-end GetTargetLists smoke at depth 12 with the duplicate-marker shape — confirms result size stays bounded.
End-to-end GetTargetLists sanity at depth 6 for the common single-marker case.

All 11 pass on net10.0 and net48 (22/22). The rest of Microsoft.Build.Graph.UnitTests is unchanged by this PR; the 6 pre-existing failures (3 tests × 2 TFMs) are all [ActiveIssue("https://github.com/dotnet/msbuild/issues/4368")].

Performance evidence (why the refactor scope is justified)

End-to-end ProjectGraph.GetTargetLists(["Build"]) via BenchmarkDotNet on .NET 10.0.8, X64 RyuJIT AVX2, --job short. The graph is built once in [GlobalSetup] so the measured op is purely the BFS hot path. Each project carries <ProjectReferenceTargets Include="Build" Targets=".projectReferenceTargetsOrDefaultTargets;GetNativeManifest;_GetCopyToOutputDirectoryItemsFromThisProject"/> to mirror the realistic shape produced by Microsoft.Common.CurrentVersion.targets. V1 = upstream main at the PR base; the same benchmark DLL is rebuilt against each engine.

Scenario	V1 time / alloc	This PR time / alloc	Time ratio (PR/V1)	Alloc ratio (PR/V1)
15-node balanced binary tree	11.75 µs / 19.45 KB	5.77 µs / 14.34 KB	0.49×	0.74×
50-node balanced binary tree	25.81 µs / 68.91 KB	19.31 µs / 50.78 KB	0.75×	0.74×
200-node tree, fanout 3	90.33 µs / 262.14 KB	72.92 µs / 199.77 KB	0.81×	0.76×
100-node linear chain	71.01 µs / 171.09 KB	52.36 µs / 118.52 KB	0.74×	0.69×
Duplicate-marker PRT shape, 50 nodes (the bug)	8,972 µs / 18,069 KB	47.98 µs / 104.81 KB	0.005×	0.006×

Every realistic shape is 19–51% faster and allocates 24–31% less. The pathological shape, which would otherwise OOM or hang on a large graph, is 187× faster and allocates 172× less at just 50 nodes; because the growth is geometric in BFS depth, the gap widens rapidly beyond that.
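The summary percentages can be recomputed from the table's raw numbers (time in µs, allocation in KB). `BenchRatios` is only a scratch calculation, not part of the PR.

```csharp
using System;

// Raw numbers from the benchmark table: time in µs, allocation in KB.
static class BenchRatios
{
    public static readonly (double V1Time, double PrTime, double V1Alloc, double PrAlloc)[] Realistic =
    {
        (11.75, 5.77, 19.45, 14.34),     // 15-node balanced binary tree
        (25.81, 19.31, 68.91, 50.78),    // 50-node balanced binary tree
        (90.33, 72.92, 262.14, 199.77),  // 200-node tree, fanout 3
        (71.01, 52.36, 171.09, 118.52),  // 100-node linear chain
    };

    // Fractional reduction versus V1: 0.25 means "25% less".
    public static double Reduction(double v1, double pr) => 1.0 - pr / v1;

    public static (double Min, double Max) TimeReductionRange() =>
        RangeOf(r => Reduction(r.V1Time, r.PrTime));

    public static (double Min, double Max) AllocReductionRange() =>
        RangeOf(r => Reduction(r.V1Alloc, r.PrAlloc));

    private static (double Min, double Max) RangeOf(
        Func<(double V1Time, double PrTime, double V1Alloc, double PrAlloc), double> measure)
    {
        double min = double.MaxValue, max = double.MinValue;
        foreach (var row in Realistic)
        {
            double v = measure(row);
            min = Math.Min(min, v);
            max = Math.Max(max, v);
        }
        return (min, max);
    }
}
```

This gives roughly 19% to 51% less time and 24% to 31% less allocation across the realistic rows; for the duplicate-marker row, 8,972 / 47.98 ≈ 187 and 18,069 / 104.81 ≈ 172.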

Benchmark source is kept out of this PR to keep the diff focused; happy to upstream it separately if maintainers want it in ref/ or documentation/.

Notes

No ChangeWaves.md entry — there is no observable behavior change at the GetTargetLists boundary, only reduced internal work.
No new public API. No new diagnostics. No Strings.resx changes.
File deltas: ProjectGraph.cs +137/-33, ProjectInterpretation.cs +78/-33, new test file +268/-0.
The authoring-side issue that first triggered this (two SDK targets files prepending to the same property) has already been fixed outside this repo. This PR is the engine-side guard rail: any future recurrence — or any plain literal duplicate in Targets metadata — is now bounded.

…ph explosion

ProjectGraph.ExpandDefaultTargets now unconditionally dedupes its output via a hybrid fast/slow path. The downstream BFS cross-products entry-targets against matching PRT items, so any N-duplicate entry list at one hop becomes ~N^2 propagations at the next; over BFS depth D this is N^D edges. Duplicates arise from PRT marker double-emission and from explicit literal duplicates in Targets metadata.

ExpandDefaultTargets has a zero-allocation fast path (inline O(n^2) scan for n <= 8) returning the input unchanged when no marker or duplicate is found, and a HashSet-backed slow path otherwise. Dedup is OrdinalIgnoreCase, first-occurrence wins, matching the existing post-BFS dedup at the public GetTargetLists boundary -- so no consumer-observable behavior changes and no ChangeWave is needed.

BFS hot path moved off ImmutableList<string>/ImmutableList<TargetSpecification>: ProjectGraphBuildRequest.RequestedTargets, ExpandDefaultTargets, TargetsToPropagate.FromProjectAndEntryTargets, and GetApplicableTargetsForReference all flow string[] end-to-end. TargetsToPropagate collapses two ImmutableLists to one flat TargetSpecification[] + outer-build count. LINQ removed from the per-edge loop in FromProjectAndEntryTargets. ImmutableList<string> is retained only at the public GetTargetLists boundary and in targetLists[node] where the AddRange chain actually derives each version from the prior.

11 new tests (22 with TFMs) cover dedup behavior plus end-to-end GetTargetLists smoke at depth 12 with the duplicate-marker shape and depth 6 with the common single-marker shape.

End-to-end GetTargetLists(["Build"]) benchmark on realistic .NET-shape graphs: 19-51% faster and 24-31% lower allocation across small/medium/large trees and linear chains; the pathological duplicate-marker shape is 187x faster and 172x lower allocation at 50 nodes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

- Wrap unit tests in TestEnvironment.Create(_output) to match the in-file end-to-end test pattern and the sibling ProjectGraph_Tests convention, ensuring evaluation state from in-memory Project instances doesn't leak across tests via the global ProjectCollection (Copilot bot suggestion).
- Defer the empty-list allocation in FromProjectAndEntryTargets until the first target is actually appended, avoiding an empty List<TargetSpecification> allocation when a matched PRT item has empty Targets metadata (Copilot bot suggestion).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

JanProvaznik · 2026-06-05T15:25:53Z

/review

github-actions · 2026-06-05T15:26:49Z

❌ Expert Code Review (command) failed. Please review the logs for details.

JanProvaznik · 2026-06-08T09:39:09Z

/review

github-actions · 2026-06-08T09:39:44Z

❌ Expert Code Review (command) failed. Please review the logs for details.

github-actions

Test Coverage — Issues Found

Four concrete gaps; one non-issue clarification.

ISSUE 1 — `InlineScanThreshold=8` boundary and slow-path same-reference are untested

Q1 + Q2 together.

NoMarkerNoDuplicates_ReturnsSameInstance uses a 3-element input (well inside the fast path). Two adjacent cases are dark:

Input	Path taken	Expected
8 elements, no markers, no dups	≤ threshold → fast path	returns same reference, zero alloc
9 elements, no markers, no dups	> threshold → slow path	returns same reference, but allocates a `HashSet`
9 elements, duplicate at [0] and [8]	> threshold → slow path dedup	returns a new, deduplicated array

Concrete failing scenario for Q1: an 8-element input whose last element duplicates the first, e.g. ["A","B","C","D","E","F","G","A"], hits the tail of the fast-path O(n²) scan and must correctly fall through to ExpandDefaultTargetsSlow. No test verifies this.

Concrete failing scenario for Q2: A 9-element input ["A","B","C","D","E","F","G","H","I"] (no markers, no dups) goes through the slow path. The lazy-buffer (result) stays null so result?.ToArray() ?? targets returns targets unchanged — same-reference. But a HashSet is silently allocated. The comment in the code explicitly calls this out ("if the input turns out to be marker-and-dup-free... we never allocate the result List<string>, only the HashSet") yet there is no test verifying (a) same reference is returned and (b) the HashSet allocation is acceptable and not regressed.

ISSUE 2 — `ProjectReferenceTargetsOrDefaultTargetsMarker` + empty Targets metadata (fall-through to `defaultTargets`) is untested

Q3.

The slow path has two branches for ProjectReferenceTargetsOrDefaultTargetsMarker:

```csharp
string targetsString = graphEdge.GetMetadataValue(ItemMetadataNames.ProjectReferenceTargetsMetadataName);
if (string.IsNullOrEmpty(targetsString))
{
    // fall through to defaultTargets
    foreach (string defaultTarget in defaultTargets) { /* ... */ }
}
else
{
    foreach (string expandedTarget in ExpressionShredder.SplitSemiColonSeparatedList(targetsString)) { /* ... */ }
}
```

Dedupes_PRTOrDefaultMarker_WhenTargetsMetadataDuplicatesExpansion always passes "A;B;A" metadata, so it only covers the else branch. The if branch (empty Targets metadata → expand to defaultTargets) is unexercised.

Concrete failing scenario: CreateEdge() (no Targets metadata) + input = [ProjectReferenceTargetsOrDefaultTargetsMarker] + defaultTargets = ["A","B"]. The method should return ["A","B"] by falling through to defaultTargets. No test covers this, meaning a regression in the if branch would go undetected.
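A standalone sketch of the two quoted branches makes the empty-metadata case directly testable. `ResolvePrtOrDefault` is hypothetical, and `string.Split` with `RemoveEmptyEntries | TrimEntries` only approximates `ExpressionShredder.SplitSemiColonSeparatedList`; dedup is deliberately left to the caller, as in the quoted code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical standalone version of the two branches for the
// ".projectReferenceTargetsOrDefaultTargets" marker. The Split call is an
// approximation of ExpressionShredder.SplitSemiColonSeparatedList.
static class PrtOrDefaultSketch
{
    public static List<string> ResolvePrtOrDefault(string? targetsMetadata, IReadOnlyList<string> defaultTargets)
    {
        var expanded = new List<string>();
        if (string.IsNullOrEmpty(targetsMetadata))
        {
            // No Targets metadata on the edge: fall through to DefaultTargets.
            expanded.AddRange(defaultTargets);
        }
        else
        {
            // Split on ';', trimming entries and dropping empty ones.
            foreach (string t in targetsMetadata.Split(';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
            {
                expanded.Add(t);
            }
        }
        return expanded;
    }
}
```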

ISSUE 3 — No test mixes `DefaultTargetsMarker` and `ProjectReferenceTargetsOrDefaultTargetsMarker` in the same input

Q4.

Each marker type is tested in isolation. The dedup logic handles both markers in the same loop body, and their outputs must be jointly deduped. A cross-marker duplicate (e.g., DefaultTargetsMarker expands to ["Build"] and ProjectReferenceTargetsOrDefaultTargetsMarker also expands to ["Build"]) exercises the seen.Add interaction between both branches.

Concrete failing scenario:

input = [DefaultTargetsMarker, ProjectReferenceTargetsOrDefaultTargetsMarker]
defaultTargets = ["Build"]
edge Targets = "Build;Publish"

Expected: ["Build","Publish"] — the "Build" from DefaultTargetsMarker is added first; "Build" from the PRT marker is suppressed by seen; "Publish" is added. Not tested.
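The essential property in this scenario is that both marker expansions feed one shared set. A miniature version, with the expansions passed in already resolved; `DedupAcross` is a hypothetical helper, not the MSBuild loop.

```csharp
using System;
using System.Collections.Generic;

// Miniature of the cross-marker scenario: all expansions share one
// OrdinalIgnoreCase set, so a target emitted by an earlier expansion
// suppresses the same target in a later one. Hypothetical helper.
static class CrossMarkerDedup
{
    public static List<string> DedupAcross(params IEnumerable<string>[] expansions)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (IEnumerable<string> expansion in expansions)
        {
            foreach (string target in expansion)
            {
                if (seen.Add(target)) result.Add(target); // first occurrence wins
            }
        }
        return result;
    }
}
```

A per-marker set would pass each branch's isolated tests yet emit "Build" twice here, which is why the mixed-marker input is worth a dedicated test.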

ISSUE 4 — Refactored `GetApplicableTargetsForReference` with `SkipNonexistentTargets=true` is not covered by the new test file

Q5.

GetApplicableTargetsForReference was materially rewritten from LINQ (Where + Select + ToImmutableList) to a manual pre-allocated array with writeIndex and Array.Resize. The new code has three distinct result shapes:

writeIndex == end → return full pre-sized array (no resize)
writeIndex == 0 → return []
0 < writeIndex < end → Array.Resize

The SkipNonexistentTargets path exercises the writeIndex < end case. The new test file adds no tests for any of these shapes. Existing coverage in other test files (e.g., ProjectGraph_Tests) may still catch regressions, but the PR touches this code path and does not add targeted regression tests for it.
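The three result shapes are easy to pin down with a small filter over a pre-sized array. `FilterTargets` and its `exists` predicate are hypothetical stand-ins for `GetApplicableTargetsForReference` and the SkipNonexistentTargets check; this is an illustration of the pattern, not the MSBuild source.

```csharp
using System;

// Pre-sized, filter-in-place array with the three result shapes listed above.
static class PresizedFilter
{
    public static string[] FilterTargets(string[] candidates, Func<string, bool> exists)
    {
        var result = new string[candidates.Length]; // sized for the no-skip common case
        int writeIndex = 0;
        foreach (string c in candidates)
        {
            if (exists(c)) result[writeIndex++] = c;
        }

        if (writeIndex == result.Length) return result;   // shape 1: nothing skipped
        if (writeIndex == 0) return Array.Empty<string>(); // shape 2: everything skipped
        Array.Resize(ref result, writeIndex);              // shape 3: some skipped
        return result;
    }
}
```

A targeted test per shape (all kept, none kept, some kept) is enough to catch an off-by-one in `writeIndex` or a missed resize.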

Non-issue — `_output` field is used (Q7)

_output is passed to TestEnvironment.Create(_output) in every test method. It is not unused.

Informational — No explicit timing assertion (Q6)

GetTargetLists_DuplicateMarkerPRT_StaysBoundedAcrossChain at depth=12 is an implicit performance regression test: without the fix, the BFS would not finish in any reasonable wall-clock time. However it contains no Stopwatch assertion. This is acceptable for a correctness test suite (explicit timing tests are flaky in CI), but it means a future regression to O(N^depth) would manifest only as a timeout rather than a clean assertion failure. Low severity.

Generated by Expert Code Review (command) for issue #13855 · sonnet46 9.6M

Copilot AI review requested due to automatic review settings May 23, 2026 00:18

dfederm mentioned this pull request May 23, 2026

Move core ProjectReferenceTargets from Managed-only to Common targets #13427

Merged

Copilot AI reviewed May 23, 2026

OvesN assigned rainersigwald and JanProvaznik May 26, 2026

OvesN requested review from JanProvaznik and rainersigwald May 26, 2026 13:02

rainersigwald requested a review from Copilot May 26, 2026 14:21

Copilot started reviewing on behalf of rainersigwald May 26, 2026 14:22

Copilot AI reviewed May 26, 2026

Comment thread src/Build.UnitTests/Graph/ProjectGraph_ExpandDefaultTargetsDedup_Tests.cs

Comment thread src/Build/Graph/ProjectInterpretation.cs Outdated

Comment thread src/Build/Graph/ProjectGraph.cs

rainersigwald approved these changes May 26, 2026

Comment thread src/Build/Graph/ProjectGraph.cs

Comment thread src/Build/Graph/ProjectInterpretation.cs

Merge branch 'main' into dfederm/msbuild-expanddefaulttargets-marker-dedup (89e5504)

JanProvaznik approved these changes Jun 5, 2026

Merge branch 'main' into dfederm/msbuild-expanddefaulttargets-marker-dedup (bac30b6)

github-actions Bot reviewed Jun 8, 2026

JanProvaznik merged commit fb8ccd3 into dotnet:main Jun 8, 2026
14 checks passed

This was referenced Jun 9, 2026

[release/10.0.4xx] Source code updates from dotnet/msbuild dotnet/dotnet#7137

Merged

[main] Source code updates from dotnet/msbuild dotnet/dotnet#7141

Merged

JanProvaznik merged 4 commits into dotnet:main from dfederm:dfederm/msbuild-expanddefaulttargets-marker-dedup


5 participants
