[Perf] Optimize allocations in the layout engine #34155
Let's not wait for .net11 for these awesome changes! 👯
// Since no rows are specified, we'll create an implied row 0
return Implied();
_rows = ArrayPool<Definition>.Shared.Rent(1);
_rows[0] = new Definition(GridLength.Star);
For this scenario, wouldn't it be better to have a cached Definition[] oneRow = [new Definition(GridLength.Star)]? The smallest array returned by ArrayPool has length 16, so renting one here may be unnecessary overhead.
same for InitializeColumns
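To check the bucket-size point on a given runtime, a minimal sketch like the following can be used. It only assumes the documented contract of ArrayPool<T>.Shared.Rent (the returned array has at least the requested length); the exact length is an implementation detail of the shared pool, and PoolBucketDemo is a hypothetical name.

```csharp
using System;
using System.Buffers;

static class PoolBucketDemo
{
    // Returns the actual length of the array handed out for a given request.
    public static int RentedLength(int requested)
    {
        int[] rented = ArrayPool<int>.Shared.Rent(requested);
        try
        {
            return rented.Length;
        }
        finally
        {
            // Always give the array back, even if reading it had thrown.
            ArrayPool<int>.Shared.Return(rented);
        }
    }

    public static void Main()
    {
        // The contract only guarantees "at least 1 element" here;
        // the printed value shows how much the pool actually hands out.
        Console.WriteLine($"Rent(1) returned an array of length {RentedLength(1)}");
    }
}
```

If the printed length is much larger than 1, a cached one-element array avoids both the oversized buffer and the rent/return bookkeeping for this case.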
Thanks for the suggestion, that makes a lot of sense.
Pushed 3bc89bd: caches the implied arrays on GridLayoutManager instead of renting from ArrayPool. When no explicit row/column definitions are specified, the cached array is reused across measure/arrange calls. Still 0 B allocated in benchmarks.
577ff3f to 814bdab
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34155
Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34155"
… reuse
- Convert Cell, Definition, and GridStructure from class to struct
- Use ArrayPool for IView[], Cell[], and Definition[] arrays
- Track actual counts (_childCount, _rowCount, _columnCount) for rented arrays
- Add int defsCount parameter to all static methods operating on Definition[]
- Reuse Dictionary<SpanKey, double> across measure passes (Clear instead of new)
- Convert SpanKey to IEquatable<SpanKey> with HashCode.Combine
- Convert foreach to indexed for loops in ArrangeChildren
- Add lazy Dictionary initialization for no-span grids
- Result: Grid layout achieves 0 B managed allocations (was 87-457 KB)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
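The SpanKey and dictionary-reuse items in this commit can be sketched as follows. This is an illustration under assumptions, not the actual GridLayoutManager code: DemoSpanKey, its three fields, and DemoSpanTracker are hypothetical names, and keeping the largest requested size per key is an assumed policy.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key type; the field names are assumptions for illustration.
readonly struct DemoSpanKey : IEquatable<DemoSpanKey>
{
    public DemoSpanKey(int start, int length, bool isColumn)
    {
        Start = start;
        Length = length;
        IsColumn = isColumn;
    }

    public int Start { get; }
    public int Length { get; }
    public bool IsColumn { get; }

    // Strongly typed Equals lets Dictionary compare keys without boxing.
    public bool Equals(DemoSpanKey other) =>
        Start == other.Start && Length == other.Length && IsColumn == other.IsColumn;

    public override bool Equals(object? obj) => obj is DemoSpanKey other && Equals(other);

    public override int GetHashCode() => HashCode.Combine(Start, Length, IsColumn);
}

sealed class DemoSpanTracker
{
    // Created lazily, then cleared and reused on later passes.
    Dictionary<DemoSpanKey, double>? _spans;

    public int Count => _spans?.Count ?? 0;

    public void BeginPass() => _spans?.Clear();

    public void TrackSpan(int start, int length, bool isColumn, double requested)
    {
        _spans ??= new Dictionary<DemoSpanKey, double>();
        var key = new DemoSpanKey(start, length, isColumn);

        // Keep the largest requested size seen for this span (assumed policy).
        if (!_spans.TryGetValue(key, out double current) || requested > current)
            _spans[key] = requested;
    }

    public double Requested(int start, int length, bool isColumn) =>
        _spans is not null && _spans.TryGetValue(new DemoSpanKey(start, length, isColumn), out double v) ? v : 0;
}

static class SpanKeyDemo
{
    public static void Main()
    {
        var tracker = new DemoSpanTracker();
        tracker.TrackSpan(0, 2, isColumn: false, requested: 50);
        tracker.TrackSpan(0, 2, isColumn: false, requested: 80);
        Console.WriteLine(tracker.Requested(0, 2, false)); // prints 80
        tracker.BeginPass();
        Console.WriteLine(tracker.Count); // prints 0
    }
}
```

Because the key is a struct that implements IEquatable<T>, the default comparer can hash and compare it without allocating, and Clear() keeps the dictionary's internal storage for the next pass.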
- Replace float[] size = {w, h} in SelfSizing with two local variables
- Add [InlineArray(4)] FrameBuffer for Flex.Item.Frame (NET8_0_OR_GREATER)
- Use ArrayPool for ordered_indices and lines arrays in flex_layout
- Convert lines array growth from Array.Resize(+1) to doubling strategy
- Convert foreach to indexed for loops in FlexLayoutManager
- Change frame index fields from uint to int (InlineArray requirement)
- Result: Core Flex engine achieves 0 B managed allocations
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
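The switch from Array.Resize(+1) to a doubling strategy on pooled arrays might look roughly like this. PooledIntBuffer is a hypothetical type written for this note, not the flex_layout implementation; the initial request of 4 elements is an arbitrary assumption.

```csharp
using System;
using System.Buffers;

// Hypothetical growable buffer backed by ArrayPool; not the Flex.cs code.
sealed class PooledIntBuffer : IDisposable
{
    int[] _items = ArrayPool<int>.Shared.Rent(4);

    public int Count { get; private set; }

    // Number of times the backing array had to be replaced.
    public int GrowCount { get; private set; }

    public int this[int index] =>
        (uint)index < (uint)Count ? _items[index] : throw new ArgumentOutOfRangeException(nameof(index));

    public void Add(int value)
    {
        if (Count == _items.Length)
        {
            // Double the capacity so N adds need O(log N) replacements,
            // instead of one replacement per added element.
            int[] bigger = ArrayPool<int>.Shared.Rent(_items.Length * 2);
            Array.Copy(_items, bigger, Count);
            ArrayPool<int>.Shared.Return(_items);
            _items = bigger;
            GrowCount++;
        }

        _items[Count++] = value;
    }

    // The buffer must not be used after Dispose.
    public void Dispose()
    {
        ArrayPool<int>.Shared.Return(_items);
        _items = Array.Empty<int>();
        Count = 0;
    }
}

static class GrowthDemo
{
    public static void Main()
    {
        using var buffer = new PooledIntBuffer();
        for (int i = 0; i < 1000; i++)
            buffer.Add(i);

        Console.WriteLine($"Count = {buffer.Count}, grow operations = {buffer.GrowCount}");
    }
}
```

Valid entries are tracked through Count rather than the array length, since a rented array may be larger than requested.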
- Cache ILayout.Count and Spacing in local variables
- Convert foreach to indexed for loops in Measure/ArrangeChildren
- Cache childCount in StackLayoutManager.UsesExpansion
- Result: Stack layout maintains 0 B allocations with real objects
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add InvalidationEventArgs.GetCached(trigger) for cached instances
- Replace new InvalidationEventArgs(trigger) in VisualElement, Page, Layout
- Result: 0 B per invalidation dispatch
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
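A cached-instance lookup of this kind could be sketched as below. DemoTrigger and DemoInvalidationEventArgs are hypothetical stand-ins, and the enum members are assumptions; the real InvalidationTrigger values and GetCached implementation may differ. The pattern is only safe because the event args are immutable.

```csharp
using System;

// Hypothetical trigger enum; the real InvalidationTrigger members may differ.
enum DemoTrigger
{
    Undefined,
    MeasureChanged,
    SizeRequestChanged,
}

sealed class DemoInvalidationEventArgs : EventArgs
{
    // One shared, immutable instance per known trigger value.
    static readonly DemoInvalidationEventArgs s_undefined = new(DemoTrigger.Undefined);
    static readonly DemoInvalidationEventArgs s_measureChanged = new(DemoTrigger.MeasureChanged);
    static readonly DemoInvalidationEventArgs s_sizeRequestChanged = new(DemoTrigger.SizeRequestChanged);

    public DemoInvalidationEventArgs(DemoTrigger trigger) => Trigger = trigger;

    // Read-only, so sharing an instance between subscribers is safe.
    public DemoTrigger Trigger { get; }

    public static DemoInvalidationEventArgs GetCached(DemoTrigger trigger) => trigger switch
    {
        DemoTrigger.Undefined => s_undefined,
        DemoTrigger.MeasureChanged => s_measureChanged,
        DemoTrigger.SizeRequestChanged => s_sizeRequestChanged,
        // Unknown values fall back to a fresh allocation.
        _ => new DemoInvalidationEventArgs(trigger),
    };
}

static class CachedArgsDemo
{
    public static void Main()
    {
        var a = DemoInvalidationEventArgs.GetCached(DemoTrigger.MeasureChanged);
        var b = DemoInvalidationEventArgs.GetCached(DemoTrigger.MeasureChanged);
        Console.WriteLine(ReferenceEquals(a, b)); // prints True
    }
}
```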
- LayoutAllocBenchmarker: lightweight fakes for TRUE allocation measurement
- LayoutHotPathBenchmarker: hot-path benchmarks with NSubstitute/Controls objects
- InvalidationBenchmarker: invalidation event dispatch benchmark
- initial-analysis.md: detailed performance analysis with results
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The remaining Flex Controls-layer allocations (64 B/child/pass) are caused by BindableProperty.SetValue boxing doubles for X/Y/Width/Height in VisualElement.UpdateBoundsComponents. Generic BindableProperty<T> will eliminate this overhead.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The analysis content has been incorporated into the PR description and issue #34154.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert foreach→for and Count/Spacing caching in Stack and FlexLayoutManager. Benchmarks confirm these had zero allocation impact (compiler already optimizes foreach on concrete types, and Stack already used indexed for loops).
Remove manual LoopCount loops from benchmarks and let BenchmarkDotNet handle iteration for proper statistical analysis. Per-operation numbers are now directly readable.
Grid ArrangeChildren retains the foreach→for change because foreach on IGridLayout (interface) boxes the List<IView>.Enumerator struct (verified: 1.56 KB/op with foreach vs 0 B with for).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
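The enumerator-boxing effect described in this commit can be probed with a small stand-alone measurement. This is a rough sketch, not the PR's BenchmarkDotNet setup: EnumeratorBoxingDemo is a hypothetical name, GC.GetAllocatedBytesForCurrentThread is used as a coarse probe, and the reported bytes depend on the runtime and JIT (newer runtimes may optimize the interface case away).

```csharp
using System;
using System.Collections.Generic;

static class EnumeratorBoxingDemo
{
    // foreach through an interface: GetEnumerator() returns IEnumerator<int>,
    // so List<int>'s struct enumerator is typically boxed on the heap.
    public static int SumForeach(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int item in items)
            sum += item;
        return sum;
    }

    // Indexed loop through an interface: no enumerator object is needed.
    public static int SumFor(IReadOnlyList<int> items)
    {
        int sum = 0;
        for (int i = 0; i < items.Count; i++)
            sum += items[i];
        return sum;
    }

    // Bytes allocated on this thread by one call, after a warm-up call.
    public static long AllocatedBy(Func<int> action)
    {
        action(); // warm up so JIT work is not attributed to the measured call
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4, 5 };
        Console.WriteLine($"foreach via interface: {AllocatedBy(() => SumForeach(list))} B");
        Console.WriteLine($"indexed for:           {AllocatedBy(() => SumFor(list))} B");
    }
}
```

The delegates are created before the measurement starts, so only the loop itself contributes to the reported bytes.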
- Grid: construct GridStructure into local first, then return old arrays and swap; prevents dangling state if constructor throws
- Flex: wrap layout_item body in try/finally to ensure ArrayPool arrays are always returned via cleanup() even on exceptions
- Grid: add comment documenting ArrayPool lifecycle and IDisposable consideration for GridLayoutManager
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
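The "construct into a local first, then swap" ordering can be illustrated with a minimal sketch. DemoManager and its failDuringBuild flag are hypothetical and exist only to simulate a constructor that throws; the real code swaps a GridStructure, not an int[].

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical stand-in for a manager that owns a rented buffer.
sealed class DemoManager
{
    int[]? _buffer;

    public int[]? Current => _buffer;

    public void Rebuild(int count, bool failDuringBuild)
    {
        // Step 1: build the replacement into a local, leaving the field untouched.
        int[] next = ArrayPool<int>.Shared.Rent(count);
        try
        {
            if (failDuringBuild)
                throw new InvalidOperationException("simulated construction failure");

            for (int i = 0; i < count; i++)
                next[i] = i;
        }
        catch
        {
            // The half-built array goes back to the pool; old state stays valid.
            ArrayPool<int>.Shared.Return(next);
            throw;
        }

        // Step 2: only after success, return the old array and swap.
        if (_buffer is not null)
            ArrayPool<int>.Shared.Return(_buffer);
        _buffer = next;
    }
}

static class SwapDemo
{
    public static void Main()
    {
        var manager = new DemoManager();
        manager.Rebuild(3, failDuringBuild: false);
        int[]? before = manager.Current;

        try
        {
            manager.Rebuild(3, failDuringBuild: true);
        }
        catch (InvalidOperationException)
        {
            // Expected: the failed rebuild must not disturb existing state.
        }

        Console.WriteLine(ReferenceEquals(before, manager.Current)); // prints True
    }
}
```

Returning the old arrays before the new structure exists would leave the field pointing at buffers the pool may already have handed to someone else.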
For grids without explicit row/column definitions, cache the single-element Definition[] arrays on the GridLayoutManager instead of renting them from ArrayPool. The smallest pool bucket is 16 elements, so caching a reusable Definition[1] avoids unnecessary pool overhead for the common case.
Suggested-by: Pictos
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extend the caching pattern from implied-only to all row/column arrays. ArrayPool<Definition> is now completely eliminated: the manager owns exact-sized Definition[] arrays that are reused across layout passes. This avoids ArrayPool's minimum bucket of 16 elements, which was wasteful for typical grids with 1-6 rows/columns.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
814bdab to 5080e83
Pull request overview
This PR targets GC pressure in MAUI’s layout hot paths by eliminating steady-state allocations in Core Grid/Flex layout code and by adding benchmarks to measure both “pure Core” and end-to-end Controls layout behavior.
Changes:
- Refactors GridLayoutManager to use pooled arrays and struct-based internal state (cells/definitions/span keys) to avoid per-pass allocations.
- Refactors Flex layout internals to remove per-child temporary array allocations and to pool internal working buffers (indices, wrap lines), with try/finally cleanup.
- Adds new BenchmarkDotNet benchmarkers for allocation-focused and end-to-end layout scenarios (including measure invalidation).
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Core/src/Layouts/GridLayoutManager.cs | Converts internal grid layout state to structs and uses pooling/caching to reduce allocations in measure/arrange paths. |
| src/Core/src/Layouts/Flex.cs | Removes per-call array allocations, pools internal buffers, and adds try/finally cleanup to ensure pooled arrays are returned. |
| src/Core/tests/Benchmarks/Benchmarks/LayoutHotPathBenchmarker.cs | New end-to-end hot path benchmark (NSubstitute + some real Controls objects). |
| src/Core/tests/Benchmarks/Benchmarks/LayoutAllocBenchmarker.cs | New allocation-focused benchmark using lightweight fake layouts/views to avoid mocking noise. |
| src/Core/tests/Benchmarks/Benchmarks/InvalidationBenchmarker.cs | New benchmark to measure measure-invalidation event dispatch overhead. |
| src/Controls/src/Core/LegacyLayouts/Layout.cs | Formatting-only change at file end. |
Comments suppressed due to low confidence (1)
src/Core/src/Layouts/GridLayoutManager.cs:1337
Definition’s constructor sets Size before assigning _gridLength. Since Size’s setter relies on IsStar (which reads _gridLength), this depends on the default-initialized _gridLength value. Assign _gridLength first, then set Size for absolute lengths to avoid subtle initialization-order bugs.
public Definition(GridLength gridLength)
{
_size = 0;
MinimumSize = 0;
if (gridLength.IsAbsolute)
{
Size = gridLength.Value;
}
_gridLength = gridLength;
public int frame_size2_i; // cross axis size
int[]? ordered_indices;
int ordered_indices_count;
ordered_indices_count is assigned but never read. With TreatWarningsAsErrors enabled, this will fail the build (CS0414). Remove the field, or use it to bound child_at access / document why it’s needed.
/// <summary>
/// The number of valid entries in <see cref="ordered_indices"/>.
/// This is set during layout calculation and tracks the logical child count
/// for the current layout context.
/// </summary>
public readonly int OrderedIndicesCount => ordered_indices_count;
readonly Rect _targetBounds = new(0, 0, ConstraintWidth, ConstraintHeight);
_targetBounds is never used, which will produce an unused-field warning (and can fail the build under warnings-as-errors). Remove it or use it for the Arrange bounds.
readonly Rect _targetBounds = new(0, 0, ConstraintWidth, ConstraintHeight);
When a Grid has no explicit row/column definitions, reuse static cached arrays instead of allocating new Definition[1] each time. ArrayPool minimum bucket is 16 elements, so this avoids unnecessary pool overhead for the common single-definition case.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/review -b feature/refactor-copilot-yml
AI code review for net11.0 target
Verdict: LGTM (with two low-priority notes to confirm)
Independent review (diff-first, then reconciled with the PR narrative). This is not an approval; a human still needs to sign off.
What the PR does
Reduces layout-engine allocations on hot paths:
Findings (non-blocking)
CI
All
Confidence: medium-high. The pooling/InlineArray work is correct and compiles cleanly across platforms; the only residual concern is the shared static-array invariant, which I believe is currently safe but fragile.
kubaflo left a comment
PR #34155 — [Perf] Optimize allocations in the layout engine (Flex / GridLayoutManager)
Verdict: NEEDS_CHANGES (confidence: high) — HEAD 039ffb1. The Flex pooling and the per-instance _cachedRows/_cachedColumns reuse are good, but the shared static s_defaultRow/s_defaultColumn arrays introduce cross-grid layout corruption. Caught independently by 2 of 4 models (gpt-5.5 + opus-4.6) and confirmed by code trace.
❌ The bug (GridLayoutManager.cs:226 / :249)
For a definition-less grid, _rows/_columns are aliased to shared static Definition[] arrays whose Size/MinimumSize are mutated during measure/arrange. Because the array is static, a reentrant measure of a nested definition-less grid resets and mutates the same array mid-pass, clobbering the parent grid's pre-computed star sizes. This is the common case (a Grid with no RowDefinitions nested inside another, under a finite height/width constraint where IsStarHeightPrecomputable is true), so it can yield incorrect measured sizes.
Trace: FirstMeasurePass → MeasureCell → child.Measure() constructs the child's GridStructure → child InitializeRows does s_defaultRow[0] = new Definition(GridLength.Star) and mutates it → parent resumes and reads its now-corrupted _rows[0] in MinimizeStarsForMeasurement / MeasuredGridHeight.
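The trace above can be reproduced in miniature. The sketch below is a deliberately simplified model, not the real GridLayoutManager: DemoDef, DemoGrid, and the useSharedStatic switch are hypothetical, and "measure" is reduced to writing a star size, measuring the child, then reading the size back.

```csharp
#nullable enable
using System;

// Hypothetical mutable struct standing in for Definition.
struct DemoDef
{
    public double Size;
}

sealed class DemoGrid
{
    // One array shared by every grid that opts into it (the problematic design).
    static readonly DemoDef[] s_sharedRow = new DemoDef[1];

    readonly DemoDef[] _rows;
    readonly DemoGrid? _child;
    readonly double _starSize;

    public DemoGrid(double starSize, DemoGrid? child, bool useSharedStatic)
    {
        _starSize = starSize;
        _child = child;
        _rows = useSharedStatic ? s_sharedRow : new DemoDef[1];
    }

    public double Measure()
    {
        _rows[0] = new DemoDef();   // reset the implied row
        _rows[0].Size = _starSize;  // pre-compute this grid's star size
        _child?.Measure();          // reentrant measure of the nested grid
        return _rows[0].Size;       // read back "our" row
    }
}

static class AliasingDemo
{
    public static void Main()
    {
        var shared = new DemoGrid(100, new DemoGrid(40, null, useSharedStatic: true), useSharedStatic: true);
        Console.WriteLine(shared.Measure()); // prints 40: the child clobbered the parent's row

        var perInstance = new DemoGrid(100, new DemoGrid(40, null, useSharedStatic: false), useSharedStatic: false);
        Console.WriteLine(perInstance.Measure()); // prints 100
    }
}
```

With the shared array, the parent reads the child's value because both grids write to the same element during a single reentrant pass; with per-instance arrays the parent's state survives.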
✅ Fix (keeps the perf win)
Replace the s_defaultRow/s_defaultColumn fallback with a fresh new Definition[1]. It's allocated once per grid and then reused across passes via _cachedRows/_cachedColumns, so the allocation-reduction goal is preserved without cross-instance sharing. Remove the now-unused statics. (Inline suggestion provided for the row case; mirror it for columns at line 249.)
Other paths
The Flex.cs pooling changes and the explicit-definition caching path (new Definition[count], line 232/255) are correctly per-instance and look fine. CI on this PR is otherwise green/known-flake; this is a code-correctness block, not a CI block.
// Since no rows are specified, we'll create an implied row 0
return Implied();
// Since no rows are specified, we'll create an implied row 0
_rows = (cached is not null && cached.Length >= 1) ? cached : s_defaultRow;
Shared mutable static array causes cross-grid layout corruption with nested definition-less grids. When a grid has no row definitions, _rows is pointed at the shared static s_defaultRow (line 226), and the same applies to s_defaultColumn at line 249. The Definition struct's Size/MinimumSize are then mutated in place during measure/arrange (e.g. ResolveStarRows, MinimizeStarsForMeasurement at ~871, SumDefinitions). Because the array is static and shared across all definition-less grids, a reentrant pass corrupts it:
- Definition-less Grid A measures a child that is itself a definition-less Grid B (extremely common, e.g. a Grid with no RowDefinitions containing another).
- During FirstMeasurePass, MeasureCell → child.Measure() (line ~451/474) constructs Grid B's GridStructure, whose InitializeRows resets the same s_defaultRow[0] = new Definition(GridLength.Star) (line 227) and mutates it.
- Control returns to Grid A, which now reads its _rows[0] (== s_defaultRow[0]), but the pre-computed star Size (from ResolveStarRows, fired when IsStarHeightPrecomputable is true, i.e. the common finite-constraint case) has been clobbered with Grid B's state. Grid A's MeasuredGridHeight/MinimizeStarsForMeasurement then use corrupted values.
This is reachable in the common case (nested star/definition-less grids under a finite constraint), so it can produce wrong measured sizes.
Fix (preserves the allocation win): don't alias the shared statics — allocate a one-element array, which is still cached per-grid via _cachedRows/_cachedColumns on subsequent passes, so you keep the per-instance allocation savings without cross-instance sharing. Apply the same to columns at line 249, and remove the now-unused s_defaultRow/s_defaultColumn statics.
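A per-instance cache along the lines of the suggested fix could look like this. It is a sketch under assumptions: DemoDefinition, DemoGridManager, and AllocationCount are hypothetical and exist only to make the allocate-once behavior observable; the real code uses _cachedRows/_cachedColumns with its own sizing rules.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the Definition struct.
struct DemoDefinition
{
    public double Size;
    public bool IsStar;
}

sealed class DemoGridManager
{
    // Owned by this manager only, so nested grids never share it.
    DemoDefinition[]? _cachedRows;

    // Exposed only so the sketch can show how often an array is created.
    public int AllocationCount { get; private set; }

    // explicitCount == 0 means no row definitions: use one implied star row.
    public DemoDefinition[] InitializeRows(int explicitCount)
    {
        int needed = Math.Max(explicitCount, 1);

        if (_cachedRows is null || _cachedRows.Length != needed)
        {
            _cachedRows = new DemoDefinition[needed];
            AllocationCount++;
        }

        // Reset per-pass state in place instead of allocating a new array.
        for (int i = 0; i < needed; i++)
            _cachedRows[i] = new DemoDefinition { IsStar = true };

        return _cachedRows;
    }
}

static class PerInstanceCacheDemo
{
    public static void Main()
    {
        var parent = new DemoGridManager();
        var child = new DemoGridManager();

        DemoDefinition[] first = parent.InitializeRows(0);
        DemoDefinition[] second = parent.InitializeRows(0);

        Console.WriteLine(ReferenceEquals(first, second));                  // prints True
        Console.WriteLine(ReferenceEquals(first, child.InitializeRows(0))); // prints False
        Console.WriteLine(parent.AllocationCount);                          // prints 1
    }
}
```

Each manager pays for one small array on its first pass and nothing afterwards, which keeps steady-state passes allocation-free without cross-instance aliasing.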
Fixes #34154
Description
This PR eliminates all managed heap allocations in the Core layout engine (Grid, Flex) during steady-state measure+arrange passes. The changes target the hot path that runs on every layout cycle — in a typical app with scrolling lists or animated layouts, this path executes thousands of times per second.
Why this matters
Every allocation in the layout hot path contributes to GC pressure. On mobile devices (Android/iOS), Gen0 collections during layout can cause frame drops. By eliminating allocations entirely from the Core layout engine, we remove GC as a variable in layout performance.
What changed
GridLayoutManager.cs (largest change)
1. GridStructure class → struct
The GridStructure class was allocated on every Measure() call. Converting it to a struct and storing it as a field on GridLayoutManager eliminates this allocation. The field is intentionally non-readonly: method calls on a non-readonly struct field operate directly on the field, whereas a readonly field would force defensive copies.
Uses a _hasGridStructure bool instead of a nullable (can't use Nullable<GridStructure> because .Value would copy the entire struct).
2. Cell class → struct
Each Cell tracked a child's grid position and measurement constraints. Converting it to a struct eliminates per-child allocations. Methods that mutate cells now use ref Cell parameters to avoid copy-mutation bugs.
3. Definition class → struct
Each Definition tracked a row/column's size and grid length. It is now a struct, with readonly modifiers added to pure getters. Fixed a pre-existing copy-mutation bug in EnsureSizeLimit where var def = defs[n]; def.Size = newSize; silently mutated a copy; it now correctly uses defs[n].Size = newSize;.
4. SpanKey record → readonly struct with IEquatable
SpanKey was a record (reference type with heap allocation). It is now a readonly struct implementing IEquatable<SpanKey>, which eliminates allocations when it is used as a Dictionary key. It implements GetHashCode() using HashCode.Combine (with a netstandard fallback).
5. Span class eliminated
The Span class bundled a SpanKey with a Requested double. The class is eliminated entirely: TrackSpan now takes individual parameters and the dictionary stores Dictionary<SpanKey, double> directly.
6. ArrayPool for all arrays
All four arrays in GridStructure now use ArrayPool<T>.Shared:
- _childrenToLayOut (IView[]): cleared on return to avoid holding references
- _cells (Cell[]): struct array, no clearing needed
- _rows (Definition[]): struct array, no clearing needed
- _columns (Definition[]): struct array, no clearing needed
Rented arrays may be larger than requested; actual counts are tracked via _childCount, _rowCount, _columnCount. All array loops use these counts instead of .Length.
ReturnArrays() is called at the start of Measure() before creating a new GridStructure. The new structure is constructed into a local first, then the old arrays are returned and the field is swapped, which ensures exception safety if the constructor throws.
7. Dictionary reuse for span tracking
The Dictionary<SpanKey, double>? _spansDictionary field on GridLayoutManager is passed into GridStructure via the constructor. On subsequent calls, .Clear() reuses the dictionary instead of allocating a new one. Lazy initialization (_spans ??= new()) still works for grids with no spanning children.
8. foreach → for in ArrangeChildren
foreach on IGridLayout (interface dispatch) boxes the List<IView>.Enumerator struct, verified by benchmark: 1.56 KB/op with foreach vs 0 B with indexed for. Internal array loops (_cells, _rows, _columns) also use for with a count because ArrayPool-rented arrays are oversized.
Flex.cs
9. SelfSizing float[] elimination
SelfSizingDelegate was called with float[] size = new float[2] { w, h }, allocating a 2-element array per child per layout pass. Replaced with ref float width, ref float height parameters, eliminating the allocation entirely.
10. InlineArray(4) for Frame buffer
Item.Frame was float[] Frame { get; } = new float[4], so each Item allocated a 4-element float array. Replaced with an [InlineArray(4)] struct FrameBuffer that stores the 4 floats inline in the Item. Conditional on NET8_0_OR_GREATER, with a float[] fallback for netstandard.
11. ArrayPool for ordered_indices and lines
- ordered_indices: ArrayPool<int>.Shared.Rent(item.Count) in flex_layout.init, returned in cleanup()
- lines (flex wrap lines): ArrayPool<flex_layout_line>.Shared.Rent(newCapacity) with manual copy+return for growth. Changed growth strategy from Array.Resize(+1) (linear, N allocations for N lines) to doubling (logarithmic).
cleanup() is wrapped in try/finally to ensure arrays are always returned, even if layout throws.
InvalidationEventArgs.cs
12. Static cached instances
Added InvalidationEventArgs.GetCached(InvalidationTrigger), which returns static singletons per trigger value. Replaced new InvalidationEventArgs(trigger) in VisualElement, Page, and legacy Layout. These fire on every measure invalidation, which happens frequently during layout.
What we tried and didn't work (across all engines)
- NSubstitute-based benchmarking for allocation measurement: NSubstitute mocks add 40–200% allocation noise that completely obscures real optimization gains. Mock indexer calls (_grid[n]) allocate tracking objects per invocation. Solution: created LayoutAllocBenchmarker with lightweight hand-written fakes implementing IGridLayout/IStackLayout directly.
- Optimizing remaining Flex Controls-layer allocations: The remaining ~848 B (12 children) ≈ 71 B per child per pass. Traced through the call chain: FlexLayoutManager.ArrangeChildren → child.Arrange(frame) → VisualElement.ArrangeOverride → UpdateBoundsComponents → sets X, Y, Width, Height via BindableObject.SetValue(property, doubleValue). Each SetValue boxes the double argument. This is fundamental to how BindableProperty works; fixing it requires generic BindableProperty<T> (tracked in [Perf] Eliminate value-type boxing in BindableObject.SetValue #34080).
- Stack/FlexLayoutManager foreach→for and Count/Spacing caching: Benchmarks confirmed these had zero allocation impact. The compiler already optimizes foreach on concrete types, and Stack was already allocation-free. These changes were reverted to minimize maintenance overhead.
- Grid _cells[] foreach→for: The C# compiler already optimizes foreach on arrays to indexed access, so there is no enumerator boxing.
New Benchmarks
LayoutAllocBenchmarker
Lightweight fake objects (no NSubstitute) for true allocation measurement. Includes FakeView, FakeGridLayout, FakeStackLayout, FakeRowDefinition, FakeColumnDefinition. Benchmarks Grid, VStack, HStack, and Flex Core engine with [Params] for ChildCount (12, 60) and UseSpans (true, false).
LayoutHotPathBenchmarker
Uses NSubstitute for Grid/Stack and real Controls objects (FlexLayout + Border children) for Flex. Measures the full Controls-layer stack including VisualElement.Measure/Arrange.
InvalidationBenchmarker
Measures InvalidationEventArgs dispatch allocation (before/after static caching).
Benchmark Results
LayoutAllocBenchmarker — Core layer, lightweight fake objects
This benchmark uses hand-written fake IView/IGridLayout/IStackLayout implementations (no NSubstitute) to measure true layout engine allocations without mock infrastructure noise.
Baseline = origin/net11.0 with identical benchmark code copied over.
Grid (1× Measure + 1× Arrange per invocation)
Raw BenchmarkDotNet output — baseline (net11.0)
Raw BenchmarkDotNet output — optimized (this PR)
(Gen0/Gen1/Gen2 all zero — omitted)
Flex Core engine (1× Layout per invocation, no Controls layer)
Raw BenchmarkDotNet output — baseline (net11.0)
Raw BenchmarkDotNet output — optimized (this PR)
Stack (1× Measure + 1× Arrange per invocation)
Stack layout was already allocation-free. No changes to Stack code in this PR.
LayoutHotPathBenchmarker — Flex end-to-end with real Controls objects
This benchmark uses real FlexLayout + Border children (Controls layer) to measure allocations through the full stack including VisualElement.Measure/Arrange.
The remaining ~848 B (12 children) ≈ 71 B per child, traced to VisualElement.UpdateBoundsComponents boxing doubles into BindableObject.SetValue for X/Y/Width/Height. This will be fixed by generic BindableProperty<T> (#34080).
Raw BenchmarkDotNet output: baseline (net11.0)
Raw BenchmarkDotNet output — optimized (this PR)
Note on GridLayoutManagerBenchMarker (existing, NSubstitute-based)
The existing GridLayoutManagerBenchMarker uses NSubstitute mocks. After struct conversions, each _grid[n] indexer call on a mocked IGridLayout allocates NSubstitute tracking objects. This is a benchmark artifact, not a real regression. The LayoutAllocBenchmarker with lightweight fakes confirms the optimizations work correctly with real objects.
Test Status
All 441 existing tests pass (394 Core layout + 47 Controls layout).
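For readers unfamiliar with the copy-mutation pitfall mentioned under "Definition class → struct" and the ref Cell change, a minimal sketch of the three access patterns follows. DemoDef and CopyMutationDemo are hypothetical; only the C# value-type semantics are relied on.

```csharp
using System;

// Hypothetical mutable struct standing in for Definition/Cell.
struct DemoDef
{
    public double Size;
}

static class CopyMutationDemo
{
    // Bug pattern: the local is a copy, so the array element is unchanged.
    public static void SetSizeOnCopy(DemoDef[] defs, int n, double newSize)
    {
        var def = defs[n];
        def.Size = newSize;
    }

    // Fix: array element access yields the element itself, not a copy.
    public static void SetSizeInPlace(DemoDef[] defs, int n, double newSize)
    {
        defs[n].Size = newSize;
    }

    // Alternative: pass the element by reference to a helper.
    public static void SetSizeViaRef(ref DemoDef def, double newSize)
    {
        def.Size = newSize;
    }

    public static void Main()
    {
        var defs = new DemoDef[1];

        SetSizeOnCopy(defs, 0, 3);
        Console.WriteLine(defs[0].Size); // prints 0

        SetSizeInPlace(defs, 0, 5);
        Console.WriteLine(defs[0].Size); // prints 5

        SetSizeViaRef(ref defs[0], 7);
        Console.WriteLine(defs[0].Size); // prints 7
    }
}
```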