Cap distributed directory ownership transfer batch size #10047

Merged
ReubenBond merged 7 commits into dotnet:main from ReubenBond:fix/directory-snapshot-transfer-ranges
May 2, 2026
@ReubenBond ReubenBond commented Apr 28, 2026

Member

Summary

Stabilizes distributed grain directory ownership transfer so this can be the first merge candidate in the directory PR sequence.

This PR is now limited to the foundational ownership-transfer fixes:

  • correct snapshot handoff range selection
  • stable ownership checks while partition transfers are in flight

Details

When a partition acquires a range, that range can span multiple previous owners. The acquiring partition now requests snapshots only for the intersection between the acquired range and each previous owner's actual range, instead of asking any previous owner for unrelated parts of the ring.
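The intersection rule above can be sketched in a few lines. This is an illustrative Python sketch, not the Orleans implementation: the names `to_segments`, `intersect`, and `plan_snapshot_requests` are hypothetical, and it assumes a 32-bit hash ring with ranges written as `(start, end]`, where `start > end` means the range wraps past the top of the ring. A range with `start == end` is treated as empty here purely to keep the sketch short.

```python
RING_SIZE = 2**32  # assumption: 32-bit hash ring


def to_segments(start, end):
    """Split the ring range (start, end] into non-wrapping half-open [lo, hi) segments."""
    if start == end:
        return []  # treated as empty in this sketch (an assumption, see lead-in)
    if start < end:
        return [(start + 1, end + 1)]
    # The range wraps around the top of the ring: split it in two.
    segments = [(start + 1, RING_SIZE), (0, end + 1)]
    return [(lo, hi) for lo, hi in segments if lo < hi]


def intersect(a, b):
    """Return the overlap of ring ranges a and b as sorted [lo, hi) segments."""
    result = []
    for a_lo, a_hi in to_segments(*a):
        for b_lo, b_hi in to_segments(*b):
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo < hi:
                result.append((lo, hi))
    return sorted(result)


def plan_snapshot_requests(added_range, previous_owners):
    """Map each previous owner to the part of added_range it actually owned.

    Owners with no overlap are omitted, so they receive no snapshot request.
    """
    plan = {}
    for owner, owner_range in previous_owners.items():
        overlap = intersect(owner_range, added_range)
        if overlap:
            plan[owner] = overlap
    return plan
```

With three hypothetical previous owners covering `(0, 200]`, `(200, 400]`, and `(400, 0]`, acquiring `(150, 300]` yields requests to the first two owners only, each limited to its own overlap with the acquired range.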

Directory register/lookup/deregister operations now wait against a stable ownership view before deciding whether the local partition can serve a request. This avoids stale-membership requests racing with a newer in-flight range transfer and being served by a partition that is no longer the correct owner.
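The wait-for-stable-ownership pattern described above can be illustrated with a small asyncio sketch. Everything here is hypothetical (`PartitionView`, `begin_transfer`, `complete_transfer`, and the `"not-owner"` status are invented for illustration) and it assumes ownership can be modelled as a set of keys; the real partition tracks ring ranges and membership versions.

```python
import asyncio


class PartitionView:
    """Hypothetical model of a partition whose ownership may be mid-transfer."""

    def __init__(self, owned):
        self._owned = set(owned)
        self._stable = asyncio.Event()
        self._stable.set()  # no transfer in flight initially

    def begin_transfer(self):
        # Ownership is about to change: requests must not be served yet.
        self._stable.clear()

    def complete_transfer(self, owned):
        # Publish the new ownership, then release any waiting requests.
        self._owned = set(owned)
        self._stable.set()

    async def lookup(self, key, entries):
        # Wait for a stable view BEFORE deciding whether we own the key.
        await self._stable.wait()
        if key not in self._owned:
            return ("not-owner", None)
        return ("ok", entries.get(key))


async def demo():
    view = PartitionView(owned={"a", "b"})
    entries = {"a": "silo-1", "b": "silo-1"}
    view.begin_transfer()
    pending = asyncio.create_task(view.lookup("b", entries))
    await asyncio.sleep(0)  # let the lookup start and block on the transfer
    was_waiting = not pending.done()
    view.complete_transfer(owned={"a"})  # "b" moved to another partition
    return was_waiting, await pending


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the lookup for `"b"` is held until the transfer completes and is then rejected rather than answered from the old ownership view, which is the race the description says the change avoids.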

Changes

  • Request snapshot transfer only for the intersection of previousOwnerRange and addedRange.
  • Add targeted coverage for snapshot transfer range intersections.
  • Wait for stable current ownership before partition RegisterAsync, LookupAsync, and DeregisterAsync responses.

Stack notes

Focused branch ranges:

Validation

Focused validation run locally:

  • dotnet test test\Orleans.GrainDirectory.Tests\Orleans.GrainDirectory.Tests.csproj --framework net10.0 --filter "FullyQualifiedName~GrainDirectoryPartitionTests" -- -parallel none -noshadow
  • dotnet test test\Orleans.GrainDirectory.Tests\Orleans.GrainDirectory.Tests.csproj --framework net10.0 --filter "FullyQualifiedName~GrainDirectoryRollingUpgradeTests|FullyQualifiedName~GrainDirectoryPartitionBatchingTests|FullyQualifiedName~GrainDirectoryResilienceTests.JoiningSilo_DoesNotLeaveStaleEntriesOnPreviousOwner" -- -parallel none -noshadow

GrainDirectoryResilienceTests.ElasticChaos was also attempted as part of a broader top-stack run and timed out after several minutes; the narrower non-chaos coverage above passed.

Copilot AI left a comment

Contributor

Pull request overview

Fixes distributed grain directory snapshot transfers so that range pulls are scoped to the intersection of (a) the range being acquired and (b) the range owned by each previous owner, avoiding out-of-range snapshot requests during directory migration.

Changes:

  • Expand DEBUG assertions to account for full-range membership cases during range transitions.
  • Adjust range-acquisition snapshot pulls to request only previousOwnerRange ∩ addedRange (including wrapped-range intersections).
  • Update trace/warning log message text to reflect singular “range” wording.

Summary per file

File: src/Orleans.Runtime/GrainDirectory/GrainDirectoryPartition.cs
Description: Restricts snapshot transfer queries to the prior owner’s actual owned subrange intersecting the acquired range; minor assertion/log message adjustments.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 2

Comment thread src/Orleans.Runtime/GrainDirectory/GrainDirectoryPartition.cs Outdated
Comment thread src/Orleans.Runtime/GrainDirectory/GrainDirectoryPartition.cs Outdated
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch 2 times, most recently from 46e81cf to f38fbdd Compare April 28, 2026 23:13
@ReubenBond ReubenBond force-pushed the feature/directory-migration-base branch from b4c02ea to f40de9a Compare April 28, 2026 23:13
@ReubenBond ReubenBond changed the base branch from feature/directory-migration-base to main April 28, 2026 23:13
@ReubenBond ReubenBond changed the base branch from main to fix/directory-transfer-payload-batching April 29, 2026 00:12
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch from f38fbdd to 4ff2582 Compare April 29, 2026 14:51
@ReubenBond ReubenBond force-pushed the fix/directory-transfer-payload-batching branch from f40de9a to 596a6a3 Compare April 29, 2026 14:51
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch from 4ff2582 to 4993770 Compare April 29, 2026 15:09
@ReubenBond ReubenBond force-pushed the fix/directory-transfer-payload-batching branch from 596a6a3 to 8226bb5 Compare April 29, 2026 15:10
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch from 4993770 to 8f24b05 Compare April 29, 2026 18:23
@ReubenBond ReubenBond force-pushed the fix/directory-transfer-payload-batching branch from 8226bb5 to 1d60be0 Compare April 29, 2026 18:26
@ReubenBond ReubenBond changed the base branch from fix/directory-transfer-payload-batching to main April 29, 2026 18:29
@ReubenBond ReubenBond closed this Apr 29, 2026
@ReubenBond ReubenBond reopened this Apr 29, 2026
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch 3 times, most recently from 0a87ad8 to 5166942 Compare April 29, 2026 21:01
@ReubenBond ReubenBond changed the title from "Fix directory snapshot transfer ranges" to "Stabilize distributed directory ownership transfer" Apr 29, 2026
@ReubenBond

Member Author

Rebuilt this as the first green merge candidate. It now contains the coupled lower-layer fixes needed for CI to pass from main:

The rolling-upgrade regression test was moved to the later feature PR layer because it depends on the follow-up resilience commits there.

@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch from 5166942 to 13a7c57 Compare April 29, 2026 23:39
@ReubenBond ReubenBond changed the title from "Stabilize distributed directory ownership transfer" to "Cap distributed directory ownership transfer batch size" Apr 29, 2026
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch from 13a7c57 to f0bdb34 Compare April 29, 2026 23:51
@ReubenBond ReubenBond enabled auto-merge April 29, 2026 23:51
@ReubenBond ReubenBond added this pull request to the merge queue Apr 30, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 30, 2026
@ReubenBond ReubenBond added this pull request to the merge queue May 1, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 1, 2026
ReubenBond and others added 7 commits May 2, 2026 08:26
Query each previous owner only for the intersection of its prior range and the range being acquired during distributed directory migration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wait on a stable ownership view before serving register, lookup, and deregister requests, and stamp registrations with the captured membership version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ensure directory membership snapshots store ranges by partition index instead of sorted ring order so ownership transfer targets the correct partition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert the partition range regression to sample generated directory membership snapshots via CsCheck.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow the internal snapshot test constructor to use a generated partition count and update property tests to assert ranges using that count.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep directory partition indexes tied to the original hash-code array order instead of assigning them after sorting ring boundaries. Strengthen the property test to compute expected partition ranges from the generated hashes so this mapping is independently verified across variable partition counts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
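
The partition-index commits above describe keeping each range tied to the position of its hash code in the original array rather than to its position after sorting. A minimal Python sketch of that idea follows. The function name `ranges_by_partition_index` is hypothetical, and the sketch assumes distinct hash codes and that each partition owns `(predecessor hash, own hash]` on the ring; the actual direction and bounds used by the directory may differ.

```python
def ranges_by_partition_index(hashes):
    """Return ranges indexed by ORIGINAL partition index, not sorted ring order.

    hashes[i] is the ring position for partition i (assumed distinct).
    Partition i is assumed to own (predecessor, hashes[i]].
    """
    # Sort partition indexes by ring position, remembering the original index.
    order = sorted(range(len(hashes)), key=lambda i: hashes[i])
    ranges = [None] * len(hashes)
    for pos, idx in enumerate(order):
        prev_idx = order[pos - 1]  # pos 0 wraps to the last boundary on the ring
        # Store under the original index so ranges[i] belongs to partition i.
        ranges[idx] = (hashes[prev_idx], hashes[idx])
    return ranges
```

For `hashes = [300, 100, 200]`, partition 0 keeps the range ending at 300 even though it sorts last. Writing the ranges out in sorted order instead would hand partition 0 the wrapped range ending at 100, which is the kind of mismatch the commit message says caused ownership transfer to target the wrong partition.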
@ReubenBond ReubenBond force-pushed the fix/directory-snapshot-transfer-ranges branch from d21f877 to 8f925e2 Compare May 2, 2026 15:26
@ReubenBond ReubenBond added this pull request to the merge queue May 2, 2026
Merged via the queue into dotnet:main with commit 6705d86 May 2, 2026
62 checks passed
@ReubenBond ReubenBond deleted the fix/directory-snapshot-transfer-ranges branch May 2, 2026 21:50
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 2, 2026
2 participants