Cap distributed directory ownership transfer batch size #10047
Merged
ReubenBond merged 7 commits on May 2, 2026
Pull request overview
Fixes distributed grain directory snapshot transfers so that range pulls are scoped to the intersection of (a) the range being acquired and (b) the range owned by each previous owner, avoiding out-of-range snapshot requests during directory migration.
Changes:
- Expand DEBUG assertions to account for full-range membership cases during range transitions.
- Adjust range-acquisition snapshot pulls to request only previousOwnerRange ∩ addedRange (including wrapped-range intersections).
- Update trace/warning log message text to reflect singular “range” wording.
Show a summary per file
| File | Description |
|---|---|
| src/Orleans.Runtime/GrainDirectory/GrainDirectoryPartition.cs | Restricts snapshot transfer queries to the prior owner’s actual owned subrange intersecting the acquired range; minor assertion/log message adjustments. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 2
Rebuilt this as the first green merge candidate. It now contains the coupled lower-layer fixes needed for CI to pass from
The rolling-upgrade regression test was moved to the later feature PR layer because it depends on the follow-up resilience commits there.
Query each previous owner only for the intersection of its prior range and the range being acquired during distributed directory migration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wait on a stable ownership view before serving register, lookup, and deregister requests, and stamp registrations with the captured membership version. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ensure directory membership snapshots store ranges by partition index instead of sorted ring order so ownership transfer targets the correct partition. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert the partition range regression to sample generated directory membership snapshots via CsCheck. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow the internal snapshot test constructor to use a generated partition count and update property tests to assert ranges using that count. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep directory partition indexes tied to the original hash-code array order instead of assigning them after sorting ring boundaries. Strengthen the property test to compute expected partition ranges from the generated hashes so this mapping is independently verified across variable partition counts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
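The partition-index commits above can be illustrated with a small model. This is a sketch under stated assumptions, not the Orleans implementation: the ring is assumed to have 2^32 positions, each partition is assumed to own the arc from its own hash to the next boundary clockwise, hashes are assumed distinct, and `ranges_by_partition_index` is a hypothetical name. The point it shows is that sorting is used only to find ring neighbours, while each range is stored at the partition's original index.

```python
from typing import List, Tuple

RING = 2**32  # assumed ring size (32-bit hash codes)

Arc = Tuple[int, int]  # (start, length) on the ring; hypothetical representation


def ranges_by_partition_index(hashes: List[int]) -> List[Arc]:
    """Return one arc per partition, stored at the same index as its hash.

    Assumes distinct hashes and that partition i owns the arc from
    hashes[i] up to the next boundary clockwise.
    """
    # Sort (hash, original index) pairs to find neighbours, keeping the index.
    boundaries = sorted((h % RING, i) for i, h in enumerate(hashes))
    ranges: List[Arc] = [(0, 0)] * len(hashes)
    for pos, (start, index) in enumerate(boundaries):
        next_start = boundaries[(pos + 1) % len(boundaries)][0]
        length = (next_start - start) % RING
        if len(boundaries) == 1:
            length = RING  # a single partition owns the whole ring
        # Key the result by the original partition index, not by sorted position.
        ranges[index] = (start, length)
    return ranges
```

With hashes `[3000, 1000, 2000]`, partition 0 gets the arc starting at 3000 even though it is last in ring order. Storing by sorted position instead would hand partition 0 the arc starting at 1000, which is the kind of mismatch the commits describe.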
Summary
Stabilizes distributed grain directory ownership transfer so this can be the first merge candidate in the directory PR sequence.
This PR is now limited to the foundational ownership-transfer fixes:
Details
When a partition acquires a range, that range can span multiple previous owners. The acquiring partition now requests snapshots only for the intersection between the acquired range and each previous owner's actual range, instead of asking any previous owner for unrelated parts of the ring.
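A minimal sketch of the intersection step, assuming a 2^32-position ring and a hypothetical `(start, length)` arc representation (the actual range type in Orleans may differ). Arcs that wrap past the end of the ring are split into linear segments before intersecting, so a wrapped acquired range can yield two segments for one previous owner. `plan_snapshot_pulls` and the silo names are illustrative only.

```python
from typing import Dict, List, Tuple

RING = 2**32  # assumed ring size (32-bit hash codes)

Arc = Tuple[int, int]      # (start, length) on the ring
Segment = Tuple[int, int]  # half-open linear interval [lo, hi), 0 <= lo < hi <= RING


def unwrap(start: int, length: int) -> List[Segment]:
    """Split a ring arc into at most two non-wrapping segments."""
    if length <= 0:
        return []
    if length >= RING:
        return [(0, RING)]
    end = start + length
    if end <= RING:
        return [(start, end)]
    return [(0, end - RING), (start, RING)]


def intersect(a: Arc, b: Arc) -> List[Segment]:
    """Intersect two ring arcs; the result may contain zero, one, or two segments."""
    out = []
    for lo1, hi1 in unwrap(*a):
        for lo2, hi2 in unwrap(*b):
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)


def plan_snapshot_pulls(added: Arc, previous_owners: Dict[str, Arc]) -> Dict[str, List[Segment]]:
    """Map each previous owner to the part of its old arc that lies in `added`."""
    plan = {}
    for owner, previous in previous_owners.items():
        segments = intersect(previous, added)
        if segments:  # owners with no overlap are not contacted at all
            plan[owner] = segments
    return plan
```

For an acquired arc `(RING - 100, 200)`, which wraps through zero, an owner whose old arc was `(RING - 50, 100)` is asked for two segments, while an owner whose old arc does not overlap the acquired arc is left out of the plan.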
Directory register/lookup/deregister operations now wait for a stable ownership view before deciding whether the local partition can serve a request. This prevents a request carrying stale membership from racing with a newer in-flight range transfer and being served by a partition that is no longer the correct owner.
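The waiting behaviour can be modelled with a condition variable. The sketch below is a toy, with every name (`Partition`, `begin_transfer`, `complete_transfer`, `register`) hypothetical. It assumes a view is "stable" once the local membership version has caught up with the caller's and no range transfer is in flight, which is a simplification of whatever the runtime actually tracks.

```python
import asyncio
from typing import Callable, Dict, Optional, Tuple


class Partition:
    """Toy directory partition; create it inside a running event loop."""

    def __init__(self, owns: Callable[[int], bool]) -> None:
        self.version = 0              # membership version of the current view
        self.transfers_in_flight = 0  # range transfers not yet completed
        self.owns = owns              # ownership predicate for the current view
        self.entries: Dict[int, Tuple[str, int]] = {}
        self._changed = asyncio.Condition()

    async def begin_transfer(self, new_version: int, owns: Callable[[int], bool]) -> None:
        async with self._changed:
            self.version, self.owns = new_version, owns
            self.transfers_in_flight += 1
            self._changed.notify_all()

    async def complete_transfer(self) -> None:
        async with self._changed:
            self.transfers_in_flight -= 1
            self._changed.notify_all()

    async def _stable_view(self, min_version: int) -> int:
        """Wait until the view is at least `min_version` and no transfer is running."""
        async with self._changed:
            await self._changed.wait_for(
                lambda: self.version >= min_version and self.transfers_in_flight == 0
            )
            return self.version

    async def register(self, key: int, value: str, caller_version: int) -> Optional[Tuple[str, int]]:
        version = await self._stable_view(caller_version)
        # No await between the wait and this check, so the view cannot change here.
        if not self.owns(key):
            return None  # not the owner in the stable view; the caller must re-route
        # Stamp the registration with the captured membership version.
        return self.entries.setdefault(key, (value, version))


async def demo():
    p = Partition(owns=lambda key: key < 100)
    await p.begin_transfer(1, owns=lambda key: key < 50)  # owned range shrinks in view 1
    stale = asyncio.create_task(p.register(75, "silo-a", caller_version=0))
    fresh = asyncio.create_task(p.register(10, "silo-a", caller_version=1))
    await asyncio.sleep(0)  # let both requests reach the wait
    assert not stale.done() and not fresh.done()
    await p.complete_transfer()
    return await stale, await fresh
```

In `demo`, both requests block while the transfer is in flight. Once it completes, the request for key 75 is refused because the partition no longer owns that key, and the request for key 10 is stored with version 1.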
Changes
- previousOwnerRange and addedRange.
- RegisterAsync, LookupAsync, and DeregisterAsync responses.
Stack notes
- IRemoteGrainDirectory compatibility implementation.
Focused branch ranges:
- main..fix/directory-snapshot-transfer-ranges
- fix/directory-snapshot-transfer-ranges..feature/distributed-remote-grain-directory
- feature/distributed-remote-grain-directory..fix/directory-transfer-payload-batching
- fix/directory-transfer-payload-batching..fix/directory-migration-regression-test
Validation
Focused validation run locally:
- `dotnet test test\Orleans.GrainDirectory.Tests\Orleans.GrainDirectory.Tests.csproj --framework net10.0 --filter "FullyQualifiedName~GrainDirectoryPartitionTests" -- -parallel none -noshadow`
- `dotnet test test\Orleans.GrainDirectory.Tests\Orleans.GrainDirectory.Tests.csproj --framework net10.0 --filter "FullyQualifiedName~GrainDirectoryRollingUpgradeTests|FullyQualifiedName~GrainDirectoryPartitionBatchingTests|FullyQualifiedName~GrainDirectoryResilienceTests.JoiningSilo_DoesNotLeaveStaleEntriesOnPreviousOwner" -- -parallel none -noshadow`

GrainDirectoryResilienceTests.ElasticChaos was also attempted as part of a broader top-stack run and timed out after several minutes; the narrower non-chaos coverage above passed.