Fix LocalGrainDirectory membership reconciliation #10086
Merged
ReubenBond merged 4 commits on May 12, 2026
Process cluster membership as snapshots in LocalGrainDirectory so directory state can be reconciled and retried after failures. Move silo-removal activation cleanup out of Catalog and keep handoff operations retrying until success, obsolescence, or shutdown. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Suppress expected shutdown failures while stopping membership processing and disposing the directory cache. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove redundant locking and cancellation from snapshot application, publish the latest directory membership before side effects, and simplify defunct entry cleanup against the current membership. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced May 12, 2026
Pull request overview
This PR refactors LocalGrainDirectory to process cluster membership changes as versioned snapshots (instead of one-off silo status events), reconciling local directory/cache state from each snapshot and moving directory-owned activation cleanup out of Catalog and into LocalGrainDirectory. It also adjusts handoff execution behavior to keep retrying while the directory is running, and updates a unit test to accommodate the new IClusterMembershipService dependency.
Changes:
- Replace ISiloStatusOracle event-driven membership updates in LocalGrainDirectory with snapshot-driven reconciliation via IClusterMembershipService.
- Move “directory-owned activation” deactivation logic from Catalog to LocalGrainDirectory when a partition owner transitions to a terminating state.
- Update handoff manager retry mechanics and add guards to reduce stale/incorrect handoff processing; update a test for the new constructor dependency.
Show a summary per file
| File | Description |
|---|---|
| test/Orleans.Core.Tests/Directory/CachedGrainLocatorTests.cs | Updates the LocalGrainDirectory test construction to pass a mocked IClusterMembershipService. |
| src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs | Implements snapshot-based membership processing/reconciliation and relocates activation deactivation logic into the directory. |
| src/Orleans.Runtime/GrainDirectory/GrainDirectoryHandoffManager.cs | Changes handoff execution/retry behavior and adds additional successor checks and filtering logic. |
| src/Orleans.Runtime/Catalog/Catalog.cs | Removes directory-ownership cleanup logic and related dependencies now handled by LocalGrainDirectory. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 2
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Part 1 of 7 split from #10085.
Problem:
LocalGrainDirectory membership changes were applied as one-off events, so partial failures could leave directory/cache state inconsistent. Directory-owned activation cleanup also lived in Catalog, which coupled Catalog to directory ownership details.
Solution:
Process cluster membership as snapshots inside LocalGrainDirectory, reconcile the local directory/cache from each snapshot, move directory-owned activation cleanup into LocalGrainDirectory, and keep handoff work retrying while the directory is running.
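For reviewers looking at the handoff retry semantics, the intended shape (retry until success, obsolescence, or shutdown, per the commit message) can be sketched roughly as follows. This is a minimal sketch under assumptions, not the GrainDirectoryHandoffManager code: `HandoffRetry`, `HandoffOutcome`, the `isObsolete` callback, and the fixed delay are all hypothetical names and choices made for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type for illustration only.
public enum HandoffOutcome { Succeeded, Obsolete, Stopped }

public static class HandoffRetry
{
    // Runs `operation` until it succeeds, `isObsolete` reports that the work
    // no longer applies, or `shutdown` is cancelled.
    public static async Task<HandoffOutcome> RunAsync(
        Func<Task> operation,
        Func<bool> isObsolete,
        TimeSpan delay,
        CancellationToken shutdown)
    {
        while (true)
        {
            if (shutdown.IsCancellationRequested) return HandoffOutcome.Stopped;
            if (isObsolete()) return HandoffOutcome.Obsolete;

            try
            {
                await operation();
                return HandoffOutcome.Succeeded;
            }
            catch (Exception)
            {
                // Failures during shutdown are expected; stop quietly.
                if (shutdown.IsCancellationRequested) return HandoffOutcome.Stopped;
                // Otherwise treat the failure as transient and retry below.
            }

            try
            {
                await Task.Delay(delay, shutdown);
            }
            catch (OperationCanceledException)
            {
                return HandoffOutcome.Stopped;
            }
        }
    }
}
```

The obsolescence check runs before every attempt, so work queued against an older membership view can be dropped instead of retried indefinitely. A real implementation would likely also log failures and back off between attempts.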
Stack:
Merge this PR first. Next: #10087.
Review focus:
Snapshot application ordering, directory/cache reconciliation, activation deactivation behavior, and handoff retry semantics.