Fix NRE from null-Sender routes built during description mode (GH-2897)#2899
Merged
Conversation
A MessageRoute built while WolverineSystemPart.WithinDescription is true may take a null Sender (endpoint.Agent!), because the sending agent isn't built yet during FindResources()/resource-setup-on-startup (before transports start). If such a route escapes onto the send path it NREs in CreateForSending -> new Envelope(message, Sender) (the Envelope ctor dereferences agent.Endpoint.DefaultSerializer). This is a 6.0-line regression: eager PrepopulateRoutingCache (#2769/#2794) + the description/clear dance created a window that 5.x's lazy routing never had. Three layers: 1. (primary) RoutingFor no longer caches routes built while WithinDescription is true, so a null-Sender description route can never poison the runtime router cache. This enforces the invariant the manual ClearRoutingFor() in FindResources was groping for. 2. WolverineSystemPart.WithinDescription is now AsyncLocal-scoped instead of a process- global mutable static, so a description pass on one task/host can't bleed its "null Sender is OK" state across an await into concurrent route building elsewhere. WriteToConsole now resets the flag in a finally. 3. (defense-in-depth) CreateForSending resolves the live sending agent on demand if a route's Sender is somehow null, instead of NRE'ing in the Envelope ctor. Adds a CoreTests regression test asserting description-mode routing is not cached (fails before fix #1, passes after). Full wolverine.slnx builds clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #2897.
Problem
With durable policies active (
UseDurableLocalQueues/UseDurableOutboxOnAllSendingEndpoints/UseDurableInboxOnAllListeners), hosts can NRE on startup:MessageRoute.Senderis a get-only property set once in the ctor. WhileWolverineSystemPart.WithinDescriptionistrue, the ctor takesSender = endpoint.Agent!— and the agent can be null (it isn't built until transports start, which is afterFindResources()/resource-setup-on-startup). If that null-Senderroute gets cached and reused for an actual send,CreateForSending→new Envelope(message, Sender)dereferencesagent.Endpointand throws.Why it's a 6.0-line regression (worked in v5): v5 built routes lazily on first send — always with a live agent, so
Senderwas never null. 6.0 added eagerPrepopulateRoutingCacheat startup (#2769 / #2794) alongside the description-phase mechanism, which created the window.FindResources()already tries to compensate by clearing those routes (ClearRoutingFor), but that manual dance is order/concurrency-fragile and the bug slips past it under test/ASP.NET host startup (the reporter sees it only under Alba; the workaround is to disable the durable policies in the test host).Fix (three layers)
RoutingForskips the cache write whileWithinDescriptionis true, so a null-Senderroute can never poison the runtime router cache. This enforces the invariant the manualClearRoutingFor()was groping for, independent of startup ordering.WithinDescriptionis nowAsyncLocal-scoped rather than a process-global mutable static, so a description pass on one task/host can't bleed its "null Sender is OK" state across anawaitinto concurrent route building on another flow or another host in the same process.WriteToConsolenow resets it in afinally(it previously leaked).CreateForSending— if a route'sSenderis somehow null, resolve the live agent on demand viaGetOrBuildSendingAgentinstead of NRE'ing in theEnvelopector.Tests
Adds
CoreTests/Runtime/Routing/description_mode_routes_are_not_cached.csasserting that routing requested underWithinDescriptionis not cached (each lookup rebuilds) while normal routing is cached. Verified it fails before fix #1 and passes after. ExistingWithinDescription/routing tests (global_partitioning,WolverineDiagnosticsCommand,system_message_type_filtering) all still pass.dotnet build wolverine.slnx -c Releaseis clean (0/0); routing/describe/bugs slice green.🤖 Generated with Claude Code