Skip to content

Fix NRE from null-Sender routes built during description mode (GH-2897)#2899

Merged
jeremydmiller merged 1 commit into
mainfrom
fix-2897-null-sender-routes
May 25, 2026
Merged

Fix NRE from null-Sender routes built during description mode (GH-2897)#2899
jeremydmiller merged 1 commit into
mainfrom
fix-2897-null-sender-routes

Conversation

@jeremydmiller

Copy link
Copy Markdown
Member

Fixes #2897.

Problem

With durable policies active (UseDurableLocalQueues / UseDurableOutboxOnAllSendingEndpoints / UseDurableInboxOnAllListeners), hosts can NRE on startup:

System.NullReferenceException
   at Wolverine.Envelope..ctor(...)          // agent.Endpoint.DefaultSerializer, agent == null
   at Wolverine.Runtime.Routing.MessageRoute.CreateForSending(...)

MessageRoute.Sender is a get-only property set once in the ctor. While WolverineSystemPart.WithinDescription is true, the ctor takes Sender = endpoint.Agent! — and the agent can be null (it isn't built until transports start, which is after FindResources()/resource-setup-on-startup). If that null-Sender route gets cached and reused for an actual send, CreateForSendingnew Envelope(message, Sender) dereferences agent.Endpoint and throws.

Why it's a 6.0-line regression (worked in v5): v5 built routes lazily on first send — always with a live agent, so Sender was never null. 6.0 added eager PrepopulateRoutingCache at startup (#2769 / #2794) alongside the description-phase mechanism, which created the window. FindResources() already tries to compensate by clearing those routes (ClearRoutingFor), but that manual dance is order/concurrency-fragile and the bug slips past it under test/ASP.NET host startup (the reporter sees it only under Alba; the workaround is to disable the durable policies in the test host).

Fix (three layers)

  1. (primary) Don't cache description-mode routesRoutingFor skips the cache write while WithinDescription is true, so a null-Sender route can never poison the runtime router cache. This enforces the invariant the manual ClearRoutingFor() was groping for, independent of startup ordering.
  2. De-globalize the flagWithinDescription is now AsyncLocal-scoped rather than a process-global mutable static, so a description pass on one task/host can't bleed its "null Sender is OK" state across an await into concurrent route building on another flow or another host in the same process. WriteToConsole now resets it in a finally (it previously leaked).
  3. (defense-in-depth) Resilient CreateForSending — if a route's Sender is somehow null, resolve the live agent on demand via GetOrBuildSendingAgent instead of NRE'ing in the Envelope ctor.

Tests

Adds CoreTests/Runtime/Routing/description_mode_routes_are_not_cached.cs asserting that routing requested under WithinDescription is not cached (each lookup rebuilds) while normal routing is cached. Verified it fails before fix #1 and passes after. Existing WithinDescription/routing tests (global_partitioning, WolverineDiagnosticsCommand, system_message_type_filtering) all still pass.

dotnet build wolverine.slnx -c Release is clean (0/0); routing/describe/bugs slice green.

Note on reproduction: I confirmed the exact crash chain and the precondition, but could not deterministically reproduce the trigger in a minimal generic host (the FindResources clear self-heals there) — it needs the reporter's Alba/ASP.NET host wiring. These fixes close the null-Sender class structurally regardless of the exact startup ordering, so they don't depend on reproducing that specific window.

🤖 Generated with Claude Code

A MessageRoute built while WolverineSystemPart.WithinDescription is true may take a
null Sender (endpoint.Agent!), because the sending agent isn't built yet during
FindResources()/resource-setup-on-startup (before transports start). If such a route
escapes onto the send path it NREs in CreateForSending -> new Envelope(message, Sender)
(the Envelope ctor dereferences agent.Endpoint.DefaultSerializer). This is a 6.0-line
regression: eager PrepopulateRoutingCache (#2769/#2794) + the description/clear dance
created a window that 5.x's lazy routing never had.

Three layers:

1. (primary) RoutingFor no longer caches routes built while WithinDescription is true,
   so a null-Sender description route can never poison the runtime router cache. This
   enforces the invariant the manual ClearRoutingFor() in FindResources was groping for.

2. WolverineSystemPart.WithinDescription is now AsyncLocal-scoped instead of a process-
   global mutable static, so a description pass on one task/host can't bleed its
   "null Sender is OK" state across an await into concurrent route building elsewhere.
   WriteToConsole now resets the flag in a finally.

3. (defense-in-depth) CreateForSending resolves the live sending agent on demand if a
   route's Sender is somehow null, instead of NRE'ing in the Envelope ctor.

Adds a CoreTests regression test asserting description-mode routing is not cached
(fails before fix #1, passes after). Full wolverine.slnx builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller merged commit 318f0e1 into main May 25, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant