
Fix #4727: release the message-batch semaphore on the double-checked-lock fast path (composite rebuild deadlock) #4728

Merged
jeremydmiller merged 2 commits into JasperFx:master from erdtsieck:bugfix/4727-message-batch-semaphore-leak
Jun 12, 2026

@erdtsieck

Contributor

Fixes #4727 (and the residual deadlock behind the closed #4721).

Problem

ProjectionUpdateBatch.CurrentMessageBatch leaks its SemaphoreSlim. After await _semaphore.WaitAsync(...), the inner double-checked if (_batch != null) return _batch; sits outside the try/finally, so the second concurrent caller to acquire the semaphore returns without ever calling _semaphore.Release().

During an optimized composite rebuild whose stage emits side-effect messages, the parallel event slices all call CurrentMessageBatch via ProjectionBatch.PublishMessageAsync. They queue on the semaphore: the first creates the batch and releases, the second hits the inner early return and never releases. Every remaining queued slice is then parked on WaitAsync forever and the rebuild freezes (idle, no error, no DB query). The hang was captured via a managed dump in #4727.
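
A minimal sketch of the leak described above, not Marten's actual source. `LeakyBatchHolder`, `LeakDemo`, the `Func<Task<object>>` factory and the outer lock-free check are all hypothetical stand-ins chosen for illustration; only the shape of the inner early return outside `try`/`finally` comes from this description. A `TaskCompletionSource` gate keeps the first caller inside batch creation so the other two reliably queue on `WaitAsync`.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the pre-fix shape; names are illustrative only.
public sealed class LeakyBatchHolder
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);
    private readonly Func<Task<object>> _createBatch;
    private object? _batch;

    public LeakyBatchHolder(Func<Task<object>> createBatch) => _createBatch = createBatch;

    public int AvailablePermits => _semaphore.CurrentCount;

    public async Task<object> CurrentBatchAsync()
    {
        // Assumed outer check: skip the semaphore once the batch exists.
        if (_batch != null) return _batch;

        await _semaphore.WaitAsync();

        // BUG: this return is outside try/finally, so the permit is never released.
        if (_batch != null) return _batch;

        try
        {
            _batch = await _createBatch();
            return _batch;
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

public static class LeakDemo
{
    public static async Task<(bool ThirdCallerCompleted, int AvailablePermits)> RunAsync()
    {
        var gate = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
        var holder = new LeakyBatchHolder(() => gate.Task);

        var first = holder.CurrentBatchAsync();  // takes the permit, waits on the gate
        var second = holder.CurrentBatchAsync(); // queued on WaitAsync
        var third = holder.CurrentBatchAsync();  // queued on WaitAsync

        gate.SetResult(new object());
        await first;   // creates the batch and releases
        await second;  // takes the inner early return and keeps the permit

        // The third caller can never acquire the semaphore, so the delay wins.
        var winner = await Task.WhenAny(third, Task.Delay(TimeSpan.FromSeconds(1)));
        return (winner == third, holder.AvailablePermits);
    }
}
```

Awaiting `LeakDemo.RunAsync()` from a test or console host should report that the third caller did not complete and that no permits remain, which is the same parked-forever state the queued slices end up in.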

Fix

Move the inner null-check inside the try so finally always releases the semaphore, and drop the Task.Delay(Random.Shared.Next(25, 200)) band-aid that was only masking the race.
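
The corrected shape can be sketched as follows, under the same assumptions as any stand-in: `BatchHolder`, `FixDemo` and the factory delegate are hypothetical names, and the outer lock-free check is assumed rather than taken from the Marten source. The only point being illustrated is that the inner null-check now sits inside `try`, so `finally` releases the permit on every path.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the post-fix shape; names are illustrative only.
public sealed class BatchHolder
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);
    private readonly Func<Task<object>> _createBatch;
    private object? _batch;

    public BatchHolder(Func<Task<object>> createBatch) => _createBatch = createBatch;

    public int AvailablePermits => _semaphore.CurrentCount;

    public async Task<object> CurrentBatchAsync()
    {
        // Assumed outer check: skip the semaphore once the batch exists.
        if (_batch != null) return _batch;

        await _semaphore.WaitAsync();
        try
        {
            // Inner check is inside try, so the early return still runs finally.
            if (_batch != null) return _batch;

            _batch = await _createBatch();
            return _batch;
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

public static class FixDemo
{
    public static async Task<(bool AllCompleted, bool SingleBatch, int CreateCalls, int AvailablePermits)>
        RunAsync(int callers = 8)
    {
        var gate = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
        var createCalls = 0;
        var holder = new BatchHolder(() =>
        {
            Interlocked.Increment(ref createCalls);
            return gate.Task;
        });

        // Start every caller before opening the gate so they pile up on the semaphore.
        var tasks = Enumerable.Range(0, callers).Select(_ => holder.CurrentBatchAsync()).ToArray();
        gate.SetResult(new object());

        // Guard with a timeout so a regression reports failure instead of hanging.
        var all = Task.WhenAll(tasks);
        var winner = await Task.WhenAny(all, Task.Delay(TimeSpan.FromSeconds(5)));
        if (winner != all) return (false, false, createCalls, holder.AvailablePermits);

        var batches = await all;
        var single = batches.All(b => ReferenceEquals(b, batches[0]));
        return (true, single, createCalls, holder.AvailablePermits);
    }
}
```

With the gate in place no artificial delay is needed to make the callers overlap: every caller completes, all of them observe the same batch instance, the factory runs once, and the semaphore ends with its single permit available again.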

Test

DaemonTests/Bugs/Bug_4727_message_batch_semaphore_leak.cs — a gated IMessageOutbox holds the semaphore while the first batch is created so N concurrent CurrentMessageBatch callers reliably pile up on it. The test deadlocks (15s timeout) on the previous code and passes in ~1s with this fix; all callers observe the single shared batch.

Found in production on a sharded + tenant-partitioned store (Marten 9.7.3 / JasperFx 2.9.9 / Wolverine 6.8.0): an invoices composite rebuild whose stage-2 members publish side-effect messages froze at a batch boundary on every deploy.

…checked-lock fast path

ProjectionUpdateBatch.CurrentMessageBatch acquired _semaphore and then returned the
already-created batch from the inner double-checked `if (_batch != null) return _batch;`
which sat OUTSIDE the try/finally, so the second concurrent caller to win the semaphore
returned without releasing it. During an optimized composite rebuild whose stage emits
side-effect messages, the parallel event slices all call CurrentMessageBatch (via
ProjectionBatch.PublishMessageAsync); the leaked semaphore then deadlocks every queued
slice, freezing the rebuild forever (idle, no error, no query) - the symptom reported
in JasperFx#4727 and originally JasperFx#4721.

Move the inner null-check inside the try so finally always releases the semaphore, and
drop the random Task.Delay race band-aid that was masking the leak.

Adds DaemonTests/Bugs/Bug_4727_message_batch_semaphore_leak.cs: a gated IMessageOutbox
holds the semaphore while the first batch is created so N concurrent CurrentMessageBatch
callers pile up on it. The test deadlocks (15s timeout) on the previous code and passes
in ~1s with this fix.

…ed + tenant-partitioned tenancy (JasperFx#4727)

Regression for the full production configuration that exposed the CurrentMessageBatch
semaphore deadlock: MultiTenantedWithShardedDatabases + TenancyStyle.Conjoined +
UseTenantPartitionedEvents + a multi-stage CompositeProjectionFor whose stage-2 member
publishes side-effect messages (RaiseSideEffects -> slice.PublishMessage), driven through an
optimized composite rebuild.

The optimized rebuild runs in ShardExecutionMode.Continuous, so the stage-2 side effects fire
and the parallel event slices contend on ProjectionUpdateBatch.CurrentMessageBatch. On the
pre-fix code the rebuild deadlocks and never completes (the test hangs); with the semaphore
fix it completes in ~1s and every tenant's documents on the multi-tenant shard materialize.

@jeremydmiller merged commit 59695e4 into JasperFx:master on Jun 12, 2026
8 checks passed