Skip to content

[Fix] Shutdown Issues on Rabbit MQ#4047

Merged
iancooper merged 15 commits into
masterfrom
shutdown_rmq
Mar 16, 2026
Merged

[Fix] Shutdown Issues on Rabbit MQ#4047
iancooper merged 15 commits into
masterfrom
shutdown_rmq

Conversation

@iancooper

Copy link
Copy Markdown
Member

Closes #4024 and #3684

This corrects both the issue that we were failing to exit correctly because we did not use DisposeAsync when closing an async channel from the Proactor (this may have caused errors in other messaginggateways), and because implementations of DisposeAsync were blocking, which stalls the Brighter single-threaded synchronization context. This forced users to CTRL+C to exit.

In addition, it fixed a race condition around pool closure.

iancooper and others added 9 commits March 11, 2026 23:12
…sync on ChannelAsync

- Test: When_ChannelAsync_Is_Disposed_Async_Then_Consumer_Is_Disposed
- IAmAChannelAsync now extends IAsyncDisposable
- ChannelAsync.DisposeAsync() awaits _messageConsumer.DisposeAsync()
- All transports inherit this via ChannelAsync (no per-transport changes needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ADR 0054 status: Proposed → Accepted
- Updated Impact on Other Transports section with codebase analysis
- Added tasks.md with 5 phases, 9 tasks
- Approved design and tasks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ispose()

- Test: When_Proactor_Receives_Quit_Should_Dispose_Channel_Async
- All 5 Channel.Dispose() calls in Proactor.EventLoop() changed to await Channel.DisposeAsync()
- Eliminates sync-over-async deadlock in the Proactor shutdown path (#3684)
- The entire dispose chain is now fully async within BrighterAsyncContext

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…4024)

- ShutdownHandler now verifies via ReferenceEquals that the connection
  being shut down is the same instance currently in the pool before
  removing/disposing it. Prevents stale handlers from disposing
  replacement connections.
- RemoveConnection moves ContainsKey check inside the lock, eliminating
  the TOCTOU race condition. TryRemoveConnection already handles missing
  keys safely via TryGetValue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfigureAwait(false) (#4024, #3684)

- ShutdownHandler now verifies via ReferenceEquals that the connection
  being shut down is the same instance currently in the pool before
  removing/disposing it. Prevents stale handlers from disposing
  replacement connections.
- RemoveConnectionAsync moves ContainsKey check inside the semaphore,
  eliminating the TOCTOU race condition.
- Added ConfigureAwait(false) to all 12 await calls as defence-in-depth
  against sync-over-async deadlocks — continuations will run on the
  thread pool rather than capturing BrighterSynchronizationContext.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
542 core tests pass, 73 RMQ integration tests pass (13 pre-existing failures unrelated to changes).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@iancooper iancooper self-assigned this Mar 12, 2026
@iancooper iancooper requested a review from preardon as a code owner March 12, 2026 17:17
@iancooper iancooper added Bug .NET Pull requests that update .net code V10.X labels Mar 12, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Mar 12, 2026

Copy link
Copy Markdown

PR Review: [Fix] Shutdown Issues on Rabbit MQ

This is a well-reasoned, surgical fix for two long-standing bugs. The ADR is exemplary — the root cause analysis for the sync-over-async deadlock is thorough and accurate. Overall the approach is correct and I support merging with a few observations below.


✅ Strengths

  1. Root cause is correctly identified. The deadlock trace (fire-and-forget DisposeAsync()GetAwaiter().GetResult() → continuation posted to the same blocked thread) is accurate and the fix directly addresses it.
  2. Minimal blast radius. Changing IAmAChannelAsync : IAsyncDisposable + adding ChannelAsync.DisposeAsync() + updating all 5 Channel.Dispose() call sites in Proactor.cs is exactly the right scope.
  3. ConfigureAwait(false) additions in the async connection pool are correct and provide good defence-in-depth.
  4. ReferenceEquals guard is the right way to prevent stale shutdown handlers — cleaner than reference counting.
  5. TOCTOU fix in RemoveConnection (moving ContainsKey inside the lock) is correct.
  6. ADR documentation in docs/adr/0054-… is thorough and covers alternatives considered.

🔍 Issues and Observations

1. ChannelAsync.DisposeAsync() lacks ConfigureAwait(false) (minor)

// ChannelAsync.cs — new method
public virtual async ValueTask DisposeAsync()
{
    await _messageConsumer.DisposeAsync();   // ← no ConfigureAwait(false)
    GC.SuppressFinalize(this);
}

ChannelAsync is library code. All the async pool methods in this same PR add .ConfigureAwait(false). This one should too for consistency — and the ADR explicitly lists this as a decision:

await _messageConsumer.DisposeAsync().ConfigureAwait(false);

2. DisposeAsync() doesn't coordinate with the existing sync Dispose() pattern

ChannelAsync already has a Dispose(bool) method (from IDisposable via IAmAChannel). The new DisposeAsync() does not appear to set a _disposed guard, so:

  • Calling DisposeAsync() followed by Dispose() (or vice versa) will double-dispose _messageConsumer.
  • The recommended pattern for classes implementing both IDisposable and IAsyncDisposable is:
public virtual async ValueTask DisposeAsync()
{
    await _messageConsumer.DisposeAsync().ConfigureAwait(false);
    Dispose(false);          // suppress the managed cleanup path, or use a bool _disposed guard
    GC.SuppressFinalize(this);
}

The idempotency test only exercises double-DisposeAsync(). A test for DisposeAsync() then Dispose() (the mixed path) would catch any double-dispose issue.


3. Test AsyncChannelDisposalTests doesn't assert the consumer was disposed

// When_channel_async_dispose_async_should_await_consumer_dispose.cs
// Act
await ((IAsyncDisposable)channel).DisposeAsync();

// Assert — only checks idempotency, not that the consumer was actually disposed
var exception = await Record.ExceptionAsync(async () =>
    await ((IAsyncDisposable)channel).DisposeAsync()
);
Assert.Null(exception);

The test name says "should await consumer dispose" but the assertion only confirms no exception is thrown on a second disposal. There is no assertion that _messageConsumer.DisposeAsync() was actually called during the first disposal. Consider adding a spy/tracking consumer (similar to TrackingChannelAsync) to assert the consumer's DisposeAsync was invoked.


4. #pragma warning disable xUnit1031 in a test file is a warning sign

#pragma warning disable xUnit1031
[Fact]
public void When_Proactor_Receives_Quit_Should_Dispose_Channel_Async()
{
    _performerTask.Wait(TimeSpan.FromSeconds(10));
    ...
}
#pragma warning restore xUnit1031

xUnit1031 warns against calling .Wait() on a Task in a sync test because it can cause deadlocks in some synchronization contexts. The suppression is pragmatic here (the Proactor manages its own BrighterAsyncContext and the .Wait() is on the outer performer task), but this test should ideally be async and use await _performerTask.WaitAsync(TimeSpan.FromSeconds(10)) to keep the pattern consistent with the rest of the test suite.


5. Async setup in the test constructor is fragile

public ProactorQuitAsyncDisposalTests()
{
    ...
    var performer = new Performer(_trackingChannel, messagePump);
    _performerTask = performer.Run();   // starts async work in constructor
    performer.Stop(_routingKey);
}

Starting async work in a constructor is generally fragile because exceptions are lost and the test execution order can be surprising. If performer.Run() is a Task, consider using a [Theory]/[Fact] setup method or a factory pattern. The Thread using directive is also present but doesn't appear to be used — minor cleanup opportunity.


6. No tests for the connection pool fixes

The two new test files cover the ChannelAsync.DisposeAsync() and Proactor paths but nothing covers:

  • The ReferenceEquals guard preventing stale handler disposal
  • The TOCTOU fix in RemoveConnection

These are inherently harder to test deterministically, but even a simple unit test using a mock IConnection to verify the guard logic would be valuable given that these were real production race conditions.


7. .claude/settings.json included in this PR

The diff includes changes to .claude/settings.json enabling several new Claude Code plugins. This is unrelated to the bug fix and should ideally be on a separate commit/PR to keep the change history clean.


Summary

Area Assessment
Core deadlock fix (Proactor + ChannelAsync + IAmAChannelAsync) ✅ Correct
Connection pool async ConfigureAwait(false) ✅ Correct
ReferenceEquals guard (async + sync) ✅ Correct
TOCTOU fix in sync RemoveConnection ✅ Correct
DisposeAsync missing ConfigureAwait(false) ⚠️ Minor — should be fixed
DisposeAsync/Dispose coordination ⚠️ Potential double-dispose risk
Test assertions ⚠️ Consumer disposal not actually asserted
Test structure (xUnit1031, async constructor) 💡 Nit
Missing connection pool tests 💡 Nice-to-have
.claude/settings.json in PR 💡 Separate commit/PR preferred

The fix is sound and addresses the root cause correctly. The ConfigureAwait(false) omission on DisposeAsync and the Dispose/DisposeAsync coordination concern are the two items worth addressing before merge. Everything else is polish.

Review by Claude Sonnet 4.6 via Claude Code

@iancooper

Copy link
Copy Markdown
Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

iancooper and others added 2 commits March 12, 2026 18:17
…stency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DisposeAsync() and Dispose() now coordinate via a shared _disposed flag,
ensuring the consumer is only disposed once regardless of call order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Mar 12, 2026

Copy link
Copy Markdown

PR Review: [Fix] Shutdown Issues on RabbitMQ

Great fix! The root-cause analysis in the ADR is excellent and the changes are well-targeted. The shutdown deadlock and connection pool race conditions are real, well-diagnosed bugs and the solutions are appropriate. A few issues worth discussing before merging.


Bug: Dispose(false) call in DisposeAsync() is effectively a no-op

File: src/Paramore.Brighter/ChannelAsync.cs

The DisposeAsync() implementation sets _disposed = true, then calls Dispose(false). But Dispose(bool) starts with if (_disposed) return; — so it returns immediately. The call is dead code.

Any cleanup logic in Dispose(bool) that is not guarded by the if (disposing) branch (unmanaged resource cleanup, for instance) would silently be skipped after an async dispose.

The Microsoft guidance for implementing both IDisposable and IAsyncDisposable recommends a DisposeAsyncCore() virtual method so the two paths can share the _disposed guard without stepping on each other:

protected virtual async ValueTask DisposeAsyncCore()
{
    await _messageConsumer.DisposeAsync().ConfigureAwait(false);
}

public virtual async ValueTask DisposeAsync()
{
    await DisposeAsyncCore().ConfigureAwait(false);
    Dispose(false);   // runs unmanaged cleanup (not gated by _disposed)
    GC.SuppressFinalize(this);
}

private void Dispose(bool disposing)
{
    if (_disposed) return;
    _disposed = true;
    if (disposing)
    {
        _ = _messageConsumer.DisposeAsync(); // intentional fire-and-forget for sync callers
    }
    // unmanaged cleanup here
}

Right now ChannelAsync has no unmanaged resources so there is no visible bug today, but the pattern is incorrect and fragile.


CS4014 warning: unobserved fire-and-forget in sync Dispose(bool)

File: src/Paramore.Brighter/ChannelAsync.cs

The line _messageConsumer.DisposeAsync(); in the if (disposing) block will produce a CS4014 compiler warning (unawaited ValueTask). Even where the fire-and-forget is intentional, it should be suppressed explicitly:

_ = _messageConsumer.DisposeAsync();  // intentional: sync callers cannot await

Test: assertion does not verify consumer disposal

File: tests/.../When_channel_async_dispose_async_should_await_consumer_dispose.cs

The test name says it should verify "the consumer is properly disposed", but the only assertion is that a second DisposeAsync() call does not throw (idempotency). There is no assertion that DisposeAsync() was actually called on the consumer.

A thin test double (similar to TrackingChannelAsync in the other new test) that records calls to DisposeAsync() would make this test actually verify the stated behaviour.


Minor: Proactor test uses sync blocking with #pragma warning disable xUnit1031

File: tests/.../When_proactor_receives_quit_should_dispose_channel_async.cs

The test is a sync [Fact] that calls _performerTask.Wait(TimeSpan.FromSeconds(10)). Making it async Task avoids the deadlock risk and lets xUnit handle the timeout properly:

[Fact]
public async Task When_Proactor_Receives_Quit_Should_Dispose_Channel_Async()
{
    await _performerTask.WaitAsync(TimeSpan.FromSeconds(10));
    // ...
}

What is good

  • ADR documentation is thorough. The architecture diagrams comparing broken vs. fixed shutdown paths are particularly useful.
  • ConfigureAwait(false) additions are comprehensive and constitute genuine defence-in-depth.
  • ReferenceEquals guard in the ShutdownHandler closure is the correct fix for the stale-handler race.
  • TOCTOU fix in sync RemoveConnection — removing the outer ContainsKey check entirely (since TryRemoveConnection already uses TryGetValue) is cleaner than just moving it inside the lock.
  • Blast-radius analysis for the IAmAChannelAsync : IAsyncDisposable interface change is reassuring.
  • All 5 Channel.Dispose() call sites in Proactor.EventLoop() were updated consistently.

Summary

The core fixes (async dispose chain, connection pool TOCTOU, stale handler guard) are correct and address real production issues. The main concern is the Dispose(false) no-op in DisposeAsync() — it is an incorrect implementation of the dual-dispose pattern and should be corrected before merging. The test gaps are lower priority but worth fixing for long-term confidence in the disposal contract.

…ashSet corruption

BasicConsume/BasicConsumeAsync already triggers HandleBasicConsumeOk via
the RabbitMQ client library callback. The manual call caused a second
concurrent Add to DefaultBasicConsumer's internal HashSet<string>,
corrupting it and failing DLQ tests with InvalidOperationException.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@iancooper

Copy link
Copy Markdown
Member Author

Will close #3972

codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Mar 12, 2026

Copy link
Copy Markdown

Code Review — PR #4047: Fix Shutdown Issues on RabbitMQ

Good fix overall — the root-cause analysis in the ADR is thorough and the two core bugs (sync-over-async deadlock, connection pool race condition) are correctly addressed. A few things worth raising:


1. ChannelAsync.Dispose() still has the fire-and-forget problem (significant)

File: src/Paramore.Brighter/ChannelAsync.cs, line 222

private void Dispose(bool disposing)
{
    if (_disposed) return;
    _disposed = true;

    if (disposing)
    {
        _messageConsumer.DisposeAsync();   // ⚠️ ValueTask not observed
    }
}

The PR fixes the async dispose path correctly, but the synchronous Dispose() path still calls _messageConsumer.DisposeAsync() fire-and-forget. Anyone who calls Dispose() rather than DisposeAsync() (e.g. a using statement, or a non-async finalizer path in other message pumps) will get the original bug back.

Common mitigations:

  • Use BrighterAsyncContext.Run(() => _messageConsumer.DisposeAsync().AsTask()) in the sync path (consistent with how RmqMessageConsumer.Dispose() handles it).
  • Or warn callers via obsolescence: [Obsolete("Use DisposeAsync instead.")] on Dispose().
  • Or document explicitly that Dispose() does not guarantee async consumer teardown.

2. Dispose(false) call inside DisposeAsync() is confusing (minor)

File: src/Paramore.Brighter/ChannelAsync.cs, lines 196–204

public virtual async ValueTask DisposeAsync()
{
    if (_disposed) return;
    _disposed = true;

    await _messageConsumer.DisposeAsync().ConfigureAwait(false);
    Dispose(false);          // ← called with disposing=false, but _disposed is already true
    GC.SuppressFinalize(this);
}

Because _disposed is already true by the time Dispose(false) is called, the method returns immediately and does nothing. The call is a no-op. It's also semantically odd — disposing=false conventionally signals the finalizer path (unmanaged resources only).

Recommendation: remove the Dispose(false) call from DisposeAsync() entirely to make the intent clearer.


3. Test doesn't verify the core behaviour it claims to test (significant)

File: tests/.../When_channel_async_dispose_async_should_await_consumer_dispose.cs

// Assert — after async dispose, the consumer's timer is disposed,
// so receiving should throw or return empty; we verify the channel
// can be disposed a second time without throwing (idempotent)
var exception = await Record.ExceptionAsync(async () =>
    await ((IAsyncDisposable)channel).DisposeAsync()
);
Assert.Null(exception);

The test name says "should await consumer dispose", but the assertion only checks idempotency (no exception on double-dispose). It does not verify that the consumer's DisposeAsync() was actually awaited to completion before ChannelAsync.DisposeAsync() returned. A spy/mock consumer that sets a flag only after some async delay would make the assertion meaningful.


4. xUnit1031 suppression in test (minor)

File: tests/.../When_proactor_receives_quit_should_dispose_channel_async.cs, line 62–76

#pragma warning disable xUnit1031
[Fact]
public void When_Proactor_Receives_Quit_Should_Dispose_Channel_Async()
{
    _performerTask.Wait(TimeSpan.FromSeconds(10));
    ...
}
#pragma warning restore xUnit1031

The warning is suppressed rather than fixed. The simplest fix is to make the test method async Task and use await _performerTask.WaitAsync(TimeSpan.FromSeconds(10)), which is also safer (the current code will swallow a cancellation exception on timeout and proceed to the assertions).


5. Connection password included in pool key used for logging (minor)

File: src/Paramore.Brighter.MessagingGateway.RMQ.Async/RmqMessageGatewayConnectionPool.cs, line 191

private static string GetConnectionId(ConnectionFactory connectionFactory)
    =>
        $"{connectionFactory.UserName}.{connectionFactory.Password}.{connectionFactory.HostName}..."
            .ToLowerInvariant();

The plain-text password is embedded in the connection-ID string. This isn't directly logged here, but the string is used as a dictionary key and appears in the connectionId local captured by closures (including the ShutdownHandler). Any future logging of connectionId by mistake would leak credentials. Consider hashing the password component.


6. ResetConnectionAsync indentation inconsistency (nit)

File: src/Paramore.Brighter.MessagingGateway.RMQ.Async/RmqMessageGatewayConnectionPool.cs, line 88

ResetConnectionAsync is indented with 2 spaces while every other method in the file uses 4 spaces. Looks like an editor artefact.


Summary

# Severity File Issue
1 Significant ChannelAsync.cs:222 Sync Dispose() still fire-and-forgets DisposeAsync()
2 Minor ChannelAsync.cs:202 No-op Dispose(false) call inside DisposeAsync is confusing
3 Significant When_channel_async_dispose_async_should_await_consumer_dispose.cs Test only checks idempotency, not await semantics
4 Minor When_proactor_receives_quit_should_dispose_channel_async.cs:64 .Wait() should be await
5 Minor RmqMessageGatewayConnectionPool.cs:191 Password in connection key string
6 Nit RmqMessageGatewayConnectionPool.cs:88 Indentation inconsistency

The core fix (Proactor calling await Channel.DisposeAsync() and the async connection pool using SemaphoreSlim) is correct and well-motivated. Items 1 and 3 are worth addressing before merge.

- Mark ChannelAsync.Dispose() as [Obsolete] to guide callers to DisposeAsync
- Use BrighterAsyncContext.Run() in sync Dispose path instead of fire-and-forget
- Remove no-op Dispose(false) call from DisposeAsync with explanatory comment
- Improve AsyncChannelDisposalTests to verify consumer DisposeAsync is actually called
- Make ProactorQuitAsyncDisposalTests async, replacing .Wait() with await .WaitAsync()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Code Review: PR 4047 - RMQ Shutdown Deadlock and Connection Pool Race Condition Fixes

Overall assessment: Well-reasoned and surgical fix for two real production bugs. Approve with a few items worth considering before merge.

The ADR correctly identifies the root causes, traces the deadlock chain step-by-step, and applies targeted fixes rather than a broad refactor. The primary fix (await Channel.DisposeAsync() in the Proactor) is minimal and correct.


Strengths

  • The core fix in Proactor.cs (MT_QUIT path: Channel.Dispose() to await Channel.DisposeAsync()) is exactly the right change - minimal, surgical, and addresses the deadlock at its origin.
  • Adding IAsyncDisposable to IAmAChannelAsync makes the async dispose contract explicit and propagates correct usage through the interface.
  • ConfigureAwait(false) throughout RmqMessageGatewayConnectionPool (async) is good defence-in-depth.
  • The ReferenceEquals guard in ShutdownHandler correctly prevents a stale event handler from disposing a freshly-created replacement connection.
  • The TOCTOU fix in sync RemoveConnection (move the existence check inside the lock) is correct.
  • PooledConnection being made a sealed record with required constructor arguments is a quality improvement.
  • The tests directly verify the two most important observable behaviors.

Issues Worth Addressing

  1. RmqMessageConsumer.DisposeAsync() calls Dispose(true) instead of base.DisposeAsync()

This is the most significant concern. RmqMessageConsumer.DisposeAsync() calls Dispose(true) which in turn calls base RmqMessageGateway.Dispose(bool), which still has sync-over-async (Channel?.AbortAsync().Wait() and RemoveConnectionAsync(...).GetAwaiter().GetResult()).

The ADR acknowledges this and argues the code is safe because it is no longer reached from within BrighterAsyncContext. That is true for the current Proactor path, but it is a fragile assumption. If any future code path calls DisposeAsync() from within a single-threaded context (another async pump, a test, a future consumer variant), the deadlock risk returns.

Suggestion: Replace Dispose(true) with await base.DisposeAsync() in RmqMessageConsumer.DisposeAsync(). This fully severs the async-to-sync dependency and makes the implementation correct by construction rather than by convention.

  1. RmqMessageGateway.DisposeAsync() has no double-dispose guard

RemoveConnectionAsync is called unconditionally even on a second dispose. In practice TryRemoveConnectionAsync is a no-op if the key is missing, so this is not harmful - but it diverges from the standard dispose pattern. A _disposed guard (mirroring what ChannelAsync.DisposeAsync() does correctly) would be cleaner.

  1. ShutdownHandler - cancelled token edge case

In the async ShutdownHandler, await s_lock.WaitAsync(e.CancellationToken) is used. If ShutdownEventArgs.CancellationToken is already cancelled when the handler fires (hard close scenario), this throws OperationCanceledException. In the RabbitMQ .NET client, exceptions from async event handlers are implementation-dependent and may be swallowed silently. Consider wrapping in a try/catch or using CancellationToken.None.

  1. Test coverage gap on the actual deadlock scenario

The Proactor test uses InMemoryMessageConsumer, not RmqMessageConsumer, and does not exercise BrighterAsyncContext's single-threaded scheduler mechanics. The strongest regression protection would be a test that runs a Proactor inside BrighterAsyncContext.Run() with a consumer whose DisposeAsync() posts continuations back to the context - the exact scenario that was deadlocking. Worth considering for the integration test suite.

  1. Missing test for the connection pool race condition fix (issue 4024)

The ReferenceEquals guard in ShutdownHandler has no unit test. A test that simulates a connection being replaced and verifies the stale handler does not dispose the new connection would be valuable.


Minor Notes

  • The [Obsolete] on ChannelAsync.Dispose() is good, but using statements on IDisposable will silently call the sync path with only a compiler warning. Worth documenting in migration notes.
  • CancelConsumerAsync checks _consumer != null and Channel != null together - if Channel is null but _consumer.IsRunning is true, the server-side consumer is not cancelled. Pre-existing and low-risk in the single-pump model.
  • IAmAChannelAsync now extends IAsyncDisposable, which is technically a breaking change for any external implementors. If this is a public extensibility point, it may warrant a note in the changelog.

Summary

The fix is sound and ready to merge. The primary concern worth addressing before or after merge: RmqMessageConsumer.DisposeAsync() calls Dispose(true) which re-introduces sync-over-async into the dispose chain. Replacing it with await base.DisposeAsync() would make the fix correct by construction. The missing unit tests for the pool race condition fix and the end-to-end deadlock scenario could be tracked as follow-up issues.

Review generated with Claude Code (https://claude.ai/code)

… guard, and shutdown handler safety

Replace Dispose(true) with await base.DisposeAsync() in RmqMessageConsumer to
sever the sync-over-async dependency. Add _disposed guard to RmqMessageGateway.
DisposeAsync() to prevent double-dispose. Use CancellationToken.None in async
ShutdownHandler to avoid OperationCanceledException on hard-close. Add regression
tests for deadlock scenario (BrighterAsyncContext.Run) and ReferenceEquals guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Code Review: Fix Shutdown Issues on Rabbit MQ

This PR addresses two real, well-documented bugs with clear root-cause analysis. The ADR is excellent — the shutdown chain diagrams and alternatives-considered section are exactly what a serious bug fix warrants. The approach is surgical and correct. Below are observations ranging from a compile error to minor style notes.

Bug: Deadlock test will not compile

File: tests/.../When_proactor_shutdown_inside_async_context_should_not_deadlock.cs

The test does: var completedInTime = completed.WaitAsync(TimeSpan.FromSeconds(10)).Result

Task.Run(() => void_lambda) returns Task (non-generic). Task.WaitAsync(TimeSpan) also returns Task. A non-generic Task has no .Result property — only Task<T> does. This will not compile.

The correct fix is Task.Wait(TimeSpan) which returns bool (true = completed, false = timed out):

bool completedInTime = completed.Wait(TimeSpan.FromSeconds(10));
Assert.True(completedInTime, "Proactor should shut down cleanly inside BrighterAsyncContext without deadlock");

Possible gap: async pool ShutdownHandler missing ReferenceEquals guard

The ADR (Decision 4) and the sync pool diff both show the ReferenceEquals guard being added to ShutdownHandler. The diff for RmqMessageGatewayConnectionPool.cs (Async) only shows ConfigureAwait(false) additions and the TOCTOU fix for RemoveConnectionAsync — the async ShutdownHandler guard is not visible in the diff.

If the async pool ShutdownHandler still checks by connectionId string only (without ReferenceEquals), the race condition from issue 4024 remains exploitable on the async transport path, which is the primary path for most modern Brighter deployments. Please verify this was updated.

[Obsolete] on Dispose() is too aggressive

Marking Dispose() as [Obsolete] will generate CS0618 warnings in all consumer code that uses using statements or calls .Dispose() on a channel — including user application code and existing Brighter tests. Dispose() is part of the IDisposable contract and is still perfectly valid for callers outside BrighterAsyncContext. The Proactor fix already eliminates the deadlock. Consider removing the [Obsolete] attribute and documenting the constraint in XML doc comments instead.

BrighterAsyncContext.Run() in sync Dispose — comment mismatch

The sync Dispose(bool) now uses BrighterAsyncContext.Run(async () => await _messageConsumer.DisposeAsync()), which does fully clean up async resources. But the [Obsolete] message says "may not fully clean up async resources" — the message and implementation are inconsistent. If the method is kept, the comment should accurately reflect the new behavior.

HandleBasicConsumeOk removal deserves a mention

Both RmqMessageConsumer.cs variants remove manual calls to HandleBasicConsumeOk / HandleBasicConsumeOkAsync. This is a meaningful behavioral change not directly related to the shutdown deadlock or pool race. The RabbitMQ .NET client v7+ raises BasicConsumeOk automatically so calling it manually was a double-trigger. The change is likely correct, but a sentence in the PR description would help reviewers and users upgrading understand it is intentional.

Test class names do not match file names (minor)

Brighter uses When_* for both file names and class names. The new tests use When_* file names but PascalCase class names (AsyncChannelDisposalTests, ProactorQuitAsyncDisposalTests, etc.). Worth aligning for consistency.

Stale-handler test tests the pattern, not the pool (observation)

When_stale_shutdown_handler_fires_should_not_dispose_replacement_connection.cs verifies the ReferenceEquals guard pattern using FakeConnection/FakePooledConnection without touching RmqMessageGatewayConnectionPool directly. A bug in the pool's actual ShutdownHandler implementation would go undetected. Not a blocker, but worth noting for future test coverage.

What is well done

  • Root-cause analysis in the ADR is rigorous and accurate — the sync-over-async deadlock chain is traced precisely.
  • All five Channel.Dispose() call sites in Proactor.cs were updated to await Channel.DisposeAsync() — none were missed.
  • ConfigureAwait(false) added consistently throughout the async pool.
  • TOCTOU fix in sync RemoveConnection — moving the check inside the lock — is correct.
  • _disposed guards for idempotent disposal on ChannelAsync and RmqMessageGateway are clean.
  • RmqMessageConsumer.DisposeAsync() now calls await base.DisposeAsync() instead of Dispose(true), correctly breaking the sync-over-async chain at its source.
  • The deadlock regression test using AsyncContinuationChannelAsync with Task.Yield() is a clever way to force scheduler continuations and reproduce the deadlock scenario.

The compile error in the deadlock test and the potential async pool ShutdownHandler gap are the two items that should be confirmed before merging.

@iancooper

Copy link
Copy Markdown
Member Author

@claude "Bug: Deadlock test will not compile." It compiles and tests locally and on GHA. Your feedback is wrong.

@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Claude finished @iancooper's task in 27s —— View job


You are correct, and I apologise. My review was wrong on this point.

The actual code at line 83 is:

bool finishedInTime = completed.Wait(TimeSpan.FromSeconds(30));

Task.Wait(TimeSpan) returns bool and is valid on a non-generic Task. My review incorrectly described the code as using .WaitAsync(TimeSpan).Result, which is not what the file contains. I should have read the actual file before stating there was a compile error.

The test is correct as written, and my feedback on that point should be disregarded.

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gates Passed
4 Quality Gates Passed

See analysis details in CodeScene

Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@claude

claude Bot commented Mar 16, 2026

Copy link
Copy Markdown

Code Review: [Fix] Shutdown Issues on RabbitMQ (#4047)

This is a well-structured, surgical fix for two real and painful bugs. The root cause analysis in the ADR is thorough and the fix correctly addresses the sync-over-async deadlock at its source rather than working around it. Overall the approach is sound — here are some observations:


🚨 Blocker: Unresolved Merge Conflict in specs/.current-spec

specs/.current-spec contains raw conflict markers:

<<<<<<< shutdown_rmq
0024-rmq_shutdown
=======
0024-resilience_pipeline_bug
>>>>>>> master

This file should not be merged in this state. It will break any spec tooling that reads .current-spec.


🔍 Undocumented Behavioural Change: HandleBasicConsumeOk Removal

Both RmqMessageConsumer (sync and async) remove the manual HandleBasicConsumeOk/HandleBasicConsumeOkAsync call after BasicConsume:

- _consumer.HandleBasicConsumeOk(_consumerTag);         // sync
- await _consumer.HandleBasicConsumeOkAsync(_consumerTag, cancellationToken); // async

The RabbitMQ client library calls these callbacks automatically when the broker acknowledges BasicConsume. Calling them manually was double-invoking the callback. This looks like a legitimate bug fix, but it's not mentioned in the PR description or the ADR. It deserves explicit documentation — is this intentional, and has it been tested?


⚠️ _disposed Flag is Not Thread-Safe

In both ChannelAsync and RmqMessageGateway, _disposed is a plain bool:

private bool _disposed;

Without volatile, the compiler/JIT may reorder reads/writes, and changes may not be visible across threads. Dispose is not supposed to be called concurrently, but a volatile bool or Interlocked.Exchange(ref _disposed, 1) pattern (using int) would be more defensive. This is a minor issue since the .NET guidelines say callers must not call Dispose concurrently, but it's worth noting.


⚠️ ChannelAsync.Dispose() Wrapping BrighterAsyncContext.Run() Has Edge Cases

The sync Dispose() now does:

BrighterAsyncContext.Run(async () => await _messageConsumer.DisposeAsync());

This is a significant behavioural change from the previous fire-and-forget. The [Obsolete] attribute is the right signal. One concern: if _messageConsumer.DisposeAsync() throws, the exception will now propagate synchronously through Dispose(), whereas before it was silently swallowed. This is better behaviour, but it could surface previously hidden errors in callers that didn't expect Dispose() to throw. Consider whether a try/catch with logging is appropriate here.


📝 specs/0024-rmq_shutdown/README.md Status Not Updated

The README still shows Status: Requirements and all checklist items unchecked, despite the implementation being complete and merged (or nearly so). Minor housekeeping, but it leaves the spec in a misleading state.


📝 Test for Connection Pool Race Uses Fakes Instead of Real Class

When_stale_shutdown_handler_fires_should_not_dispose_replacement_connection.cs validates the ReferenceEquals guard pattern in isolation using FakeConnection/FakePooledConnection — it does not exercise RmqMessageGatewayConnectionPool directly. This is reasonable given the difficulty of testing static state without a broker, and the pattern is clearly derived from the real code. Worth noting for future maintainers: if the pool's internal structure changes, these tests won't catch regressions automatically.


✅ What's Done Well

  • Root cause fix, not a workaround: Changing Channel.Dispose()await Channel.DisposeAsync() in the Proactor eliminates the deadlock at the source. The Task.Run alternative was correctly rejected.
  • ConfigureAwait(false) sweep: Comprehensive — all async methods in RmqMessageGatewayConnectionPool now have it.
  • TOCTOU fix in RemoveConnection: Moving the ContainsKey check inside the lock is the correct fix.
  • RmqMessageConsumer.DisposeAsync() now calls await base.DisposeAsync() instead of Dispose(true) — this is the key fix that breaks the sync-over-async chain.
  • ObjectDisposedException guard in ShutdownHandler: Catching this for the semaphore is a nice defensive touch for shutdown races.
  • Deadlock regression test: When_proactor_shutdown_inside_async_context_should_not_deadlock.cs is well-conceived — using Task.Yield() to force a continuation is exactly the right technique to prove the fix.
  • ADR quality: The architecture decision record is exceptionally thorough — the call chain diagrams (before/after), alternatives considered, and blast radius analysis are all excellent.

Summary: The core logic is correct and the fix addresses both bugs cleanly. The merge conflict in specs/.current-spec needs to be resolved before merging. The HandleBasicConsumeOk removal should be documented. Everything else is minor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 - Done Bug .NET Pull requests that update .net code V10.X

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] RMQ connection pool: stale ConnectionShutdown handler can dispose a newer connection

1 participant