[Fix] Shutdown Issues on Rabbit MQ #4047
…sync on ChannelAsync
- Test: When_ChannelAsync_Is_Disposed_Async_Then_Consumer_Is_Disposed
- IAmAChannelAsync now extends IAsyncDisposable
- ChannelAsync.DisposeAsync() awaits _messageConsumer.DisposeAsync()
- All transports inherit this via ChannelAsync (no per-transport changes needed)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ADR 0054 status: Proposed → Accepted
- Updated Impact on Other Transports section with codebase analysis
- Added tasks.md with 5 phases, 9 tasks
- Approved design and tasks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ispose()
- Test: When_Proactor_Receives_Quit_Should_Dispose_Channel_Async
- All 5 Channel.Dispose() calls in Proactor.EventLoop() changed to await Channel.DisposeAsync()
- Eliminates sync-over-async deadlock in the Proactor shutdown path (#3684)
- The entire dispose chain is now fully async within BrighterAsyncContext
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…4024)
- ShutdownHandler now verifies via ReferenceEquals that the connection being shut down is the same instance currently in the pool before removing/disposing it. Prevents stale handlers from disposing replacement connections.
- RemoveConnection moves ContainsKey check inside the lock, eliminating the TOCTOU race condition. TryRemoveConnection already handles missing keys safely via TryGetValue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
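The two guards in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the code in RmqMessageGatewayConnectionPool: FakeConnection, SketchConnectionPool and OnShutdown are hypothetical names, and a plain dictionary stands in for the real pool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a pooled broker connection (not a RabbitMQ type).
public sealed class FakeConnection : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical pool showing the ReferenceEquals guard and the check-inside-lock fix.
public sealed class SketchConnectionPool
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, FakeConnection> _connections =
        new Dictionary<string, FakeConnection>();

    // Adds or replaces the connection stored under the key.
    public void Set(string key, FakeConnection connection)
    {
        lock (_lock) { _connections[key] = connection; }
    }

    // Called when a connection reports shutdown. Returns true only if the
    // pooled entry was removed and disposed.
    public bool OnShutdown(string key, FakeConnection sender)
    {
        lock (_lock)
        {
            // Lookup and removal share one lock, so no other thread can swap
            // the entry between the check and the removal (no TOCTOU window).
            if (!_connections.TryGetValue(key, out var current)) return false;

            // A stale handler from an old connection must not dispose the
            // replacement that now sits under the same key.
            if (!ReferenceEquals(current, sender)) return false;

            _connections.Remove(key);
            current.Dispose();
            return true;
        }
    }
}
```

With this shape, a shutdown event raised by a connection that has already been replaced is ignored, and a second shutdown for the same key is a harmless no-op.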
…onfigureAwait(false) (#4024, #3684)
- ShutdownHandler now verifies via ReferenceEquals that the connection being shut down is the same instance currently in the pool before removing/disposing it. Prevents stale handlers from disposing replacement connections.
- RemoveConnectionAsync moves ContainsKey check inside the semaphore, eliminating the TOCTOU race condition.
- Added ConfigureAwait(false) to all 12 await calls as defence-in-depth against sync-over-async deadlocks: continuations will run on the thread pool rather than capturing BrighterSynchronizationContext.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
542 core tests pass, 73 RMQ integration tests pass (13 pre-existing failures unrelated to changes). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review: [Fix] Shutdown Issues on Rabbit MQ
This is a well-reasoned, surgical fix for two long-standing bugs. The ADR is exemplary: the root cause analysis for the sync-over-async deadlock is thorough and accurate. Overall the approach is correct and I support merging with a few observations below.
✅ Strengths
🔍 Issues and Observations
1.
| Area | Assessment |
|---|---|
| Core deadlock fix (Proactor + ChannelAsync + IAmAChannelAsync) | ✅ Correct |
| Connection pool async ConfigureAwait(false) | ✅ Correct |
| ReferenceEquals guard (async + sync) | ✅ Correct |
| TOCTOU fix in sync RemoveConnection | ✅ Correct |
| DisposeAsync missing ConfigureAwait(false) | |
| DisposeAsync/Dispose coordination | |
| Test assertions | |
| Test structure (xUnit1031, async constructor) | 💡 Nit |
| Missing connection pool tests | 💡 Nice-to-have |
| .claude/settings.json in PR | 💡 Separate commit/PR preferred |
The fix is sound and addresses the root cause correctly. The ConfigureAwait(false) omission on DisposeAsync and the Dispose/DisposeAsync coordination concern are the two items worth addressing before merge. Everything else is polish.
Review by Claude Sonnet 4.6 via Claude Code
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
🤖 Generated with Claude Code
…stency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DisposeAsync() and Dispose() now coordinate via a shared _disposed flag, ensuring the consumer is only disposed once regardless of call order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
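One way such coordination could look is sketched below. This is an assumption-laden illustration rather than the ChannelAsync implementation: SketchChannel and CountingConsumer are hypothetical types, and the flag is flipped with Interlocked.Exchange, which the commit message does not confirm.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical consumer that counts how many times it has been disposed.
public sealed class CountingConsumer : IAsyncDisposable, IDisposable
{
    public int DisposeCount;

    public void Dispose() => Interlocked.Increment(ref DisposeCount);

    public ValueTask DisposeAsync()
    {
        Interlocked.Increment(ref DisposeCount);
        return default;
    }
}

// Hypothetical channel whose sync and async dispose paths share one flag.
public sealed class SketchChannel : IAsyncDisposable, IDisposable
{
    private readonly CountingConsumer _consumer;
    private int _disposed; // 0 = live, 1 = disposed

    public SketchChannel(CountingConsumer consumer) => _consumer = consumer;

    public void Dispose()
    {
        // Whichever path flips the flag first owns the cleanup.
        if (Interlocked.Exchange(ref _disposed, 1) == 1) return;
        _consumer.Dispose();
    }

    public async ValueTask DisposeAsync()
    {
        if (Interlocked.Exchange(ref _disposed, 1) == 1) return;
        await _consumer.DisposeAsync().ConfigureAwait(false);
    }
}
```

Calling DisposeAsync() and then Dispose(), or the reverse, disposes the consumer exactly once in this sketch.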
PR Review: [Fix] Shutdown Issues on RabbitMQ
Great fix! The root-cause analysis in the ADR is excellent and the changes are well-targeted. The shutdown deadlock and connection pool race conditions are real, well-diagnosed bugs and the solutions are appropriate. A few issues worth discussing before merging.
Bug:
…ashSet corruption
BasicConsume/BasicConsumeAsync already triggers HandleBasicConsumeOk via the RabbitMQ client library callback. The manual call caused a second concurrent Add to DefaultBasicConsumer's internal HashSet<string>, corrupting it and failing DLQ tests with InvalidOperationException.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Will close #3972
Code Review - PR #4047: Fix Shutdown Issues on RabbitMQ
Good fix overall: the root-cause analysis in the ADR is thorough and the two core bugs (sync-over-async deadlock, connection pool race condition) are correctly addressed. A few things worth raising:
1.
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | Significant | ChannelAsync.cs:222 | Sync Dispose() still fire-and-forgets DisposeAsync() |
| 2 | Minor | ChannelAsync.cs:202 | No-op Dispose(false) call inside DisposeAsync is confusing |
| 3 | Significant | When_channel_async_dispose_async_should_await_consumer_dispose.cs | Test only checks idempotency, not await semantics |
| 4 | Minor | When_proactor_receives_quit_should_dispose_channel_async.cs:64 | .Wait() should be await |
| 5 | Minor | RmqMessageGatewayConnectionPool.cs:191 | Password in connection key string |
| 6 | Nit | RmqMessageGatewayConnectionPool.cs:88 | Indentation inconsistency |
The core fix (Proactor calling await Channel.DisposeAsync() and the async connection pool using SemaphoreSlim) is correct and well-motivated. Items 1 and 3 are worth addressing before merge.
- Mark ChannelAsync.Dispose() as [Obsolete] to guide callers to DisposeAsync
- Use BrighterAsyncContext.Run() in sync Dispose path instead of fire-and-forget
- Remove no-op Dispose(false) call from DisposeAsync with explanatory comment
- Improve AsyncChannelDisposalTests to verify consumer DisposeAsync is actually called
- Make ProactorQuitAsyncDisposalTests async, replacing .Wait() with await .WaitAsync()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review: PR 4047 - RMQ Shutdown Deadlock and Connection Pool Race Condition Fixes
Overall assessment: Well-reasoned and surgical fix for two real production bugs. Approve with a few items worth considering before merge.
The ADR correctly identifies the root causes, traces the deadlock chain step-by-step, and applies targeted fixes rather than a broad refactor. The primary fix (await Channel.DisposeAsync() in the Proactor) is minimal and correct.
Strengths
Issues Worth Addressing
This is the most significant concern. RmqMessageConsumer.DisposeAsync() calls Dispose(true) which in turn calls base RmqMessageGateway.Dispose(bool), which still has sync-over-async (Channel?.AbortAsync().Wait() and RemoveConnectionAsync(...).GetAwaiter().GetResult()). The ADR acknowledges this and argues the code is safe because it is no longer reached from within BrighterAsyncContext. That is true for the current Proactor path, but it is a fragile assumption. If any future code path calls DisposeAsync() from within a single-threaded context (another async pump, a test, a future consumer variant), the deadlock risk returns. Suggestion: Replace Dispose(true) with await base.DisposeAsync() in RmqMessageConsumer.DisposeAsync(). This fully severs the async-to-sync dependency and makes the implementation correct by construction rather than by convention.
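The suggestion above can be sketched as follows, under the assumption that the base gateway exposes a virtual DisposeAsync(). SketchGateway and SketchConsumer are hypothetical stand-ins, not RmqMessageGateway or RmqMessageConsumer, and Task.Yield() merely simulates asynchronous teardown work.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical base gateway whose teardown is asynchronous end to end.
public class SketchGateway : IAsyncDisposable
{
    public bool ConnectionReleased { get; private set; }

    public virtual async ValueTask DisposeAsync()
    {
        // Stand-in for awaiting channel abort and pool removal;
        // nothing here blocks with .Wait() or .GetAwaiter().GetResult().
        await Task.Yield();
        ConnectionReleased = true;
    }
}

// Hypothetical consumer that chains to the async base path instead of
// falling back to the synchronous Dispose(true).
public sealed class SketchConsumer : SketchGateway
{
    public bool ConsumerCancelled { get; private set; }

    public override async ValueTask DisposeAsync()
    {
        // Stand-in for cancelling the consumer asynchronously.
        await Task.Yield();
        ConsumerCancelled = true;

        // Await the base teardown so no sync-over-async call is reachable
        // from the async dispose chain.
        await base.DisposeAsync().ConfigureAwait(false);
    }
}
```

Because every step is awaited, the chain never blocks the calling thread, which is the property that matters when the caller is a single-threaded context.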
RemoveConnectionAsync is called unconditionally even on a second dispose. In practice TryRemoveConnectionAsync is a no-op if the key is missing, so this is not harmful - but it diverges from the standard dispose pattern. A _disposed guard (mirroring what ChannelAsync.DisposeAsync() does correctly) would be cleaner.
In the async ShutdownHandler, await s_lock.WaitAsync(e.CancellationToken) is used. If ShutdownEventArgs.CancellationToken is already cancelled when the handler fires (hard close scenario), this throws OperationCanceledException. In the RabbitMQ .NET client, exceptions from async event handlers are implementation-dependent and may be swallowed silently. Consider wrapping in a try/catch or using CancellationToken.None.
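The behaviour described here can be reproduced with a small sketch. ShutdownLockSketch and TryEnterAsync are hypothetical names; only SemaphoreSlim.WaitAsync and the cancellation types are real .NET APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ShutdownLockSketch
{
    // Attempts to take and release the gate. Returns false when the wait
    // is cancelled, instead of letting the exception escape the handler.
    public static async Task<bool> TryEnterAsync(SemaphoreSlim gate, CancellationToken token)
    {
        try
        {
            // With an already-cancelled token this await throws an
            // OperationCanceledException without ever taking the gate.
            await gate.WaitAsync(token).ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            return false;
        }

        try
        {
            // Pool cleanup would go here in a real handler.
            return true;
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Passing CancellationToken.None, as the review suggests, means the handler always waits for the gate and cleanup is not skipped on a hard close.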
The Proactor test uses InMemoryMessageConsumer, not RmqMessageConsumer, and does not exercise BrighterAsyncContext's single-threaded scheduler mechanics. The strongest regression protection would be a test that runs a Proactor inside BrighterAsyncContext.Run() with a consumer whose DisposeAsync() posts continuations back to the context - the exact scenario that was deadlocking. Worth considering for the integration test suite.
The ReferenceEquals guard in ShutdownHandler has no unit test. A test that simulates a connection being replaced and verifies the stale handler does not dispose the new connection would be valuable.
Minor Notes
Summary
The fix is sound and ready to merge. The primary concern worth addressing before or after merge: RmqMessageConsumer.DisposeAsync() calls Dispose(true) which re-introduces sync-over-async into the dispose chain. Replacing it with await base.DisposeAsync() would make the fix correct by construction. The missing unit tests for the pool race condition fix and the end-to-end deadlock scenario could be tracked as follow-up issues.
Review generated with Claude Code (https://claude.ai/code)
… guard, and shutdown handler safety
Replace Dispose(true) with await base.DisposeAsync() in RmqMessageConsumer to sever the sync-over-async dependency. Add _disposed guard to RmqMessageGateway.DisposeAsync() to prevent double-dispose. Use CancellationToken.None in async ShutdownHandler to avoid OperationCanceledException on hard-close. Add regression tests for deadlock scenario (BrighterAsyncContext.Run) and ReferenceEquals guard.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review: Fix Shutdown Issues on Rabbit MQ
This PR addresses two real, well-documented bugs with clear root-cause analysis. The ADR is excellent: the shutdown chain diagrams and alternatives-considered section are exactly what a serious bug fix warrants. The approach is surgical and correct. Below are observations ranging from a compile error to minor style notes.
Bug: Deadlock test will not compile
File: tests/.../When_proactor_shutdown_inside_async_context_should_not_deadlock.cs
The test does:
The correct fix is
Possible gap: async pool ShutdownHandler missing ReferenceEquals guard
The ADR (Decision 4) and the sync pool diff both show the
If the async pool ShutdownHandler still checks by
Marking
BrighterAsyncContext.Run() in sync Dispose: comment mismatch
The sync
HandleBasicConsumeOk removal deserves a mention
Both
Test class names do not match file names (minor)
Brighter uses
Stale-handler test tests the pattern, not the pool (observation)
What is well done
The compile error in the deadlock test and the potential async pool ShutdownHandler gap are the two items that should be confirmed before merging.
@claude "Bug: Deadlock test will not compile." It compiles and tests locally and on GHA. Your feedback is wrong.
Claude finished @iancooper's task in 27s
You are correct, and I apologise. My review was wrong on this point. The actual code at line 83 is:
bool finishedInTime = completed.Wait(TimeSpan.FromSeconds(30));
The test is correct as written, and my feedback on that point should be disregarded.
Gates Passed
4 Quality Gates Passed
See analysis details in CodeScene
Quality Gate Profile: Clean Code Collective
Code Review: [Fix] Shutdown Issues on RabbitMQ (#4047)
This is a well-structured, surgical fix for two real and painful bugs. The root cause analysis in the ADR is thorough and the fix correctly addresses the sync-over-async deadlock at its source rather than working around it. Overall the approach is sound; here are some observations:
🚨 Blocker: Unresolved Merge Conflict in
Closes #4024 and #3684
This fixes two causes of the failure to exit cleanly. First, the Proactor did not use DisposeAsync when closing an async channel (this may also have caused errors in other messaging gateways). Second, implementations of DisposeAsync were blocking, which stalls the Brighter single-threaded synchronization context. Together these forced users to press CTRL+C to exit.
In addition, it fixes a race condition around pool closure.