Best-effort teardown ownership release + 'resources check' detects missing schema (closes #3123) #3127
Merged
When host startup aborts (e.g. a Marten schema-migration error), Wolverine's teardown called MessageStoreCollection.ReleaseAllOwnershipAsync, which looped every registered store — including ancillary stores whose schema (wolverine_incoming_envelopes) was never created — and ran raw UPDATEs with no guard. The first store to fail (PostgreSQL 42P01) threw out of the loop, so (a) the remaining stores never had their ownership released and (b) StopAsync doesn't catch PostgresException, so the secondary teardown error propagated and masked the real startup failure. Releasing ownership on teardown is optional — any envelopes left as owner_id = nodeNumber are reclaimed by the durability agent's recovery polling on the next live node. So wrap each store's release in a try/catch that logs at Debug and continues to the next store. Adds a unit regression test proving a failing store neither aborts releasing the others nor throws. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
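The per-store try/catch described in this commit can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Wolverine code: `StoreHandle`, `BestEffortTeardown`, and the `logDebug` callback are hypothetical stand-ins for the real message store types and logger.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a registered message store (not a Wolverine type).
public sealed record StoreHandle(string Name, Func<int, Task> ReleaseOwnershipAsync);

public static class BestEffortTeardown
{
    // Attempts to release ownership on every store. A failure in one store is
    // logged and skipped, so it can neither strand the remaining stores nor
    // escape teardown. Returns how many stores were released successfully.
    public static async Task<int> ReleaseAllOwnershipAsync(
        IEnumerable<StoreHandle> stores, int nodeNumber, Action<string> logDebug)
    {
        var released = 0;
        foreach (var store in stores)
        {
            try
            {
                await store.ReleaseOwnershipAsync(nodeNumber);
                released++;
            }
            catch (Exception e)
            {
                // Swallowing is acceptable here because, per the commit message,
                // envelopes still owned by this node are reclaimed later by the
                // durability agent's recovery polling.
                logDebug($"Could not release ownership in store '{store.Name}': {e.Message}");
            }
        }

        return released;
    }
}
```

The try/catch sits inside the loop rather than around it, so one failing store costs only its own release attempt.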
…hema (#3123) The teardown best-effort fix above means host disposal no longer throws when an RDBMS message store's schema doesn't exist. That exposed that MessageStoreResource.Check (what 'resources check' runs) only verified connectivity via CheckConnectivityAsync — a reachable database with a dropped schema still passed. The SqlServer stateful-resource smoke test 'check_negative' was only getting its non-zero exit code from the teardown crash, not from the check itself. Add IMessageStoreAdmin.AssertStorageExistsAsync (default no-op so non-RDBMS stores are unaffected), implemented for RDBMS stores by reusing the same Weasel schema diff the migration path already uses (SchemaMigration.DetermineAsync) — throwing when the schema is missing or out of date. MessageStoreResource.Check now calls it after the connectivity check, and MultiTenantedMessageStore forwards to every database. No Weasel change needed. check_negative now passes because the check genuinely detects the missing schema. Full SqlServerTests suite (358) and Postgres + SqlServer stateful-resource smoke tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
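The shape of this check (connectivity first, then a storage assertion that defaults to a no-op) might look something like the sketch below. All type names here are hypothetical, and the set difference over table names is only a stand-in for the Weasel schema diff the commit mentions; it is not how the real implementation detects a missing or out-of-date schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, reduced interface; the real IMessageStoreAdmin differs.
public interface IStoreAdmin
{
    Task CheckConnectivityAsync();

    // Default no-op so stores without a relational schema are unaffected.
    Task AssertStorageExistsAsync() => Task.CompletedTask;
}

// Hypothetical non-relational store: inherits the no-op assertion.
public sealed class NonRelationalStoreAdmin : IStoreAdmin
{
    public Task CheckConnectivityAsync() => Task.CompletedTask;
}

// Hypothetical relational store: compares expected tables with existing ones.
public sealed class RelationalStoreAdmin : IStoreAdmin
{
    private readonly HashSet<string> _expected;
    private readonly HashSet<string> _existing;

    public RelationalStoreAdmin(IEnumerable<string> expectedTables, IEnumerable<string> existingTables)
    {
        _expected = new HashSet<string>(expectedTables);
        _existing = new HashSet<string>(existingTables);
    }

    // Assumption for the sketch: the database is always reachable.
    public Task CheckConnectivityAsync() => Task.CompletedTask;

    public Task AssertStorageExistsAsync()
    {
        var missing = _expected.Except(_existing).ToList();
        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                "Message storage is missing tables: " + string.Join(", ", missing));
        }

        return Task.CompletedTask;
    }
}

public static class StoreResourceCheck
{
    // Connectivity alone is not enough: a reachable database with a dropped
    // schema must still fail the check.
    public static async Task CheckAsync(IStoreAdmin admin)
    {
        await admin.CheckConnectivityAsync();
        await admin.AssertStorageExistsAsync();
    }
}
```

Putting the no-op on the interface keeps existing non-relational implementations compiling unchanged, which matches the commit's note that non-RDBMS stores are unaffected.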
This was referenced Jun 17, 2026
This was referenced Jun 22, 2026
Problem
When host startup aborts (e.g. a Marten schema-migration error), teardown threw an uncaught Npgsql.PostgresException 42P01: relation "<schema>.wolverine_incoming_envelopes" does not exist from MessageStoreCollection.ReleaseAllOwnershipAsync, masking the real startup failure. Ancillary stores are registered before schema migration, so after a failed/partial startup their tables may not exist; the per-store UPDATE had no guard, the loop had no per-iteration try/catch (so the first failure stranded the other stores too), and StopAsync doesn't catch PostgresException.
Fix (two parts)
1. Best-effort teardown ownership release.
MessageStoreCollection.ReleaseAllOwnershipAsync now wraps each store's release in a try/catch that logs at Debug and continues. Releasing ownership on teardown is optional (orphaned envelopes are reclaimed by the durability agent's recovery polling on the next live node), so a missing schema must neither strand the other stores nor surface as an unhandled teardown error.
2. 'resources check' now detects a missing/un-provisioned schema.
Making teardown stop throwing exposed that MessageStoreResource.Check (what 'resources check' runs) only verified connectivity (CheckConnectivityAsync just opens/closes a connection). A reachable database with a dropped schema still passed, so the existing check_negative smoke test was getting its non-zero exit code only from the teardown crash, not from the check.
Added IMessageStoreAdmin.AssertStorageExistsAsync (default no-op, so non-RDBMS stores are unaffected), implemented for RDBMS stores by reusing the same Weasel schema diff the migration path already uses (SchemaMigration.DetermineAsync) and throwing when the schema is missing or out of date. MessageStoreResource.Check calls it after the connectivity check; MultiTenantedMessageStore forwards to every database. No Weasel change required.
IStatefulResource.Check is only invoked by the 'resources check' command (not normal startup), so there's no startup regression.
Tests
release_all_ownership_is_best_effort: a failing store neither aborts releasing the others nor throws.
stateful_resource_smoke_tests.check_negative now passes because the check genuinely detects the missing schema (exit 1), not because teardown crashed.
SqlServerTests suite green (358 passed, 1 skipped); SqlServer + Postgres stateful-resource smoke tests green.
🤖 Generated with Claude Code