Best-effort teardown ownership release + 'resources check' detects missing schema (closes #3123)#3127

Merged
jeremydmiller merged 2 commits into main from fix-3123-teardown-ownership-best-effort on Jun 17, 2026

@jeremydmiller jeremydmiller commented Jun 17, 2026

Member

Problem

When host startup aborts (e.g. on a Marten schema-migration error), teardown threw an uncaught Npgsql.PostgresException 42P01: relation "<schema>.wolverine_incoming_envelopes" does not exist from MessageStoreCollection.ReleaseAllOwnershipAsync, masking the real startup failure. Ancillary stores are registered before schema migration, so after a failed or partial startup their tables may not exist. The per-store UPDATE had no guard, and the loop had no per-iteration try/catch, so the first failure also stranded the remaining stores. StopAsync doesn't catch PostgresException, so the secondary error propagated out of teardown.
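The failure mode can be reproduced in miniature. The sketch below is a language-neutral illustration in Python, not Wolverine's C# code: `FakeStore`, `release_all_unguarded`, and the store names are all hypothetical, and a plain `RuntimeError` stands in for the PostgresException. It only shows how an unguarded loop stops at the first failing store and leaves every later store unreleased.

```python
import asyncio


class FakeStore:
    """Hypothetical stand-in for a message store; not a Wolverine type."""

    def __init__(self, name, schema_exists=True):
        self.name = name
        self.schema_exists = schema_exists
        self.released = False

    async def release_all_ownership(self):
        if not self.schema_exists:
            # Mimics PostgreSQL 42P01 (undefined table) from the raw UPDATE.
            raise RuntimeError(
                f'relation "{self.name}.wolverine_incoming_envelopes" does not exist'
            )
        self.released = True


async def release_all_unguarded(stores):
    # No per-iteration try/except: the first failure exits the loop.
    for store in stores:
        await store.release_all_ownership()


stores = [
    FakeStore("main"),
    FakeStore("ancillary", schema_exists=False),
    FakeStore("other"),
]

error = None
try:
    asyncio.run(release_all_unguarded(stores))
except RuntimeError as e:
    # The teardown error escapes to the caller.
    error = e

# Only stores visited before the failing one were released.
released = [s.name for s in stores if s.released]
print(released)
```

In this toy version the "other" store is never visited, which mirrors the stranded stores described above.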

Fix (two parts)

1. Best-effort teardown ownership release. MessageStoreCollection.ReleaseAllOwnershipAsync now wraps each store's release in a try/catch that logs at Debug and continues to the next store. Releasing ownership on teardown is optional: any envelopes left with owner_id = nodeNumber are reclaimed by the durability agent's recovery polling on the next live node. A missing schema must therefore neither strand the other stores nor surface as an unhandled teardown error.

2. resources check now detects a missing or un-provisioned schema. Once teardown stopped throwing, it became clear that MessageStoreResource.Check (what resources check runs) only verified connectivity: CheckConnectivityAsync just opens and closes a connection. A reachable database with a dropped schema still passed, so the existing check_negative smoke test was getting its non-zero exit code only from the teardown crash, not from the check.

Added IMessageStoreAdmin.AssertStorageExistsAsync (default no-op, so non-RDBMS stores are unaffected), implemented for RDBMS stores by reusing the same Weasel schema diff the migration path already uses (SchemaMigration.DetermineAsync) and throwing when the schema is missing or out of date. MessageStoreResource.Check calls it after the connectivity check; MultiTenantedMessageStore forwards to every database. No Weasel change required. IStatefulResource.Check is only invoked by the resources check command (not normal startup), so there's no startup regression.
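The shape of the new check can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual C# implementation: every class and function name here is hypothetical, the `pending_changes` list stands in for the result of the Weasel schema diff, and the mapping from a failed check to exit code 1 is simplified.

```python
import asyncio


class StorageMissingError(Exception):
    """Hypothetical error for a schema that is missing or out of date."""


class MessageStoreAdmin:
    """Hypothetical base type: the storage assertion defaults to a no-op."""

    async def check_connectivity(self):
        pass  # stands in for opening and closing a connection

    async def assert_storage_exists(self):
        pass  # default no-op, so stores without a schema are unaffected


class RdbmsStoreAdmin(MessageStoreAdmin):
    def __init__(self, name, pending_changes):
        self.name = name
        # Stand-in for a schema diff result; an empty list means up to date.
        self.pending_changes = pending_changes

    async def assert_storage_exists(self):
        if self.pending_changes:
            raise StorageMissingError(
                f"{self.name}: schema missing or out of date: {self.pending_changes}"
            )


class MultiTenantedAdmin(MessageStoreAdmin):
    def __init__(self, databases):
        self.databases = databases

    async def assert_storage_exists(self):
        # Forward to every database; any failure fails the whole check.
        for db in self.databases:
            await db.assert_storage_exists()


async def check(admin):
    # Connectivity first, then the storage assertion.
    await admin.check_connectivity()
    await admin.assert_storage_exists()


def run_check(admin):
    """Return a process-style exit code: 0 on success, 1 on failure."""
    try:
        asyncio.run(check(admin))
        return 0
    except StorageMissingError:
        return 1
```

With this structure a reachable database whose schema was dropped fails the check on its own, for example `run_check(RdbmsStoreAdmin("main", ["wolverine_incoming_envelopes"]))` returns 1, while a store that keeps the default no-op still passes.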

Tests

  • New unit test release_all_ownership_is_best_effort — a failing store neither aborts releasing the others nor throws.
  • The existing stateful_resource_smoke_tests.check_negative now passes because the check genuinely detects the missing schema (exit 1), not because teardown crashed.
  • Full SqlServerTests suite green (358 passed, 1 skipped); SqlServer + Postgres stateful-resource smoke tests green.
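What the new regression test asserts can be sketched in a few lines. The Python below is an illustration only, not the C# test in this PR: `FakeStore` and `release_all_best_effort` are hypothetical names, and a `RuntimeError` stands in for the database exception. It shows a per-store try/except that logs at debug level and continues, followed by a test of the same shape as release_all_ownership_is_best_effort.

```python
import asyncio
import logging

logger = logging.getLogger("teardown")


class FakeStore:
    """Hypothetical stand-in for a message store; not a Wolverine type."""

    def __init__(self, name, fails=False):
        self.name = name
        self.fails = fails
        self.released = False

    async def release_all_ownership(self):
        if self.fails:
            raise RuntimeError(f"{self.name}: envelope table does not exist")
        self.released = True


async def release_all_best_effort(stores):
    for store in stores:
        try:
            await store.release_all_ownership()
        except Exception as e:
            # Best effort: log quietly and move on to the next store.
            logger.debug("Could not release ownership for %s: %s", store.name, e)


def test_release_all_ownership_is_best_effort():
    stores = [FakeStore("main"), FakeStore("ancillary", fails=True), FakeStore("other")]

    # Must not raise even though the middle store fails.
    asyncio.run(release_all_best_effort(stores))

    # Stores on both sides of the failing one were still released.
    assert [s.name for s in stores if s.released] == ["main", "other"]


test_release_all_ownership_is_best_effort()
```

Catching `Exception` rather than `BaseException` keeps task cancellation intact in this sketch; how broadly the real implementation catches is not stated in this PR.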

🤖 Generated with Claude Code

jeremydmiller and others added 2 commits June 17, 2026 10:44
When host startup aborts (e.g. a Marten schema-migration error), Wolverine's teardown
called MessageStoreCollection.ReleaseAllOwnershipAsync, which looped every registered
store — including ancillary stores whose schema (wolverine_incoming_envelopes) was never
created — and ran raw UPDATEs with no guard. The first store to fail (PostgreSQL 42P01)
threw out of the loop, so (a) the remaining stores never had their ownership released and
(b) StopAsync doesn't catch PostgresException, so the secondary teardown error propagated
and masked the real startup failure.

Releasing ownership on teardown is optional — any envelopes left as owner_id = nodeNumber
are reclaimed by the durability agent's recovery polling on the next live node. So wrap
each store's release in a try/catch that logs at Debug and continues to the next store.

Adds a unit regression test proving a failing store neither aborts releasing the others
nor throws.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hema (#3123)

The teardown best-effort fix above means host disposal no longer throws when an
RDBMS message store's schema doesn't exist. That exposed that MessageStoreResource.Check
(what 'resources check' runs) only verified connectivity via CheckConnectivityAsync — a
reachable database with a dropped schema still passed. The SqlServer stateful-resource
smoke test 'check_negative' was only getting its non-zero exit code from the teardown
crash, not from the check itself.

Add IMessageStoreAdmin.AssertStorageExistsAsync (default no-op so non-RDBMS stores are
unaffected), implemented for RDBMS stores by reusing the same Weasel schema diff the
migration path already uses (SchemaMigration.DetermineAsync) — throwing when the schema
is missing or out of date. MessageStoreResource.Check now calls it after the connectivity
check, and MultiTenantedMessageStore forwards to every database. No Weasel change needed.

check_negative now passes because the check genuinely detects the missing schema. Full
SqlServerTests suite (358) and Postgres + SqlServer stateful-resource smoke tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller changed the title from "Make teardown ownership release best-effort per store (closes #3123)" to "Best-effort teardown ownership release + 'resources check' detects missing schema (closes #3123)" on Jun 17, 2026
@jeremydmiller jeremydmiller merged commit f3aaafb into main Jun 17, 2026
24 of 25 checks passed
This was referenced Jun 17, 2026