#4684 Phase E.2 - progression lock-wait sampler #4692
Merged
Second slice of #4684 (item 5 of the parent spec). Marten-side only, no core hooks needed.

PostgreSQL doesn't expose cumulative per-row lock-wait history, so the sampler approximates by polling pg_stat_activity at the existing --instrument-sample-seconds cadence and counting sessions, other than our own, that have an ungranted lock request on mt_event_progression:

    WHERE a.state = 'active'
      AND a.wait_event_type IS NOT NULL
      AND a.pid != pg_backend_pid()
      AND EXISTS (
        SELECT 1
        FROM pg_locks l
        JOIN pg_class c ON c.oid = l.relation
        WHERE l.pid = a.pid
          AND c.relname = 'mt_event_progression'
          AND l.granted = false)

Per-sample (timestamp, waiter_count, max_wait_ms) rolls up at end-of-run to three numbers: max_concurrent_waiters, max_single_wait_ms, and the approximation observed_waiter_seconds = sum(waiter_count) * sample_interval. Zero across all three means no contention was observed; non-zero means either that the new concurrent-rebuild cap (#420 in 2.9.0) is being exercised or that something outside the daemon is racing the row.

## What ships

* `Instrumentation/ProgressionLockSampler.cs`: background poll loop, separate connection per tick (cheap, dev-tool only). Filters our own pid out so the sampler never observes itself. Best-effort: a transient failure drops that sample.
* `Instrumentation/LockWaitStats` record: the rolled-up summary that lands in RebuildInstrumentation.Snapshot.
* `InstrumentationOptions.LockTracePath` + `--instrument-lock-trace <path>`: per-sample CSV trace; implies --instrument like its sibling.
* RebuildInstrumentation owns the new sampler alongside the existing ProgressSampler / NpgsqlCommandCounter; one Activity span tags the rolled-up contention numbers.
* RebuildCommand / StressCommand console output picks up three new rows; metrics.json grows a "progressionLockWaits" section.
* README documents the metric and the interpretation.
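The best-effort behaviour of the poll loop (one query per tick, a failed tick simply yields no sample) can be sketched as follows. This is an illustrative Python sketch, not the C# code in `ProgressionLockSampler.cs`: `query_once` is a hypothetical stand-in for "open a connection, run the lock-wait query, return `(waiter_count, max_wait_ms)`", and the loop runs a fixed number of ticks rather than in the background until stopped, so that it stays deterministic.

```python
import time
from typing import Callable, List, Tuple

# (timestamp, waiter_count, max_wait_ms), the per-sample tuple named in the text
Sample = Tuple[float, int, float]


def poll_lock_waits(
    query_once: Callable[[], Tuple[int, float]],  # hypothetical: runs the query once
    ticks: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Sample]:
    """Poll `ticks` times; a tick whose query raises contributes no sample."""
    samples: List[Sample] = []
    for _ in range(ticks):
        try:
            waiter_count, max_wait_ms = query_once()
        except Exception:
            # Best-effort: drop this sample and keep polling.
            pass
        else:
            samples.append((clock(), waiter_count, max_wait_ms))
        sleep(interval_seconds)
    return samples
```

Dropping a failed tick rather than recording a zero keeps a connection hiccup from being read as "no contention", at the cost of slightly under-counting the sampled time.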
## Verification

Smoke at 22.5k events (4 tenants × 5k × 4 writers):

* Lock trace CSV: 6 rows at 0.5s interval, all (0, 0); no contention on a single rebuild against a fresh DB (expected).
* Console: "lock max-waiters 0, waiter-sec 0.0" alongside the existing E.1 throughput / pg-cmd rows.
* --instrument off is still a true no-op: sampler / counter / activity never constructed.

Real non-zero readings need parallel concurrent rebuilds racing the same row; the harness is wired to report them when the situation arises rather than try to manufacture contention in a smoke test.

Phase E.3 (RecentlyUsedCache hit/miss) and E.4 (per-EvolveAsync lookup count) still need JasperFx-side hooks; pinned as separate PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-on to the previous commit that landed the code; the README edits were attempted in the same batch but failed against an updated file. Splitting out as a focused README-only commit so the diff stays clear. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second slice of #4684 (item 5 of the parent spec). Marten-side only, no core hooks needed.
## Why

PostgreSQL doesn't expose cumulative per-row lock-wait history; `pg_stat_activity` shows only the current wait of each session. The sampler approximates by polling at the existing `--instrument-sample-seconds` cadence and counting sessions, other than our own, that have an ungranted lock request on `mt_event_progression`.

Per-sample `(timestamp, waiter_count, max_wait_ms)` rolls up at end-of-run to three numbers:

* `MaxConcurrentWaiters`: the highest `waiter_count` seen in any sample.
* `MaxSingleWaitMs`: the highest `max_wait_ms` seen in any sample.
* `ObservedWaiterSeconds`: `sum(waiter_count_per_sample) * sample_interval`, an approximation of cumulative time spent waiting.

Zero across all three means no contention was observed; non-zero means either that the new concurrent-rebuild cap (jasperfx#420 in 2.9.0) is being exercised or that something outside the daemon is racing the row.
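As a worked illustration of the rollup arithmetic above, here is a minimal Python sketch. The type and function names (`LockSample`, `LockWaitRollup`, `roll_up`) are hypothetical and chosen for this example only; the shipped `LockWaitStats` record is C# and may differ in shape.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple


class LockSample(NamedTuple):
    """One poll result: the per-sample tuple described in the text."""
    timestamp: float
    waiter_count: int
    max_wait_ms: float


@dataclass(frozen=True)
class LockWaitRollup:
    """End-of-run summary (hypothetical stand-in for LockWaitStats)."""
    max_concurrent_waiters: int
    max_single_wait_ms: float
    observed_waiter_seconds: float


def roll_up(samples: Iterable[LockSample], sample_interval_seconds: float) -> LockWaitRollup:
    samples = list(samples)
    return LockWaitRollup(
        # Peak number of waiters seen in any single sample.
        max_concurrent_waiters=max((s.waiter_count for s in samples), default=0),
        # Longest single wait seen in any sample.
        max_single_wait_ms=max((s.max_wait_ms for s in samples), default=0.0),
        # Each waiter seen in a sample is charged one full sample interval.
        observed_waiter_seconds=sum(s.waiter_count for s in samples) * sample_interval_seconds,
    )
```

For example, four samples at a 0.5 s interval with waiter counts 0, 2, 3, 1 give `max_concurrent_waiters = 3` and `observed_waiter_seconds = 6 * 0.5 = 3.0`. Because a waiter is only counted when a poll happens to catch it, waits shorter than the interval can be missed entirely, which is why the figure is an approximation.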
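## What ships

* `Instrumentation/ProgressionLockSampler.cs`: background poll loop, separate connection per tick (cheap, dev-tool only). Filters our own pid out so the sampler never observes itself. Best-effort: a transient failure drops that sample.
* `Instrumentation/LockWaitStats` (record in same file): the rolled-up summary that lands in `RebuildInstrumentation.Snapshot`.
* `InstrumentationOptions.LockTracePath` + `--instrument-lock-trace <path>`: per-sample CSV trace; implies `--instrument` like its sibling flag.
* `RebuildInstrumentation`: owns the new sampler alongside the existing `ProgressSampler` / `NpgsqlCommandCounter`; one Activity span tags the rolled-up contention numbers.
* `RebuildCommand` / `StressCommand`: console output picks up three new rows; `metrics.json` grows a `"progressionLockWaits"` section.
* `src/Marten.ScaleTesting/README.md`: documents the metric and the interpretation.

## Verification

Smoke at 22.5k events (4 tenants × 5k × 4 writers):

* Lock trace CSV: 6 rows at 0.5s interval, all `(0, 0)` -- no contention on a single rebuild against a fresh DB (expected).
* Console: `lock max-waiters 0, waiter-sec 0.0` alongside the existing E.1 throughput / pg-cmd rows.
* `--instrument` off: still a true no-op; sampler / counter / activity never constructed.

Real non-zero readings need parallel concurrent rebuilds racing the same row; the harness is wired to report them when the situation arises rather than manufacture contention in a smoke test.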
## Remaining Phase E follow-ups

* E.3: `RecentlyUsedCache` hit/miss counters. Requires a JasperFx-side instrumentation hook on `RecentlyUsedCache<TId, TDoc>`.
* E.4: per-`EvolveAsync` lookup count. Requires wrapping `IDocumentOperations` query/load calls inside projection user-code; likely a JasperFx-side hook on `SliceGroup`.

Parent #4684 stays open until E.3-E.4 land.
🤖 Generated with Claude Code