#4666 Phase C — validate + stress subcommands #4676
Merged
jeremydmiller merged 1 commit on Jun 5, 2026
…erminism fix
What ships:
* Aggregate baseline + diff (Validation/AggregateBaseline.cs):
AggregateBaselineCapture.CaptureAsync(connection, schemaName, ct)
AggregateBaselineCapture.Diff(expected, actual)
AggregateBaselineCapture.WriteAsync / ReadAsync (JSON)
Walks every mt_doc_* table under the configured schema (skipping the
_b_N hash-partition children — we hash the parent's rows once). For each
table: streaming SHA-256 over `id || '\0' || data::text || '\n'` ordered
by id::text. Stable hash for byte-identical aggregates.
* `validate` subcommand:
marten-scaletest validate --baseline scaletest-baseline.json
[--write-baseline]
- First run with no baseline writes one and exits 0.
- Subsequent runs diff captured state against the baseline; non-empty
diff exits 1 with per-table drift lines.
- --write-baseline overrides + writes the current state as the new
baseline (use after intentional projection changes).
* `stress` subcommand (the actual #4667 crash gate):
marten-scaletest stress [--wipe]
--tenants N --events-per-tenant M --writers W
--shard-timeout-seconds S
--baseline scaletest-baseline.json
[--skip-validate]
Chains seed → rebuild → validate with fail-fast semantics. Spectre table
summarises every phase with status / elapsed / one-line note. Final exit
code reflects the worst phase.
* TenantDailyRollupProjection determinism fix:
The original date-based key fell back to DateTimeOffset.UtcNow when the
upstream AppointmentDetails snapshot didn't carry a Requested timestamp
(which was every snapshot, since the lifted Evolve never set it). The
fallback made the projection produce different bucket keys on every
rebuild — caught immediately by the new validate subcommand on first
end-to-end run.
Fix: rename to TenantBucketRollup, key by a stable hash bucket derived
from the AppointmentDetails id (first 4 bytes mod 64 → "b000".."b063").
Same id → same bucket → reproducible across rebuilds. Preserves the
"exercises cross-stage chaining + per-tenant aggregation" intent without
the date dimension. ByRoutingReason swapped to SortedDictionary so JSON
serialization order is stable.
Projection .Name preserved as "TenantDailyRollup" per the #4666 spec.
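A minimal sketch of the bucket-key idea described in the determinism fix, written in Python for illustration only and not taken from the C# projection. It assumes the first 4 bytes of the id are read as an unsigned big-endian integer; the real byte order and the names `bucket_key`, `stable_json`, and the sample routing reasons are hypothetical.

```python
import json
import uuid

BUCKET_COUNT = 64  # "mod 64" from the description above


def bucket_key(details_id: uuid.UUID) -> str:
    """Map an id to one of "b000".."b063".

    Assumption: the first 4 bytes are interpreted big-endian; the real
    projection may read them in a different byte order.
    """
    first_four = int.from_bytes(details_id.bytes[:4], "big")
    return f"b{first_four % BUCKET_COUNT:03d}"


def stable_json(counts: dict) -> str:
    """Serialize with sorted keys, the Python analogue of switching to a
    sorted dictionary so the serialized key order does not vary."""
    return json.dumps(counts, sort_keys=True)


if __name__ == "__main__":
    some_id = uuid.UUID("00000041-0000-0000-0000-000000000000")
    # The same id always lands in the same bucket, unlike a clock-based key.
    print(bucket_key(some_id), bucket_key(some_id))
    print(stable_json({"walk_in": 2, "referral": 1}))
```

Because the key depends only on the id, two rebuilds over the same events compute the same bucket for every snapshot, which is the property the UtcNow fallback broke.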
Smoke (local Postgres):
* 2 tenants × 500 events seed = 1,141 events / 150 streams.
* stress --wipe → seed OK + rebuild OK + validate writes baseline (13
tables hashed).
* stress (no --wipe, same events) → seed SKIPPED + rebuild OK (re-runs
cleanly) + validate.
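The phase outcomes in the smoke runs above follow from the seed → rebuild → validate chaining with fail-fast semantics. The Python sketch below illustrates that control flow only. The names `Status`, `PhaseResult`, `run_chain`, and `exit_code` are hypothetical, and two details are assumptions rather than facts from this PR: that phases after a failure are reported as skipped, and that "worst phase" maps to exit code 1 only on failure.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Tuple


class Status(IntEnum):
    # Ordered so that max() picks the worst outcome.
    OK = 0
    SKIPPED = 1
    FAILED = 2


@dataclass
class PhaseResult:
    name: str
    status: Status
    note: str = ""


Phase = Tuple[str, Callable[[], Tuple[Status, str]]]


def run_chain(phases: List[Phase]) -> List[PhaseResult]:
    """Run phases in order; after the first failure, do not run the rest."""
    results: List[PhaseResult] = []
    failed = False
    for name, run in phases:
        if failed:
            # Assumption: later phases are listed as skipped, not omitted.
            results.append(PhaseResult(name, Status.SKIPPED, "earlier phase failed"))
            continue
        status, note = run()
        results.append(PhaseResult(name, status, note))
        failed = status is Status.FAILED
    return results


def exit_code(results: List[PhaseResult]) -> int:
    """Assumption: only a FAILED phase makes the process exit non-zero."""
    worst = max((r.status for r in results), default=Status.OK)
    return 1 if worst is Status.FAILED else 0


if __name__ == "__main__":
    second_run = run_chain([
        ("seed", lambda: (Status.SKIPPED, "events already present")),
        ("rebuild", lambda: (Status.OK, "")),
        ("validate", lambda: (Status.OK, "")),
    ])
    for r in second_run:
        print(f"{r.name:10} {r.status.name:8} {r.note}")
    print("exit", exit_code(second_run))
```

In this sketch a skipped seed does not fail the run, which matches the second smoke run where seed is SKIPPED and rebuild still reports OK.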
Known limitation surfaced by Phase C smoke:
The validate phase reports two tables drifting across rebuild runs of
the SAME events:
- mt_doc_appointmentmetrics (custom IProjection aggregating across
streams via LoadAsync + Store)
- mt_doc_tenantbucketrollup (multi-stream projection summing
AppointmentDetails events into hash buckets)
Both legitimately produce non-deterministic per-tenant counts under
parallel slice fan-out — the events arrive in different orders across
runs, so intermediate aggregation values differ. The other 11 tables are
byte-identical across rebuilds.
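"Byte-identical" here means the per-table digest matches. As a rough illustration of the hashing scheme described under "What ships" (SHA-256 over `id || '\0' || data::text || '\n'`, ordered by id), here is a Python sketch. It is not the C# code from `AggregateBaseline.cs`: `table_digest` is a hypothetical name, rows are assumed to arrive as `(id, data_text)` string pairs encoded as UTF-8, and the in-memory `sorted` stands in for the database's `ORDER BY id::text`, whose collation may order some ids differently.

```python
import hashlib
from typing import Iterable, Tuple


def table_digest(rows: Iterable[Tuple[str, str]]) -> str:
    """Hash one table's rows into a single hex digest.

    Each row contributes: id, a NUL separator, the JSON text, a newline.
    Sorting by id makes the digest independent of the order rows were read.
    """
    h = hashlib.sha256()
    for row_id, data_text in sorted(rows, key=lambda r: r[0]):
        h.update(row_id.encode("utf-8"))
        h.update(b"\0")
        h.update(data_text.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    run_a = [("a", '{"count": 3}'), ("b", '{"count": 5}')]
    run_b = [("b", '{"count": 5}'), ("a", '{"count": 3}')]
    drifted = [("a", '{"count": 4}'), ("b", '{"count": 5}')]
    print(table_digest(run_a) == table_digest(run_b))    # same rows, any order
    print(table_digest(run_a) == table_digest(drifted))  # one count differs
```

A single differing count in one document changes the whole table digest, which is why the two aggregating projections show up as drift while the remaining tables do not.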
For the #4667 verification gate this is the right behavior: the validate
phase catches projection-side non-determinism (useful), and the `rebuild`
phase staying OK across multiple runs is the actual race-fix signal.
Phase D (running stress at full 20M-event scale) will use the same
rebuild-OK gate.
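For reference, the `validate` behavior this gate sits next to (first run writes the baseline and exits 0, later runs diff and exit 1 on drift, `--write-baseline` replaces the baseline) can be sketched as follows. This is an illustrative Python model, not the tool's code: `diff_baselines` and `validate` are hypothetical names, the baseline is assumed to be a flat JSON object of table name to digest, and the drift line wording is invented.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def diff_baselines(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return one drift line per table whose digest or presence differs."""
    lines = []
    for table in sorted(set(expected) | set(actual)):
        if table not in actual:
            lines.append(f"{table}: missing from current state")
        elif table not in expected:
            lines.append(f"{table}: not in baseline")
        elif expected[table] != actual[table]:
            lines.append(f"{table}: hash drift")
    return lines


def validate(baseline_path: str, current: Dict[str, str],
             write_baseline: bool = False) -> Tuple[int, List[str]]:
    """Return (exit_code, drift_lines) for one validate run."""
    path = Path(baseline_path)
    if write_baseline or not path.exists():
        # First run, or an explicit request to accept the current state.
        path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return 0, []
    expected = json.loads(path.read_text())
    drift = diff_baselines(expected, current)
    return (1 if drift else 0), drift
```

Under this model, the two drifting tables would produce two drift lines and exit code 1 on every comparison run, while the `rebuild` phase result is unaffected.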
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase C of #4666. Stacked on top of #4675 (Phase B).
Adds the validate / stress subcommands and a determinism fix to the TenantBucketRollup projection that the new validate command flushed out on first end-to-end run.
What ships
* Aggregate baseline + diff (`Validation/AggregateBaseline.cs`): walks every `mt_doc_*` table under the configured schema (skipping `_b_N` hash-partition children; the parent's rows are hashed once). Streaming SHA-256 over `id || '\0' || data::text || '\n'` ordered by `id::text`. Stable hash for byte-identical aggregates.
* `validate` subcommand: `marten-scaletest validate --baseline scaletest-baseline.json [--write-baseline]`. First run writes baseline, exits 0. Subsequent runs diff + exit 1 on drift.
* `stress` subcommand (the actual #4667 crash gate): `marten-scaletest stress [--wipe] --tenants N --events-per-tenant M --writers W --shard-timeout-seconds S --baseline scaletest-baseline.json [--skip-validate]`. Chains seed → rebuild → validate with fail-fast semantics. Spectre table summarises every phase.
* TenantDailyRollupProjection determinism fix: the original date-based key fell back to `DateTimeOffset.UtcNow` (every Updated snapshot lacked a Requested timestamp). Rebuilds drifted by minute. New key is a stable hash bucket from the AppointmentDetails id (~64 buckets). `ByRoutingReason` → `SortedDictionary` for stable JSON serialization. Projection `.Name` preserved as `"TenantDailyRollup"` per the #4666 spec.
Smoke (local Postgres)
* `stress --wipe` → seed OK + rebuild OK + validate writes baseline (13 tables)
* `stress` (no --wipe, same events) → seed SKIPPED + rebuild OK + validate runs
Known limitation surfaced by Phase C smoke
The validate phase reports two tables drifting across rebuild runs of the same events:
* `mt_doc_appointmentmetrics` (custom `IProjection` aggregating across streams via `LoadAsync + Store`)
* `mt_doc_tenantbucketrollup` (multi-stream projection summing `AppointmentDetails` events into hash buckets)
Both legitimately produce non-deterministic per-tenant counts under parallel slice fan-out: the events arrive in different orders across runs, so intermediate aggregation values differ. The other 11 tables are byte-identical across rebuilds.
For the #4667 verification gate this is the right behavior: the validate phase catches projection-side non-determinism (useful), and the `rebuild` phase staying `OK` across multiple runs is the actual race-fix signal. Phase D (running stress at full 20M-event scale) uses the same rebuild-OK gate.
Followups (after merge)
* `stress` at scale against the dev box. If `rebuild` stays `OK` across the run, close #4667.
🤖 Generated with Claude Code