## Problem

PR #4245 added a `hop_count` field to `GetMsg::Response` so terminal GET telemetry (`GetSuccess`) could report routing depth, but left `GetMsg::ResponseStreaming` (the variant used for GET responses over `streaming_threshold`, default 64 KB) without the field.

Worse, in the task-per-tx GET architecture the originator's GET reply is delivered straight to its driver's `pending_op_result` waiter rather than through the event loop's inbound dispatch, so the originator never runs `from_inbound_msg_v1` for its own reply. The implicit `GetSuccess` arm there only fires at relays forwarding an inline `Response{Found}`, and not at all for streamed responses (`ResponseStreaming` fell into the `Get(_) => Ignored` catch-all).

Net effect: streamed GET successes, the large, data-rich contracts most likely to surface routing problems, produced no terminal GET telemetry at all, and the simulation regression test for this (`test_hop_count_populated_on_terminal_get_events`, #4250) had to stay `#[ignore]`d because it never saw a terminal GET event.

## Approach

- Add `hop_count: usize` to `GetMsg::ResponseStreaming` and thread it through the driver's `Terminal::Streaming`, `classify`, the storer/upgrade producer (`relay_send_found`), and the relay fork+pipe forward, exactly mirroring the established `PutMsg::ResponseStreaming` pattern.
- Emit `GetSuccess` explicitly from the originator's GET driver (`drive_client_get_inner`) on client-visible success, carrying the wire-carried `hop_count` clamped to `max_hops_to_live`, mirroring how PUT emits from `finalize_put_at_originator`. This covers inline AND streaming, direct AND relayed GETs uniformly.
- Add the matching `from_inbound_msg_v1` `ResponseStreaming` arm so relays emit `GetSuccess` for streamed responses too, restoring symmetry with the inline `Response{Found}` arm.
- Bump `version`/`min-compatible-version` to 0.2.69 for the wire-format change (positional bincode field).

## Testing

- `test_get_msg_response_streaming_hop_count_roundtrip` (get.rs): wire roundtrip for the new field.
- `classify_response_streaming_is_streaming_terminal` (op_ctx_task.rs): the classifier preserves `hop_count` into `Terminal::Streaming`.
- `test_streaming_get_emits_get_success_with_hop_count` (streaming_e2e): a deterministic 1 MB relayed streaming GET emits a `GetSuccess` event with a populated, in-range `hop_count`. Fails before the fix (0 events).
- Un-ignored `test_hop_count_populated_on_terminal_get_events` (#4250): rewired onto a deterministic controlled PUT-then-GET workload so it reliably produces terminal GET successes.

Closes #4249
Closes #4250

[AI-assisted - Claude]
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
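The first approach bullet can be pictured with a minimal sketch. Everything below is a hypothetical stand-in written for illustration: `GetReply`, the fields `stream_id` and `total_size`, and the helper `clamp_hop_count` are assumptions, not the project's actual definitions in get.rs or op_ctx_task.rs. Only the idea is taken from the description above: the streaming variant carries `hop_count`, the classifier preserves it, and the emitted value is clamped to `max_hops_to_live`.

```rust
// Hypothetical stand-in for the GET reply message; field names are assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum GetReply {
    // Inline reply: the envelope carries the whole payload.
    Response { payload: Vec<u8>, hop_count: usize },
    // Streaming reply: header only, the payload arrives on a separate stream.
    ResponseStreaming { stream_id: u64, total_size: u64, hop_count: usize },
}

// Hypothetical stand-in for the driver's terminal classification.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Terminal {
    Inline { hop_count: usize },
    Streaming { stream_id: u64, hop_count: usize },
}

#[allow(dead_code)]
impl Terminal {
    fn is_streaming(&self) -> bool {
        matches!(self, Terminal::Streaming { .. })
    }
}

// Map a reply to its terminal state without dropping the hop count.
#[allow(dead_code)]
fn classify(reply: &GetReply) -> Terminal {
    match reply {
        GetReply::Response { hop_count, .. } => Terminal::Inline {
            hop_count: *hop_count,
        },
        GetReply::ResponseStreaming {
            stream_id,
            hop_count,
            ..
        } => Terminal::Streaming {
            stream_id: *stream_id,
            hop_count: *hop_count,
        },
    }
}

// Clamp a wire-supplied hop count so a bogus value cannot exceed the
// configured routing budget.
#[allow(dead_code)]
fn clamp_hop_count(wire_hop_count: usize, max_hops_to_live: usize) -> usize {
    wire_hop_count.min(max_hops_to_live)
}
```

Clamping at the point of emission keeps telemetry in range even if a peer sends an inflated value, since the field is read off the wire rather than computed locally.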
Rule Review: No issues found
Rules checked: git-workflow.md, code-style.md, testing.md, operations.md
No rule violations detected. Key properties verified:
Codex review (P2): the `from_inbound_msg_v1` `ResponseStreaming` arm
recorded `GetSuccess` on the stream HEADER, before the payload arrived.
Unlike inline `Response{Found}` (whose envelope carries the full payload,
so receipt IS success), a `ResponseStreaming` message is only metadata —
the payload streams separately and can still fail to be delivered,
claimed, assembled, or deserialized. Emitting success on the header would
report success for a payload the node may never receive, and risked
double-counting against the new driver-side emission.
Fix: remove that arm entirely (let `ResponseStreaming` stay in the
`Get(_) => Ignored` catch-all). The terminal streaming `GetSuccess` is now
emitted solely by the originator's GET driver, which fires only after
`build_host_response`'s local-store re-query confirms the assembled
payload is present (`host_result.is_ok()`) — so success accurately
reflects payload arrival and there is exactly one emission per peer.
Also: give `test_streaming_get_emits_get_success_with_hop_count` a unique
seed (was shared with `test_streaming_get_through_relay`). The shared seed
collided in seed-keyed global registries under parallel execution, making
the suite intermittently fail; the streaming-forward counter guard still
proves the streaming path fires regardless of seed.
[AI-assisted - Claude]
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
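The rule this commit settles on (record success only once the assembled payload is confirmed present, never on the stream header) could look roughly like the sketch below. The function `finish_streaming_get`, the `Event` type, and the use of a plain `Result<Vec<u8>, String>` for `host_result` are assumptions made for illustration; the real driver and `build_host_response` have their own signatures.

```rust
// Hypothetical telemetry event; the real event type has more fields.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    GetSuccess { tx: u64, hop_count: usize },
}

// Hypothetical tail of a streaming GET driver. It runs after the stream has
// either been assembled and re-queried from the local store (Ok) or has
// failed somewhere along the way (Err).
#[allow(dead_code)]
fn finish_streaming_get(
    tx: u64,
    wire_hop_count: usize,
    max_hops_to_live: usize,
    host_result: &Result<Vec<u8>, String>,
    log: &mut Vec<Event>,
) {
    // The header alone proves nothing: emit only when the payload is present.
    if host_result.is_ok() {
        log.push(Event::GetSuccess {
            tx,
            hop_count: wire_hop_count.min(max_hops_to_live),
        });
    }
}
```

With this shape a stream that fails to deliver, assemble, or deserialize leaves the log untouched, so a success event always corresponds to a payload the node actually holds.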
…count
Codex review (P2): an inline `GetMsg::Response` reply flows through
`handle_pure_network_message_v1`, whose unconditional `from_inbound_msg_v1`
call already emits a `GetSuccess` for the inline `Response{Found}` arm
BEFORE the driver bypass forwards the reply. So the previous unconditional
driver-side emission produced a SECOND `GetSuccess` for the same
`(tx, peer)` on inline GETs — a double-count. Reproduced in
`test_hop_count_populated_on_terminal_get_events`: one peer logged 2
GetSuccess events for a single tx.
Fix: gate the driver-side emission to the streaming reply path
(`Terminal::is_streaming()`). `GetMsg::ResponseStreaming` is the only GET
reply `from_inbound_msg_v1` deliberately does not emit for (the header is
not proof the payload arrived), so the driver emission fills exactly that
gap without duplicating the inline path.
Also add a permanent no-double-count guard to the un-ignored #4250 test:
assert no peer emits more than one GetSuccess per transaction. Verified it
fails (max_dup=2) if the emission is ungated and passes (max_dup=1) when
gated to streaming.
[AI-assisted - Claude]
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
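A toy model of the double-count and of the guard added to the test is sketched below. It is not the project's test code: `simulate_get`, `max_get_success_per_tx_peer`, and the `(tx, peer)` tuple representation are hypothetical, and the model only encodes what the commit message states, namely that inbound dispatch emits for inline replies and the driver emission is either ungated or gated to streaming replies.

```rust
use std::collections::HashMap;

// Model one peer handling one GET reply and return the (tx, peer) key of
// every GetSuccess it would record.
#[allow(dead_code)]
fn simulate_get(tx: u64, peer: u32, streaming: bool, gate_to_streaming: bool) -> Vec<(u64, u32)> {
    let mut events = Vec::new();
    // Inbound dispatch emits for inline replies only; a streaming header is
    // deliberately ignored there.
    if !streaming {
        events.push((tx, peer));
    }
    // Driver-side emission: unconditional before the fix, streaming-only after.
    if !gate_to_streaming || streaming {
        events.push((tx, peer));
    }
    events
}

// The guard: the largest number of GetSuccess events any peer recorded for a
// single transaction. A healthy run yields at most 1.
#[allow(dead_code)]
fn max_get_success_per_tx_peer(events: &[(u64, u32)]) -> usize {
    let mut counts: HashMap<(u64, u32), usize> = HashMap::new();
    for key in events {
        *counts.entry(*key).or_insert(0) += 1;
    }
    counts.values().copied().max().unwrap_or(0)
}
```

In this model an ungated inline GET produces two events for the same key, while gating the driver to streaming replies brings both inline and streaming cases down to one, which is the property the test asserts.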
Review summary (multi-lens self-review + external Codex pass)

Ran a self-review across four lenses (code-first, testing-gaps, skeptical/adversarial, big-picture) plus two rounds of external

Finding 1 (Codex P2): premature streaming-header success.

Finding 2 (Codex P2): inline GetSuccess double-count.

Investigation note. Instrumentation confirmed that in the task-per-tx architecture the originator's GET reply is delivered straight to its driver's

Verification.

[AI-assisted - Claude]
Holding this PR for human review before merge. It turned out to require a wire-format / protocol change (bump to v0.2.69,

This was produced by an autonomous fix agent whose guardrails said to stop and report on wire-format changes rather than auto-merge; flagging it here so @sanity can review the protocol-version bump and cross-version compatibility before it lands. Converted to draft + auto-merge disabled to take it out of the merge queue meanwhile.

The code itself looks solid (3 regression tests, 2 Codex P2s fixed, un-ignores #4250); this is purely the merge gate for a protocol change.

[AI-assisted - Claude]
## Problem

PR #4245 added a `hop_count` field to `GetMsg::Response` so terminal GET telemetry (`GetSuccess`) could report routing depth, but left `GetMsg::ResponseStreaming` (the variant used for GET responses over `streaming_threshold`, default 64 KB) without the field.

Worse, in the task-per-tx GET architecture the originator's GET reply is delivered straight to its driver's `pending_op_result` waiter rather than through the event loop's inbound dispatch, so the originator never runs `from_inbound_msg_v1` for its own reply. The implicit `GetSuccess` arm there only fires at relays forwarding an inline `Response{Found}`, and not at all for streamed responses (`ResponseStreaming` fell into the `Get(_) => Ignored` catch-all).

Net effect: streamed GET successes, the large, data-rich contracts most likely to surface routing problems, produced no terminal GET telemetry at all. The dashboard's GET success-rate and hop-depth panels were silently blank for any contract large enough to stream, and the simulation regression test for this (`test_hop_count_populated_on_terminal_get_events`, #4250) had to stay `#[ignore]`d because it never saw a terminal GET event.

## Approach

The issue anticipated this ("If they don't emit `GetSuccess`: route streaming completion through `GetSuccess` (preferred)"). Empirically confirmed via instrumentation that `from_inbound_msg_v1` is never reached for the originator's own GET reply, so the fix routes terminal GET telemetry through the driver:

- Add `hop_count: usize` to `GetMsg::ResponseStreaming` and thread it through the driver's `Terminal::Streaming`, `classify`, the storer/upgrade producer (`relay_send_found`), and the relay fork+pipe forward, mirroring the established `PutMsg::ResponseStreaming` pattern.
- Emit `GetSuccess` explicitly from the originator's GET driver (`drive_client_get_inner`) on client-visible success, carrying the wire-carried `hop_count` clamped to `max_hops_to_live`, mirroring how PUT emits from `finalize_put_at_originator`. Covers inline AND streaming, direct AND relayed GETs uniformly.
- Add the matching `from_inbound_msg_v1` `ResponseStreaming` arm so relays emit `GetSuccess` for streamed responses too, restoring symmetry with the inline `Response{Found}` arm.
- Bump `version`/`min-compatible-version` to `0.2.69` for the positional bincode wire-format change.

No double-emission: the originator emits via the driver (relays never run that branch); relays emit via `from_inbound_msg_v1` (the originator never reaches it for its own reply). Each peer attributes one `GetSuccess` to itself, exactly as the inline path already does.

## Testing

- `test_get_msg_response_streaming_hop_count_roundtrip` (get.rs): wire roundtrip for the new field across 5 hop-count values.
- `classify_response_streaming_is_streaming_terminal` (op_ctx_task.rs): classifier preserves `hop_count` into `Terminal::Streaming`.
- `test_streaming_get_emits_get_success_with_hop_count` (streaming_e2e): a deterministic 1 MB relayed streaming GET emits a `GetSuccess` event with a populated, in-range `hop_count`; guarded by the streaming-forward counter so it can't pass on a local-cache shortcut. Verified to fail before the fix (0 GetSuccess events) and pass after.
- Un-ignored `test_hop_count_populated_on_terminal_get_events` (#4250, "test: un-ignore test_hop_count_populated_on_terminal_get_events once a GET-producing workload is wired through TestConfig"): rewired onto a deterministic controlled PUT-then-GET workload (sparse topology forces a routed GET) so it reliably produces terminal GET successes. Stable across repeated runs.
- `cargo fmt`, `cargo clippy --locked -- -D warnings`, and the full `streaming_e2e` + GET simulation suites are green locally; a `test_direct_runner_determinism` run confirms the telemetry change is deterministic.

Closes #4249
Closes #4250
[AI-assisted - Claude]
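Why a new positional field calls for a compatibility bump can be shown with a toy fixed-width encoder. This is not bincode and does not reproduce its byte layout; the header fields (`stream_id`, `total_size`), their order, and all function names are assumptions for illustration. The only property borrowed from the description is that fields are identified by position rather than by name, so a decoder expecting the new field cannot read bytes written without it.

```rust
// Toy "old" header layout: two little-endian u64 fields, no field names.
#[allow(dead_code)]
fn encode_old(stream_id: u64, total_size: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(16);
    out.extend_from_slice(&stream_id.to_le_bytes());
    out.extend_from_slice(&total_size.to_le_bytes());
    out
}

// Toy "new" header layout: the same two fields followed by hop_count.
#[allow(dead_code)]
fn encode_new(stream_id: u64, total_size: u64, hop_count: u64) -> Vec<u8> {
    let mut out = encode_old(stream_id, total_size);
    out.extend_from_slice(&hop_count.to_le_bytes());
    out
}

// Read one u64 at `offset`, advancing it; None if the bytes run out.
#[allow(dead_code)]
fn read_u64(bytes: &[u8], offset: &mut usize) -> Option<u64> {
    let end = (*offset).checked_add(8)?;
    let chunk = bytes.get(*offset..end)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(chunk);
    *offset = end;
    Some(u64::from_le_bytes(buf))
}

// Decoder for the new layout: it needs all three fields, in order.
#[allow(dead_code)]
fn decode_new(bytes: &[u8]) -> Option<(u64, u64, u64)> {
    let mut offset = 0;
    let stream_id = read_u64(bytes, &mut offset)?;
    let total_size = read_u64(bytes, &mut offset)?;
    let hop_count = read_u64(bytes, &mut offset)?;
    Some((stream_id, total_size, hop_count))
}
```

Calling `decode_new` on the output of `encode_new` recovers all three values, while calling it on the output of `encode_old` returns `None` because the third field is missing. Peers on either side of such a change cannot exchange the message reliably, which is the reason given above for raising `min-compatible-version` alongside `version`.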