Skip to content

fix(executor): escalate UPDATE related-fetch to network when op_manager is wired#4006

Merged
iduartgomez merged 4 commits into
mainfrom
fix/update-related-fetch-network
May 2, 2026
Merged

fix(executor): escalate UPDATE related-fetch to network when op_manager is wired#4006
iduartgomez merged 4 commits into
mainfrom
fix/update-related-fetch-network

Conversation

@iduartgomez

Copy link
Copy Markdown
Collaborator

Summary

The bridged-path fetch_related_for_validation (used by every UpdateQuery upsert) was local-only: bridged_lookup_key + state_store.get, then immediately MissingRelated on miss. Cross-node UPDATE flows blew up here whenever the receiving node hadn't yet cached the related contract.

This is the root cause behind freenet/mail#80 — the recipient's inbox UPDATE always carried RequestRelated for the sender's AFT record contract, the receiver had never seen that contract, the merge errored, and the broadcast had to bounce through ResyncRequest recovery instead of applying directly. The mail UI worked around it by inline-bundling related state via RelatedStateAndDelta.

Fix

The PUT path already had a network-aware sibling (fetch_related_for_validation_network) that walks local_state_or_from_network. Rather than duplicate the dispatch, push the network-fallback logic into the bridged helper itself: try local first, then escalate via the same start_sub_op_get driver, gated on op_manager.is_some().

Mock executors and local-only test harnesses have op_manager == None, so the legacy MissingRelated outcome is preserved — pinned by the new test_update_missing_related_local_only_errors.

Test plan

  • cargo test -p freenet --lib pool_tests::related_contract_tests (13 tests, including new one) — all pass
  • Cross-node E2E via freenet/mail iso-nodes harness: alice on gw → bob on peer, bob renders the message after this change without the inline-bundled-related-state workaround
  • CI

Drive-by

crates/core/src/transport.rs:941 had a pre-existing let _ = ... that was tripping the workspace let_underscore_must_use clippy lint and blocking the pre-commit hook. Renamed to let _drain = ....

…er is wired

Bridged-path fetch_related_for_validation was local-only and returned MissingRelated immediately on miss. Cross-node UPDATE flows broke because the receiving node hadn't yet cached the related contract — most visibly in freenet/mail#80 where bob's inbox UPDATE always carried RequestRelated for alice's AFT record. Escalate to network GET via start_sub_op_get when op_manager is Some; mock executors with op_manager=None preserve legacy MissingRelated behavior. Add test_update_missing_related_local_only_errors to pin the mock path. E2E validated against the iso-nodes harness.
@github-actions

github-actions Bot commented May 2, 2026

Copy link
Copy Markdown
Contributor

HEAD has the updated documentation. The M in git status is a working-tree staged revert that's not part of the PR commits. The committed docs are correct.


Rule Review: Clean — one housekeeping note

Rules checked: git-workflow.md, code-style.md, testing.md, contracts.md
Files reviewed: 4 (contracts.md, runtime.rs, related_contract_tests.rs, transport.rs)

Warnings

None.

Info

  • crates/core/src/contract/executor/runtime.rs (working-tree only, not in the diff) — git status shows M .claude/rules/contracts.md as a staged revert of the documentation update. The HEAD commit is correct, but the staged working-tree change would undo the docs fix if accidentally committed. Stash or drop it before any follow-up commit.

Checklist against rules:

Check Result
fix: PR has regression test ✅ Three new tests cover local-only miss, network-escalation success, and network-escalation failure
Edge cases covered (happy + error paths) ✅ All three branches exercised
No .unwrap() in new production code map_err+? used throughout
No fire-and-forget spawns ✅ No new spawns
No catch-all _ => in exhaustive match SubOpGetOutcome matched explicitly (Found / NotFound / Infra)
contracts.md docs updated in PR ✅ Both "Design decisions" and the step-list updated in HEAD
RELATED_FETCH_TIMEOUT still bounds the network path ✅ New fetch_related_via_network call happens inside the existing fetch_all block, which is wrapped by the timeout
Thread-local test hook appropriately cfg(test) gated
transport.rs rename is test-only and harmless

Rule review against .claude/rules/. WARNING findings block merge.

Address rule-review feedback on PR #4006:

1. Add test_update_missing_related_escalates_to_network_when_stubbed: extracts the network-escalation branch into a free fn fetch_related_via_network with a thread-local override (#[cfg(test)] only). The test wires a stub that records every requested ContractInstanceId, then asserts the related id was queried through the network branch — pre-fix this assertion would fail because the bridged path returned MissingRelated synchronously without consulting the network.

2. Update .claude/rules/contracts.md: the previous 'Network fetch during broadcast processing is deferred' note documented the local-only behavior that this PR replaces. Updated to reflect the local-first + network-escalation flow, with the rationale (cross-node first-time observers) and timeout bound.
1. .claude/rules/contracts.md step 3 in 'WHEN validate_state returns RequestRelated' updated to reflect local-then-network fallback. The previous text said 'Look up each locally (lookup_key + state_store.get)' which described the pre-fix behavior.

2. fetch_related_via_network: explain why _tx must stay bound (load-bearing for the receiver under future refactors that may return non-Copy guards).
Address rule-review on PR #4006:

1. test_update_missing_related_network_fetch_fails — covers SubOpGetOutcome::NotFound/Infra mapping in fetch_related_via_network. Stub returns Err(MissingRelated); test asserts the stub was invoked exactly once and the upsert surfaces the failure as an error. Pairs with the existing happy-path test.

2. Doc link fix: NETWORK_FETCH_OVERRIDE → TEST_NETWORK_FETCH_OVERRIDE in fetch_related_via_network rustdoc.

3. Trim the _tx binding comment to match what Transaction's Copy semantics actually guarantee. Previous wording suggested a drop-guard concern that isn't present today.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant