fix(rpc): release shard lock before cold-storage awaits in get_filter_changes #145
Open
Evalir wants to merge 1 commit into
get_filter_changes held a DashMap RefMut (a parking_lot RwLock write guard) across cold-storage .await points. On a current_thread tokio runtime, two concurrent polls that landed on the same shard could deadlock: the second task parked the OS thread waiting for the lock, leaving no way for the first task to resume.

Refactored into snapshot -> cold I/O -> commit, with the RefMut scoped to two short critical sections (no .await inside either).

Verified by temporarily forcing DashMap to 2 shards (~50% collision):
- Pre-fix: test_rpc_filter_edge_cases hung 10 of 30 runs.
- Post-fix: 0 of 30.

Default shard count (~128 on dev, ~8 on 2-core CI) made the hang rare enough to slip through into main.

Also adds .config/nextest.toml with a 5-minute per-test timeout so any future hang fails fast instead of burning the GitHub Actions 6-hour job ceiling (the failure mode of node-components#134).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
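For reviewers who want the shape of the change without reading the diff, here is a minimal sketch of the snapshot -> cold I/O -> commit pattern. It is not the code in this PR. To stay dependency-free, a std `Mutex<HashMap<..>>` stands in for a single DashMap shard, and `Filters`, `YieldOnce`, `cold_read`, `run_pair`, and the simplified `get_filter_changes` signature are hypothetical names invented for the illustration. `run_pair` is a toy single-threaded poller, used here in place of a current_thread tokio runtime.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the filter map: filter id -> next block to report.
// A std Mutex plays the role of a single DashMap shard lock.
type Filters = Mutex<HashMap<u64, u64>>;

// Future that returns Pending once, to emulate a real suspension point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// Hypothetical cold-storage read: suspends once, then returns block numbers.
async fn cold_read(from: u64, to: u64) -> Vec<u64> {
    YieldOnce(false).await;
    (from..to).collect()
}

async fn get_filter_changes(filters: &Filters, id: u64, head: u64) -> Option<Vec<u64>> {
    // 1. Snapshot: the guard is dropped at the end of this block, before any .await.
    let start = {
        let map = filters.lock().unwrap();
        let next = map.get(&id).copied()?;
        next
    };
    // 2. Cold I/O: no lock is held while this task is suspended.
    let changes = cold_read(start, head).await;
    // 3. Commit: second short critical section. The entry may have been
    //    advanced or removed in the meantime, so re-check instead of assuming.
    let mut map = filters.lock().unwrap();
    let next = map.get_mut(&id)?;
    *next = (*next).max(head);
    Some(changes)
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Toy single-threaded driver: polls two futures round-robin until both finish.
fn run_pair<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut a = pin!(a);
    let mut b = pin!(b);
    let (mut ra, mut rb) = (None, None);
    loop {
        if ra.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(&mut cx) {
                ra = Some(v);
            }
        }
        if rb.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(&mut cx) {
                rb = Some(v);
            }
        }
        if ra.is_some() && rb.is_some() {
            return (ra.unwrap(), rb.unwrap());
        }
    }
}

fn main() {
    let filters: Filters = Mutex::new(HashMap::from([(1, 10)]));
    // Two concurrent polls of the same filter, interleaved on one thread.
    let (a, b) = run_pair(
        get_filter_changes(&filters, 1, 13),
        get_filter_changes(&filters, 1, 13),
    );
    println!("{a:?} {b:?}");
}
```

If the guard from step 1 were still alive across the `cold_read(..).await`, the second future's `lock()` would block the only thread while the first future sat suspended, which is the hang described above. Releasing the lock during I/O has a cost: the commit step can no longer assume the entry is unchanged. The sketch handles that with a re-lookup and `max`; the real commit logic in this PR may differ.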
