swarm/src/lib: Continue polling network when behaviour is blocked #2304
Merged
With libp2p#2248 a connection task `await`s sending an event to the behaviour before polling for new events from the behaviour [1].

When `Swarm::poll` is unable to deliver an event to a connection task, it returns `Poll::Pending` even though (a) polling `Swarm::network` might be able to make progress (`network_not_ready` being `false`) and (b) it does not register a waker to be woken up [2].

In combination this can lead to a deadlock: a connection task waits to send an event to the behaviour while `Swarm::poll`, having failed to send an event to that connection task, returns `Poll::Pending` without registering a waker that would cause it to be polled again.

With this commit, `Swarm::poll` only returns `Poll::Pending` after failing to deliver an event to a connection task if the network is also unable to make progress (i.e. `network_not_ready` being `true`).

In the long run, `Swarm::poll` should likely be redesigned to prioritize the behaviour over the network, given that the former is the control plane and the latter potentially yields new work from the outside.

[1]: https://github.com/libp2p/rust-libp2p/blob/ca1b7cf043b4264c69b19fe75de488330a7a1f2f/core/src/connection/pool/task.rs#L224-L232
[2]: https://github.com/libp2p/rust-libp2p/blob/ca1b7cf043b4264c69b19fe75de488330a7a1f2f/swarm/src/lib.rs#L756-L783
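The change in behaviour can be illustrated with a minimal sketch. It is not the actual `Swarm::poll` code: `Decision`, `before_fix`, `after_fix` and `delivery_blocked` are hypothetical names introduced here, and only `network_not_ready` is taken from the description above. The sketch models the single decision made after an attempt to hand a pending event to a connection task.

```rust
/// Hypothetical model of what the poll loop does next after trying to
/// deliver a pending event to a connection task. Not part of rust-libp2p.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Decision {
    /// The event was delivered; carry on and poll the behaviour.
    PollBehaviour,
    /// Delivery is blocked, but the network may still make progress,
    /// so go around the loop and poll the network again.
    PollNetworkAgain,
    /// Nothing can make progress right now; return `Poll::Pending`.
    ReturnPending,
}

/// Assumed pre-fix logic: a blocked delivery always yields `Poll::Pending`,
/// regardless of whether the network could still make progress.
pub fn before_fix(delivery_blocked: bool, _network_not_ready: bool) -> Decision {
    if delivery_blocked {
        Decision::ReturnPending
    } else {
        Decision::PollBehaviour
    }
}

/// Assumed post-fix logic: a blocked delivery yields `Poll::Pending` only
/// when the network is also unable to make progress.
pub fn after_fix(delivery_blocked: bool, network_not_ready: bool) -> Decision {
    match (delivery_blocked, network_not_ready) {
        (false, _) => Decision::PollBehaviour,
        (true, false) => Decision::PollNetworkAgain,
        (true, true) => Decision::ReturnPending,
    }
}
```

The deadlock case corresponds to `delivery_blocked == true` with `network_not_ready == false`: in this sketch `before_fix` returns `Decision::ReturnPending` there, while `after_fix` returns `Decision::PollNetworkAgain`, so the network keeps being polled.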
Member
Author
@AgeManning could you validate whether this fixes the stalls you see on #2290?
Contributor
This corrects the stalls I have been seeing on #2290 🎉
Member
Author
I am sorry for the delay here. I was sick for a week. I hope to be able to cut a new release candidate tomorrow.