Skip to content

swarm/src/lib: Continue polling network when behaviour is blocked#2304

Merged
mxinden merged 4 commits into
libp2p:masterfrom
mxinden:not-pending-on-network-progress
Oct 26, 2021
Merged

swarm/src/lib: Continue polling network when behaviour is blocked#2304
mxinden merged 4 commits into
libp2p:masterfrom
mxinden:not-pending-on-network-progress

Conversation

@mxinden

@mxinden mxinden commented Oct 20, 2021

Copy link
Copy Markdown
Member

With #2248 a connection task
awaits sending an event to the behaviour before polling for new events
from the behaviour 1.

When Swarm::poll is unable to deliver an event to a connection task it
returns Poll::Pending even though (a) polling Swarm::network might be
able to make progress (network_not_ready being false) and (b) it
does not register a waker to be woken up 2.

In combination this can lead to a deadlock where a connection task waits
to send an event to the behaviour and Swarm::poll returns
Poll::Pending failing to send an event to the connection task, not
registering a waiker in order to be polled again.

With this commit Swarm::poll will only return Poll::Pending, when
failing to deliver an event to a connection task, if the network is
unable to make progress (i.e. network_not_ready being true).

In the long-run Swarm::poll should likely be redesigned, prioritizing
the behaviour over the network, given the former is the control plane
and the latter potentially yields new work from the outside.

With libp2p#2248 a connection task
`await`s sending an event to the behaviour before polling for new events
from the behaviour [1].

When `Swarm::poll` is unable to deliver an event to a connection task it
returns `Poll::Pending` even though (a) polling `Swarm::network` might be
able to make progress (`network_not_ready` being `false`) and (b) it
does not register a waker to be woken up [2].

In combination this can lead to a deadlock where a connection task waits
to send an event to the behaviour and `Swarm::poll` returns
`Poll::Pending` failing to send an event to the connection task, not
registering a waiker in order to be polled again.

With this commit `Swarm::poll` will only return `Poll::Pending`, when
failing to deliver an event to a connection task, if the network is
unable to make progress (i.e. `network_not_ready` being `true`).

In the long-run `Swarm::poll` should likely be redesigned, prioritizing
the behaviour over the network, given the former is the control plane
and the latter potentially yields new work from the outside.

[1]: https://github.com/libp2p/rust-libp2p/blob/ca1b7cf043b4264c69b19fe75de488330a7a1f2f/core/src/connection/pool/task.rs#L224-L232
[2]: https://github.com/libp2p/rust-libp2p/blob/ca1b7cf043b4264c69b19fe75de488330a7a1f2f/swarm/src/lib.rs#L756-L783
@mxinden

mxinden commented Oct 20, 2021

Copy link
Copy Markdown
Member Author

@AgeManning could you validate whether this fixes the stalls you see on #2290?

@AgeManning

Copy link
Copy Markdown
Contributor

This corrects the stalls I have been seeing on #2290 🎉

@mxinden mxinden marked this pull request as ready for review October 22, 2021 09:55
@mxinden

mxinden commented Oct 26, 2021

Copy link
Copy Markdown
Member Author

I am sorry for the delay here. I was sick for a week. I hope to be able to cut a new release candidate tomorrow.

@mxinden mxinden merged commit 4ef5738 into libp2p:master Oct 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants