Skip to content

fix(@libp2p/tcp): race condition in onSocket#2763

Merged
achingbrain merged 5 commits into
libp2p:mainfrom
tabcat:socket-race
Oct 23, 2024
Merged

fix(@libp2p/tcp): race condition in onSocket#2763
achingbrain merged 5 commits into
libp2p:mainfrom
tabcat:socket-race

Conversation

@tabcat

@tabcat tabcat commented Oct 11, 2024

Copy link
Copy Markdown
Member

Title

fix(@libp2p/tcp): race condition in onSocket

Description

This fixes a race condition in the onSocket listener method.
onSocket gets a net.Socket parameter that needs to be closed later
before the tcp server can close.

The net.Socket is used to create a MultiaddrConnection but the
connection is not tracked with this.connections until after it has been
upgraded.

If the tcp listener.close is called while a the MultiaddrConnection is waiting
to be upgraded, then the MultiaddrConnection socket cannot be closed as
it does not exist in this.connections.

Instead of adding the MultiaddrConnection to this.connections on
creation I decided to create a separate property named
this.maConnections. I decided to track the non-upgraded connections
separately because I was unsure how this.connections must be used with other things
like metrics.

Notes & open questions

Closes #2760

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

This fixes a race condition in the onSocket listener method.
onSocket gets a net.Socket parameter that needs to be closed later
before the tcp server can close.
The net.Socket is used to create a MultiaddrConnection but the
connection is not tracked with this.connections until after it has been
upgraded.
If the tcp listener.close is called while a the MultiaddrConnection is waiting
to be upgraded, then the MultiaddrConnection socket cannot be closed as
it does not exist in this.connections.

Instead of adding the MultiaddrConnection to this.connections on
creation I decided to create a separate property named
this.maConnections. I decided to track the non-upgraded connections
separately because I was unsure how this.connections must be used with other things
like metrics.
@tabcat tabcat marked this pull request as ready for review October 11, 2024 12:16
@tabcat tabcat requested a review from a team as a code owner October 11, 2024 12:16

@achingbrain achingbrain left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is necessary. When we close the transport and call this.pause(true) we close the socket server which should destroy all open sockets.

This is the same as calling maConn.abort(err) - the interesting thing it does is call socket.destroy() - see toMultiaddrConnection for this.

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

Comment thread packages/transport-tcp/src/listener.ts Outdated
@achingbrain

achingbrain commented Oct 11, 2024

Copy link
Copy Markdown
Member

Actually server.close says:

Stops the server from accepting new connections and keeps existing connections.

So yes, it is necessary to close un-upgraded connections separately, but we should probably just close them immediately if they fail to upgrade, if we're not doing that already.

@tabcat

tabcat commented Oct 11, 2024

Copy link
Copy Markdown
Member Author

Yes, they are aborted inside of the catch block of the upgrade call currently.

The problem is that maConn is added to this.connections inside of the then block of the upgrade. If close is called before the then block is ran then the connection cannot be closed and this.server.close never resolves.

@tabcat

tabcat commented Oct 11, 2024

Copy link
Copy Markdown
Member Author

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

I can add a test. The test that I have been using exists in https://github.com/tabcat/delay5 which I should be able to replicate in a separate PR.

@tabcat tabcat left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much cleaner :)

Comment thread packages/transport-tcp/src/listener.ts
@achingbrain

Copy link
Copy Markdown
Member

Sorry for jumping in, I thought I'd just drag this across the line.

This has shown up a slightly different problem in that we currently close the socket server but wait for the "close" event via the callback passed to this.server.close - the "close" event is only emitted once all connections have closed - if the connections don't close the promise will never resolve which could be what's causing ChainSafe/lodestar#6053

The callback was added in #2058 it was not in the original PR that added the "pause" functionality - libp2p/js-libp2p-tcp#218

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

race condition causes @libp2p/tcp close method to hang

2 participants