OK, this was weirdly hard to verify: I spun up a scratch machine (this option is Linux-specific), started

Not sure what to do about the failing tests. cc @benesch, I wouldn't mind a look at the change itself or any tips for CI here.

CI here has been broken for years, I'm afraid! I recommend filing the patch upstream, where CI does work. Then the only thing that can bite us is a rebase issue, which is fairly unlikely, and we get plenty of end-to-end coverage of this library in Materialize.

rust-postgres#1007 has been merged upstream. I didn't cherry-pick it, since the change is only a few lines and it didn't apply cleanly; not sure if that's cool or not. Otherwise, I think we're good to go here, though I'm not able to merge myself (cc @benesch).

Thanks, @pH14! I figured I'd just integrate the latest upstream changes (including yours) into the master branch, following the instructions here: https://github.com/MaterializeInc/rust-postgres#integrating-upstream-changes. I "fixed" the branch protection settings (I think) so they no longer require CI to pass. I think a

Currently, tokio-postgres exposes two knobs for maintaining healthy connections: a connect timeout, and keepalive settings that apply directly to the TCP socket. These cover connection establishment and idle connections, but not an active, established socket that hears no response from the receiver for a long period of time. By default, it can take 15-20 minutes for a connection to be killed under these circumstances (15 retries with exponential backoff; the number of retries is controlled by `tcp_retries2`).

The generally recommended solution to this problem is to set `TCP_USER_TIMEOUT`, which caps how long transmitted data may remain unacknowledged on an established socket before the kernel closes the connection. https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ has a great writeup of this case under "Busy ESTAB socket is not forever".

I haven't found a super satisfying way of testing this yet, but I'm staging it here for now.
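To make the 15-20 minute figure concrete, here is a rough Rust model of the retransmission timeline. It is a sketch under stated assumptions, not tokio-postgres code: the helper names are hypothetical, it assumes the Linux defaults `TCP_RTO_MIN` = 200 ms and `TCP_RTO_MAX` = 120 s, and it pretends the RTO simply doubles from the minimum on every retry (ignoring the real RTT estimate). Under those assumptions the default `tcp_retries2 = 15` works out to about 924.6 s (roughly 15.4 minutes), which the Linux ip-sysctl documentation describes as a lower bound for the effective timeout. When a non-zero `TCP_USER_TIMEOUT` is set, Linux uses it instead of the retries-based limit.

```rust
use std::time::Duration;

/// Hypothetical helper: approximate time Linux keeps retransmitting unacked
/// data on an established socket before giving up, given `retries`
/// (`tcp_retries2`). Simplified model: the RTO starts at `rto_min`, doubles
/// on every retransmission, and is capped at `rto_max`.
fn retransmission_give_up(retries: u32, rto_min: Duration, rto_max: Duration) -> Duration {
    (0..=retries)
        .map(|i| {
            // rto_min * 2^i, saturating rather than overflowing for large i.
            rto_min.saturating_mul(2u32.saturating_pow(i)).min(rto_max)
        })
        .sum()
}

/// Hypothetical helper: approximate time before a busy ESTAB socket with a
/// silent peer is aborted. A non-zero TCP_USER_TIMEOUT replaces the
/// retries-based limit; zero means "use the system default".
fn approx_abort_after(user_timeout: Option<Duration>, tcp_retries2: u32) -> Duration {
    // Assumed Linux defaults: TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s.
    let rto_min = Duration::from_millis(200);
    let rto_max = Duration::from_secs(120);
    match user_timeout {
        Some(t) if !t.is_zero() => t,
        _ => retransmission_give_up(tcp_retries2, rto_min, rto_max),
    }
}

fn main() {
    // Default tcp_retries2 = 15, no TCP_USER_TIMEOUT set.
    let default = approx_abort_after(None, 15);
    println!("default: {:.1}s", default.as_secs_f64()); // prints "default: 924.6s"

    // With a 30 s TCP_USER_TIMEOUT the socket gives up much sooner.
    let capped = approx_abort_after(Some(Duration::from_secs(30)), 15);
    println!("with TCP_USER_TIMEOUT: {:?}", capped);
}
```

In a real client the value would be applied to the socket with `setsockopt` at level `IPPROTO_TCP`, option `TCP_USER_TIMEOUT`, as an unsigned number of milliseconds; the sketch only models the timing, not the socket call.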