Implement graceful shutdown procedure #8851
Open
BOLT 2 says:
We can abuse this requirement to implement a graceful shutdown procedure:
channel_reestablish messages for any channels that have exactly zero outstanding HTLCs.
This PR has two objectives:
- snub-idle-channels dynamic config variable that, when set to true, makes lightningd:
  - channeld subdaemons for channels that have no outstanding HTLCs;
  - channel_reestablish messages for channels that have no outstanding HTLCs,
- contrib/lightning-graceful-stop.sh script that utilizes snub-idle-channels to implement the graceful shutdown procedure outlined above.
I have tested this graceful shutdown procedure on my own production node with great success. In under a minute my node dropped from over 30 outstanding HTLCs to 14, all of which were "stuck." The shutdown script reported that the next expiration was 140 blocks away, giving me plenty of time to power off my node and perform a hardware upgrade. If I had been willing to wait for all of my outstanding HTLCs to be resolved, then I could have stopped my node indefinitely with no danger of any forced unilateral closures. (Of course, my peers could still voluntarily choose to unilaterally close my channels with them if they grew tired of waiting for my node to reappear in the network, but that's not the concern that graceful shutdown is attempting to address.)
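For illustration only, the bookkeeping behind such a shutdown report might look like the sketch below. It is not the contents of contrib/lightning-graceful-stop.sh. The function names are hypothetical, and the field names (short_channel_id, htlcs, expiry) are an assumption loosely modeled on the shape of listpeerchannels output. The current block height is passed in rather than queried, and only HTLCs that the node already knows about are considered.

```python
from typing import Any, Dict, List, Optional

Channel = Dict[str, Any]  # assumed shape: {"short_channel_id": str, "htlcs": [{"expiry": int}, ...]}


def is_idle(channel: Channel) -> bool:
    """A channel counts as idle (a candidate for snubbing) when it has no outstanding HTLCs."""
    return len(channel.get("htlcs", [])) == 0


def blocks_until_next_expiry(channels: List[Channel], blockheight: int) -> Optional[int]:
    """Blocks remaining until the earliest HTLC expiry, or None if nothing is outstanding."""
    expiries = [htlc["expiry"] for ch in channels for htlc in ch.get("htlcs", [])]
    if not expiries:
        return None
    return min(expiries) - blockheight


def shutdown_report(channels: List[Channel], blockheight: int) -> Dict[str, Any]:
    """Summarize which channels are idle and how much time the remaining HTLCs allow."""
    return {
        "idle_channels": [ch["short_channel_id"] for ch in channels if is_idle(ch)],
        "outstanding_htlcs": sum(len(ch.get("htlcs", [])) for ch in channels),
        "blocks_until_next_expiry": blocks_until_next_expiry(channels, blockheight),
    }


# Hypothetical sample data: one idle channel and one channel with two stuck HTLCs.
sample_channels = [
    {"short_channel_id": "1x1x1", "htlcs": []},
    {"short_channel_id": "2x2x2", "htlcs": [{"expiry": 800140}, {"expiry": 800200}]},
]
report = shutdown_report(sample_channels, blockheight=800000)
print(report)
```

With the sample data, the idle channel is the only one listed as a snub candidate, and the earliest expiry is 140 blocks past the assumed block height. A report of None for blocks_until_next_expiry would mean that no known HTLCs remain.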
Note that there is still one edge case that this graceful shutdown strategy doesn't solve. If a peer has transmitted a new commitment containing a new HTLC, but we never transmitted our own new commitment containing that same new HTLC (either because we never received the peer's new commitment or because we restarted before we could send our own new commitment), then we will not know about (or will have forgotten) the new HTLC, and we will believe that the channel is safe to snub even though the peer would retransmit their new commitment containing the new HTLC if we allowed them to reestablish the channel. I am not certain, but it may be possible to use the fields in the channel_reestablish message received from the peer to ascertain whether the peer has new HTLCs that they need to retransmit to us, and if they do, then we shouldn't snub the channel even if we are currently aware of no outstanding HTLCs in it.
Checklist
Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:
tools/lightning-downgrade