Improve block connection logging for easier debugging #4420
Open
Samarth-Goyal-1401 wants to merge 323 commits into lightningdevkit:master from
Conversation
Using extend is slightly cleaner because it doesn't require mut on the parameters.
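A minimal sketch of the point above (the function and values are hypothetical, not LDK code): building the combined collection with `extend` lets the callee take both parameters by value, with no `mut` binding on either, whereas e.g. `Vec::append` would force `mut` on the second argument.

```rust
// Hypothetical illustration: `extend` consumes the parameters as-is,
// so neither `a` nor `b` needs to be declared `mut`.
fn combine(a: Vec<u32>, b: Vec<u32>) -> Vec<u32> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend(a);
    out.extend(b);
    out
}

fn main() {
    let merged = combine(vec![1, 2], vec![3]);
    assert_eq!(merged, vec![1, 2, 3]);
}
```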
Extract error message strings into local variables before constructing ChannelError return tuples, reducing nesting and improving readability.
Tests previously selected a random block connect style to improve coverage. However, this randomness made test output non-deterministic and significantly cluttered diffs when comparing logs before and after changes. To address this, an LDK_TEST_CONNECT_STYLE environment variable is added to override the random selection and enable deterministic test runs. Note that broader coverage may be better achieved via targeted tests per connect style or a test matrix cycling through all styles, but this change focuses on improving reproducibility and debuggability.
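The override described above might look roughly like the following sketch. The enum variants and function are illustrative stand-ins, not LDK's actual test API; in the real change the override value would come from `std::env::var("LDK_TEST_CONNECT_STYLE")`.

```rust
// Illustrative sketch: an environment override beats the random pick when
// present; unset or unrecognized values keep the random default, so
// existing (non-deterministic) runs are unaffected.
#[derive(Debug, PartialEq)]
enum ConnectStyle {
    BestBlockFirst,
    TransactionsFirst,
    FullBlockViaListen,
}

fn connect_style_for_test(env_override: Option<&str>, random_pick: ConnectStyle) -> ConnectStyle {
    match env_override {
        Some("BestBlockFirst") => ConnectStyle::BestBlockFirst,
        Some("TransactionsFirst") => ConnectStyle::TransactionsFirst,
        Some("FullBlockViaListen") => ConnectStyle::FullBlockViaListen,
        _ => random_pick,
    }
}

fn main() {
    // With the override set, the random pick is ignored.
    assert_eq!(
        connect_style_for_test(Some("BestBlockFirst"), ConnectStyle::FullBlockViaListen),
        ConnectStyle::BestBlockFirst
    );
    // Without it, behavior is unchanged.
    assert_eq!(
        connect_style_for_test(None, ConnectStyle::TransactionsFirst),
        ConnectStyle::TransactionsFirst
    );
}
```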
…ip-validate-arc-loop Remove circular reference in `GossipVerifier`
Add a `RandomState` hasher implementation for tests that supports deterministic behavior via the `LDK_TEST_DETERMINISTIC_HASHES=1` environment variable. When set, hash maps use fixed keys ensuring consistent iteration order across test runs. By default, tests continue to use std's RandomState for random hashing, keeping test behavior close to production. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
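As a rough sketch of the fixed-key idea (not LDK's actual hasher type), std's `DefaultHasher` via `BuildHasherDefault` already uses fixed keys, so two maps built the same way iterate in the same order across runs:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Stand-in for the deterministic state described above: BuildHasherDefault
// constructs hashers with fixed keys, unlike std's RandomState.
type DeterministicState = BuildHasherDefault<DefaultHasher>;

fn ordered_keys(vals: &[u64]) -> Vec<u64> {
    let mut map: HashMap<u64, (), DeterministicState> = HashMap::default();
    for v in vals {
        map.insert(*v, ());
    }
    map.keys().copied().collect()
}

fn main() {
    // Same inserts always yield the same iteration order with fixed keys,
    // which is what makes test log diffs stable across runs.
    assert_eq!(ordered_keys(&[3, 1, 2]), ordered_keys(&[3, 1, 2]));
}
```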
…-var Improve test determinism with configurable connect style and deterministic hash maps
Reading `ChannelMonitor`s on startup is one of the slowest parts of
LDK initialization. Now that we have an async `KVStore`, there's no
need for it to be: we can simply parallelize their loading, which we
do here.
Sadly, because Rust futures are pretty unergonomic, we have to add
some `unsafe {}` here, but arguing it's fine is relatively
straightforward.
`tokio::spawn` can be used both to spawn a forever-running background task and to spawn a task which gets `poll`ed independently and eventually returns a result which the callsite wants. In LDK, we have only ever needed the first, and thus didn't bother defining a return type for `FutureSpawner::spawn`. However, in the next commit we'll start using `FutureSpawner` in a context where we actually do want the spawned future's result. Thus, here, we add a result output to `FutureSpawner::spawn`, mirroring the `tokio::spawn` API.
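The shape of the change can be illustrated with a thread-based analogy (threads standing in for spawned futures; this is not LDK's `FutureSpawner` definition): previously spawn returned nothing, and now, like `tokio::spawn` returning a `JoinHandle<T>`, it hands back a handle the callsite can join on for the result.

```rust
use std::thread;

// Analogy sketch: a spawn that returns a handle carrying the task's
// result, mirroring tokio::spawn's JoinHandle<T>.
fn spawn_with_result<T: Send + 'static>(
    work: impl FnOnce() -> T + Send + 'static,
) -> thread::JoinHandle<T> {
    thread::spawn(work)
}

fn main() {
    let handle = spawn_with_result(|| 6 * 7);
    // The callsite can now consume the spawned task's output.
    assert_eq!(handle.join().unwrap(), 42);
}
```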
`MonitorUpdatingPersister::read_all_channel_monitors_with_updates` was made to do the IO operations in parallel in a previous commit, however in practice this doesn't provide material parallelism for large routing nodes. Because deserializing `ChannelMonitor`s is the bulk of the work (when IO operations are sufficiently fast), we end up blocked in single-threaded work nearly the entire time. Here, we add an alternative option - a new `read_all_channel_monitors_with_updates_parallel` method which uses the `FutureSpawner` to cause the deserialization operations to proceed in parallel.
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on startup, we have to make sure to load any `ChannelMonitorUpdate`s and re-apply them as well. For users of async persistence who don't have any `ChannelMonitorUpdate`s (e.g. because they set `maximum_pending_updates` to 0 or, in the future, we avoid persisting updates for small `ChannelMonitor`s), this means two round-trips to the storage backend, one to load the `ChannelMonitor` and one to try to read the next `ChannelMonitorUpdate` only to have it fail. Instead, here, we use `KVStore::list` to fetch the list of stored `ChannelMonitorUpdate`s, which for async `KVStore` users allows us to parallelize the list of update fetching and the `ChannelMonitor` loading itself. Then we know exactly when to stop reading `ChannelMonitorUpdate`s, including reading none if there are none to read. This also avoids relying on `KVStore::read` correctly returning `NotFound` in order to correctly discover when to stop reading `ChannelMonitorUpdate`s.
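A simplified synchronous sketch of the list-then-read pattern above (the key layout and `KVStore` type here are invented for illustration, not LDK's actual trait): one `list` call tells us exactly which updates exist, so we issue precisely that many reads instead of probing `read` until a `NotFound`.

```rust
use std::collections::BTreeMap;

// Toy in-memory store standing in for an async KVStore backend.
struct Store {
    entries: BTreeMap<String, Vec<u8>>,
}

impl Store {
    fn list(&self, prefix: &str) -> Vec<String> {
        self.entries.keys().filter(|k| k.starts_with(prefix)).cloned().collect()
    }
    fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }
}

fn main() {
    let mut entries = BTreeMap::new();
    entries.insert("monitor/abc".to_string(), vec![0u8]);
    entries.insert("monitor_updates/abc/1".to_string(), vec![1u8]);
    entries.insert("monitor_updates/abc/2".to_string(), vec![2u8]);
    let store = Store { entries };

    // One listing tells us there are exactly two updates to fetch; against
    // an async backend these reads could then proceed in parallel.
    let update_keys = store.list("monitor_updates/abc/");
    assert_eq!(update_keys.len(), 2);
    let updates: Vec<Vec<u8>> = update_keys
        .iter()
        .map(|k| store.read(k).expect("listed keys must exist"))
        .collect();
    assert_eq!(updates, vec![vec![1u8], vec![2u8]]);
}
```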
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on startup, we have to make sure to load any `ChannelMonitorUpdate`s and re-apply them as well. Now that we know which `ChannelMonitorUpdate`s to load from `list`ing the entries from the `KVStore` we can parallelize the reads themselves, which we do here. Now, loading all `ChannelMonitor`s from an async `KVStore` requires only three full RTTs - one to list the set of `ChannelMonitor`s, one to both fetch the `ChannelMonitor` and list the set of `ChannelMonitorUpdate`s, and one to fetch all the `ChannelMonitorUpdate`s (with the last one skipped when there are no `ChannelMonitorUpdate`s to read).
…ogging Add monitor_updating_paused logging
Persist as a bool so that we don't need to use Option<bool> when we will just inevitably unwrap_or(false) the field. This means that we won't be able to distinguish between an incoming htlc that has no TLV set, and one that has the TLV set with a false value in it. We accept this loss of information for the sake of simplicity in the codebase.
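The tradeoff reads roughly like this sketch (the struct and field names are hypothetical, not the actual types in the PR):

```rust
// Hypothetical field: Option<bool> would distinguish "TLV absent" from
// "TLV present but false", but every reader would just call
// unwrap_or(false), so we store the bool directly.
struct IncomingHtlc {
    accountable: bool,
}

fn from_tlv(tlv: Option<bool>) -> IncomingHtlc {
    IncomingHtlc { accountable: tlv.unwrap_or(false) }
}

fn main() {
    // Absent TLV and explicit false are intentionally indistinguishable
    // after persistence; that information loss is the accepted cost.
    assert_eq!(from_tlv(None).accountable, from_tlv(Some(false)).accountable);
    assert!(from_tlv(Some(true)).accountable);
}
```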
…llel-reads Parallelize `ChannelMonitor` loading from async `KVStore`s
…l-accountable Set and relay experimental accountable signal
…ng-docs Clarify splicing feature flag requirements
Necessary for the next commit and makes it easier to read.
We recently began reconstructing ChannelManager::decode_update_add_htlcs on startup, using data present in the Channels. However, we failed to prune HTLCs from this rebuilt map if a given HTLC was already forwarded to the outbound edge (we pruned correctly if the outbound edge was a closed channel, but not otherwise). Here we fix this bug that would have caused us to double-forward inbound HTLC forwards.
No need to iterate through all entries in the map, we can instead pull out the specific entry that we want.
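An illustrative before/after of that refactor (map contents invented for the example): fetch the one entry directly rather than scanning every entry for a matching key.

```rust
use std::collections::HashMap;

// Pull out exactly the entry we want: O(1) average via remove, instead
// of iterating the whole map looking for the matching key.
fn take_entry(map: &mut HashMap<u64, String>, key: u64) -> Option<String> {
    map.remove(&key)
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1, "a".to_string());
    map.insert(2, "b".to_string());
    assert_eq!(take_entry(&mut map, 2).as_deref(), Some("b"));
    // The entry is consumed; the rest of the map is untouched.
    assert!(!map.contains_key(&2));
    assert!(map.contains_key(&1));
}
```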
We are working on removing the requirement of regularly persisting the ChannelManager, and as a result began reconstructing the manager's forwards maps from Channel data on startup in a recent PR, see cb398f6 and parent commits. At the time, we implemented ChannelManager::read to prefer to use the newly reconstructed maps, partly to ensure we have test coverage of the new maps' usage. This resulted in a lot of code that would deduplicate HTLCs that were present in the old maps to avoid redundant HTLC handling/duplicate forwards, adding extra complexity. Instead, always use the old maps in prod, but randomly use the newly reconstructed maps in testing, to exercise the new codepaths (see reconstruct_manager_from_monitors in ChannelManager::read).
…-race-some-fs Use `DirEntry::file_type` rather than `metadata...` in `list`
…-reconstruct-fwds-followup-2 Prevent HTLC double-forwards, prune forwarded onions
Follow-up to 4354.
Useful when using these macros in lightning-tests/upgrade_downgrade_tests
In the next commit, we want to dedup fields between the InboundUpdateAdd::Forwarded's HTLCPreviousHopData and the outer InboundHTLCOutput/Channel structs, since many fields are duplicated in both places at the moment. As part of doing this cleanly, we first refactor the method that retrieves these InboundUpdateAdds for reconstructing the set of pending HTLCs during ChannelManager deconstruction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, the InboundUpdateAdd::Forwarded enum variant contained an HTLCPreviousHopData, which had a lot of fields that were redundant with the outer InboundHTLCOutput/Channel structs. Here we dedup those fields, which is important because the pending InboundUpdateAdds are persisted whenever the ChannelManager is persisted.
Only run the full build matrix (Linux/Windows/macOS × stable/beta/MSRV) on pushes to main. PR and non-main push builds now only run Linux with the MSRV toolchain (1.75.0), which is the most important gate for catching issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restrict CI build matrix to Linux+MSRV for PRs
We have the same error being returned from several `ChannelManager` message handlers, so we DRY it up. Doing so also lets us get rid of the inlined `format!` call, which for some reason prevents `rustfmt` from formatting code around it.
We have the same error being returned from several `ChannelManager` API methods, so we DRY it up. Doing so also lets us get rid of the inlined `format!` call, which for some reason prevents `rustfmt` from formatting code around it.
Previously, LDK would by default limit channels to pre-Wumbo sizes and leave it to the user to bump `ChannelHandshakeLimits::max_funding_satoshis`. This was mostly for historical reasons, aiming to limit risk when Lightning and LDK were not as mature as they are today. By now, we do however expect ~all users to eventually want to bump this limit, and having them pick an arbitrary value (or picking a default ourselves) is kinda odd. Users that still want to limit risk have ample other means to do so, for example manually rejecting inbound channels via the manual-acceptance flow (via `Event::OpenChannelRequest`) or soon even limiting risk on a per-HTLC basis via general-purpose HTLC interception. Furthermore, it turns out that our current implementation is wrong, as we always announce `Wumbo`/`option_supports_large_channels` support via the `IN` feature in `ChannelManager` defaults, irrespective of what limit is configured. This has us announcing support for Wumbo channels only to then reject inbound requests in case a counterparty dares to actually try to open one. To address this, we here simply propose to drop the `max_funding_satoshis` field and corresponding checks entirely, and do what we've announced to the network for a long time: enable Wumbo by default.
…unding-satoshis Drop `ChannelHandshakeLimits::max_funding_satoshis`
Makes an upcoming commit cleaner: when we add a next_hop variable we want to distinguish it from the previous hop.
Adds support for passing user_channel_id into the pending_claims_to_replay vec, which is used by the ChannelManager on startup. For now user_channel_id is always set to None, but in upcoming commits we will set it to Some when the downstream channel is still open (this is currently a bug). Separated out here for reviewability.
We need these fields to generate a correct PaymentForwarded event if we need to claim this inbound HTLC backwards after restart and it's already been claimed and removed on the outbound edge.
Previously, we were spuriously using the upstream channel's info when we should've been using the downstream channel's.
Previously, if a forwarding node reloaded mid-HTLC-forward with a preimage in the outbound edge monitor and the outbound edge channel still open, and subsequently reclaimed the inbound HTLC backwards, the PaymentForwarded event would be missing the next_user_channel_id field.
Test that if we restart and had two inbound MPP-part HTLCs received over the same channel in the holding cell prior to shutdown, and we lost the holding cell prior to restart, those HTLCs will still be claimed backwards. Test largely written by Claude
We previously had 5 due to wanting some flexibility to bump versions in between, but eventually concluded that wasn't necessary.
…manager-splice-msg-handlers Rustfmt ChannelManager splice message handlers
…-dedup-htlc-fwd-data Dedup `InboundUpdateAdd::Forwarded` data; fix `PaymentForwarded` fields
I've assigned @tankyleo as a reviewer!
Improves visibility into block synchronization and transaction confirmation, making it easier to debug sync issues as requested in #2348.
Changes:
- Upgrades the log level of `best_block_updated` from TRACE to INFO.
- Adds an INFO-level log to `filtered_block_connected` to track chain tip updates.
- In `transactions_confirmed`, if the number of transactions is small (<= 10), we now log the individual txids at DEBUG level. This is extremely helpful for verifying exactly which transactions LDK is seeing confirm without enabling full trace logging.
Fixes #2348.
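The `transactions_confirmed` behavior might be sketched as follows; the function and the log-line formatting are stand-ins for the PR's actual logging macros, not LDK internals.

```rust
// Illustrative sketch: always emit a summary line, but only enumerate
// individual txids at DEBUG when the batch is small (<= 10), so large
// blocks don't flood the log.
fn describe_confirmed(txids: &[&str]) -> Vec<String> {
    let mut lines = vec![format!("INFO: {} transaction(s) confirmed", txids.len())];
    if txids.len() <= 10 {
        for txid in txids {
            lines.push(format!("DEBUG: confirmed txid {}", txid));
        }
    }
    lines
}

fn main() {
    let lines = describe_confirmed(&["aa", "bb"]);
    // Summary plus one DEBUG line per txid for small batches.
    assert_eq!(lines.len(), 3);
    assert!(lines[1].contains("aa"));

    // Large batches keep only the summary line.
    let many: Vec<&str> = vec!["tx"; 11];
    assert_eq!(describe_confirmed(&many).len(), 1);
}
```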