Context: we are AlephZero and we develop a Substrate node with our custom finality gadget AlephBFT (in place of GRANDPA). We run with a low block time (1 second) and with AURA. Otherwise our spec is pretty standard. We currently depend on the polkadot-v0.9.13 branch.
We have been running simulated high-latency network tests and have run into serious issues with the network reputation system. With a low block time and high network latency, forks and chain reorgs are quite common. In such cases our finality gadget occasionally needs to import blocks that are outside the main branch, using the NetworkService::set_sync_fork_request call. This is completely analogous to GRANDPA, which also uses this call to fetch unknown blocks encountered during voting rounds.
The issue is that these calls make the nodes ban each other. Here is what it looks like in the logs:
2022-02-03 13:44:55.010 DEBUG tokio-runtime-worker peerset: Report 12D3KooWP3nEfSdusNZrAH7aZnwLkKizt1cmS5dR6LaM3vwEkLic: -2147483648 to -2147483648. Reason: Same block request multiple times, Disconnecting
After that happens, the nodes stop talking to each other and, unsurprisingly, finalization stalls because there are not enough connected nodes to reach consensus.
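To make the log line above easier to read: -2147483648 is i32::MIN, so the report carries the largest possible negative reputation change and leaves the peer at the minimum reputation. The sketch below is a standalone model of that arithmetic, not the peerset code itself. The function names are hypothetical, and the ban threshold is an assumption modelled on what we believe sc-peerset uses.

```rust
// Standalone model of the reputation arithmetic seen in the log line.
// ASSUMPTION: the threshold mirrors the constant we believe sc-peerset uses
// (82% of i32::MIN); check it against the version you depend on.
const BANNED_THRESHOLD: i32 = 82 * (i32::MIN / 100);

/// Hypothetical helper: apply a reputation change, saturating at the i32 bounds.
fn apply_report(reputation: i32, change: i32) -> i32 {
    reputation.saturating_add(change)
}

/// Hypothetical helper: a peer below the threshold is treated as banned.
fn is_banned(reputation: i32) -> bool {
    reputation < BANNED_THRESHOLD
}

fn main() {
    // A peer with neutral reputation receives a change of i32::MIN,
    // which is the value printed twice in the log line.
    let change = i32::MIN;
    let after = apply_report(0, change);
    println!("Report: {} to {}. Banned: {}", change, after, is_banned(after));
}
```

Under this model a single report of this size is enough to push a peer below the threshold, whatever its previous reputation was.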
Internally we don't even call NetworkService::set_sync_fork_request multiple times for the same block. The ban is the result of internal retries in the sync protocol. As a result, honest nodes ban each other (there are typically a lot of these bans happening).
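For illustration, the caller-side invariant described above (at most one fork request per block) can be sketched as a small guard. This is a minimal sketch and not our actual code: ForkRequestGuard and its methods are hypothetical names, and the real NetworkService call is only indicated in a comment. It does not affect the retries inside the sync protocol, which are what trigger the bans.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical guard that remembers which block hashes were already requested.
struct ForkRequestGuard<H> {
    requested: HashSet<H>,
}

impl<H: Hash + Eq + Clone> ForkRequestGuard<H> {
    fn new() -> Self {
        Self { requested: HashSet::new() }
    }

    /// Returns true only the first time a given hash is seen.
    fn should_request(&mut self, hash: &H) -> bool {
        self.requested.insert(hash.clone())
    }

    /// Forget a hash, for example once the block has been imported.
    fn forget(&mut self, hash: &H) {
        self.requested.remove(hash);
    }
}

fn main() {
    // u64 stands in for a real block hash type here.
    let mut guard = ForkRequestGuard::new();
    for hash in [1u64, 2, 1, 3, 2] {
        if guard.should_request(&hash) {
            // This is where set_sync_fork_request would be called in a real node.
            println!("requesting block {}", hash);
        }
    }
    guard.forget(&1);
}
```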
Is there a way for us to get around this issue without making changes to the sync protocol?