fix: handle bad response of 0 block from peer #8150
export const MAX_BATCH_PROCESSING_ATTEMPTS = 3;
/**
 * Consider batch faulty after downloading and processing this number of times
 * as in https://github.com/ChainSafe/lodestar/issues/8147 we cannot proceed the sync chain if a peer return 0 blocks without error
If we get 0 blocks without error perhaps we can just retry the batch with a different peer instead of abandoning the sync chain and pulling the whole thing again?
This is an easier fix for now. Let's merge as-is; we will need to revisit our syncing anyway.
> If we get 0 blocks without error perhaps we can just retry the batch with a different peer instead of abandoning the sync chain and pulling the whole thing again?
I already did that: throwing an error in `beaconBlocksMaybeBlobsByRange` means the chain will retry with another peer.
I will change the comment so that it is not misleading.
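The retry behaviour described in this reply can be sketched roughly as follows. This is a minimal, synchronous illustration and not Lodestar's actual code: `FakeNetwork`, `requestBlocksByRange`, `downloadBatch` and the `EMPTY_RANGE_RESPONSE` error code are hypothetical names, and the sketch assumes every empty response is a fault (it ignores ranges that are legitimately empty).

```typescript
// Hypothetical types and helpers for illustration only; not Lodestar's API.
type PeerId = string;
type Block = {slot: number};
// Stand-in for the network: the blocks each peer would serve.
type FakeNetwork = Record<PeerId, Block[]>;

interface EmptyResponseError extends Error {
  code: "EMPTY_RANGE_RESPONSE";
}

function emptyResponseError(peer: PeerId, startSlot: number, count: number): EmptyResponseError {
  const err = new Error(
    `peer ${peer} returned 0 blocks for ${count} slots from slot ${startSlot}`
  ) as EmptyResponseError;
  err.code = "EMPTY_RANGE_RESPONSE";
  return err;
}

function isEmptyResponseError(e: unknown): e is EmptyResponseError {
  return e instanceof Error && (e as EmptyResponseError).code === "EMPTY_RANGE_RESPONSE";
}

// Simulated by-range request: an empty, error-free response is turned into a thrown error.
function requestBlocksByRange(network: FakeNetwork, peer: PeerId, startSlot: number, count: number): Block[] {
  const served = network[peer] || [];
  const blocks = served.filter((b) => b.slot >= startSlot && b.slot < startSlot + count);
  if (blocks.length === 0) {
    throw emptyResponseError(peer, startSlot, count);
  }
  return blocks;
}

// Because the request throws, the caller can move on to the next peer for the same batch.
function downloadBatch(
  network: FakeNetwork,
  peers: PeerId[],
  startSlot: number,
  count: number
): {peer: PeerId; blocks: Block[]} {
  const failedPeers: PeerId[] = [];
  for (const peer of peers) {
    try {
      return {peer, blocks: requestBlocksByRange(network, peer, startSlot, count)};
    } catch (e) {
      if (!isEmptyResponseError(e)) throw e; // unrelated errors are not swallowed
      failedPeers.push(peer);
    }
  }
  throw new Error(`no peer returned blocks; tried: ${failedPeers.join(", ")}`);
}
```

The point of the design is that an empty response is reported through the same error path as any other failed download, so no separate retry mechanism is needed.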
**Motivation**

- we got a lot of rate-limited responses from peers during syncing
- we should be able to control which peers to connect to based on tracked active requests in PeerBalancer

**Description**

- enforce no more than MAX_CONCURRENT_REQUESTS per peer in sync
- limit number of epochs downloaded ahead

part of #8033

**Test results on fusaka-devnet-3**

was able to sync `fusaka-devnet-3` 2 times using this branch along with #8150

<img width="851" height="346" alt="Screenshot 2025-08-09 at 14 45 32" src="https://github.com/user-attachments/assets/1f3afca0-13a6-4f7a-9c18-51d03cc34793" />

---------

Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com>
Co-authored-by: Cayman <caymannava@gmail.com>
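The per-peer limit mentioned in the commit message above could look roughly like the sketch below. It is an assumption-laden illustration, not the PeerBalancer implementation: the class `ActiveRequestTracker`, its methods, and the value `2` used for `MAX_CONCURRENT_REQUESTS` are all hypothetical.

```typescript
// Assumed value for illustration; the real constant lives in the Lodestar codebase.
const MAX_CONCURRENT_REQUESTS = 2;

// Hypothetical tracker of in-flight requests per peer.
class ActiveRequestTracker {
  private active: Record<string, number> = {};

  // Pick the peer with the fewest active requests that is still under the limit.
  // Returns null when every peer is saturated, so the caller should wait.
  pickPeer(peers: string[]): string | null {
    let best: string | null = null;
    for (const peer of peers) {
      const inFlight = this.active[peer] || 0;
      if (inFlight >= MAX_CONCURRENT_REQUESTS) continue;
      if (best === null || inFlight < (this.active[best] || 0)) best = peer;
    }
    return best;
  }

  // Call when a request is sent to the peer.
  start(peer: string): void {
    this.active[peer] = (this.active[peer] || 0) + 1;
  }

  // Call when the request completes, whether it succeeded or failed.
  finish(peer: string): void {
    this.active[peer] = Math.max(0, (this.active[peer] || 0) - 1);
  }
}
```

Returning `null` instead of overloading a peer is what would keep the node from triggering the rate-limited responses described in the motivation.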
🎉 This PR is included in v1.34.0 🎉
**Motivation**

- batch `n` has 0 blocks, so the chain does not process anything (note that this is a wrong response from peers; most likely the peer started from a `checkpointSyncUrl`)
- batch `n+1` cannot be processed due to `UNKNOWN_BLOCK_PARENT`
- […] `ResourceUnavailable` instead, and that's expected in the spec

**Description**

- set `MAX_BATCH_PROCESSING_ATTEMPTS` to 0, i.e. if we cannot process a batch, we remove that sync chain and RangeSync will automatically add a new one
- `earliestAvailableSlot` […] `ResourceUnavailable` error in that case according to the spec; we handle it just in case. For lodestar, I tracked that in "[fulu] lodestar returns 0 blocks without error" (#8149)

Closes #8147
was able to sync `fusaka-devnet-3` 2 times using this branch along with #8166
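As a rough illustration of what setting `MAX_BATCH_PROCESSING_ATTEMPTS` to 0 implies, the sketch below drops a sync chain on the first processing failure and lets an outer loop start a fresh chain. Everything except the constant name is hypothetical (`onProcessingError`, `runChain`, `rangeSync`), and restarting each new chain from the same start epoch is a simplifying assumption.

```typescript
const MAX_BATCH_PROCESSING_ATTEMPTS = 0;

type Batch = {startEpoch: number; failedProcessingAttempts: number};
type ChainOutcome = {status: "synced"} | {status: "removed"; failedEpoch: number};

// With the limit at 0, the very first failed attempt already exceeds it.
function onProcessingError(batch: Batch): "retry" | "drop-chain" {
  batch.failedProcessingAttempts += 1;
  return batch.failedProcessingAttempts > MAX_BATCH_PROCESSING_ATTEMPTS ? "drop-chain" : "retry";
}

// Hypothetical sync chain: processes one batch per epoch until done or dropped.
function runChain(
  startEpoch: number,
  targetEpoch: number,
  processBatch: (epoch: number) => boolean
): ChainOutcome {
  for (let epoch = startEpoch; epoch <= targetEpoch; epoch++) {
    const batch: Batch = {startEpoch: epoch, failedProcessingAttempts: 0};
    while (!processBatch(epoch)) {
      if (onProcessingError(batch) === "drop-chain") {
        return {status: "removed", failedEpoch: epoch};
      }
    }
  }
  return {status: "synced"};
}

// Hypothetical stand-in for RangeSync: adds a new chain whenever one is removed.
// Returns how many chains were needed to finish syncing.
function rangeSync(
  startEpoch: number,
  targetEpoch: number,
  processBatch: (epoch: number) => boolean,
  maxChains: number
): number {
  for (let created = 1; created <= maxChains; created++) {
    if (runChain(startEpoch, targetEpoch, processBatch).status === "synced") return created;
  }
  throw new Error(`did not sync within ${maxChains} chains`);
}
```

With any value above 0, the same batch would instead be retried inside the existing chain, which is the situation the Motivation describes as unable to make progress.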