Skip to content

fix: handle bad response of 0 block from peer#8150

Merged
wemeetagain merged 4 commits into
peerDASfrom
te/handle_0_blocks_batch
Aug 11, 2025
Merged

fix: handle bad response of 0 block from peer#8150
wemeetagain merged 4 commits into
peerDASfrom
te/handle_0_blocks_batch

Conversation

@twoeths

@twoeths twoeths commented Aug 8, 2025

Copy link
Copy Markdown
Contributor

Motivation

  • sync chain was stuck because some peers incorrectly response 0 block for a batch when they don't have it, see [fulu] Sync stuck due to peers returning 0 block for beacon_block_by_range request #8147
    • batch n has 0 block so chain does not process anything (and note that this is wrong response from peers, most likely peer started from a checkpointSyncUrl
    • batch n+1 cannot process due to UNKNOWN_BLOCK_PARENT
    • then SyncChain stuck
  • sometimes other clients throw ResourceUnavailable instead, and that's expected in the spec

Description

  • set MAX_BATCH_PROCESSING_ATTEMPTS to 0, ie if we cannot process a batch, we remove that sync chain and RangeSync will automatically add a new one
  • do not sync from peers without earliestAvailableSlot
  • throw error if peers return 0 blocks (peers should have returned ResourceUnavailable error in that case according to the spec, we handle just in case. For lodestar, I tracked that in [fulu] lodestar returns 0 blocks without error #8149)

Closes #8147

was able to sync fusaka-devnet-3 2 times using this branch along with #8166

Screenshot 2025-08-09 at 14 45 32

@twoeths twoeths changed the title [fulu] handle 0 block batch fix: handle 0 block batch Aug 8, 2025
@twoeths twoeths force-pushed the te/handle_0_blocks_batch branch from 3d7d877 to fce951e Compare August 8, 2025 08:47
@twoeths twoeths changed the title fix: handle 0 block batch fix: handle bad response of 0 block from peer Aug 8, 2025
@twoeths twoeths marked this pull request as ready for review August 9, 2025 07:46
@twoeths twoeths requested a review from a team as a code owner August 9, 2025 07:46
export const MAX_BATCH_PROCESSING_ATTEMPTS = 3;
/**
* Consider batch faulty after downloading and processing this number of times
* as in https://github.com/ChainSafe/lodestar/issues/8147 we cannot proceed the sync chain if a peer return 0 blocks without error

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get 0 blocks without error perhaps we can just retry the batch with a different peer instead of abandoning the sync chain and pulling the whole thing again?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an easier fix for now. Lets merge as-is, we will need to revisit our syncing anyways.

@wemeetagain wemeetagain Aug 11, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get 0 blocks without error perhaps we can just retry the batch with a different peer instead of abandoning the sync chain and pulling the whole thing again?

I already did that, throwing error in beaconBlockxMaybeBlobsByRange means the chain will retry with another peer
will change the comment to avoid the misleading

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #8180

wemeetagain added a commit that referenced this pull request Aug 11, 2025
**Motivation**

- we got a lot of rate limited response from peers during syncing
- we should be able to control which peers to connect to based on
tracked active requests in PeerBalancer

**Description**

- enforce no more than MAX_CONCURRENT_REQUESTS per peer in sync
- limit number of epochs downloaded ahead


part of #8033

**Test results on fusaka-devnet-3**
was able to sync `fusaka-devnet-3` 2 times using this branch along with
#8150

<img width="851" height="346" alt="Screenshot 2025-08-09 at 14 45 32"
src="https://github.com/user-attachments/assets/1f3afca0-13a6-4f7a-9c18-51d03cc34793"
/>

---------

Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com>
Co-authored-by: Cayman <caymannava@gmail.com>
@wemeetagain wemeetagain merged commit 1d608c5 into peerDAS Aug 11, 2025
12 of 18 checks passed
@wemeetagain wemeetagain deleted the te/handle_0_blocks_batch branch August 11, 2025 14:21
@wemeetagain

Copy link
Copy Markdown
Member

🎉 This PR is included in v1.34.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants