fix: track rate limited errors by twoeths · Pull Request #8116 · ChainSafe/lodestar

twoeths · 2025-08-06T08:53:06Z

Motivation

when we send requests to peers, they may respond with a rate limit error, and we don't track it
this is a prerequisite for the rate limit prevention that I'll implement later; we want to make sure REQUEST_RATE_LIMITED does not happen in that branch

Description

fix the malformed extracted error message, see the unit test
throw REQUEST_RATE_LIMITED if the message contains "rate limit", see the rate limited error messages of all clients here
new metric to track outgoing errors by reason
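The message-based detection described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: `classifyErrorResponse`, `isRateLimitedMessage` and the `ErrorReason` strings are hypothetical names, while the status values 0 to 3 are the response codes defined in the consensus-specs phase0 p2p interface. Because clients disagree on the status they use for rate limiting (the logs in this thread show Lighthouse sending 139 and Prysm sending 1), the message is checked before the status.

```typescript
// Response codes from the consensus-specs phase0 p2p interface.
// No code is reserved for rate limiting; 0 (success) is never an error.
const RespStatus = {
  SUCCESS: 0,
  INVALID_REQUEST: 1,
  SERVER_ERROR: 2,
  RESOURCE_UNAVAILABLE: 3,
} as const;

// Hypothetical error reasons for illustration; not Lodestar's RequestErrorCode.
type ErrorReason =
  | "RATE_LIMITED"
  | "INVALID_REQUEST"
  | "SERVER_ERROR"
  | "RESOURCE_UNAVAILABLE"
  | "UNKNOWN_ERROR_STATUS";

// Case-insensitive substring match, so both "rate limited" (Prysm) and
// "Rate limited. There are already 2 active requests ..." (Lighthouse) match.
function isRateLimitedMessage(errorMessage: string): boolean {
  return errorMessage.toLowerCase().includes("rate limit");
}

function classifyErrorResponse(status: number, errorMessage: string): ErrorReason {
  // Check the message first: the status code used for rate limiting
  // differs between clients, so the code alone is unreliable.
  if (isRateLimitedMessage(errorMessage)) return "RATE_LIMITED";
  switch (status) {
    case RespStatus.INVALID_REQUEST:
      return "INVALID_REQUEST";
    case RespStatus.SERVER_ERROR:
      return "SERVER_ERROR";
    case RespStatus.RESOURCE_UNAVAILABLE:
      return "RESOURCE_UNAVAILABLE";
    default:
      // Non-spec codes such as 139 without a recognizable message.
      return "UNKNOWN_ERROR_STATUS";
  }
}
```

A status-only check (for example, a dedicated `139` case) would miss Prysm's responses, which carry the same text with status 1; this is the limitation discussed later in the review thread.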

Closes #8065
Closes #8110

Tested on dev nodes

twoeths · 2025-08-06T09:51:43Z

new rate limited error messages with this branch:

Prysm: verbose: Req error method=beacon_blocks_by_range, version=2, encoding=ssz_snappy, client=Prysm, peer=16...iQxcdU, requestId=96163, status=1, errorMessage=rate limited
Lighthouse: verbose: Req error method=beacon_blocks_by_range, version=2, encoding=ssz_snappy, client=Lighthouse, peer=16...xwThs6, requestId=75188, status=139, errorMessage=Rate limited. There are already 2 active requests with the same protocol
NA: verbose: Req error method=data_column_sidecars_by_range, version=1, encoding=ssz_snappy, client=NA, peer=16...evfBuU, requestId=74333, status=1, errorMessage=rate limited

matthewkeil · 2025-08-06T16:14:54Z

    }
  }

+  // other errors like RequestErrorCode.REQUEST_RATE_LIMITED could come from ourself, not the peer so we should not penalize them


I'm confused about this. Wouldn't an outgoing request that we "rate limit" mean that we are not making the request? How would the error "come from ourself"?

per spec we should not send more than 2 concurrent requests per protocol to the same peer
in the future we should implement that on the client side, i.e. do not send more requests to the same peer/protocol
then we can avoid that error

ahh... as in we throw RequestErrorCode.REQUEST_RATE_LIMITED from the client instead of just not making the request?

right now it'll throw RESP_RATE_LIMITED, i.e. we make a request and the peer returns that error. We don't have a self rate limiter yet; the strategy for now is to control it on the client side, i.e. in SyncChain (I have a local branch for it and will create a PR soon, but want some more testing)

I'll also implement a self rate limiter soon; in that case it'll throw REQUEST_RATE_LIMITED. Then we'll track this error and see which modules need to control active requests, like SyncChain does
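A client-side ("self") limiter of the kind discussed in this thread could be sketched like this. It is a minimal sketch assuming the limit of 2 concurrent requests per peer and protocol mentioned above; `ActiveRequestTracker`, `LocalRateLimitedError` and the `peerId|protocol` key format are hypothetical, not Lodestar's API. It illustrates the point raised in the review: an error thrown here never reaches the peer, so it should be tracked but not used to penalize the peer.

```typescript
// Assumed limit, taken from the thread: at most 2 in-flight requests
// per protocol to the same peer.
const MAX_CONCURRENT_REQUESTS = 2;

// Hypothetical error raised when we refuse to send a request ourselves,
// standing in for a REQUEST_RATE_LIMITED-style error.
class LocalRateLimitedError extends Error {
  readonly peerId: string;
  readonly protocol: string;

  constructor(peerId: string, protocol: string) {
    super(`REQUEST_RATE_LIMITED peer=${peerId} protocol=${protocol}`);
    this.name = "LocalRateLimitedError";
    this.peerId = peerId;
    this.protocol = protocol;
  }
}

class ActiveRequestTracker {
  private readonly active = new Map<string, number>();
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number = MAX_CONCURRENT_REQUESTS) {
    this.maxConcurrent = maxConcurrent;
  }

  activeCount(peerId: string, protocol: string): number {
    return this.active.get(`${peerId}|${protocol}`) ?? 0;
  }

  // Runs `send` only if a slot is free for this (peer, protocol) pair;
  // otherwise rejects without contacting the peer.
  async run<T>(peerId: string, protocol: string, send: () => Promise<T>): Promise<T> {
    const key = `${peerId}|${protocol}`;
    const count = this.active.get(key) ?? 0;
    if (count >= this.maxConcurrent) {
      // Our own limit was hit: track it, but do not penalize the peer.
      throw new LocalRateLimitedError(peerId, protocol);
    }
    this.active.set(key, count + 1);
    try {
      return await send();
    } finally {
      // Release the slot whether the request succeeded or failed.
      const remaining = (this.active.get(key) ?? 1) - 1;
      if (remaining <= 0) this.active.delete(key);
      else this.active.set(key, remaining);
    }
  }
}
```

With two requests for the same peer and protocol still pending, a third call to `run` rejects with `LocalRateLimitedError`, while a request on a different protocol to the same peer is unaffected.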

wemeetagain · 2025-08-07T11:34:13Z

+    return {code: RequestErrorCode.RESP_TIMEOUT};
+  }
+
  switch (status) {


Can we also add a RespStatus.RATE_LIMITED (which == 139) and check in this switch?
This will help in case error messages in clients change.

the rate limited error is not specified in the response codes https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#responding-side

Lighthouse returned 139 but other clients returned 1

I don't think we can improve this unless it's specified in the spec

matthewkeil · 2025-08-07T22:45:32Z

+    {
+      name: "NA - rate limited",
+      errorMessage: "rate limited",
+    },


Do you think we should add a test case for the "wait #.###" condition?

I added more test cases in 7624a60

graphite-app · 2025-08-08T09:52:22Z

+    outgoingErrorReasons: register.gauge<{reason: RequestErrorCode}>({
+      name: "beacon_reqresp_outgoing_requests_error_reason_total",
+      help: "Count total outgoing request errors by reason",
+      labelNames: ["reason"],
+    }),


This metric is defined as a gauge but is being used with the inc() method in ReqResp.ts, which is counter behavior. For accurate metrics reporting, this should be defined as a counter instead of a gauge. Gauges represent values that can go up and down, while counters are for values that only increase over time, which appears to be the intended usage here.

Suggested change

-    outgoingErrorReasons: register.gauge<{reason: RequestErrorCode}>({
-      name: "beacon_reqresp_outgoing_requests_error_reason_total",
-      help: "Count total outgoing request errors by reason",
-      labelNames: ["reason"],
-    }),
+    outgoingErrorReasons: register.counter<{reason: RequestErrorCode}>({
+      name: "beacon_reqresp_outgoing_requests_error_reason_total",
+      help: "Count total outgoing request errors by reason",
+      labelNames: ["reason"],
+    }),

Spotted by Diamond
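The gauge/counter distinction the bot draws can be illustrated with a minimal, dependency-free stand-in for a labeled counter. This is a hypothetical sketch, not Lodestar's metrics API or a Prometheus client library: its only write operation is `inc()`, which rejects negative increments, so each per-reason series can only grow. That is the behaviour the `_total` suffix of a Prometheus metric name conventionally promises.

```typescript
// Hypothetical labeled counter: values can only increase.
class LabeledCounter<L extends string> {
  private readonly values = new Map<L, number>();

  inc(label: L, value: number = 1): void {
    // A counter must never go down; a gauge would allow this.
    if (value < 0) throw new Error("a counter cannot be decremented");
    this.values.set(label, (this.values.get(label) ?? 0) + value);
  }

  get(label: L): number {
    return this.values.get(label) ?? 0;
  }
}

// Hypothetical reason labels, mirroring the "reason" label of the metric above.
type Reason = "RESP_RATE_LIMITED" | "RESP_TIMEOUT";

const outgoingErrorReasons = new LabeledCounter<Reason>();
outgoingErrorReasons.inc("RESP_RATE_LIMITED");
outgoingErrorReasons.inc("RESP_RATE_LIMITED");
outgoingErrorReasons.inc("RESP_TIMEOUT");
```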

nflaig · 2025-08-09T11:53:29Z

it seems that since we merged this PR, our unit tests have been failing consistently

Motivation
- fix unit tests and e2e tests of the `peerDAS` branch

Description
- when sending status, set the correct version based on its body, otherwise peers cannot deserialize the request body
- at the fulu fork transition, update the local status cache so that it sends the correct version of the status message
- fix the unit test failure introduced by PR-8116 (#8116)

Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com>
Co-authored-by: Nico Flaig <nflaig@protonmail.com>
Co-authored-by: Cayman <caymannava@gmail.com>

Motivation
- track req/resp outgoing request errors by reason

Description
- the metric was added in #8116

Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com>
Co-authored-by: Cayman <caymannava@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

wemeetagain · 2025-09-10T16:02:01Z

🎉 This PR is included in v1.34.0 🎉

twoeths added 3 commits August 6, 2025 14:34

fix: decode p2p error message

cca4c14

feat: detect and track REQUEST_RATE_LIMITED error

37fedbc

fix: remove unused import

01675da

twoeths marked this pull request as ready for review August 6, 2025 10:14

twoeths requested a review from a team as a code owner August 6, 2025 10:14

twoeths changed the title from "fix: decode p2p error message" to "fix: track rate limited errors" Aug 6, 2025

wemeetagain reviewed Aug 6, 2025

Comment thread packages/reqresp/src/request/errors.ts Outdated

matthewkeil reviewed Aug 6, 2025

twoeths added 3 commits August 7, 2025 10:30

fix: handle wait error message

a6d90a6

fix: apply PeerAction when RESP_TIMEOUT

fbba340

fix: new RESP_RATE_LIMITED error code

3d1b5b5

wemeetagain reviewed Aug 7, 2025

matthewkeil reviewed Aug 7, 2025

chore: more test cases

7624a60

graphite-app Bot reviewed Aug 8, 2025

wemeetagain approved these changes Aug 8, 2025

wemeetagain merged commit 68b6feb into peerDAS Aug 8, 2025
15 of 19 checks passed

wemeetagain deleted the te/detect_rate_limit_error branch August 8, 2025 11:27

This was referenced Aug 8, 2025

Cannot track rate limited error #8065

Closed

Reqresp not parsing error message properly #8110

Closed

twoeths mentioned this pull request Aug 11, 2025

test: unit tests and e2e tests of peerDAS branch #8173

Merged

twoeths mentioned this pull request Aug 28, 2025

chore: track req/resp outgoing request error reason #8283

Merged

matthewkeil mentioned this pull request Sep 8, 2025

feat: refactor block input #8200

Merged

4 participants