
Create README.md #1

Merged
ChainSafeSystems merged 1 commit into ChainSafe:master from Mikerah:patch-1
Jun 23, 2018

Conversation

@Mikerah Mikerah commented Jun 22, 2018


Added an overview of the project.
Will be adding more details as it gets built out.

@ChainSafeSystems ChainSafeSystems merged commit f59f4f3 into ChainSafe:master Jun 23, 2018
wemeetagain pushed a commit that referenced this pull request Sep 3, 2019
Merge from Chainsafe master
dapplion pushed a commit that referenced this pull request Jan 19, 2022
spiral-ladder added a commit to spiral-ladder/lodestar that referenced this pull request Jul 7, 2025
Ran into this myself while setting `lodestar` up. Found some old
issues (ChainSafe#3037, ChainSafe#1396) dating back to 2020 indicating
that others ran into the same issue in the past, so this note is
probably worth adding to `CONTRIBUTING.md`.

Added a short note about what the error is about and what the solution is.
lodekeeper referenced this pull request in lodekeeper/lodestar Feb 14, 2026
…sponse

The responseEncodeError() function was yielding the error status byte and
snappy-encoded error message as separate chunks through the async generator.
When piped through libp2p stream.sink, this created a race condition where
the stream could close after the status byte was flushed but before the
error message bytes arrived on the reader side.

This resulted in the requesting side receiving the correct error status code
but an empty errorMessage, causing flaky failures in the e2e reqresp tests
('should handle a server error' and 'should handle a server error after
emitting two blocks'). These two tests were the #1 cause of CI E2E failures,
appearing in ~90% of all E2E test failures.

The fix collects the status byte and encoded error message into a single
Buffer.concat() yield, ensuring they are delivered atomically through the
stream.
nazarhussain pushed a commit that referenced this pull request Feb 16, 2026
…sponse (#8908)

## Motivation

The e2e reqresp tests "should handle a server error" and "should handle
a server error after emitting two blocks" have been consistently flaky,
appearing in **~90% of all CI E2E test failures**. Analysis of ~100
recent CI runs confirmed this as the #1 source of E2E flakiness.

The failure pattern:
```
expected { code: "REQUEST_ERROR_SERVER_ERROR", errorMessage: "" }
to deeply equal { code: "REQUEST_ERROR_SERVER_ERROR", errorMessage: "TEST_EXAMPLE_ERROR_1234" }
```

The error status code is received correctly, but the error message is
empty.

## Root Cause

`responseEncodeError()` yields the error status byte and snappy-encoded
error message as **separate chunks** through the async generator:

```ts
yield Buffer.from([status]);  // chunk 1
yield* encodeErrorMessage(errorMessage, protocol.encoding);  // chunk 2+
```

When piped through `stream.sink`, libp2p can close/flush the stream
after the first yield completes but before the subsequent error message
chunks are delivered to the reader side. The `readErrorMessage()`
function on the receiving end then finds no data after the status byte
and returns an empty string.

## Fix

Collect the status byte and encoded error message into a single
`Buffer.concat()` yield, ensuring they are delivered atomically through
the stream. This eliminates the race condition without changing the wire
format.
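
A minimal sketch of the before/after shape of this change, not the actual Lodestar code. `encodeErrorMessageSketch`, `concatChunks`, `encodeErrorSeparate`, `encodeErrorSingleChunk`, and `collect` are hypothetical names for illustration. The sketch uses `Uint8Array` with a hand-written concat helper in place of `Buffer.concat()` so it runs without Node typings, and the stand-in encoder does not perform snappy encoding (it also assumes an ASCII message).

```typescript
// Hypothetical stand-in for the real encoder. It splits the message into two
// chunks to mimic a multi-chunk encoding; it does NOT snappy-encode anything.
async function* encodeErrorMessageSketch(message: string): AsyncGenerator<Uint8Array> {
  // Assumes an ASCII message, so one char code maps to one byte.
  const bytes = Uint8Array.from(message, (c) => c.charCodeAt(0));
  const mid = Math.ceil(bytes.length / 2);
  yield bytes.subarray(0, mid);
  yield bytes.subarray(mid);
}

// Joins chunks into one contiguous array (the role Buffer.concat() plays in the PR).
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Before: the status byte and the message leave the generator as separate chunks,
// so a consumer can observe the status byte without the message.
async function* encodeErrorSeparate(status: number, message: string): AsyncGenerator<Uint8Array> {
  yield Uint8Array.of(status);
  yield* encodeErrorMessageSketch(message);
}

// After: every chunk is collected first, then yielded exactly once.
async function* encodeErrorSingleChunk(status: number, message: string): AsyncGenerator<Uint8Array> {
  const chunks: Uint8Array[] = [Uint8Array.of(status)];
  for await (const chunk of encodeErrorMessageSketch(message)) {
    chunks.push(chunk);
  }
  yield concatChunks(chunks);
}

// Drains an async iterable into an array so the chunk boundaries can be inspected.
async function collect(source: AsyncIterable<Uint8Array>): Promise<Uint8Array[]> {
  const out: Uint8Array[] = [];
  for await (const c of source) out.push(c);
  return out;
}
```

Draining both generators with `collect` shows the difference: the first produces several chunks, the second produces one, and the concatenated bytes are identical in both cases, which is why the wire format is unaffected.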

## Notes

- All existing reqresp unit tests pass (85/85)
- The wire format is unchanged — the same bytes are sent, just in a
single chunk instead of multiple
- This is consistent with how other protocols handle similar issues
(combining header + payload)

> This PR was authored by an AI contributor. All code was reviewed by
sub-agents before submission.

---------

Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <175061342+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nico Flaig <nflaig@protonmail.com>
wemeetagain pushed a commit that referenced this pull request Feb 24, 2026
…8955)

## Motivation

PR #8890 (libp2p v3) suffers from ~22% Unknown peer rate (vs ~4% on
v2/unstable). Root cause: libp2p v3 enforces per-protocol stream limits
(`identify: maxOutboundStreams=1`). When repeated STATUS messages
trigger overlapping `identify()` calls for the same peer, v3 throws
`TooManyOutboundProtocolStreamsError` which cascades into massive EOF
failures (~5000 identify errors/2h on feat1 vs ~287 on unstable).

## Description

Minimal fix — no retries, no backoff, no spray:

- **Single in-flight identify per peer**: `identifyInProgress` map keyed
by `PeerIdStr → connection.id`. Before calling `identify()`, checks if
there's already one in-flight for the same connection. If so, skips.
- **Event-driven fallback**: Listens to `peer:identify` events from
libp2p (fired on successful identify or identify-push). Updates
`agentVersion/agentClient` even if our explicit `identify()` failed
earlier.
- **Reconnect race safety**: Uses `connection.id` as epoch token. After
`await identify()`, verifies the in-flight key still matches before
writing results — a reconnect during the await clears the old entry, so
stale results are discarded.
- **Cleanup on disconnect**: Removes in-flight tracking when peer
disconnects.
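
The four bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `peerManager.ts`: `IdentifyTracker`, `ConnectionSketch`, `IdentifyResult`, the injected `identify` function, and the string return values are all hypothetical, and only `identifyInProgress`, `identifyPeer`, and `onPeerIdentify` reuse names from the description.

```typescript
type PeerIdStr = string;

// Hypothetical minimal shapes; the real libp2p types carry far more.
interface ConnectionSketch {
  id: string;
}
interface IdentifyResult {
  agentVersion: string;
}
type IdentifyFn = (conn: ConnectionSketch) => Promise<IdentifyResult>;
type IdentifyOutcome = "skipped" | "stale" | "failed" | "updated";

class IdentifyTracker {
  // peer -> id of the connection whose identify() call is currently in flight
  private readonly identifyInProgress = new Map<PeerIdStr, string>();
  readonly agentVersions = new Map<PeerIdStr, string>();
  private readonly identify: IdentifyFn;

  constructor(identify: IdentifyFn) {
    this.identify = identify;
  }

  async identifyPeer(peer: PeerIdStr, conn: ConnectionSketch): Promise<IdentifyOutcome> {
    // Dedup guard: one in-flight identify per peer and connection.
    if (this.identifyInProgress.get(peer) === conn.id) return "skipped";
    this.identifyInProgress.set(peer, conn.id);
    try {
      const result = await this.identify(conn);
      // Stale-result guard: connection.id acts as an epoch token. If the entry
      // was cleared or replaced during the await, discard the result.
      if (this.identifyInProgress.get(peer) !== conn.id) return "stale";
      this.agentVersions.set(peer, result.agentVersion);
      return "updated";
    } catch {
      // No retry here; the peer:identify event can still fill in the data later.
      return "failed";
    } finally {
      // Only clear the marker if it still belongs to this call.
      if (this.identifyInProgress.get(peer) === conn.id) {
        this.identifyInProgress.delete(peer);
      }
    }
  }

  // Event-driven fallback: invoked from a peer:identify listener.
  onPeerIdentify(peer: PeerIdStr, agentVersion: string): void {
    this.agentVersions.set(peer, agentVersion);
  }

  // Cleanup on disconnect: drop in-flight tracking for the peer.
  onDisconnect(peer: PeerIdStr): void {
    this.identifyInProgress.delete(peer);
  }
}
```

Injecting `identify` as a function keeps the sketch independent of libp2p and makes the overlap and reconnect cases easy to reproduce with a promise that is resolved by hand.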

## Changes

- `packages/beacon-node/src/network/peers/peerManager.ts`: Added
`identifyInProgress` map, `onPeerIdentify` event handler, dedup guard in
`onStatus`, stale-result guard in `identifyPeer()`, cleanup in
disconnect handler
- `packages/beacon-node/test/e2e/network/peers/peerManager.test.ts`: 4
new tests (dedup, reconnect race, event-driven fallback, existing flow
preserved)

## Evidence

Loki log comparison (2h window):
- **feat1 (v3)**: ~5000 identify errors — 3794 EOF, 863
EOF-while-reading, 283 missing public key, 26 too-many-outbound-streams
- **unstable (v2)**: ~287 identify errors — 143 unexpected-end, 82
timeouts

```
# The overlapping call pattern (before fix):
STATUS #1 → identify() in-flight
STATUS #2 → identify() overlaps → TooManyOutboundProtocolStreamsError → EOF cascade

# After fix:
STATUS #1 → identify() in-flight, tracked in identifyInProgress
STATUS #2 → sees in-flight marker, skips
peer:identify event → updates agentVersion as safety net
```

> Note: This is a replacement for the retry-based approach in #8954.
That PR added retry machinery which masked the root cause rather than
preventing overlapping calls.

## AI Disclosure

This PR was authored with AI assistance (Claude Opus 4.6 via OpenClaw).
All code was reviewed and tested by the AI agent.

Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>