op-node/derive: malformed span batch tx data propagates as unclassified error causing O(N×backoff) drain

## Problem

[`channel_in_reader.go:118-124`](https://github.com/ethereum-optimism/optimism/blob/d25685cb6189c10907ec12ba6172ccbeb2cfb8ec/op-node/rollup/derive/channel_in_reader.go#L118-L124) returns errors from `DeriveSpanBatch` to its caller without classifying them:

```go
case SpanBatchType:
    // ...
    batch.Batch, err = DeriveSpanBatch(batchData, cr.cfg.BlockTime, cr.cfg.Genesis.L2Time, cr.cfg.L2ChainID)
    if err != nil {
        return nil, err  // unclassified error propagates up
    }
```

`DeriveSpanBatch` calls `RawSpanBatch.ToSpanBatch` → `derive()`, which reconstructs full transactions from the span batch encoding. This can fail with **unclassified errors** (plain `fmt.Errorf`/`errors.New`) when the batch data is malformed:

- `recoverV`: `"invalid tx type: %d"` for unknown transaction types
- `fullTxs`: `stx.UnmarshalBinary` failure, `"tx to not enough"`, `stx.convertToFullTx` failure

These are all **external input failures** (bad data from the batcher), not logic errors. The one `NewCriticalError` in `DeriveSpanBatch` (for the `*RawSpanBatch` type assertion) is correctly classified as a logic error and should continue to propagate.

## Why it matters

Unclassified errors reach `PipelineDeriver.OnEvent`'s catch-all:

```go
} else if err != nil {
    d.pipeline.log.Error("Derivation process error", "err", err)
    d.emitter.Emit(ctx, rollup.EngineTemporaryErrorEvent{Err: err})
}
```

This causes:
1. **`Error`-level log**: misleading for operators, since the cause is bad batcher data, not an infrastructure failure.
2. **Backoff + retry**: the bad batch has already been consumed and `cr.nextBatchFn` is still set, so each retry reads the next batch. A channel with N malformed span batches therefore needs N backoff cycles to drain.

A batcher with a valid key can exploit this by submitting channels full of span batches with deliberately malformed transaction data (e.g. an unknown tx type byte), stalling derivation for O(N × backoff_duration) per channel.

## Fix

Catch non-critical errors in `ChannelInReader.NextBatch`, log them at `Warn`, and drop the batch:

```go
batch.Batch, err = DeriveSpanBatch(batchData, cr.cfg.BlockTime, cr.cfg.Genesis.L2Time, cr.cfg.L2ChainID)
if err != nil {
    if errors.Is(err, ErrCritical) {
        return nil, err // logic error, propagate
    }
    cr.log.Warn("dropping malformed span batch", "err", err)
    return nil, NotEnoughData
}
```

Returning `NotEnoughData` makes the pipeline retry immediately with no backoff, so a channel of malformed batches drains at full speed.

## Parent issue

Part of #19491

---
*Generated by [Claude](https://claude.ai)*


op-node/derive: malformed span batch tx data propagates as unclassified error causing O(N×backoff) drain #19494
