Problem
channel_in_reader.go:118-124 returns errors from DeriveSpanBatch without classifying them:
    case SpanBatchType:
        // ...
        batch.Batch, err = DeriveSpanBatch(batchData, cr.cfg.BlockTime, cr.cfg.Genesis.L2Time, cr.cfg.L2ChainID)
        if err != nil {
            return nil, err // unclassified error propagates up
        }
DeriveSpanBatch calls RawSpanBatch.ToSpanBatch → derive(), which reconstructs full transactions from the span batch encoding. This can fail with unclassified errors (plain fmt.Errorf/errors.New) when the batch data is malformed:
- recoverV: "invalid tx type: %d" for unknown transaction types
- fullTxs: stx.UnmarshalBinary failure, "tx to not enough", stx.convertToFullTx failure
These are all external input failures (bad data from the batcher), not logic errors. The one NewCriticalError in DeriveSpanBatch (for the *RawSpanBatch type assertion) is correctly classified as a logic error and should continue to propagate.
Why it matters
Unclassified errors reach PipelineDeriver.OnEvent's catch-all:
    } else if err != nil {
        d.pipeline.log.Error("Derivation process error", "err", err)
        d.emitter.Emit(ctx, rollup.EngineTemporaryErrorEvent{Err: err})
    }
This causes:
- Error-level log: misleading for operators; this is bad batcher data, not an infrastructure failure
- Backoff + retry: since cr.nextBatchFn is still set, retry reads the next batch (the bad one was already consumed). A channel with N malformed span batches requires N backoff cycles to drain.
A batcher with a valid key can exploit this by submitting channels full of span batches with deliberately malformed transaction data (e.g. an unknown tx type byte), causing O(N × backoff_duration) stall per channel.
Fix
Catch non-critical errors in ChannelInReader.NextBatch and treat them as a drop:
    batch.Batch, err = DeriveSpanBatch(batchData, cr.cfg.BlockTime, cr.cfg.Genesis.L2Time, cr.cfg.L2ChainID)
    if err != nil {
        if errors.Is(err, ErrCritical) {
            return nil, err // logic error, propagate
        }
        cr.log.Warn("dropping malformed span batch", "err", err)
        return nil, NotEnoughData
    }
NotEnoughData causes immediate retry with no backoff, draining bad channels at full speed.
Parent issue
Part of #19491
Generated by Claude