perf: avoid O(N²) when a single SSE event spans many chunks by emoralesb05 · Pull Request #28 · rexxars/eventsource-parser

emoralesb05 · 2026-04-18T04:14:58Z

Summary

createParser().feed(chunk) is O(N²) in total bytes when a single SSE line spans many chunks. The cost is incompleteLine = incompleteLine + chunk, which allocates a new string sized at all-bytes-seen-so-far on every call.

This swaps the string accumulator for pendingFragments: string[], joined exactly once when a line terminator finally arrives. The hot path is gated by a single pendingFragments.length === 0 check and delegates straight to processLines(chunk) exactly like the original.

End-to-end win on a real MCP workload: 93 s → 6.8 s parsing a 280 MB payload (≈14×). Synthetic bench: ~640× on the worst-case shape.

Background

Prior work addressed intra-chunk line splitting: #19 closed by 3.0.1's splitLines rewrite (8952917), and vercel/ai#5862 closed after 3.0.1.

Neither touched the per-feed concat. It only really hurts when a single SSE event carries a large payload streamed in many small chunks. That shape shows up with LLM responses without intra-event newlines, MCP-over-SSE servers like mcp-clickhouse that emit results as one content block, or any consumer chunking on small TCP/TLS frames. 3.0.7's perf refactor improved per-line work but left the concatenation pattern intact.

The bug

src/parse.ts (3.0.7):

let incompleteLine = ''

function feed(chunk: string) {
  // ...
  const input = incompleteLine === '' ? chunk : incompleteLine + chunk
  incompleteLine = processLines(input)
}

For a stream where one SSE line is N bytes split across K chunks (no terminator until the very end), every feed() allocates a string sized at the running total. Total work is Σ(i × chunk_size) ≈ O(N²/chunk_size).
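Spelling the sum out, under the simplifying assumption that every chunk has the same size c (so N = K·c) and that each concat copies the full running total:

```latex
\sum_{i=1}^{K} i \cdot c
  \;=\; c \cdot \frac{K(K+1)}{2}
  \;\approx\; \frac{c\,K^{2}}{2}
  \;=\; \frac{N^{2}}{2c},
  \qquad N = K c
```

Halving the chunk size therefore roughly doubles the total copying for the same payload, which is why small TCP/TLS frames make this shape so much worse.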

Reproducing

The new huge-line-drip bench fixture (256 KiB payload chunked into 1–8 byte slices, no terminator until the end) on main:

feed() — huge-line-drip   1300.00 ms/iter   (~30 MB allocated)
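For readers who want to build a similar input locally, here is a minimal sketch of a drip-style fixture. The helper name `dripChunks`, the seeded generator, and the variable names are assumptions made for illustration; this is not the repository's actual fixture code.

```typescript
// Hypothetical helper: split `payload` into slices of minSize..maxSize
// UTF-16 code units, using a small seeded LCG so boundaries are reproducible.
function dripChunks(payload: string, minSize = 1, maxSize = 8, seed = 1): string[] {
  let state = seed >>> 0
  const next = (): number => {
    // 32-bit linear congruential step; result is in [0, 1).
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0
    return state / 4294967296
  }

  const chunks: string[] = []
  let offset = 0
  while (offset < payload.length) {
    const size = minSize + Math.floor(next() * (maxSize - minSize + 1))
    chunks.push(payload.slice(offset, offset + size))
    offset += size
  }
  return chunks
}

// One SSE event whose data line is 256 KiB of ASCII, with no line
// terminator until the final chunk.
const dripBody = 'x'.repeat(256 * 1024)
const dripFixture = dripChunks(`data: ${dripBody}`)
dripFixture.push('\n\n')
```

Feeding `dripFixture` to a parser one element at a time reproduces the "one long line, many tiny chunks" shape described above. The payload is ASCII on purpose, so slicing by code unit never splits a surrogate pair.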

Per-chunk cost grows linearly with the buffer already accumulated. Classic "string concat in a loop" fingerprint. Measured against a real 280 MB MCP payload:

Buffered so far	Per-chunk processing
3 MB	1 ms
66 MB	8 ms
131 MB	16 ms
197 MB	27 ms
262 MB	37 ms
288 MB	79 ms

For a real-world reproducer: point @ai-sdk/mcp at mcp-clickhouse against any ClickHouse instance with a ≥1M-row table, issue SELECT * LIMIT 1000000. mcp-clickhouse builds the full result string and emits it as a single SSE message event.

The fix

// Hot path: no buffered prefix from a prior partial line. Hand the chunk
// straight to processLines, exactly like the original implementation.
// Zero new work in the common case (every chunk ends with `\n\n`).
if (pendingFragments.length === 0) {
  const trailing = processLines(chunk)
  if (trailing !== '') pendingFragments.push(trailing)
  return
}

// We have a buffered prefix. If this chunk also has no terminator, append
// to the buffer without concatenating. That's the O(N²) trap we're avoiding.
if (chunk.indexOf('\n') === -1 && chunk.indexOf('\r') === -1) {
  pendingFragments.push(chunk)
  return
}

// Terminator arrived. Join the accumulated fragments + this chunk once,
// process, and buffer any new trailing partial line.
pendingFragments.push(chunk)
const input = pendingFragments.join('')
pendingFragments.length = 0
const trailing = processLines(input)
if (trailing !== '') pendingFragments.push(trailing)

processLines and parseLine internals are unchanged. reset() adapts to the array form. feedFirst is inlined into feed (one BOM check on the first chunk, gated by the existing isFirstChunk flag).

Why the hot path stays free: well-formed SSE chunks end with \n\n. processLines consumes them, returns '', and nothing gets pushed. The next call sees pendingFragments.length === 0 and takes the same path the original code did. The buffering only kicks in once a chunk leaves a partial line behind.
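To try the three branches in isolation, the sketch below wraps the same control flow around a deliberately simplified line splitter. `createLineBuffer`, its `onLine` callback, and the `\n`-only `processLines` are assumptions for illustration; the library's real `processLines` also handles `\r` and `\r\n` and parses SSE fields.

```typescript
// Illustrative stand-in, not the library's implementation.
function createLineBuffer(onLine: (line: string) => void) {
  const pendingFragments: string[] = []

  // Emits each complete "\n"-terminated line, returns the unterminated tail.
  function processLines(input: string): string {
    let start = 0
    let end = input.indexOf('\n', start)
    while (end !== -1) {
      onLine(input.slice(start, end))
      start = end + 1
      end = input.indexOf('\n', start)
    }
    return input.slice(start)
  }

  function feed(chunk: string): void {
    // Nothing buffered: process the chunk directly.
    if (pendingFragments.length === 0) {
      const trailing = processLines(chunk)
      if (trailing !== '') pendingFragments.push(trailing)
      return
    }
    // Buffered prefix and still no terminator: append without concatenating.
    if (chunk.indexOf('\n') === -1) {
      pendingFragments.push(chunk)
      return
    }
    // Terminator arrived: join once, process, keep any new tail.
    pendingFragments.push(chunk)
    const input = pendingFragments.join('')
    pendingFragments.length = 0
    const trailing = processLines(input)
    if (trailing !== '') pendingFragments.push(trailing)
  }

  return { feed, pendingCount: (): number => pendingFragments.length }
}

const seenLines: string[] = []
const lineBuffer = createLineBuffer((line) => seenLines.push(line))
for (const chunk of ['data: he', 'llo ', 'world', '\ndata: next\npar']) {
  lineBuffer.feed(chunk)
}
// seenLines is ['data: hello world', 'data: next']; 'par' stays buffered.
```

The middle two chunks only cost an array push each; the single `join('')` happens when the fourth chunk brings the first newline.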

Validation

Synthetic bench

Matched-clock comparison (Apple M4 Max, Node 24.11, mitata defaults):

Bench	Baseline	Fix	Δ
data-only	13.10 µs	13.22 µs	+1%
named-event	7.00 µs	7.28 µs	+4%
identified-event	10.14 µs	10.65 µs	+5%
multibyte	10.56 µs	10.79 µs	+2%
heartbeat	7.44 µs	7.36 µs	−1%
idle-stream	20.66 µs	7.95 µs	−62%
small-chunk	143.78 µs	114.69 µs	−20%
large-multiline-data	13.22 µs	13.30 µs	+1%
huge-line-drip	1.18 s	1.84 ms	~640×
edge-cases	5.79 µs	6.21 µs	+7%

Run-to-run variance on the same code across three identical invocations ranged from 1% (edge-cases) to 52% (data-only) on this hardware, so sub-10% deltas on small fixtures aren't meaningful signals. The wins on idle-stream, small-chunk, and huge-line-drip are well outside that band.
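For reference, the Δ column can be recomputed from the two timing columns. A small sketch, with hypothetical helper names `percentDelta` and `speedup`:

```typescript
// Relative change of the fix versus the baseline, in percent.
// Negative means the fix is faster.
function percentDelta(baseline: number, fix: number): number {
  return ((fix - baseline) / baseline) * 100
}

// How many times faster the fix is than the baseline.
function speedup(baseline: number, fix: number): number {
  return baseline / fix
}

// idle-stream: 20.66 µs -> 7.95 µs, about -61.5%
const idleStreamDelta = percentDelta(20.66, 7.95)

// huge-line-drip: 1.18 s -> 1.84 ms, both expressed in µs, about 641x
const hugeLineSpeedup = speedup(1.18e6, 1.84e3)
```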

End-to-end (real workload)

1M rows / 280 MB payload via mcp-clickhouse → ClickHouse Cloud, fetched through @ai-sdk/mcp's streamable-HTTP transport. Numbers came from a downstream MCP integration during an unrelated latency investigation:

Version	Total time	RSS peak
3.0.6	93 s	3.9 GB
3.0.7 (the perf refactor)	101 s	2.3 GB
3.0.7 + this fix	6.8 s	2.4 GB

3.0.7 lowered RSS but didn't improve wall time on this shape (93 s → 101 s), which points to the per-feed concat being the dominant cost for streams where a single event spans many chunks.

Instrumented call distribution on the 280 MB payload: 4,436 feed() calls took the new append-only path (the chunk had no terminator, so it was pushed onto the buffer with no concat); 2 took the other paths (the first chunk, carrying the event:/data: header, and the last chunk, carrying the \n\n terminator). Exactly what the design predicted.
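A rough way to predict that distribution for a given chunk sequence, without touching the parser, is to replay the branch conditions alone. This is a sketch under simplifying assumptions: `classifyFeeds` and the path names are hypothetical, and "leaves a partial line behind" is approximated as "the chunk does not end in `\n` or `\r`".

```typescript
type FeedPath = 'direct' | 'appendOnly' | 'joinOnce'

// Hypothetical helper: counts which branch each chunk would take.
function classifyFeeds(chunks: string[]): Record<FeedPath, number> {
  const counts: Record<FeedPath, number> = { direct: 0, appendOnly: 0, joinOnce: 0 }
  let hasPending = false

  for (const chunk of chunks) {
    const hasTerminator = chunk.indexOf('\n') !== -1 || chunk.indexOf('\r') !== -1
    if (!hasPending) {
      counts.direct++
    } else if (!hasTerminator) {
      // Buffer stays non-empty; nothing is processed.
      counts.appendOnly++
      continue
    } else {
      counts.joinOnce++
    }
    // After processing, a tail remains unless the chunk ended on a terminator.
    const last = chunk.charAt(chunk.length - 1)
    hasPending = chunk.length > 0 && last !== '\n' && last !== '\r'
  }
  return counts
}

const pathCounts = classifyFeeds(['event: message\ndata: ab', 'cd', 'ef', 'gh\n\n'])
// pathCounts: { direct: 1, appendOnly: 2, joinOnce: 1 }
```

For a single large event this gives one direct call (the header chunk), one join (the terminator chunk), and append-only for everything in between, the same shape as the 4,436 / 2 split reported above.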

Replace the per-feed `incompleteLine + chunk` concat with a `pendingFragments: string[]` joined once when a terminator arrives. ~640× faster on the new `huge-line-drip` bench fixture; ~14× faster end-to-end on a real 280 MB MCP workload (93 s → 6.8 s). Hot path unchanged, all 42 existing tests pass.

emoralesb05 · 2026-04-18T04:16:43Z

Let me know if there is anything wrong with this! I had to pnpm patch it in our repo but would love to remove that patch and have it be fixed at the source!

I tried to benchmark it as much as possible with your framework, and also directly via @ai-sdk/mcp against ClickHouse with different sizes and payloads. I hope this helps!

rexxars · 2026-04-19T17:43:54Z

Thanks a bunch! Appreciate it ❤️

emoralesb05 mentioned this pull request Apr 18, 2026

O(N²) parse time in @ai-sdk/mcp on large MCP tool responses (fix in rexxars/eventsource-parser#28) vercel/ai#14619

Closed

1 task

rexxars merged commit 4c41223 into rexxars:main Apr 19, 2026
1 check passed

emoralesb05 deleted the perf/streaming-investigation branch April 19, 2026 17:52

emoralesb05 restored the perf/streaming-investigation branch April 19, 2026 18:29
