
backport: feat(ai): optimize text accumulation runtime to O(N) #15906

Draft
aayush-kapoor wants to merge 1 commit into release-v6.0 from aayush/backport-optimization


@aayush-kapoor

Collaborator

Background

Manual backport of #15897.

Checklist

  • All commits are signed (PRs with unsigned commits cannot be merged)
  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • I have reviewed this pull request (self-review)

@meitalbensinai


Validation results + report of a sibling site this PR does not cover

Validated this PR (specifically the v6 backport, #15906) against an in-the-wild reproduction of the bug it was filed for. Short version: the PR is correctly written and a real improvement, but the bug still fires on tool-input-heavy workloads because of a third O(N²) site in the same file that this PR does not address. Posting here so you can decide whether to expand scope before merging or land it with a tracking issue.

Stack used

  • opencode (anomalyco fork) at v1.15.12 source rebuild + this PR's patched ai@6.0.168
  • For "both fixes" group: also patched the sibling opencode/processor.ts site we filed in anomalyco/opencode#30072 (same chunked-text shape as this PR)
  • Model: minimax/minimax-m2.7 via OpenRouter (verbose-reasoning + heavy edit tool inputs — the workload class that triggers the bug)
  • Instance: protonmail/webclients SWE-bench-Pro instance 7e54526774… (heavyweight TS monorepo; historically a reliable bug-firer)
  • N=5 per group, sequential, same environment

Results

| Config | n | resolved | med total | med mean s/step | med max-step | p100 max-step |
|---|---|---|---|---|---|---|
| Both fixes (this PR + opencode#30072) | 5 | 5/5 | 692s | 6.3s | 88s | 97s |
| This PR only | 5 | 4/5 | 1075s | 8.6s | 83s | 103s |
| Unpatched | 5 | 5/5 | 801s | 7.4s | 91s | 120s |

Max-step distribution per group (sorted desc):

  • Both fixes: 97s, 89s, 88s, 86s, 48s
  • This PR only: 103s, 99s, 83s, 61s, 59s
  • Unpatched: 120s, 94s, 91s, 88s, 63s

Every run, in every config, has at least one step taking 48–120s. In a healthy run, no individual LLM step should take >15s on this model. That is the signature of the bug still firing, only attenuated.

The third site

packages/ai/src/ui/process-ui-message-stream.ts:577

case 'tool-input-delta': {
  const partialToolCall = state.partialToolCalls[chunk.toolCallId];
  
  partialToolCall.text += chunk.inputTextDelta;          // ← same pattern as text-delta / reasoning-delta

  const { value: partialArgs } = await parsePartialJson(
    partialToolCall.text,                                // ← forces flatten on every chunk
  );

This is the same text += shape that this PR fixes for the text and reasoning branches of the same switch. MiniMax M2.7 is especially exposed because its edit tool calls carry multi-line diffs streamed in many small chunks; at step ~50+ in a long agent loop, the partialToolCall.text for an in-flight edit grows large enough that the per-chunk concat + parse becomes quadratic in the accumulated input. That matches what we observe: runs are clean for ~50 steps, then show late-step spikes once enough tool-input streaming bytes have accumulated.
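To make the cost argument concrete, here is a minimal cost-model sketch. It does not call the real parsePartialJson; `countScannedChars` is a hypothetical helper that only counts how many characters a parser would have to look at if it rescans the full accumulated string after every delta, which is the assumption behind the quadratic claim above.

```typescript
// Hypothetical cost model: a parser that rescans the whole accumulated
// string on every delta touches text.length characters each time.
function countScannedChars(chunkCount: number, chunkSize: number): number {
  const delta = 'x'.repeat(chunkSize);
  let text = '';
  let scanned = 0;
  for (let i = 0; i < chunkCount; i++) {
    text += delta;          // same accumulation shape as the tool-input-delta branch
    scanned += text.length; // stand-in for one full-string parse
  }
  return scanned;
}

// Doubling the number of chunks roughly quadruples the scanned characters.
const small = countScannedChars(1000, 4);
const large = countScannedChars(2000, 4);
console.log(small, large, large / small);
```

Under this model the total is chunkSize × n(n+1)/2 for n chunks, so the work grows with the square of the chunk count even though each individual delta is tiny.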

Why a naïve prepareTextAccumulator here is harder

The parsePartialJson(partialToolCall.text) call on every delta needs the cumulative string, so a lazy-join getter alone does not remove the quadratic behavior: every delta still flattens. Options:

  1. Incremental partial-JSON parser that consumes deltas without rebuilding the full string each time (most correct, real engineering).
  2. Buffer-and-flush: only run parsePartialJson every N deltas (or after a debounce window). Drops some UI smoothness, large perf win.
  3. Chunk + soft-rejoin cap: store as chunks, only flatten when needed for parsePartialJson, but rejoin if _chunks.length exceeds a threshold to bound worst-case.
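The reason a lazy-join getter does not help on its own can be shown with a small sketch. `LazyText` and `simulate` below are hypothetical and are not the PR's prepareTextAccumulator; they only model "store chunks, join on read" and count how often a join happens when the value is read once at the end versus after every delta.

```typescript
// Hypothetical lazy-join accumulator (illustration only, not the PR's code).
class LazyText {
  private chunks: string[] = [];
  private cache: string | undefined = '';
  flattenCount = 0; // how many times the chunks were joined into one string

  append(delta: string): void {
    this.chunks.push(delta);
    this.cache = undefined; // any append invalidates the joined value
  }

  get value(): string {
    if (this.cache === undefined) {
      this.cache = this.chunks.join('');
      this.flattenCount++;
    }
    return this.cache;
  }
}

// Returns the number of joins needed for `deltaCount` appends.
function simulate(deltaCount: number, readEveryDelta: boolean): number {
  const text = new LazyText();
  for (let i = 0; i < deltaCount; i++) {
    text.append('ab');
    if (readEveryDelta) void text.value; // models a full-string parse per delta
  }
  void text.value; // final read, as a finalize step would do
  return text.flattenCount;
}

console.log(simulate(1000, false), simulate(1000, true));
```

With a single read at the end there is one join; with a read per delta there are as many joins as deltas, each over the full accumulated text, which is why the options above all change how often or how much is parsed.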

Happy to file the follow-up PR if you'd like — wanted to flag it before #15906/#15897 merge so the scope decision is informed. Either way, thanks for the clean lifecycle design on the existing fix; the WeakMap + explicit finalize is nicer than what we shipped on our side.
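As a rough illustration of option 2 (buffer-and-flush), the sketch below runs the parser only every N deltas and once more on an explicit flush. `ThrottledParser`, its `every` parameter, and the synchronous `parse` callback are all assumptions made for this example; the real parsePartialJson is awaited in the excerpt above and its exact signature is not assumed here.

```typescript
// Hypothetical buffer-and-flush wrapper (illustration only).
class ThrottledParser<T> {
  private readonly parse: (text: string) => T;
  private readonly every: number;
  private text = '';
  private pending: string[] = [];
  private latest: T | undefined = undefined;
  parseCalls = 0; // exposed so the effect of throttling can be observed

  constructor(parse: (text: string) => T, every: number) {
    this.parse = parse;
    this.every = every;
  }

  // Buffers the delta; parses only once `every` deltas have piled up.
  // Returns the most recent parse result, which may lag behind the input.
  push(delta: string): T | undefined {
    this.pending.push(delta);
    if (this.pending.length >= this.every) this.flush();
    return this.latest;
  }

  // Forces a parse of everything received so far (e.g. when the input ends).
  flush(): T | undefined {
    if (this.pending.length > 0) {
      this.text += this.pending.join('');
      this.pending = [];
      this.latest = this.parse(this.text);
      this.parseCalls++;
    }
    return this.latest;
  }
}

// Usage with a stand-in parser that just reports the text length.
const throttled = new ThrottledParser((t: string) => t.length, 10);
for (let i = 0; i < 25; i++) throttled.push('ab');
console.log(throttled.parseCalls, throttled.flush());
```

This cuts the number of full-string parses by a factor of `every`, but the total work is still quadratic in the input length with a smaller constant; only an incremental parser (option 1) would change the asymptotics.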

@lgrammel

lgrammel commented Jun 10, 2026

Collaborator

Added a benchmark that reproduces the quadratic effect with a high number of chunks and a small chunk size:

50k:   1088.913 ms
100k:  4106.260 ms
150k:  8736.345 ms
200k: 15703.122 ms
250k: 24122.616 ms

However, it does not show that the changes here fix it.
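One way to check that these timings are consistent with quadratic growth is to fit a slope on a log-log scale. The sketch below assumes that the labels 50k to 250k are chunk counts (the comment does not state the unit explicitly); `logLogSlope` is a hypothetical helper, and the timings are copied from the benchmark output above.

```typescript
// [assumed chunk count, measured time in ms] from the benchmark output above.
const samples: Array<[number, number]> = [
  [50_000, 1088.913],
  [100_000, 4106.26],
  [150_000, 8736.345],
  [200_000, 15703.122],
  [250_000, 24122.616],
];

// Least-squares slope of log(time) against log(count).
// A slope near 1 suggests linear scaling, a slope near 2 quadratic scaling.
function logLogSlope(points: Array<[number, number]>): number {
  let meanX = 0;
  let meanY = 0;
  for (const [count, ms] of points) {
    meanX += Math.log(count) / points.length;
    meanY += Math.log(ms) / points.length;
  }
  let sxy = 0;
  let sxx = 0;
  for (const [count, ms] of points) {
    const dx = Math.log(count) - meanX;
    const dy = Math.log(ms) - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
  }
  return sxy / sxx;
}

const exponent = logLogSlope(samples);
console.log(exponent.toFixed(2));
```

For these five points the fitted exponent comes out at roughly 1.9, which is close to the 2 expected for quadratic behavior. The same fit applied to timings from a patched build would be a simple way to show whether a change brings the exponent toward 1.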
