[WIP] agentx by cquil11 · Pull Request #348 · SemiAnalysisAI/InferenceX-app

cquil11 · 2026-05-14T15:19:52Z

Summary

add per-agentic-point interactivity (1 / TPOT) and TTFT time-series charts
default both charts to P90 with independently selectable rolling P75/P90 lines over the trailing 50 profiling requests
add red cumulative P75/P90 convergence lines that follow the selected percentile
persist TPOT in versioned request timelines and ignore warmup, cancelled, missing, and invalid samples
show elapsed-from-start timestamps beside dataset flamegraph turns and subturns
show subagent headers as elapsed start-end ranges, with child timings as a fallback
standardize dataset distributions on P50/P75/P90/P95 guide lines and summaries
add a zero-preserving log histogram for uncached input tokens per request
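The rolling and cumulative percentile lines described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `RequestSample` shape, the millisecond unit for TPOT, and the nearest-rank percentile method are all assumptions made for the example.

```typescript
// Hypothetical sample shape; field names are illustrative, not the PR's schema.
interface RequestSample {
  tpotMs: number | null; // time per output token in milliseconds (unit assumed)
  warmup?: boolean;
  cancelled?: boolean;
}

// Nearest-rank percentile of an unsorted array, p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Drop warmup, cancelled, missing, and invalid samples, then convert to 1 / TPOT.
function toInteractivity(samples: RequestSample[]): number[] {
  const out: number[] = [];
  for (const s of samples) {
    if (s.warmup || s.cancelled) continue;
    if (s.tpotMs === null || !Number.isFinite(s.tpotMs) || s.tpotMs <= 0) continue;
    out.push(1000 / s.tpotMs); // tokens per second
  }
  return out;
}

// For each request index: the percentile over the trailing window and the
// percentile over every request seen so far.
function rollingAndCumulative(values: number[], p: number, window = 50) {
  return values.map((_, i) => ({
    rolling: percentile(values.slice(Math.max(0, i + 1 - window), i + 1), p),
    cumulative: percentile(values.slice(0, i + 1), p),
  }));
}
```

Calling `rollingAndCumulative(toInteractivity(samples), 90)` would yield one rolling point and one cumulative point per request, matching the two line families described in the summary.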

Data updates

backfilled all 746 stored request timelines to version 3 with TPOT populated
re-ingested both cc-traces-weka-062126 variants so flamegraph structures include timing
backfilled both dataset aggregates to chart-data v2 with ISL, OSL, and uncached-input percentiles
purged the API cache after each dataset refresh

Unofficial-run overlays cannot open the persisted agentic point-detail route because they do not have a benchmark_results id or stored request timeline. The new point-detail charts are therefore intentionally limited to DB-backed official points.

Validation

pnpm typecheck
pnpm lint
pnpm fmt
app unit suite: 118 files / 2195 tests passed
DB unit suite: 19 files / 294 tests passed
focused Cypress component: dataset distribution card, 4 tests passed
focused Cypress E2E: agentic time-series, flamegraph timing, and dataset distributions, 5 tests passed

Adds agentic_traces scenario end-to-end:
- Schema migrations for agentic scenario, availability, and KV offload mode
- DB ingest/ETL + query updates to carry scenario, offload_mode, and server/theoretical cache-hit rates through to the API layer
- Frontend types, filters (GlobalFilterContext / InferenceContext / ChartControls), URL state, and tooltip rows for agentic-only fields
- ScatterGraph: subtle dashed halo on Pareto-frontier points that used KV offload so the tradeoff is visible at a glance

- ScatterGraph: include `offload_mode` in `buildPointConfigId` so d3's data join keeps both `on` and `off` variants for the same (config, conc). Without it, the second variant collapsed onto the first key, so FP8 offload-on points (and their halos) silently disappeared.
- benchmark-mapper: handle older artifacts that emit `users`/`offload_mode` AND newer ones that emit `conc`/`offloading` (with 'none' → 'off' mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
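A rough sketch of the two ideas in this commit: normalizing both artifact spellings to one shape, and including the offload mode in the join key. The `RawRow` type, the helper names, and the rule that any value other than 'none'/'off' means "on" are assumptions for illustration; the real `buildPointConfigId` and benchmark-mapper may differ.

```typescript
type OffloadMode = "on" | "off";

// Hypothetical raw artifact row covering both field spellings named in the commit.
interface RawRow {
  config: string;
  users?: number;        // older artifacts
  conc?: number;         // newer artifacts
  offload_mode?: string; // older artifacts
  offloading?: string;   // newer artifacts
}

function normalizeOffload(row: RawRow): OffloadMode {
  const raw = row.offload_mode ?? row.offloading ?? "off";
  // Assumption: anything other than 'none' or 'off' means offload was enabled.
  return raw === "none" || raw === "off" ? "off" : "on";
}

function normalizeConc(row: RawRow): number {
  const conc = row.conc ?? row.users;
  if (conc === undefined) throw new Error("row has neither conc nor users");
  return conc;
}

// Join key that includes the offload mode, so the on/off variants of the same
// (config, conc) stay distinct instead of collapsing onto one key.
function pointKey(row: RawRow): string {
  return `${row.config}|${normalizeConc(row)}|${normalizeOffload(row)}`;
}
```

With a key that omits the offload mode, both rows in the example pair `{ config: "fp8", users: 8, offload_mode: "on" }` and `{ config: "fp8", conc: 8, offloading: "none" }` would map to the same identifier, which is the collision the commit describes.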

The halo's purpose is to surface KV-offload usage; restricting it to Pareto-frontier-only points hid the indicator on most runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

b300-p1 (and similar) artifacts were skipping ingest because the runner-pool suffix wasn't in the strip list and didn't normalize to the canonical b300 GPU key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Label text now includes `C=<conc>` alongside the GPU/parallelism tag (default `<tp> C=<conc>`, advanced `<getPointLabel> C=<conc>`)
- Bumped point-label font-weight to 700 so the labels read clearly against the chart fill
- Greedy collision-avoidance pass on render and zoom: tries placing each label above/below the point through 4 candidate dy offsets, hiding the label only when no slot is free

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oint

Tspans now ride above the text's `dy` anchor: the LAST line sits at the anchor (just above the point) and earlier lines stack above it. Previously the second tspan landed below the anchor and crashed into the marker. Also widened collision candidates by label height so the flipped-below position fully clears the point on multi-line labels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… pass

When a `<text>` contains tspans, the parent's `dy` does not shift the bbox cleanly: its (unused) y=0 origin still factors in, so the rendered text ended up centered on the point. Move the absolute offset into the FIRST tspan's `dy`; later tspans cascade by 1.1em. Collision avoidance now drives the first tspan's `dy` and tries four candidate baselines (primary above, primary below, secondary above, secondary below), accounting for full label height when picking a non-overlapping slot. Labels are still hidden as a last resort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
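The four-candidate greedy pass can be illustrated with plain rectangles. This is a simplified sketch under stated assumptions: labels are centered horizontally on their point, only label-to-label overlap is checked, and the `gap` spacing and function names are hypothetical rather than taken from ScatterGraph.

```typescript
interface Box { x: number; y: number; w: number; h: number }

// Axis-aligned rectangle intersection test.
function overlaps(a: Box, b: Box): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;
}

// px/py is the data point in screen space (y grows downward, as in SVG);
// w/h is the measured size of the full, possibly multi-line, label.
interface LabelInput { px: number; py: number; w: number; h: number }

// Returns the top edge chosen for each label, or null when the label is hidden.
function placeLabels(labels: LabelInput[], gap = 4): (number | null)[] {
  const placed: Box[] = [];
  return labels.map((l) => {
    const candidates = [
      l.py - gap - l.h,           // primary above
      l.py + gap,                 // primary below
      l.py - 2 * gap - 2 * l.h,   // secondary above
      l.py + 2 * gap + l.h,       // secondary below
    ];
    for (const y of candidates) {
      const box: Box = { x: l.px - l.w / 2, y, w: l.w, h: l.h };
      if (!placed.some((p) => overlaps(p, box))) {
        placed.push(box); // greedy: earlier labels keep their slot
        return y;
      }
    }
    return null; // no free slot: hide as a last resort
  });
}
```

Because the pass is greedy, the result depends on label order; a production version would likely also treat point markers as obstacles.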

Two complementary fixes for runs whose `results_bmk` aggregated artifact ends up containing both a successful row and a failed-attempt row for the same (config, conc, offload). The failed row's null metrics were overwriting the good row via ON CONFLICT DO UPDATE.

1. Artifact-level: strip the trailing `_<runner-pool>_<attempt>` suffix from each artifact name and group by the logical name, keeping only the most recent per group.
2. Row-level: skip rows with `num_requests_successful === 0` AND `num_requests_total > 0`. The aggregated artifact merges rows from all runners, including failed ones, so artifact-level dedup alone can't reach inside it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
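The two dedup levels might look roughly like this. The `Artifact` shape, the `createdAt` field used to pick "most recent", and the regular expression (which assumes a numeric attempt and a pool name without underscores) are assumptions for the sketch, not the ingest code itself.

```typescript
interface Artifact { name: string; createdAt: number }
interface ResultRow { num_requests_successful: number; num_requests_total: number }

// Assumption: the suffix is `_<runner-pool>_<attempt>` with a numeric attempt
// and no underscore inside the runner-pool name.
function logicalName(name: string): string {
  return name.replace(/_[^_]+_\d+$/, "");
}

// Artifact-level dedup: keep only the most recent artifact per logical name.
function latestPerLogicalName(artifacts: Artifact[]): Artifact[] {
  const latest = new Map<string, Artifact>();
  for (const a of artifacts) {
    const key = logicalName(a.name);
    const prev = latest.get(key);
    if (!prev || a.createdAt > prev.createdAt) latest.set(key, a);
  }
  return Array.from(latest.values());
}

// Row-level filter: a row that attempted requests but completed none is a
// failed attempt and should not overwrite a good row on upsert.
function isFailedAttempt(row: ResultRow): boolean {
  return row.num_requests_successful === 0 && row.num_requests_total > 0;
}
```

Rows would then be filtered with `rows.filter((r) => !isFailedAttempt(r))` before the upsert, so a null-metric row never reaches the ON CONFLICT path.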

# Conflicts: # packages/app/src/components/GlobalFilterContext.tsx # packages/app/src/components/inference/utils/tooltipUtils.ts # packages/db/src/etl/normalizers.ts

Tag display name for the `aiperf` spec_method suffix used by the alternate-harness runs ingested for the agentic minimax sweep. Without this entry the legend shows 'AIPERF' from the default toUpperCase fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bigint workflow_run_id sometimes deserializes as a number on the frontend depending on the postgres adapter's behavior; strict === between a number and a string silently dropped every match, so the changelog popover always reported "no changelog data available." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
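One way to make the comparison robust to the number/string mismatch is to compare the decimal string forms. The helper name below is hypothetical, and the sketch assumes ids stay below `Number.MAX_SAFE_INTEGER` when they arrive as numbers (the run id quoted elsewhere in this PR, 27915787191, does).

```typescript
type RunId = string | number;

// Compare ids by their decimal string form so 27915787191 and "27915787191" match.
function sameRunId(a: RunId, b: RunId): boolean {
  return String(a) === String(b);
}

// Example lookup over entries whose id may have been deserialized either way.
function findByRunId<T extends { workflow_run_id: RunId }>(entries: T[], id: RunId): T | undefined {
  return entries.find((e) => sameRunId(e.workflow_run_id, id));
}
```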

If the selected model has agentic_traces data, prefer that over the default 8K/1K fixed-seq when the user hasn't explicitly chosen via URL. effectiveSequence already falls back to availableSequences[0] for models without agentic, so models with only fixed-seq data still render correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-14T15:19:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	Jun 26, 2026 6:28am

# Conflicts: # packages/app/src/components/inference/ui/ChartControls.tsx # packages/app/src/components/inference/utils/tooltipUtils.ts # packages/db/src/etl/normalizers.ts

rowToAggDataEntry was only copying median/p99 metric variants — picking p90/p99.9 in the percentile selector silently fell back to 0 and collapsed every point into a vertical line at x=0. Copy the full median/p90/p99/p99.9 set into AggDataEntry. Hide the X-Axis Metric dropdown for agentic mode (it doubled up with the percentile selector) and route the input-metric chart through withPercentile so picking p99 actually plots p99_ttft instead of the hard-coded p99_ttft config default. Percentile options pared back to median + p99.
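The percentile routing can be pictured as a prefix swap on the metric field name. The helper below is a hypothetical stand-in written for illustration; it is not the repository's `withPercentile`, whose signature is not shown in this PR. The second helper shows one way to avoid the silent fall-back to 0 when a variant is missing.

```typescript
// Hypothetical helper: swap the percentile prefix of a metric field name,
// e.g. "p99_ttft" -> "p90_ttft". Fields without a percentile prefix are unchanged.
function swapPercentile(field: string, percentile: string): string {
  return field.replace(/^(median|p\d+(\.\d+)?)_/, `${percentile}_`);
}

// Hypothetical row lookup that reports a missing variant as null instead of 0,
// so a missing metric cannot collapse points onto x = 0.
function readMetric(row: Record<string, number | undefined>, field: string): number | null {
  const value = row[field];
  return typeof value === "number" ? value : null;
}
```

The swap is idempotent: applying it to a field that already carries the requested percentile returns the same name.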

# Conflicts: # packages/app/src/components/GlobalFilterContext.tsx # packages/app/src/components/inference/InferenceContext.tsx # packages/app/src/components/inference/hooks/useChartData.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Aligns the TTFT x-axis selectors with the percentile selector — only p90 is offered everywhere. Default x-axis metric and chart config input-throughput x are p90_ttft. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The `!isAgentic` gate on the e2e TTFT override branch dropped the user's `p90_ttft` pick in agentic mode, leaving the chart on the default p90_e2el. The trailing withPercentile pass is idempotent when xAxisField is already at the right percentile, so the gate is unnecessary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- /datasets: methodology prose + dataset registry cards (DatasetList)
- /datasets/[slug]: summary stats, model mix, 5 precomputed-histogram distribution cards (DistributionCard, log/linear), and a searchable/sortable/paginated conversation table
- /datasets/[slug]/conversations/[convId]: per-conversation TraceFlamegraph with one bar per turn (cached prefix + uncached input + output), subagent groups collapsible (collapsed by default) with expand/collapse-all
- header nav 'Datasets' link
- query-layer test (mock DbClient): not-found paths + numeric coercion

Verified end-to-end against the live branch DB: both datasets list with real stats, distributions render, flamegraph shows the prefix-reuse signature (turn 2 fully uncached, later turns mostly cached), expand-all surfaces subagent subturns. Zero console errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
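For the log-scale distribution cards, a histogram that keeps exact zeros needs a dedicated bucket, because log10(0) is undefined. The sketch below shows one possible precompute step with one bin per decade; the bucket layout and names are assumptions and not the PR's chart-data format.

```typescript
interface LogHistogram {
  zeros: number;                // exact-zero samples, kept in their own bucket
  decades: Map<number, number>; // k -> count of samples in [10^k, 10^(k+1))
}

function zeroPreservingLogHistogram(values: number[]): LogHistogram {
  const hist: LogHistogram = { zeros: 0, decades: new Map<number, number>() };
  for (const v of values) {
    if (!Number.isFinite(v) || v < 0) continue; // skip invalid samples
    if (v === 0) {
      hist.zeros += 1;
      continue;
    }
    const k = Math.floor(Math.log10(v));
    hist.decades.set(k, (hist.decades.get(k) ?? 0) + 1);
  }
  return hist;
}
```

A renderer could then draw the zero bucket as a separate leading bar, so requests with no uncached input tokens stay visible rather than being dropped by the log transform.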

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrap rows in a fixed-height (max-h-[520px]) vertically scrollable bordered box. Subagent group headers carry aggregate token totals that dwarf any single turn, which made their bars overflow the row (width >> 100%). Now turns/subturns use a per-turn scale while group headers use a separate group-aggregate scale (slim muted strips), both clamped to the track — groups stay comparable to each other and nothing overflows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add run_datasets (workflow_run → dataset slug) mapping (migration 012) and surface it through the benchmark-siblings sku. The agentic detail page's request timeline now deep-links each request bar to its exact conversation in the /datasets viewer — the request cid, stripped of any ::sa:/::fa: suffix, is the dataset conv_id. Tooltip shows a 'click to view in dataset' hint; bars get a pointer cursor only when a mapping exists. Backfilled workflow_run 27915787191 (the dsv4/b300/vllm run incl. point 422083) → cc-traces-weka-062126. Verified: clicking a timeline bar on /inference/agentic/422083 navigates to the matching /datasets/cc-traces-weka-062126/conversations/<conv_id>. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The timeline link now carries ?turn=<ti> (and &sa=<agentId> for subagent requests). The flamegraph resolves the target node — main turns by ordinal, subagent turns by matching the group's agentId then the ti-th child — expands the subagent group if needed, scrolls the row into view, and flashes a ring. subagentIdOf strips the harness stream suffix (:s<n> and :aux:<n>) so the cid's agent id matches the dataset SubagentNode.agentId. Verified end-to-end: clicking a subagent bar on /inference/agentic/422083 opens the conversation, expands the right group, and highlights the exact subturn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
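The link construction can be sketched as below. The exact cid layout is not spelled out in this PR, so the example assumes a shape like `<convId>::sa:<agentId>:s<n>`; the function bodies are illustrative guesses, and only the route and query parameter names come from the description above.

```typescript
// Assumed cid layout: "<convId>", "<convId>::sa:<agentId>:s<n>", or
// "<convId>::sa:<agentId>:aux:<n>" (and similarly for "::fa:").
function convIdOf(cid: string): string {
  const i = cid.search(/::(sa|fa):/);
  return i === -1 ? cid : cid.slice(0, i);
}

// Returns the subagent id with the stream suffix removed, or null for main-agent requests.
function subagentIdOf(cid: string): string | null {
  const m = /::sa:(.+)$/.exec(cid);
  if (!m) return null;
  return m[1].replace(/:(s\d+|aux:\d+)$/, "");
}

// Build the deep link into the dataset viewer for one timeline bar.
function datasetLink(slug: string, cid: string, turnIndex: number): string {
  const base = `/datasets/${encodeURIComponent(slug)}/conversations/${encodeURIComponent(convIdOf(cid))}`;
  const sa = subagentIdOf(cid);
  const query = `?turn=${turnIndex}` + (sa === null ? "" : `&sa=${encodeURIComponent(sa)}`);
  return base + query;
}
```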

…ooltip

- Deep-link highlight is now state-driven (bg-primary/20 + ring, fades over 700ms) instead of fragile classList mutation, so it's clearly visible and survives re-renders. Subagent groups still auto-expand and scroll into view.
- Portal the hover tooltip to document.body so its position:fixed is viewport-relative (an ancestor transform was offsetting it away from the cursor). Now it sits at pointer+12px.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The conversation page read ?turn/&sa from window.location.search in a useState initializer, which captures stale/empty params during a client-side navigation — so scroll+highlight+expand only worked after a manual reload. Switch to the reactive useSearchParams (page wrapped in Suspense) so the params are present on the first nav. Also make the flamegraph expand the target subagent group via an effect (reacting to target changes), and defer the scroll one frame so the just-expanded child row exists. Verified via a real timeline click — no reload. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In HC mode the iwanthue palette is sized and indexed by the key set it's generated over. ScatterGraph generated it from the *active* (selected) hw set, so deselecting a line shrank the set, re-sized the palette, and shifted every remaining line's hue — most visible on single-vendor agentic runs (which span the full hue wheel since 2c06009), where deselecting B300 could recolor B200 from red to blue. Pass the stable full set of hw-types-with-data as hcKeys so the palette and per-key index are fixed; toggling now only hides/shows lines without recoloring the rest. Adds a useThemeColors regression test asserting a line's HC color is identical across active subsets when hcKeys is the full set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
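The recoloring bug and its fix can be reproduced with any palette that is sized by its key set. In the sketch below, `makePalette` is a stand-in that spaces hues evenly (the real code uses iwanthue), and `colorsFor` is a hypothetical helper, not the `useThemeColors` API.

```typescript
// Stand-in palette generator: n evenly spaced hues. The real code uses iwanthue.
function makePalette(n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < n; i++) out.push(`hsl(${Math.round((360 * i) / n)}, 70%, 50%)`);
  return out;
}

// Colors for the active keys, with the palette sized and indexed by paletteKeys.
// Passing the stable full key set as paletteKeys keeps each key's color fixed.
function colorsFor(activeKeys: string[], paletteKeys: string[]): Map<string, string> {
  const sorted = [...paletteKeys].sort();
  const palette = makePalette(sorted.length);
  const out = new Map<string, string>();
  for (const key of activeKeys) {
    const i = sorted.indexOf(key);
    if (i >= 0) out.set(key, palette[i]);
  }
  return out;
}
```

Calling `colorsFor(active, active)` reproduces the old behavior, where deselecting a key resizes the palette and shifts the remaining hues; `colorsFor(active, allKeysWithData)` keeps them stable.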

Replace the per-row P# badges with a colored left-gutter bracket that groups requests in the same main-agent or subagent scope whose original execution intervals overlapped (ran in parallel). Non-transitive overlap chains get their own side-by-side lanes; the gutter only renders when an overlap group exists, so non-parallel traces have no extra whitespace. Legend swatch and conversation-view copy updated to describe the bracket; e2e assertions check data-overlap-group on bracket segments. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…races A pathological conversation (1621 turns, a subagent fanning out into 622 children with 17-way concurrency) produced 49 bracket lanes — a 686px gutter that pushed the bars off-screen, plus one DOM node per lane per row (~110k empty divs, 157k total nodes on Expand all). Cap displayed lanes at MAX_LANES (6): overflow groups fold into the last "dense" lane, so every parallel row still carries a marker but the gutter width stays bounded. Render the gutter sparsely (only lanes a row touches, absolutely positioned) instead of a dense lane-per-row matrix. A subtle note surfaces when lanes are capped so the fold isn't silent. Outlier now: gutter 686px -> 84px, DOM on Expand all 157k -> 35k nodes. Normal multi-lane traces are unchanged (<=6 lanes hit the identity path). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
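The lane cap can be illustrated with a standard greedy interval layout followed by a fold into the last lane. This is a simplified sketch: the PR's overlap grouping is more involved, and everything here except the `MAX_LANES` value of 6 is an assumption.

```typescript
interface Interval { startS: number; endS: number }

const MAX_LANES = 6;

// Greedy layout: process intervals by start time and give each one the first
// lane that is already free. Lanes past the cap fold into the last displayed
// lane, so the gutter width stays bounded while every row keeps a marker.
function assignLanes(intervals: Interval[], maxLanes: number = MAX_LANES): number[] {
  const order = intervals.map((_, i) => i).sort((a, b) => intervals[a].startS - intervals[b].startS);
  const laneEnd: number[] = []; // end time of the last interval placed in each real lane
  const lanes: number[] = intervals.map(() => 0);
  for (const i of order) {
    let lane = laneEnd.findIndex((end) => end <= intervals[i].startS);
    if (lane === -1) lane = laneEnd.length; // open a new real lane
    laneEnd[lane] = intervals[i].endS;
    lanes[i] = Math.min(lane, maxLanes - 1); // displayed lane, capped
  }
  return lanes;
}
```

When a trace never needs more than `maxLanes` real lanes, the cap is a no-op, which corresponds to the identity path mentioned in the commit.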

adibarra · 2026-06-26T04:09:39Z

@claude fix the build, it is failing. pnpm typecheck might help in diagnosing. once you fix it merge master branch into this one and fix conflicts.

github-actions · 2026-06-26T04:10:53Z

Claude finished @adibarra's task in 7m 5s.

Task: Fix build + merge master ✅

Diagnose build failure with pnpm typecheck
Fix type/build errors
Merge origin/master into feat/agentx and resolve conflicts
Re-run typecheck/lint/fmt to confirm green
Commit and push

Build fix

tsc failed in trace-flamegraph.tsx:

Property 'endS' does not exist on type 'TurnNode'.

The TurnNode interface in packages/db/src/etl/weka-structure.ts had startS but no endS, even though the flamegraph reads node.endS/child.endS (timestamps + parallel-bracket overlap detection) and the ETL's own subagentTimeRange already treats a turn as spanning start → start + api_time. Added endS?: number to TurnNode and populate it (startS + api_time) for both top-level turns and subagent child turns, factoring the existing logic into a shared turnEndS helper.
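A minimal sketch of the shape of this fix, assuming `api_time` is a duration in seconds. The `RawTurn` record and the helper's signature are hypothetical; the comment above names the helper `turnEndS` but does not show its parameters.

```typescript
// Trimmed-down node type: only the timing fields discussed above.
interface TurnNode {
  startS?: number;
  endS?: number; // newly added optional field
}

// Hypothetical raw turn record as read by the ETL.
interface RawTurn { start_s?: number; api_time?: number }

// A turn spans start -> start + api_time; undefined when either input is missing.
function turnEndS(startS: number | undefined, apiTime: number | undefined): number | undefined {
  if (startS === undefined || apiTime === undefined) return undefined;
  return startS + apiTime;
}

function toTurnNode(raw: RawTurn): TurnNode {
  return { startS: raw.start_s, endS: turnEndS(raw.start_s, raw.api_time) };
}

// Overlap check of the kind the parallel-bracket detection relies on.
function ranInParallel(a: TurnNode, b: TurnNode): boolean {
  if (a.startS === undefined || a.endS === undefined) return false;
  if (b.startS === undefined || b.endS === undefined) return false;
  return a.startS < b.endS && b.startS < a.endS;
}
```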

Merge

The runner had a shallow clone, so origin/master initially showed "unrelated histories" — unshallowed (git fetch --unshallow) to recover the common ancestor, then merged. Three files conflicted, all on the same theme (this branch added offload_mode as a chart-line dimension; master added br.id to the benchmark query). Resolved by combining both sides:

packages/db/src/queries/benchmarks.ts — kept offload_mode in the line-selection key (DISTINCT ON, ORDER BY, winners join) and master's br.id SELECT.
packages/db/src/json-provider.ts — lineKey keeps offload_mode; merged comment wording.
packages/db/src/json-provider.line-single-run.test.ts — kept the per-offload-mode test coverage.

Validation

pnpm typecheck ✅
pnpm lint ✅
pnpm fmt ✅
db tests for the touched files (weka-structure, json-provider.line-single-run, queries/*) ✅

Note: 3 tests in packages/db/src/connection.test.ts fail in this CI runner only because it exports DATABASE_DRIVER=neon and DATABASE_SSL=true, which override the defaults those tests assert. They are environment-induced (fail identically on master) and unrelated to this change.
Branch: feat/agentx

Co-authored-by: Alec Ibarra <adibarra@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# Conflicts: # packages/db/src/json-provider.line-single-run.test.ts # packages/db/src/json-provider.ts # packages/db/src/queries/benchmarks.ts

…-end

Agentic artifacts ship *_intvty under two harness definitions: slow-tail 1/p(ITL) (what the charts assume) vs fast-tail p(1/ITL), which inverts percentile order (p90 lands at ~1/p10(ITL)). Ingest stored the artifact value verbatim and the frontend only filled intvty when missing, so newer "timing fix" runs landed with the wrong definition (e.g. p90 reading 23.9 instead of 11.2), silently contaminating cross-run Pareto comparisons.

Enforce the invariant in every path:
- ingest mapper: derive agentic mean/median/p75/p90/p95/p99 *_intvty from *_itl, discarding the artifact value (self-correcting ingest).
- frontend agenticAliases: always derive intvty = 1/itl (override, not fill-if-missing) so overlay / ?unofficialrun= rows match.
- backfill-agentic-intvty script: one-time fix for stored rows (already run against the DB: 164 rows / 656 values rewritten, 0 contaminated after).
- ingest agent doc: note the invariant + the backfill escape hatch.

std_intvty is intentionally left alone (reciprocal of a std is meaningless; the API strips it). Unit tests added on both the mapper and the transform.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
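The difference between the two definitions, and the override-style derivation, can be sketched as follows. The `<stat>_itl` / `<stat>_intvty` field naming, the nearest-rank percentile, and ITL in seconds are assumptions made for the example; the mapper's real row type is not shown in this PR.

```typescript
// Nearest-rank percentile of an unsorted array, p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
}

// Slow-tail definition (what the charts assume): reciprocal of the ITL percentile.
function slowTailIntvty(itl: number[], p: number): number {
  return 1 / percentile(itl, p);
}

// Fast-tail definition: percentile of the reciprocals, which inverts the order.
function fastTailIntvty(itl: number[], p: number): number {
  return percentile(itl.map((x) => 1 / x), p);
}

const STATS = ["mean", "median", "p75", "p90", "p95", "p99"] as const;

// Always derive <stat>_intvty from <stat>_itl, ignoring any shipped intvty value.
// std is deliberately not in STATS, so std_intvty is never produced.
function deriveIntvty(row: Record<string, number | null | undefined>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const stat of STATS) {
    const itl = row[`${stat}_itl`];
    if (typeof itl === "number" && itl > 0) out[`${stat}_intvty`] = 1 / itl;
  }
  return out;
}
```

For ITL samples `[0.02, 0.04, 0.05, 0.08, 0.1]`, the slow-tail p90 should come out near 10 while the fast-tail p90 comes out near 50, which is the reciprocal of the smallest ITL rather than the largest.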

cquil11 and others added 12 commits April 23, 2026 13:40

fix: render offload halo on every offload-on point, not just frontier

07ba106


Merge remote-tracking branch 'origin/master' into feat/agentx

52d35ba


Merge remote-tracking branch 'origin/master' into feat/agentx

cb4e87c


vercel Bot deployed to Preview May 14, 2026 15:36 View deployment

vercel Bot deployed to Preview May 15, 2026 17:26 View deployment

fix(agentic): default percentile to p99 and drop median option

50a06d1

vercel Bot deployed to Preview May 15, 2026 17:28 View deployment

cquil11 added 2 commits May 15, 2026 12:30

Merge remote-tracking branch 'origin/master' into feat/agentx

25305dc


fix(agentic): keep only p90 as the percentile option

3c96e91

vercel Bot deployed to Preview May 15, 2026 17:31 View deployment

vercel Bot deployed to Preview May 15, 2026 17:32 View deployment

fix(agentic): default percentile to p90, surface only p90/p99

642081a

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 15, 2026 17:33 View deployment

vercel Bot deployed to Preview May 15, 2026 17:39 View deployment

vercel Bot deployed to Preview May 15, 2026 17:42 View deployment

fix(agentic): default e2e chart x-axis to p90 TTFT

49f2b27

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cquil11 and others added 8 commits June 22, 2026 16:16

docs(ingest): note the separate agentic-dataset ingest script

0c50139

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview June 23, 2026 03:35 View deployment

adibarra added 6 commits June 23, 2026 01:00

merge origin/master into feat/agentx; resolve quick-filter/category-s…

605bff7

…elector/lockfile conflicts

chore(security): bump dompurify override to >=3.4.11 (GHSA-cmwh-pvxp-…

a912eab

…8882)

test(e2e): align selector testid with scenario-selector rename; rewri…

ba6bc1c

…te x-axis toggle test for single-chart mode buttons

test(datasets): component tests for distribution card, trace flamegra…

ada19b5

…ph (incl deep-link), and dataset list states

refactor(datasets): extract shared compact() formatter, dedupe 5 loca…

1c61ee3

…l copies

refactor(db): squash agentic migrations into 007_agentic.sql so numbe…

e2e5424

…ring doesn't collide with master

vercel Bot deployed to Preview June 23, 2026 15:28 View deployment

add agentic time-series and dataset timing

772dfef

vercel Bot deployed to Preview June 23, 2026 16:01 View deployment

add dataset percentile distributions

13471d7

vercel Bot deployed to Preview June 23, 2026 18:46 View deployment

cquil11 and others added 5 commits June 23, 2026 16:10

use cumulative percentiles for agentic charts

8bfe664

fix(db): build each chart line from a single run, no cross-run/date s…

e3e0bf4

…titching (#491)

Default agentic charts to interactivity

2c3bb6d

github-actions Bot and others added 3 commits June 26, 2026 04:13

fix(db): add endS to TurnNode so flamegraph timing typechecks

95d7f01


Merge remote-tracking branch 'origin/master' into feat/agentx

5a40444


cquil11 wants to merge 108 commits into master from feat/agentx

3 participants
