feat(dashboard): cost-accuracy panel — measured spend vs list price (D, monitoring) by jmlago · Pull Request #61 · genlayerlabs/unhardcoded

jmlago · 2026-06-30T19:44:10Z

What

PR D, reframed as monitoring (not the original auto-correction). A read-only "Cost accuracy" card in the dashboard Overview: per provider, the measured effective $/Mtok (from the calls ledger) vs the advertised list price, with a drift flag.

Why monitoring, not auto-correction

Auto-correcting the ranking from measured cost would: (a) form a routing feedback loop (cheaper-measured → more traffic → changes the measured cost), (b) be circular for compute-from-price providers (their cost_usd is derived from list price), and (c) change routing opaquely from a derived number. So D observes and flags; the operator decides (adjust C's manual multiplier, investigate) — the same human-in-the-loop stance as C.

B = list price · C = manual ranking lever · D = does reality match? (a warning, not an auto-tune)

How

host_store.cost_by_route(window_s): per (provider, family) ledger aggregate (calls, tokens in/out/cached, cost_usd), derived by query, following #41 (feat(host-store): the route_* family derived on the fly (#4a+#4b+#4c)): store raw, derive at read; no in-process fold.
_cost_accuracy_rows (pure, unit-tested) — joins that with the live ranked price from /x/runtime, dividing the fictitious price_multiplier back out so the comparison is measured-spend vs advertised-list. Computes expected cost with the same cache-read discount as billing, rolls up per provider, flags drift > 15% with ≥ 20 calls, sorts worst-first.
GET /dashboard/api/cost-accuracy (admin) + a card in the Overview.
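
The join described above could look roughly like the sketch below. It is not the code in this PR: the row and price field names (`in_per_mtok`, `out_per_mtok`, `tokens_cached`), the assumption that `tokens_in` includes cached tokens, and the local `CACHE_READ_FACTOR` constant are all hypothetical, chosen only to make the steps concrete (divide the multiplier out, apply the cache-read discount, roll up per provider, flag, sort worst-first).

```python
from collections import defaultdict

CACHE_READ_FACTOR = 0.1  # assumed here; the PR imports the real factor from shim
DRIFT_THRESHOLD = 0.15   # flag when measured/expected is off by more than 15%
MIN_CALLS = 20           # ...and only with at least this many calls


def cost_accuracy_rows(ledger_rows, ranked_prices):
    """Join ledger aggregates with ranked prices and roll up per provider.

    ledger_rows: dicts with provider, family, calls, tokens_in, tokens_out,
                 tokens_cached, cost_usd (hypothetical shape).
    ranked_prices: {(provider, family): {"in_per_mtok", "out_per_mtok",
                                         "price_multiplier"}} (hypothetical).
    """
    totals = defaultdict(lambda: {"calls": 0, "measured": 0.0, "expected": 0.0})
    for row in ledger_rows:
        price = ranked_prices.get((row["provider"], row["family"]))
        if price is None:
            continue  # unpriced route: nothing to compare against
        # Divide the manual multiplier back out to recover the list price.
        mult = price.get("price_multiplier") or 1.0
        list_in = price["in_per_mtok"] / mult
        list_out = price["out_per_mtok"] / mult
        cached = row["tokens_cached"]
        fresh_in = row["tokens_in"] - cached  # assumes tokens_in includes cached
        expected = (
            fresh_in * list_in
            + cached * list_in * CACHE_READ_FACTOR
            + row["tokens_out"] * list_out
        ) / 1_000_000
        agg = totals[row["provider"]]
        agg["calls"] += row["calls"]
        agg["measured"] += row["cost_usd"]
        agg["expected"] += expected

    rows = []
    for provider, agg in totals.items():
        if agg["expected"] <= 0:
            continue  # avoid dividing by zero on empty or free routes
        deviation = agg["measured"] / agg["expected"]
        rows.append({
            "provider": provider,
            "calls": agg["calls"],
            "deviation": deviation,
            "warn": abs(deviation - 1.0) > DRIFT_THRESHOLD
                    and agg["calls"] >= MIN_CALLS,
        })
    # Worst drift first, regardless of direction.
    rows.sort(key=lambda r: abs(r["deviation"] - 1.0), reverse=True)
    return rows
```

Keeping the function pure (plain dicts in, plain rows out) is what makes the cases listed under Verification cheap to unit-test without a database or a running router.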

Honest about signal strength

Reported-cost providers (openrouter) → cost_usd is authoritative → drift reveals real discounts/surprises. High value.
Compute-from-price providers (direct openai/anthropic/google) → cost_usd derives from list → reads ~1.0 by construction. That's itself informative ("no independent signal here") and a sanity check that billing matches the scrape.

Never touches routing

Read-only. No ranking/admission impact.

Verification

_cost_accuracy_rows: +25% drift flags (≥20 calls), at-list reads no drift, the multiplier is divided out before comparison, unpriced routes are skipped, and a big drift with <20 calls does not warn.
JS render verified directly (balanced markup, drift badge, empty state).
Suite 434 passed / 0 failed.

Closes the B/C/D pricing arc: list price → manual lever → measured reality-check.

coderabbitai · 2026-06-30T19:44:18Z

Warning

Review limit reached

@jmlago, you've reached your PR review limit, so we couldn't start this review.


Review details

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea67b5ab-a5b2-4d62-874f-256250786aad

📥 Commits

Reviewing files that changed from the base of the PR and between d4760e9 and 7e1cda8.

📒 Files selected for processing (5)

auth_proxy.py
host_store.py
shim.py
tests/test_auth_proxy_dashboard_full.py
tests/test_shim.py


…D, monitoring)

PR D, reframed from "auto-correct the ranking" to OBSERVABILITY, per the decision that an automatic measured-cost correction would (a) form a routing feedback loop, (b) be circular for compute-from-price providers, and (c) hide an opaque routing change. Instead D surfaces the deviation and lets the operator act, the same human-in-the-loop stance as C's manual lever.

A read-only Overview card showing, per provider, the measured effective $/Mtok (from the calls ledger) vs the advertised list price, with a drift flag.

- host_store.cost_by_route: per (provider, family) ledger aggregate over a window, derived by query (#41: store raw, derive at read; no in-process fold).
- _cost_accuracy_rows (pure, unit-tested): joins that with the live ranked price from /x/runtime, dividing the fictitious price_multiplier back out so the comparison is measured-spend vs advertised-list. Rolls up per provider, flags drift > 15% with >= 20 calls, sorts worst-first.
- GET /dashboard/api/cost-accuracy (admin) + a card in the Overview.

Never touches routing. The signal is strongest for providers that report their own cost (openrouter, which reveals real discounts/surprises); a compute-from-price provider reads ~1.0 by construction, which is itself a useful "no independent signal here".

Suite 434/0; the join/deviation logic + the JS render verified directly.

…y (review)

Review (Axis 8): the panel rendered tautological rows identically to real-signal ones. For a compute-from-price provider (the direct openai/anthropic/google), measured and expected both derive from the same list price → deviation ~1.0 by construction, and any drift is reprice noise (ledger cost_usd sealed at the price-of-then vs the current ema), not an effective discount. Flagging "drift" on those is non-actionable and trains the operator to ignore the badge, the opposite of what a monitoring panel wants. Record the cost basis as a raw fact and use it:

- shim._cost_basis (single source of the cost tiering _executed_cost_usd already used): 'subscription' | 'reported' (provider's own usage.cost, an INDEPENDENT signal) | 'computed' (derived from list price, tautological) | None. Stamped on x_router and threaded to the ledger; new calls.cost_basis column (ALTER ADD COLUMN IF NOT EXISTS).
- cost_by_route aggregates n_reported; _cost_accuracy_rows labels each row reported|derived and only warns where the signal is real (reported). The panel shows the signal tag; derived drift renders muted, never badged.

Review (Axis 1): import _CACHE_READ_FACTOR from shim instead of redefining 0.1: the panel's expected cost must track the billing factor; a copy could silently diverge into false drift.

Suite 439/0 (incl. _cost_basis tiers + a derived provider with big drift that does NOT warn).

jmlago · 2026-06-30T20:05:55Z

Both correct — addressed in the latest commit (rebased on C/#60).

Axis 8 (signal vs tautology). You're right: the panel mixed rows where measured cost is an independent signal (reported) with rows where it's tautological (computed from the same list price → ~1.0 by construction; any drift is reprice noise from sealed-then vs ema-now, not a discount), and could badge a non-actionable "drift" on a direct provider. Fixed by recording the cost basis as a raw fact and using it:

shim._cost_basis — now the single source of the cost tiering _executed_cost_usd already used: subscription | reported | computed | None. Stamped on x_router, threaded to the ledger via a new calls.cost_basis column (ALTER ADD COLUMN IF NOT EXISTS).
cost_by_route aggregates n_reported; _cost_accuracy_rows labels each row reported|derived and only warns where the signal is real (reported). The card shows the signal tag; a derived row's drift renders muted and is never badged.
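
As a rough illustration of the tiering and the gating, here is a minimal sketch. The function signatures, the order of the tiers, and the rule that a row counts as reported only when every call in the window was reported (`n_reported == calls`) are assumptions made for the example; the PR only states the four basis values and that warnings are limited to reported rows.

```python
def cost_basis(is_subscription, usage, has_list_price):
    """Classify where a call's cost_usd came from (hypothetical signature)."""
    if is_subscription:
        return "subscription"
    if usage.get("cost") is not None:
        return "reported"   # provider's own usage.cost: independent signal
    if has_list_price:
        return "computed"   # derived from list price: tautological
    return None


def signal_label(calls, n_reported):
    """Label an aggregated row; assumes 'reported' needs all calls reported."""
    return "reported" if calls > 0 and n_reported == calls else "derived"


def should_warn(deviation, calls, n_reported, threshold=0.15, min_calls=20):
    """Badge drift only where measured cost is an independent signal."""
    return (
        signal_label(calls, n_reported) == "reported"
        and calls >= min_calls
        and abs(deviation - 1.0) > threshold
    )
```

With this shape, a derived row can still display its deviation, but the decision to badge it lives in one place and depends on the recorded basis, not on the provider name.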

Axis 1 (_CACHE_READ_FACTOR). Imported from shim now instead of redefining 0.1 — agreed it's not just DRY, the expected-cost calc must track the billing factor or it silently shows false drift.
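
A small worked example of the failure mode, using made-up numbers: the token counts, the $2/Mtok list price, and the diverged factor of 0.25 are illustrative assumptions; only the 0.1 value comes from this thread.

```python
def cost_usd(tokens_in, tokens_cached, list_per_mtok, cache_read_factor):
    """Input-side cost with cached tokens billed at a discounted rate."""
    fresh = tokens_in - tokens_cached  # assumes tokens_in includes cached
    return (fresh + tokens_cached * cache_read_factor) * list_per_mtok / 1_000_000


# Billing uses the shared factor (0.1): about $0.56 for this call mix.
billed = cost_usd(1_000_000, 800_000, 2.0, cache_read_factor=0.1)

# A panel with its own stale copy (0.25, hypothetical) expects about $0.80.
stale_expected = cost_usd(1_000_000, 800_000, 2.0, cache_read_factor=0.25)

# Deviation is about 0.70: a 30% "discount" that exists only in the panel.
false_deviation = billed / stale_expected

# With the same factor on both sides the tautology holds and deviation is 1.0.
true_deviation = billed / cost_usd(1_000_000, 800_000, 2.0, cache_read_factor=0.1)
```

The heavier the cache-read share of traffic, the larger the phantom drift, which is why a single imported constant is safer than a copy.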

New tests: _cost_basis tiers; a derived provider with a big apparent drift that does not warn; the reported provider still flags +25%. Suite 439/0.

Thanks — this turns the panel from "a number per provider" into "a number that means something only where it can."

This was referenced Jun 30, 2026

[codex] add effective provider pricing sources #32

Closed

feat(dynamic-pricing) F1: measured per-route economics (route_economics) #35

Closed

jmlago added 2 commits June 30, 2026 20:56

jmlago force-pushed the cost-accuracy-overview branch from 902e75b to 7e1cda8 on June 30, 2026 20:05

jmlago merged commit f92ba70 into main Jun 30, 2026
1 check passed
