
feat(pricing): price the direct providers from their official pricing pages#59

Merged
jmlago merged 2 commits into
mainfrom
direct-provider-pricing
Jun 30, 2026
Conversation

@jmlago

@jmlago jmlago commented Jun 30, 2026

Member

What & why

PR B of the pricing roadmap. openai, anthropic, google had no price source, so every (provider, family) defaulted to price = +inf (core/llm_policy/fields.lua:33). The direct candidate therefore never passed a cost ceiling and ranked last on cost — even though the first-party API is usually cheaper than the same model routed via openrouter. This gives each direct provider a real price.

Where the prices come from — official pages, no API

None of the three exposes a pricing API; all publish prices on documentation pages. The source scrapes the official page each one serves and parses it. The formats differ, so each parser is tailored and unit-tested:

| Provider | Source | Format |
|---|---|---|
| Anthropic | platform.claude.com/docs/en/about-claude/pricing.md | Markdown pipe table (input / cache read / output) |
| OpenAI | developers.openai.com/api/docs/pricing.md | MDX; pricing is a `<TextTokenPricingTables>` JSX component: `["gpt-5.5 (<272K…)", 5, 0.5, 30]` = `[name, input, cached, output]` |
| Google | ai.google.dev/gemini-api/docs/pricing | devsite HTML; each model is `<h2 id="gemini-…">` (the id is the catalog family verbatim) with `<td>Input price</td><td>$…</td>` rows; cache-read + per-hour storage share one `<br>`-split cell |
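A minimal sketch of how array rows in the `[name, input, cached, output]` shape could be lifted out of the MDX text. It assumes names are double-quoted and all three prices are plain numeric literals; `parse_mdx_rows` and the record keys are hypothetical names for illustration, not the PR's parser.

```python
import re

# Matches one array row such as ["gpt-5.5 (<272K)", 5, 0.5, 30].
# Assumption: the name is double-quoted and the three prices are bare numbers.
_ROW = re.compile(
    r'\[\s*"([^"]+)"\s*,\s*([0-9.]+)\s*,\s*([0-9.]+)\s*,\s*([0-9.]+)\s*\]'
)


def parse_mdx_rows(text: str) -> list:
    """Return one record per [name, input, cached, output] row found in text."""
    records = []
    for name, price_in, cached, price_out in _ROW.findall(text):
        records.append({
            # Drop a trailing parenthesised qualifier such as " (<272K)".
            "label": re.sub(r"\s*\(.*\)\s*$", "", name),
            "price_in": float(price_in),
            "price_cached_in": float(cached),
            "price_out": float(price_out),
        })
    return records
```

A row whose cached column is not a number (for example a dash) would not match this pattern and would simply be skipped, which is in line with dropping unmatched rows.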

A page's model label maps to a catalog family by a dot/dash/space-insensitive slug ("Claude Opus 4.8" → claude-opus-4-8, "Gemini 3.1 Pro Preview" → gemini-3.1-pro-preview), so there's no hand-maintained alias map to drift. Unmatched rows are dropped (push_prices also filters to catalog-served pairs).
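The slug match can be sketched as follows. This is an illustration under the stated rule (ignore case, dots, dashes and spaces); `slug` and `resolve` are hypothetical names and may not match the helpers in `sources/official_pricing.py`.

```python
import re
from typing import List, Optional


def slug(label: str) -> str:
    """Lowercase the label and drop dots, dashes and whitespace."""
    return re.sub(r"[\s.\-]+", "", label.lower())


def resolve(label: str, catalog_families: List[str]) -> Optional[str]:
    """Map a page label to the catalog family with the same slug, if any."""
    by_slug = {slug(family): family for family in catalog_families}
    return by_slug.get(slug(label))  # None means the row is unmatched and dropped
```

One trade-off of an insensitive slug is that two families differing only in punctuation would collide; the sketch lets the later catalog entry win in that case.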

Durable + fail-safe, off the request path

official page ──(scrape, ~hourly)──► parser ──► host_store.provider_prices  (durable, Postgres)
                                                      │  coast on failure / restyle / cold start
                                                      ▼
                                               push in/out ──► core ranking
  • New host_store table provider_prices(provider_id, model_family, price_in, price_out, price_cached_in, updated_at) — the source of truth (idempotent DDL, added to truncate_all_for_tests).
  • The source upserts the table and returns in/out for the core. If a scrape fails or a page is restyled past the parser, pricing() coasts on the table (which also warm-starts a fresh process). Coverage degrades; a price never snaps back to +inf.
  • Cache-read price is captured into the table for the effective-cost work in PRs C/D. The core ranks on in/out only (no cached field there), so this is stored, not yet ranked.
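The coast-on-failure behaviour described in the bullets above can be sketched with an in-memory dict standing in for the Postgres table. `_STORE`, `refresh` and the `fetch_and_parse` callable are hypothetical; the sketch always serves from the store after upserting, which is a simplification and not necessarily how `pricing()` is written.

```python
# In-memory stand-in for the durable provider_prices table (hypothetical).
_STORE = {}


def refresh(provider_id, fetch_and_parse):
    """Return {family: row}, coasting on stored rows when the scrape fails."""
    try:
        rows = fetch_and_parse()      # may raise, or return {} after a restyle
    except Exception:
        rows = {}                     # a failed scrape contributes nothing new
    for family, row in rows.items():  # upsert whatever parsed cleanly
        _STORE[(provider_id, family)] = dict(row)
    # Fresh values where the scrape succeeded, last-known values otherwise.
    return {fam: row for (pid, fam), row in _STORE.items() if pid == provider_id}
```

A provider with no stored rows and a failing scrape returns an empty mapping, so coverage degrades rather than a stale or infinite price being invented.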

Cached prices (verified live)

Captured the cache-read column from all three live pages:

ANTHROPIC (15)  claude-opus-4-8   5.0 / 25.0  cache 0.5
OPENAI    (45)  gpt-5.5           5.0 / 30.0  cache 0.5    gpt-5.4  2.5 / 15.0  cache 0.25
GEMINI    (19)  gemini-3.1-pro…   2.0 / 12.0  cache 0.2    gemini-2.5-pro  1.25 / 10.0  cache 0.125

Notes / simplifications (no silent truncation)

  • Gemini Pro tiers (≤200k / >200k) → the base (≤200k) tier is stored; the measured correction in PR D handles the long-context delta.
  • OpenAI/Anthropic batch/flex panes are parsed but lose to standard (first-wins).
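A first-wins merge is short enough to show in full. This sketch assumes records arrive in page order with the standard pane first; `first_wins` and the `family` key are hypothetical names.

```python
def first_wins(records):
    """Keep the first record seen per family; later panes are ignored."""
    chosen = {}
    for rec in records:                       # records arrive in page order
        chosen.setdefault(rec["family"], rec)  # no overwrite once a family is set
    return chosen
```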

Design fit

Builds directly on the modular registry (#57): each direct provider just declares a source. No request-path code.

Verification

  • Suite 428 passed / 0 failed (7 new parser + source tests, DB-free via fixtures mirroring the real layouts + monkeypatched store).
  • Parsers run against the live official pages: anthropic 15 / openai 45 / gemini 19 families, each with input + output + cache-read.

Summary by CodeRabbit

  • New Features
    • Added official provider pricing support for direct vendors, including current input, output, and cached-input rates.
    • Pricing data now updates automatically from providers’ published pages and falls back to the last saved values if a refresh fails.
  • Bug Fixes
    • Improved matching of pricing rows to the correct model families, helping avoid missing or mismatched rates.
    • Added support for storing and retrieving provider pricing consistently across app restarts.

… pages

PR B of the pricing roadmap. openai/anthropic/google had NO price source, so
every (provider, family) sat at the field default price = +inf
(core/llm_policy/fields.lua): the direct candidate never passed a cost ceiling
and ranked last on cost, even though the first-party API is usually cheaper than
the same model via openrouter. This gives each direct provider a price source.

No provider exposes a pricing API — all three publish prices on documentation
pages — so the source scrapes the OFFICIAL page each provider serves and parses
it. The formats differ, so each has a tailored, tested parser:
  - Anthropic — the Markdown twin (pricing.md): a real pipe table. (in / cache
    read / out)
  - OpenAI — the .md is MDX; pricing lives in a <TextTokenPricingTables> JSX
    component as [name, input, cached, output] array rows.
  - Google — the devsite HTML, where each model is an <h2 id="gemini-…"> (the id
    is the catalog family verbatim) with <td>Input price</td><td>$…</td> rows; the
    cache-read and per-hour storage prices share one <br>-split cell.
A model label maps to a catalog family by a dot/dash/space-insensitive slug, so
no hand-maintained alias map drifts ("Claude Opus 4.8" ↔ claude-opus-4-8).
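A minimal sketch of a Markdown pipe-table parser for the Anthropic case. The column headers used here ("model", "input", "cache read", "output") are assumptions for illustration; the live page's headers may differ, and `_money` / `parse_pipe_table` are hypothetical names.

```python
import re


def _money(cell):
    """Extract the first dollar amount from a cell such as '$5 / MTok', else None."""
    m = re.search(r"\$\s*([0-9]+(?:\.[0-9]+)?)", cell)
    return float(m.group(1)) if m else None


def parse_pipe_table(md):
    """Return one record per table row that carries both an input and output price."""
    records, header = [], None
    for line in md.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            header = None                  # a table ends at the first non-table line
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = [c.lower() for c in cells]
            continue
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue                       # the |---|---| separator row
        row = dict(zip(header, cells))
        price_in = _money(row.get("input", ""))
        price_out = _money(row.get("output", ""))
        if price_in is None or price_out is None:
            continue                       # not a price row
        records.append({"label": row.get("model", ""),
                        "price_in": price_in,
                        "price_out": price_out,
                        "price_cached_in": _money(row.get("cache read", ""))})
    return records
```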

Durable + fail-safe, strictly off the request path:
  - new host_store table `provider_prices(provider_id, model_family, price_in,
    price_out, price_cached_in, updated_at)` — the source of truth, in Postgres.
  - the source upserts the table and returns in/out to the core; on a failed
    scrape or a page restyled past the parser it COASTS on the table (which also
    warm-starts a fresh process), so coverage degrades but a price never snaps to
    +inf. Polls ~hourly.
  - cache-READ price is captured into the table (verified against all three live
    pages) for the effective-cost work in PRs C/D; the core ranks on in/out only.
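A sketch of the idempotent DDL and upsert for the table listed above. It uses an in-memory `sqlite3` database as a runnable stand-in for Postgres; the column types, the `(provider_id, model_family)` primary key and the `upsert_price` helper are assumptions, not taken from `host_store.py`.

```python
import sqlite3
import time

DDL = """
CREATE TABLE IF NOT EXISTS provider_prices (
    provider_id     TEXT NOT NULL,
    model_family    TEXT NOT NULL,
    price_in        REAL NOT NULL,
    price_out       REAL NOT NULL,
    price_cached_in REAL,
    updated_at      REAL NOT NULL,
    PRIMARY KEY (provider_id, model_family)
)
"""

UPSERT = """
INSERT INTO provider_prices
    (provider_id, model_family, price_in, price_out, price_cached_in, updated_at)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (provider_id, model_family) DO UPDATE SET
    price_in = excluded.price_in,
    price_out = excluded.price_out,
    price_cached_in = excluded.price_cached_in,
    updated_at = excluded.updated_at
"""


def upsert_price(conn, provider_id, family, price_in, price_out, cached=None):
    """Insert the row, or overwrite the existing row for the same key."""
    conn.execute(UPSERT, (provider_id, family, price_in, price_out, cached, time.time()))


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(DDL)  # idempotent: running the DDL a second time is a no-op
upsert_price(conn, "openai", "gpt-5.5", 4.2, 28.0, 0.4)
upsert_price(conn, "openai", "gpt-5.5", 5.0, 30.0, 0.5)  # overwrites the same row
```

Postgres accepts the same `ON CONFLICT ... DO UPDATE` form, with the driver's own placeholder style in place of `?`.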

Builds on the modular registry (#57): each direct provider just declares a
`source`. Suite 428/0; parsers verified against the live pages (anthropic 15 /
openai 45 / gemini 19 families, in+out+cache).
@coderabbitai

coderabbitai Bot commented Jun 30, 2026



Review details
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 502d605f-6230-48b6-93fb-f23ec062f9bd

📥 Commits

Reviewing files that changed from the base of the PR and between f421227 and a0f7006.

📒 Files selected for processing (2)
  • sources/official_pricing.py
  • tests/test_official_pricing.py
📝 Walkthrough

Walkthrough

Adds OfficialPriceSource, a polling class that scrapes first-party provider pricing pages (Anthropic Markdown, OpenAI MDX JSX, Google HTML), upserts results into a new provider_prices Postgres table via host_store, and falls back to cached rows on failure. The source is wired into the provider registry for OpenAI, Anthropic, and Google.
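For the Google HTML case, a state-machine parser over the standard library's `HTMLParser` might look like the sketch below. It assumes the layout described in the PR (an `<h2 id=...>` per model, then `<tr>` rows whose first cell is a label) and takes the first dollar amount after the label cell; the class name, the `_FIELDS` labels (in particular "context caching price") and the first-amount rule are assumptions, not the PR's `_GeminiPriceParser`.

```python
import re
from html.parser import HTMLParser

_DOLLAR = re.compile(r"\$\s*([0-9]+(?:\.[0-9]+)?)")
# Row-label prefix -> record field. The caching label is an assumed page string.
_FIELDS = {"input price": "price_in",
           "output price": "price_out",
           "context caching price": "price_cached_in"}


class GeminiPriceSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.prices = {}       # family -> {field: price}
        self._family = None    # id of the most recent <h2>
        self._cells = None     # text of each <td> in the current <tr>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._family = dict(attrs).get("id")
        elif tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True
        elif tag == "br" and self._in_td:
            self._cells[-1] += "\n"   # keep <br>-split values separable

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells:
            self._finish_row(self._cells)
            self._cells = None

    def _finish_row(self, cells):
        label = cells[0].strip().lower()
        field = next((f for prefix, f in _FIELDS.items()
                      if label.startswith(prefix)), None)
        if field is None or self._family is None:
            return
        m = _DOLLAR.search(" ".join(cells[1:]))  # first dollar amount after the label
        if m:
            # setdefault keeps the first value seen for a field (first-wins).
            self.prices.setdefault(self._family, {}).setdefault(field, float(m.group(1)))
```

Taking the first amount in a `<br>`-split cell is what selects the base tier and the cache-read price over the storage price in this sketch.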

Changes

Official Provider Pricing

| Layer / File(s) | Summary |
|---|---|
| provider_prices schema and read/write APIs (`host_store.py`) | Adds provider_prices table to schema initialization; implements get_provider_prices and set_provider_prices with fail-soft semantics; extends truncate_all_for_tests to include the new table. |
| Per-format pricing page parsers (`sources/official_pricing.py`) | Implements _slug/_money helpers, a Markdown pipe-table parser, an OpenAI MDX JSX array parser, and a Gemini HTML state-machine parser (_GeminiPriceParser); registers them in _PARSERS. |
| OfficialPriceSource polling class (`sources/official_pricing.py`) | Defines OfficialPriceSource with catalog slug resolution, lazy HTTP fetch, durable upsert to host_store, coast-on-failure fallback to stored pricing, and a no-op balances(). |
| Provider registry wiring (`providers.py`) | Adds _official_price_source/_present helpers; extends PROVIDERS to wire OpenAI, Anthropic, and Google with OfficialPriceSource and catalog-driven enabled gating. |
| Tests (`tests/test_official_pricing.py`, `tests/test_sources.py`) | Adds fixture strings, pure parser unit tests, async integration tests for happy-path upsert/scrape-failure fallback/uncataloged-row filtering, and fixes the codex registry test provider key. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 Hoppity-hop to the pricing page,
I scrape the docs at each provider's stage.
Markdown or HTML, JSX or not,
I upsert the prices and cache what I've got.
If the network goes down, I coast on the store—
No zeroed-out prices, just what came before! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.91% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding pricing for direct providers from their official pages. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sources/official_pricing.py`:
- Around line 319-328: In the pricing fetch flow inside the resolver path that
builds `rows` and calls `host_store.set_provider_prices`, stop returning the
parsed subset immediately when only part of the scrape succeeds. Instead, merge
the fresh `rows` with any last-known entries already stored for the same
provider/model families so missing families continue to coast on stored prices,
and make sure the code in `official_pricing.py` handles the return value of
`set_provider_prices()` so a failed persistence attempt still falls back to the
stored data rather than silently dropping coverage.

In `@tests/test_official_pricing.py`:
- Around line 152-159: The fallback test is too permissive because the stubbed
host_store.get_provider_prices ignores the requested provider, so
OfficialPriceSource.pricing() could accidentally read all cached prices and
still pass. Tighten the test by making the fake assert that it is called with
provider_id="openai" (or equivalent pid) before returning the cached rows, so
the provider-scoped lookup is enforced in pricing() and _src/_Client remain
covered.
- Around line 162-167: The test in test_pricing_ignores_uncataloged_page_rows
currently only exercises a cataloged Google model, so it does not verify
filtering of unknown rows. Update the fixture setup around _src("google",
client=_Client(GEMINI_HTML)) so the parsed page includes at least one family not
present in CATALOG, or temporarily narrow the catalog for this case, and keep
the assertion focused on only the cataloged family surviving pricing().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9dd8fd64-8498-40ac-8a78-2f9b5460d7f4

📥 Commits

Reviewing files that changed from the base of the PR and between 7583285 and f421227.

📒 Files selected for processing (5)
  • host_store.py
  • providers.py
  • sources/official_pricing.py
  • tests/test_official_pricing.py
  • tests/test_sources.py

Comment on lines +319 to +328
text = await self._fetch_text(self._url)
records = _PARSERS[self._fmt](text)
rows = self._resolve(records)
if rows:
    host_store.set_provider_prices([
        {"provider_id": self.provider_id, "model_family": fam,
         "price_in": r["price_in"], "price_out": r["price_out"],
         "price_cached_in": r.get("price_cached_in")}
        for fam, r in rows.items()])
    return self._prices(rows.items())


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Coast on stored prices when a scrape is only partially successful.

Lines 319-328 return the parsed subset as soon as rows is non-empty. If a page drift leaves only some families parseable, the rest disappear from the live price set and fall back to missing/+inf even though host_store.provider_prices may still have last-known rows. This branch also ignores set_provider_prices() returning False, so the durable fallback guarantee can silently stop working.

Suggested fix
     async def pricing(self) -> list[Price]:
         try:
             text = await self._fetch_text(self._url)
             records = _PARSERS[self._fmt](text)
             rows = self._resolve(records)
             if rows:
-                host_store.set_provider_prices([
+                persisted = host_store.set_provider_prices([
                     {"provider_id": self.provider_id, "model_family": fam,
                      "price_in": r["price_in"], "price_out": r["price_out"],
                      "price_cached_in": r.get("price_cached_in")}
                     for fam, r in rows.items()])
-                return self._prices(rows.items())
+                if not persisted:
+                    _log.warning("%s failed to persist refreshed prices; coasting may be stale", self.name)
+                stored = {
+                    row["model_family"]: row
+                    for row in host_store.get_provider_prices(self.provider_id)
+                }
+                stored.update(rows)
+                return self._prices(stored.items())
             _log.warning("%s parsed no catalog-served prices; coasting", self.name)

Comment on lines +152 to +159
def test_pricing_coasts_on_host_store_when_scrape_fails(monkeypatch):
    monkeypatch.setattr(host_store, "get_provider_prices", lambda pid=None: [
        {"provider_id": "openai", "model_family": "gpt-5.5",
         "price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}])
    src = _src("openai", client=_Client(boom=True))  # network down
    prices = asyncio.run(src.pricing())              # must NOT raise
    assert [(p["model_family"], p["price_in_usd_per_mtok"]) for p in prices] \
        == [("gpt-5.5", 4.2)]


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Assert the provider-scoped fallback read.

This stub ignores pid, so the test still passes if OfficialPriceSource.pricing() accidentally reads the whole cached table instead of only provider_id="openai". Make the fake assert the requested provider to lock down that boundary.

Proposed test tightening
 def test_pricing_coasts_on_host_store_when_scrape_fails(monkeypatch):
-    monkeypatch.setattr(host_store, "get_provider_prices", lambda pid=None: [
-        {"provider_id": "openai", "model_family": "gpt-5.5",
-         "price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}])
+    def fake_get_provider_prices(pid=None):
+        assert pid == "openai"
+        return [{"provider_id": "openai", "model_family": "gpt-5.5",
+                 "price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}]
+
+    monkeypatch.setattr(host_store, "get_provider_prices", fake_get_provider_prices)

Comment on lines +162 to +167
def test_pricing_ignores_uncataloged_page_rows(monkeypatch):
    monkeypatch.setattr(host_store, "set_provider_prices", lambda rows: True)
    # google catalog serves only gemini-3.1-pro-preview; the page row resolves to it
    src = _src("google", client=_Client(GEMINI_HTML))
    prices = asyncio.run(src.pricing())
    assert {p["model_family"] for p in prices} == {"gemini-3.1-pro-preview"}


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

This test never creates an uncataloged row.

GEMINI_HTML only contains gemini-3.1-pro-preview, which is already present in CATALOG, so this cannot prove that unknown page rows are filtered out. Add a second parsed family that's absent from the catalog, or shrink the catalog for this case.

Proposed test tightening
 def test_pricing_ignores_uncataloged_page_rows(monkeypatch):
     monkeypatch.setattr(host_store, "set_provider_prices", lambda rows: True)
-    # google catalog serves only gemini-3.1-pro-preview; the page row resolves to it
-    src = _src("google", client=_Client(GEMINI_HTML))
+    html = GEMINI_HTML + """
+<h2 id="gemini-unknown" data-text="Gemini Unknown">Gemini Unknown</h2>
+<h3 id="standard">Standard</h3>
+<table>
+<tr><td>Input price</td><td>Not available</td><td>$0.10</td></tr>
+<tr><td>Output price (including thinking tokens)</td><td>Not available</td><td>$0.20</td></tr>
+</table>
+"""
+    src = _src("google", client=_Client(html))
     prices = asyncio.run(src.pricing())
     assert {p["model_family"] for p in prices} == {"gemini-3.1-pro-preview"}

…, never commit (Axis 7)

A per-Mtok price is invariant-bearing: it governs both routing and billing, and
because pricing() COASTS on the table, a bad parse that lands there becomes a
sticky wrong price the whole fleet routes/bills on — persisting even after the
page is fixed. The previous write only checked `not None`.

Validate before the upsert (in _resolve) and symmetrically on read (in _prices,
so a poisoned coasted row can't re-stamp either):
  - in/out must be a positive number <= $1000/Mtok. For these first-party PAID
    providers a $0 is a misparse, not a free offer (unlike the marketplace's
    legitimate $0); the ceiling matches config.live.lua's market_price_cap and
    clears real prices (o1-pro is $600/Mtok out) while a misread number (a context
    window ~272k, a 1e6 storage figure) sits far above it. A bad row DROPS →
    coverage degrades, coasting on the last good value.
  - cache-read must be plausible AND strictly cheaper than the base input (a cache
    read is always a fraction of input; cached >= input means the parser grabbed
    output/storage/the wrong cell). Implausible cache is nulled, not dropped — the
    in/out row is still good.

Adversarial tests: a $0 row and an over-ceiling row drop and are not committed to
the table; a poisoned stored row is not re-served on coast; a cache >= input is
nulled. Verified against the live pages (0 cached>=input violations across all
three). Suite 432/0.
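The guard described above can be sketched as a single row validator. This is an illustration of the stated rules only; `validate_row`, `_plausible` and `PRICE_CEILING` are hypothetical names, and the real checks live in `_resolve` and `_prices`.

```python
from typing import Optional

PRICE_CEILING = 1000.0  # USD per Mtok, the ceiling named in the text above


def _plausible(x) -> bool:
    """True for a real number in (0, PRICE_CEILING]; rejects None, bools, NaN, $0."""
    return (isinstance(x, (int, float)) and not isinstance(x, bool)
            and 0 < x <= PRICE_CEILING)


def validate_row(row: dict) -> Optional[dict]:
    """Return a cleaned copy of row, or None if in/out fail the guard."""
    price_in, price_out = row.get("price_in"), row.get("price_out")
    if not (_plausible(price_in) and _plausible(price_out)):
        return None          # drop the row: coverage degrades, coast on last good
    cached = row.get("price_cached_in")
    if not (_plausible(cached) and cached < price_in):
        cached = None        # null the cache price but keep the good in/out row
    return {**row, "price_cached_in": cached}
```

Running the same function before the upsert and again on read gives the symmetric protection the commit describes: a bad row is neither written nor re-served.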
@jmlago

jmlago commented Jun 30, 2026

Member Author

Good catch — agreed, the price field is invariant-bearing and the coast makes a bad parse sticky. Guarded in a0f7006.

  • Drop before the upsert AND on read (_resolve + _prices), so a poisoned coasted row can't re-stamp either.
  • in/out: positive and <= $1000/Mtok. A $0 drops (misparse, not a free offer for a first-party paid provider). On the ceiling: I checked against the live pages and o1-pro is genuinely $600/Mtok output — so a tight bound would clip a real price. Set it to $1000 to match config.live.lua's market_price_cap; misreads (a context window ~272k, a 1e6 storage figure) sit far above it.
  • cache-read must be plausible AND strictly cheaper than base input (a cache read is always a fraction of input; cached >= input means the parser grabbed output/storage/the wrong cell) → nulled, keeping the good in/out row.

Adversarial tests added (garbage → dropped, table untouched; poisoned stored row not re-served; cache≥input nulled), and verified 0 cached>=input violations across all three live pages. Suite 432/0.

@jmlago jmlago merged commit 2aedcd5 into main Jun 30, 2026
1 check passed
jmlago added a commit that referenced this pull request Jun 30, 2026
…billing (Axis 7)

Review caught it (correctly): the multiplier scaled chosen.price_in/out, and
shim._executed_cost_usd step 3 computes billed cost_usd from exactly those fields
whenever the provider reports no cost of its own — i.e. the direct
openai/anthropic/google providers (#59), bedrock, and codex-scarcity. So the
"ranking-only" claim was false: the lever leaked into cost_usd, x_router, the
session meter and stats, a "risk premium > 1" over-reported real spend, and the
effect was inconsistent (reported-cost providers ignored it, computed ones did
not).

Make it genuinely ranking-only: the multiplier is a FICTITIOUS routing lever.
- push_prices still scales the price RANKING sees.
- _executed_cost_usd divides the same multiplier back out before computing cost,
  so billing settles at the raw list price (or the provider-reported cost in step
  2). cost_usd is now invariant to the lever, uniformly across providers — pinned
  by a new test (0.5 multiplier, cost still bills the list price, not half).
- Knob reframed to "ranking price multiplier" with min 0.1 (no zero-price /
  divide-by-zero footgun); help states it does not change billing.

The divide-back is exact because every multiplier-bearing provider is priced via
push_prices (no catalog-static price to mis-scale). Suite 435/0.
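A worked sketch of the scale-then-divide-back arithmetic. The function names and signatures are hypothetical and do not mirror `push_prices` or `_executed_cost_usd`; the point is only that dividing the multiplier back out returns billing to the list price.

```python
def ranking_price(list_price: float, multiplier: float) -> float:
    """Price the ranking sees: list price scaled by the per-provider lever."""
    return list_price * multiplier


def billed_cost_usd(tokens_in: int, tokens_out: int,
                    ranked_in: float, ranked_out: float,
                    multiplier: float) -> float:
    """Cost from ranked per-Mtok prices with the multiplier divided back out."""
    price_in = ranked_in / multiplier    # back to the raw list price
    price_out = ranked_out / multiplier
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000
```

With list prices of 5.0 in and 30.0 out and a multiplier of 0.5, ranking sees 2.5 and 15.0, while one million tokens each way still bills 35.0 USD. A minimum multiplier above zero is what keeps the division safe.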
jmlago added a commit that referenced this pull request Jun 30, 2026
* feat(pricing): per-provider effective-price multiplier knob (C)

PR C of the pricing roadmap. On top of the base list prices (B), an operator
often pays an EFFECTIVE price that differs from list — a negotiated discount,
prepaid credits, or a risk premium for a less-trusted provider. This adds a
deterministic per-provider multiplier so ranking reflects the effective cost.

- `<provider>.price_multiplier` (float, default 1.0) is auto-added to the knob
  schema for every provider that contributes a price (has a source), so it shows
  in the Config tab and persists via the host store like every other knob. No
  dead knob for a sourceless provider (act over potency).
- Applied centrally in `sources.push_prices`: price_in/out are scaled by the
  multiplier on the way into the core's ranking metrics. The raw list price stays
  untouched at the source and in the provider_prices table — the multiplier is
  "effective vs list", deterministic, and lives only in what ranking sees (so the
  measured correction in PR D can still compare against the raw list price).
  Marketplace/offer prices (antseed, openrouter_market) ride the live market and
  are intentionally not scaled.

Suite 423/0.

* fix(pricing): make the price multiplier ranking-only — never distort billing (Axis 7)
