feat(pricing): price the direct providers from their official pricing pages #59
… pages
PR B of the pricing roadmap. openai/anthropic/google had NO price source, so
every (provider, family) sat at the field default price = +inf
(core/llm_policy/fields.lua): the direct candidate never passed a cost ceiling
and ranked last on cost, even though the first-party API is usually cheaper than
the same model via openrouter. This gives each direct provider a price source.
No provider exposes a pricing API — all three publish prices on documentation
pages — so the source scrapes the OFFICIAL page each provider serves and parses
it. The formats differ, so each has a tailored, tested parser:
- Anthropic — the Markdown twin (pricing.md): a real pipe table. (in / cache
read / out)
- OpenAI — the .md is MDX; pricing lives in a <TextTokenPricingTables> JSX
component as [name, input, cached, output] array rows.
- Google — the devsite HTML, where each model is an <h2 id="gemini-…"> (the id
is the catalog family verbatim) with <td>Input price</td><td>$…</td> rows; the
cache-read and per-hour storage prices share one <br>-split cell.
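
As an illustration of the simplest of the three formats, the Markdown pipe table can be read with a few lines of Python. This is a minimal sketch, not the PR's parser: the function name `parse_pipe_table`, the record keys, and the sample row (including its prices) are assumptions made for the example; only the column order (in / cache read / out) comes from the description above.

```python
import re

# Matches a dollar amount such as "$5" or "$0.50" inside a table cell.
_PRICE = re.compile(r"\$\s*(\d+(?:\.\d+)?)")

# Illustrative sample only; the label and prices are made up for the sketch.
SAMPLE = """
| Model | Input | Cache read | Output |
|---|---|---|---|
| Claude Opus 4.8 | $5 / MTok | $0.50 / MTok | $25 / MTok |
"""


def parse_pipe_table(md: str) -> list[dict]:
    """Return one record per table row that carries three dollar prices."""
    records = []
    for line in md.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 4:
            continue
        prices = [_PRICE.search(c) for c in cells[1:4]]
        if not all(prices):
            continue  # header row, separator row, or a row missing a price
        price_in, cached, price_out = (float(m.group(1)) for m in prices)
        records.append({
            "label": cells[0],
            "price_in": price_in,
            "price_cached_in": cached,
            "price_out": price_out,
        })
    return records


print(parse_pipe_table(SAMPLE))
```

Rows without three parseable prices are skipped rather than guessed at, which is in keeping with the "coverage degrades" stance described further down.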
A model label maps to a catalog family by a dot/dash/space-insensitive slug, so
no hand-maintained alias map drifts ("Claude Opus 4.8" ↔ claude-opus-4-8).
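
The slug match can be sketched as follows. The helper names `slug` and `match_family` are hypothetical and the real implementation may normalize differently; the sketch only shows how stripping dots, dashes and whitespace makes a page label and a catalog family compare equal.

```python
import re
from typing import Iterable, Optional


def slug(label: str) -> str:
    """Lowercase and drop dots, dashes and whitespace."""
    return re.sub(r"[.\-\s]+", "", label.lower())


def match_family(label: str, families: Iterable[str]) -> Optional[str]:
    """Return the catalog family whose slug equals the label's slug, if any."""
    by_slug = {slug(f): f for f in families}
    return by_slug.get(slug(label))


catalog = ["claude-opus-4-8", "gemini-3.1-pro-preview"]
print(match_family("Claude Opus 4.8", catalog))         # claude-opus-4-8
print(match_family("Gemini 3.1 Pro Preview", catalog))  # gemini-3.1-pro-preview
print(match_family("Some Unknown Model", catalog))      # None
```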
Durable + fail-safe, strictly off the request path:
- new host_store table `provider_prices(provider_id, model_family, price_in,
price_out, price_cached_in, updated_at)` — the source of truth, in Postgres.
- the source upserts the table and returns in/out to the core; on a failed
scrape or a restyled page past the parser it COASTS on the table (which also
warm-starts a fresh process), so coverage degrades, a price never snaps to
+inf. poll ~hourly.
- cache-READ price is captured into the table (verified against all three live
pages) for the effective-cost work in PRs C/D; the core ranks on in/out only.
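
The coasting behavior described in the list above can be sketched with an in-memory stand-in for the table. Everything here is an assumption for illustration: `PriceTable` and `refresh` are hypothetical names, and the real source talks to Postgres through `host_store`. The sketch reads the table back after every attempt, so a failed or empty scrape still returns the last known prices.

```python
from typing import Callable


class PriceTable:
    """In-memory stand-in for the provider_prices table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], dict] = {}

    def upsert(self, provider_id: str, rows: dict[str, dict]) -> None:
        for family, row in rows.items():
            self._rows[(provider_id, family)] = dict(row)

    def get(self, provider_id: str) -> dict[str, dict]:
        return {fam: dict(row)
                for (pid, fam), row in self._rows.items()
                if pid == provider_id}


def refresh(provider_id: str,
            scrape: Callable[[], dict[str, dict]],
            table: PriceTable) -> dict[str, dict]:
    """Upsert fresh rows when the scrape yields any, then serve the table."""
    try:
        fresh = scrape()
    except Exception:
        fresh = {}  # network error or parser failure: fall through and coast
    if fresh:
        table.upsert(provider_id, fresh)
    return table.get(provider_id)


table = PriceTable()
good = {"gpt-5.5": {"price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}}
print(refresh("openai", lambda: good, table))  # fresh rows stored and served


def boom() -> dict[str, dict]:
    raise RuntimeError("network down")


print(refresh("openai", boom, table))  # same rows, served from the table
```

Serving the table rather than the fresh subset is one possible design; it also means a family missing from a partially parsed page keeps its last stored price.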
Builds on the modular registry (#57): each direct provider just declares a
`source`. Suite 428/0; parsers verified against the live pages (anthropic 15 /
openai 45 / gemini 19 families, in+out+cache).
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@sources/official_pricing.py`:
- Around line 319-328: In the pricing fetch flow inside the resolver path that
builds `rows` and calls `host_store.set_provider_prices`, stop returning the
parsed subset immediately when only part of the scrape succeeds. Instead, merge
the fresh `rows` with any last-known entries already stored for the same
provider/model families so missing families continue to coast on stored prices,
and make sure the code in `official_pricing.py` handles the return value of
`set_provider_prices()` so a failed persistence attempt still falls back to the
stored data rather than silently dropping coverage.
In `@tests/test_official_pricing.py`:
- Around line 152-159: The fallback test is too permissive because the stubbed
host_store.get_provider_prices ignores the requested provider, so
OfficialPriceSource.pricing() could accidentally read all cached prices and
still pass. Tighten the test by making the fake assert that it is called with
provider_id="openai" (or equivalent pid) before returning the cached rows, so
the provider-scoped lookup is enforced in pricing() and _src/_Client remain
covered.
- Around line 162-167: The test in test_pricing_ignores_uncataloged_page_rows
currently only exercises a cataloged Google model, so it does not verify
filtering of unknown rows. Update the fixture setup around _src("google",
client=_Client(GEMINI_HTML)) so the parsed page includes at least one family not
present in CATALOG, or temporarily narrow the catalog for this case, and keep
the assertion focused on only the cataloged family surviving pricing().
Files selected for processing (5):
- host_store.py
- providers.py
- sources/official_pricing.py
- tests/test_official_pricing.py
- tests/test_sources.py
text = await self._fetch_text(self._url)
records = _PARSERS[self._fmt](text)
rows = self._resolve(records)
if rows:
    host_store.set_provider_prices([
        {"provider_id": self.provider_id, "model_family": fam,
         "price_in": r["price_in"], "price_out": r["price_out"],
         "price_cached_in": r.get("price_cached_in")}
        for fam, r in rows.items()])
    return self._prices(rows.items())
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Coast on stored prices when a scrape is only partially successful.
Lines 319-328 return the parsed subset as soon as rows is non-empty. If a page drift leaves only some families parseable, the rest disappear from the live price set and fall back to missing/+inf even though host_store.provider_prices may still have last-known rows. This branch also ignores set_provider_prices() returning False, so the durable fallback guarantee can silently stop working.
Suggested fix
 async def pricing(self) -> list[Price]:
     try:
         text = await self._fetch_text(self._url)
         records = _PARSERS[self._fmt](text)
         rows = self._resolve(records)
         if rows:
-            host_store.set_provider_prices([
+            persisted = host_store.set_provider_prices([
                 {"provider_id": self.provider_id, "model_family": fam,
                  "price_in": r["price_in"], "price_out": r["price_out"],
                  "price_cached_in": r.get("price_cached_in")}
                 for fam, r in rows.items()])
-            return self._prices(rows.items())
+            if not persisted:
+                _log.warning("%s failed to persist refreshed prices; coasting may be stale", self.name)
+            stored = {
+                row["model_family"]: row
+                for row in host_store.get_provider_prices(self.provider_id)
+            }
+            stored.update(rows)
+            return self._prices(stored.items())
         _log.warning("%s parsed no catalog-served prices; coasting", self.name)
def test_pricing_coasts_on_host_store_when_scrape_fails(monkeypatch):
    monkeypatch.setattr(host_store, "get_provider_prices", lambda pid=None: [
        {"provider_id": "openai", "model_family": "gpt-5.5",
         "price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}])
    src = _src("openai", client=_Client(boom=True))  # network down
    prices = asyncio.run(src.pricing())  # must NOT raise
    assert [(p["model_family"], p["price_in_usd_per_mtok"]) for p in prices] \
        == [("gpt-5.5", 4.2)]
🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
Assert the provider-scoped fallback read.
This stub ignores pid, so the test still passes if OfficialPriceSource.pricing() accidentally reads the whole cached table instead of only provider_id="openai". Make the fake assert the requested provider to lock down that boundary.
Proposed test tightening
 def test_pricing_coasts_on_host_store_when_scrape_fails(monkeypatch):
-    monkeypatch.setattr(host_store, "get_provider_prices", lambda pid=None: [
-        {"provider_id": "openai", "model_family": "gpt-5.5",
-         "price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}])
+    def fake_get_provider_prices(pid=None):
+        assert pid == "openai"
+        return [{"provider_id": "openai", "model_family": "gpt-5.5",
+                 "price_in": 4.2, "price_out": 28.0, "price_cached_in": 0.4}]
+
+    monkeypatch.setattr(host_store, "get_provider_prices", fake_get_provider_prices)
def test_pricing_ignores_uncataloged_page_rows(monkeypatch):
    monkeypatch.setattr(host_store, "set_provider_prices", lambda rows: True)
    # google catalog serves only gemini-3.1-pro-preview; the page row resolves to it
    src = _src("google", client=_Client(GEMINI_HTML))
    prices = asyncio.run(src.pricing())
    assert {p["model_family"] for p in prices} == {"gemini-3.1-pro-preview"}
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
This test never creates an uncataloged row.
GEMINI_HTML only contains gemini-3.1-pro-preview, which is already present in CATALOG, so this cannot prove that unknown page rows are filtered out. Add a second parsed family that's absent from the catalog, or shrink the catalog for this case.
Proposed test tightening
 def test_pricing_ignores_uncataloged_page_rows(monkeypatch):
     monkeypatch.setattr(host_store, "set_provider_prices", lambda rows: True)
-    # google catalog serves only gemini-3.1-pro-preview; the page row resolves to it
-    src = _src("google", client=_Client(GEMINI_HTML))
+    html = GEMINI_HTML + """
+<h2 id="gemini-unknown" data-text="Gemini Unknown">Gemini Unknown</h2>
+<h3 id="standard">Standard</h3>
+<table>
+<tr><td>Input price</td><td>Not available</td><td>$0.10</td></tr>
+<tr><td>Output price (including thinking tokens)</td><td>Not available</td><td>$0.20</td></tr>
+</table>
+"""
+    src = _src("google", client=_Client(html))
     prices = asyncio.run(src.pricing())
     assert {p["model_family"] for p in prices} == {"gemini-3.1-pro-preview"}
…, never commit (Axis 7)
A per-Mtok price is invariant-bearing: it governs both routing and billing, and
because pricing() COASTS on the table, a bad parse that lands there becomes a
sticky wrong price the whole fleet routes/bills on — persisting even after the
page is fixed. The previous write only checked `not None`.
Validate before the upsert (in _resolve) and symmetrically on read (in _prices,
so a poisoned coasted row can't re-stamp either):
- in/out must be a positive number <= $1000/Mtok. For these first-party PAID
providers a $0 is a misparse, not a free offer (unlike the marketplace's
legitimate $0); the ceiling matches config.live.lua's market_price_cap and
clears real prices (o1-pro is $600/Mtok out) while a misread number (a context
window ~272k, a 1e6 storage figure) sits far above it. A bad row DROPS →
coverage degrades, coasting on the last good value.
- cache-read must be plausible AND strictly cheaper than the base input (a cache
read is always a fraction of input; cached >= input means the parser grabbed
output/storage/the wrong cell). Implausible cache is nulled, not dropped — the
in/out row is still good.
Adversarial tests: a $0 row and an over-ceiling row drop and are not committed to
the table; a poisoned stored row is not re-served on coast; a cache >= input is
nulled. Verified against the live pages (0 cached>=input violations across all
three). Suite 432/0.
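
The two rules above (drop a row with an implausible in/out price, null an implausible cache-read price) can be sketched as a single guard. This is a minimal sketch under stated assumptions: `plausible`, `validate_row` and `PRICE_CEILING` are hypothetical names, and the PR applies the equivalent checks inside `_resolve` and `_prices`.

```python
import math
from typing import Optional

PRICE_CEILING = 1000.0  # USD per Mtok, the ceiling named in the commit message


def plausible(price: object) -> bool:
    """True for a finite number in (0, PRICE_CEILING]."""
    return (isinstance(price, (int, float))
            and not isinstance(price, bool)
            and math.isfinite(price)
            and 0 < price <= PRICE_CEILING)


def validate_row(row: dict) -> Optional[dict]:
    """Return a cleaned row, or None when in/out is implausible (row drops)."""
    if not (plausible(row.get("price_in")) and plausible(row.get("price_out"))):
        return None
    cached = row.get("price_cached_in")
    # A cache read must be plausible and strictly cheaper than base input;
    # otherwise it is nulled while the in/out prices are kept.
    if not (plausible(cached) and cached < row["price_in"]):
        cached = None
    return {"price_in": float(row["price_in"]),
            "price_out": float(row["price_out"]),
            "price_cached_in": cached}


print(validate_row({"price_in": 0, "price_out": 10}))      # None: $0 is a misparse
print(validate_row({"price_in": 5, "price_out": 272000}))  # None: over the ceiling
print(validate_row({"price_in": 5, "price_out": 30, "price_cached_in": 30}))
```

Running the same guard on rows read back from the table is what keeps a poisoned stored row from being re-served while coasting.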
Good catch — agreed, the price field is invariant-bearing and the coast makes a bad parse sticky. Guarded in
Adversarial tests added (garbage → dropped, table untouched; poisoned stored row not re-served; cache≥input nulled), and verified 0
…billing (Axis 7)
Review caught it (correctly): the multiplier scaled chosen.price_in/out, and shim._executed_cost_usd step 3 computes billed cost_usd from exactly those fields whenever the provider reports no cost of its own, i.e. the direct openai/anthropic/google providers (#59), bedrock, and codex-scarcity. So the "ranking-only" claim was false: the lever leaked into cost_usd, x_router, the session meter and stats, a "risk premium > 1" over-reported real spend, and the effect was inconsistent (reported-cost providers ignored it, computed ones did not).
Make it genuinely ranking-only: the multiplier is a FICTITIOUS routing lever.
- push_prices still scales the price RANKING sees.
- _executed_cost_usd divides the same multiplier back out before computing cost, so billing settles at the raw list price (or the provider-reported cost in step 2). cost_usd is now invariant to the lever, uniformly across providers, pinned by a new test (0.5 multiplier, cost still bills the list price, not half).
- Knob reframed to "ranking price multiplier" with min 0.1 (no zero-price / divide-by-zero footgun); help states it does not change billing.
The divide-back is exact because every multiplier-bearing provider is priced via push_prices (no catalog-static price to mis-scale). Suite 435/0.
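
The scale-then-divide-back idea can be sketched in isolation. The function names below are hypothetical stand-ins for `push_prices` and `_executed_cost_usd`, the prices are illustrative, and the sketch assumes cost is computed as tokens times the per-Mtok price; only the 0.1 minimum and the 0.5 test multiplier come from the commit message.

```python
MIN_MULTIPLIER = 0.1  # the knob's minimum, which rules out divide-by-zero


def check_multiplier(multiplier: float) -> float:
    if multiplier < MIN_MULTIPLIER:
        raise ValueError(f"multiplier must be >= {MIN_MULTIPLIER}")
    return multiplier


def ranking_price(list_price: float, multiplier: float) -> float:
    """Price that ranking sees: list price scaled by the operator's lever."""
    return list_price * check_multiplier(multiplier)


def executed_cost_usd(ranked_in: float, ranked_out: float, multiplier: float,
                      tokens_in: int, tokens_out: int) -> float:
    """Billed cost: divide the lever back out so billing uses the list price."""
    m = check_multiplier(multiplier)
    raw_in, raw_out = ranked_in / m, ranked_out / m
    return (tokens_in * raw_in + tokens_out * raw_out) / 1_000_000


m = 0.5
r_in, r_out = ranking_price(4.0, m), ranking_price(28.0, m)
print(r_in, r_out)  # ranking sees the scaled prices
print(executed_cost_usd(r_in, r_out, m, 1_000_000, 500_000))  # billed at list price
```

In floating point the round trip is only guaranteed to be close to the list price, not bit-identical, for arbitrary multipliers; a test that pins the invariant would reasonably compare with a small tolerance.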
* feat(pricing): per-provider effective-price multiplier knob (C)
PR C of the pricing roadmap. On top of the base list prices (B), an operator often pays an EFFECTIVE price that differs from list: a negotiated discount, prepaid credits, or a risk premium for a less-trusted provider. This adds a deterministic per-provider multiplier so ranking reflects the effective cost.
- `<provider>.price_multiplier` (float, default 1.0) is auto-added to the knob schema for every provider that contributes a price (has a source), so it shows in the Config tab and persists via the host store like every other knob. No dead knob for a sourceless provider (act over potency).
- Applied centrally in `sources.push_prices`: price_in/out are scaled by the multiplier on the way into the core's ranking metrics. The raw list price stays untouched at the source and in the provider_prices table; the multiplier is "effective vs list", deterministic, and lives only in what ranking sees (so the measured correction in PR D can still compare against the raw list price). Marketplace/offer prices (antseed, openrouter_market) ride the live market and are intentionally not scaled.
Suite 423/0.
* fix(pricing): make the price multiplier ranking-only - never distort billing (Axis 7)
What & why
PR B of the pricing roadmap.
`openai`, `anthropic`, and `google` had no price source, so every `(provider, family)` defaulted to `price = +inf` (`core/llm_policy/fields.lua:33`). The direct candidate therefore never passed a cost ceiling and ranked last on cost, even though the first-party API is usually cheaper than the same model routed via openrouter. This gives each direct provider a real price.

Where the prices come from: official pages, no API
None of the three exposes a pricing API; all publish prices on documentation pages. The source scrapes the official page each one serves and parses it. The formats differ, so each parser is tailored and unit-tested:

| Provider | Official page | Format |
|---|---|---|
| Anthropic | platform.claude.com/docs/en/about-claude/pricing.md | Markdown pipe table (in / cache read / out) |
| OpenAI | developers.openai.com/api/docs/pricing.md | `<TextTokenPricingTables>` JSX component: `["gpt-5.5 (<272K…)", 5, 0.5, 30]` = `[name, input, cached, output]` |
| Google | ai.google.dev/gemini-api/docs/pricing | `<h2 id="gemini-…">` (the id is the catalog family verbatim) with `<td>Input price</td><td>$…</td>` rows; cache-read + per-hour storage share one `<br>`-split cell |

A page's model label maps to a catalog family by a dot/dash/space-insensitive slug (`"Claude Opus 4.8"` ↔ `claude-opus-4-8`, `"Gemini 3.1 Pro Preview"` ↔ `gemini-3.1-pro-preview`), so there's no hand-maintained alias map to drift. Unmatched rows are dropped (`push_prices` also filters to catalog-served pairs).

Durable + fail-safe, off the request path
- `provider_prices(provider_id, model_family, price_in, price_out, price_cached_in, updated_at)`: the source of truth (idempotent DDL, added to `truncate_all_for_tests`).
- On a failed scrape or a restyled page past the parser, `pricing()` coasts on the table (which also warm-starts a fresh process). Coverage degrades; a price never snaps back to `+inf`.

Cached prices (verified live)
Captured the cache-read column from all three live pages:
Notes / simplifications (no silent truncation)
Design fit
Builds directly on the modular registry (#57): each direct provider just declares a `source`. No request-path code.

Verification