feat(metering): flow cost+cache aggregation, per-session warm routes, consumer /v1/session by jmlago · Pull Request #22 · genlayerlabs/unhardcoded

jmlago · 2026-06-24T16:50:50Z

Follow-on to #18 (the work landed on cache-metrics after it was squash-merged, so it had no PR).

What

- Flow cost/cache: ensemble (flow) calls were showing $0 / 0% cached downstream, because the flow aggregated only tokens in/out and its synthetic 'flow' chosen entry has no price. It now aggregates cost + cached tokens across nodes.
- Standard cache usage: emit usage.prompt_tokens_details.cached_tokens (not just x_router) so OpenAI-compatible clients (opencode) can parse cache reads.
- Streaming meter: fold the session meter on the streaming paths too (opencode always streams).
- Header session: read X-Unhardcoded-Session as the conversation id (for clients that can set headers but not body fields).
- Warm routes: per-(session, family) warm map (family/provider/served_by, where served_by is the real peer/backend) for display.
- Consumer endpoint: GET /v1/session/{sid} (consumer-key authed) → {calls, tokens_in/out, tokens_cached, cost_usd, warm[]}, so a harness (opencode plugin) can show live economics without operator /x/* access.
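
For readers unfamiliar with the standard cache field, here is a minimal sketch of the usage shape the "Standard cache usage" item refers to. The helper name `build_usage` and its parameters are hypothetical and not taken from shim.py; only the `prompt_tokens_details.cached_tokens` key comes from the description above, and omitting it when there are no cached tokens is an assumption based on the "when available" wording in the summary.

```python
def build_usage(tokens_in: int, tokens_out: int, tokens_cached: int = 0) -> dict:
    """Build an OpenAI-style usage object (hypothetical helper, for illustration)."""
    usage = {
        "prompt_tokens": tokens_in,
        "completion_tokens": tokens_out,
        "total_tokens": tokens_in + tokens_out,
    }
    # Assumption: the cached-token detail is only attached when there is
    # something to report, so clients that ignore it see an unchanged payload.
    if tokens_cached:
        usage["prompt_tokens_details"] = {"cached_tokens": tokens_cached}
    return usage


if __name__ == "__main__":
    print(build_usage(3000, 120, tokens_cached=2304))
```

A client that understands the field reads `usage["prompt_tokens_details"]["cached_tokens"]`; one that does not simply ignores the extra key.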

Verified live

flow cost_usd 0.034, cached 2304; GET /v1/session/{sid} → warm [gpt-5.5/openrouter, z-ai/glm-5.2/openrouter_market].

Summary by CodeRabbit

New Features
- Added a new session lookup endpoint for retrieving session details.
- Streaming and non-streaming responses now include cached token usage details when available.
- Session-based usage tracking now covers streamed requests and displays warm-route information.
Bug Fixes
- Improved error handling for session requests with clearer authentication and upstream failure responses.
- Ensured session metrics are recorded consistently across all streaming paths.

…rm routes, consumer /v1/session

Post-#18 follow-on (stranded on the merged cache-metrics branch). Lands:
- route_session_meter: /x/session endpoints wiring + per-(session,family) warm map (observe_route/warm) for display.
- shim: read session from X-Unhardcoded-Session header; meter the session on the STREAMING paths too; emit standard usage.prompt_tokens_details.cached_tokens so OpenAI-compatible clients (opencode) see cache; /x/session returns warm.
- llm_router_host: aggregate cost + cached across flow nodes (the synthetic 'flow' chosen has no price); record warm route per node on success.
- auth_proxy: consumer-facing GET /v1/session/{sid} (consumer-key authed) so a harness reads cost/tokens/cache/warm without operator /x/* access.

Verified live: flow cost_usd 0.034 + cached 2304; GET /v1/session/{sid} returns warm [gpt-5.5/openrouter, z-ai/glm-5.2/openrouter_market].

coderabbitai · 2026-06-24T16:51:05Z

Warning

Review limit reached

@jmlago, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 35 minutes and 33 seconds. Learn how PR review limits work.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d8906aad-3445-4ed6-9454-566ef11f55ac

📥 Commits

Reviewing files that changed from the base of the PR and between 5cd5350 and 20c2d13.

📒 Files selected for processing (5)

auth_proxy.py
route_session_meter.py
shim.py
tests/test_metering.py
tests/test_shim.py

📝 Walkthrough

Walkthrough

Adds per-session warm-route metadata tracking to route_session_meter with observe_route() and warm() functions. llm_router_host gains flow-level cost aggregation using tokens_cached/cost_reported and calls observe_route() after successful routing. shim.py threads req.session into all streaming final-chunk paths, adds prompt_tokens_details.cached_tokens to OpenAI usage responses, and includes warm-route data in the /x/session/{sid} meter endpoint. auth_proxy.py adds a new consumer-facing GET /v1/session/{sid} endpoint that authenticates and proxies to the router.
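
As a rough illustration of the warm-route store described in the walkthrough, the sketch below keeps one entry per (session, family) and returns them as a list. The function names `observe_route`, `warm`, and `reset` come from the walkthrough; the nested-dict layout, the keyword-only signature, and the latest-write-wins behaviour are assumptions, not the actual contents of route_session_meter.py.

```python
import threading
from typing import Dict, List

_lock = threading.Lock()
# Assumed layout: session id -> family -> {family, provider, served_by}.
_warm: Dict[str, Dict[str, dict]] = {}


def observe_route(session: str, *, family: str, provider: str, served_by: str) -> None:
    """Record which provider/backend most recently served a family in a session."""
    if not session or not family:
        return
    with _lock:
        _warm.setdefault(session, {})[family] = {
            "family": family,
            "provider": provider,
            "served_by": served_by,
        }


def warm(session: str) -> List[dict]:
    """Return copies of the warm entries for one session (empty list if unknown)."""
    with _lock:
        return [dict(entry) for entry in _warm.get(session, {}).values()]


def reset() -> None:
    """Test hook: clear the store."""
    with _lock:
        _warm.clear()
```

Keying by family inside each session means a repeated call for the same family overwrites the earlier entry, so the list stays one row per family rather than one row per call.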

Changes

Session Warm-Route Tracking and Display

| Layer / File(s) | Summary |
| --- | --- |
| Warm-route store in `route_session_meter` (`route_session_meter.py`) | Adds `_warm` in-memory dict keyed by session/family; introduces `observe_route()` to record provider and served-by metadata, `warm()` to retrieve session entries, and extends `reset()` to clear the new store. |
| Flow cost aggregation and warm-route recording in `llm_router_host` (`llm_router_host.py`) | Imports `route_session_meter`; extends `node_trace` with `tokens_cached`/`cost_reported`; introduces `_node_cost` closure that prefers provider-reported cost or computes from pricing with cached-token discount; sums into `flow_cost` on the return payload; calls `observe_route()` in `_fold_route_outcome` on successful calls. |
| Session attribution and cache reporting in `shim` streaming paths (`shim.py`) | Adds `_session_from_header` helper reading `X-Unhardcoded-Session`; extends `_final_chunk_parts` with optional `session` param that triggers `route_session_meter.observe()`; adds `prompt_tokens_details.cached_tokens` to streaming and non-streaming usage objects; propagates `req.session` through `_pseudo_stream`, `_flow_stream`, and `_sse_gen` (including its error path) and `_router_response_to_openai`. |
| Session display endpoints in `shim` and `auth_proxy` (`shim.py`, `auth_proxy.py`) | `/x/session/{sid}` now appends a `warm` field from `route_session_meter.warm(sid)`; `auth_proxy` adds `GET /v1/session/{sid}` that authenticates the caller via Bearer token, proxies to the router's internal endpoint, and returns upstream JSON or a `502` error. |
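
The flow cost aggregation row can be sketched roughly as follows. This is a sketch under stated assumptions, not the `_node_cost` closure from llm_router_host.py: the price schema (`in_per_token`, `out_per_token`), the 50% cached-token discount, and the function names are all hypothetical. Only the preference for a provider-reported cost and the summing across nodes come from the summary above.

```python
from typing import Dict, List, Optional


def node_cost(node: dict, price: Optional[dict], cached_discount: float = 0.5) -> float:
    """Cost of one flow node: provider-reported cost wins, else compute from pricing."""
    reported = node.get("cost_reported")
    if reported is not None:
        return float(reported)
    if not price:
        return 0.0  # no price known for this family (assumed fallback)
    tokens_in = node.get("tokens_in") or 0
    tokens_out = node.get("tokens_out") or 0
    cached = min(node.get("tokens_cached") or 0, tokens_in)
    uncached = tokens_in - cached
    # Assumption: cached input tokens are billed at a fraction of the input price.
    return (uncached * price["in_per_token"]
            + cached * price["in_per_token"] * cached_discount
            + tokens_out * price["out_per_token"])


def aggregate_flow(node_trace: List[dict], prices: Dict[str, dict]) -> dict:
    """Sum cost and cached tokens across all nodes of a flow."""
    return {
        "cost_usd": sum(node_cost(n, prices.get(n.get("family", ""))) for n in node_trace),
        "tokens_cached": sum(n.get("tokens_cached") or 0 for n in node_trace),
    }
```

Summing per node sidesteps the original problem: the synthetic 'flow' entry has no price of its own, so its cost has to be derived from the nodes that actually ran.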

Sequence Diagram(s)

sequenceDiagram
  participant Consumer
  participant auth_proxy as auth_proxy<br/>(GET /v1/session/{sid})
  participant shim_router as shim<br/>(/x/session/{sid})
  participant route_session_meter

  rect rgba(100, 149, 237, 0.5)
    note over Consumer,auth_proxy: Authentication
    Consumer->>auth_proxy: GET /v1/session/{sid}<br/>Authorization: Bearer token
    auth_proxy->>auth_proxy: _extract_token + _caller_auth
    auth_proxy-->>Consumer: 401/403 on auth failure
  end

  rect rgba(60, 179, 113, 0.5)
    note over auth_proxy,route_session_meter: Proxy + warm data assembly
    auth_proxy->>shim_router: GET /x/session/{sid}
    shim_router->>route_session_meter: warm(sid)
    route_session_meter-->>shim_router: [{family, provider, served_by}, ...]
    shim_router-->>auth_proxy: {calls, tokens, cost, warm: [...]}
    auth_proxy-->>Consumer: upstream JSON or 502
  end

sequenceDiagram
  participant Client
  participant shim_handler
  participant llm_router_host
  participant _fold_route_outcome
  participant route_session_meter

  Client->>shim_handler: POST /v1/chat/completions<br/>[X-Unhardcoded-Session: sid]
  shim_handler->>shim_handler: _session_from_header → req.session = sid
  shim_handler->>llm_router_host: execute_flow_async(session=sid)
  llm_router_host->>_fold_route_outcome: fold each node result
  _fold_route_outcome->>route_session_meter: observe_route(session, provider, family, served_by)
  llm_router_host-->>shim_handler: response with tokens_cached + cost_reported
  shim_handler->>shim_handler: _final_chunk_parts(session=sid)
  shim_handler->>route_session_meter: observe(sid, tokens_in, tokens_out, cost)
  shim_handler-->>Client: SSE chunk with usage.prompt_tokens_details.cached_tokens

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

genlayerlabs/unhardcoded#18: Directly adds the per-session usage/cost accumulator in route_session_meter and the initial tokens_cached/cost_reported propagation through llm_router_host and shim that this PR extends with warm-route metadata and streaming session attribution.

Poem

🐇 Hop hop, the session glows warm,
Each cache hit tracked, each routed norm.
The meter records which provider stayed near,
Costs summed from nodes, the flow path clear.
/v1/session now speaks to the crowd —
A bunny's data panel, fluffy and proud! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly captures the main additions: metering aggregation, warm route tracking, and the new consumer session endpoint. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

route_session_meter.py (1)
20-24: 🧹 Nitpick | 🔵 Trivial

Unbounded per-session growth of in-memory stores.

_warm (like the existing _acc) accumulates one entry per distinct session id with no eviction or TTL, and reset() is only a test hook. In a long-running process, sessions are never reclaimed, so both maps grow without bound. Consider a bounded LRU or TTL-based expiry for _warm/_acc.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@route_session_meter.py` around lines 20 - 24, The in-memory session maps in
route_session_meter are unbounded, so `_warm` and `_acc` can grow forever as new
session IDs are seen. Update the session storage logic in `route_session_meter`
to use bounded eviction or TTL-based expiration instead of keeping entries
indefinitely, and make sure any helper methods that read/write these maps still
work with the new lifecycle. If there is a central accessor or mutation path for
`_warm`/`_acc`, adjust that code so stale sessions are removed automatically
without relying on `reset()`.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@auth_proxy.py`:
- Around line 1962-1965: The upstream session proxy in the code path using
_client.get and sid should URL-encode the sid path parameter before building the
request target. Update the request in the session handler to quote sid with
safe="" so reserved characters stay within a single path segment and do not
alter the upstream URL.

In `@shim.py`:
- Around line 944-956: The session meter fold in _final_chunk_parts is counting
failed streaming requests because route_session_meter.observe is called
unconditionally whenever session is present. Add a guard so the per-session
meter is only updated for successful results, using result.get("ok") (or the
equivalent response status available in _final_chunk_parts) before calling
observe and setting x_router["session_acc"]. Keep the change localized to
_final_chunk_parts so streaming and non-streaming paths stay consistent with
_openai_error_from_router.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a805402b-1045-4168-8dbe-56cb40ec489b

📥 Commits

Reviewing files that changed from the base of the PR and between d35af18 and 5cd5350.

📒 Files selected for processing (4)

auth_proxy.py
llm_router_host.py
route_session_meter.py
shim.py

coderabbitai · 2026-06-24T16:56:04Z

+    assert _client is not None
+    try:
+        r = await _client.get(f"{UPSTREAM}/x/session/{sid}", timeout=5.0)
+        return JSONResponse(status_code=r.status_code, content=r.json())


🔒 Security & Privacy | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether other upstream proxy calls encode path params, and how UPSTREAM is defined
rg -nP -C2 'UPSTREAM\}/x/' --type=py
rg -nP 'from urllib.parse import|quote' auth_proxy.py

Repository: genlayerlabs/unhardcoded

Length of output: 162

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Map the relevant file and locate the session handler plus related routing.
git ls-files auth_proxy.py
ast-grep outline auth_proxy.py --view expanded || true
echo
echo "=== session-related lines ==="
rg -n -C 3 'session|UPSTREAM|sid' auth_proxy.py
echo
echo "=== upstream URL construction across repository ==="
rg -n -C 2 'f"\{UPSTREAM\}/x/' .
echo
echo "=== urllib.parse usage in auth_proxy.py ==="
rg -n -C 2 'urllib\.parse|quote\(' auth_proxy.py

Repository: genlayerlabs/unhardcoded

Length of output: 50380

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect the handler around the cited lines and the route definition for `sid`.
wc -l auth_proxy.py
sed -n '1930,1985p' auth_proxy.py
echo
echo "=== route definitions for session endpoints ==="
rg -n -C 2 '@app\.(get|post|put|delete|api_route)\(.*/x/session|/x/session/\{sid' auth_proxy.py

Repository: genlayerlabs/unhardcoded

Length of output: 3708

Quote sid before proxying it upstream.

sid comes from a path parameter, so a value containing ? or other reserved characters can change the upstream request target. Use quote(sid, safe="") to keep it a single path segment.


Source: Linters/SAST tools

coderabbitai · 2026-06-24T16:56:04Z

+        # Per-session meter on the STREAMING path too (mirrors the non-streaming
+        # _router_response_to_openai). Streaming clients (e.g. opencode, stream:true)
+        # otherwise never fold into the session total. Idempotent per request: this
+        # runs once per final chunk.
+        if session:
+            acc = route_session_meter.observe(
+                session,
+                tokens_in=resp.get("tokens_in") or 0,
+                tokens_out=resp.get("tokens_out") or 0,
+                tokens_cached=resp.get("tokens_cached") or 0,
+                cost_usd=x_router["cost_usd"] or 0.0)
+            x_router["session_acc"] = acc
        return resp, usage or None, x_router


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Session meter counts failed streaming calls but not failed non-streaming calls.

_final_chunk_parts is also invoked on the streaming failure paths (_sse_gen error branch at Line 1041 and _flow_stream at Line 993, which runs before the ok check). Since observe() unconditionally does calls += 1, a failed streaming request inflates session_acc.calls. The non-streaming failure path (_openai_error_from_router) never meters, so the two paths diverge. Consider gating the fold on result.get("ok") for consistency.

Proposed guard

-        if session:
+        if session and result.get("ok"):
             acc = route_session_meter.observe(
                 session,
                 tokens_in=resp.get("tokens_in") or 0,
                 tokens_out=resp.get("tokens_out") or 0,
                 tokens_cached=resp.get("tokens_cached") or 0,
                 cost_usd=x_router["cost_usd"] or 0.0)
             x_router["session_acc"] = acc


Suggested change

-        # Per-session meter on the STREAMING path too (mirrors the non-streaming
-        # _router_response_to_openai). Streaming clients (e.g. opencode, stream:true)
-        # otherwise never fold into the session total. Idempotent per request: this
-        # runs once per final chunk.
-        if session:
-            acc = route_session_meter.observe(
-                session,
-                tokens_in=resp.get("tokens_in") or 0,
-                tokens_out=resp.get("tokens_out") or 0,
-                tokens_cached=resp.get("tokens_cached") or 0,
-                cost_usd=x_router["cost_usd"] or 0.0)
-            x_router["session_acc"] = acc
-        return resp, usage or None, x_router
+        # Per-session meter on the STREAMING path too (mirrors the non-streaming
+        # _router_response_to_openai). Streaming clients (e.g. opencode, stream:true)
+        # otherwise never fold into the session total. Idempotent per request: this
+        # runs once per final chunk.
+        if session and result.get("ok"):
+            acc = route_session_meter.observe(
+                session,
+                tokens_in=resp.get("tokens_in") or 0,
+                tokens_out=resp.get("tokens_out") or 0,
+                tokens_cached=resp.get("tokens_cached") or 0,
+                cost_usd=x_router["cost_usd"] or 0.0)
+            x_router["session_acc"] = acc
+        return resp, usage or None, x_router


…ner)

GET /v1/session/{sid} authed only that the caller was *a* valid consumer, never that the sid belonged to *that* consumer. route_session_meter is keyed by sid alone, so any authed consumer could read any other consumer's session economics (cost/tokens/cache) and `warm[]`, which discloses the real family/provider/served_by peers serving someone else's conversation.

Bind sid -> owning consumer key and enforce it:
- route_session_meter: add _owner (sid->consumer key), guarded by _lock. observe() gains owner=; records first-writer-wins (setdefault) so a consumer reusing another's opaque sid cannot steal/overwrite ownership. add owner(); reset() clears it.
- shim: thread the authed consumer (the ingress proxy's x-llm-router-caller header, captured onto ChatRequest.caller) into BOTH observe sites (streaming _final_chunk_parts and non-streaming _router_response_to_openai) as owner=.
- shim /x/session/{sid}: when a caller header is present (the consumer-facing path), only the owner may read; anyone else gets 404 (NOT 403), so the endpoint never confirms another consumer's sid exists. Operator /x/* (no header) stays unscoped.
- auth_proxy session_view: forward the authed caller as x-llm-router-caller so the upstream meter scopes the read.

Tests: meter-level first-writer-wins binding (test_metering) and an end-to-end endpoint test (test_shim): owner reads 200, other consumer/unknown sid 404, operator unscoped.
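
The first-writer-wins binding and the 404 scoping described in this commit can be sketched as below. The names `bind_owner` and `session_read_status` are hypothetical (the commit folds binding into `observe(owner=...)`), and returning 200 on the operator path regardless of whether the sid exists is a simplification; only the `setdefault` binding and the 200/404 outcomes for consumers are taken from the message.

```python
import threading
from typing import Dict, Optional

_lock = threading.Lock()
_owner: Dict[str, str] = {}  # sid -> consumer key that first wrote to it


def bind_owner(sid: str, owner: Optional[str]) -> None:
    """Bind sid to its first writer; later writers cannot take it over."""
    if not owner:
        return
    with _lock:
        _owner.setdefault(sid, owner)


def owner(sid: str) -> Optional[str]:
    with _lock:
        return _owner.get(sid)


def session_read_status(sid: str, caller: Optional[str]) -> int:
    """HTTP status for a session read: scoped for consumers, unscoped for operators."""
    if caller is None:
        return 200  # operator path (no caller header): unscoped, simplified here
    # 404 rather than 403, so a consumer cannot learn that someone else's sid exists.
    return 200 if owner(sid) == caller else 404
```

Because an unknown sid and a sid owned by someone else both yield 404, the two cases are indistinguishable from the consumer's side.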

…26)

Reconstructed from Edgars Nemše's #24, carrying only the family half. The provider-adapter half is in its sibling PR; this one makes discovered OpenRouter marketplace families provider-neutral.

What it does:
- sources/openrouter.py: a discovered marketplace model's policy-facing family is the provider-neutral name (`openai/gpt-5-mini` -> `gpt-5-mini`), while `wire_model_id` keeps the exact OpenRouter slug for the wire. `service_aliases` (config) handle the canonicalization exceptions where stripping the vendor isn't the right family (dated/suffixed slugs). Curated families served by the static `openrouter` provider are deduped out, so a marketplace row never shadows a curated family.
- This pairs with the engine's `provider_eq` (#18): family is the model, provider is a separate axis the algebra filters. It lets a single provider-agnostic policy span curated + marketplace routes for one model, which is the point of the router ("stop hardcoding models").

Core bump: `core` -> 97d0333 (unhardcoded-engine #22), which makes `served_model_id` the offer's wire id (`offer.wire_model_id or model_family`). With a neutral family + a distinct wire slug, the engine now wires the slug on both the curated and discovered paths without every adapter special-casing `wire_model_id`, and the replayable trace records the real wire id.

The §3 determinism boundary holds: `model_meta.lua` stays curated-keyed (refresh_model_meta untouched), so an on-chain/genvm host lacking a discovered offer still fails closed. Pinned by test_openrouter_model_meta_still_keyed_by_curated_family and the dedup assertion in test_openrouter_discovery_derives_policy_families_from_raw_model_ids.

Verification: nix-shell --run 'python -m pytest tests -q' -> 333 passed, 2 skipped, 0 failed, against the bumped core. No new dependency.
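
To make the family/wire split concrete, here is a small sketch of the derivation this commit describes. It is not the code in sources/openrouter.py: the function names, the alias mapping shape (exact slug -> family), and the example alias entry are assumptions made for illustration. Only the `openai/gpt-5-mini` -> `gpt-5-mini` example, the `wire_model_id` field, and the curated dedup come from the message.

```python
from typing import Dict, Iterable, List, Set


def policy_family(slug: str, service_aliases: Dict[str, str]) -> str:
    """Provider-neutral family for a marketplace slug; aliases cover the exceptions."""
    if slug in service_aliases:
        return service_aliases[slug]
    # Default rule: drop the vendor prefix before the first "/".
    return slug.split("/", 1)[-1]


def discover_offers(slugs: Iterable[str], service_aliases: Dict[str, str],
                    curated_families: Set[str]) -> List[dict]:
    """Turn raw marketplace slugs into offers, skipping curated families."""
    offers = []
    for slug in slugs:
        family = policy_family(slug, service_aliases)
        if family in curated_families:
            continue  # a marketplace row never shadows a curated family
        offers.append({"model_family": family, "wire_model_id": slug})
    return offers


if __name__ == "__main__":
    aliases = {"vendor/model-x-2026-01-01": "model-x"}  # hypothetical alias entry
    print(discover_offers(
        ["openai/gpt-5-mini", "vendor/model-x-2026-01-01", "openai/gpt-5.5"],
        aliases,
        curated_families={"gpt-5.5"},
    ))
```

Keeping the exact slug in `wire_model_id` means policies can match on the neutral family while the request sent upstream still carries the identifier OpenRouter expects.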

coderabbitai Bot reviewed Jun 24, 2026


jmlago merged commit 6ee4a49 into main Jun 24, 2026
1 check passed

jmlago mentioned this pull request Jun 26, 2026

[codex] Add dynamic OpenRouter families and native provider adapters #24

Closed

jmlago merged 2 commits into main from metering-warm-sidebar
