feat(metering): flow cost+cache aggregation, per-session warm routes, consumer /v1/session#22
Conversation
…rm routes, consumer /v1/session Post-#18 follow-on (stranded on the merged cache-metrics branch). Lands: - route_session_meter: /x/session endpoints wiring + per-(session,family) warm map (observe_route/warm) for display. - shim: read session from X-Unhardcoded-Session header; meter the session on the STREAMING paths too; emit standard usage.prompt_tokens_details.cached_tokens so OpenAI-compatible clients (opencode) see cache; /x/session returns warm. - llm_router_host: aggregate cost + cached across flow nodes (the synthetic 'flow' chosen has no price); record warm route per node on success. - auth_proxy: consumer-facing GET /v1/session/{sid} (consumer-key authed) so a harness reads cost/tokens/cache/warm without operator /x/* access. Verified live: flow cost_usd 0.034 + cached 2304; GET /v1/session/{sid} returns warm [gpt-5.5/openrouter, z-ai/glm-5.2/openrouter_market].
|
Warning Review limit reached
More reviews will be available in 35 minutes and 33 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds per-session warm-route metadata tracking to ChangesSession Warm-Route Tracking and Display
Sequence Diagram(s)sequenceDiagram
participant Consumer
participant auth_proxy as auth_proxy<br/>(GET /v1/session/{sid})
participant shim_router as shim<br/>(/x/session/{sid})
participant route_session_meter
rect rgba(100, 149, 237, 0.5)
note over Consumer,auth_proxy: Authentication
Consumer->>auth_proxy: GET /v1/session/{sid}<br/>Authorization: Bearer token
auth_proxy->>auth_proxy: _extract_token + _caller_auth
auth_proxy-->>Consumer: 401/403 on auth failure
end
rect rgba(60, 179, 113, 0.5)
note over auth_proxy,route_session_meter: Proxy + warm data assembly
auth_proxy->>shim_router: GET /x/session/{sid}
shim_router->>route_session_meter: warm(sid)
route_session_meter-->>shim_router: [{family, provider, served_by}, ...]
shim_router-->>auth_proxy: {calls, tokens, cost, warm: [...]}
auth_proxy-->>Consumer: upstream JSON or 502
end
sequenceDiagram
participant Client
participant shim_handler
participant llm_router_host
participant _fold_route_outcome
participant route_session_meter
Client->>shim_handler: POST /v1/chat/completions<br/>[X-Unhardcoded-Session: sid]
shim_handler->>shim_handler: _session_from_header → req.session = sid
shim_handler->>llm_router_host: execute_flow_async(session=sid)
llm_router_host->>_fold_route_outcome: fold each node result
_fold_route_outcome->>route_session_meter: observe_route(session, provider, family, served_by)
llm_router_host-->>shim_handler: response with tokens_cached + cost_reported
shim_handler->>shim_handler: _final_chunk_parts(session=sid)
shim_handler->>route_session_meter: observe(sid, tokens_in, tokens_out, cost)
shim_handler-->>Client: SSE chunk with usage.prompt_tokens_details.cached_tokens
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
route_session_meter.py (1)
20-24: 🧹 Nitpick | 🔵 TrivialUnbounded per-session growth of in-memory stores.
_warm(like the existing_acc) accumulates one entry per distinct session id with no eviction or TTL, andreset()is only a test hook. In a long-running process, sessions are never reclaimed, so both maps grow without bound. Consider a bounded LRU or TTL-based expiry for_warm/_acc.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@route_session_meter.py` around lines 20 - 24, The in-memory session maps in route_session_meter are unbounded, so `_warm` and `_acc` can grow forever as new session IDs are seen. Update the session storage logic in `route_session_meter` to use bounded eviction or TTL-based expiration instead of keeping entries indefinitely, and make sure any helper methods that read/write these maps still work with the new lifecycle. If there is a central accessor or mutation path for `_warm`/`_acc`, adjust that code so stale sessions are removed automatically without relying on `reset()`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@auth_proxy.py`:
- Around line 1962-1965: The upstream session proxy in the code path using
_client.get and sid should URL-encode the sid path parameter before building the
request target. Update the request in the session handler to quote sid with
safe="" so reserved characters stay within a single path segment and do not
alter the upstream URL.
In `@shim.py`:
- Around line 944-956: The session meter fold in _final_chunk_parts is counting
failed streaming requests because route_session_meter.observe is called
unconditionally whenever session is present. Add a guard so the per-session
meter is only updated for successful results, using result.get("ok") (or the
equivalent response status available in _final_chunk_parts) before calling
observe and setting x_router["session_acc"]. Keep the change localized to
_final_chunk_parts so streaming and non-streaming paths stay consistent with
_openai_error_from_router.
---
Nitpick comments:
In `@route_session_meter.py`:
- Around line 20-24: The in-memory session maps in route_session_meter are
unbounded, so `_warm` and `_acc` can grow forever as new session IDs are seen.
Update the session storage logic in `route_session_meter` to use bounded
eviction or TTL-based expiration instead of keeping entries indefinitely, and
make sure any helper methods that read/write these maps still work with the new
lifecycle. If there is a central accessor or mutation path for `_warm`/`_acc`,
adjust that code so stale sessions are removed automatically without relying on
`reset()`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a805402b-1045-4168-8dbe-56cb40ec489b
📒 Files selected for processing (4)
auth_proxy.pyllm_router_host.pyroute_session_meter.pyshim.py
| assert _client is not None | ||
| try: | ||
| r = await _client.get(f"{UPSTREAM}/x/session/{sid}", timeout=5.0) | ||
| return JSONResponse(status_code=r.status_code, content=r.json()) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether other upstream proxy calls encode path params, and how UPSTREAM is defined
rg -nP -C2 'UPSTREAM\}/x/' --type=py
rg -nP 'from urllib.parse import|quote' auth_proxy.pyRepository: genlayerlabs/unhardcoded
Length of output: 162
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map the relevant file and locate the session handler plus related routing.
git ls-files auth_proxy.py
ast-grep outline auth_proxy.py --view expanded || true
echo
echo "=== session-related lines ==="
rg -n -C 3 'session|UPSTREAM|sid' auth_proxy.py
echo
echo "=== upstream URL construction across repository ==="
rg -n -C 2 'f"\{UPSTREAM\}/x/' .
echo
echo "=== urllib.parse usage in auth_proxy.py ==="
rg -n -C 2 'urllib\.parse|quote\(' auth_proxy.pyRepository: genlayerlabs/unhardcoded
Length of output: 50380
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the handler around the cited lines and the route definition for `sid`.
wc -l auth_proxy.py
sed -n '1930,1985p' auth_proxy.py
echo
echo "=== route definitions for session endpoints ==="
rg -n -C 2 '`@app`\.(get|post|put|delete|api_route)\(.*/x/session|/x/session/\{sid' auth_proxy.pyRepository: genlayerlabs/unhardcoded
Length of output: 3708
Quote sid before proxying it upstream.
sid comes from a path parameter, so a value containing ? or other reserved characters can change the upstream request target. Use quote(sid, safe="") to keep it a single path segment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth_proxy.py` around lines 1962 - 1965, The upstream session proxy in the
code path using _client.get and sid should URL-encode the sid path parameter
before building the request target. Update the request in the session handler to
quote sid with safe="" so reserved characters stay within a single path segment
and do not alter the upstream URL.
Source: Linters/SAST tools
| # Per-session meter on the STREAMING path too (mirrors the non-streaming | ||
| # _router_response_to_openai). Streaming clients (e.g. opencode, stream:true) | ||
| # otherwise never fold into the session total. Idempotent per request: this | ||
| # runs once per final chunk. | ||
| if session: | ||
| acc = route_session_meter.observe( | ||
| session, | ||
| tokens_in=resp.get("tokens_in") or 0, | ||
| tokens_out=resp.get("tokens_out") or 0, | ||
| tokens_cached=resp.get("tokens_cached") or 0, | ||
| cost_usd=x_router["cost_usd"] or 0.0) | ||
| x_router["session_acc"] = acc | ||
| return resp, usage or None, x_router |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Session meter counts failed streaming calls but not failed non-streaming calls.
_final_chunk_parts is also invoked on the streaming failure paths (_sse_gen error branch at Line 1041 and _flow_stream at Line 993, which runs before the ok check). Since observe() unconditionally does calls += 1, a failed streaming request inflates session_acc.calls. The non-streaming failure path (_openai_error_from_router) never meters, so the two paths diverge. Consider gating the fold on result.get("ok") for consistency.
Proposed guard
- if session:
+ if session and result.get("ok"):
acc = route_session_meter.observe(
session,
tokens_in=resp.get("tokens_in") or 0,
tokens_out=resp.get("tokens_out") or 0,
tokens_cached=resp.get("tokens_cached") or 0,
cost_usd=x_router["cost_usd"] or 0.0)
x_router["session_acc"] = acc📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Per-session meter on the STREAMING path too (mirrors the non-streaming | |
| # _router_response_to_openai). Streaming clients (e.g. opencode, stream:true) | |
| # otherwise never fold into the session total. Idempotent per request: this | |
| # runs once per final chunk. | |
| if session: | |
| acc = route_session_meter.observe( | |
| session, | |
| tokens_in=resp.get("tokens_in") or 0, | |
| tokens_out=resp.get("tokens_out") or 0, | |
| tokens_cached=resp.get("tokens_cached") or 0, | |
| cost_usd=x_router["cost_usd"] or 0.0) | |
| x_router["session_acc"] = acc | |
| return resp, usage or None, x_router | |
| # Per-session meter on the STREAMING path too (mirrors the non-streaming | |
| # _router_response_to_openai). Streaming clients (e.g. opencode, stream:true) | |
| # otherwise never fold into the session total. Idempotent per request: this | |
| # runs once per final chunk. | |
| if session and result.get("ok"): | |
| acc = route_session_meter.observe( | |
| session, | |
| tokens_in=resp.get("tokens_in") or 0, | |
| tokens_out=resp.get("tokens_out") or 0, | |
| tokens_cached=resp.get("tokens_cached") or 0, | |
| cost_usd=x_router["cost_usd"] or 0.0) | |
| x_router["session_acc"] = acc | |
| return resp, usage or None, x_router |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@shim.py` around lines 944 - 956, The session meter fold in _final_chunk_parts
is counting failed streaming requests because route_session_meter.observe is
called unconditionally whenever session is present. Add a guard so the
per-session meter is only updated for successful results, using result.get("ok")
(or the equivalent response status available in _final_chunk_parts) before
calling observe and setting x_router["session_acc"]. Keep the change localized
to _final_chunk_parts so streaming and non-streaming paths stay consistent with
_openai_error_from_router.
…ner)
GET /v1/session/{sid} authed only that the caller was *a* valid consumer,
never that the sid belonged to *that* consumer. route_session_meter is keyed
by sid alone, so any authed consumer could read any other consumer's session
economics (cost/tokens/cache) and `warm[]` — which discloses the real
family/provider/served_by peers serving someone else's conversation.
Bind sid -> owning consumer key and enforce it:
- route_session_meter: add _owner (sid->consumer key), guarded by _lock.
observe() gains owner=; records first-writer-wins (setdefault) so a consumer
reusing another's opaque sid cannot steal/overwrite ownership. add owner();
reset() clears it.
- shim: thread the authed consumer (the ingress proxy's x-llm-router-caller
header, captured onto ChatRequest.caller) into BOTH observe sites (streaming
_final_chunk_parts and non-streaming _router_response_to_openai) as owner=.
- shim /x/session/{sid}: when a caller header is present (the consumer-facing
path), only the owner may read; anyone else gets 404 — NOT 403, so the
endpoint never confirms another consumer's sid exists. Operator /x/* (no
header) stays unscoped.
- auth_proxy session_view: forward the authed caller as x-llm-router-caller so
the upstream meter scopes the read.
Tests: meter-level first-writer-wins binding (test_metering) and an end-to-end
endpoint test (test_shim): owner reads 200, other consumer/unknown sid 404,
operator unscoped.
…26) Reconstructed from Edgars Nemše's #24, carrying only the family half. The provider-adapter half is in its sibling PR; this one makes discovered OpenRouter marketplace families provider-neutral. What it does: - sources/openrouter.py: a discovered marketplace model's policy-facing family is the provider-neutral name (`openai/gpt-5-mini` -> `gpt-5-mini`), while `wire_model_id` keeps the exact OpenRouter slug for the wire. `service_aliases` (config) handle the canonicalization exceptions where stripping the vendor isn't the right family (dated/suffixed slugs). Curated families served by the static `openrouter` provider are deduped out, so a marketplace row never shadows a curated family. - This pairs with the engine's `provider_eq` (#18): family is the model, provider is a separate axis the algebra filters. It lets a single provider-agnostic policy span curated + marketplace routes for one model, which is the point of the router ("stop hardcoding models"). Core bump: `core` -> 97d0333 (unhardcoded-engine #22), which makes `served_model_id` the offer's wire id (`offer.wire_model_id or model_family`). With a neutral family + a distinct wire slug, the engine now wires the slug on both the curated and discovered paths without every adapter special-casing `wire_model_id`, and the replayable trace records the real wire id. The §3 determinism boundary holds: `model_meta.lua` stays curated-keyed (refresh_model_meta untouched), so an on-chain/genvm host lacking a discovered offer still fails closed. Pinned by test_openrouter_model_meta_still_keyed_by _curated_family and the dedup assertion in test_openrouter_discovery_derives_policy_families_from_raw_model_ids. Verification: nix-shell --run 'python -m pytest tests -q' -> 333 passed, 2 skipped, 0 failed, against the bumped core. No new dependency.
Follow-on to #18 (the work landed on cache-metrics after it was squash-merged, so it had no PR).
What
flowchosen has no price. Now aggregates cost + cached across nodes.usage.prompt_tokens_details.cached_tokens(not just x_router) so OpenAI-compatible clients (opencode) parse cache reads.X-Unhardcoded-Sessionas the conversation id (clients that can set headers but not body fields).GET /v1/session/{sid}(consumer-key authed) →{calls, tokens_in/out, tokens_cached, cost_usd, warm[]}, so a harness (opencode plugin) shows live economics without operator /x/* access.Verified live
flow
cost_usd 0.034,cached 2304;GET /v1/session/{sid}→ warm[gpt-5.5/openrouter, z-ai/glm-5.2/openrouter_market].Summary by CodeRabbit
New Features
Bug Fixes