Merged
26 changes: 26 additions & 0 deletions .env.example
@@ -32,3 +32,29 @@ PUBLIC_BASE_URL=http://127.0.0.1:8080/v1
RATE_PER_MIN=600
BURST=200
LOG_LEVEL=INFO

# ============================================================================
# AntSeed marketplace (OPTIONAL — only with `docker compose --profile antseed`)
# ============================================================================
# AntSeed lets the router buy inference from a decentralized marketplace, paid in
# REAL USDC on Base mainnet from a hot wallet you control. Off by default.
#
# ⚠️ ALWAYS use a DEDICATED DEV WALLET here with a tiny balance — NEVER your
# production wallet key. This var IS a private key: treat it like a password,
# never commit it (.env / .env.secrets are gitignored).
#
# Setup (3 steps):
# 1. Generate a dev wallet: ./scripts/gen-dev-wallet.sh (prints the two
# lines below; paste them into .env)
# 2. Bring it up: docker compose --profile antseed up -d --build
# 3. Get the address to fund: docker compose exec antseed antseed buyer balance --json
# then send a little USDC + ETH (gas) on **Base mainnet** to that address,
# and `deposit` it into escrow from the dashboard Catalog (wallet cell).
ANTSEED_IDENTITY_HEX=
# Shared secret enabling the dashboard's wallet self-service (deposit/withdraw).
# Same value on the router and the antseed sidecar. Unset => those endpoints 503.
ANTSEED_CONTROL_TOKEN=
# Wide outer spend ceilings (USD per million tokens). The real per-call price gate
# is the caller's Σ_pol policy; these are just rails.
ANTSEED_MAX_INPUT=1000
ANTSEED_MAX_OUTPUT=1000
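
Reviewer sketch (not part of this PR): a minimal pre-flight check for the AntSeed block above, assuming a plain `KEY=VALUE` env file. The helper names `parse_env` and `check_antseed_env` are hypothetical, and the "ceilings must be positive numbers" rule is an assumption; the 503 behaviour for an unset control token is taken from the comment above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_antseed_env(values: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not values.get("ANTSEED_IDENTITY_HEX"):
        problems.append("ANTSEED_IDENTITY_HEX is empty: run ./scripts/gen-dev-wallet.sh")
    if not values.get("ANTSEED_CONTROL_TOKEN"):
        problems.append("ANTSEED_CONTROL_TOKEN is unset: deposit/withdraw endpoints return 503")
    # Assumption: the spend ceilings are expected to be positive numbers.
    for name in ("ANTSEED_MAX_INPUT", "ANTSEED_MAX_OUTPUT"):
        raw = values.get(name, "")
        try:
            if float(raw) <= 0:
                problems.append(f"{name} must be positive, got {raw!r}")
        except ValueError:
            problems.append(f"{name} is not a number: {raw!r}")
    return problems
```

Running `check_antseed_env(parse_env(open(".env").read()))` before `docker compose --profile antseed up` would surface an empty wallet key or token early. It deliberately never prints the key itself.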
4 changes: 4 additions & 0 deletions behave.ini
@@ -0,0 +1,4 @@
[behave]
paths = features
tags = -manual
show_timings = true
14 changes: 14 additions & 0 deletions docs/PROVIDERS.md
@@ -46,6 +46,20 @@ The host pins the policy-selected peer per request via `x-antseed-pin-peer`
(the browse-mode buyer disables auto-selection), keeping peer choice inside
Σ_pol rather than an opaque buyer-side router.

### Local dev wallet (testing)

For local testing use a **dedicated dev wallet**, never your production key.
`./scripts/gen-dev-wallet.sh` prints a fresh `ANTSEED_IDENTITY_HEX` +
`ANTSEED_CONTROL_TOKEN` to paste into `.env`; bring the sidecar up
(`docker compose --profile antseed up -d`), read the derived address with
`docker compose exec antseed antseed buyer balance --json`, fund it with a little
USDC + ETH (gas) on Base mainnet, then **Deposit** into escrow from the dashboard Catalog
(wallet cell). Keep dev and prod wallet secrets separate. See `.env.example`.

> Note: the AntSeed deposits contract **locks** deposited funds — an immediate
> `withdraw` after a `deposit` reverts. Funds are safe in escrow and become
> withdrawable later, or are spent as the buyer routes paid calls.

### Running the node (vendored sidecar)

Built from `Dockerfile.antseed` (pinned `@antseed/cli`, `socat`) and run by
34 changes: 34 additions & 0 deletions features/01_onboarding.feature
@@ -0,0 +1,34 @@
Feature: Onboarding & setup — a new user gets a running, healthy stack
The clone/compose steps themselves are environment-level (@manual: cannot be
re-run inside the suite); here we assert their OUTCOME on the running stack.

@p0 @onboarding
Scenario: The core engine submodule is populated (recursive clone outcome)
Then the file "core/router.lua" exists
And the file "core/llm_policy.lua" exists

@p0 @onboarding
Scenario: The stack is up and healthy (compose up outcome)
Given the stack is healthy
When I GET "/healthz" as none
Then the status is 200
And the field "ok" equals "True"

@p0 @onboarding
Scenario: The router loaded its catalog (engine embedded + config.live.lua)
Given I have a caller token
When I GET "/v1/models" as consumer
Then the status is 200
And the array "data" has at least 5 items

@manual @onboarding
Scenario: Recursive clone (manual — run once on a fresh machine)
# git clone --recursive https://github.com/genlayerlabs/unhardcoded.git
# -> core/ submodule populated; covered by the 'submodule populated' outcome above.
Given the stack is healthy

@manual @onboarding
Scenario: docker compose up --build (manual — environment setup)
# cp .env.example .env.secrets; fill secrets; docker compose up -d --build
# -> router + ingress healthy; covered by the 'stack up and healthy' outcome above.
Given the stack is healthy
39 changes: 39 additions & 0 deletions features/02_auth.feature
@@ -0,0 +1,39 @@
Feature: Authentication — dashboard sessions and the caller bearer contract

Background:
Given the stack is healthy

@p0 @auth
Scenario: DASHBOARD_NO_AUTH grants local admin to the console API
When I GET "/dashboard/api/stats" as admin
Then the status is 200
And the field "viewer_role" equals "admin"

@p0 @auth
Scenario: A valid caller bearer token is accepted on /v1
Given I have a caller token
When I GET "/v1/models" as consumer
Then the status is 200

@p0 @auth
Scenario: A missing caller token is rejected on /v1
When I GET "/v1/models" as none
Then the status is 401
And the field "error.code" equals "caller_auth"

@p1 @auth
Scenario: A consumer can log into the dashboard with their API key (scoped session)
Given I have a caller token
When I log into the dashboard with my caller key
Then the status is 200
And the field "role" equals "consumer"

@manual @auth
Scenario: Admin password login (manual — needs DASHBOARD_PASSWORD_SHA256 set and NO_AUTH off)
# POST /dashboard/login {password} -> sets an admin session cookie.
# Not auto-tested: the local dev stack runs with DASHBOARD_NO_AUTH=1.
Given the stack is healthy

@manual @auth
Scenario: Trusted-header SSO admin (manual — needs a reverse proxy injecting the header+secret)
Given the stack is healthy
94 changes: 94 additions & 0 deletions features/03_consumer_api.feature
@@ -0,0 +1,94 @@
Feature: Consumer API flows (/v1) — the calling service's surface
As a consuming service I call /v1 with my bearer token and the router
decides/falls-back over the operator's provider keys. All end-to-end chats
here route to codex ($0) so the suite is free.

Background:
Given the stack is healthy
And I have a caller token

@p0 @api
Scenario: List the routable model catalog
When I GET "/v1/models" as consumer
Then the status is 200
And the field "object" equals "list"
And the array "data" has at least 5 items
And the array "data" includes an item where "id" equals "profile:default"

@p0 @api
Scenario: Chat completion runs a policy and returns a real answer + trace
When I POST a free chat as consumer
Then the status is 200
And the field "object" equals "chat.completion"
And the field "choices[0].message.content" is non-empty
And the field "usage.total_tokens" is a number
And the field "x_router.provider" is non-empty
And the field "x_router.served_model_id" is non-empty
And the field "x_router.decision_trace" is present

@p0 @api
Scenario: Per-call policy_ir is admitted and executed
When I POST "/v1/chat/completions" as consumer with json
"""
{"model":"","max_tokens":16,"messages":[{"role":"user","content":"hi"}],
"policy_ir":["policy",
["and",["meets_req"],["not",["is","disabled"]],["family_eq","gpt-5.5"]],
["neg",["normalize",["field","price_in"]]],
["argmax"],["id"],["always",{"action":"next_candidate"}]]}
"""
Then the status is 200
And the field "x_router.policy_fingerprint" is present
And the field "choices[0].message.content" is non-empty

@p1 @api
Scenario: Malformed policy_ir is rejected cleanly at admission (no spend)
When I POST "/v1/chat/completions" as consumer with json
"""
{"model":"","messages":[{"role":"user","content":"hi"}],
"policy_ir":["policy","not-a-valid-term"]}
"""
Then the status is 400
And the field "error.type" equals "invalid_request_error"
And the field "error.message" contains "policy_ir"

@p0 @api
Scenario: Sigma_flow DAG runs and returns the sink answer with a per-node trace
When I POST a free flow as consumer
Then the status is 200
And the field "x_router.provider" equals "flow"
And the field "choices[0].message.content" is non-empty
And the array "x_router.decision_trace.flow_nodes" has at least 2 items
And every item in "x_router.decision_trace.flow_nodes" has a "provider"
And every item in "x_router.decision_trace.flow_nodes" has a "served_model_id"

@p1 @api
Scenario: Malformed flow_ir is rejected at admission
When I POST "/v1/chat/completions" as consumer with json
"""
{"model":"","messages":[{"role":"user","content":"hi"}],
"flow_ir":["flow",{"out":{"kind":"output","inputs":["missing"]}}]}
"""
Then the status is 400
And the field "error.message" contains "flow_ir"

@p1 @api
Scenario: Per-key usage self-service is scoped and sanitized
When I POST a free chat as consumer
And I GET "/v1/usage?window=24h" as consumer
Then the status is 200
And the field "kind" equals "router_key_usage"
And the field "key_sha256_prefix" is non-empty
And the field "totals.requests" is at least 1
And the field "consumer_settings.status" is present

@p0 @api
Scenario: Missing bearer token is rejected
When I GET "/v1/models" as none
Then the status is 401
And the field "error.code" equals "caller_auth"

@p0 @api
Scenario: Unknown bearer token is rejected
When I GET "/v1/models" as bad
Then the status is 401
And the field "error.code" equals "caller_auth"
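
Reviewer sketch (not part of this PR): the scenarios above address response fields with paths such as `error.code` and `choices[0].message.content`. The step definitions are not shown in this diff, so the following is only a guess at how such a path could be resolved. `resolve`, `field_equals` and `field_is_present` are hypothetical names, and comparing via `str(...)` is an assumption that would explain why `"ok"` is checked against `"True"`.

```python
import re

# One token is either a key name or a [n] list index; dots are separators.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(body, path):
    """Walk a dotted path with optional [n] indexes; raise KeyError if absent."""
    current = body
    for name, index in _TOKEN.findall(path):
        try:
            current = current[int(index)] if index else current[name]
        except (KeyError, IndexError, TypeError):
            raise KeyError(path) from None
    return current


def field_equals(body, path, expected):
    """Compare the stringified value, so JSON true matches the text "True"."""
    return str(resolve(body, path)) == expected


def field_is_present(body, path):
    """True when the path resolves, even if the value is empty or null."""
    try:
        resolve(body, path)
    except KeyError:
        return False
    return True
```

For example, `resolve({"choices": [{"message": {"content": "hi"}}]}, "choices[0].message.content")` returns `"hi"`, while a path into a missing key or an out-of-range index is reported as absent rather than raising a bare `IndexError`.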
109 changes: 109 additions & 0 deletions features/04_dashboard.feature
@@ -0,0 +1,109 @@
Feature: Dashboard data — what the operator console renders MUST be present and correct
The dashboard is a thin renderer of /dashboard/api/*. These scenarios assert the
backing data is complete and correct (so the frontend shows real, correct values
in Analytics, Activity, Catalog, Config, Consumers, Provider keys). Seeded
activity (one chat + one flow) is created in before_all.

Background:
Given the stack is healthy

@p0 @dashboard
Scenario: The dashboard HTML page loads with all its tabs and renderers
When I GET "/dashboard" as admin
Then the status is 200
And the response text contains "Analytics"
And the response text contains "Builder"
And the response text contains "Activity"
And the response text contains "Catalog"
And the response text contains "Config"
And the response text contains "renderActivity"
And the response text contains "renderAnalytics"

@p0 @dashboard
Scenario: Analytics — totals, breakdowns and health are populated
When I GET "/dashboard/api/stats" as admin
Then the status is 200
And the field "viewer_role" equals "admin"
And the field "totals.requests" is at least 1
And the field "totals.tokens_total" is a number
And the field "totals.cost_usd" is a number
And the field "by_provider" is non-empty
And the field "by_status" is non-empty
And the field "health_summary" is present
And the array "daily_totals" has at least 1 items

@p0 @dashboard
Scenario: Activity — recent requests carry a full, correct per-request trace
When I POST a free chat as consumer
And I POST a free flow as consumer
And I GET "/dashboard/api/stats" as admin
Then the status is 200
And the array "recent" has at least 2 items
And every item in "recent" has a "status"
And every item in "recent" has a "ts"
And the array "recent" includes an item where "provider" equals "flow"
And the array "recent" includes an item where "provider" equals "openai"

@p0 @dashboard
Scenario: Catalog (Market) — families list with prices and per-seller perf
When I GET "/dashboard/api/market" as admin
Then the status is 200
And the array "families" has at least 3 items
And every item in "families" has a "family"
And every item in "families" has a "quality"
And every item in "families" has a "rows"
And the array "families" includes an item where "family" equals "gpt-5.5"

@p0 @dashboard
Scenario: Policies — the default profile and live providers with health
When I GET "/dashboard/api/policies" as admin
Then the status is 200
And the array "profiles" includes an item where "name" equals "default"
And the field "providers" is non-empty
And every item in "providers" has a "health"

@p1 @dashboard
Scenario: Builder field vocabulary is available
When I GET "/dashboard/api/fields" as admin
Then the status is 200
And the array "fields" includes an item where "name" equals "price_in"
And the array "fields" includes an item where "name" equals "latency_ms"
And the array "fields" includes an item where "name" equals "success_rate"

@p1 @dashboard
Scenario: Config — per-provider tunable knobs are present
When I GET "/dashboard/api/config" as admin
Then the status is 200
And the field "knobs" is non-empty

@p1 @dashboard
Scenario: Consumers — the test consumer is listed with stats
When I GET "/dashboard/api/keys" as admin
Then the status is 200
And the array "keys" includes an item where "consumer" equals "bdd-test"

@p1 @dashboard
Scenario: Provider keys — credentials snapshot is privatized but present
When I GET "/dashboard/api/provider-keys" as admin
Then the status is 200
And the field "rows" is non-empty

@p1 @dashboard
Scenario: Codex accounts — an active account is configured
When I GET "/dashboard/api/codex/accounts" as admin
Then the status is 200
And the field "accounts" is non-empty
And the field "active" is non-empty
And the field "activity" is present

@p1 @dashboard
Scenario: Builder dry-run ranking (policy preview) returns an ordering (no spend)
When I POST "/dashboard/api/policy/preview" as admin with json
"""
{"policy_ir":["policy",
["and",["meets_req"],["not",["is","disabled"]],["family_eq","gpt-5.5"]],
["neg",["normalize",["field","price_in"]]],
["argmax"],["id"],["always",{"action":"next_candidate"}]]}
"""
Then the status is 200
And the field "ranked" is non-empty
42 changes: 42 additions & 0 deletions features/05_providers.feature
@@ -0,0 +1,42 @@
Feature: Providers — OpenRouter, Codex, discovery and registered model traits
Asserts the configured providers are live and the registered benchmark/modality
fields (model_meta) are part of the field vocabulary the builder/policies use.

Background:
Given the stack is healthy

@p0 @providers
Scenario: OpenRouter and Codex providers are present with health
When I GET "/dashboard/api/policies" as admin
Then the status is 200
And the array "providers" includes an item where "name" equals "openrouter"
And the array "providers" includes an item where "name" equals "openai"
And every item in "providers" has a "health"

@p1 @providers
Scenario: Codex is configured as a ChatGPT-subscription (openai_codex) provider
When I GET "/dashboard/api/policies" as admin
Then the status is 200
And the array "providers" includes an item where "name" equals "openai"
And the matched item field "api_kind" equals "openai_codex"

@p1 @providers
Scenario: A Codex account is active (auth wired through)
When I GET "/dashboard/api/codex/accounts" as admin
Then the status is 200
And the field "accounts" is non-empty
And the field "active" is non-empty

@p1 @providers
Scenario: Registered model traits (model_meta benchmarks) are in the field vocabulary
When I GET "/dashboard/api/fields" as admin
Then the status is 200
And the array "fields" includes an item where "name" equals "bench_intelligence"
And the array "fields" includes an item where "name" equals "bench_coding"

@p1 @providers
Scenario: The discovered catalog exposes routable families
Given I have a caller token
When I GET "/v1/models" as consumer
Then the status is 200
And the array "data" includes an item where "id" equals "family:gpt-5.5"
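
Reviewer sketch (not part of this PR): the array steps used across these features ("includes an item where", "every item in ... has a", "has at least N items", and the follow-up "the matched item field") could be backed by helpers along these lines. The function names are hypothetical and the real step definitions may differ. Treating an empty array as a failure for "every item has" is a design assumption, chosen so that an empty response cannot pass vacuously.

```python
def includes_item_where(items, key, expected):
    """Return the first dict whose `key` stringifies to `expected`, else None.

    Returning the item (not just a bool) lets a later step such as
    'the matched item field "api_kind" equals ...' inspect the same object.
    """
    for item in items:
        if isinstance(item, dict) and key in item and str(item[key]) == expected:
            return item
    return None


def every_item_has(items, key):
    """True only for a non-empty list whose items are all dicts containing `key`."""
    return bool(items) and all(isinstance(i, dict) and key in i for i in items)


def has_at_least(items, n):
    """True when `items` is a list with at least `n` elements."""
    return isinstance(items, list) and len(items) >= n
```

With `providers = [{"name": "openai", "api_kind": "openai_codex", "health": "ok"}]`, `includes_item_where(providers, "name", "openai")` returns that dict, which is what the `api_kind` follow-up check in the Codex scenario would then read.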