
test: BDD user-flow suite (behave) + AntSeed dev-wallet setup #4

Merged
jmlago merged 1 commit into main from test/bdd-user-flows on Jun 22, 2026

jmlago (Member) commented Jun 22, 2026

A Gherkin/behave suite covering the unhardcoded user flows end-to-end against the live local stack, plus the local AntSeed dev-wallet setup.

What it verifies

Drives the same endpoints the dashboard renders from and asserts that the data is present and correct (not just an HTTP 200), plus real headless-browser (Chromium) checks that the operator actually SEES the data.
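To illustrate what "present and correct, not just 200" can mean inside a step, here is a minimal Python sketch. The helper `assert_tab_data` and the payload fields `rows` and `total` are hypothetical; the suite's real step definitions in features/steps/steps.py may check entirely different fields.

```python
from typing import Any, Mapping, Sequence


def assert_tab_data(status: int, payload: Mapping[str, Any],
                    required_fields: Sequence[str]) -> None:
    """Fail unless the endpoint answered 200 AND returned usable tab data.

    Hypothetical helper: `rows` and `total` are illustrative field names.
    """
    # A 200 alone is not enough: an empty or malformed body still renders a blank tab.
    assert status == 200, f"expected HTTP 200, got {status}"
    rows = payload.get("rows")
    assert isinstance(rows, list) and rows, "expected a non-empty 'rows' list"
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        assert not missing, f"row {i} is missing {missing}"
    # Cross-check a summary figure against the rows instead of only checking it exists.
    assert payload.get("total") == len(rows), "'total' disagrees with the rows returned"
```

A step would call it with the parsed response, e.g. `assert_tab_data(resp.status_code, resp.json(), ["id", "name"])`.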

  • user_flows.json — exhaustive catalogue of user flows in entry order (the spec the .features are lowered from).
  • features/ — 9 features: onboarding, auth, consumer API (policy_ir/flow_ir/streaming/errors), dashboard data (Analytics/Activity/Catalog/Config/Consumers/Provider-keys/Codex all correct), providers, consumer-key lifecycle (route/rate/revoke/inactive), AntSeed money, dashboard UI in a real browser, and flow1 (the GLM ∥ GPT→merge ensemble).
  • 48 scenarios green, 8 skipped (@manual docs + the gated real-money spend).
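One way a catalogue like user_flows.json could be lowered into Gherkin is sketched below: one feature text per group, with flows kept in entry order. The JSON schema (`flows`, `feature`, `name`, `steps`) and the function `lower_flows` are assumptions for illustration, not the file's actual layout.

```python
import json


def lower_flows(catalogue_json: str) -> dict[str, str]:
    """Lower a user-flow catalogue into one Gherkin text per feature.

    Assumed (hypothetical) schema:
      {"flows": [{"feature": str, "name": str, "steps": [str, ...]}, ...]}
    Flows keep their catalogue (entry) order, both across and within features.
    """
    bodies: dict[str, list[str]] = {}  # dicts preserve insertion order (3.7+)
    for flow in json.loads(catalogue_json)["flows"]:
        lines = bodies.setdefault(flow["feature"], [])
        lines.append(f"  Scenario: {flow['name']}")
        lines.extend(f"    {step}" for step in flow["steps"])
        lines.append("")  # blank line between scenarios
    return {feature: f"Feature: {feature}\n\n" + "\n".join(lines)
            for feature, lines in bodies.items()}
```

Keeping the catalogue as the single source and generating (or checking) the .feature files from it is what makes "exhaustive, in entry order" auditable.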

Money / safety

  • Read-only AntSeed scenarios verify the wallet/escrow data the dashboard shows; they auto-skip when the funded sidecar isn't up.
  • The real-money AntSeed spend is gated behind RUN_ANTSEED_SPEND=1 so it never runs by accident.
  • scripts/gen-dev-wallet.sh + .env.example + docs/PROVIDERS.md document a dev wallet setup (separate from prod): generate a throwaway key → fund the derived Base address → deposit from the dashboard.
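The gating above might look roughly like this in behave's features/environment.py. `before_scenario`, `scenario.effective_tags` and `scenario.skip()` are behave's hook and model API; the `AntSeed` tag comes from the commit message, while the `antseed_spend` tag, the `context.sidecar_up` flag and `skip_reason` are hypothetical names. Keeping the decision in a pure function makes it testable without behave or a running sidecar.

```python
import os
from typing import Iterable, Mapping, Optional

READ_ONLY_TAG = "AntSeed"        # read-only wallet/escrow checks
SPEND_TAG = "antseed_spend"      # hypothetical tag for the real-money scenario


def skip_reason(tags: Iterable[str], env: Mapping[str, str],
                sidecar_up: bool) -> Optional[str]:
    """Return why a scenario must not run, or None when it may run."""
    tags = set(tags)
    if SPEND_TAG in tags and env.get("RUN_ANTSEED_SPEND") != "1":
        # Real money: only an explicit opt-in runs it, never a default.
        return "real-money spend needs RUN_ANTSEED_SPEND=1"
    if tags & {READ_ONLY_TAG, SPEND_TAG} and not sidecar_up:
        return "funded AntSeed sidecar is not up"
    return None


def before_scenario(context, scenario):
    """behave hook for features/environment.py.

    `context.sidecar_up` is a hypothetical flag, e.g. set once in before_all
    after probing the sidecar.
    """
    reason = skip_reason(scenario.effective_tags, os.environ,
                         getattr(context, "sidecar_up", False))
    if reason:
        scenario.skip(reason)
```

Checking the opt-in before the sidecar probe means a missing `RUN_ANTSEED_SPEND=1` is always reported as the reason, even on a fully funded stack.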

How to run

nix-shell -p chromium chromedriver "python3.withPackages(ps: with ps; [selenium behave requests])" --run 'behave'

Free & repeatable: end-to-end chats route to Codex ($0).

Note

The browser regression scenario (an expanded Activity row surviving the 15 s auto-refresh) validates the fix in #3: merge #3 first, or run the suite against a stack built with that fix.

🤖 Generated with Claude Code

A Gherkin (behave) suite that drives the live stack end-to-end — the same
endpoints the dashboard renders — and asserts the data is present AND correct,
plus real headless-browser checks of the dashboard tabs.

- user_flows.json: exhaustive catalogue of user flows in entry order (the spec
  the features are lowered from).
- features/: 9 features (onboarding, auth, consumer API, dashboard data,
  providers, consumer keys, AntSeed money, dashboard UI in chromium, flow1).
  48 scenarios green; real-money AntSeed gated behind RUN_ANTSEED_SPEND=1;
  @manual/@AntSeed excluded/auto-skipped when their preconditions aren't met.
- scripts/gen-dev-wallet.sh + .env.example + docs/PROVIDERS.md: documented local
  AntSeed *dev* wallet setup (generate a throwaway key, fund the derived address,
  deposit from the dashboard) — dev/prod wallet separation.

Run: nix-shell -p chromium chromedriver "python3.withPackages(ps: with ps; [selenium behave requests])" --run 'behave'

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 22, 2026

Warning

Review limit reached

@jmlago, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 449ff0f9-ee0d-4c32-a1e8-b9ab9d74d53b

📥 Commits

Reviewing files that changed from the base of the PR and between 0ff3bb2 and 02e24d7.

📒 Files selected for processing (18)
  • .env.example
  • behave.ini
  • docs/PROVIDERS.md
  • features/01_onboarding.feature
  • features/02_auth.feature
  • features/03_consumer_api.feature
  • features/04_dashboard.feature
  • features/05_providers.feature
  • features/06_consumer_keys.feature
  • features/07_money_antseed.feature
  • features/08_dashboard_ui.feature
  • features/09_flow1.feature
  • features/environment.py
  • features/fixtures/flow1.json
  • features/steps/browser_steps.py
  • features/steps/steps.py
  • scripts/gen-dev-wallet.sh
  • user_flows.json

jmlago merged commit 8948b3a into main on Jun 22, 2026
1 check passed
jmlago added a commit that referenced this pull request Jun 22, 2026
test: BDD user-flow suite (behave) + AntSeed dev-wallet setup
jmlago added a commit that referenced this pull request Jun 29, 2026
…bump)

#3 of the operational-store migration: enrich the `calls` fact table with the
two raw per-call facts the #4 route/analytics views will derive from — the
executed route identity and the cache-token breakdown. Prerequisite for keying
per-route stats off the ledger.

- Submodule bump core 97d0333 -> 537e204 (unhardcoded-engine #23): the engine's
  `chosen` now carries `served_by` — the marketplace peer that served the call,
  or the provider itself for a direct route (never nil). Host suite green on it.
- host_store.py: `calls` gains `served_by TEXT` + `tokens_cached BIGINT`, applied
  to existing tables via idempotent `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`
  (CREATE TABLE IF NOT EXISTS never alters an existing table — the store gains its
  first in-place migration). insert_call maps both. route_key is left unchanged:
  deriving a peer-granular route key from served_by is #4's job; this commit only
  captures the raw fact.
- shim.py: `_build_x_router` surfaces `served_by` from `chosen` (tokens_cached was
  already there).
- auth_proxy.py: the ingress threads served_by + tokens_cached off x_router into
  the recorded call (both stream and unary paths) -> insert_call.
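The in-place migration pattern can be sketched with the stdlib sqlite3 module standing in for the Postgres store. SQLite has no `ADD COLUMN IF NOT EXISTS`, so this sketch reads `PRAGMA table_info` and only ALTERs missing columns; on Postgres the commit's single `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` does the same job. The minimal `calls` schema and the `migrate_calls` name are hypothetical.

```python
import sqlite3

# Columns the commit adds to `calls`. Postgres types would be TEXT / BIGINT;
# SQLite's INTEGER affinity stands in for BIGINT here.
NEW_CALL_COLUMNS = {"served_by": "TEXT", "tokens_cached": "INTEGER"}


def migrate_calls(conn: sqlite3.Connection) -> list[str]:
    """Ensure `calls` exists and has every column in NEW_CALL_COLUMNS; safe to rerun.

    CREATE TABLE IF NOT EXISTS is a no-op on an existing table, so columns added
    later need their own ALTER. Returns the columns this run actually added.
    """
    # Hypothetical minimal schema; the real `calls` table has many more columns.
    conn.execute("CREATE TABLE IF NOT EXISTS calls (id INTEGER PRIMARY KEY, route_key TEXT)")
    existing = {row[1] for row in conn.execute("PRAGMA table_info(calls)")}
    added = []
    for name, col_type in NEW_CALL_COLUMNS.items():
        if name not in existing:
            # Postgres equivalent: ALTER TABLE calls ADD COLUMN IF NOT EXISTS <name> <type>
            conn.execute(f"ALTER TABLE calls ADD COLUMN {name} {col_type}")
            added.append(name)
    return added
```

Existing rows keep their data and read NULL for the new columns, which matches "capture the raw fact" going forward without backfilling history.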

ttft was intentionally NOT added: nothing measures it yet, so the column would be
idle (Axis 3). error_type was already a column.

Verification: full suite 411 passed, 2 skipped, 0 failed against the compose
Postgres; the ALTER migration applies in place on boot; a live chat records
served_by + tokens_cached in `calls` end to end against engine #23.
jmlago added a commit that referenced this pull request Jun 29, 2026
… the filesystem (#38)

* feat(host-store): peer_offers — antseed market book off the filesystem

Move the antseed marketplace book from market.json (a file on a shared
volume, unioned by hand in merge-market.js) into the Postgres host store —
the next slice of the JSON/in-process migration after #36.

Form delta:
- Definition: a new `peer_offers` table holds one RAW row per (peer, service)
  — the seller's announced prices/cap/reputation as columns, not interpreted.
  The antseed sidecar is the sole writer (it runs `antseed network browse`);
  sources/antseed._load_market is the sole reader. The 15-min sliding window
  that merge-market.js unioned by hand is now a read-time filter on
  observed_at (WHERE observed_at >= now - window); the sidecar prunes rows
  past the window.
- Invariants: store raw, derive by query — no scoring host-side; the negative
  / cached>input / reputation gates stay in offers_sync. Fail-soft: a DB error
  degrades to "no antseed candidates" exactly as a missing dump did.
  Behaviour preserved: offers_sync / market_book unchanged.
- Irreversible: peer_offers is new DB state; market.json is retired.
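The write/prune/read split for peer_offers can be sketched with sqlite3 standing in for Postgres (SQLite 3.24+ accepts the same `ON CONFLICT ... DO UPDATE` upsert form). Timestamps here are epoch seconds passed in explicitly; the price columns and all function names are hypothetical, and the real writer is the Node write-market.js.

```python
import sqlite3

WINDOW_S = 15 * 60  # the 15-minute sliding window


def init_peer_offers(conn: sqlite3.Connection) -> None:
    # One raw row per (peer, service); observed_at is epoch seconds in this stand-in.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS peer_offers ("
        " peer_id TEXT NOT NULL, service TEXT NOT NULL,"
        " input_price REAL, output_price REAL,"
        " observed_at REAL NOT NULL,"
        " PRIMARY KEY (peer_id, service))")
    conn.execute("CREATE INDEX IF NOT EXISTS peer_offers_observed_at"
                 " ON peer_offers (observed_at)")


def upsert_offer(conn, peer_id, service, input_price, output_price, now):
    # Writer side: the latest observation of a (peer, service) replaces the old row.
    conn.execute(
        "INSERT INTO peer_offers (peer_id, service, input_price, output_price, observed_at)"
        " VALUES (?, ?, ?, ?, ?)"
        " ON CONFLICT (peer_id, service) DO UPDATE SET"
        " input_price = excluded.input_price,"
        " output_price = excluded.output_price,"
        " observed_at = excluded.observed_at",
        (peer_id, service, input_price, output_price, now))


def prune_offers(conn, now):
    # Writer side: in this sketch pruning only bounds table size; the reader's
    # filter already hides stale rows.
    conn.execute("DELETE FROM peer_offers WHERE observed_at < ?", (now - WINDOW_S,))


def recent_offers(conn, now):
    # Reader side: the window is a read-time filter, so stale rows never surface
    # even if the writer has not pruned them yet.
    return conn.execute(
        "SELECT peer_id, service, input_price, output_price FROM peer_offers"
        " WHERE observed_at >= ? ORDER BY peer_id, service",
        (now - WINDOW_S,)).fetchall()
```

Putting the window in the reader's WHERE clause means a stalled sidecar cannot leak stale offers; the index on observed_at keeps that filter cheap.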

Changes:
- host_store.py: peer_offers schema (PK (peer_id, service) + observed_at
  index) and a window-filtered peer_offers() reader; truncate hook updated.
- antseed/write-market.js: replaces merge-market.js — flattens the browse
  dump to (peer, service) rows, UPSERTs into peer_offers (type-cleaning at the
  write, mirroring the old Python coercion), prunes past the window.
- sources/antseed.py: _load_market reads host_store.peer_offers(); the file /
  staleness / flatten code and the now-dead coercion helpers are removed.
- Dockerfile.antseed: pin pg@8.16.3 + NODE_PATH so the writer can require it.
- compose.yml: DATABASE_URL + postgres dependency for the antseed service (it
  already shares the llm-router-internal network with postgres).
- tests: seed peer_offers (shared conftest helper) instead of market.json; new
  host_store peer_offers round-trip + window tests.

Sovereignty (Axis 4): pg is the boring standard Postgres client, pinned, and
lives only in the sidecar; no new Python dependency (psycopg is from #36).

Verification: full suite 409 passed, 2 skipped, 0 failed against the compose
Postgres; the real Node writer -> Postgres -> Python reader round-trip,
non-dump validation and window prune checked; the full stack boots healthy and
/x/market surfaces a seeded antseed peer end to end.

* feat(host-store): buyer_status — antseed buyer status off the filesystem

Twin of the peer_offers move: the antseed buyer's status (session pin +
escrow + wallet) goes from status-<id>.json on the shared volume to the
Postgres host store. With both off the filesystem, sources/antseed.py no
longer touches disk and the antseed-market volume is removed entirely.

Form delta:
- Definition: a new `buyer_status` table holds one row per buyer pid — the
  raw buyer-reported fields (pinned_peer_id, deposits_available/_reserved,
  wallet_address, connection_state) as columns. The antseed sidecar writes it
  (write-status.js on the poll loop + control.js after a wallet op);
  sources/antseed reads it (_pinned_peer + balances).
- Invariants: store raw — deposits stay the strings the buyer reports and are
  coerced on read, exactly as the JSON status was. Fail-soft: a missing row /
  store error degrades to "no pin, no balance" as a missing status file did.
  Behaviour preserved: _pinned_peer / balances unchanged but for the source.
- Irreversible: buyer_status is new DB state; status-<id>.json is retired and
  the antseed-market volume (+ both mounts) is dropped.
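Coercion on read and the fail-soft behaviour for buyer_status could look like the following sketch. `read_balances`, `buyer_balances` and the returned dict layout are hypothetical; only the column names pinned_peer_id, deposits_available and deposits_reserved come from the commit.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable, Mapping, Optional

Row = Optional[Mapping[str, object]]


def _coerce_amount(raw: object) -> Optional[Decimal]:
    # Deposits are stored as the strings the buyer reported; parse only on read.
    try:
        value = Decimal(str(raw))
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def read_balances(row: Row) -> dict:
    """Interpret a raw buyer_status row; a missing row means no pin and no balance."""
    if row is None:
        return {"pinned_peer_id": None, "available": None, "reserved": None}
    return {
        "pinned_peer_id": row.get("pinned_peer_id") or None,
        "available": _coerce_amount(row.get("deposits_available")),
        "reserved": _coerce_amount(row.get("deposits_reserved")),
    }


def buyer_balances(fetch_row: Callable[[int], Row], pid: int) -> dict:
    # Fail-soft: a store error degrades exactly like a missing row.
    try:
        row = fetch_row(pid)
    except Exception:
        row = None
    return read_balances(row)
```

Using Decimal rather than float avoids rounding the reported deposit strings before they are displayed.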

Changes:
- host_store.py: buyer_status schema + a buyer_status(pid) reader; truncate
  hook updated.
- antseed/store.js: shared buyer_status row shape + UPSERT, used by both
  writers so they can't drift.
- antseed/write-status.js: replaces the inline node -e + atomic_write; reads
  `buyer status --json`, UPSERTs buyer_status, validates (non-status -> no
  write).
- antseed/control.js: refreshStatus UPSERTs buyer_status via a pg pool instead
  of writing the file; still returns the fresh status for the HTTP response.
- antseed/entrypoint.sh: write_status calls write-status.js; the now-dead
  atomic_write helper is removed; comments updated.
- sources/antseed.py: _pinned_peer + balances read host_store.buyer_status;
  the file / json / Path / market_dir machinery is removed (no disk access).
- Dockerfile.antseed: COPY store.js + write-status.js.
- compose.yml: drop the antseed-market volume and its router/antseed mounts.
- tests: seed buyer_status (shared conftest helper) instead of status files;
  new host_store buyer_status round-trip/absent test.

Verification: full suite 410 passed, 2 skipped, 0 failed against the compose
Postgres; the real write-status.js -> Postgres -> Python reader round-trip and
non-status validation checked; all four sidecar JS files pass node --check;
the full stack boots healthy and creates buyer_status on boot.

* feat(host-store): calls carries served_by + tokens_cached (engine #23 bump)

* test(host-store): guard the peer_offers/buyer_status cross-language column contract

peer_offers and buyer_status are CREATEd by the Python host store but WRITTEN by
the Node antseed sidecar (write-market.js, antseed/store.js) and seeded by Python
test mimics (conftest). Three places must agree on the column set and nothing at
runtime makes them: the readers are fail-soft, so a renamed/added/dropped column
degrades antseed to "no candidates" silently -- and the unit suite can't see it,
because it seeds via the Python mimic, not the real Node writer (green proves the
reader works, not that Node and Python agree).

Add a static contract test that parses the column list out of all three sources
and asserts it matches per table. Pure text parsing: no DB, no node runtime, runs
in the ordinary unit suite; red on any drift (verified by injecting a rename).
The live behave e2e stays the only thing exercising the real Node writer; this
guards the part that drifts.
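A minimal sketch of such a static contract check: parse the column list out of the CREATE TABLE text and out of each writer's INSERT column list, then compare per table. The function names and the exact SQL spellings in the test strings are assumptions; the real test reads host_store.py, the sidecar JS files and conftest.

```python
import re

# Leading words of table constraints (not columns) inside a CREATE TABLE body.
_CONSTRAINT_WORDS = {"PRIMARY", "UNIQUE", "CONSTRAINT", "FOREIGN", "CHECK"}


def _paren_body(text: str, start: int) -> str:
    """Text inside the first parenthesised group at or after `start` (nesting-aware)."""
    open_at = text.index("(", start)
    depth = 0
    for i in range(open_at, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[open_at + 1:i]
    raise ValueError("unbalanced parentheses")


def _split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def create_table_columns(source: str, table: str) -> list[str]:
    m = re.search(rf"CREATE TABLE (?:IF NOT EXISTS )?{re.escape(table)}\b", source, re.I)
    if m is None:
        raise ValueError(f"no CREATE TABLE for {table}")
    parts = _split_top_level(_paren_body(source, m.end()))
    return [p.split()[0] for p in parts if p.split()[0].upper() not in _CONSTRAINT_WORDS]


def insert_columns(source: str, table: str) -> list[str]:
    m = re.search(rf"INSERT INTO {re.escape(table)}\b", source, re.I)
    if m is None:
        raise ValueError(f"no INSERT INTO {table}")
    return _split_top_level(_paren_body(source, m.end()))


def check_contract(table: str, ddl_src: str, node_src: str, seed_src: str) -> None:
    """Assert the Node writer and the Python seeder use exactly the DDL's columns."""
    expected = set(create_table_columns(ddl_src, table))
    for label, src in (("node writer", node_src), ("test seeder", seed_src)):
        got = set(insert_columns(src, table))
        assert got == expected, (f"{table}: {label} drifted; missing {sorted(expected - got)},"
                                 f" extra {sorted(got - expected)}")
```

The nesting-aware split matters because a DDL body contains commas inside `PRIMARY KEY (peer_id, service)` or `NUMERIC(20, 10)` that are not column separators.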

* fix(antseed): guard a non-hex ANTSEED_IDENTITY_HEX in the entrypoint

Prod runs the sidecar as the image now (not the inline node command), so the
entrypoint must keep the inline's safety: a CHANGE_ME / unset-secret placeholder
is not a valid identity and the CLI would reject it. Unset it when it isn't a
64-hex string so the buyer falls back to a generated key on the data volume
(matching the previous inline behaviour); the prod secret is a real hot-wallet.
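The entrypoint guard itself is shell, but its rule can be stated as a short Python sketch: keep ANTSEED_IDENTITY_HEX only when it is exactly 64 hex characters, otherwise unset it. `sanitize_identity` is a hypothetical name, and since the commit does not say whether a `0x` prefix is accepted, this sketch rejects it.

```python
import re
from typing import MutableMapping

_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def sanitize_identity(env: MutableMapping[str, str]) -> bool:
    """Keep ANTSEED_IDENTITY_HEX only if it is exactly 64 hex characters.

    Returns True when the variable is kept. Anything else (unset, empty, a
    CHANGE_ME placeholder, a 0x-prefixed value) is removed, so the buyer falls
    back to a key generated on its data volume.
    """
    value = env.get("ANTSEED_IDENTITY_HEX")
    if value is not None and _HEX64.fullmatch(value):
        return True
    env.pop("ANTSEED_IDENTITY_HEX", None)
    return False
```

Calling `sanitize_identity(os.environ)` before launching the buyer would apply the rule to the live process environment.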