Decentralized EIREL validation: validator-side execution, consensus weights, auditability#6

Open
kyron1112567 wants to merge 13 commits into main from feat/decentralized-validation

Conversation

@kyron1112567

Contributor

Summary

Decentralizes EIREL validation so no single party — including the owner — can forge or bias a winner, and makes the outcome auditable. Execution and measurement move to the validators; the owner-api shrinks to a registry/roster/task distributor. These 13 commits span the consensus layer, validator-side execution, the validator deployment stack, and auditability hardening, plus three integration bugs caught by an end-to-end live test.

What changed

Consensus & weights

  • Stake-weighted-median gossip consensus for weight setting, gated by EIREL_WEIGHT_MODE (owner → shadow → decentralized).

Validator-side execution (EIREL_EXEC_MODE=local)

  • Each validator runs miner submissions on its own k3s behind its own provider-proxy + tool services + a ledger-only owner-api sink, measuring response/latency/cost/tool-use first-hand.
  • Owner-signed roster + hash-pinned archives make what code runs tamper-evident.
  • k8s validator-stack manifests for the validator-local stack.

Auditability

  • local-exec ignores the owner's cached_baseline so the grading reference is validator-independent.
  • On-chain score-hash commitment (Bittensor commitments pallet) makes validator equivocation provable (EIREL_CONSENSUS_REQUIRE_COMMITMENT, default off).
  • Durable signed score archive (off-chain pre-image of the on-chain commitment) + restart-safe score store.
  • Shadow-mode divergence gate — persisted, hotkey-signed, served at GET /v1/consensus/divergence.

Fixes caught by the live end-to-end test

  • Sign the bare path for owner-api reads: the owner verifies request.url.path (no query string), so signing a path that includes ?family_id= or ?job_id= yields a signature the owner cannot reproduce, and the request returns 401. This was fatal to the local-exec roster fetch and also silently emptied owner-mode tool attestation.
  • EIREL_OWNER_DISABLE_RUNTIME_RECONCILE (default off) so a decentralized owner-api doesn't demote validator-hosted deployments.
  • Raise the eval ledger job_id cap 64 → 128 (the local-exec cost tag task-eval={task_id};deployment={uuid} is ~94 chars).

Validation

  • Live end-to-end test on a fully isolated stack (cloned DB → eirel_test, dedicated eirel-test-* namespaces, zero prod impact): owner-signed roster verified → archive hash-pinned → miner pod booted on the validator's own k3s → invoked locally → cost measured first-hand → composite scored → served as a validator-signed consensus map + signed divergence summary.
  • 651 passed, 1 skipped.

Deploy note

The signed-path + job_id fixes re-activate owner-mode tool attestation, which had been silently inert in prod since 2026-05-07 (benign — ledger_tools always empty but no false knockouts, since required_tool wasn't firing). Watch task_miner_results.ledger_tools after deploy.

🤖 Generated with Claude Code

…ed_claims)

- dashboard TTLCache: add a periodic sweep + 512-entry cap so a churning
  keyspace (run_id / hotkey / pagination) can't grow without bound. get()
  only evicted the key it was asked for, so stale entries accumulated and
  OOMKilled owner-api ~every 3 days. Now hard-bounded.
- eval scoring: anchor grounded_correctness on oracle expected_claims; drop
  the static expected_answer / must_not_claim / abstention_probe paths across
  the validator engine, reconciler, metrics, judge_client, and owner scoring.
- tests updated to match the refactor.
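
The TTLCache fix can be sketched as below. This is a minimal illustration, not the dashboard's actual class: the name `BoundedTTLCache`, its constructor parameters, and the oldest-first eviction policy are assumptions. Only the periodic sweep and the 512-entry cap come from the description above.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class BoundedTTLCache:
    """Hypothetical TTL cache with a periodic sweep and a hard entry cap."""

    def __init__(self, ttl: float, max_entries: int = 512,
                 sweep_every: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._max = max_entries
        self._sweep_every = sweep_every
        self._clock = clock
        self._last_sweep = clock()
        self._data = OrderedDict()  # key -> (expires_at, value)

    def _maybe_sweep(self, now: float) -> None:
        # Periodically drop every expired entry, not only the requested key.
        if now - self._last_sweep < self._sweep_every:
            return
        for key in [k for k, (exp, _) in self._data.items() if exp <= now]:
            del self._data[key]
        self._last_sweep = now

    def set(self, key: str, value: Any) -> None:
        now = self._clock()
        self._maybe_sweep(now)
        self._data.pop(key, None)
        self._data[key] = (now + self._ttl, value)
        # Hard bound: evict the oldest insertions once over the cap.
        while len(self._data) > self._max:
            self._data.popitem(last=False)

    def get(self, key: str) -> Optional[Any]:
        now = self._clock()
        self._maybe_sweep(now)
        item = self._data.get(key)
        if item is None:
            return None
        if item[0] <= now:
            del self._data[key]
            return None
        return item[1]

    def __len__(self) -> int:
        return len(self._data)
```

With the cap in place, a churning keyspace costs at most `max_entries` live entries regardless of how many distinct keys are written between sweeps.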

Owner-side re-judge (re_judge_handler + its evaluation_task_manager /
evaluation_tasks hooks) intentionally left uncommitted as WIP.

Add a consensus package so validators set on-chain weights from their own
gossiped, stake-weighted-median-aggregated scores instead of copying the
owner-api /v1/weights pick (which made the owner a single point of trust).

- consensus/aggregate.py: stake-weighted median per miner + winner-take-all
  with dethrone-by-margin hysteresis (tolerates <50% dishonest stake)
- consensus/score_store.py: validator-local retention of its own per-miner
  composites (mirrors owner's coalesce(final_task_score, agreement_score))
- consensus/discovery.py: peer validators/stake/axons read from the metagraph,
  not the owner-api (owner can't censor or partition the validator set)
- consensus/gossip.py: sign own scores; fetch + verify peers (served hotkey
  must match the axon's metagraph hotkey; run/family pinned against replay)
- consensus/weights.py: orchestrate discovery -> gossip -> median -> winner
  under quorum gating; incumbent read from metagraph.I
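
A rough sketch of the aggregation step, under stated assumptions: the function names and signatures are illustrative rather than the ones in consensus/aggregate.py, the median shown is the lower weighted median, and the dethrone margin is treated as an additive constant with a made-up default of 0.05.

```python
from typing import Mapping, Optional


def stake_weighted_median(scores: Mapping[str, float],
                          stakes: Mapping[str, float]) -> Optional[float]:
    """Lower stake-weighted median of one miner's scores across validators.

    `scores` maps validator hotkey -> reported score, `stakes` maps
    validator hotkey -> stake. Validators without positive stake are ignored.
    """
    weighted = sorted(
        (score, stakes[v]) for v, score in scores.items()
        if stakes.get(v, 0.0) > 0
    )
    total = sum(weight for _, weight in weighted)
    if total <= 0:
        return None
    running = 0.0
    for score, weight in weighted:
        running += weight
        if running >= total / 2:
            return score
    return weighted[-1][0]


def pick_winner(medians: Mapping[str, float],
                incumbent: Optional[str],
                margin: float = 0.05) -> Optional[str]:
    """Winner-take-all with hysteresis: the incumbent keeps the crown
    unless a challenger beats it by more than `margin` (assumed additive)."""
    if not medians:
        return incumbent
    # Sort first so ties resolve deterministically on every validator.
    challenger = max(sorted(medians), key=medians.__getitem__)
    if incumbent is None or incumbent not in medians:
        return challenger
    if challenger != incumbent and medians[challenger] > medians[incumbent] + margin:
        return challenger
    return incumbent
```

Because the median only moves once half of the stake sits on one side of it, a minority of stake reporting extreme values (high or low) cannot drag a miner's aggregate outside the range reported by the honest majority.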

Integration is behind EIREL_WEIGHT_MODE: owner (default, no behavior change) /
shadow (compute consensus and log divergence vs the owner pick) / decentralized
(set the consensus winner; hold the on-chain incumbent on a quorum miss).
main.py serves the signed GET /v1/consensus/scores/{run_id}; engine.py captures
per-miner scores in the eval loop and runs consensus in the weight loop.

49 consensus tests pass.
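
The three EIREL_WEIGHT_MODE behaviors described above reduce to a small decision, sketched here. The function name and parameters are hypothetical, and the real engine.py wiring (logging, divergence recording) is omitted.

```python
from typing import Optional

VALID_MODES = ("owner", "shadow", "decentralized")


def select_weight_target(mode: str,
                         owner_pick: Optional[str],
                         consensus_winner: Optional[str],
                         incumbent: Optional[str],
                         quorum_met: bool) -> Optional[str]:
    """Return the miner hotkey that should receive the on-chain weight."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown EIREL_WEIGHT_MODE: {mode!r}")
    if mode in ("owner", "shadow"):
        # Shadow mode computes consensus for comparison only; the owner
        # pick is still what gets set.
        return owner_pick
    # Decentralized: hold the on-chain incumbent when quorum is missed.
    if quorum_met and consensus_winner is not None:
        return consensus_winner
    return incumbent
```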
…or-side

Remove the owner from the evaluation data path. With EIREL_EXEC_MODE=local a
validator deploys miner submissions on its OWN k3s, behind its own provider-proxy
and tool services, and measures response/latency/cost/tool-use first-hand — so
the stake-weighted-median consensus no longer aggregates inputs from one
forgeable owner oracle. Default EIREL_EXEC_MODE=owner is unchanged.

Owner-signed roster + archive integrity (anti-equivocation):
- shared/common/roster.py: build/sign/verify a per-run roster; each member
  carries hotkey + submission_id + archive_sha256 + manifest_json under the
  owner signature. Validators pin the owner hotkey and refuse any archive whose
  bytes don't match the committed hash.
- owner-api GET /v1/runs/{run_id}/roster + .../submissions/{id}/archive.
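
The archive hash pin amounts to the check below. This is a sketch only: `ArchiveMismatch` and `check_archive` are hypothetical names, the member dict mirrors the roster fields listed above, and verification of the owner signature over the roster (which must happen first) is not shown.

```python
import hashlib
import hmac
from typing import Mapping


class ArchiveMismatch(Exception):
    """Raised when downloaded bytes do not match the roster's committed hash."""


def check_archive(member: Mapping[str, str], archive_bytes: bytes) -> bytes:
    """Return the archive bytes only if they hash to member['archive_sha256']."""
    digest = hashlib.sha256(archive_bytes).hexdigest()
    committed = member["archive_sha256"].lower()
    if not hmac.compare_digest(digest, committed):
        raise ArchiveMismatch(
            f"submission {member.get('submission_id')}: "
            f"got {digest}, roster committed {committed}"
        )
    return archive_bytes
```

Since the hash sits under the owner signature, serving different bytes to different validators would require either a hash collision or two signed rosters for the same run, and the latter is itself evidence of equivocation.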

Validator-local stack:
- EIREL_OWNER_API_ROLE=ledger: owner-api stripped to the tool-call attestation
  sink (health + /v1/internal/eval/*) on a local store, with bearer read-auth —
  the validator's own ledger. Default "full" role byte-for-byte unchanged.
- _build_network_policy: pure-pod egress now reaches all four tool services;
  host-NAT branch tool ports corrected (add rag 18088, drop spurious 18086).
- docker-compose.validator.local.yml overlay: redis + provider-proxy + 4 tools
  + ledger-sink + engine wiring. .env.validator.example local-exec contract.

Execution path:
- validation/validator/local_exec/runtime_controller.py LocalRunController:
  fetch+verify roster, pull+hash-check archives, deploy roster ∩ claimed
  hotkeys via the reused KubernetesMinerRuntimeManager, teardown/reconcile.
- engine.py: lazy ensure_run on first task; _invoke_one_miner redirects to the
  local pod and stamps X-Eirel-Job-Id/-Job-Token; _judge_miner reads cost from
  the local proxy and attestation from the local ledger; teardown when drained.
- shared/common/job_attribution.py: cost-tag + job-token-header minting shared
  by owner-api runtime.py and the validator so both derive identical values.

Fully decentralized = EIREL_EXEC_MODE=local + EIREL_WEIGHT_MODE=decentralized.
53 new tests; 710 pass across validator/common/owner_api, no regression.

All-in-k3s topology for EIREL_EXEC_MODE=local: provider-proxy + the four tool
services + redis + ledger-sink (owner-api in EIREL_OWNER_API_ROLE=ledger) as
pods in the eirel-system namespace. Pod `app:` labels and the namespace
`name: eirel-system` label match the miner NetworkPolicy pure-pod egress branch,
so miner pods in eirel-miners reach exactly these and nothing else.

- namespace.yaml: eirel-system + eirel-miners, both with the `name` label the
  NetworkPolicy namespaceSelector matches.
- configmap.yaml / secret.example.yaml: shared non-secret config; validator-
  generated tokens + provider keys template (never committed filled, never in a
  miner pod).
- deployments.yaml / services.yaml: 7 deployments + 7 services; sandbox keeps the
  hardened securityContext (non-root, drop-all, read-only rootfs, tmpfs, mem cap).
- README.md: deploy steps + the engine env that points at this stack.

`kubectl kustomize` builds 17 objects; all manifests parse.
… cutover

Shadow mode logged each run where the consensus winner diverged from the owner
pick, but nothing tracked the agreement rate across runs, so there was no
quantitative gate for flipping EIREL_WEIGHT_MODE to decentralized.

- consensus/divergence.py: top1_agreement, spearman_rank_correlation (tie-
  corrected), topk_overlap, and DivergenceTracker — a windowed accumulator that
  dedupes re-armed continuous-pool runs and reports running winner-agreement
  rate + mean rank rho.
- engine.py shadow branch records (consensus winner vs owner winner, local score
  map) per run and logs the running summary, e.g.
  "shadow divergence over 18 run(s): winner_agreement=94% mean_rank_rho=0.971".
  owner/decentralized modes untouched.

9 new tests; 311 pass across validator/common.
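
For reference, the three per-run metrics can be sketched as follows. The function names match the commit message, but the signatures, the return conventions (None when a metric is undefined), and the tie-breaking by miner id are assumptions. Tie correction is done the standard way, by assigning average ranks and taking the Pearson correlation of the ranks.

```python
from math import sqrt
from typing import List, Mapping, Optional


def _top(scores: Mapping[str, float], k: int) -> List[str]:
    # Highest score first; miner id breaks ties deterministically.
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]


def top1_agreement(a: Mapping[str, float], b: Mapping[str, float]) -> Optional[bool]:
    if not a or not b:
        return None
    return _top(a, 1) == _top(b, 1)


def topk_overlap(a: Mapping[str, float], b: Mapping[str, float], k: int) -> Optional[float]:
    if k <= 0 or not a or not b:
        return None
    return len(set(_top(a, k)) & set(_top(b, k))) / k


def _average_ranks(values: List[float]) -> List[float]:
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # tied values share the mean of their ranks
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rank_correlation(a: Mapping[str, float],
                              b: Mapping[str, float]) -> Optional[float]:
    keys = sorted(set(a) & set(b))
    if len(keys) < 2:
        return None  # undefined with fewer than two overlapping miners
    ra = _average_ranks([a[k] for k in keys])
    rb = _average_ranks([b[k] for k in keys])
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return None  # one side is entirely tied, so rho is undefined
    return cov / sqrt(va * vb)
```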
…r variance

Two decentralization open questions resolved:

- Cost in scoring (verified, no behavior change): cost is NOT a scoring input
  today — score_latency_cost skips the cost ramp when cost_budget_usd is unset,
  and judge_eval_composite omits cost_budget/floor so the eiretes composite has
  no threshold to score cost_usd against (cost is leaderboard-display only). So
  per-validator price variance under EIREL_EXEC_MODE=local cannot pollute the
  stake-weighted-median consensus. Added inline guard notes at both engine cost
  call sites: wiring a cost budget/floor later requires per-validator cost
  normalization first, or raw dollars (which differ by each validator's provider
  prices) would skew the median.

- Ledger durability: ledger-sink now stores its SQLite attestation DB on a PVC
  (validator-stack/pvc.yaml, default StorageClass) instead of emptyDir, so a pod
  restart mid-run no longer zeroes the tool-call ledger. README documents the
  emptyDir fallback for clusters without a default StorageClass.

311 pass across validator/common; kustomize build now 18 objects.
… exec

The audit found the last owner scoring lever surviving EIREL_EXEC_MODE=local:
when the claim carried a cached_baseline, the validator used the owner's
response_text + reconciled expected_claims as the pairwise comparator and
grounded-correctness gold, skipping its own oracle — so the owner could still
shape what counts as "correct."

In local/decentralized exec the grading reference must be validator-independent.
_should_use_cached_baseline() now returns False whenever EIREL_EXEC_MODE=local
(regardless of a present cached_baseline), so each validator recomputes its own
3-oracle reference; the stake-weighted-median consensus tolerates the
per-validator variance. The local-oracle path already falls back to "disputed"
on any oracle error, so bypassing the cache is safe. Owner mode keeps the cache
as a cross-cycle cost optimization, unchanged.

5 new cases; 316 pass across validator/common.
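
The gate described above is small enough to sketch in full. This is an illustrative stand-in for _should_use_cached_baseline(), with an assumed signature and an assumed claim shape (a mapping with an optional `cached_baseline` key); the real helper may read its inputs differently.

```python
import os
from typing import Any, Mapping, Optional


def should_use_cached_baseline(claim: Mapping[str, Any],
                               exec_mode: Optional[str] = None) -> bool:
    """Use the owner's cached baseline only outside local execution."""
    if exec_mode is None:
        exec_mode = os.environ.get("EIREL_EXEC_MODE", "owner")
    if exec_mode.strip().lower() == "local":
        # Local exec: always recompute the reference from the validator's
        # own oracles, even if the claim carries a cached_baseline.
        return False
    return claim.get("cached_baseline") is not None
```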
Closes the equivocation auditability gap: a dishonest validator could sign two
conflicting score maps for one run and serve them to different peers, undetected.

- consensus/commitment.py: H(run_id, family_id, sorted scores) → an on-chain
  payload "eirel/scores/v1:<run>:<sha256>". Pure hash/build/parse/verify core +
  thin duck-typed wrappers over subtensor.commit / get_all_commitments.
- gossip.py: fetch_peer/collect_reports gain optional commitments map +
  require_commitment; under enforcement a peer's gossiped scores must hash to
  its on-chain commitment for the run, else it's dropped. A validator matches at
  most one commitment per run, so a second conflicting map is rejected. Self is
  never commitment-checked.
- weights.py compute_consensus threads commitments + require_commitment through.
- engine.py: validators commit their score hash on-chain once per run
  (EIREL_COMMIT_SCORES, default on in shadow/decentralized, which builds the
  auditable trail) and read all commitments to verify peers. Enforcement
  (EIREL_CONSENSUS_REQUIRE_COMMITMENT) defaults off so it can be switched on
  once validators are committing. Reuses the weight loop's existing
  wallet/subtensor. Best-effort: chain errors never block consensus.

11 new tests; 327 pass across validator/common.
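
A sketch of the pure hash/build/parse/verify core follows. The payload prefix and the inputs to the hash come from the commit message; the canonical encoding (compact JSON over sorted score pairs) and all function names are assumptions, and the subtensor wrappers are left out entirely.

```python
import hashlib
import json
from typing import Mapping, Optional, Tuple

PREFIX = "eirel/scores/v1"


def score_hash(run_id: str, family_id: str, scores: Mapping[str, float]) -> str:
    # Assumed canonical form: sorted (miner, score) pairs, compact JSON.
    canonical = json.dumps(
        {"run_id": run_id, "family_id": family_id,
         "scores": sorted(scores.items())},
        separators=(",", ":"), sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_payload(run_id: str, family_id: str, scores: Mapping[str, float]) -> str:
    return f"{PREFIX}:{run_id}:{score_hash(run_id, family_id, scores)}"


def parse_payload(payload: str) -> Optional[Tuple[str, str]]:
    """Return (run_id, sha256_hex), or None if the payload is not ours."""
    if not payload.startswith(PREFIX + ":"):
        return None
    run_id, sep, digest = payload[len(PREFIX) + 1:].rpartition(":")
    if not sep or not run_id or len(digest) != 64:
        return None
    return run_id, digest


def verify_payload(payload: str, run_id: str, family_id: str,
                   scores: Mapping[str, float]) -> bool:
    parsed = parse_payload(payload)
    if parsed is None or parsed[0] != run_id:
        return False
    return parsed[1] == score_hash(run_id, family_id, scores)
```

A peer whose gossiped map fails `verify_payload` against its own on-chain commitment would be dropped under enforcement, which is what makes serving two different maps for one run detectable.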
Closes the ephemeral-scores audit gap: validator scores were served from the
in-memory STORE and vanished on restart/run-end, so an auditor couldn't later
ask "what did validator V sign for run R?"

- consensus/archive.py: ScoreArchive persists the canonical SIGNED score report
  per run ({run_id, family_id, validator_hotkey, scores, timestamp, signature})
  to one JSON file (atomic, path-sanitized). This is the off-chain pre-image for
  the on-chain commitment: an auditor verifies the signature and checks its
  scores hash to the validator's on-chain commitment for the run.
- engine.py: writes the signed archive once per run alongside the on-chain commit
  (same scores, so commitment hash and archived pre-image are mutually
  consistent); persists the raw STORE after each claim batch.
- main.py: GET /v1/consensus/scores/{run_id} returns the durable archived report
  when present (fixed signature, survives restart) else signs live; loads the
  persisted STORE on startup.
- score_store.py: store_path() (EIREL_SCORE_STORE_PATH / EIREL_DATA_DIR) shared
  by persist + load. All best-effort — disk errors never break the loops.

4 new tests; 331 pass across validator/common.
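
The "atomic, path-sanitized" write can be sketched as below. The helper names and the sanitization rule are assumptions, not the ScoreArchive API; the pattern shown is the usual write-to-temp-then-`os.replace` approach, so a crash mid-write leaves either the old file or the new one, never a torn file.

```python
import json
import os
import re
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def archive_path(base_dir: str, run_id: str) -> Path:
    # Replace anything that could escape base_dir (slashes, etc.).
    safe = _UNSAFE.sub("_", run_id).lstrip(".") or "run"
    return Path(base_dir) / f"{safe}.json"


def write_report(base_dir: str, report: Dict[str, Any]) -> Path:
    """Atomically persist one signed score report for report['run_id']."""
    path = archive_path(base_dir, report["run_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(report, fh, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path


def read_report(base_dir: str, run_id: str) -> Optional[Dict[str, Any]]:
    try:
        return json.loads(archive_path(base_dir, run_id).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```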
Closes the last audit gap: the shadow→decentralized flip was gated on divergence
stats that lived only in memory and only in logs — reset on restart and
self-attested, so the decision to hand consensus on-chain control couldn't be
independently verified.

- divergence.py: DivergenceTracker gains persist()/load() (atomic JSON) so the
  gate window is a durable cross-restart history; sign_divergence()/
  verify_divergence() produce + check a hotkey-signed summary (running
  winner-agreement + mean rank ρ + the per-run records behind them); TRACKER is
  now the shared singleton + divergence_path().
- engine.py: records owner per-miner weights as owner_scores (so rank ρ
  populates when ≥2 miners overlap; sparse/WTA → stays n/a, no overclaim) and
  persists the tracker after each shadow cycle. Uses the shared TRACKER.
- main.py: GET /v1/consensus/divergence serves the signed summary; loads the
  persisted tracker on startup.

All best-effort. 5 new tests; 336 pass across validator/common.
…ring)

The owner-api verifies request signatures over ``request.url.path`` — the path
WITHOUT the query string (shared/common/security.authenticate_request). The
local-exec roster fetch and the ledger-tools fetch signed the full path
including ``?family_id=``/``?job_id=``, producing a signature the server can't
reproduce -> 401. Roster fetch is fatal to local execution; the ledger fetch
silently fell back to empty tools (zeroing tool attestation).

Sign ``path.split("?", 1)[0]`` in both the engine's shared _signed_headers and
LocalRunController._signed_get. Caught by a live single-validator local-exec
run; unit tests used a mock HTTP client and never hit real signature verify.
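
The mismatch is easy to reproduce in isolation. In the sketch below an HMAC stands in for the hotkey signature, and the message layout (`METHOD path`) is invented for illustration; only the `path.split("?", 1)[0]` normalization and the fact that the server verifies over the query-less path come from this commit.

```python
import hashlib
import hmac


def bare_path(path: str) -> str:
    """Strip the query string so client and server sign the same bytes."""
    return path.split("?", 1)[0]


def sign(secret: bytes, method: str, path: str) -> str:
    message = f"{method.upper()} {bare_path(path)}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def server_verify(secret: bytes, method: str, url_path: str, signature: str) -> bool:
    # url_path plays the role of request.url.path: it never has a query.
    message = f"{method.upper()} {url_path}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Signing `/v1/runs/r1/roster?family_id=f1` through `sign` now verifies against `/v1/runs/r1/roster`, whereas a signature computed over the full string including the query does not.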
…zed exec

In decentralized execution the owner no longer hosts miner runtimes — each
validator runs submissions on its own cluster. A full-role owner-api still ran
the startup reconcile + capacity/health/reaper loops, which demote deployments
it isn't actually hosting (active+healthy -> unhealthy), emptying the validator
claim. Gate node-inventory + reconcile + the management loops behind
EIREL_OWNER_DISABLE_RUNTIME_RECONCILE (default off -> owner-hosted path
unchanged) so the owner-api can serve registry/roster/claim/archive only.
… tag

The local-exec cost tag is ``task-eval={task_id};deployment={deployment_id}`` —
a real task_id plus a 36-char deployment UUID is ~80-98 chars, overrunning the
64-char job_id limit on both the tool-call write body and the job_ledger read
query. Every validator-local tool-call write/read 422'd, silently emptying tool
attestation. Raise both job_id limits to 128 (tool_name/args_hash/run_id keep
64). Verified live: ledger read goes 422 -> 200.
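
The length arithmetic is worth spelling out: the fixed parts of the tag (`task-eval=`, `;deployment=`, and a 36-char UUID) already take 58 characters, leaving only 6 for the task_id under the old 64-char cap. A quick check, with `cost_tag` as a hypothetical helper that mirrors the format above and a UUID-shaped task_id assumed for the example:

```python
import uuid


def cost_tag(task_id: str, deployment_id: str) -> str:
    return f"task-eval={task_id};deployment={deployment_id}"


deployment_id = str(uuid.uuid4())           # 36 chars
fixed_overhead = len(cost_tag("", deployment_id))
uuid_task_tag = cost_tag(str(uuid.uuid4()), deployment_id)

print(fixed_overhead)       # 58
print(len(uuid_task_tag))   # 94
```

Any task_id between 22 and 40 characters lands in the 80-98 range quoted above, which fits under the new 128-char limit.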