Decentralized EIREL validation: validator-side execution, consensus weights, auditability by kyron1112567 · Pull Request #6 · RendixNetwork/eirel-ai

kyron1112567 · 2026-06-29T12:14:42Z

Summary

Decentralizes EIREL validation so no single party — including the owner — can forge or bias a winner, and makes the outcome auditable. Execution and measurement move to the validators; the owner-api shrinks to a registry/roster/task distributor. These 13 commits span the consensus layer, validator-side execution, the validator deployment stack, and auditability hardening, plus three integration bugs caught by an end-to-end live test.

What changed

Consensus & weights

Stake-weighted-median gossip consensus for weight setting, gated by EIREL_WEIGHT_MODE (owner → shadow → decentralized).

Validator-side execution (EIREL_EXEC_MODE=local)

Each validator runs miner submissions on its own k3s behind its own provider-proxy + tool services + a ledger-only owner-api sink, measuring response/latency/cost/tool-use first-hand.
Owner-signed roster + hash-pinned archives make what code runs tamper-evident.
k8s validator-stack manifests for the validator-local stack.

Auditability

local-exec ignores the owner's cached_baseline so the grading reference is validator-independent.
On-chain score-hash commitment (Bittensor commitments pallet) makes validator equivocation provable (EIREL_CONSENSUS_REQUIRE_COMMITMENT, default off).
Durable signed score archive (off-chain pre-image of the on-chain commitment) + restart-safe score store.
Shadow-mode divergence gate — persisted, hotkey-signed, served at GET /v1/consensus/divergence.

Fixes caught by the live end-to-end test

Sign the bare path for owner-api reads — the owner verifies request.url.path (no query string), so signing ?family_id=/?job_id= 401s. Fatal to the local-exec roster fetch; also silently emptied owner-mode tool attestation.
EIREL_OWNER_DISABLE_RUNTIME_RECONCILE (default off) so a decentralized owner-api doesn't demote validator-hosted deployments.
Raise the eval ledger job_id cap 64 → 128 (the local-exec cost tag task-eval={task_id};deployment={uuid} is ~94 chars).

Validation

Live end-to-end test on a fully isolated stack (cloned DB → eirel_test, dedicated eirel-test-* namespaces, zero prod impact): owner-signed roster verified → archive hash-pinned → miner pod booted on the validator's own k3s → invoked locally → cost measured first-hand → composite scored → served as a validator-signed consensus map + signed divergence summary.
651 passed, 1 skipped.

Deploy note

The signed-path + job_id fixes re-activate owner-mode tool attestation, which had been silently inert in prod since 2026-05-07 (benign — ledger_tools always empty but no false knockouts, since required_tool wasn't firing). Watch task_miner_results.ledger_tools after deploy.

🤖 Generated with Claude Code

…ed_claims)

- dashboard TTLCache: add a periodic sweep + 512-entry cap so a churning keyspace (run_id / hotkey / pagination) can't grow without bound. get() only evicted the key it was asked for, so stale entries accumulated and OOMKilled owner-api ~every 3 days. Now hard-bounded.
- eval scoring: anchor grounded_correctness on oracle expected_claims; drop the static expected_answer / must_not_claim / abstention_probe paths across the validator engine, reconciler, metrics, judge_client, and owner scoring.
- tests updated to match the refactor.

Owner-side re-judge (re_judge_handler + its evaluation_task_manager / evaluation_tasks hooks) intentionally left uncommitted as WIP.
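The bounded-cache idea in the first bullet can be sketched as follows. This is a minimal illustration, not the repository's TTLCache: the class name `BoundedTTLCache`, its constructor parameters, and the oldest-first eviction policy are assumptions made for the example; only the 512-entry cap and the periodic sweep come from the commit message.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Hypothetical TTL cache with a hard entry cap and a sweep of expired keys."""

    def __init__(self, ttl_seconds=60.0, max_entries=512, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest insertion first

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if expires_at <= self._clock():
            # Lazy eviction only touches the requested key, which is why
            # a separate sweep is needed for keys nobody asks for again.
            del self._data[key]
            return default
        return value

    def set(self, key, value):
        self._data.pop(key, None)
        self._data[key] = (self._clock() + self._ttl, value)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # enforce the cap: drop the oldest entry

    def sweep(self):
        """Remove every expired entry; meant to be called on a timer."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._data.items() if exp <= now]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self):
        return len(self._data)
```

With both the cap and the sweep in place, memory is bounded by `max_entries` even if the sweep never runs, and the sweep keeps stale entries from crowding out live ones.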

Add a consensus package so validators set on-chain weights from their own gossiped, stake-weighted-median-aggregated scores instead of copying the owner-api /v1/weights pick (which made the owner a single point of trust).

- consensus/aggregate.py: stake-weighted median per miner + winner-take-all with dethrone-by-margin hysteresis (tolerates <50% dishonest stake)
- consensus/score_store.py: validator-local retention of its own per-miner composites (mirrors owner's coalesce(final_task_score, agreement_score))
- consensus/discovery.py: peer validators/stake/axons read from the metagraph, not the owner-api (owner can't censor or partition the validator set)
- consensus/gossip.py: sign own scores; fetch + verify peers (served hotkey must match the axon's metagraph hotkey; run/family pinned against replay)
- consensus/weights.py: orchestrate discovery -> gossip -> median -> winner under quorum gating; incumbent read from metagraph.I

Integration is behind EIREL_WEIGHT_MODE: owner (default, no behavior change) / shadow (compute consensus and log divergence vs the owner pick) / decentralized (set the consensus winner; hold the on-chain incumbent on a quorum miss). main.py serves the signed GET /v1/consensus/scores/{run_id}; engine.py captures per-miner scores in the eval loop and runs consensus in the weight loop. 49 consensus tests pass.
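The aggregation step can be illustrated with a small sketch. The function names, the lower-weighted-median convention, the tie-break, and the `margin` default are assumptions for the example; the actual consensus/aggregate.py may differ.

```python
def stake_weighted_median(reports):
    """Lower weighted median of one miner's scores.

    reports: list of (score, stake) pairs, one per validator.
    Returns the smallest score at which cumulative stake reaches half the total.
    """
    ordered = sorted((s, w) for s, w in reports if w > 0)
    if not ordered:
        raise ValueError("no stake-bearing reports")
    half = sum(w for _, w in ordered) / 2.0
    running = 0.0
    for score, stake in ordered:
        running += stake
        if running >= half:
            return score
    return ordered[-1][0]


def pick_winner(medians, incumbent=None, margin=0.05):
    """Winner-take-all with hysteresis.

    medians: dict of miner -> aggregated score.
    A challenger replaces the incumbent only if it leads by at least `margin`
    (a hypothetical value here).
    """
    # sorted() makes ties deterministic: the lexicographically first miner wins.
    best = max(sorted(medians), key=lambda m: medians[m])
    if incumbent in medians and best != incumbent:
        if medians[best] < medians[incumbent] + margin:
            return incumbent
    return best
```

For example, with honest reports `(0.9, 10)` and `(0.8, 10)` and a minority report `(0.0, 9)`, the median stays at 0.8: a reporter holding under half the stake cannot move the result outside the range of honest scores.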

…or-side

Remove the owner from the evaluation data path. With EIREL_EXEC_MODE=local a validator deploys miner submissions on its OWN k3s, behind its own provider-proxy and tool services, and measures response/latency/cost/tool-use first-hand, so the stake-weighted-median consensus no longer aggregates inputs from one forgeable owner oracle. Default EIREL_EXEC_MODE=owner is unchanged.

Owner-signed roster + archive integrity (anti-equivocation):
- shared/common/roster.py: build/sign/verify a per-run roster; each member carries hotkey + submission_id + archive_sha256 + manifest_json under the owner signature. Validators pin the owner hotkey and refuse any archive whose bytes don't match the committed hash.
- owner-api GET /v1/runs/{run_id}/roster + .../submissions/{id}/archive.

Validator-local stack:
- EIREL_OWNER_API_ROLE=ledger: owner-api stripped to the tool-call attestation sink (health + /v1/internal/eval/*) on a local store, with bearer read-auth. This is the validator's own ledger. Default "full" role byte-for-byte unchanged.
- _build_network_policy: pure-pod egress now reaches all four tool services; host-NAT branch tool ports corrected (add rag 18088, drop spurious 18086).
- docker-compose.validator.local.yml overlay: redis + provider-proxy + 4 tools + ledger-sink + engine wiring. .env.validator.example local-exec contract.

Execution path:
- validation/validator/local_exec/runtime_controller.py LocalRunController: fetch+verify roster, pull+hash-check archives, deploy roster ∩ claimed hotkeys via the reused KubernetesMinerRuntimeManager, teardown/reconcile.
- engine.py: lazy ensure_run on first task; _invoke_one_miner redirects to the local pod and stamps X-Eirel-Job-Id/-Job-Token; _judge_miner reads cost from the local proxy and attestation from the local ledger; teardown when drained.
- shared/common/job_attribution.py: cost-tag + job-token-header minting shared by owner-api runtime.py and the validator so both derive identical values.
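The archive hash pin and the "roster ∩ claimed hotkeys" selection described above can be sketched like this. The helper names and the in-memory `archives` mapping are hypothetical; the member fields (`hotkey`, `submission_id`, `archive_sha256`) are the ones the commit message lists, and the roster's owner signature is assumed to have been verified already.

```python
import hashlib
import hmac


def archive_matches(archive_bytes, roster_member):
    """True if the downloaded bytes hash to the value committed in the roster."""
    digest = hashlib.sha256(archive_bytes).hexdigest()
    expected = roster_member["archive_sha256"].lower()
    return hmac.compare_digest(digest, expected)


def select_deployable(roster_members, claimed_hotkeys, archives):
    """Members that are in the roster AND claimed, with a hash-matching archive.

    archives: dict of submission_id -> raw archive bytes (hypothetical shape).
    """
    deployable = []
    for member in roster_members:
        if member["hotkey"] not in claimed_hotkeys:
            continue
        blob = archives.get(member["submission_id"])
        if blob is not None and archive_matches(blob, member):
            deployable.append(member)
    return deployable
```

Because the hash is covered by the owner signature, an owner who serves different bytes to different validators would have to sign two rosters for the same run, which leaves evidence.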
Fully decentralized = EIREL_EXEC_MODE=local + EIREL_WEIGHT_MODE=decentralized. 53 new tests; 710 pass across validator/common/owner_api, no regression.

All-in-k3s topology for EIREL_EXEC_MODE=local: provider-proxy + the four tool services + redis + ledger-sink (owner-api in EIREL_OWNER_API_ROLE=ledger) as pods in the eirel-system namespace. Pod `app:` labels and the namespace `name: eirel-system` label match the miner NetworkPolicy pure-pod egress branch, so miner pods in eirel-miners reach exactly these and nothing else.

- namespace.yaml: eirel-system + eirel-miners, both with the `name` label the NetworkPolicy namespaceSelector matches.
- configmap.yaml / secret.example.yaml: shared non-secret config; validator-generated tokens + provider keys template (never committed filled, never in a miner pod).
- deployments.yaml / services.yaml: 7 deployments + 7 services; sandbox keeps the hardened securityContext (non-root, drop-all, read-only rootfs, tmpfs, mem cap).
- README.md: deploy steps + the engine env that points at this stack.

`kubectl kustomize` builds 17 objects; all manifests parse.

… cutover

Shadow mode logged that the consensus winner diverged from the owner pick, but nothing accumulated how often they agree across runs, so there was no quantitative gate for flipping EIREL_WEIGHT_MODE to decentralized.

- consensus/divergence.py: top1_agreement, spearman_rank_correlation (tie-corrected), topk_overlap, and DivergenceTracker, a windowed accumulator that dedupes re-armed continuous-pool runs and reports running winner-agreement rate + mean rank rho.
- engine.py shadow branch records (consensus winner vs owner winner, local score map) per run and logs the running summary, e.g. "shadow divergence over 18 run(s): winner_agreement=94% mean_rank_rho=0.971". owner/decentralized modes untouched.

9 new tests; 311 pass across validator/common.
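As an illustration of the two headline metrics, here is a sketch of top-1 agreement and a tie-corrected Spearman correlation (computed as Pearson correlation over average ranks). The function signatures and the choice to return `None` when the correlation is undefined are assumptions for the example, not the contents of consensus/divergence.py.

```python
from statistics import mean


def top1_agreement(a, b):
    """True if both score maps (miner -> score) name the same top miner."""
    return max(sorted(a), key=a.get) == max(sorted(b), key=b.get)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation over miners present in both maps.

    Returns None when fewer than two miners overlap or one side has no
    rank variance, since the correlation is undefined in those cases.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return None
    ra = average_ranks([a[m] for m in shared])
    rb = average_ranks([b[m] for m in shared])
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return None
    return cov / (va * vb) ** 0.5
```

Using average ranks rather than the shortcut formula based on squared rank differences is what makes the result correct when scores tie.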

…r variance

Two decentralization open questions resolved:

- Cost in scoring (verified, no behavior change): cost is NOT a scoring input today. score_latency_cost skips the cost ramp when cost_budget_usd is unset, and judge_eval_composite omits cost_budget/floor so the eiretes composite has no threshold to score cost_usd against (cost is leaderboard-display only). So per-validator price variance under EIREL_EXEC_MODE=local cannot pollute the stake-weighted-median consensus. Added inline guard notes at both engine cost call sites: wiring a cost budget/floor later requires per-validator cost normalization first, or raw dollars (which differ by each validator's provider prices) would skew the median.
- Ledger durability: ledger-sink now stores its SQLite attestation DB on a PVC (validator-stack/pvc.yaml, default StorageClass) instead of emptyDir, so a pod restart mid-run no longer zeroes the tool-call ledger. README documents the emptyDir fallback for clusters without a default StorageClass.

311 pass across validator/common; kustomize build now 18 objects.

… exec

The audit found the last owner scoring lever surviving EIREL_EXEC_MODE=local: when the claim carried a cached_baseline, the validator used the owner's response_text + reconciled expected_claims as the pairwise comparator and grounded-correctness gold, skipping its own oracle, so the owner could still shape what counts as "correct." In local/decentralized exec the grading reference must be validator-independent.

_should_use_cached_baseline() now returns False whenever EIREL_EXEC_MODE=local (regardless of a present cached_baseline), so each validator recomputes its own 3-oracle reference; the stake-weighted-median consensus tolerates the per-validator variance. The local-oracle path already falls back to "disputed" on any oracle error, so bypassing the cache is safe. Owner mode keeps the cache as a cross-cycle cost optimization, unchanged.

5 new cases; 316 pass across validator/common.
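The gate described above amounts to a small predicate. The sketch below assumes the claim is a dict with an optional `cached_baseline` key and that the mode is read from the environment; the real `_should_use_cached_baseline()` signature is not shown in this PR text, so treat both as illustrative.

```python
import os


def should_use_cached_baseline(claim, env=os.environ):
    """Decide whether the owner's cached baseline may serve as the grading reference.

    In local exec the answer is always False, so the validator runs its own
    oracles even when the owner attached a baseline to the claim.
    """
    mode = env.get("EIREL_EXEC_MODE", "owner").strip().lower()
    if mode == "local":
        return False
    return bool(claim.get("cached_baseline"))
```

Checking the mode before looking at the claim means nothing the owner puts in the claim can re-enable the cache in local exec.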

Closes the equivocation auditability gap: a dishonest validator could sign two conflicting score maps for one run and serve them to different peers, undetected.

- consensus/commitment.py: H(run_id, family_id, sorted scores) → an on-chain payload "eirel/scores/v1:<run>:<sha256>". Pure hash/build/parse/verify core + thin duck-typed wrappers over subtensor.commit / get_all_commitments.
- gossip.py: fetch_peer/collect_reports gain optional commitments map + require_commitment; under enforcement a peer's gossiped scores must hash to its on-chain commitment for the run, else it's dropped. A validator matches at most one commitment per run, so a second conflicting map is rejected. Self is never commitment-checked.
- weights.py compute_consensus threads commitments + require_commitment through.
- engine.py: validators commit their score hash on-chain once per run (EIREL_COMMIT_SCORES, default on in shadow/decentralized, which builds the auditable trail) and read all commitments to verify peers. Enforcement (EIREL_CONSENSUS_REQUIRE_COMMITMENT) defaults off so it flips on once validators are committing. Reuses the weight loop's existing wallet/subtensor. Best-effort: chain errors never block consensus.

11 new tests; 327 pass across validator/common.
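A sketch of the pure hash/build/parse/verify core follows. The payload shape `eirel/scores/v1:<run>:<sha256>` is taken from the commit message; the canonical serialization (compact JSON, scores sorted by hotkey and rounded to six decimals) is an assumption made so the example is deterministic, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

PREFIX = "eirel/scores/v1"


def score_hash(run_id, family_id, scores):
    """SHA-256 over a canonical encoding of (run_id, family_id, sorted scores)."""
    canonical = json.dumps(
        {
            "run_id": run_id,
            "family_id": family_id,
            # Sort by hotkey so dict ordering cannot change the digest.
            "scores": sorted((hk, round(float(s), 6)) for hk, s in scores.items()),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_payload(run_id, family_id, scores):
    return f"{PREFIX}:{run_id}:{score_hash(run_id, family_id, scores)}"


def parse_payload(payload):
    """Return (run_id, digest), or None if the payload is not in the expected form."""
    head, sep, digest = payload.rpartition(":")
    if not sep or not head.startswith(PREFIX + ":") or len(digest) != 64:
        return None
    return head[len(PREFIX) + 1:], digest


def verify_scores(payload, run_id, family_id, scores):
    """True if the gossiped scores hash to the committed digest for this run."""
    parsed = parse_payload(payload)
    if parsed is None or parsed[0] != run_id:
        return False
    return hmac.compare_digest(parsed[1], score_hash(run_id, family_id, scores))
```

A peer that serves a second, conflicting score map for the same run fails `verify_scores` against its single on-chain payload, which is the property enforcement relies on.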

Closes the ephemeral-scores audit gap: validator scores were served from the in-memory STORE and vanished on restart/run-end, so an auditor couldn't later ask "what did validator V sign for run R?"

- consensus/archive.py: ScoreArchive persists the canonical SIGNED score report per run ({run_id, family_id, validator_hotkey, scores, timestamp, signature}) to one JSON file (atomic, path-sanitized). This is the off-chain pre-image for the on-chain commitment: an auditor verifies the signature and checks its scores hash to the validator's on-chain commitment for the run.
- engine.py: writes the signed archive once per run alongside the on-chain commit (same scores, so commitment hash and archived pre-image are mutually consistent); persists the raw STORE after each claim batch.
- main.py: GET /v1/consensus/scores/{run_id} returns the durable archived report when present (fixed signature, survives restart) else signs live; loads the persisted STORE on startup.
- score_store.py: store_path() (EIREL_SCORE_STORE_PATH / EIREL_DATA_DIR) shared by persist + load.

All best-effort: disk errors never break the loops. 4 new tests; 331 pass across validator/common.
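The "atomic, path-sanitized" write can be sketched with a temp file plus `os.replace`. The helper names and the sanitization rule (anything outside `[A-Za-z0-9._-]` becomes `_`) are assumptions for illustration, not the behavior of ScoreArchive itself.

```python
import json
import os
import re
import tempfile


def archive_path(base_dir, run_id):
    """Map a run_id to a file inside base_dir, neutralizing path separators."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", run_id)
    return os.path.join(base_dir, f"{safe}.json")


def write_atomic(path, report):
    """Write JSON so readers see either the old file or the complete new one."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(report, fh, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A caller that wants the best-effort behavior described above would wrap `write_atomic` in a `try`/`except OSError` and log the failure instead of propagating it.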

Closes the last audit gap: the shadow→decentralized flip was gated on divergence stats that lived only in memory and only in logs (reset on restart and self-attested), so the decision to hand consensus on-chain control couldn't be independently verified.

- divergence.py: DivergenceTracker gains persist()/load() (atomic JSON) so the gate window is a durable cross-restart history; sign_divergence()/verify_divergence() produce + check a hotkey-signed summary (running winner-agreement + mean rank ρ + the per-run records behind them); TRACKER is now the shared singleton + divergence_path().
- engine.py: records owner per-miner weights as owner_scores (so rank ρ populates when ≥2 miners overlap; sparse/WTA → stays n/a, no overclaim) and persists the tracker after each shadow cycle. Uses the shared TRACKER.
- main.py: GET /v1/consensus/divergence serves the signed summary; loads the persisted tracker on startup.

All best-effort. 5 new tests; 336 pass across validator/common.

…ring)

The owner-api verifies request signatures over ``request.url.path``, the path WITHOUT the query string (shared/common/security.authenticate_request). The local-exec roster fetch and the ledger-tools fetch signed the full path including ``?family_id=``/``?job_id=``, producing a signature the server can't reproduce -> 401. Roster fetch is fatal to local execution; the ledger fetch silently fell back to empty tools (zeroing tool attestation).

Sign ``path.split("?", 1)[0]`` in both the engine's shared _signed_headers and LocalRunController._signed_get. Caught by a live single-validator local-exec run; unit tests used a mock HTTP client and never hit real signature verify.
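The mismatch is easy to reproduce in miniature. The sketch below uses an HMAC with a made-up shared key as a stand-in for the real hotkey signature scheme, and `owner.example` is a placeholder host; only the `split("?", 1)[0]` expression comes from the commit.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

SECRET = b"demo-key"  # hypothetical shared key, for illustration only


def sign(message: str) -> str:
    return hmac.new(SECRET, message.encode("utf-8"), hashlib.sha256).hexdigest()


def signing_path(path_with_query: str) -> str:
    """Strip the query string so client and server sign the same bytes."""
    return path_with_query.split("?", 1)[0]


def server_accepts(url: str, signature: str) -> bool:
    """Server side: verify over the URL path only, with no query string."""
    return hmac.compare_digest(sign(urlsplit(url).path), signature)


target = "/v1/runs/run-1/roster?family_id=fam-a"
url = "http://owner.example" + target

buggy_signature = sign(target)                # signs path plus query: rejected
fixed_signature = sign(signing_path(target))  # signs the bare path: accepted
```

A test that runs a real verifier against a real request, rather than a mocked client, would have caught the original bug, which matches the note about the unit tests above.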

…zed exec

In decentralized execution the owner no longer hosts miner runtimes; each validator runs submissions on its own cluster. A full-role owner-api still ran the startup reconcile + capacity/health/reaper loops, which demote deployments it isn't actually hosting (active+healthy -> unhealthy), emptying the validator claim.

Gate node-inventory + reconcile + the management loops behind EIREL_OWNER_DISABLE_RUNTIME_RECONCILE (default off -> owner-hosted path unchanged) so the owner-api can serve registry/roster/claim/archive only.

… tag

The local-exec cost tag is ``task-eval={task_id};deployment={deployment_id}``. A real task_id plus a 36-char deployment UUID is ~80-98 chars, overrunning the 64-char job_id limit on both the tool-call write body and the job_ledger read query. Every validator-local tool-call write/read 422'd, silently emptying tool attestation.

Raise both job_id limits to 128 (tool_name/args_hash/run_id keep 64). Verified live: ledger read goes 422 -> 200.
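The length arithmetic can be checked directly. The tag format is the one quoted above; the `cost_tag` helper name and the use of a UUID-shaped task_id are assumptions for the example.

```python
import uuid

OLD_LIMIT = 64
NEW_LIMIT = 128


def cost_tag(task_id: str, deployment_id: str) -> str:
    return f"task-eval={task_id};deployment={deployment_id}"


# Fixed overhead: "task-eval=" (10) + ";deployment=" (12) + 36-char UUID = 58,
# so any task_id longer than 6 characters overflows a 64-char limit.
deployment_id = str(uuid.UUID(int=1))
overhead = len(cost_tag("", deployment_id))

# Assumption: a UUID-shaped task_id (36 chars), giving a 94-char tag.
tag = cost_tag(str(uuid.UUID(int=2)), deployment_id)
fits_old = len(tag) <= OLD_LIMIT
fits_new = len(tag) <= NEW_LIMIT
```

With 58 characters of overhead, the 128-char limit leaves room for a task_id of up to 70 characters.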

kyron1112567 added 13 commits June 24, 2026 11:35

kyron1112567 wants to merge 13 commits into main from feat/decentralized-validation
