Decentralized EIREL validation: validator-side execution, consensus weights, auditability #6
Open
kyron1112567 wants to merge 13 commits into
…ed_claims)
- dashboard TTLCache: add a periodic sweep + 512-entry cap so a churning keyspace (run_id / hotkey / pagination) can't grow without bound. get() only evicted the key it was asked for, so stale entries accumulated and OOMKilled owner-api ~every 3 days. Now hard-bounded.
- eval scoring: anchor grounded_correctness on oracle expected_claims; drop the static expected_answer / must_not_claim / abstention_probe paths across the validator engine, reconciler, metrics, judge_client, and owner scoring.
- tests updated to match the refactor.
Owner-side re-judge (re_judge_handler + its evaluation_task_manager / evaluation_tasks hooks) intentionally left uncommitted as WIP.
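The sweep-plus-cap behaviour described for the dashboard cache can be sketched as follows. This is a minimal illustration, not the actual TTLCache: the class name `BoundedTTLCache`, its constructor parameters, and the oldest-first eviction policy are assumptions; only the 512-entry default and the "sweep everything expired, not just the requested key" idea come from the commit message.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Hypothetical stand-in for the dashboard TTLCache (names are assumptions)."""

    def __init__(self, ttl_seconds=30.0, max_entries=512,
                 sweep_interval=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._sweep_interval = sweep_interval
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first
        self._last_sweep = clock()

    def _maybe_sweep(self, now):
        # Periodic sweep: drop EVERY expired entry, not only the key being read.
        if now - self._last_sweep < self._sweep_interval:
            return
        for key in [k for k, (exp, _) in self._data.items() if exp <= now]:
            del self._data[key]
        self._last_sweep = now

    def get(self, key, default=None):
        now = self._clock()
        self._maybe_sweep(now)
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if expires_at <= now:
            del self._data[key]
            return default
        return value

    def set(self, key, value):
        now = self._clock()
        self._maybe_sweep(now)
        self._data.pop(key, None)  # re-insert so the key becomes the newest
        self._data[key] = (now + self._ttl, value)
        # Hard cap: evict oldest-inserted entries until within the bound.
        while len(self._data) > self._max:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)
```

With this shape, a churning keyspace costs at most `max_entries` live entries regardless of how many distinct keys are ever requested.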
Add a consensus package so validators set on-chain weights from their own
gossiped, stake-weighted-median-aggregated scores instead of copying the
owner-api /v1/weights pick (which made the owner a single point of trust).
- consensus/aggregate.py: stake-weighted median per miner + winner-take-all
with dethrone-by-margin hysteresis (tolerates <50% dishonest stake)
- consensus/score_store.py: validator-local retention of its own per-miner
composites (mirrors owner's coalesce(final_task_score, agreement_score))
- consensus/discovery.py: peer validators/stake/axons read from the metagraph,
not the owner-api (owner can't censor or partition the validator set)
- consensus/gossip.py: sign own scores; fetch + verify peers (served hotkey
must match the axon's metagraph hotkey; run/family pinned against replay)
- consensus/weights.py: orchestrate discovery -> gossip -> median -> winner
under quorum gating; incumbent read from metagraph.I
Integration is behind EIREL_WEIGHT_MODE: owner (default, no behavior change) /
shadow (compute consensus and log divergence vs the owner pick) / decentralized
(set the consensus winner; hold the on-chain incumbent on a quorum miss).
main.py serves the signed GET /v1/consensus/scores/{run_id}; engine.py captures
per-miner scores in the eval loop and runs consensus in the weight loop.
49 consensus tests pass.
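A rough sketch of the aggregation step, for readers who want the shape of it. The function names, the lower-weighted-median convention, the additive `margin`, and the lexicographic tie-break are assumptions for illustration; consensus/aggregate.py may differ in all of these.

```python
def stake_weighted_median(reports):
    """Lower stake-weighted median of (score, stake) pairs for one miner."""
    pairs = sorted((score, stake) for score, stake in reports if stake > 0)
    if not pairs:
        raise ValueError("no positive-stake reports")
    total = sum(stake for _, stake in pairs)
    acc = 0.0
    for score, stake in pairs:
        acc += stake
        # First score at which cumulative stake reaches half of the total.
        if acc * 2 >= total:
            return score
    return pairs[-1][0]


def pick_winner(medians, incumbent=None, margin=0.05):
    """Winner-take-all with dethrone-by-margin hysteresis.

    medians: {miner_hotkey: aggregated score}. The incumbent keeps the
    crown unless a challenger beats it by more than `margin` (assumed
    additive here).
    """
    # sorted() makes ties deterministic: the smallest hotkey wins a tie.
    challenger = max(sorted(medians), key=medians.get)
    if incumbent is None or incumbent not in medians:
        return challenger
    if medians[challenger] > medians[incumbent] + margin:
        return challenger
    return incumbent
```

Because the median sits at the point where cumulative stake crosses one half, reports backed by less than 50% of stake cannot move it outside the range of honest reports, whichever direction they push.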
…or-side
Remove the owner from the evaluation data path. With EIREL_EXEC_MODE=local a
validator deploys miner submissions on its OWN k3s, behind its own provider-proxy
and tool services, and measures response/latency/cost/tool-use first-hand — so
the stake-weighted-median consensus no longer aggregates inputs from one
forgeable owner oracle. Default EIREL_EXEC_MODE=owner is unchanged.
Owner-signed roster + archive integrity (anti-equivocation):
- shared/common/roster.py: build/sign/verify a per-run roster; each member
carries hotkey + submission_id + archive_sha256 + manifest_json under the
owner signature. Validators pin the owner hotkey and refuse any archive whose
bytes don't match the committed hash.
- owner-api GET /v1/runs/{run_id}/roster + .../submissions/{id}/archive.
Validator-local stack:
- EIREL_OWNER_API_ROLE=ledger: owner-api stripped to the tool-call attestation
sink (health + /v1/internal/eval/*) on a local store, with bearer read-auth —
the validator's own ledger. Default "full" role byte-for-byte unchanged.
- _build_network_policy: pure-pod egress now reaches all four tool services;
host-NAT branch tool ports corrected (add rag 18088, drop spurious 18086).
- docker-compose.validator.local.yml overlay: redis + provider-proxy + 4 tools
+ ledger-sink + engine wiring. .env.validator.example local-exec contract.
Execution path:
- validation/validator/local_exec/runtime_controller.py LocalRunController:
fetch+verify roster, pull+hash-check archives, deploy roster ∩ claimed
hotkeys via the reused KubernetesMinerRuntimeManager, teardown/reconcile.
- engine.py: lazy ensure_run on first task; _invoke_one_miner redirects to the
local pod and stamps X-Eirel-Job-Id/-Job-Token; _judge_miner reads cost from
the local proxy and attestation from the local ledger; teardown when drained.
- shared/common/job_attribution.py: cost-tag + job-token-header minting shared
by owner-api runtime.py and the validator so both derive identical values.
Fully decentralized = EIREL_EXEC_MODE=local + EIREL_WEIGHT_MODE=decentralized.
53 new tests; 710 pass across validator/common/owner_api, no regression.
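The archive pinning and the "roster ∩ claimed hotkeys" selection reduce to something like the sketch below. The helper names and the dict-shaped roster members are assumptions; verification of the owner signature over the roster is deliberately left out, since it depends on the chain's key scheme.

```python
import hashlib
import hmac


def verify_archive(archive_bytes, expected_sha256):
    """True only if the downloaded bytes hash to the roster's archive_sha256."""
    actual = hashlib.sha256(archive_bytes).hexdigest()
    return hmac.compare_digest(actual, expected_sha256.lower())


def deployable_members(roster_members, claimed_hotkeys):
    """Roster ∩ claimed hotkeys, preserving roster order."""
    claimed = set(claimed_hotkeys)
    return [m for m in roster_members if m["hotkey"] in claimed]
```

A validator would refuse to deploy any member for which `verify_archive` returns False, so the owner cannot serve different archive bytes to different validators under one signed roster.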
All-in-k3s topology for EIREL_EXEC_MODE=local: provider-proxy + the four tool services + redis + ledger-sink (owner-api in EIREL_OWNER_API_ROLE=ledger) as pods in the eirel-system namespace. Pod `app:` labels and the namespace `name: eirel-system` label match the miner NetworkPolicy pure-pod egress branch, so miner pods in eirel-miners reach exactly these and nothing else.
- namespace.yaml: eirel-system + eirel-miners, both with the `name` label the NetworkPolicy namespaceSelector matches.
- configmap.yaml / secret.example.yaml: shared non-secret config; validator-generated tokens + provider keys template (never committed filled, never in a miner pod).
- deployments.yaml / services.yaml: 7 deployments + 7 services; sandbox keeps the hardened securityContext (non-root, drop-all, read-only rootfs, tmpfs, mem cap).
- README.md: deploy steps + the engine env that points at this stack.
`kubectl kustomize` builds 17 objects; all manifests parse.
… cutover
Shadow mode logged that the consensus winner diverged from the owner pick, but nothing accumulated how often they agree across runs, so there was no quantitative gate for flipping EIREL_WEIGHT_MODE to decentralized.
- consensus/divergence.py: top1_agreement, spearman_rank_correlation (tie-corrected), topk_overlap, and DivergenceTracker, a windowed accumulator that dedupes re-armed continuous-pool runs and reports running winner-agreement rate + mean rank rho.
- engine.py shadow branch records (consensus winner vs owner winner, local score map) per run and logs the running summary, e.g. "shadow divergence over 18 run(s): winner_agreement=94% mean_rank_rho=0.971". owner/decentralized modes untouched.
9 new tests; 311 pass across validator/common.
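For reference, the three agreement metrics can be written in a few lines. The function names match the commit message, but the signatures, the dict inputs, and the `None` return for undefined correlations are assumptions. Tie correction is done here by assigning average ranks and taking Pearson correlation of the ranks.

```python
from math import sqrt


def top1_agreement(a, b):
    """True if both score maps crown the same miner (ties broken by hotkey)."""
    if not a or not b:
        return False
    return max(sorted(a), key=a.get) == max(sorted(b), key=b.get)


def _average_ranks(values):
    # Rank 1 = smallest value; tied values share the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rank_correlation(a, b):
    """Tie-corrected Spearman rho over miners present in both maps, or None."""
    keys = sorted(set(a) & set(b))
    if len(keys) < 2:
        return None
    ra = _average_ranks([a[k] for k in keys])
    rb = _average_ranks([b[k] for k in keys])
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return None  # one side is constant: rho is undefined
    return cov / sqrt(va * vb)


def topk_overlap(a, b, k):
    """Fraction of the top-k miners (by score, then hotkey) shared by both maps."""
    def top(m):
        return set(sorted(m, key=lambda h: (-m[h], h))[:k])
    return len(top(a) & top(b)) / k
```

Returning `None` rather than 0.0 when fewer than two miners overlap keeps a sparse or winner-take-all comparison from being reported as "uncorrelated".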
…r variance
Two decentralization open questions resolved:
- Cost in scoring (verified, no behavior change): cost is NOT a scoring input today. score_latency_cost skips the cost ramp when cost_budget_usd is unset, and judge_eval_composite omits cost_budget/floor so the eiretes composite has no threshold to score cost_usd against (cost is leaderboard-display only). So per-validator price variance under EIREL_EXEC_MODE=local cannot pollute the stake-weighted-median consensus. Added inline guard notes at both engine cost call sites: wiring a cost budget/floor later requires per-validator cost normalization first, or raw dollars (which differ by each validator's provider prices) would skew the median.
- Ledger durability: ledger-sink now stores its SQLite attestation DB on a PVC (validator-stack/pvc.yaml, default StorageClass) instead of emptyDir, so a pod restart mid-run no longer zeroes the tool-call ledger. README documents the emptyDir fallback for clusters without a default StorageClass.
311 pass across validator/common; kustomize build now 18 objects.
… exec
The audit found the last owner scoring lever surviving EIREL_EXEC_MODE=local: when the claim carried a cached_baseline, the validator used the owner's response_text + reconciled expected_claims as the pairwise comparator and grounded-correctness gold, skipping its own oracle, so the owner could still shape what counts as "correct." In local/decentralized exec the grading reference must be validator-independent.
_should_use_cached_baseline() now returns False whenever EIREL_EXEC_MODE=local (regardless of a present cached_baseline), so each validator recomputes its own 3-oracle reference; the stake-weighted-median consensus tolerates the per-validator variance. The local-oracle path already falls back to "disputed" on any oracle error, so bypassing the cache is safe. Owner mode keeps the cache as a cross-cycle cost optimization, unchanged.
5 new cases; 316 pass across validator/common.
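The gate itself is small. A sketch of the decision, assuming the mode is read from the environment and that "owner" is the default when the variable is unset; the real _should_use_cached_baseline() signature is not shown in this PR text, so the one below is hypothetical.

```python
import os


def should_use_cached_baseline(cached_baseline, env=None):
    """Hypothetical shape of the gate: never trust the owner cache in local exec."""
    env = os.environ if env is None else env
    if env.get("EIREL_EXEC_MODE", "owner") == "local":
        # Validator-independent grading: always recompute the local oracle.
        return False
    # Owner mode: use the cache when the claim actually carries one.
    return cached_baseline is not None
```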
Closes the equivocation auditability gap: a dishonest validator could sign two conflicting score maps for one run and serve them to different peers, undetected.
- consensus/commitment.py: H(run_id, family_id, sorted scores) → an on-chain payload "eirel/scores/v1:<run>:<sha256>". Pure hash/build/parse/verify core + thin duck-typed wrappers over subtensor.commit / get_all_commitments.
- gossip.py: fetch_peer/collect_reports gain optional commitments map + require_commitment; under enforcement a peer's gossiped scores must hash to its on-chain commitment for the run, else it's dropped. A validator matches at most one commitment per run, so a second conflicting map is rejected. Self is never commitment-checked.
- weights.py compute_consensus threads commitments + require_commitment through.
- engine.py: validators commit their score hash on-chain once per run (EIREL_COMMIT_SCORES, default on in shadow/decentralized, which builds the auditable trail) and read all commitments to verify peers. Enforcement (EIREL_CONSENSUS_REQUIRE_COMMITMENT) defaults off so it flips on once validators are committing. Reuses the weight loop's existing wallet/subtensor. Best-effort: chain errors never block consensus.
11 new tests; 327 pass across validator/common.
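To make the commitment format concrete, here is one way the pure hash/build/parse/verify core could look. The payload layout "eirel/scores/v1:<run>:<sha256>" is from the commit message; the canonical JSON encoding fed to SHA-256 and all function names are assumptions, and the real module may serialize differently.

```python
import hashlib
import hmac
import json

PREFIX = "eirel/scores/v1"


def score_hash(run_id, family_id, scores):
    """SHA-256 over an (assumed) canonical JSON encoding with sorted scores."""
    canonical = json.dumps(
        {"run_id": run_id, "family_id": family_id,
         "scores": sorted(scores.items())},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_commitment(run_id, family_id, scores):
    return f"{PREFIX}:{run_id}:{score_hash(run_id, family_id, scores)}"


def parse_commitment(payload):
    """Return (run_id, digest), or None if the payload is not well-formed."""
    head, _, digest = payload.rpartition(":")
    if not head.startswith(PREFIX + ":"):
        return None
    if len(digest) != 64 or any(ch not in "0123456789abcdef" for ch in digest):
        return None
    return head[len(PREFIX) + 1:], digest


def verify_commitment(payload, run_id, family_id, scores):
    """True if the gossiped scores hash to the committed digest for this run."""
    parsed = parse_commitment(payload)
    if parsed is None:
        return False
    committed_run, digest = parsed
    expected = score_hash(run_id, family_id, scores)
    return committed_run == run_id and hmac.compare_digest(digest, expected)
```

Since a map with even one changed score produces a different digest, a validator with a single on-chain commitment per run cannot have two conflicting maps both pass `verify_commitment`.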
Closes the ephemeral-scores audit gap: validator scores were served from the
in-memory STORE and vanished on restart/run-end, so an auditor couldn't later
ask "what did validator V sign for run R?"
- consensus/archive.py: ScoreArchive persists the canonical SIGNED score report
per run ({run_id, family_id, validator_hotkey, scores, timestamp, signature})
to one JSON file (atomic, path-sanitized). This is the off-chain pre-image for
the on-chain commitment: an auditor verifies the signature and checks its
scores hash to the validator's on-chain commitment for the run.
- engine.py: writes the signed archive once per run alongside the on-chain commit
(same scores, so commitment hash and archived pre-image are mutually
consistent); persists the raw STORE after each claim batch.
- main.py: GET /v1/consensus/scores/{run_id} returns the durable archived report
when present (fixed signature, survives restart) else signs live; loads the
persisted STORE on startup.
- score_store.py: store_path() (EIREL_SCORE_STORE_PATH / EIREL_DATA_DIR) shared
by persist + load. All best-effort — disk errors never break the loops.
4 new tests; 331 pass across validator/common.
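The "atomic, path-sanitized" write can be illustrated with the usual temp-file-then-rename pattern. The helper names and the sanitizing rule below are assumptions, not the ScoreArchive API; the report fields are the ones listed above.

```python
import json
import os
import re
import tempfile

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def archive_path(base_dir, run_id):
    # Replace anything that could escape base_dir (slashes, etc.) with "_".
    safe = _UNSAFE.sub("_", run_id).lstrip(".") or "_"
    return os.path.join(base_dir, f"{safe}.json")


def write_report_atomic(base_dir, report):
    """Write one signed report per run; readers never see a partial file."""
    os.makedirs(base_dir, exist_ok=True)
    path = archive_path(base_dir, report["run_id"])
    fd, tmp = tempfile.mkstemp(dir=base_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(report, fh, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    return path


def read_report(base_dir, run_id):
    """Best-effort read: a missing or corrupt file yields None."""
    try:
        with open(archive_path(base_dir, run_id), encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, ValueError):
        return None
```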
Closes the last audit gap: the shadow→decentralized flip was gated on divergence stats that lived only in memory and only in logs (reset on restart and self-attested), so the decision to hand consensus on-chain control couldn't be independently verified.
- divergence.py: DivergenceTracker gains persist()/load() (atomic JSON) so the gate window is a durable cross-restart history; sign_divergence()/verify_divergence() produce + check a hotkey-signed summary (running winner-agreement + mean rank ρ + the per-run records behind them); TRACKER is now the shared singleton + divergence_path().
- engine.py: records owner per-miner weights as owner_scores (so rank ρ populates when ≥2 miners overlap; sparse/WTA → stays n/a, no overclaim) and persists the tracker after each shadow cycle. Uses the shared TRACKER.
- main.py: GET /v1/consensus/divergence serves the signed summary; loads the persisted tracker on startup. All best-effort.
5 new tests; 336 pass across validator/common.
…ring)
The owner-api verifies request signatures over ``request.url.path`` — the path
WITHOUT the query string (shared/common/security.authenticate_request). The
local-exec roster fetch and the ledger-tools fetch signed the full path
including ``?family_id=``/``?job_id=``, producing a signature the server can't
reproduce -> 401. Roster fetch is fatal to local execution; the ledger fetch
silently fell back to empty tools (zeroing tool attestation).
Sign ``path.split("?", 1)[0]`` in both the engine's shared _signed_headers and
LocalRunController._signed_get. Caught by a live single-validator local-exec
run; unit tests used a mock HTTP client and never hit real signature verify.
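The mismatch is easy to reproduce in isolation. In the sketch below an HMAC stands in for the real hotkey signature, and the "METHOD:path" message layout is an assumption; only the `path.split("?", 1)[0]` step is taken from the fix.

```python
import hashlib
import hmac


def signable_path(path_with_query):
    """Strip the query string so the client signs what the server verifies."""
    return path_with_query.split("?", 1)[0]


def sign(secret, method, path_with_query):
    # HMAC is a stand-in for the real signature scheme (assumption).
    msg = f"{method}:{signable_path(path_with_query)}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def server_verify(secret, method, url_path, signature):
    # The server only ever sees request.url.path, i.e. no query string.
    msg = f"{method}:{url_path}".encode("utf-8")
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

A signature computed over the full "/v1/runs/r1/roster?family_id=f" string would fail `server_verify`, which is the 401 described above.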
…zed exec
In decentralized execution the owner no longer hosts miner runtimes; each validator runs submissions on its own cluster. A full-role owner-api still ran the startup reconcile + capacity/health/reaper loops, which demote deployments it isn't actually hosting (active+healthy -> unhealthy), emptying the validator claim.
Gate node-inventory + reconcile + the management loops behind EIREL_OWNER_DISABLE_RUNTIME_RECONCILE (default off -> owner-hosted path unchanged) so the owner-api can serve registry/roster/claim/archive only.
… tag
The local-exec cost tag is ``task-eval={task_id};deployment={deployment_id}`` —
a real task_id plus a 36-char deployment UUID is ~80-98 chars, overrunning the
64-char job_id limit on both the tool-call write body and the job_ledger read
query. Every validator-local tool-call write/read 422'd, silently emptying tool
attestation. Raise both job_id limits to 128 (tool_name/args_hash/run_id keep
64). Verified live: ledger read goes 422 -> 200.
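The length arithmetic is quick to check. The tag format is taken from the message above; the assumption here is that task_id is itself a 36-character UUID, which gives the 94-character case (other task_id lengths shift the total accordingly).

```python
import uuid

JOB_ID_MAX_OLD = 64
JOB_ID_MAX_NEW = 128


def cost_tag(task_id, deployment_id):
    return f"task-eval={task_id};deployment={deployment_id}"


# Fixed overhead: len("task-eval=") + len(";deployment=") + 36-char UUID = 58.
tag = cost_tag(str(uuid.UUID(int=0)), str(uuid.UUID(int=1)))
print(len(tag), len(tag) <= JOB_ID_MAX_OLD, len(tag) <= JOB_ID_MAX_NEW)
```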
Summary
Decentralizes EIREL validation so no single party — including the owner — can forge or bias a winner, and makes the outcome auditable. Execution and measurement move to the validators; the owner-api shrinks to a registry/roster/task distributor. These 13 commits span the consensus layer, validator-side execution, the validator deployment stack, and auditability hardening, plus three integration bugs caught by an end-to-end live test.
What changed
Consensus & weights
… EIREL_WEIGHT_MODE (owner → shadow → decentralized).
Validator-side execution (EIREL_EXEC_MODE=local)
validator-stack manifests for the validator-local stack.
Auditability
… cached_baseline so the grading reference is validator-independent.
… EIREL_CONSENSUS_REQUIRE_COMMITMENT, default off).
… GET /v1/consensus/divergence.
Fixes caught by the live end-to-end test
… request.url.path (no query string), so signing ?family_id= / ?job_id= 401s. Fatal to the local-exec roster fetch; also silently emptied owner-mode tool attestation.
EIREL_OWNER_DISABLE_RUNTIME_RECONCILE (default off) so a decentralized owner-api doesn't demote validator-hosted deployments.
job_id cap 64 → 128 (the local-exec cost tag task-eval={task_id};deployment={uuid} is ~94 chars).
Validation
… eirel_test, dedicated eirel-test-* namespaces, zero prod impact): owner-signed roster verified → archive hash-pinned → miner pod booted on the validator's own k3s → invoked locally → cost measured first-hand → composite scored → served as a validator-signed consensus map + signed divergence summary.
651 passed, 1 skipped.
Deploy note
The signed-path + job_id fixes re-activate owner-mode tool attestation, which had been silently inert in prod since 2026-05-07 (benign: ledger_tools always empty but no false knockouts, since required_tool wasn't firing). Watch task_miner_results.ledger_tools after deploy.
🤖 Generated with Claude Code