
maintainer-only: value-weighted gate-prediction calibration — optimize for merged-net-positive, not raw volume #2348

Description

@JSONbored

This is the core anti-slop objective-function change for Phase 7. Today's accuracy measurement (computeGateEval's mergePrecision/closePrecision) weights every PR equally. This work changes what the calibration OPTIMIZES for. A merged outcome is weighted by whether the PR was actually net-positive, meaning it survived without a later revert or reopen (the reversal_reverted/reversal_reopened signal already tracked in review_audit), rather than simply counting toward raw merge volume. As a result, a miner (or the fleet) cannot game the accuracy number by producing high volumes of barely-passing PRs that are later reverted.
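A minimal sketch of the gaming problem, assuming a simplified outcome record. The `GateOutcome` type and both precision functions are hypothetical illustrations, not the real computeGateEval fold or the review_audit schema. "Merge precision" is assumed here to mean the share of predicted-merge PRs that actually merged.

```typescript
// Hypothetical outcome record: field names are illustrative, not the real
// review_audit schema.
interface GateOutcome {
  predicted: "merge" | "close";
  actual: "merged" | "closed";
  reverted: boolean; // stands in for reversal_reverted
  reopened: boolean; // stands in for reversal_reopened
}

// Raw precision: every predicted-merge PR that merged counts as a hit.
function rawMergePrecision(outcomes: GateOutcome[]): number {
  const predictedMerge = outcomes.filter((o) => o.predicted === "merge");
  if (predictedMerge.length === 0) return 0;
  const hits = predictedMerge.filter((o) => o.actual === "merged").length;
  return hits / predictedMerge.length;
}

// Net-positive precision: a merge only counts if it was never reverted or reopened.
function netPositiveMergePrecision(outcomes: GateOutcome[]): number {
  const predictedMerge = outcomes.filter((o) => o.predicted === "merge");
  if (predictedMerge.length === 0) return 0;
  const hits = predictedMerge.filter(
    (o) => o.actual === "merged" && !o.reverted && !o.reopened,
  ).length;
  return hits / predictedMerge.length;
}

// 10 solid merges plus 90 barely-passing merges that were later reverted.
const solid = Array.from({ length: 10 }, (): GateOutcome => ({
  predicted: "merge", actual: "merged", reverted: false, reopened: false,
}));
const slop = Array.from({ length: 90 }, (): GateOutcome => ({
  predicted: "merge", actual: "merged", reverted: true, reopened: false,
}));
const fleet: GateOutcome[] = [...solid, ...slop];

console.log(rawMergePrecision(fleet)); // 1
console.log(netPositiveMergePrecision(fleet)); // 0.1
```

Under the raw definition this fleet scores perfectly. Counting only net-positive merges shows that 90 of the 100 merges did not hold up.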

This touches how the live calibration signal is interpreted, and therefore how any future tuning decision (including the auto-tune circuit-breaker) reads the world. That makes it a live-scoring-adjacent change by definition, hence maintainer-only.

Deliverables

  • A value-weighted variant of the gate-eval fold that discounts a "merged" outcome later marked reversal_reverted/reversal_reopened
  • A documented, versioned weighting formula (not silently tunable at runtime) so the objective function itself is auditable
  • Regression tests against the existing computeGateEval fixtures plus new reversal-weighted fixtures
  • A migration/rollout note: this changes what "accuracy" means for anyone consuming the eval report, so downstream consumers (auto-tune breaker, orb collector) must be re-verified against the new definition before cutover
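One possible shape for the versioned, value-weighted fold, offered as a sketch only. `WEIGHTING_V1`, `computeValueWeightedMergePrecision`, the `"value-weighted/v1"` tag, and especially the weight values (1 for a surviving merge, 0.5 for a reopened one, 0 for a reverted one) are hypothetical placeholders, not the formula this issue will settle on. What the sketch shows is the shape: the weights are frozen and tagged with a version, and that version is stamped into the report so consumers can tell which definition of accuracy they are reading.

```typescript
// Hypothetical: the real fold lives in computeGateEval (src/review/parity.ts).
// Names and weight values here are placeholders, not proposed final numbers.
type MergeOutcome = "survived" | "reopened" | "reverted";

interface WeightingFormula {
  readonly version: string;
  readonly weights: Readonly<Record<MergeOutcome, number>>;
}

// Frozen so the weights cannot be mutated at runtime; changing the
// objective means shipping a new version.
const WEIGHTING_V1: WeightingFormula = Object.freeze({
  version: "value-weighted/v1",
  weights: Object.freeze({ survived: 1, reopened: 0.5, reverted: 0 }),
});

interface PredictedMerge {
  merged: boolean;
  reversal?: "reopened" | "reverted"; // absent means the merge survived
}

interface WeightedEvalReport {
  weightingVersion: string; // lets consumers detect the definition change
  weightedMergePrecision: number;
}

function computeValueWeightedMergePrecision(
  predictedMerges: PredictedMerge[],
  formula: WeightingFormula = WEIGHTING_V1,
): WeightedEvalReport {
  let credit = 0;
  for (const pr of predictedMerges) {
    if (!pr.merged) continue; // a closed PR earns no merge credit
    credit += formula.weights[pr.reversal ?? "survived"];
  }
  const n = predictedMerges.length;
  return {
    weightingVersion: formula.version,
    weightedMergePrecision: n === 0 ? 0 : credit / n,
  };
}

// Credit = 1 + 0.5 + 0 + 0 = 1.5 over 4 predicted merges.
const report = computeValueWeightedMergePrecision([
  { merged: true },
  { merged: true, reversal: "reopened" },
  { merged: true, reversal: "reverted" },
  { merged: false },
]);
console.log(report); // { weightingVersion: 'value-weighted/v1', weightedMergePrecision: 0.375 }
```

Freezing the formula and stamping its version into every report keeps the objective auditable. Any change to the weights has to ship as a new version rather than as a silent runtime tweak.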

References

  • src/review/parity.ts (301 lines): computeGateEval, the report whose fold semantics this changes
  • src/review/outcomes-wire.ts (536 lines): recordReversalSignals, the existing reversal (revert/reopen) tracking that this weighting reads
  • src/review/auto-tune.ts: the consumer whose forward-measured, cautious-only breaker must be re-verified against the new weighted definition
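Because consumers must be re-verified before cutover, a consumer could refuse any report whose accuracy definition it has not been checked against. This is a minimal sketch that assumes two things: a hypothetical `weightingVersion` field on the eval report (absent on reports from the current unweighted fold), and a hypothetical `readAccuracyForTuning` entry point on the consumer side. The real auto-tune.ts interface may look different.

```typescript
// Hypothetical report shape: weightingVersion is absent on reports produced by
// the existing unweighted fold.
interface EvalReport {
  weightingVersion?: string;
  mergePrecision: number;
}

// Definitions this consumer has been re-verified against. Adding the weighted
// definition here is the cutover step, done only after re-verification.
const VERIFIED_DEFINITIONS: ReadonlySet<string> = new Set(["unweighted"]);

// Hypothetical consumer entry point: fail closed on an unknown definition
// instead of silently tuning against a number whose meaning changed.
function readAccuracyForTuning(
  report: EvalReport,
  verified: ReadonlySet<string> = VERIFIED_DEFINITIONS,
): number {
  const definition = report.weightingVersion ?? "unweighted";
  if (!verified.has(definition)) {
    throw new Error(`consumer not re-verified for accuracy definition "${definition}"`);
  }
  return report.mergePrecision;
}

// Usage: the legacy report is accepted; the weighted one is rejected until cutover.
console.log(readAccuracyForTuning({ mergePrecision: 0.8 })); // 0.8
try {
  readAccuracyForTuning({ weightingVersion: "value-weighted/v1", mergePrecision: 0.4 });
} catch (err) {
  console.log((err as Error).message);
}
```

Failing closed means a consumer that has not been re-verified stops tuning rather than quietly reading a precision number whose meaning has changed underneath it.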

Metadata

Assignees

Labels

maintainer-only: Owner-only work, yields no Gittensor points.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
