feat(grading): hidden-criteria firewall + held-out blend as a substrate primitive #283

Merged: drewstone merged 1 commit into main from lift/hidden-criteria-firewall on Jun 24, 2026
@drewstone

What

Lifts the held-out / hidden-criteria grading FIREWALL out of the coding-benchmark example (in agent-runtime) into a domain-agnostic substrate primitive, so any domain — research, legal, tax, content — can grade an agent on hidden criteria it never saw, not just coding.

New module: src/hidden-criteria-grading.ts. Additive subpath on the root index — no breaking change, zero consumer updates needed.

The two reusable, domain-free pieces

The coding-LOCAL execution mechanism (node --test, TAP parsing) stays in the example. Only the general pieces are lifted, composed from existing types (JudgeScore) — nothing reinvented, no node/test/TS/exec/regex baked into the substrate.

1. Field routing by destination (the firewall as a type). A scenario tags each field by where it is allowed to flow:

| FieldDestination | Reaches the agent? | Use |
|---|---|---|
| agent-visible | yes | the prompt / task |
| develop-against | yes (intentional, TDD) | a visible example/test |
| grading-only | never | the held-out suite / answer key |
| judge-only | never | rubric anchors / design intent |

  • routeFields(routing, values) builds the routed field set from a domain's (field → destination) + (field → value) maps (fail-loud on a missing value).
  • assertNoHiddenLeak(fields, agentContext) is the firewall: throws ValidationError if any grading-only/judge-only value appears in the exact text that reaches the agent.
  • agentVisibleFields(...) returns the safe-to-render fields so a caller assembles the context from the routing instead of hand-picking.
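
To make the routing idea concrete, here is a minimal self-contained sketch of how the three helpers above could fit together. It is a local stand-in, not the module's source: the function names mirror the description, but every signature, the `RoutedField` shape, the verbatim-substring leak rule, and the sample values (including the made-up citation) are assumptions.

```typescript
// Illustrative stand-in, NOT the module's source. Signatures, the RoutedField
// shape, and the substring-matching rule are assumptions for this sketch.
type FieldDestination = 'agent-visible' | 'develop-against' | 'grading-only' | 'judge-only'

interface RoutedField {
  name: string
  destination: FieldDestination
  value: string
}

// Hypothetical local error class; the real module throws its own ValidationError.
class ValidationError extends Error {}

const HIDDEN: ReadonlySet<FieldDestination> = new Set<FieldDestination>(['grading-only', 'judge-only'])

// Pair each field's destination with its value; fail loud on a missing value.
function routeFields(
  routing: Record<string, FieldDestination>,
  values: Record<string, string | undefined>,
): RoutedField[] {
  return Object.entries(routing).map(([name, destination]) => {
    const value = values[name]
    if (value === undefined) throw new ValidationError(`missing value for field "${name}"`)
    return { name, destination, value }
  })
}

// Keep only the fields that are allowed to reach the agent.
function agentVisibleFields(fields: RoutedField[]): RoutedField[] {
  return fields.filter(f => !HIDDEN.has(f.destination))
}

// The firewall: throw if any hidden value appears verbatim in the agent context.
function assertNoHiddenLeak(fields: RoutedField[], agentContext: string): void {
  for (const f of fields) {
    if (HIDDEN.has(f.destination) && f.value.length > 0 && agentContext.includes(f.value)) {
      throw new ValidationError(`hidden field "${f.name}" (${f.destination}) leaked into agent context`)
    }
  }
}

const routed = routeFields(
  { question: 'agent-visible', sample: 'develop-against', required: 'grading-only' },
  {
    question: 'Draft a brief on adverse possession.',
    sample: 'Cite at least one statute.',
    required: 'Doe v. Roe (hypothetical citation)',
  },
)

// Assemble the context from the routing instead of hand-picking fields.
const safeContext = agentVisibleFields(routed).map(f => f.value).join('\n')
assertNoHiddenLeak(routed, safeContext) // does not throw: no hidden value is present
```

Building the context from `agentVisibleFields` and then asserting on the final string gives two independent checks: the first prevents a leak by construction, the second catches one introduced by later prompt assembly.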

2. Hidden-criteria grading. The domain supplies its own grader; the substrate supplies firewall enforcement + the composite:

  • HiddenCriteriaGrader<TArtifact, THidden> = (artifact, hiddenCriteria, signal?) => { passRate, total } is the one seam a non-coding domain implements. The coding node-test executor is ONE implementation a consumer plugs in.
  • gradeOnHidden({ artifact, hiddenCriteria, grader, firewall }) — re-asserts the firewall at grading time on the real agent context, then runs the grader.
  • hiddenGrade(passed, total) — the single-sourced honest-zero pass-rate rule (total === 0 → passRate 0, never a spurious pass).
  • blendHeldout(heldoutPassRate, judgeScore, weights?) — the composite (default 0.7 hidden correctness / 0.3 judge quality; weights renormalized; inputs clamped to [0,1]).
  • withHeldoutBlend(score, heldoutPassRate, weights?) — wraps a judge's score so the reported composite becomes the held-out-weighted blend (passes a failed verdict through untouched).
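
The scoring rules in the list above can be sketched in a few lines. This is an illustrative re-implementation under stated assumptions, not the module's code: the `JudgeScoreLike` and `BlendWeights` shapes are hypothetical (the real `JudgeScore` type is not shown here), and throwing on zero-sum weights is only one plausible reading of the "zero-sum guard".

```typescript
// Illustrative stand-in, not the module's source. JudgeScoreLike, BlendWeights,
// and the throw-on-zero-weights rule are assumptions made for this sketch.
interface HiddenGrade { passRate: number; total: number }
interface BlendWeights { heldout: number; judge: number }
interface JudgeScoreLike { composite: number; failed?: boolean }

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x))

// Honest zero: an empty hidden suite scores 0 rather than a vacuous pass.
function hiddenGrade(passed: number, total: number): HiddenGrade {
  return { passRate: total === 0 ? 0 : clamp01(passed / total), total }
}

// Weighted blend; weights are renormalized, so they need not sum to 1.
function blendHeldout(
  heldoutPassRate: number,
  judgeScore: number,
  weights: BlendWeights = { heldout: 0.7, judge: 0.3 },
): number {
  const sum = weights.heldout + weights.judge
  if (!(sum > 0)) throw new RangeError('blend weights must sum to a positive number')
  return (weights.heldout * clamp01(heldoutPassRate) + weights.judge * clamp01(judgeScore)) / sum
}

// Replace a judge's composite with the blend; leave a failed verdict untouched.
function withHeldoutBlend(
  score: JudgeScoreLike,
  heldoutPassRate: number,
  weights?: BlendWeights,
): JudgeScoreLike {
  if (score.failed) return score
  return { ...score, composite: blendHeldout(heldoutPassRate, score.composite, weights) }
}

const grade = hiddenGrade(3, 4)                    // passRate is 0.75
const blended = blendHeldout(grade.passRate, 0.5)  // 0.7 * 0.75 + 0.3 * 0.5, about 0.675
```

With the default weights, a submission that passes none of the hidden criteria can reach at most 0.3 however highly the judge rates it, which is the point of weighting hidden correctness above judge quality.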

How a NON-coding domain plugs in

```ts
import { routeFields, gradeOnHidden, blendHeldout, hiddenGrade } from '@tangle-network/agent-eval'

// 1. Declare where each field flows
const fields = routeFields(
  { question: 'agent-visible', sample: 'develop-against', required: 'grading-only', rubric: 'judge-only' },
  { question, sample, required, rubric },
)

// 2. Bring YOUR OWN grader — no node/test here
const legalGrader = (artifact, hidden) =>
  hiddenGrade(hidden.mustCite.filter(c => artifact.brief.includes(c)).length, hidden.mustCite.length)

// 3. Grade behind the firewall, blend with the judge
const heldout = await gradeOnHidden({
  artifact,
  hiddenCriteria,
  grader: legalGrader,
  firewall: { fields, agentContext },
})
const score = blendHeldout(heldout.passRate, judgeComposite)
```
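
The snippet above relies on values defined elsewhere (question, artifact, agentContext and so on). For readers who want something runnable without the package, the grading step can be approximated as below. This is a sketch under assumptions rather than the library's implementation: the `Firewall` shape is simplified to a list of hidden strings, the optional `signal` argument is omitted, and all sample data is invented.

```typescript
// Illustrative stand-in, not the module's source. The Firewall shape, the
// leak check, and all sample data are assumptions made for this sketch.
interface HiddenGrade { passRate: number; total: number }

type HiddenCriteriaGrader<TArtifact, THidden> =
  (artifact: TArtifact, hidden: THidden) => HiddenGrade | Promise<HiddenGrade>

interface Firewall { hiddenValues: string[]; agentContext: string }

async function gradeOnHidden<TArtifact, THidden>(opts: {
  artifact: TArtifact
  hiddenCriteria: THidden
  grader: HiddenCriteriaGrader<TArtifact, THidden>
  firewall: Firewall
}): Promise<HiddenGrade> {
  // Re-check the firewall against the context the agent actually received.
  const leaked = opts.firewall.hiddenValues.find(v => opts.firewall.agentContext.includes(v))
  if (leaked !== undefined) throw new Error(`hidden value leaked to agent: "${leaked}"`)
  // Only after the firewall holds does the domain grader run.
  return opts.grader(opts.artifact, opts.hiddenCriteria)
}

// A legal-domain grader: fraction of required citations present in the brief.
interface Brief { brief: string }
interface LegalHidden { mustCite: string[] }

const legalGrader: HiddenCriteriaGrader<Brief, LegalHidden> = (artifact, hidden) => {
  const total = hidden.mustCite.length
  const passed = hidden.mustCite.filter(c => artifact.brief.includes(c)).length
  return { passRate: total === 0 ? 0 : passed / total, total }
}

// Hypothetical hidden criteria and agent output.
const hiddenCriteria: LegalHidden = { mustCite: ['Doe v. Roe', 'Section 12(b)'] }

const heldoutPromise = gradeOnHidden({
  artifact: { brief: 'Under Doe v. Roe, the claim fails.' },
  hiddenCriteria,
  grader: legalGrader,
  firewall: { hiddenValues: hiddenCriteria.mustCite, agentContext: 'Draft a brief on the claim.' },
})
```

Running the firewall check inside the grading call, rather than only at prompt-assembly time, means a score cannot be produced for a run whose context was contaminated after the prompt was built.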

Tests

20 focused tests on a non-coding (legal-brief) domain — proving the firewall has no domain coupling. They cover the two required proofs explicitly:

  • (a) assertNoHiddenLeak / gradeOnHidden reject a grading-only (and judge-only) field reaching the agent context.
  • (b) blendHeldout composes correctly (default + renormalized weights, clamping, zero-sum guard, withHeldoutBlend composite replacement + failed-verdict pass-through).

Verification

pnpm typecheck + pnpm build + pnpm test (251 files / 2581 tests) + pnpm lint + pnpm run verify:package are all green. Version trio bumped together: npm package.json, clients/python/pyproject.toml, __init__.py → 0.100.0.

Grain mirrors the recently-landed treatment-gate.ts: pure predicates + pure composition, fail-loud, parameterized matchers/graders, no domain literal in the module. Placed next to test-graded-scenario.ts / partition-held-out.ts (a scorecard/grading concept that makes sense without a running loop).

…te primitive

Lift the held-out / hidden-criteria grading firewall out of the coding
benchmark example into a domain-agnostic primitive so any domain (research,
legal, tax, content) can grade an agent on criteria it never saw.

Two reusable, domain-free pieces, composed from existing types (JudgeScore),
no node/test/TS/exec baked in:

  - Field routing by destination: a scenario tags each field agent-visible /
    develop-against / grading-only / judge-only; routeFields + assertNoHiddenLeak
    enforce that a grading-only/judge-only value never reaches the agent context
    (fail-loud ValidationError).

  - Hidden-criteria grading: the domain supplies its own
    (artifact, hiddenCriteria) => { passRate, total } grader; the substrate
    provides firewall enforcement (gradeOnHidden) + the held-out-weighted
    composite (blendHeldout / withHeldoutBlend, default 0.7/0.3).

The coding node-test executor stays in the example as ONE grader implementation.
20 focused tests on a non-coding (legal) domain prove the firewall rejects a
leaked grading-only field and that blendHeldout composes correctly.

@tangletools tangletools left a comment

✅ Auto-approved PR — 7e582fce

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T18:08:33Z

@drewstone drewstone merged commit aa066bd into main Jun 24, 2026
1 check passed