feat(groundedness): retrieval-quality scorer by drewstone · Pull Request #282 · tangle-network/agent-eval

drewstone · 2026-06-24T12:33:08Z

What

Lifts groundedness — retrieval-quality scoring — into the substrate (./groundedness subpath).

Groundedness asks: did the retrieval/search provider actually return the fact the task needed, independent of whether the agent then used it? This isolates provider quality from agent skill: a perfect result with a weak model still fails, and junk results with a strong model can still recover, so task pass-rate alone conflates the two.

Pure scorer scoreGroundedness(resultText, requiredKnowledge[]) → { score, found, missing, total, hadResults }.
Trace extractor over the substrate's own retrieval/tool spans (RetrievalSpan.hits[].content + provider ToolSpan.result) — the provider tool matcher is injectable (defaults to search/research-not-fetch).
Structural sibling of authenticity (pure deterministic scorer + consumer-supplied domain config); subpath-only export, no root re-export.
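
As a rough illustration of the scorer's contract, here is a minimal sketch. Only the function name, its two parameters, and the result fields come from the description above; the matching rule (case-insensitive substring after whitespace normalization), the `normalize` helper, and the zero score for an empty `requiredKnowledge` list are assumptions and may not match the merged implementation.

```typescript
// Hypothetical sketch. The result shape follows the PR description;
// the matching rule below is an assumption, not the merged code.
interface GroundednessResult {
  score: number; // found.length / total; 0 when total is 0 (assumed)
  found: string[];
  missing: string[];
  total: number;
  hadResults: boolean; // false when the provider returned no text at all
}

// Assumed helper: lowercase and collapse whitespace before comparing.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

function scoreGroundedness(
  resultText: string,
  requiredKnowledge: string[],
): GroundednessResult {
  const haystack = normalize(resultText);
  const found: string[] = [];
  const missing: string[] = [];
  for (const fact of requiredKnowledge) {
    const needle = normalize(fact);
    if (needle.length > 0 && haystack.includes(needle)) {
      found.push(fact);
    } else {
      missing.push(fact);
    }
  }
  const total = requiredKnowledge.length;
  return {
    score: total === 0 ? 0 : found.length / total,
    found,
    missing,
    total,
    hadResults: haystack.length > 0,
  };
}

// Usage: two of the three required facts appear in the retrieved text.
const example = scoreGroundedness(
  "The Eiffel Tower is 330 metres tall and opened in 1889.",
  ["330 metres", "1889", "Gustave Eiffel"],
);
console.log(example.found, example.missing);
```

Keeping the function pure (text in, plain object out) is what lets the consumer supply its own domain config, as the authenticity comparison suggests.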

Why

Groundedness is a fundamental primitive that every retrieval-augmented eval needs: the provider's job is to surface the answer, and this measures whether it did. It is distinct from output-coverage scoring, which scores the agent's produced artifact; this scores the provider's retrieved text.
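
One way to read a groundedness score next to task pass/fail is to cross the two signals, so that a failed run can be attributed to either the provider or the agent. The sketch below is purely illustrative: the `Attribution` labels, the `attributeRun` function, and the threshold default are hypothetical names and values, not part of this PR.

```typescript
// Hypothetical sketch: none of these names exist in the PR.
type Attribution =
  | "pass" // provider surfaced the facts and the task passed
  | "recovered" // provider missed facts but the agent still passed
  | "agent-miss" // provider surfaced the facts but the task failed
  | "provider-miss"; // provider missed facts and the task failed

function attributeRun(
  taskPassed: boolean,
  groundednessScore: number,
  threshold = 1, // assumed: require every required fact to be retrieved
): Attribution {
  const grounded = groundednessScore >= threshold;
  if (taskPassed) {
    return grounded ? "pass" : "recovered";
  }
  return grounded ? "agent-miss" : "provider-miss";
}

// A perfect retrieval result with a failed task points at the agent.
console.log(attributeRun(false, 1));
// Junk results with a passing task means the agent recovered.
console.log(attributeRun(true, 0));
```

Pass-rate alone would merge "agent-miss" with "provider-miss", and "pass" with "recovered", which is the conflation a separate retrieval-quality score is meant to avoid.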

Tests / checks

tsc --noEmit --strict: 0 errors; vitest: 10/10 passing; tsup emits dist/groundedness.

Adds a pure scoreGroundedness(resultText, requiredKnowledge[]) -> { score, found, missing, total, hadResults } plus a span-based extractRetrievedText over the canonical TraceSchema (RetrievalSpan.hits + provider ToolSpan.result), the structural sibling of src/authenticity. Provider-tool selection is an injected matcher (default search/research-not-fetch), not a baked-in literal; requiredKnowledge is a bare string[] supplied by the consumer. Subpath-only export (./groundedness), no root re-export, mirroring authenticity.
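
The span-based extraction with an injectable matcher could look roughly like the sketch below. The field names `hits`, `content`, and `result` come from the description; the `kind` discriminator, the `name` field, the `ProviderToolMatcher` type, and the regex reading of "search/research-not-fetch" are all assumptions, and the real TraceSchema may be shaped differently.

```typescript
// Hypothetical span shapes; the real TraceSchema may differ.
interface RetrievalSpan {
  kind: "retrieval"; // assumed discriminator
  hits: { content: string }[];
}
interface ToolSpan {
  kind: "tool"; // assumed discriminator
  name: string; // assumed field holding the tool name
  result?: string;
}
type Span = RetrievalSpan | ToolSpan;

// Injected predicate deciding which tool spans count as provider calls.
type ProviderToolMatcher = (toolName: string) => boolean;

// Assumed reading of "search/research-not-fetch".
const defaultProviderMatcher: ProviderToolMatcher = (name) =>
  /search|research/i.test(name) && !/fetch/i.test(name);

function extractRetrievedText(
  spans: Span[],
  isProviderTool: ProviderToolMatcher = defaultProviderMatcher,
): string {
  const parts: string[] = [];
  for (const span of spans) {
    if (span.kind === "retrieval") {
      // Every retrieval hit contributes its content.
      for (const hit of span.hits) parts.push(hit.content);
    } else if (isProviderTool(span.name) && span.result) {
      // Only tool spans accepted by the matcher contribute their result.
      parts.push(span.result);
    }
  }
  return parts.join("\n");
}

// Usage: the fetch span is skipped by the default matcher.
const spans: Span[] = [
  { kind: "retrieval", hits: [{ content: "doc A" }, { content: "doc B" }] },
  { kind: "tool", name: "web_search", result: "search snippet" },
  { kind: "tool", name: "fetch_url", result: "full page body" },
];
console.log(extractRetrievedText(spans));
```

Passing the matcher as a parameter lets a consumer with differently named tools override the default without patching the library.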

tangletools

✅ Auto-approved PR — `11677e14`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:33:15Z

tangletools approved these changes Jun 24, 2026

drewstone merged commit 9abe3b4 into main Jun 24, 2026
1 check passed

drewstone merged 1 commit into main from lift/groundedness-from-benchmark
