Skip to content

feat(groundedness): retrieval-quality scorer#282

Merged
drewstone merged 1 commit into
mainfrom
lift/groundedness-from-benchmark
Jun 24, 2026
Merged

feat(groundedness): retrieval-quality scorer#282
drewstone merged 1 commit into
mainfrom
lift/groundedness-from-benchmark

Conversation

@drewstone

Copy link
Copy Markdown
Contributor

What

Lifts groundedness — retrieval-quality scoring — into the substrate (./groundedness subpath).

Did the retrieval/search provider actually return the fact the task needed, independent of whether the agent then used it. Isolates provider quality from agent skill (a perfect result + weak model still fails; junk results + strong model recovers — so task pass-rate alone conflates the two).

  • Pure scorer scoreGroundedness(resultText, requiredKnowledge[]){ score, found, missing, total, hadResults }.
  • Trace extractor over the substrate's own retrieval/tool spans (RetrievalSpan.hits[].content + provider ToolSpan.result) — the provider tool matcher is injectable (defaults to search/research-not-fetch).
  • Structural sibling of authenticity (pure deterministic scorer + consumer-supplied domain config); subpath-only export, no root re-export.

Why

A fundamental retrieval-eval primitive every retrieval-augmented eval needs; the provider's job is to surface the answer, this measures whether it did. Distinct from output-coverage scoring (which scores the agent's produced artifact) — this scores the provider's retrieved text.

Tests / checks

tsc --noEmit --strict 0 errors, vitest 10/10, tsup emits dist/groundedness.

Pure scoreGroundedness(resultText, requiredKnowledge[]) -> {score,found,
missing,total,hadResults} plus a span-based extractRetrievedText over the
canonical TraceSchema (RetrievalSpan.hits + provider ToolSpan.result),
the structural sibling of src/authenticity. Provider-tool selection is an
injected matcher (default search/research-not-fetch), not a baked literal;
requiredKnowledge is a bare string[] supplied by the consumer. Subpath-only
export (./groundedness), no root re-export — mirrors authenticity.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 11677e14

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:33:15Z

@drewstone drewstone merged commit 9abe3b4 into main Jun 24, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants