DOC: Scoring Docs Refactor by rlundeen2 · Pull Request #1892 · microsoft/PyRIT

rlundeen2 · 2026-06-02T20:13:01Z

This PR attempts to make the scoring docs a lot more cohesive and clearer. I find the new layout much easier to digest.

The intro and metrics stay close to the same, but the others are collapsed into three sections: true_false, float_scale, and combining.

One note is the _ScorerInfo class. It isn't ideal; I eventually want scorer capabilities instead. But for now I like it: it's documented as temporary, and it can keep our documentation up to date.

Replace the sprawling scoring docs with five focused pages (overview, true/false, float-scale, combining, metrics) plus an auto-generated scorer reference table backed by a new get_scorer_info() helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
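The PR text does not show the signature of get_scorer_info(), so the following is only a rough sketch of how a reference table could be generated from class metadata. Every name in it (ScorerInfo, describe, render_table, the score_type and uses_llm attributes, and the two example classes) is hypothetical and is not PyRIT's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorerInfo:
    """Hypothetical table row; not PyRIT's _ScorerInfo."""

    name: str
    score_type: str
    uses_llm: bool


# Hypothetical stand-ins for scorer classes (assumed attributes).
class KeywordScorer:
    score_type = "true_false"
    uses_llm = False


class LikertScorer:
    score_type = "float_scale"
    uses_llm = True


def describe(scorer_classes: list[type]) -> list[ScorerInfo]:
    """Collect one row per class, sorted by name so the table is stable."""
    rows = [
        ScorerInfo(cls.__name__, cls.score_type, cls.uses_llm)
        for cls in scorer_classes
    ]
    return sorted(rows, key=lambda row: row.name)


def render_table(rows: list[ScorerInfo]) -> str:
    """Render the rows as a Markdown table with a 'Uses LLM?' column."""
    lines = ["| Scorer | Type | Uses LLM? |", "| --- | --- | --- |"]
    for row in rows:
        uses_llm = "Yes" if row.uses_llm else "No"
        lines.append(f"| {row.name} | {row.score_type} | {uses_llm} |")
    return "\n".join(lines)


table = render_table(describe([LikertScorer, KeywordScorer]))
print(table)
```

Generating the table from the classes themselves, rather than maintaining it by hand, is what lets the documentation stay in step with the code.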

# Conflicts:
#   doc/myst.yml

… get_scorer_info

- Add silent=True to initialize_pyrit_async in the scoring overview page
- Delete the standalone owasp_llm02_scorers doc page; fold a concise OWASP LLM02 output-scorer subsection (with a live XSSOutputScorer example) into the regex section of 1_true_false_scorers
- Make get_scorer_info defensive against mocked pyrit.score exports (under test pollution, spec=type MagicMocks make isinstance(obj, type) return True but make issubclass raise TypeError); add a regression test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
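The mock behaviour described in that commit can be reproduced with the standard library alone. The sketch below is not the PR's implementation: BaseScorer, RegexScorer, and is_scorer_class are hypothetical stand-ins, and only the unittest.mock behaviour is real.

```python
from unittest.mock import MagicMock


class BaseScorer:
    """Hypothetical stand-in for the scorer base class."""


class RegexScorer(BaseScorer):
    """Hypothetical concrete scorer."""


def is_scorer_class(obj: object) -> bool:
    """Return True only for real classes that subclass BaseScorer."""
    if not isinstance(obj, type):
        return False
    try:
        return issubclass(obj, BaseScorer)
    except TypeError:
        # MagicMock(spec=type) reports type as its __class__, so it passes
        # the isinstance check above, but issubclass still rejects it
        # because it is not a real class.
        return False


polluted_export = MagicMock(spec=type)

print(isinstance(polluted_export, type))  # the mock looks like a class
print(is_scorer_class(polluted_export))   # but the guarded check rejects it
print(is_scorer_class(RegexScorer))       # real subclasses still pass
```

Catching TypeError around issubclass keeps the helper usable even when another test has replaced a module export with a mock.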

Address PR review feedback:

- Note that TextTarget records the prompt and returns no assistant content, so the attack example has nothing substantive to score; point readers to an LLM-backed target for a real response
- Clarify that SelfAskRefusalScorer only short-circuits on a fully blocked response; partially blocked content is still scored by the LLM

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
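The short-circuit rule in the second bullet can be illustrated roughly as follows. This is a sketch under assumptions, not SelfAskRefusalScorer's code: the Piece dataclass, its blocked flag, score_refusal, and the llm_judge callable are all hypothetical, and representing a response as a list of pieces is an assumption made for illustration.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Piece:
    """Hypothetical response piece; blocked marks filtered content."""

    text: str
    blocked: bool = False


def score_refusal(pieces: list[Piece], llm_judge: Callable[[str], bool]) -> bool:
    """Return True if the response counts as a refusal."""
    # Short-circuit only when every piece was blocked: nothing is left
    # for the LLM to read, so the response is treated as a refusal.
    if pieces and all(piece.blocked for piece in pieces):
        return True
    # Otherwise (including partially blocked responses) ask the LLM judge
    # about the content that did come through.
    visible = "\n".join(piece.text for piece in pieces if not piece.blocked)
    return llm_judge(visible)


judge_calls: list[str] = []


def fake_judge(text: str) -> bool:
    """Stand-in for an LLM call; records what it was asked to score."""
    judge_calls.append(text)
    return "cannot help" in text


fully_blocked = score_refusal([Piece("", blocked=True)], fake_judge)
partially_blocked = score_refusal(
    [Piece("", blocked=True), Piece("Here is the answer.")], fake_judge
)
print(fully_blocked, partially_blocked, len(judge_calls))
```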

Merge 26 commits from main, including:

- MAINT Breaking: Convert ScenarioResult to Pydantic (microsoft#1908)
- MAINT: Migrating Seed classes to Pydantic (microsoft#1898)
- MAINT: Migrating AttackResult to Pydantic (microsoft#1899)
- MAINT: Bump ty-pre-commit v0.0.32 -> 0.0.43 (microsoft#1919)
- FEAT: Realtime streaming session support and server-side barge-in attack (microsoft#1766)
- FEAT text adaptive scenario (microsoft#1760)
- FIX: Integration Test Fixes (microsoft#1907)
- DOC: Scoring Docs Refactor (microsoft#1892)
- Various dependency bumps

Conflicts (15 files) resolved by taking main's version + re-running ruff --fix to re-apply PEP 604 typing modernization on the incoming code (177 violations auto-fixed). All resolved files re-staged.

Local verification:

- ruff check: All checks passed
- ruff format: clean
- pytest tests/unit -n 8: 9550 passed, 6 skipped

Known issue (pre-existing on main, not caused by this merge):

- ty 0.0.43 enabled missing-override-decorator rule, which flags hundreds of pre-existing methods across the codebase. Main's own CI is currently failing on this. Our PR will inherit the same failure since touched files come into pre-commit scope. Fixing this rule globally is a separate, large mechanical change orthogonal to typing modernization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rlundeen2 and others added 3 commits June 2, 2026 12:53

Merge remote-tracking branch 'origin/main' into rlundeen2/laughing-disco

3d0491e

# Conflicts:
#   doc/myst.yml

Merge remote-tracking branch 'origin/main' into rlundeen2/laughing-disco

1cf16ad

romanlutz approved these changes Jun 2, 2026

rlundeen2 and others added 3 commits June 2, 2026 13:27

Remove redundant Fast vs. slow section from scoring overview

8231519

The reference table's Uses LLM? column already conveys the speed/cost axis, so the prose section was redundant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove redundant Scorer categories section from scoring overview

af24897

The section duplicated the sidebar/TOC navigation to the child pages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jsong468 reviewed Jun 2, 2026

Comment thread doc/code/scoring/0_scoring.py

jsong468 reviewed Jun 2, 2026

Comment thread doc/code/scoring/1_true_false_scorers.py Outdated

jsong468 reviewed Jun 2, 2026

Comment thread doc/code/scoring/3_combining_scorers.py

jsong468 reviewed Jun 2, 2026

Comment thread doc/code/scoring/4_scorer_metrics.ipynb

jsong468 approved these changes Jun 2, 2026

rlundeen2 enabled auto-merge June 3, 2026 18:50

rlundeen2 added this pull request to the merge queue Jun 3, 2026

Merged via the queue into microsoft:main with commit d50caf0 Jun 3, 2026
52 checks passed

rlundeen2 deleted the rlundeen2/laughing-disco branch June 3, 2026 19:09

rlundeen2 merged 7 commits into microsoft:main from rlundeen2:rlundeen2/laughing-disco
