MAINT: Migrating Seed classes to Pydantic #1898

Merged
rlundeen2 merged 12 commits into microsoft:main from rlundeen2:rlundeen2/phase-8-seeds-pydantic on Jun 3, 2026

Conversation

@rlundeen2
Contributor

This converts Seed to a Pydantic BaseModel. It is phase 8 of the pyrit.models refactor:

https://gist.github.com/rlundeen2/3e8daa8e12a11b4b6e52587b3c9b1dca

One thing to look at is the YAML separation. For now, the YAML handling still lives in models, but it has been moved out of the Seed classes themselves.

rlundeen2 and others added 11 commits June 2, 2026 13:31
Convert all 8 seed classes (Seed, SeedPrompt, SeedObjective,
SeedSimulatedConversation, SeedGroup, SeedAttackGroup,
SeedAttackTechniqueGroup, SeedDataset) from dataclasses/plain classes to
Pydantic v2 BaseModel. Establish a two-layer import rule between
pyrit.models and pyrit.common, enforced by the import-boundary test.
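
For readers unfamiliar with import-boundary tests, here is a minimal sketch of how such a check could work. It is not the actual test in tests/unit/models/test_import_boundary.py; the allowed prefixes and the helper name find_boundary_violations are assumptions made for illustration.

```python
import ast

# Assumption: code in pyrit.models may import only from itself and pyrit.common.
ALLOWED_PREFIXES = ("pyrit.models", "pyrit.common")


def find_boundary_violations(source: str) -> list[str]:
    """Return first-party imports in `source` that fall outside the allowed prefixes."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # not an import, or a relative import (ignored in this sketch)
        for name in names:
            is_first_party = name == "pyrit" or name.startswith("pyrit.")
            is_allowed = any(name == p or name.startswith(p + ".") for p in ALLOWED_PREFIXES)
            if is_first_party and not is_allowed:
                violations.append(name)
    return violations
```

A real test would presumably walk every file under pyrit/models and assert that the combined list of violations is empty.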

Add str->list coercion (shared coerce_str_to_list helper) so YAML seed
files may specify list fields (harm_categories/authors/groups/parameters)
as bare scalars, preserving the old unvalidated behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…k data_type

Implements all reviewer feedback on the seeds Pydantic conversion:

Blocking fix: SeedGroup/SeedAttackGroup with mixed Seed subclasses no longer corrupts polymorphism on model_dump/model_validate round-trips. Introduce a Literal seed_type discriminator on each leaf class (SeedPrompt/SeedObjective/SeedSimulatedConversation), and switch the polymorphic seeds field to a Field(discriminator="seed_type") annotated union (SeedUnion). The base Seed class is deliberately excluded from the union.
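
The discriminator pattern described above can be sketched roughly as follows. PromptSketch, ObjectiveSketch, SeedUnionSketch and GroupSketch are simplified stand-ins, not the real PyRIT classes, and the field set is an assumption.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class PromptSketch(BaseModel):
    # Each leaf class pins seed_type to a single literal value.
    seed_type: Literal["prompt"] = "prompt"
    value: str


class ObjectiveSketch(BaseModel):
    seed_type: Literal["objective"] = "objective"
    value: str


# The discriminator tells Pydantic which leaf class to build from each dict.
SeedUnionSketch = Annotated[
    Union[PromptSketch, ObjectiveSketch],
    Field(discriminator="seed_type"),
]


class GroupSketch(BaseModel):
    seeds: list[SeedUnionSketch]


group = GroupSketch(seeds=[PromptSketch(value="hello"), ObjectiveSketch(value="goal")])
restored = GroupSketch.model_validate(group.model_dump())
```

Because seed_type is written out by model_dump, each entry is rebuilt as its original subclass on the way back in.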

NB1: rename validate -> _check_invariants on SeedGroup/SeedAttackGroup/SeedAttackTechniqueGroup so it does not shadow Pydantic's deprecated BaseModel.validate classmethod (a v1-era API that v2 still defines). External callers updated (atomic_attack, attack_parameters).

NB2: stop silently dropping fields on SeedSimulatedConversation from a dict. Delete the bespoke from_dict and route through model_validate; add a before-validator that drops only the computed value field so round-trips are clean.
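
A minimal sketch of the NB2 idea, assuming the model forbids extra keys and exposes value as a computed field. ConversationSketch and its num_turns field are hypothetical stand-ins, not the real SeedSimulatedConversation.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, computed_field, model_validator


class ConversationSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")  # assumption: unknown keys are rejected

    num_turns: int = 3

    @computed_field
    @property
    def value(self) -> str:
        # Derived from other fields; it appears in model_dump() output.
        return f"simulated conversation ({self.num_turns} turns)"

    @model_validator(mode="before")
    @classmethod
    def _drop_computed_value(cls, data: Any) -> Any:
        # Drop only the computed key so a dumped dict validates cleanly.
        if isinstance(data, dict) and "value" in data:
            data = {k: v for k, v in data.items() if k != "value"}
        return data


original = ConversationSketch(num_turns=5)
round_tripped = ConversationSketch.model_validate(original.model_dump())
```

Any other unexpected key still reaches the extra="forbid" check and fails, so nothing is dropped silently.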

NB3: lock data_type to Literal["text"] on SeedObjective and SeedSimulatedConversation. Strip dataset/group-level data_type, role, sequence, parameters from non-prompt seed dicts so dataset-level defaults do not bleed in.

Thin-class cleanups: Annotated StrOrList alias replaces the per-field _coerce_str_to_list validators on Seed and SeedPrompt; deterministic order-preserving list merge replaces utils.combine_list (which was nondeterministic across processes).
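
As an illustration of the deterministic merge, here is one way to write it. The name merge_preserving_order is hypothetical. A set-based merge of strings can change order between processes because string hashing is randomized per process, which is presumably what made the old helper nondeterministic.

```python
from collections.abc import Iterable


def merge_preserving_order(*lists: Iterable[str]) -> list[str]:
    """Merge lists, keeping the first occurrence of each item in input order."""
    merged: dict[str, None] = {}  # dicts preserve insertion order
    for items in lists:
        for item in items:
            merged.setdefault(item, None)
    return list(merged)


result = merge_preserving_order(["a", "b"], ["b", "c"], ["a", "d"])
```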

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move yaml-loading, path-resolution, and the is_jinja_template=True trust marker from inside Seed/SeedDataset classmethods into a dedicated seed_loader module.

- New pyrit/models/seeds/seed_loader.py exposes load_seed_from_yaml, load_seed_dataset_from_yaml, and load_seed_prompt_from_yaml_with_required_parameters.
- Seed.from_yaml_file, SeedDataset.from_yaml_file, and SeedPrompt.from_yaml_with_required_parameters reduced to thin shims that delegate to the loader functions, so all ~70 existing callsites keep working unchanged.
- Deleted SeedObjective.from_yaml_with_required_parameters (no-op shim, no callers), SeedSimulatedConversation.from_yaml_with_required_parameters (no callers), and the base Seed.from_yaml_with_required_parameters (only SeedPrompt's real validation is left, where it is actually used).
- yaml and verify_and_resolve_path no longer imported in seed.py / seed_dataset.py.
- Stricter loader validation: empty files and top-level non-mappings now raise ValueError with a clear message rather than cryptic TypeErrors.
- New tests/unit/models/test_seed_loader.py (14 tests) covers the trust-marker behavior, error paths, dataset propagation, and classmethod-shim equivalence.
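
The stricter loader validation could look something like the sketch below. The helper name require_mapping is hypothetical. An empty YAML file parses to None and a top-level list parses to a Python list, so unpacking either into a model constructor would fail with an unhelpful TypeError.

```python
from typing import Any


def require_mapping(data: Any, *, source: str) -> dict[str, Any]:
    """Validate the parsed top-level YAML document before building a model from it."""
    if data is None:
        raise ValueError(f"{source}: file is empty; expected a YAML mapping")
    if not isinstance(data, dict):
        raise ValueError(
            f"{source}: expected a top-level mapping, got {type(data).__name__}"
        )
    return data
```

The loader would call this on the parsed document, then pass the result on to model validation.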

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…loader

StrOrList / coerce_str_to_list existed solely to accommodate YAML's scalar-or-sequence shorthand (e.g. `authors: Jane Doe`). That is a loader-layer concern leaking into the data class, same pattern as the is_jinja_template trust marker handled in 9.1.

Move it to the loader: a new _canonicalize_scalar_lists helper wraps bare strings for known list-typed seed fields (harm_categories, authors, groups, parameters) at the YAML boundary and recurses into nested seeds for dataset/group files. The model fields are now plain Optional[list[str]], so programmatic constructors are strict.
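
A rough sketch of what such a helper might do, under stated assumptions: the nested key name "seeds" and the non-mutating behavior are guesses, and the real _canonicalize_scalar_lists may differ.

```python
from typing import Any

# List-typed seed fields named in the commit message.
LIST_FIELDS = ("harm_categories", "authors", "groups", "parameters")
# Assumption: dataset/group files nest their seeds under a "seeds" key.
NESTED_SEED_KEYS = ("seeds",)


def canonicalize_scalar_lists(data: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `data` with bare strings wrapped into one-element lists."""
    result = dict(data)
    for key in LIST_FIELDS:
        if isinstance(result.get(key), str):
            result[key] = [result[key]]
    for key in NESTED_SEED_KEYS:
        nested = result.get(key)
        if isinstance(nested, list):
            result[key] = [
                canonicalize_scalar_lists(item) if isinstance(item, dict) else item
                for item in nested
            ]
    return result
```

With this in the loader, `authors: Jane Doe` in YAML still loads, while `SeedPrompt(authors="Jane Doe")` in Python is rejected by the strict list[str] field.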

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tance check

The previous loop `for sg in seed_groups: sg._check_invariants()` reached across a class boundary into a private hook and was both redundant and ineffective:

- Redundant: SeedAttackGroup is a Pydantic v2 model; its `_finalize` validator already runs `_check_invariants` at construction time. By the time AtomicAttack receives an instance, it has been validated.

- Ineffective: the docstring claimed it caught seed groups "missing an objective", but SeedGroup._check_invariants allows zero objectives. Only SeedAttackGroup enforces "exactly one", so the old call silently passed on any plain SeedGroup with no objective.

Replace with an `isinstance(sg, SeedAttackGroup)` check that enforces the runtime contract already expressed by the type annotation `seed_groups: list[SeedAttackGroup]`. Raise TypeError with a clear message if a caller passes a plain SeedGroup or a SeedAttackTechniqueGroup. Update tests that were passing the wrong type to the typed parameter.
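
The guard can be sketched as follows. The three classes here are empty stand-ins with an assumed hierarchy (both subclasses deriving directly from SeedGroup), and require_attack_groups is a hypothetical helper name.

```python
class SeedGroup:
    """Stand-in: allows zero objectives."""


class SeedAttackGroup(SeedGroup):
    """Stand-in: enforces exactly one objective at construction."""


class SeedAttackTechniqueGroup(SeedGroup):
    """Stand-in: assumed to be a sibling of SeedAttackGroup, not a subclass."""


def require_attack_groups(seed_groups: list) -> None:
    """Reject any element that is not a SeedAttackGroup."""
    for sg in seed_groups:
        if not isinstance(sg, SeedAttackGroup):
            raise TypeError(
                f"seed_groups must contain SeedAttackGroup instances, got {type(sg).__name__}"
            )
```

A static type checker already flags the wrong type; the runtime check covers callers that do not run one.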

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the loader's medium explicit in the filename so future non-YAML loaders (e.g. JSON, remote dataset, hub) read as siblings rather than overloading a single "seed_loader" module.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per code review: phase references go stale fast and are confusing to future readers. Module-level docstring still explains the architectural reason the loader is separate; the planning context belonged in the gist, not the codebase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SeedLeaf was used only as the inner type of SeedUnion plus one local list annotation in _finalize. The two-alias pattern read as "which one do I import?" confusion without buying anything: Annotated[..., Field(discriminator=...)] is fine as a local variable annotation too (the Field metadata is ignored outside Pydantic field contexts).

Inline the Union into SeedUnion's definition and use SeedUnion for the local list. SeedDataset and any future container can keep importing the single name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per style guide: top-of-file imports unless deferred for heavy third-party packages. SeedPrompt, SeedDataset, and Seed are lightweight first-party. The __init__.py imports yaml_seed_loader last, and none of those modules import yaml_seed_loader at module load, so no circular import.

Also switch Union[str, Path] / Optional[str] to X | Y / X | None to match the modern type-syntax rules in the style guide.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same pattern as the earlier atomic_attack fix: the prior _check_invariants() call on a SeedAttackGroup-typed parameter was redundant (Pydantic already validates at construction) and the downstream "objective is None" check guarded against an impossible state for a real SeedAttackGroup.

The actually-useful runtime guard is rejecting incorrect subtypes: callers passing a plain SeedGroup, which is silently accepted by the type annotation but doesn't enforce "exactly one objective". Switch to isinstance(seed_group, SeedAttackGroup) raising TypeError, drop the dead objective-None branch, and replace the test that exercised the impossible state with one covering the new isinstance guard.

Audited the remaining _check_invariants references: only override/super() definitions in the three SeedGroup subclasses and two direct test calls that assert the method's own behavior. No other cross-class consumers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ds-pydantic

# Conflicts:
#	pyrit/models/seeds/seed_group.py
#	pyrit/models/seeds/seed_prompt.py
#	tests/unit/models/test_import_boundary.py
Comment thread pyrit/models/seeds/__init__.py Outdated
Comment thread pyrit/models/seeds/seed.py Outdated
Comment thread pyrit/models/seeds/seed.py Outdated
Comment thread pyrit/models/seeds/seed_dataset.py Outdated
Comment thread pyrit/models/seeds/seed_dataset.py Outdated
Comment thread pyrit/models/seeds/seed_dataset.py Outdated
Comment thread pyrit/models/seeds/seed_dataset.py Outdated
Comment thread pyrit/models/seeds/seed_dataset.py
Comment thread pyrit/models/seeds/seed_group.py
Comment thread pyrit/models/seeds/seed_prompt.py Outdated
Comment thread pyrit/models/seeds/seed_prompt.py Outdated
Comment thread pyrit/models/seeds/yaml_seed_loader.py
…habetized __all__

Replace Optional[X]/Union[X, Y] with PEP 604 X | None syntax across the seeds
module, route date_added through an AwareDatetimeUTC validator that coerces
naive datetimes and bare date strings to UTC, share PROMPT_ONLY_SEED_KEYS,
convert reST :func:/:meth: roles to plain backticks, and alphabetize __all__.
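
A sketch of the coercion logic such a validator might apply, in plain Python. The function name coerce_to_utc is hypothetical, and converting already-aware datetimes to UTC is an assumption; the commit message only mentions naive datetimes and bare date strings.

```python
from datetime import datetime, timezone
from typing import Union


def coerce_to_utc(value: Union[datetime, str]) -> datetime:
    """Return a timezone-aware UTC datetime for a datetime or ISO date string."""
    if isinstance(value, str):
        # A bare date such as "2026-06-03" parses to midnight on that day.
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # Naive values are treated as already being in UTC.
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)
```

In a Pydantic model this function could be attached to the date_added field as a before-validator.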

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rlundeen2 rlundeen2 enabled auto-merge June 3, 2026 16:44
@rlundeen2 rlundeen2 added this pull request to the merge queue Jun 3, 2026
Merged via the queue into microsoft:main with commit 2f42aad Jun 3, 2026
52 checks passed
@rlundeen2 rlundeen2 deleted the rlundeen2/phase-8-seeds-pydantic branch June 3, 2026 17:09
romanlutz added a commit to romanlutz/PyRIT that referenced this pull request Jun 4, 2026
Merge 26 commits from main, including:
- MAINT Breaking: Convert ScenarioResult to Pydantic (microsoft#1908)
- MAINT: Migrating Seed classes to Pydantic (microsoft#1898)
- MAINT: Migrating AttackResult to Pydantic (microsoft#1899)
- MAINT: Bump ty-pre-commit v0.0.32 -> 0.0.43 (microsoft#1919)
- FEAT: Realtime streaming session support and server-side barge-in attack (microsoft#1766)
- FEAT text adaptive scenario (microsoft#1760)
- FIX: Integration Test Fixes (microsoft#1907)
- DOC: Scoring Docs Refactor (microsoft#1892)
- Various dependency bumps

Conflicts (15 files) resolved by taking main's version + re-running
ruff --fix to re-apply PEP 604 typing modernization on the incoming code
(177 violations auto-fixed). All resolved files re-staged.

Local verification:
- ruff check: All checks passed
- ruff format: clean
- pytest tests/unit -n 8: 9550 passed, 6 skipped

Known issue (pre-existing on main, not caused by this merge):
- ty 0.0.43 enabled missing-override-decorator rule, which flags hundreds
  of pre-existing methods across the codebase. Main's own CI is currently
  failing on this. Our PR will inherit the same failure since touched
  files come into pre-commit scope. Fixing this rule globally is a
  separate, large mechanical change orthogonal to typing modernization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rlundeen2 added a commit to riedgar-ms/PyRIT that referenced this pull request Jun 4, 2026
Reconciles the JSON-schema-on-seeds refactor with main's Seed-to-Pydantic
migration (microsoft#1898), Message module rename (microsoft#1885), and pyrit.common
import-boundary cleanup.

Conflict resolutions:
- pyrit/models/seeds/seed_prompt.py: ported response_json_schema and
  response_json_schema_name from dataclass/InitVar/__post_init__ onto
  main's Pydantic model. Used @model_validator(mode='before') to pop
  response_json_schema_name from input data before Pydantic's
  extra='forbid' rejects it, preserving the init-only semantic.
- pyrit/models/__init__.py: kept both main's MEDIA_PATH_DATA_TYPES /
  messages-module rename and this branch's json_schema_definition
  re-exports.
- tests/unit/models/test_import_boundary.py: accepted main's empty
  KNOWN_TOP_LEVEL_VIOLATIONS (main now allows the full pyrit.common
  prefix from pyrit.models, retiring the per-module ratchet entries).
- tests/unit/models/test_seed_prompt.py: replaced dataclasses.fields
  introspection with SeedPrompt.model_fields for the
  response_json_schema_name regression guard.
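
The seed_prompt.py resolution above (popping an init-only key in a before-validator) can be sketched like this. What the name is used for is an assumption here: the sketch looks it up in a hypothetical SCHEMA_REGISTRY, and PromptSketch is a stand-in for SeedPrompt.

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, model_validator

# Hypothetical registry mapping schema names to JSON schemas.
SCHEMA_REGISTRY = {
    "answer": {"type": "object", "properties": {"answer": {"type": "string"}}},
}


class PromptSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")

    value: str
    response_json_schema: Optional[dict[str, Any]] = None

    @model_validator(mode="before")
    @classmethod
    def _consume_schema_name(cls, data: Any) -> Any:
        # Pop the init-only key before extra="forbid" can reject it.
        if isinstance(data, dict) and "response_json_schema_name" in data:
            data = dict(data)
            name = data.pop("response_json_schema_name")
            if name is not None and data.get("response_json_schema") is None:
                data["response_json_schema"] = SCHEMA_REGISTRY[name]
        return data


prompt = PromptSketch(value="x", response_json_schema_name="answer")
```

The name is accepted at construction but never becomes a field, which mirrors the InitVar behavior of the old dataclass.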

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants