MAINT: Migrating Seed classes to Pydantic #1898
Merged
rlundeen2 merged 12 commits on Jun 3, 2026
Convert all 8 seed classes (Seed, SeedPrompt, SeedObjective, SeedSimulatedConversation, SeedGroup, SeedAttackGroup, SeedAttackTechniqueGroup, SeedDataset) from dataclasses/plain classes to Pydantic v2 BaseModel.

Establish a two-layer import rule between pyrit.models and pyrit.common, enforced by the import-boundary test.

Add str->list coercion (shared coerce_str_to_list helper) so YAML seed files may specify list fields (harm_categories/authors/groups/parameters) as bare scalars, preserving the old unvalidated behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…k data_type

Implements all reviewer feedback on the seeds Pydantic conversion:

- Blocking fix: SeedGroup/SeedAttackGroup with mixed Seed subclasses no longer corrupts polymorphism on model_dump/model_validate round-trips. Introduce a Literal seed_type discriminator on each leaf class (SeedPrompt/SeedObjective/SeedSimulatedConversation), and switch the polymorphic seeds field to a Field(discriminator="seed_type") annotated union (SeedUnion). The base Seed class is deliberately excluded from the union.
- NB1: rename validate -> _check_invariants on SeedGroup/SeedAttackGroup/SeedAttackTechniqueGroup so it does not shadow Pydantic v1's BaseModel.validate. External callers updated (atomic_attack, attack_parameters).
- NB2: stop silently dropping fields when building SeedSimulatedConversation from a dict. Delete the bespoke from_dict and route through model_validate; add a before-validator that drops only the computed value field so round-trips are clean.
- NB3: lock data_type to Literal["text"] on SeedObjective and SeedSimulatedConversation. Strip dataset/group-level data_type, role, sequence, parameters from non-prompt seed dicts so dataset-level defaults do not bleed in.
- Thin-class cleanups: Annotated StrOrList alias replaces the per-field _coerce_str_to_list validators on Seed and SeedPrompt; deterministic order-preserving list merge replaces utils.combine_list (which was nondeterministic across processes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
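For readers unfamiliar with the discriminator pattern named in the blocking fix, here is a minimal sketch of how a Literal tag plus an annotated union can keep subclass identity across a model_dump/model_validate round-trip. The class names (PromptSketch, ObjectiveSketch, GroupSketch, SeedUnionSketch) and their fields are hypothetical stand-ins, not the PyRIT classes; only the Pydantic v2 calls are real.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class PromptSketch(BaseModel):
    # Literal tag that identifies this leaf class in serialized data.
    seed_type: Literal["prompt"] = "prompt"
    value: str
    role: str = "user"


class ObjectiveSketch(BaseModel):
    seed_type: Literal["objective"] = "objective"
    value: str


# Annotated union: Pydantic reads `seed_type` to decide which class to build.
# No common base class is listed, mirroring the "base excluded" decision.
SeedUnionSketch = Annotated[
    Union[PromptSketch, ObjectiveSketch],
    Field(discriminator="seed_type"),
]


class GroupSketch(BaseModel):
    seeds: list[SeedUnionSketch]


group = GroupSketch(seeds=[ObjectiveSketch(value="goal"), PromptSketch(value="hello")])

# Dump to plain dicts and validate again; each entry comes back as its own class.
restored = GroupSketch.model_validate(group.model_dump())
```

Without the tag, a union of structurally similar models can resolve to the wrong member on re-validation, which is presumably the corruption the blocking fix refers to.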
Move yaml-loading, path-resolution, and the is_jinja_template=True trust marker from inside Seed/SeedDataset classmethods into a dedicated seed_loader module.

- New pyrit/models/seeds/seed_loader.py exposes load_seed_from_yaml, load_seed_dataset_from_yaml, and load_seed_prompt_from_yaml_with_required_parameters.
- Seed.from_yaml_file, SeedDataset.from_yaml_file, and SeedPrompt.from_yaml_with_required_parameters reduced to thin shims that delegate to the loader functions, so all ~70 existing callsites keep working unchanged.
- Deleted SeedObjective.from_yaml_with_required_parameters (no-op shim, no callers), SeedSimulatedConversation.from_yaml_with_required_parameters (no callers), and the base Seed.from_yaml_with_required_parameters (only SeedPrompt's real validation is left, where it is actually used).
- yaml and verify_and_resolve_path no longer imported in seed.py / seed_dataset.py.
- Stricter loader validation: empty files and top-level non-mappings now raise ValueError with a clear message rather than cryptic TypeErrors.
- New tests/unit/models/test_seed_loader.py (14 tests) covers the trust-marker behavior, error paths, dataset propagation, and classmethod-shim equivalence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
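The stricter loader validation described above could look roughly like the following sketch. It assumes the YAML text has already been parsed into a Python object (for example by a YAML library); the function name require_mapping and the error wording are hypothetical and are not taken from the PR.

```python
from typing import Any


def require_mapping(data: Any, *, source: str) -> dict[str, Any]:
    """Reject empty or non-mapping documents before any model is built."""
    # An empty YAML document parses to None.
    if data is None:
        raise ValueError(f"{source}: file is empty")
    # A top-level list or scalar cannot be unpacked into model fields.
    if not isinstance(data, dict):
        raise ValueError(
            f"{source}: expected a top-level mapping, got {type(data).__name__}"
        )
    return data


def describe_failure(data: Any) -> str:
    """Helper for demonstration: return the error message, or '' on success."""
    try:
        require_mapping(data, source="seed.yaml")
    except ValueError as exc:
        return str(exc)
    return ""
```

Checking the shape up front means the caller sees a ValueError naming the file, instead of a TypeError raised later when the parsed object is unpacked as keyword arguments.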
…loader

StrOrList / coerce_str_to_list existed solely to accommodate YAML's scalar-or-sequence shorthand (e.g. `authors: Jane Doe`). That is a loader-layer concern leaking into the data class, same pattern as the is_jinja_template trust marker handled in 9.1.

Move it to the loader: a new _canonicalize_scalar_lists helper wraps bare strings for known list-typed seed fields (harm_categories, authors, groups, parameters) at the YAML boundary and recurses into nested seeds for dataset/group files. The model fields are now plain Optional[list[str]], so programmatic constructors are strict.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
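A minimal sketch of the canonicalization step described above, under stated assumptions: the helper name, the nested key `seeds`, and the choice to return a copy rather than mutate in place are all illustrative guesses, not the PR's actual implementation. Only the four field names come from the commit message.

```python
from typing import Any

# List-typed seed fields named in the commit message.
LIST_FIELDS = ("harm_categories", "authors", "groups", "parameters")


def canonicalize_scalar_lists(data: dict[str, Any]) -> dict[str, Any]:
    """Wrap bare strings in a one-element list for known list-typed fields."""
    result = dict(data)  # shallow copy so the caller's dict is not modified
    for key in LIST_FIELDS:
        if isinstance(result.get(key), str):
            result[key] = [result[key]]
    # Assumption: dataset/group files nest their seeds under a "seeds" key.
    nested = result.get("seeds")
    if isinstance(nested, list):
        result["seeds"] = [
            canonicalize_scalar_lists(item) if isinstance(item, dict) else item
            for item in nested
        ]
    return result


raw = {
    "authors": "Jane Doe",
    "seeds": [{"value": "hi", "harm_categories": "violence"}],
}
canonical = canonicalize_scalar_lists(raw)
```

Doing this once at the file boundary lets the model declare plain list fields, so a programmatic caller who passes a bare string gets a validation error instead of a silent wrap.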
…tance check

The previous loop `for sg in seed_groups: sg._check_invariants()` reached across a class boundary into a private hook and was both redundant and ineffective:

- Redundant: SeedAttackGroup is a Pydantic v2 model; its `_finalize` validator already runs `_check_invariants` at construction time. By the time AtomicAttack receives an instance, it has been validated.
- Ineffective: the docstring claimed it caught seed groups "missing an objective", but SeedGroup._check_invariants allows zero objectives. Only SeedAttackGroup enforces "exactly one", so the old call silently passed on any plain SeedGroup with no objective.

Replace with an `isinstance(sg, SeedAttackGroup)` check that enforces the runtime contract already expressed by the type annotation `seed_groups: list[SeedAttackGroup]`. Raise TypeError with a clear message if a caller passes a plain SeedGroup or a SeedAttackTechniqueGroup. Update tests that were passing the wrong type to the typed parameter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
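The guard described above can be sketched with plain stand-in classes. The names ending in Sketch are hypothetical, and the hierarchy shown (the technique group as a sibling of the attack group rather than a subclass) is an assumption inferred from the statement that the isinstance check rejects it.

```python
class SeedGroupSketch:
    """Stand-in for a group that allows zero objectives."""


class SeedAttackGroupSketch(SeedGroupSketch):
    """Stand-in for the subtype that enforces exactly one objective."""


class SeedAttackTechniqueGroupSketch(SeedGroupSketch):
    """Stand-in sibling subtype; assumed not to derive from the attack group."""


def require_attack_groups(seed_groups: list) -> None:
    """Enforce at runtime what the list[SeedAttackGroup] annotation promises."""
    for index, sg in enumerate(seed_groups):
        if not isinstance(sg, SeedAttackGroupSketch):
            raise TypeError(
                f"seed_groups[{index}] must be a SeedAttackGroupSketch, "
                f"got {type(sg).__name__}"
            )


def rejects(seed_groups: list) -> bool:
    """Helper for demonstration: True if the guard raises TypeError."""
    try:
        require_attack_groups(seed_groups)
    except TypeError:
        return True
    return False
```

The check relies on the type alone because, per the commit message, a correctly typed instance has already passed its invariants at construction time.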
Make the loader's medium explicit in the filename so future non-YAML loaders (e.g. JSON, remote dataset, hub) read as siblings rather than overloading a single "seed_loader" module.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per code review: phase references go stale fast and are confusing to future readers. Module-level docstring still explains the architectural reason the loader is separate; the planning context belonged in the gist, not the codebase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SeedLeaf was used only as the inner type of SeedUnion plus one local list annotation in _finalize. The two-alias pattern read as "which one do I import?" confusion without buying anything: Annotated[..., Field(discriminator=...)] is fine as a local variable annotation too (the Field metadata is ignored outside Pydantic field contexts).

Inline the Union into SeedUnion's definition and use SeedUnion for the local list. SeedDataset and any future container can keep importing the single name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per style guide: top-of-file imports unless deferred for heavy third-party packages. SeedPrompt, SeedDataset, and Seed are lightweight first-party. The __init__.py imports yaml_seed_loader last, and none of those modules import yaml_seed_loader at module load, so no circular import.

Also switch Union[str, Path] / Optional[str] to X | Y / X | None to match the modern type-syntax rules in the style guide.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same pattern as the earlier atomic_attack fix: the prior _check_invariants() call on a SeedAttackGroup-typed parameter was redundant (Pydantic already validates at construction) and the downstream "objective is None" check guarded against an impossible state for a real SeedAttackGroup. The actually-useful runtime guard is rejecting incorrect subtypes: callers passing a plain SeedGroup, which is silently accepted by the type annotation but doesn't enforce "exactly one objective".

Switch to isinstance(seed_group, SeedAttackGroup) raising TypeError, drop the dead objective-None branch, and replace the test that exercised the impossible state with one covering the new isinstance guard.

Audited the remaining _check_invariants references: only override/super() definitions in the three SeedGroup subclasses and two direct test calls that assert the method's own behavior. No other cross-class consumers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ds-pydantic

# Conflicts:
#   pyrit/models/seeds/seed_group.py
#   pyrit/models/seeds/seed_prompt.py
#   tests/unit/models/test_import_boundary.py
romanlutz reviewed on Jun 3, 2026
romanlutz approved these changes on Jun 3, 2026
…habetized __all__

Replace Optional[X]/Union[X, Y] with PEP 604 X | None syntax across the seeds module, route date_added through an AwareDatetimeUTC validator that coerces naive datetimes and bare date strings to UTC, share PROMPT_ONLY_SEED_KEYS, convert reST :func:/:meth: roles to plain backticks, and alphabetize __all__.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
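One way a UTC-coercing field type like the one mentioned above might be built is sketched below. The alias name AwareDatetimeUTCSketch, the function coerce_to_utc, and the model DatedSketch are hypothetical. The sketch also assumes that a naive value is meant to be read as already being in UTC; the PR does not spell out that rule.

```python
from datetime import date, datetime, timezone
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator


def coerce_to_utc(value: Any) -> datetime:
    """Turn date strings, dates, and naive datetimes into UTC-aware datetimes."""
    if isinstance(value, str):
        # Accepts "2026-06-03" as well as full ISO timestamps.
        value = datetime.fromisoformat(value)
    if isinstance(value, date) and not isinstance(value, datetime):
        value = datetime(value.year, value.month, value.day)
    if not isinstance(value, datetime):
        raise ValueError(f"cannot interpret {value!r} as a datetime")
    if value.tzinfo is None:
        # Assumption: naive values are treated as UTC, not local time.
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


# Runs before Pydantic's own datetime parsing.
AwareDatetimeUTCSketch = Annotated[datetime, BeforeValidator(coerce_to_utc)]


class DatedSketch(BaseModel):
    date_added: AwareDatetimeUTCSketch


from_date_string = DatedSketch(date_added="2026-06-03")
from_naive = DatedSketch(date_added=datetime(2026, 6, 3, 12, 0))
from_offset = DatedSketch(date_added="2026-06-03T12:00:00+02:00")
```

Normalizing at the field boundary means every stored date_added is timezone-aware, so two values can always be compared without a naive/aware mismatch error.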
romanlutz added a commit to romanlutz/PyRIT that referenced this pull request on Jun 4, 2026
Merge 26 commits from main, including:

- MAINT Breaking: Convert ScenarioResult to Pydantic (microsoft#1908)
- MAINT: Migrating Seed classes to Pydantic (microsoft#1898)
- MAINT: Migrating AttackResult to Pydantic (microsoft#1899)
- MAINT: Bump ty-pre-commit v0.0.32 -> 0.0.43 (microsoft#1919)
- FEAT: Realtime streaming session support and server-side barge-in attack (microsoft#1766)
- FEAT text adaptive scenario (microsoft#1760)
- FIX: Integration Test Fixes (microsoft#1907)
- DOC: Scoring Docs Refactor (microsoft#1892)
- Various dependency bumps

Conflicts (15 files) resolved by taking main's version + re-running ruff --fix to re-apply PEP 604 typing modernization on the incoming code (177 violations auto-fixed). All resolved files re-staged.

Local verification:

- ruff check: All checks passed
- ruff format: clean
- pytest tests/unit -n 8: 9550 passed, 6 skipped

Known issue (pre-existing on main, not caused by this merge):

- ty 0.0.43 enabled missing-override-decorator rule, which flags hundreds of pre-existing methods across the codebase. Main's own CI is currently failing on this. Our PR will inherit the same failure since touched files come into pre-commit scope. Fixing this rule globally is a separate, large mechanical change orthogonal to typing modernization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rlundeen2 added a commit to riedgar-ms/PyRIT that referenced this pull request on Jun 4, 2026
Reconciles the JSON-schema-on-seeds refactor with main's Seed-to-Pydantic migration (microsoft#1898), Message module rename (microsoft#1885), and pyrit.common import-boundary cleanup.

Conflict resolutions:

- pyrit/models/seeds/seed_prompt.py: ported response_json_schema and response_json_schema_name from dataclass/InitVar/__post_init__ onto main's Pydantic model. Used @model_validator(mode='before') to pop response_json_schema_name from input data before Pydantic's extra='forbid' rejects it, preserving the init-only semantic.
- pyrit/models/__init__.py: kept both main's MEDIA_PATH_DATA_TYPES / messages-module rename and this branch's json_schema_definition re-exports.
- tests/unit/models/test_import_boundary.py: accepted main's empty KNOWN_TOP_LEVEL_VIOLATIONS (main now allows the full pyrit.common prefix from pyrit.models, retiring the per-module ratchet entries).
- tests/unit/models/test_seed_prompt.py: replaced dataclasses.fields introspection with SeedPrompt.model_fields for the response_json_schema_name regression guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
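The init-only port described in the first conflict resolution can be sketched as follows. The model name, the SCHEMA_REGISTRY lookup, and the rule that the name resolves to a stored schema are hypothetical illustrations of what an init-only argument might do; the PR only states that the key is popped in a before-validator so that extra='forbid' does not reject it.

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, model_validator

# Hypothetical registry; stands in for whatever the name actually resolves to.
SCHEMA_REGISTRY: dict[str, dict[str, Any]] = {
    "person": {"type": "object", "properties": {"name": {"type": "string"}}},
}


class PromptWithSchemaSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")

    value: str
    response_json_schema: Optional[dict[str, Any]] = None

    @model_validator(mode="before")
    @classmethod
    def _pop_init_only(cls, data: Any) -> Any:
        # Runs on raw input, before extra="forbid" sees the unknown key.
        if isinstance(data, dict) and "response_json_schema_name" in data:
            data = dict(data)  # do not mutate the caller's dict
            name = data.pop("response_json_schema_name")
            if name not in SCHEMA_REGISTRY:
                raise ValueError(f"unknown schema name: {name!r}")
            data.setdefault("response_json_schema", SCHEMA_REGISTRY[name])
        return data


prompt = PromptWithSchemaSketch(value="hi", response_json_schema_name="person")
round_tripped = PromptWithSchemaSketch.model_validate(prompt.model_dump())
```

Because the name never becomes a declared field, it is absent from model_fields and from model_dump output, which matches the regression guard mentioned for test_seed_prompt.py.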
This converts Seed to a Pydantic BaseModel. It is phase 8 of the pyrit.models refactor:
https://gist.github.com/rlundeen2/3e8daa8e12a11b4b6e52587b3c9b1dca
One thing to look at is the YAML separation. For now, the loading code still lives in models, but it has been moved out of the Seed classes themselves.