FIX: Preserve DatasetConfiguration subclass when backend overrides dataset_names by varunj-msft · Pull Request #1911 · microsoft/PyRIT

varunj-msft · 2026-06-03T21:07:23Z

Description

When dataset_names is passed through the backend (ScenarioRunService._build_init_kwargs), the backend always constructed a plain DatasetConfiguration. That silently dropped subclass-specific behavior, most notably EncodingDatasetConfiguration.get_all_seed_attack_groups(), which shapes each seed into a SeedAttackGroup with a synthetic objective.

For garak.encoding this surfaced as a confusing runtime error during attack construction:

ValueError: SeedAttackGroup must have exactly one objective. Found 0.

Reproducible end-to-end against the real garak_slur_terms_en dataset.
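
The failure mode can be illustrated with a minimal sketch. The classes below are simplified stand-ins that only borrow the names used in this description; they are not PyRIT's real implementations, and the seed loader and the "Decode:" objective text are hypothetical. The sketch also raises the error inside the base class for brevity, whereas the description above reports it during attack construction.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeedAttackGroup:
    """Stand-in: a seed group that must carry exactly one objective."""

    seeds: List[str]
    objectives: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.objectives) != 1:
            raise ValueError(
                "SeedAttackGroup must have exactly one objective. "
                f"Found {len(self.objectives)}."
            )


class DatasetConfiguration:
    """Stand-in base class: groups seeds without adding an objective."""

    def __init__(self, dataset_names: Optional[List[str]] = None) -> None:
        self.dataset_names = dataset_names or []

    def _load_seeds(self) -> List[str]:
        # Hypothetical loader: one fake seed per dataset name.
        return [f"seed-from-{name}" for name in self.dataset_names]

    def get_all_seed_attack_groups(self) -> List[SeedAttackGroup]:
        return [SeedAttackGroup(seeds=[s]) for s in self._load_seeds()]


class EncodingDatasetConfiguration(DatasetConfiguration):
    """Stand-in subclass: attaches a synthetic objective to every seed."""

    def get_all_seed_attack_groups(self) -> List[SeedAttackGroup]:
        return [
            SeedAttackGroup(seeds=[s], objectives=[f"Decode: {s}"])
            for s in self._load_seeds()
        ]


names = ["garak_slur_terms_en"]

# The subclass produces valid groups.
groups = EncodingDatasetConfiguration(dataset_names=names).get_all_seed_attack_groups()

# Swapping in the plain base class loses the override and the groups
# end up with no objective.
try:
    DatasetConfiguration(dataset_names=names).get_all_seed_attack_groups()
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

In this sketch the two configurations receive identical dataset_names; only the class differs, which is why the loss of the subclass is easy to miss until the groups are consumed.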

Fix: when dataset_names is supplied, build a fresh instance of the scenario's own default-dataset-config class so subclass overrides are preserved. If a future subclass adds required init kwargs we can't populate, fall back to the plain DatasetConfiguration with a logged warning so the operator has a trail.
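
A rough sketch of that approach, under stated assumptions: build_dataset_config is a hypothetical helper (not the actual body of _build_init_kwargs), the classes are stand-ins, StrictDatasetConfiguration is an invented example of a subclass with a required kwarg, and treating TypeError as the signal for an incompatible __init__ is an assumption about how the fallback might be detected.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


class DatasetConfiguration:
    """Stand-in for the plain base configuration."""

    def __init__(
        self,
        dataset_names: Optional[List[str]] = None,
        max_dataset_size: Optional[int] = None,
    ) -> None:
        self.dataset_names = dataset_names or []
        self.max_dataset_size = max_dataset_size


class EncodingDatasetConfiguration(DatasetConfiguration):
    """Stand-in subclass whose __init__ is compatible with the base."""


class StrictDatasetConfiguration(DatasetConfiguration):
    """Hypothetical subclass with a required kwarg the backend cannot supply."""

    def __init__(self, *, required_extra: str, **kwargs) -> None:
        super().__init__(**kwargs)
        self.required_extra = required_extra


def build_dataset_config(
    default_config: DatasetConfiguration,
    dataset_names: List[str],
    max_dataset_size: Optional[int] = None,
) -> DatasetConfiguration:
    """Build a fresh config of the same class as the scenario's default."""
    config_class = type(default_config)
    try:
        # Preferred path: keep the subclass so its overrides survive.
        return config_class(
            dataset_names=dataset_names, max_dataset_size=max_dataset_size
        )
    except TypeError:
        # Assumption: an incompatible __init__ surfaces as TypeError.
        logger.warning(
            "Could not construct %s with dataset_names; "
            "falling back to DatasetConfiguration.",
            config_class.__name__,
        )
        return DatasetConfiguration(
            dataset_names=dataset_names, max_dataset_size=max_dataset_size
        )


preserved = build_dataset_config(EncodingDatasetConfiguration(), ["garak_slur_terms_en"])
fallback = build_dataset_config(
    StrictDatasetConfiguration(required_extra="x"), ["garak_slur_terms_en"], 5
)
```

Here preserved keeps the EncodingDatasetConfiguration type, while fallback degrades to the base class and leaves a warning in the log.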

The max_dataset_size-only path is unchanged: it still mutates the throwaway introspection instance's default config.

First in a series of small PRs for the Standardizing Scenarios work. Lands ahead of the Encoding scenario standardization PR, which depends on this fix to make the documented fast path usable via the API.

Tests and Documentation

5 new regression tests in tests/unit/backend/test_scenario_run_service.py covering: subclass preservation with dataset_names, with dataset_names + max_dataset_size, with dataset_names only (no max), fallback to plain DatasetConfiguration when subclass init is incompatible (+ caplog assertion on the warning), and the introspection-failure path.
All 5 new tests fail against pre-fix code; verified by reverting the prod change and rerunning.
All 30 pre-existing tests in the file still pass.
Full backend suite: 619 passed, 4 skipped.
Full scenario suite: 624 passed.
ruff check + ruff format --check + ty all clean on both touched files.
No JupyText / notebook changes (backend service fix, no doc impact).

…es dataset_names

ScenarioRunService._build_init_kwargs() used to construct a plain DatasetConfiguration whenever the caller passed dataset_names. This silently lost subclass-specific behavior such as EncodingDatasetConfiguration.get_all_seed_attack_groups(), which shapes each seed into a SeedAttackGroup with a synthetic objective.

The downstream symptom for the Encoding scenario was:

ValueError: SeedAttackGroup must have exactly one objective. Found 0.

raised during attack construction. Reproducible end-to-end against the real garak_slur_terms_en dataset.

Fix: when dataset_names is supplied, construct a fresh instance of the scenario's own default-dataset-config class so subclass overrides are preserved. Fall back to the plain DatasetConfiguration (with a logged warning) if a future subclass adds required __init__ kwargs we cannot populate.

The max_dataset_size-only path keeps reusing-and-mutating the throwaway introspection instance's default config (no behavior change).

Tests:
- 5 new regression tests, all of which fail against pre-fix code.
- All 30 existing tests still pass.
- Full backend suite: 619 passed, 4 skipped.
- Full scenario suite: 624 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

varunj-msft wants to merge 1 commit into microsoft:main from varunj-msft:varunj-msft/8380-Standardizing-Scenarios-type-preservation-fix
