FEAT: Adversarial Benchmark Scenario Refactor #1765

Merged
ValbuenaVC merged 62 commits into microsoft:main from ValbuenaVC:adversarial_benchmark_refactor
Jun 1, 2026

@ValbuenaVC ValbuenaVC commented May 20, 2026

Refactors AdversarialBenchmark to be CLI-compatible, registry-driven, and consistent with other scenarios. Builds on #1785 (factory registry) and #1784 (scenario base contract).

What changed

AdversarialBenchmark owns its adversarial-target axis via supported_parameters().
Users pass --adversarial-targets [...] from the CLI or set scenario.args.adversarial_targets in .pyrit_conf. Targets must be registered in TargetRegistry (auto-registered from ADVERSARIAL_CHAT_* env vars by TargetInitializer). The (technique × target × dataset) cross-product is built lazily in _get_atomic_attacks_async using factory.create(attack_adversarial_config_override=AttackAdversarialConfig(target=...)) — no global registry mutation.
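A minimal sketch of the lazy (technique × target × dataset) cross-product described above. `Factory` and `AdversarialConfig` are hypothetical stand-ins for AttackTechniqueFactory and AttackAdversarialConfig, not PyRIT's real classes. The only point illustrated is that the adversarial target is passed per call to `create()`, so the shared factory is never mutated.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator


@dataclass(frozen=True)
class AdversarialConfig:
    """Hypothetical stand-in for AttackAdversarialConfig."""
    target: str


@dataclass(frozen=True)
class Factory:
    """Hypothetical stand-in for a registered technique factory."""
    name: str

    def create(self, *, attack_adversarial_config_override: AdversarialConfig) -> dict:
        # The override only affects the returned object; the factory itself is frozen.
        return {
            "technique": self.name,
            "adversarial_target": attack_adversarial_config_override.target,
        }


def iter_atomic_attacks(
    factories: list[Factory], targets: list[str], datasets: list[str]
) -> Iterator[dict]:
    """Yield one attack description per (technique, target, dataset) triple, on demand."""
    for factory, target, dataset in product(factories, targets, datasets):
        attack = factory.create(
            attack_adversarial_config_override=AdversarialConfig(target=target)
        )
        yield {**attack, "dataset": dataset}
```

Because the function is a generator, nothing is created until the caller iterates over it.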

Cross-run caching via use_cached: bool = False. Keyed on technique_eval_hash × objective_target_eval_hash. Delegated to the new stateless pyrit.analytics.get_cached_results_for_technique helper. ERROR/UNDETERMINED outcomes always retry; per-hash lookup failures are swallowed so a flaky cache never blocks startup. AdversarialBenchmark.VERSION bumped 1→2 (new atomic_attack_name format).

Cached results are dataset-scoped: two atomic attacks sharing the same technique+target hash (e.g. harmbench vs advbench against the same model) are each independently evaluated for cache hits. A harmbench result will not cause advbench to be skipped. Filtering is done Python-side via attribution_data["parent_collection"] since the underlying DB query has no attribution parameter.
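The dataset scoping can be pictured as a plain Python filter over rows that were already fetched by hash. This is a sketch under assumptions: rows are modeled as dicts, and `parent_collection` is assumed to hold the atomic attack name of the slot that produced the row.

```python
from typing import Any, Optional


def filter_by_parent_collection(
    rows: list[dict[str, Any]], atomic_attack_name: str
) -> list[dict[str, Any]]:
    """Keep only cached rows attributed to the given atomic attack."""
    kept = []
    for row in rows:
        attribution: Optional[dict[str, Any]] = row.get("attribution_data")
        # Rows without attribution cannot be scoped, so they never count as a cache hit.
        if not attribution:
            continue
        if attribution.get("parent_collection") == atomic_attack_name:
            kept.append(row)
    return kept
```

Two slots that share a technique+target hash receive the same input rows, but each one keeps only the rows stamped with its own name.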

Cached attack results are injected into the current ScenarioResult rather than silently skipped, so callers always receive a complete result regardless of how many attacks were served from cache.

New primitives (reusable).

ObjectiveTargetEvaluationIdentifier — stable eval hash for an objective target, mirrors ScorerEvaluationIdentifier.
pyrit.analytics.get_cached_results_for_technique — SQL pre-filter on atomic_attack_identifier.eval_hash, Python post-filter on objective-target hash, returns results newest-first.
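The pre-filter / post-filter / newest-first shape of the helper, sketched against an in-memory SQLite table. The `results` table and its column names are hypothetical and chosen only for illustration; the real helper queries PyRIT memory.

```python
import sqlite3


def get_cached_results(
    conn: sqlite3.Connection, technique_hash: str, target_hash: str
) -> list[tuple]:
    """SQL pre-filter on the technique hash, Python post-filter on the target hash."""
    rows = conn.execute(
        "SELECT id, target_hash, created_at FROM results WHERE technique_hash = ?",
        (technique_hash,),
    ).fetchall()
    # Post-filter in Python on the second column (target_hash).
    matching = [row for row in rows if row[1] == target_hash]
    # Newest first: sort on created_at, descending.
    return sorted(matching, key=lambda row: row[2], reverse=True)
```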

TargetInitializer fixes.

  • Propagates config.tags into TargetRegistry entries (they were previously dropped silently, which made TargetInitializerTags inert).
  • Skips with warning when a TargetConfig declares model_var but the env var is unset (prevents silent fallback to global OPENAI_CHAT_MODEL).
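A rough sketch of the skip-with-warning guard. `TargetConfigSketch` is a hypothetical stand-in that keeps only the two fields relevant here, and treating an empty value as unset is an assumption of this sketch.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class TargetConfigSketch:
    """Hypothetical stand-in; only registry_name and model_var mirror the PR text."""
    registry_name: str
    model_var: Optional[str] = None


def should_register(config: TargetConfigSketch, env: Mapping[str, str] = os.environ) -> bool:
    """Return False (and warn) when a declared model env var is unset or empty."""
    if config.model_var and not env.get(config.model_var):
        logger.warning(
            "Skipping %s: %s is not set", config.registry_name, config.model_var
        )
        return False
    # Configs that declare no model_var are unaffected by the guard.
    return True
```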

New adversarial chat variants (ADVERSARIAL_CHAT_SINGLETURN_, ADVERSARIAL_CHAT_MULTITURN_, ADVERSARIAL_CHAT_REASONING_*) in .env_example; auto-registered with DEFAULT tag by TargetInitializer.

Test tree reorganized — tests/unit/scenario/ now mirrors source layout: airt/, benchmark/, core/, garak/, foundry/. No test logic changed, only moves.

doc/scanner/benchmark.{py,ipynb} rewritten to mirror other scanner pages (CLI quickstart → setup → run → output).

Tests

Full tests/unit/{scenario,setup,registry,identifiers,analytics} suite passes (8654 passed). New coverage in test_adversarial.py (58 tests), test_result_analysis.py, test_evaluation_identifier.py, test_targets_initializer.py.

Follow-up (separate PRs)

  • Widen objective_scorer from TrueFalseScorer to general Scorer (requires widening AttackScoringConfig).
  • Promote use_cached caching to base Scenario so other scenarios can opt in.

Victor Valbuena and others added 14 commits May 20, 2026 13:19
Adds three TargetConfig entries (singleturn, multiturn, reasoning), each
tagged [DEFAULT, ADVERSARIAL], for the env-driven variants already
declared in .env_example. Tightens _register_target to skip with a
warning when a TargetConfig declares model_var but the env var is unset;
without this guard the target silently falls back to the global
OPENAI_CHAT_MODEL default and sends requests to the wrong model.

New tests covering naming-related failure modes flagged for the eventual
PR review:
- test_register_instance_with_duplicate_name_silently_overwrites pins
  the current "second write wins" behavior so future hardening (warn /
  raise / idempotent skip) is intentional.
- test_target_configs_have_unique_registry_names guards against typos
  in ENV_TARGET_CONFIGS that would otherwise silently drop a target.
- test_double_initialize_async_is_idempotent regression-guards the
  re-init path that depends on the silent-overwrite semantics above.
- test_variant_skips_when_model_env_var_missing parameterizes the
  missing-_MODEL skip+warning for all three new variants.

Failure modes surfaced during this change but not addressed here
(tracked for the PR description batch):
- Duplicate registry_name silently overwrites in BaseInstanceRegistry.
- registry_name has no format validation; risk grows with per-user
  TargetConfig support in P1.
- No-adversarial-models-found error message UX is owned by the upcoming
  BenchmarkInitializer commit and needs a clear, actionable message.

Registers one AttackTechniqueSpec variant per (adversarial-capable
technique, adversarial-tagged target) pair into AttackTechniqueRegistry
with the live target bound onto adversarial_chat. Variants are named
f"{source}__{target_name}" and tagged ["benchmark_fanout",
f"model:{target_name}"] so the benchmark scenario can discover them via
tag query in a later commit.
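The naming and tagging convention above, sketched with plain strings instead of AttackTechniqueSpec objects. The function name `fan_out` and the dict return shape are illustrative only.

```python
from itertools import product

BENCHMARK_FANOUT_TAG = "benchmark_fanout"


def fan_out(technique_names: list[str], target_names: list[str]) -> dict[str, list[str]]:
    """Map each fanned variant name to its tags, one entry per (technique, target) pair."""
    return {
        f"{source}__{target}": [BENCHMARK_FANOUT_TAG, f"model:{target}"]
        for source, target in product(technique_names, target_names)
    }
```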

Adversarial-capability is determined by reusing _spec_needs_adversarial
from scenario_techniques (multi-turn attacks + crescendo-style simulated
conversations). Single-turn techniques without an adversarial chat
target (prompt_sending, role_play, many_shot, context_compliance) are
not fanned — the benchmark holds the objective target constant and
varies the adversarial chat helper across runs.

Placed at pyrit/setup/initializers/benchmark.py (top level) alongside
AIRTInitializer and SimpleInitializer, not under components/. The
components/ initializers (TargetInitializer, ScorerInitializer,
ScenarioTechniqueInitializer) are auto-bundled building blocks that
populate their registries during every PyRIT setup. BenchmarkInitializer
is the opposite shape: a user-opted workflow profile named after the
use case, listed in .pyrit_conf when the user wants a benchmarking
trial. The placement convention is itself underspecified in the
codebase and is tracked for follow-up.

Parameter contract (target_names: list[str] | None):

The optional target_names parameter is declared via
PyRITInitializer.supported_parameters, which is the single source of
truth shared across three consumer sites:

1. .pyrit_conf YAML:

       initializers:
         - name: benchmark
           args:
             target_names:
               - adversarial_chat_singleturn
               - adversarial_chat_reasoning

   ConfigurationLoader._resolve_initializers calls
   instance.set_params_from_args(args=config.args) and then
   _validate_params against supported_parameters, so unknown keys fail
   fast at config-load time. Omitting the args block uses the default
   (fan over every adversarial-tagged target).

2. CLI (--list-initializers via frontend_core._print_initializer_meta):
   reads metadata.supported_parameters and prints name + description +
   default for each declared parameter, so users discover what they can
   put in .pyrit_conf without reading source.

3. GUI backend (InitializerService): wraps each declared parameter as an
   InitializerParameterSummary({name, description, default}) on the
   RegisteredInitializer Pydantic model. The GUI renders form fields
   from this metadata.

All three paths terminate at the same self.params dict that
initialize_async reads via self.params.get("target_names"), so adding,
renaming, or retyping the parameter is a single-site change.
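A simplified picture of the single-source-of-truth contract: one declaration drives both validation and defaults. `ParameterSketch` and `resolve_params` are hypothetical names for this sketch and do not reproduce the real set_params_from_args / _validate_params signatures.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ParameterSketch:
    """Hypothetical parameter declaration: name, description, default."""
    name: str
    description: str
    default: Any = None


SUPPORTED = [
    ParameterSketch("target_names", "Registry names of adversarial targets to fan over."),
]


def resolve_params(
    args: Optional[dict[str, Any]], supported: list[ParameterSketch]
) -> dict[str, Any]:
    """Reject undeclared keys, then fill in declared defaults."""
    args = args or {}
    declared = {p.name for p in supported}
    unknown = sorted(set(args) - declared)
    if unknown:
        raise ValueError(f"Unknown parameters: {unknown}; supported: {sorted(declared)}")
    return {p.name: args.get(p.name, p.default) for p in supported}
```

A misspelled key such as `target_name` fails at load time instead of being silently ignored.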

target_names narrows fan-out to a subset of adversarial targets by
registry name; unknown names raise ValueError listing both the unknowns
and the discovered set. Empty discovery raises ValueError naming the
ADVERSARIAL_CHAT_* env vars and the TargetInitializer ordering
dependency (closes one of the failure-mode follow-ups surfaced in the
previous commit).
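The narrowing rules could look roughly like this. The function name and the exact error wording are illustrative; only the three behaviors (empty discovery raises, unknown names raise with both lists, None means all) come from the description above.

```python
from typing import Optional


def select_targets(discovered: list[str], target_names: Optional[list[str]] = None) -> list[str]:
    """Return the adversarial targets to fan over, validating any requested subset."""
    if not discovered:
        raise ValueError(
            "No adversarial targets found. Set the ADVERSARIAL_CHAT_* env vars and "
            "run TargetInitializer before BenchmarkInitializer."
        )
    if target_names is None:
        # Default: fan over every discovered target.
        return list(discovered)
    unknown = [name for name in target_names if name not in discovered]
    if unknown:
        raise ValueError(
            f"Unknown target names {unknown}; discovered targets: {discovered}"
        )
    return list(target_names)
```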

Failure modes audited during this change but not addressed here
(tracked for the PR description batch):
- AttackTechniqueRegistry.register_from_specs is first-write-wins on
  name collision with no log entry. Disjoint name spaces between
  ScenarioTechniqueInitializer and BenchmarkInitializer mean this is
  inert today; future extensions that produce colliding names would be
  silently no-op'd.
- BenchmarkInitializer's TargetRegistry walk is a snapshot at init
  time; later mutations to TargetRegistry leave the fanned specs
  holding stale references. Failure surfaces at API-call time, not at
  registration.
- Top-level vs components/ initializer placement convention is
  implicit; this commit picks "top-level for workflow profiles",
  matching AIRT/Simple. Worth a CONTRIBUTING note when convention is
  formalized.

…_refactor

Updating branch off fork to include latest commits.

Removes the local factory-construction override and the
adversarial_models constructor parameter. AdversarialBenchmark now
inherits the base Scenario._get_atomic_attacks_async loop and reads its
strategy enum from AttackTechniqueRegistry entries tagged
benchmark_fanout (registered by BenchmarkInitializer in the previous
commit).

What's removed:
- adversarial_models: list[PromptTarget] constructor param + validation.
- _adversarial_configs dict construction in __init__.
- _get_atomic_attacks_async override that built local factories,
  iterated models x techniques x datasets, and injected
  attack_adversarial_config_override at create-time.
- _infer_labels static method + the entire dedupe/collision-suffix loop
  that inferred model labels from target identifiers - replaced by
  TargetConfig.registry_name as the canonical label (set explicitly in
  ENV_TARGET_CONFIGS, no inference needed).
- _get_benchmarkable_specs and _build_benchmark_strategy as
  @staticmethods on the class - replaced by a module-level
  _build_benchmark_strategy function. Strategy-class construction never
  reads scenario instance state, so the function does not belong to the
  class; module-level placement makes the dependency (only the
  registry) explicit and the unit-test surface flat.

What's added:
- BENCHMARK_FANOUT_TAG module constant (= "benchmark_fanout") as the
  shared contract between BenchmarkInitializer (writes the tag) and
  AdversarialBenchmark (reads it).
- _StrategyOnlyMarker sentinel class to satisfy the required
  AttackTechniqueSpec.attack_class field when reconstructing minimal
  specs for strategy-enum construction. build_strategy_class_from_specs
  reads only name + strategy_tags, so the sentinel never reaches a
  runtime construction site; the real factory is fetched by name from
  the registry at attack-execution time.
- _build_display_group override: extracts the target label from the
  fanned f"src__target" technique name so display rolls up per-model.
  Falls back to the full name when no __ separator is present.
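The display-group rule is small enough to sketch directly. Splitting on the first `__` is an assumption of this sketch; the commit message does not say which occurrence is used when a name contains more than one.

```python
def display_group(technique_name: str) -> str:
    """Return the target label from 'src__target', or the full name if there is no '__'."""
    _, separator, target = technique_name.partition("__")
    return target if separator else technique_name
```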

Where the (technique x target x dataset) permutation now happens:

The pre-collapse override did all three dimensions at scenario runtime
in one nested loop. Post-collapse the permutation is split across two
stages, owned by different layers:

1. Initializer time - BenchmarkInitializer.initialize_async runs the
   (technique x adversarial-target) cross-product and registers one
   fanned AttackTechniqueFactory per pair into AttackTechniqueRegistry,
   tagged benchmark_fanout. Target binding lives on the factory.
2. Scenario runtime - Scenario._get_atomic_attacks_async (inherited,
   base class) runs the (fanned-variant x dataset) cross-product,
   building one AtomicAttack per pair. The target dimension is already
   resolved on the factory at this point.

Net atomic-attack count is unchanged for the same inputs; the change
is which layer owns which dimension. See the AdversarialBenchmark
class docstring for the full explanation.
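To see why the atomic-attack count is unchanged, compare the two shapes on plain strings. Both functions are illustrative sketches, not code from the PR.

```python
from itertools import product


def single_stage(
    techniques: list[str], targets: list[str], datasets: list[str]
) -> list[tuple[str, str]]:
    """Pre-collapse shape: one nested loop over all three dimensions."""
    return [(f"{t}__{m}", d) for t, m, d in product(techniques, targets, datasets)]


def two_stage(
    techniques: list[str], targets: list[str], datasets: list[str]
) -> list[tuple[str, str]]:
    """Post-collapse shape: fan out at init time, then pair variants with datasets."""
    # Stage 1 (initializer time): technique x adversarial target.
    variants = [f"{t}__{m}" for t, m in product(techniques, targets)]
    # Stage 2 (scenario runtime): fanned variant x dataset.
    return list(product(variants, datasets))
```

Both functions produce the same pairs in the same order; only the layer that owns each dimension differs.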

VERSION bump 1 -> 2:

The atomic_attack_name format changes from
f"{technique}__{model}__{dataset}" (triple-segment, old
override-driven) to f"{technique}__{model}_{dataset}" (double-then-
single-underscore, base-inherited). Cached results from VERSION=1
remain queryable via memory.get_scenario_results(scenario_version=1)
but won't suppress fresh runs with skip_cached=True (the param itself
lands in the next commit). No CHANGELOG file in this repo; this note
will land in the PR description.

Doc notebook (doc/scanner/benchmark.{py,ipynb}) still references the
removed adversarial_models API and will fail at runtime until Commit 9
rewrites it; deferred per plan F7. Not gated by any unit test.

Tests rewritten end-to-end (619 -> 268 lines): see test_adversarial.py
for the four test classes covering metadata, strategy construction,
collapsed init surface, and display grouping.

Wider regression: 1091/1091 pass across scenario+setup+registry;
547/547 pass in backend.

Adds a skip_cached: bool = False constructor parameter and a thin
_get_atomic_attacks_async override on AdversarialBenchmark that, when
enabled, filters out atomic-attack candidates whose
(atomic_attack_name, technique_eval_hash) tuple appears in any prior
COMPLETED ScenarioResult for the same scenario name + VERSION with
outcome SUCCESS or FAILURE. ERROR and UNDETERMINED outcomes always
retry. Caching is off by default to preserve existing behavior.

Built on the AttackResultAttribution primitives introduced in microsoft#1758:
- AtomicAttack.technique_eval_hash provides the candidate side of the
  cache key (content-derived via AtomicAttackEvaluationIdentifier).
- AttackResultEntry.attribution_data['parent_collection' +
  'parent_eval_hash'] provides the persisted side; the executor stamps
  these per AttackResult, so two atomic attacks sharing a name but
  using different technique configurations don't cross-pollinate.

Defensive behavior:
- Missing attribution_data or missing parent_collection -> skip the row
  silently (treat as not-cached).
- Memory exceptions from get_scenario_results / get_attack_results ->
  log a warning and fall back to no filtering. Caching becomes a no-op
  rather than blocking the run.
- Scenarios in IN_PROGRESS / FAILED / CANCELLED state contribute
  nothing (no get_attack_results query made for them at all).
- Scenario name is matched on type(self).__name__ (PascalCase
  "AdversarialBenchmark"), aligned with how ScenarioIdentifier stores
  it; VERSION filter ensures the VERSION bump in the previous commit
  invalidates old VERSION=1 results for cache purposes (they remain
  queryable; they just don't suppress fresh runs).
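The filter and its defensive behavior, sketched over plain dicts. The row shape, the `load_attack_rows` callback, and both function names are hypothetical; the sketch only mirrors the rules listed above (COMPLETED scenarios only, SUCCESS/FAILURE only, missing attribution ignored, memory errors degrade to no filtering).

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)
CACHEABLE_OUTCOMES = {"SUCCESS", "FAILURE"}
Pair = tuple[str, str]  # (atomic_attack_name, technique_eval_hash)


def collect_completed_pairs(
    scenarios: Iterable[dict[str, Any]],
    load_attack_rows: Callable[[str], list[dict[str, Any]]],
) -> set[Pair]:
    """Gather cache keys from prior COMPLETED scenario results."""
    pairs: set[Pair] = set()
    for scenario in scenarios:
        if scenario.get("state") != "COMPLETED":
            continue  # other states contribute nothing and are never queried
        for row in load_attack_rows(scenario["id"]):
            attribution = row.get("attribution_data") or {}
            name = attribution.get("parent_collection")
            if name is None or row.get("outcome") not in CACHEABLE_OUTCOMES:
                continue  # unattributed rows and ERROR/UNDETERMINED always retry
            pairs.add((name, attribution.get("parent_eval_hash")))
    return pairs


def filter_uncached(candidates, scenarios, load_attack_rows) -> list[Pair]:
    """Drop candidates already completed; on any memory error, keep everything."""
    try:
        done = collect_completed_pairs(scenarios, load_attack_rows)
    except Exception:
        logger.warning("Cache lookup failed; running all attacks", exc_info=True)
        return list(candidates)
    return [pair for pair in candidates if pair not in done]
```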

Tests: 11 new unit tests (TestAdversarialBenchmarkSkipCachedFilter +
TestAdversarialBenchmarkSkipCachedInit) covering filtering semantics,
outcome filters, eval-hash disambiguation, scenario-state filter,
query-arg shape, missing-attribution defense, memory-error defense,
and constructor defaults. Integration test with full persistence
round-trip is a separate follow-up commit (F6.3 per plan).

Wider regression: 1649/1649 pass across scenario+setup+registry+
backend.

Failure mode flagged for the PR description batch:
- The override + helper are scenario-agnostic in shape and should
  probably live on base Scenario behind a duck-typed identity hook
  (e.g. cls.cache_scope_name() classmethod) so other scenarios
  (RapidResponse, Scam, etc.) can opt into skip_cached without
  copy-pasting the wrapper. Enhancement, not a bug; tracked as
  lift-skip-cached-to-base-scenario.

Stage 1 of the scorer-flexibility refactor: widens the parameter
annotation on AdversarialBenchmark.__init__ from TrueFalseScorer | None
to Scorer | None, while preserving the existing runtime contract via
an isinstance(resolved, TrueFalseScorer) guard that raises TypeError
with a pointer at the new-scoring follow-up. Forward-compatible: when
stage 2 lands and AttackScoringConfig + atomic-attack types are widened
to Scorer, removing the guard is the only change needed here.
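The stage-1 pattern (wide annotation, narrow runtime guard) in miniature. The three scorer classes here are empty, hypothetical stand-ins, and the error text is illustrative rather than the PR's actual message.

```python
from typing import Optional


class Scorer:
    """Hypothetical stand-in for the general scorer base class."""


class TrueFalseScorer(Scorer):
    """Hypothetical stand-in for the currently supported scorer type."""


class FloatScaleScorerSketch(Scorer):
    """Hypothetical scorer that the wide annotation admits but the guard rejects."""


def resolve_objective_scorer(
    objective_scorer: Optional[Scorer], default: TrueFalseScorer
) -> TrueFalseScorer:
    """Accept any Scorer by annotation, but enforce TrueFalseScorer at runtime."""
    resolved = objective_scorer if objective_scorer is not None else default
    if not isinstance(resolved, TrueFalseScorer):
        raise TypeError(
            f"objective_scorer must be a TrueFalseScorer for now, got "
            f"{type(resolved).__name__}; general Scorer support is a planned follow-up."
        )
    return resolved
```

Once the downstream types accept a general scorer, deleting the isinstance check is the only change this function would need.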

Why widen the annotation now rather than wait for stage 2:
- Lets the follow-up PR be a behavior change (drop the guard, wire the
  new scorer path through AttackScoringConfig) without a parameter
  signature change. Users coding to AdversarialBenchmark.__init__'s
  signature see the eventual contract today.
- Self-documents the planned direction in IDE tooling and
  --list-scenarios output.
- TypeError message names the constraint AND points readers at the
  follow-up so the broken case isn't silent.

Out of scope (stage 2, separate follow-up):
- AttackScoringConfig.objective_scorer widening
- Atomic attack types' objective_scorer widening
- pyrit.scenario.core.scenario casts at lines :778, :990, :1034
- Removing this guard

Tests: 3 new (test_objective_scorer_annotation_is_scorer,
test_construct_accepts_truefalse_scorer_subclass,
test_non_truefalse_scorer_raises_typeerror_with_pointer). Existing
default-scorer / explicit-scorer init tests already cover the happy
TrueFalseScorer path.

Wider regression: 1652/1652 pass across scenario+setup+registry+
backend. Pre-commit clean.
@ValbuenaVC ValbuenaVC changed the title [DRAFT] FEAT: Adversarial Benchmark Scenario Refactor FEAT: Adversarial Benchmark Scenario Refactor May 21, 2026
@ValbuenaVC ValbuenaVC marked this pull request as ready for review May 21, 2026 20:11
Victor Valbuena added 2 commits May 21, 2026 13:16
Adds DEFAULT_INITIALIZERS + SCENARIO_INITIALIZERS to
tests/end_to_end/test_scenarios.py so scenarios that need scenario-
specific initialization (post-collapse benchmark.adversarial needs
BenchmarkInitializer to fan adversarial techniques across registry-
discovered targets) can opt into a longer initializer list without
forcing every other scenario to load the same extras.

Default for every scenario: ["target", "load_default_datasets"]
(unchanged from prior behavior).
Override for benchmark.adversarial: defaults + ["benchmark"], so
BenchmarkInitializer runs after TargetInitializer has populated
TargetRegistry with the ADVERSARIAL-tagged env-driven targets.

Plan-vs-reality fix caught during implementation: the plan referred to
the scenario key as "adversarial_benchmark", but the actual
ScenarioRegistry name (used by pyrit_scan) is the dotted module path
"benchmark.adversarial", mirroring "airt.cyber" / "garak.encoding". The
override map uses the dotted form. Comment in the file pins the
convention so future overrides don't hit the same gotcha.

E2e tests are not part of CI; they run via make end-to-end-test on
developer machines that have ADVERSARIAL_CHAT_* env vars set. When the
env vars are absent, BenchmarkInitializer surfaces the actionable
error message added in Commit 4 (closes failure_mode_followup
no-adversarial-model-clear-error, also referenced in Commit 4 body).

No regression run included: e2e tests require live API credentials.
Smoke-tested with pytest --collect-only (9 scenarios, including
benchmark.adversarial) and a manual resolution check that
_initializers_for("benchmark.adversarial") returns the override list
while _initializers_for("airt.cyber") falls back to defaults.
Rewrites doc/scanner/benchmark.{py,ipynb} end-to-end around the new
registry-driven flow. The previous notebook constructed
AdversarialBenchmark with adversarial_models=[OpenAIChatTarget()],
which no longer exists after the collapse.

New notebook content:
- Prerequisites: ADVERSARIAL_CHAT_* env vars (plus optional
  _SINGLETURN / _MULTITURN / _REASONING variants).
- CLI quickstart: pyrit_scan benchmark.adversarial --initializers
  target load_default_datasets benchmark --target openai_chat ...
- Setup cell: initialize_pyrit_async with TargetInitializer +
  ScorerInitializer + LoadDefaultDatasets + BenchmarkInitializer (in
  that order, since BenchmarkInitializer reads TargetRegistry).
- Run cell: AdversarialBenchmark() with no model args; default "light"
  strategy.
- Cross-run caching cell: AdversarialBenchmark(skip_cached=True);
  documents (atomic_attack_name, technique_eval_hash) cache key,
  SUCCESS/FAILURE-only caching, ERROR/UNDETERMINED retry semantics,
  and the "add new adversarial targets incrementally" use case.
- Narrowing the fan-out: BenchmarkInitializer.set_params_from_args
  with target_names = [...].
- .pyrit_conf bootstrap: full YAML with initializer ordering.
- Scorer flexibility: documents the widened Scorer | None annotation
  and the TrueFalseScorer-only runtime contract for stage 1.

Per microsoft.github.io/PyRIT/contributing/notebooks/ the .ipynb is
generated from the .py via jupytext; this commit regenerates the
.ipynb to match the new .py source. Both committed pre-execution
(no real output cells). Maintainers running pct_to_ipynb.py before
the next release will re-execute against real endpoints; doing so
now would require ADVERSARIAL_CHAT_* env vars set in this dev env.

Downstream: the published doc page at
https://microsoft.github.io/PyRIT/scanner/benchmark/ is built from
doc/scanner/benchmark.py and will update automatically when this PR
merges into main.

Companion test (smoke-run the notebook with mocked targets) is
deferred to its own follow-up commit (F6.2 / scanner-notebook-test
per plan).
@ValbuenaVC ValbuenaVC requested review from Copilot, rlundeen2 and romanlutz and removed request for Copilot May 21, 2026 20:23
Upstream 23e2aa6 (DOC strict build) tightened ruff TC rules. RegistryEntry and TagQuery are only used in type annotations on TargetRegistry, so they belong in the if TYPE_CHECKING: block. Pre-commit clean across all PR-touched files after this fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR refactors the adversarial benchmarking workflow to be registry-driven and more resumable, primarily by introducing a benchmark initializer that fans adversarial-capable techniques across ADVERSARIAL-tagged targets and by improving target tagging/discovery to support that flow end-to-end.

Changes:

  • Add BenchmarkInitializer to discover ADVERSARIAL-tagged targets and register per-model fanned AttackTechniqueSpecs into AttackTechniqueRegistry.
  • Fix TargetInitializer to propagate TargetConfig.tags into TargetRegistry and add a guard to skip model-configured targets when the per-target model env var is missing.
  • Extend TargetRegistry with get_by_tag_query (TagQuery-based key matching) and add/expand unit & e2e tests plus updated benchmark documentation and env examples.

Reviewed changes

Copilot reviewed 14 out of 33 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pyrit/setup/initializers/components/targets.py | Registers new ADVERSARIAL chat variants; propagates config.tags into the registry; adds a skip-with-warning guard for missing per-target model env vars. |
| pyrit/setup/initializers/benchmark.py | New initializer to discover adversarial targets via tag query and fan out adversarial-capable technique specs across them. |
| pyrit/setup/initializers/__init__.py | Exports BenchmarkInitializer from the initializers package. |
| pyrit/registry/object_registries/target_registry.py | Adds get_by_tag_query to enable TagQuery-based discovery of tagged targets. |
| tests/unit/setup/test_targets_initializer.py | Adds coverage for tag propagation, idempotency, unique registry names, and new ADVERSARIAL_CHAT variants. |
| tests/unit/setup/test_benchmark_initializer.py | New unit tests for benchmark initializer discovery, fan-out naming/tags, idempotency, and narrowing behavior. |
| tests/unit/registry/test_target_registry.py | Adds characterization for duplicate-name overwrites; adds get_by_tag_query test coverage including composite queries and key-only semantics. |
| tests/unit/scenario/benchmark/test_adversarial.py | Rewritten tests for the collapsed AdversarialBenchmark shape, registry-driven strategies, display grouping, caching behavior, and scorer-typing guard. |
| tests/end_to_end/test_scenarios.py | Adds per-scenario initializer override map so benchmark.adversarial runs with benchmark initializer in e2e. |
| doc/scanner/benchmark.py | Updates benchmark documentation to the new env-driven registry + initializer workflow, including caching and fan-out narrowing. |
| doc/scanner/benchmark.ipynb | Synchronized notebook update matching the rewritten benchmark.py content. |
| .env_example | Adds ADVERSARIAL chat variant env var groups (SINGLETURN/MULTITURN/REASONING) for discovery and fan-out. |
| tests/unit/scenario/core/test_strategy_validation.py | Adds tests around composite strategy naming and ScenarioCompositeStrategy deprecation warnings. |
| tests/unit/scenario/core/test_scenario_strategy_invariants.py | Adds shared invariant tests for dynamically generated strategy enums. |
| tests/unit/scenario/core/test_scenario_partial_results.py | Adds tests for scenario retry/resume behavior when atomic attacks return partial results. |
| tests/unit/scenario/core/test_dataset_configuration.py | Adds comprehensive unit tests for DatasetConfiguration behaviors (data sources, sampling, error cases). |
| tests/unit/scenario/core/test_baseline_deprecation.py | Adds tests for deprecated baseline constructor shims and their runtime behavior. |
| tests/unit/scenario/core/test_attack_technique.py | Adds unit tests for AttackTechnique initialization and identifier behavior. |
| tests/unit/scenario/core/test_attack_technique_factory.py | Adds extensive unit tests for AttackTechniqueFactory validation, creation, identifier hashing, and scorer override policies. |
| tests/unit/scenario/garak/test_encoding.py | Adds/rewrites Encoding scenario tests and a baseline-uniformity regression test under max_dataset_size. |
| tests/unit/scenario/airt/test_cyber.py | Updates Cyber scenario tests for technique registry pattern and dynamic strategy behavior. |
| tests/unit/scenario/airt/test_jailbreak.py | Updates Jailbreak scenario tests around baseline behavior, many-shot patching, and strategy execution. |
| tests/unit/scenario/airt/test_leakage.py | Adds/updates Leakage scenario tests for dynamic strategies and baseline policy expectations. |
| tests/unit/scenario/airt/test_scam.py | Updates Scam scenario tests, including supported parameter plumbing and baseline-uniformity regression test. |
| tests/unit/scenario/airt/test_psychosocial.py | Adds/updates Psychosocial scenario tests, including capability requirement validation and baseline-uniformity regression test. |

Comment thread .env_example
Comment thread doc/scanner/benchmark.py
Victor Valbuena and others added 5 commits May 28, 2026 16:50
…also in PR (1811), but its components were incorrectly added to this PR.
The previous cleanup commit (31ed2fb) removed the pyrit/tools/ package and tests/unit/tools/ directory, but several tool-calling changes from PR microsoft#1811 (MCP) remained mixed in:

- pyrit/exceptions: ToolCallNotSupported and ToolCallLoopLimitExceeded

- pyrit/prompt_target/common/: @tool_loop decoration on send_prompt_async, supports_tool_use capability, tool_event_policy and tool_backend slots on TargetConfiguration

- pyrit/prompt_target/openai/openai_response_target.py: migration onto @tool_loop + LocalToolBackend (the in-class agentic loop was removed in favor of the decorator)

- tests/integration/tools/ and tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py

- pyproject.toml + uv.lock: mcp Python SDK dependency

All of the above are reverted to origin/main. The adversarial benchmark refactor (this PR's actual scope) is unaffected; 128 targeted unit tests across openai_response_target, function_chaining, and scenario/benchmark still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread pyrit/analytics/result_analysis.py
Comment thread pyrit/scenario/scenarios/benchmark/adversarial.py Outdated
Comment thread pyrit/scenario/scenarios/benchmark/adversarial.py Outdated
Comment thread pyrit/scenario/scenarios/benchmark/adversarial.py Outdated
Comment thread doc/scanner/benchmark.ipynb
@rlundeen2 rlundeen2 self-assigned this May 29, 2026
Victor Valbuena and others added 7 commits May 29, 2026 14:11
…pt factory registry API

- Replace SCENARIO_TECHNIQUES/AttackTechniqueSpec with AttackTechniqueFactory registry
- Add @cache to _build_benchmark_strategy; drop deleted classmethods
- Adopt default_strategy=/default_dataset_config= in super().__init__ (microsoft#1784 contract)
- Loop: factory.create(attack_adversarial_config_override=AttackAdversarialConfig(target=...))
- Rename skip_cached -> use_cached throughout
- Test fixture: use build_scenario_technique_factories() + mock adversarial_chat
  target (matches PR microsoft#1785 pattern); module-level constants from production catalog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves four real conflicts produced by upstream PR microsoft#1858
(Moving Identifiers to models) on top of our adversarial-benchmark
refactor:

- pyrit/identifiers/__init__.py: accept upstream's deprecation shim and
  forward ObjectiveTargetEvaluationIdentifier through it.
- pyrit/identifiers/evaluation_identifier.py: same treatment.
- tests/unit/analytics/test_result_analysis.py: keep our import set,
  but source everything from pyrit.models so we don't hit the shim.
- tests/unit/scenario/test_adversarial.py: 'git rm' the modify/delete
  (the file was moved to tests/unit/scenario/benchmark/test_adversarial.py
  earlier in this PR).

Ports our additions (ObjectiveTargetEvaluationIdentifier, OWN_RULE,
own_rule param on compute_eval_hash) into the new home at
pyrit/models/identifiers/evaluation_identifier.py and re-exports them
from pyrit.models.identifiers / pyrit.models.

Sweeps in-tree imports off the pyrit.identifiers shim
(adversarial.py, analytics/result_analysis.py, and the two test files).

172 targeted tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Result

When use_cached=True, skipped atomic attacks now have their prior cached AttackResults attached to the live ScenarioResult instead of being silently dropped. This addresses the PR microsoft#1765 review comment that we still need to surface cached runs in the final scenario output (not just skip execution).

- _collect_cached_completion_pairs now stores per-hash cached results as a side effect for downstream lookup.

- _get_atomic_attacks_async filters cached rows by attribution_data['parent_collection'] so a cache hit from one dataset/target slot does not leak into another.

- run_async override merges _precomputed_cached_results into ScenarioResult.attack_results and updates _display_group_map.

- Adds TestRunAsyncCacheInjection (3 tests) and two TestSkipCachedFilter tests covering the full pipeline and parent_collection filtering.
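One way to picture the injection step, assuming (this is not confirmed by the commit text) that attack results are held as a mapping from atomic attack name to a list of results. The function name is hypothetical.

```python
def merge_cached_results(
    live: dict[str, list], cached: dict[str, list]
) -> dict[str, list]:
    """Return live results plus cached results for attacks that were skipped."""
    # Copy the lists so neither input mapping is mutated.
    merged = {name: list(results) for name, results in live.items()}
    for name, results in cached.items():
        merged.setdefault(name, []).extend(results)
    return merged
```

With this shape, a caller sees an entry for every atomic attack, whether it ran in this invocation or was served from cache.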

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Changes _collect_cached_completion_pairs to return a set of
atomic_attack_names (instead of technique hashes) and filters
cached results per-slot using attribution_data['parent_collection'].
This fixes a bug where two atomic attacks sharing the same
technique+target hash (e.g. harmbench vs advbench) would incorrectly
share cache hits, causing one dataset to be skipped with empty results.

The dataset filter is a Python-side semantic filter, not a DB query,
since get_cached_results_for_technique has no attribution parameter.
Documented explicitly in the docstring.

Also adds two new cross-dataset real-memory regression tests to
verify harmbench and advbench results are correctly scoped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ValbuenaVC ValbuenaVC enabled auto-merge June 1, 2026 19:15
@ValbuenaVC ValbuenaVC added this pull request to the merge queue Jun 1, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Jun 1, 2026
@ValbuenaVC ValbuenaVC enabled auto-merge June 1, 2026 19:59
Victor Valbuena and others added 2 commits June 1, 2026 13:02
…nt CI collision

datetime.now().timestamp() can return the same float for two rapid
calls in CI, causing _dedup_attack_entries to discard one result
(dedup is by conversation_id). Use uuid4 to guarantee uniqueness.
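The collision is easy to reproduce in miniature. `dedup_by_conversation_id` below is a hypothetical stand-in for _dedup_attack_entries that keeps the last entry per id; the real helper may differ in which duplicate it keeps.

```python
import uuid
from datetime import datetime


def timestamp_id() -> str:
    """Collision-prone: two rapid calls can observe the same clock value."""
    return str(datetime.now().timestamp())


def unique_id() -> str:
    """uuid4 draws 122 random bits, so it does not depend on clock resolution."""
    return str(uuid.uuid4())


def dedup_by_conversation_id(entries: list[dict]) -> list[dict]:
    """Keep one entry per conversation_id (the last one seen)."""
    return list({entry["conversation_id"]: entry for entry in entries}.values())
```

Two fixtures that happen to share a timestamp-derived id collapse into one entry under this dedup, which is the failure the commit describes.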

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ValbuenaVC ValbuenaVC added this pull request to the merge queue Jun 1, 2026
Merged via the queue into microsoft:main with commit 9f55bfb Jun 1, 2026
48 checks passed
@ValbuenaVC ValbuenaVC deleted the adversarial_benchmark_refactor branch June 1, 2026 20:40
3 participants