FEAT text adaptive scenario #1760

Merged
hannahwestra25 merged 53 commits into microsoft:main from hannahwestra25:hawestra/text_adaptive_scenario
Jun 3, 2026

hannahwestra25 commented May 19, 2026

Add Adaptive Scenario Framework with TextAdaptive

Summary

Adds an adaptive scenario framework that picks attack techniques per objective using an epsilon-greedy bandit informed by observed success rates, instead of running every technique against every objective. It concentrates spend on the techniques that work against the target and stops at the first success, so the cost is O(max_attempts × objectives) instead of O(techniques × objectives).
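
To make the cost claim concrete, here is a small arithmetic sketch. The function name and the numbers (100 objectives, 12 techniques, 5 attempts) are hypothetical and only count technique attempts; they are not measurements from this PR.

```python
def worst_case_attempts(*, objectives: int, techniques: int, max_attempts: int) -> dict:
    """Compare the attempt counts of the two strategies (illustrative only)."""
    return {
        # Every technique is run against every objective.
        "exhaustive": techniques * objectives,
        # At most max_attempts techniques per objective; stopping on the
        # first success can only lower this number.
        "adaptive_upper_bound": max_attempts * objectives,
    }


costs = worst_case_attempts(objectives=100, techniques=12, max_attempts=5)
print(costs)  # {'exhaustive': 1200, 'adaptive_upper_bound': 500}
```

The adaptive figure is an upper bound: objectives that succeed on an early attempt consume fewer than max_attempts runs.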

Includes:

  • AdaptiveScenario (modality-agnostic base) + TextAdaptive (text subclass)
  • TechniqueSelector Protocol + EpsilonGreedyTechniqueSelector (Laplace-smoothed, pooled cold-start backoff, thread-safe)
  • AdaptiveTechniqueDispatcher — a thin factory (not an AttackStrategy) that selects techniques up-front per objective and returns a ready-to-run SequentialAttack(FIRST_SUCCESS)
  • Walkthrough notebook + unit tests across selector, dispatcher, scenario, and analytics

How it works

For each objective:

  1. Select — ε-greedy over Laplace-smoothed (s+1)/(n+1) estimates; cells with fewer than pool_threshold observations back off to the technique's pooled rate. Each decision derives a per-decision RNG from SHA-256(seed|context|decision_key) for resume-safe reproducibility.
  2. Wrap — the chosen max_attempts_per_objective techniques become children of one SequentialAttack(FIRST_SUCCESS), wrapped in one AtomicAttack. Selection happens once at initialize_async; execution is plain framework code.
  3. Record — selector updates (context, technique) → (s, n) from persisted child rows after the run.

```python
from pyrit.scenario.scenarios.adaptive import TextAdaptive, EpsilonGreedyTechniqueSelector

scenario = TextAdaptive(
    selector=EpsilonGreedyTechniqueSelector(epsilon=0.3, random_seed=42),
)
scenario.set_params_from_args(args={"max_attempts_per_objective": 5})
await scenario.initialize_async(objective_target=target)
result = await scenario.run_async()
```

Resumable via scenario_result_id="..." — prior dispatch trails replay into the selector before the remainder runs.
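
The Select and Record steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the PyRIT EpsilonGreedyTechniqueSelector: the class name, method names, and the choice to pick a single technique per call are hypothetical, and the locking that makes the real selector thread-safe is omitted. The smoothing formula and the seed|context|decision_key hashing follow the description above.

```python
import hashlib
import random


class SketchEpsilonGreedySelector:
    """Illustrative epsilon-greedy selector (hypothetical, not the PyRIT class)."""

    def __init__(self, *, epsilon: float = 0.3, pool_threshold: int = 3, random_seed: int = 42):
        self.epsilon = epsilon
        self.pool_threshold = pool_threshold
        self.random_seed = random_seed
        self.cells = {}   # (context, technique) -> (successes, attempts)
        self.pooled = {}  # technique -> (successes, attempts) over all contexts

    def _rng(self, context: str, decision_key: str) -> random.Random:
        # Per-decision RNG: the same (seed, context, decision_key) always
        # produces the same stream, so a resumed run repeats its choices.
        digest = hashlib.sha256(f"{self.random_seed}|{context}|{decision_key}".encode()).digest()
        return random.Random(int.from_bytes(digest[:8], "big"))

    def estimate(self, context: str, technique: str) -> float:
        s, n = self.cells.get((context, technique), (0, 0))
        if n < self.pool_threshold:
            # Cold start: back off to the technique's pooled rate.
            s, n = self.pooled.get(technique, (0, 0))
        return (s + 1) / (n + 1)  # smoothing as written in the description

    def select(self, context: str, techniques: list, decision_key: str) -> str:
        rng = self._rng(context, decision_key)
        if rng.random() < self.epsilon:
            return rng.choice(techniques)  # explore
        # Exploit: highest estimate wins; ties go to the first technique listed.
        return max(techniques, key=lambda t: self.estimate(context, t))

    def record(self, context: str, technique: str, success: bool) -> None:
        for table, key in ((self.cells, (context, technique)), (self.pooled, technique)):
            s, n = table.get(key, (0, 0))
            table[key] = (s + int(success), n + 1)


selector = SketchEpsilonGreedySelector(epsilon=0.0, pool_threshold=2, random_seed=7)
for _ in range(3):
    selector.record("ctx", "technique_a", False)
for _ in range(2):
    selector.record("ctx", "technique_b", True)
choice = selector.select("ctx", ["technique_a", "technique_b"], "objective-1")
```

With epsilon set to 0 the sketch always exploits, so `choice` is the technique with the better observed record. Replaying a persisted trail on resume would amount to calling `record` once per stored child result before selecting again.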

Notes

  • One AtomicAttack per objective: atomic_attack_name = "{prefix}_{dataset}::{objective_sha[:12]}"; display_group = dataset_name preserves dataset grouping for reporting.
  • All dispatchers share one selector — learning accumulates globally across datasets.
  • BASELINE_ATTACK_POLICY = Enabled: prompt_sending runs as the baseline comparison rather than as an adaptive technique.
  • Selector is a constructor kwarg — selector-specific params (epsilon, pool_threshold, random_seed) live on the selector; max_attempts_per_objective is a scenario parameter.
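
The naming scheme from the first note can be sketched as below. The helper name is hypothetical, and it assumes objective_sha is the SHA-256 hex digest of the UTF-8 objective text, which the description does not state explicitly.

```python
import hashlib


def sketch_atomic_attack_name(*, prefix: str, dataset: str, objective: str) -> str:
    """Build a deterministic per-objective name (illustrative helper)."""
    # Assumption: objective_sha is the SHA-256 hex digest of the objective text.
    objective_sha = hashlib.sha256(objective.encode("utf-8")).hexdigest()
    return f"{prefix}_{dataset}::{objective_sha[:12]}"


name_a = sketch_atomic_attack_name(prefix="adaptive", dataset="demo_dataset", objective="objective one")
name_b = sketch_atomic_attack_name(prefix="adaptive", dataset="demo_dataset", objective="objective two")
```

Because the suffix depends only on the objective text, the same objective maps to the same name on every run, which is what a resumed run needs in order to find its earlier results.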

Updates since initial review

  • Dispatcher is a plain factory (not an AttackStrategy) — execution and persistence are entirely the executor's responsibility via SequentialAttack (depends on FEAT add StrategySequenceAttack compound attack primitive #1819). No custom envelope subclass, no run_async override.
  • No custom metadata stamping (addresses thread A) — the per-attempt trail is reconstructed from each child's auto-stamped atomic_attack_identifier; the notebook reads child.get_attack_strategy_identifier().unique_name directly. New MemoryInterface.get_attack_results(atomic_attack_eval_hashes=...) kwarg + compute_inner_attack_eval_hash helper (regression-tested so predictor and executor never diverge).
  • Per-factory scoring config: AdaptiveScenario constructs each factory's narrowed AttackScoringConfig subtype (e.g. TAPAttackScoringConfig); incompatible techniques are skipped with a warning instead of failing the run.
  • SeedAttackGroup.with_technique — deep-copies merged seeds so reused seed_techniques don't leak prompt_group_id mutations.
  • CLI ergonomics — new --request-timeout; poll endpoint uses read=None so a busy server doesn't fail the run; httpx.ReadTimeout now points at the flag instead of a bare Error:.
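
The deep-copy fix for SeedAttackGroup.with_technique can be illustrated with a simplified model. SketchSeed and SketchSeedGroup are hypothetical stand-ins, not the PyRIT classes; the sketch only shows why copying the merged seeds prevents a later prompt_group_id assignment from leaking into a reused technique seed.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchSeed:
    value: str
    prompt_group_id: Optional[str] = None


@dataclass
class SketchSeedGroup:
    seeds: List[SketchSeed] = field(default_factory=list)

    def with_technique(self, technique_seeds: List[SketchSeed]) -> "SketchSeedGroup":
        # Deep-copy the merged list so the new group owns its seed objects.
        merged = copy.deepcopy(self.seeds + list(technique_seeds))
        return SketchSeedGroup(seeds=merged)


shared_technique = [SketchSeed("technique prompt")]  # reused across many objectives
group = SketchSeedGroup([SketchSeed("objective prompt")])
merged_group = group.with_technique(shared_technique)

# Simulate the framework stamping a group id on the merged seeds.
for seed in merged_group.seeds:
    seed.prompt_group_id = "group-1"
```

After the loop, the seeds in `shared_technique` and `group` still have prompt_group_id set to None; without the deep copy they would share objects with `merged_group` and pick up "group-1".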

Testing

pytest tests/unit/{scenario/scenarios/adaptive,analytics,identifiers,memory,cli} — all green.

rlundeen2 left a comment

Looking great!

hannahwestra25 and others added 3 commits May 21, 2026 15:56
- Remove prompt_sending from adaptive pool; enable baseline comparison
- Expose max_attempts_per_objective via supported_parameters() (scam.py pattern)
- Rename AdaptiveTechniqueSelector -> EpsilonGreedyTechniqueSelector
- Extract TechniqueSelector Protocol; accept custom selector via kwarg
- Per-decision RNG derivation (SHA-256) for resume reproducibility
- Drop uuid.uuid4() fallback for objective IDs
- Per-dataset atomic attacks (one AtomicAttack per dataset, not per objective)
- AdaptiveDispatchParams with per-call seed_group and compatibility filtering
- Context extraction moved to dispatcher
- Rehydration uses get_attack_results with attribution_data filtering
- Split selector.py into selectors/ folder (protocol.py + epsilon_greedy.py)
- Update notebooks for new API patterns

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hannahwestra25 added 2 commits June 1, 2026 18:33
hannahwestra25 force-pushed the hawestra/text_adaptive_scenario branch from f0db060 to 929d398 June 1, 2026 23:19
…Pydantic identifiers)

Four related fixes needed to unblock CI after the upstream merge that
brought in PR microsoft#1881 (Refactoring Identifiers to be Pydantic classes):

1. `pyrit/models/identifiers/evaluation_identifier.py`:
   The merge hoisted `from pyrit.executor.attack.core.attack_strategy
   import AttackStrategy` to module level, forming a cycle through
   `pyrit.executor.attack` -> `pyrit.message_normalizer` ->
   `pyrit.common.data_url_converter` -> `pyrit.models`. Move it back
   inside `if TYPE_CHECKING:` (`from __future__ import annotations`
   is already enabled, so the string annotation `attack: AttackStrategy`
   still resolves at type-check time).

2. `tests/unit/models/identifiers/test_evaluation_identifier.py`:
   Add missing blank lines between top-level classes (ruff format) and
   replace four `from pyrit.identifiers import ...` lines with
   `from pyrit.models.identifiers import ...` (the former is now a
   deprecation shim that the static-scan deprecation test forbids
   internal callers from using).

3. `pyrit/scenario/scenarios/adaptive/adaptive_scenario.py`:
   Mark the three classmethod stubs as `@abstractmethod` so
   `inspect.isabstract(AdaptiveScenario)` returns `True` and the
   scenario registry's auto-discovery skips it (otherwise the registry
   tries to instantiate the abstract base and raises
   `NotImplementedError`, breaking `test_load_default_datasets`).

4. `pyrit/scenario/scenarios/adaptive/adaptive_scenario.py` and
   `pyrit/scenario/scenarios/adaptive/text_adaptive.py`:
   Move `from pyrit.setup.initializers.components.scenario_techniques
   import build_scenario_technique_factories` from module level back
   into the function body. `scenario_techniques` imports
   `pyrit.scenario.core`, which transitively re-imports the adaptive
   package during `pyrit.scenario` initialization, so a top-level
   import forms a cycle.
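
Fixes 1 and 3 rely on two standard Python mechanisms, sketched here with stand-in names. `Decimal` plays the role of the cyclic import, and SketchBaseScenario, SketchConcreteScenario, and `discover` are hypothetical; the real registry's discovery logic may differ. A string annotation is used instead of `from __future__ import annotations` to keep the block order-independent.

```python
import inspect
from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, so this import never runs at import
    # time and cannot take part in a runtime import cycle.
    from decimal import Decimal  # stand-in for the cyclic AttackStrategy import


def describe(amount: "Decimal") -> str:
    # The string annotation is never evaluated at runtime.
    return f"amount={amount}"


class SketchBaseScenario(ABC):
    @classmethod
    @abstractmethod
    def default_dataset(cls) -> str:
        """Subclasses must say which dataset they use."""


class SketchConcreteScenario(SketchBaseScenario):
    @classmethod
    def default_dataset(cls) -> str:
        return "demo"


def discover(candidates: list) -> list:
    # Skip abstract bases instead of trying to instantiate them.
    return [cls for cls in candidates if not inspect.isabstract(cls)]


found = discover([SketchBaseScenario, SketchConcreteScenario])
```

`inspect.isabstract` only returns True when the class uses ABCMeta and still has unimplemented abstract methods, which is why stubs that merely raise NotImplementedError are not enough for a discovery step like this one.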

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hannahwestra25 force-pushed the hawestra/text_adaptive_scenario branch 2 times, most recently from f1b6373 to 77a3704 June 2, 2026 16:43
hannahwestra25 and others added 8 commits June 2, 2026 13:28
…n SequentialAttack

Replaces the previous AdaptiveDispatchAttack (an AttackStrategy subclass that
delegated to an internal SequentialAttack) with a slim factory + subclass:

  - AdaptiveTechniqueDispatcher: plain class, not an AttackStrategy. Exposes
    compatible_techniques(seed_group=...) and async build_attack_async(seed_group=...).
    Selects techniques up-front per objective via TechniqueSelector and returns
    a fully-wired AdaptiveSequentialAttack.

  - AdaptiveSequentialAttack: ~15-LoC SequentialAttack subclass. Adds a
    technique_labels constructor argument and stamps adaptive_attempts
    metadata onto the envelope returned by super()._perform_async, then
    delegates everything else to the framework.

Atomic-attack shape:
  - One AtomicAttack per (dataset, seed_group) instead of one per dataset.
  - atomic_attack_name = `{prefix}_{dataset}::{objective_sha[:12]}` so each
    objective gets its own deterministic, hash-disambiguated identifier.
  - display_group = dataset_name preserves the grouping for reporting.
  - All per-dataset dispatchers share the same TechniqueSelector instance so
    learning still accumulates globally.

Rationale: eliminates the dispatcher's coupling to SequentialAttack's private
lifecycle internals (no more manual `_perform_async` orchestration, no more
duplicating envelope/metadata wiring) and removes a layer of indirection that
was making the call graph hard to reason about. The dispatcher is now an
"envelope factory", not an attack.

All 56 adaptive unit tests pass (verified via docker devcontainer pytest run).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hannahwestra25 enabled auto-merge June 3, 2026 19:04
hannahwestra25 added this pull request to the merge queue Jun 3, 2026
Merged via the queue into microsoft:main with commit 519b709 Jun 3, 2026
52 checks passed
romanlutz added a commit to romanlutz/PyRIT that referenced this pull request Jun 4, 2026
Merge 26 commits from main, including:
- MAINT Breaking: Convert ScenarioResult to Pydantic (microsoft#1908)
- MAINT: Migrating Seed classes to Pydantic (microsoft#1898)
- MAINT: Migrating AttackResult to Pydantic (microsoft#1899)
- MAINT: Bump ty-pre-commit v0.0.32 -> 0.0.43 (microsoft#1919)
- FEAT: Realtime streaming session support and server-side barge-in attack (microsoft#1766)
- FEAT text adaptive scenario (microsoft#1760)
- FIX: Integration Test Fixes (microsoft#1907)
- DOC: Scoring Docs Refactor (microsoft#1892)
- Various dependency bumps

Conflicts (15 files) resolved by taking main's version + re-running
ruff --fix to re-apply PEP 604 typing modernization on the incoming code
(177 violations auto-fixed). All resolved files re-staged.

Local verification:
- ruff check: All checks passed
- ruff format: clean
- pytest tests/unit -n 8: 9550 passed, 6 skipped

Known issue (pre-existing on main, not caused by this merge):
- ty 0.0.43 enabled missing-override-decorator rule, which flags hundreds
  of pre-existing methods across the codebase. Main's own CI is currently
  failing on this. Our PR will inherit the same failure since touched
  files come into pre-commit scope. Fixing this rule globally is a
  separate, large mechanical change orthogonal to typing modernization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

4 participants