Skip to content

FIX: Integration Test Fixes#1907

Merged
ValbuenaVC merged 11 commits into
microsoft:mainfrom
ValbuenaVC:integration_test_fixes_2
Jun 3, 2026
Merged

FIX: Integration Test Fixes#1907
ValbuenaVC merged 11 commits into
microsoft:mainfrom
ValbuenaVC:integration_test_fixes_2

Conversation

@ValbuenaVC
Copy link
Copy Markdown
Contributor

@ValbuenaVC ValbuenaVC commented Jun 3, 2026

Description

This PR bundles three integration-test fixes plus a related notebook cleanup.

1. Crescendo system prompts hardened (replaces the original aliasing patch)

Original failure: 4_sequential_attack.ipynb integration test — CrescendoAttack._get_attack_prompt_async exhausted all retries and raised _StrategyRuntimeError. Root cause: the adversarial LLM (Azure OpenAI GPT-4o in CI) consistently returned rationale_behind_question instead of the schema-required rationale_behind_jailbreak, particularly on early turns where the "jailbreak" framing doesn't apply yet. Each retry failed identically with InvalidJsonException: Missing required keys {'rationale_behind_jailbreak'}.

Initial approach (reverted): A parser-level alias step that renamed rationale_behind_questionrationale_behind_jailbreak. Reviewers raised — fairly — that this kind of per-drift patch would snowball as models invent new variants (reasoning_behind_jailbreak, jailbreak_rationale, etc.).

Adopted fix: Tighten the upstream source instead. All 7 Crescendo system prompt YAMLs (crescendo_variant_1..5, escalation_crisis, therapist) now use explicit anti-paraphrase wording in the task section:

Format your response as a JSON object with exactly these three keys
(use these exact names — do not rename, abbreviate, or rephrase them):

The example JSON block is also normalized to valid JSON (quoted keys + string values) to remove a second source of model ambiguity. The parser itself was reverted to its pre-PR state — keeping only the camelCase tolerance added in #1862. Long-term, Hannah suggested wiring json_schema through prompt_metadata for targets advertising supports_json_schema; that's tracked separately.

2. barge_in_attack.ipynb_role AttributeError

Failure: 'MessagePiece' object has no attribute '_role'. After the Message → Pydantic migration (#1885), Pydantic's __getattr__ rejects unknown attributes, and the notebook had a debug print loop accessing the private piece._role.

Fix: Updated the notebook (and its .py jupytext source) to use the public piece.role attribute.

3. 3_adaptive_scenarios.ipynbAttackTechniqueRegistry is empty

Failure: RuntimeError: AttackTechniqueRegistry is empty. Register attack technique factories before executing scenarios... raised from LoadDefaultDatasets.initialize_asyncScenarioRegistry.list_metadata → instantiation of Cyber_build_cyber_strategyAttackTechniqueRegistry.get_factories_or_raise.

Root cause: The shared doc/scanner/pyrit_conf.yaml config used by 4 notebooks was missing the scenario_technique initializer in its initializer list. With nothing populating AttackTechniqueRegistry before LoadDefaultDatasets ran, every scenario instantiation that walks _build_cyber_strategy blew up.

Fix: Added - name: scenario_technique to doc/scanner/pyrit_conf.yaml, ordered between scorer and load_default_datasets. Initializers now execute in YAML list order (the old execution_order property is deprecated → removed in 0.16.0), so listing it before load_default_datasets guarantees the registry is populated in time. Benefits all 4 consumers of the YAML (garak, foundry, 1_common_scenario_parameters, 3_adaptive_scenarios).

4. Removed redundant workaround in 1_common_scenario_parameters.py

PR #1897 added a notebook-level await ScenarioTechniqueInitializer().initialize_async() call as a stopgap for the same registry-empty bug. Now that pyrit_conf.yaml is the source of truth, that workaround is dead code. Removed the import + call. Left 0_scenarios.py and 2_custom_scenario_parameters.py alone — they use manual initialize_pyrit_async("InMemory") rather than the YAML config, so the explicit ScenarioTechniqueInitializer call is still the right pattern there (preserves the "minimal manual setup" demo).

Tests and Documentation

  • Unit tests: Crescendo parser is reverted to its pre-PR state, so no test changes needed for fix Migrate PyRIT to open source repository #1. The temporary test_parse_adversarial_response_accepts_rationale_behind_question_alias was removed alongside the alias logic.
  • Integration tests run locally (require live Azure endpoints):
    • tests/integration/executors/test_executor_notebooks.py::test_execute_notebooks[4_sequential_attack.ipynb] ✅ (178s)
    • tests/integration/executors/test_executor_notebooks.py::test_execute_notebooks[barge_in_attack.ipynb] ✅ (104s)
    • tests/integration/scenarios/test_notebooks_scenarios.py — all 4 notebooks ✅ (480s)
  • JupyText: ran uv run jupytext --execute --to notebook on 1_common_scenario_parameters.py and 3_adaptive_scenarios.py to verify both regenerated .ipynb files execute end-to-end against live endpoints.
  • pre-commit: clean (after auto-fix hooks stripped notebook outputs and sanitized paths).

Victor Valbuena and others added 3 commits June 3, 2026 10:56
Some adversarial LLMs return 
ationale_behind_question instead of

ationale_behind_jailbreak — semantically equivalent but different key
name. This caused all JSON retries to fail with InvalidJsonException,
eventually raising _StrategyRuntimeError and crashing the
4_sequential_attack.ipynb notebook.

The fix extends _parse_adversarial_response with a key-alias step
(applied after camelCase normalisation) that renames

ationale_behind_question -> 
ationale_behind_jailbreak before
schema validation, matching the same pattern used to accept both
camelCase and snake_case keys.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jsong468
Copy link
Copy Markdown
Contributor

jsong468 commented Jun 3, 2026

Description

Fixes a recurring integration test failure in 4_sequential_attack.ipynb where CrescendoAttack exhausted all retries and raised _StrategyRuntimeError.

What Crescendo does: _get_attack_prompt_async sends a prompt to a separate adversarial chat LLM (not the target being red-teamed). That LLM's job is to generate the next attack question. It is instructed — via the Crescendo system prompt YAMLs — to respond with a JSON object containing three keys: generated_question, last_response_summary, and rationale_behind_jailbreak.

What went wrong: The adversarial LLM (Azure OpenAI GPT-4o in CI) would sometimes respond with rationale_behind_question instead of rationale_behind_jailbreak — particularly on early turns where it is generating an innocent, non-jailbreaking question and the "jailbreak" framing doesn't apply yet. The existing parser treated this as a schema violation (InvalidJsonException: Missing required keys {'rationale_behind_jailbreak'}), triggering a retry. Since the model kept using the same key name, all retries failed, and the attack raised _StrategyRuntimeError.

The fix: _parse_adversarial_response already normalizes camelCase keys to snake_case before validation (added in #1862). This PR adds a second normalization step — applied after camelCase conversion — that renames rationale_behind_question to rationale_behind_jailbreak if the canonical key is absent. The alias is stripped from the output before the extra-key check, so it doesn't loosen the schema in any other way.

Tests and Documentation

  • Added test_parse_adversarial_response_accepts_rationale_behind_question_alias in tests/unit/executor/attack/multi_turn/test_crescendo.py — verifies that a response with rationale_behind_question parses successfully and returns the correct question string.
  • No notebook or doc changes required; the fix is entirely in the parser layer.
  • JupyText not run locally (requires live Azure endpoints); this is the same notebook that was failing in CI.

Everything looks good for this PR, I was just wondering if you considered tweaking the system prompt instead to enforce adherence to the key? Just wondering at what point we reach "hacky" territory to parse the adversarial LLM response haha

Comment thread pyrit/executor/attack/multi_turn/crescendo.py Outdated
Comment thread pyrit/executor/attack/multi_turn/crescendo.py Outdated
@hannahwestra25 hannahwestra25 self-assigned this Jun 3, 2026
ValbuenaVC and others added 4 commits June 3, 2026 14:30
Replaces the parser-level alias workaround (microsoft#1907 review feedback) with
upstream prevention: all 7 Crescendo system prompt variants now
explicitly instruct the adversarial LLM to use the exact key names and
not to rename, abbreviate, or rephrase them. The example JSON block is
also normalized to valid JSON syntax (quoted keys and string values)
to reduce ambiguity.

Changes:
- Revert _parse_adversarial_response to its pre-aliasing state; remove
  _key_aliases dict and rename loop.
- Tighten format instructions in all 7 crescendo YAMLs (variants 1-5,
  escalation_crisis, therapist).
- Drop the alias-specific unit test.
- Trim the parser docstring; the camelCase tolerance is the only
  remaining nuance worth calling out.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aVC/PyRIT into integration_test_fixes_2

Merging to ensure integration test fixes apply to latest main.
@ValbuenaVC ValbuenaVC changed the title FIX: Accept rationale_behind_question alias in Crescendo parser FIX: Integration Test Fixes Jun 3, 2026
@ValbuenaVC ValbuenaVC added this pull request to the merge queue Jun 3, 2026
Merged via the queue into microsoft:main with commit 2fe634b Jun 3, 2026
52 checks passed
@ValbuenaVC ValbuenaVC deleted the integration_test_fixes_2 branch June 3, 2026 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants