FEAT: Define GCG extension protocols (typing surface only) #1861
Merged
romanlutz merged 8 commits on Jun 2, 2026
Adds pyrit/auxiliary_attacks/gcg/extension_protocols.py containing four runtime_checkable Protocol classes that mark the algorithmic seams in the GCG optimization loop where a future caller may substitute custom behavior:

- SamplingStrategy.sample_candidates: abstracts GCGPromptManager.sample_control
- LossFunction.compute_loss: abstracts the weighted target/control CE
- CandidateFilter.filter_candidates: abstracts MultiPromptAttack.get_filtered_cands
- SuffixInitializer.make_initial_suffix: abstracts the literal control_init plumbing

This PR is pure typing surface: no concrete implementations, no defaults, no wiring into GCGAlgorithmConfig or GCGMultiPromptAttack. The default implementations (extracted byte-for-byte from current attack code with a parity gate) and the optional config fields that select between defaults and custom impls land in follow-up PRs.

The module uses `from __future__ import annotations` plus a TYPE_CHECKING import for torch so it imports cleanly on the base `dev` extra (no torch), preserving the invariant added by commit 36aaaa3 in Sub-PR A.

All four Protocols are re-exported from pyrit.auxiliary_attacks.gcg via the existing PEP 562 _LAZY_IMPORTS pathway so the public surface is consistent with how GCG / GCGGenerator / GCGContext / GCGResult are exposed.

Tests cover module `__all__`, package re-export identity, runtime_checkable positive and negative isinstance, and a return-shape smoke test per protocol with a trivial in-test stub implementation. `pytest.importorskip("torch")` gates the whole file because the stubs construct real `torch.Tensor` arguments for the shape assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
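The PEP 562 re-export mentioned in this commit message can be illustrated with a minimal sketch. It is not PyRIT's actual `_LAZY_IMPORTS` table: the mapping shape, the helper name `make_lazy_getattr`, and the standard-library targets are all assumptions, chosen so the example runs without PyRIT or torch installed.

```python
import importlib
import types

# Hypothetical table: public name -> (module path, attribute name).
# Standard-library targets stand in for the real protocol classes.
_LAZY_IMPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "dumps": ("json", "dumps"),
}


def make_lazy_getattr(namespace, lazy_imports):
    """Build a PEP 562 module-level __getattr__ that imports names on first access."""

    def __getattr__(name):
        if name not in lazy_imports:
            raise AttributeError(f"module has no attribute {name!r}")
        module_path, attribute = lazy_imports[name]
        value = getattr(importlib.import_module(module_path), attribute)
        namespace[name] = value  # cache: later lookups never reach __getattr__
        return value

    return __getattr__


# A throwaway module object stands in for a package's __init__.py.
package = types.ModuleType("demo_package")
package.__getattr__ = make_lazy_getattr(package.__dict__, _LAZY_IMPORTS)

dumps = package.dumps  # the json import happens here, on first access
print(dumps({"ok": True}))
```

In a real package the `__getattr__` function would be defined directly in `__init__.py`; the point of the pattern is that importing the package does not import the heavy submodules until a name is actually requested.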
Renames in pyrit/auxiliary_attacks/gcg/extension_protocols.py and the corresponding test stubs:

- control_toks -> control_tokens
- candidate_toks -> candidate_tokens
- nonascii_toks -> non_ascii_tokens (mirrors allow_non_ascii)
- topk -> top_k
- temp -> temperature
- control_len -> control_length (docstring shape annotations)

The legacy GCGAlgorithmConfig fields (topk, temp) and the legacy attack code (GCGPromptManager.sample_control, get_filtered_cands) keep their existing names. Renaming those is a separate API change that belongs in the B3 wiring PR (where GCGAlgorithmConfig is extended anyway).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Completes the parameter-name spell-out pass on the four extension protocols (previous commit handled SamplingStrategy / CandidateFilter). `ids` is a common ML shorthand but `token_ids` is unambiguous and consistent with the other tokens-* parameters in the same module. Descriptive uses of the word `ids` in surrounding docstring prose are left as-is since they read naturally. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
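To make the spelled-out names concrete, here is a rough sketch of what one of the protocols could look like. Only the class name, the method name, and the parameters `control_tokens`, `top_k`, `temperature`, and `non_ascii_tokens` come from this PR; the `gradient` and `batch_size` parameters, the keyword-only style, and the toy `RepeatCurrentSuffix` class are assumptions made for illustration.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Protocol, runtime_checkable

if TYPE_CHECKING:
    # Seen only by static type checkers, so importing this module never imports torch.
    import torch


@runtime_checkable
class SamplingStrategy(Protocol):
    """Proposes candidate suffix token sequences from the gradient."""

    def sample_candidates(
        self,
        *,
        control_tokens: torch.Tensor,
        gradient: torch.Tensor,  # assumed parameter
        batch_size: int,  # assumed parameter
        top_k: int,
        temperature: float,
        non_ascii_tokens: torch.Tensor | None = None,
    ) -> torch.Tensor:
        """Return a batch of candidate token sequences."""
        ...


class RepeatCurrentSuffix:
    """Toy strategy (hypothetical): proposes the unchanged suffix batch_size times."""

    def sample_candidates(self, *, control_tokens, gradient, batch_size, top_k,
                          temperature, non_ascii_tokens=None):
        return [control_tokens for _ in range(batch_size)]


# Annotations are not enforced at runtime, so plain lists are enough for a demo.
strategy = RepeatCurrentSuffix()
candidates = strategy.sample_candidates(
    control_tokens=[11, 12, 13], gradient=None, batch_size=2, top_k=4, temperature=1.0
)
```

Because of `from __future__ import annotations`, the `torch.Tensor` annotations are stored as strings and never evaluated, which is what lets a module like this import without torch.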
rlundeen2
reviewed
Jun 2, 2026
rlundeen2
reviewed
Jun 2, 2026
rlundeen2
approved these changes
Jun 2, 2026
Per @rlundeen2's review on PR microsoft#1861:

1. Replace `References:` blocks that cited line ranges in `gcg_attack.py` / `attack_manager.py` with symbol-only references. Line numbers drift the moment the legacy attack code is touched (B3 wiring will do exactly that); symbol names are stable across the refactors that follow.
2. Re-type `SamplingStrategy.sample_candidates(temperature=)` as `float` instead of `int`. The protocol is a brand-new surface and was previously mirroring the legacy `GCGAlgorithmConfig.temp: int = 1` field for no good reason: sampling temperatures are conceptually continuous. The legacy field stays as-is; B3 wiring owns deciding whether to widen it or coerce at the boundary.

The stub used by the runtime-checkable tests is updated to match, and the shape-smoke test now passes `temperature=1.0`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GCGAlgorithmConfig.temp goes from `int = 1` to `float = 1.0`. The matching parameter on the three downstream methods that still typed it as `int` is widened too:

- GCGPromptManager.sample_control
- GCGMultiPromptAttack.step
- MultiPromptAttack.run

The other two strategy `run` overloads (ProgressiveMultiPromptAttack, IndividualPromptAttack) were already `float = 1.0`, so the pre-existing inconsistency is now resolved. Sampling temperature is conceptually continuous; typing it as int in a brand-new public-API field made no sense. The module is experimental, no deprecation cycle owed.

Also updates the SamplingStrategy protocol docstring to drop the stale "kept for API compatibility with the legacy code path" framing in favour of a description of why the parameter exists (the default sampler ignores it, but custom strategies that want softmax weighting receive it).

While here, replace seven Sphinx reST cross-reference roles (`:class:...`, `:meth:...`, `:func:...`) in `config.py` with plain double-backtick code spans. PyRIT renders docstrings with MyST, not Sphinx, so these roles show up as raw literal text in the built docs and are now blocked by the `check-no-rest-roles` pre-commit hook.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
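As an illustration of why a continuous temperature is useful to a custom strategy, the sketch below applies temperature-scaled softmax weighting to a list of scores. It is not the default sampler (which, per the commit message, ignores the parameter); the function names and the pure-Python lists are assumptions made to keep the example independent of torch.

```python
import math
import random


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exponentials = [math.exp(value - peak) for value in scaled]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def pick_replacement(scores, temperature=1.0, rng=None):
    """Sample one index with probability proportional to its softmax weight."""
    rng = rng or random.Random()
    weights = softmax_with_temperature(scores, temperature)
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


scores = [1.0, 2.0, 3.0]
sharp = softmax_with_temperature(scores, temperature=0.5)  # fractional value, not an int
flat = softmax_with_temperature(scores, temperature=2.0)
choice = pick_replacement(scores, temperature=0.5, rng=random.Random(0))
```

A fractional value such as `0.5` is a perfectly ordinary setting here, which is the practical argument for typing the parameter as `float`.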
The `check-no-rest-roles` pre-commit hook blocks `:class:Foo` patterns; PyRIT renders docstrings with MyST, not Sphinx, so those roles appear as raw literal text in the built docs. Two `:class:...` roles in the module-level docstring (`GCG`, `GCGGenerator`, `PromptGeneratorStrategy`) are replaced with plain double-backtick code spans per the documented convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
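The replacement described in these commits amounts to a small text transformation. The sketch below is a guess at its shape, not the actual `check-no-rest-roles` hook: the regular expression, the set of roles it covers, and the function name `replace_rest_roles` are assumptions.

```python
import re

# Assumed pattern: a :class:, :meth:, or :func: role followed by a backtick-quoted
# target, with an optional leading "~" (Sphinx's shorten-the-name marker).
REST_ROLE = re.compile(r":(?:class|meth|func):`~?([^`]+)`")


def replace_rest_roles(text):
    """Rewrite reST cross-reference roles as plain double-backtick code spans."""
    return REST_ROLE.sub(r"``\1``", text)


def find_rest_roles(text):
    """Return the targets of any roles still present, as a hook-style check might."""
    return REST_ROLE.findall(text)


before = "Wraps :class:`GCGGenerator` and calls :meth:`~GCG.execute`."
after = replace_rest_roles(before)
```

A check built this way would fail the commit whenever `find_rest_roles` returns a non-empty list for a staged file.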
Summary

Adds `pyrit/auxiliary_attacks/gcg/extension_protocols.py` containing four `runtime_checkable` `Protocol` classes that mark the algorithmic seams in the GCG optimization loop where a future caller may substitute custom behavior:

- `SamplingStrategy`: how candidate suffix token sequences are proposed from the gradient. Current implementation: top-k by-grad, uniform pick within top-k (`GCGPromptManager.sample_control`).
- `LossFunction`: how candidate suffixes are scored against the target. Current implementation: weighted cross-entropy on target + control slices.
- `CandidateFilter`: how proposed candidates get pruned before evaluation. Current implementation: drops candidates whose decoded string re-tokenizes to a different token count (`MultiPromptAttack.get_filtered_cands`).
- `SuffixInitializer`: how the initial suffix string is constructed. Current implementation: literal string from `GCGAlgorithmConfig.control_init`.

Each protocol has a Google-style docstring with `Args:` / `Returns:` and a `References:` section pointing at the symbols in `gcg_attack.py` / `attack_manager.py` it abstracts.

What this PR is not
Pure typing surface: zero behavior change, zero wiring. No concrete implementations of the protocols, no new fields on `GCGAlgorithmConfig`, no dispatch in `GCGMultiPromptAttack`. The protocols are exposed so users can implement them; consuming them is left for follow-up work.

Design notes
- Parameter names are spelled out (`control_tokens`, `non_ascii_tokens`, `top_k`, `temperature`, `token_ids`) rather than mirroring the legacy abbreviations.
- The module uses `from __future__ import annotations` plus a `TYPE_CHECKING` import for `torch` so it imports cleanly on installs that only have the base `dev` extra (no torch), preserving the invariant introduced by commit `36aaaa31` from "FEAT: GCG public API - GCG + GCGConfig + ExperimentalWarning, shifts module to experimental status" #1792.
- All four Protocols are re-exported from `pyrit.auxiliary_attacks.gcg` via the existing PEP 562 `_LAZY_IMPORTS` pathway so the public surface stays consistent with how `GCG` / `GCGGenerator` / `GCGContext` / `GCGResult` are exposed.
- `LossFunction` owns its entire loss computation (criterion choice, slicing, and any weighted combination of target/control terms). The current `target_weight` / `control_weight` knobs on `GCGAlgorithmConfig` keep working unchanged.

Drive-by cleanups (in response to review feedback)

- `GCGAlgorithmConfig.temp` is widened from `int = 1` to `float = 1.0`, and the same change is propagated to `GCGPromptManager.sample_control`, `GCGMultiPromptAttack.step`, and `MultiPromptAttack.run`. The other two strategy `run` overloads were already `float`, so this also resolves a pre-existing inconsistency. Module is experimental, no deprecation cycle.
- Seven Sphinx reST cross-reference roles (`:class:`, `:meth:`, `:func:`) in `config.py` are replaced with plain double-backtick code spans, since PyRIT renders docstrings with MyST and the `check-no-rest-roles` pre-commit hook now blocks them.

Tests
New `tests/unit/auxiliary_attacks/gcg/test_extension_protocols.py` with 19 parametrized test instances covering:

- Module `__all__` contents.
- Package re-export identity (… `extension_protocols`).
- Each protocol being `@runtime_checkable`.
- Positive `isinstance(impl, ProtocolName)`.
- Negative `isinstance` check for each protocol (catches accidental signature drift in future PRs).

Gated with `pytest.importorskip("torch")` since the stubs construct real `torch.Tensor` arguments for the shape assertions.

Full GCG unit suite still passes: 133/133 in `tests/unit/auxiliary_attacks/gcg/`.
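The positive and negative `isinstance` checks listed above might look roughly like the following. This is a torch-free sketch rather than the contents of `test_extension_protocols.py`: the stub classes and the `logits` parameter are hypothetical, and the protocol is redefined locally so the example stands alone.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LossFunction(Protocol):
    # Parameter names here are placeholders, not the real signature.
    def compute_loss(self, *, logits, token_ids): ...


class ConstantLoss:
    """Satisfies the protocol structurally, without inheriting from it."""

    def compute_loss(self, *, logits, token_ids):
        return 0.0


class NoLossMethod:
    """Lacks compute_loss entirely, so the isinstance check must fail."""

    def score(self, logits):
        return 0.0


class DriftedSignature:
    """Has the right method name but a different parameter list."""

    def compute_loss(self):
        return 0.0


def test_positive_isinstance():
    assert isinstance(ConstantLoss(), LossFunction)


def test_negative_isinstance():
    assert not isinstance(NoLossMethod(), LossFunction)


def test_runtime_check_is_name_only():
    # runtime_checkable verifies that the method exists, not what it accepts.
    assert isinstance(DriftedSignature(), LossFunction)
```

Since the runtime check matches on method names only, the negative test guards against a method being renamed or removed; a changed parameter list would be flagged by a call-level smoke test or a static type checker instead.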