FEAT: Define GCG extension protocols (typing surface only) #1861

Merged
romanlutz merged 8 commits into microsoft:main from romanlutz:romanlutz/gcg-extension-protocols
Jun 2, 2026

Conversation

@romanlutz romanlutz commented May 30, 2026

Summary

Adds pyrit/auxiliary_attacks/gcg/extension_protocols.py containing four runtime_checkable Protocol classes that mark the algorithmic seams in the GCG optimization loop where a future caller may substitute custom behavior:

  • SamplingStrategy — how candidate suffix token sequences are proposed from the gradient. Current implementation: top-k by -grad, uniform pick within top-k (GCGPromptManager.sample_control).
  • LossFunction — how candidate suffixes are scored against the target. Current implementation: weighted cross-entropy on target + control slices.
  • CandidateFilter — how proposed candidates get pruned before evaluation. Current implementation: drops candidates whose decoded string re-tokenizes to a different token count (MultiPromptAttack.get_filtered_cands).
  • SuffixInitializer — how the initial suffix string is constructed. Current implementation: literal string from GCGAlgorithmConfig.control_init.

Each protocol has a Google-style docstring with Args: / Returns: and a References: section pointing at the symbols in gcg_attack.py / attack_manager.py it abstracts.
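As a rough illustration of what implementing one of these protocols could look like, the sketch below defines a local stand-in for SuffixInitializer and a hypothetical implementation. Only the protocol and method names come from this PR; the exact signature in extension_protocols.py may differ, and RepeatedTokenInitializer is invented for the example.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class SuffixInitializer(Protocol):
    """Local stand-in; the real signature in extension_protocols.py may differ."""

    def make_initial_suffix(self) -> str:
        """Return the initial adversarial suffix string."""
        ...


class RepeatedTokenInitializer:
    """Hypothetical implementation: repeats one token a fixed number of times."""

    def __init__(self, token: str = "!", count: int = 20) -> None:
        self._token = token
        self._count = count

    def make_initial_suffix(self) -> str:
        return " ".join([self._token] * self._count)


initializer = RepeatedTokenInitializer(count=3)
# Structural match: no inheritance from the protocol is required.
assert isinstance(initializer, SuffixInitializer)
print(initializer.make_initial_suffix())
```

Because the protocol is structural, a caller's class satisfies it by providing the method, without importing or subclassing anything from the GCG module.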

What this PR is not

Pure typing surface — zero behavior change, zero wiring. No concrete implementations of the protocols, no new fields on GCGAlgorithmConfig, no dispatch in GCGMultiPromptAttack. The protocols are exposed so users can implement them; consuming them is left for follow-up work.

Design notes

  • Protocol parameter names use fully spelled-out tokens (control_tokens, non_ascii_tokens, top_k, temperature, token_ids) rather than mirroring the legacy abbreviations.
  • The module uses from __future__ import annotations plus a TYPE_CHECKING import for torch so it imports cleanly on installs that only have the base dev extra (no torch), preserving the invariant introduced by commit 36aaaa31 in #1792 (FEAT: GCG public API - GCG + GCGConfig + ExperimentalWarning, shifts module to experimental status).
  • All four protocols are re-exported from pyrit.auxiliary_attacks.gcg via the existing PEP 562 _LAZY_IMPORTS pathway so the public surface stays consistent with how GCG / GCGGenerator / GCGContext / GCGResult are exposed.
  • LossFunction owns its entire loss computation (criterion choice, slicing, and any weighted combination of target/control terms). The current target_weight / control_weight knobs on GCGAlgorithmConfig keep working unchanged.
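The torch-free import behavior described in the design notes can be sketched as follows. This is a minimal illustration of the general pattern, not the module's actual code: the LossFunction stand-in and its logits parameter are assumptions, and only the method name compute_loss and the token_ids spelling come from this PR.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Protocol, runtime_checkable

if TYPE_CHECKING:
    # Evaluated only by static type checkers; never executed at runtime,
    # so the module imports even when torch is not installed.
    import torch


@runtime_checkable
class LossFunction(Protocol):
    """Local stand-in; parameter names here are assumptions for illustration."""

    def compute_loss(self, *, logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        ...


# With postponed evaluation, annotations are stored as plain strings,
# so defining the class never needs the torch name to resolve.
print(LossFunction.compute_loss.__annotations__["return"])
```

The trade-off is that anything resolving these annotations at runtime (for example typing.get_type_hints) would still need torch importable.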

Drive-by cleanups (in response to review feedback)

  • GCGAlgorithmConfig.temp is widened from int = 1 to float = 1.0, and the same change is propagated to GCGPromptManager.sample_control, GCGMultiPromptAttack.step, and MultiPromptAttack.run. The other two strategy run overloads were already float, so this also resolves a pre-existing inconsistency. The module is experimental, so no deprecation cycle is needed.
  • Seven Sphinx reST cross-reference roles (:class:, :meth:, :func:) in config.py are replaced with plain double-backtick code spans, since PyRIT renders docstrings with MyST and the check-no-rest-roles pre-commit hook now blocks them.

Tests

New tests/unit/auxiliary_attacks/gcg/test_extension_protocols.py with 19 parametrized test instances covering:

  • Module __all__ contents.
  • Package re-export identity (each name imported from the package root is the same object as the one in extension_protocols).
  • Each protocol is @runtime_checkable.
  • Each protocol accepts a minimal in-test concrete implementation via isinstance(impl, ProtocolName).
  • A class missing every method fails the isinstance check for each protocol (catches accidental method renames or removals in future PRs; runtime_checkable verifies method presence only, not signatures).
  • One return-shape smoke test per protocol with a trivial stub implementation.

Gated with pytest.importorskip("torch") since the stubs construct real torch.Tensor arguments for the shape assertions.
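The positive and negative isinstance checks listed above might look roughly like the sketch below. It uses a local stand-in for CandidateFilter with an assumed signature and plain lists instead of tensors so it runs without torch; the stub classes are hypothetical and are not the ones in test_extension_protocols.py.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CandidateFilter(Protocol):
    """Local stand-in; the real protocol's signature may differ."""

    def filter_candidates(self, *, candidate_tokens: list[list[int]]) -> list[list[int]]:
        ...


class KeepAll:
    """Minimal stub: provides the required method, so isinstance succeeds."""

    def filter_candidates(self, *, candidate_tokens: list[list[int]]) -> list[list[int]]:
        return list(candidate_tokens)


class WrongSignature:
    """Method name matches but parameters do not; isinstance still succeeds."""

    def filter_candidates(self) -> None:
        return None


class Empty:
    """No methods at all, so isinstance fails."""


assert isinstance(KeepAll(), CandidateFilter)
assert not isinstance(Empty(), CandidateFilter)
# runtime_checkable only checks that the method exists, not how it is called.
assert isinstance(WrongSignature(), CandidateFilter)
# Return-shape smoke check on the trivial stub.
assert KeepAll().filter_candidates(candidate_tokens=[[1, 2], [3, 4]]) == [[1, 2], [3, 4]]
```

The WrongSignature case shows the limit of these runtime checks: a mismatched parameter list is only caught by a static type checker or by actually calling the method.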

Full GCG unit suite still passes: 133/133 in tests/unit/auxiliary_attacks/gcg/.

romanlutz and others added 4 commits May 29, 2026 16:44

Adds pyrit/auxiliary_attacks/gcg/extension_protocols.py containing four
runtime_checkable Protocol classes that mark the algorithmic seams in the
GCG optimization loop where a future caller may substitute custom behavior:

- SamplingStrategy.sample_candidates  — abstracts GCGPromptManager.sample_control
- LossFunction.compute_loss           — abstracts the weighted target/control CE
- CandidateFilter.filter_candidates   — abstracts MultiPromptAttack.get_filtered_cands
- SuffixInitializer.make_initial_suffix — abstracts the literal control_init plumbing

This PR is pure typing surface: no concrete implementations, no defaults,
no wiring into GCGAlgorithmConfig or GCGMultiPromptAttack. The default
implementations (extracted byte-for-byte from current attack code with a
parity gate) and the optional config fields that select between defaults
and custom impls land in follow-up PRs.

The module uses `from __future__ import annotations` plus a TYPE_CHECKING
import for torch so it imports cleanly on the base `dev` extra (no torch),
preserving the invariant added by commit 36aaaa3 in Sub-PR A.

All four Protocols are re-exported from pyrit.auxiliary_attacks.gcg via the
existing PEP 562 _LAZY_IMPORTS pathway so the public surface is consistent
with how GCG / GCGGenerator / GCGContext / GCGResult are exposed.
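For readers unfamiliar with the PEP 562 mechanism mentioned here, the sketch below shows the general shape of a lazy re-export table. The mapping contents and the demo_package module are hypothetical; the real _LAZY_IMPORTS structure in pyrit.auxiliary_attacks.gcg is not shown in this PR and may be organized differently. Standard-library names stand in for the real submodules so the example runs anywhere.

```python
import collections
import importlib
import sys
import types

# Hypothetical table: public name -> module that defines it.
_LAZY_IMPORTS = {"OrderedDict": "collections"}

# Stand-in for a package __init__.py; in a real package the hook below
# would simply be a top-level function named __getattr__.
package = types.ModuleType("demo_package")


def _lazy_getattr(name: str):
    """PEP 562 hook: runs only when normal attribute lookup on the module fails."""
    module_path = _LAZY_IMPORTS.get(name)
    if module_path is None:
        raise AttributeError(f"module 'demo_package' has no attribute {name!r}")
    value = getattr(importlib.import_module(module_path), name)
    setattr(package, name, value)  # cache so later lookups skip the hook
    return value


package.__getattr__ = _lazy_getattr
sys.modules["demo_package"] = package

from demo_package import OrderedDict  # triggers the hook on first access

# Re-export identity: the package-level name is the same object as the original.
assert OrderedDict is collections.OrderedDict
```

The identity assertion at the end mirrors the kind of re-export check the test file describes.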

Tests cover module `__all__`, package re-export identity, runtime_checkable
positive and negative isinstance, and a return-shape smoke test per protocol
with a trivial in-test stub implementation. `pytest.importorskip("torch")`
gates the whole file because the stubs construct real `torch.Tensor`
arguments for the shape assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Renames in pyrit/auxiliary_attacks/gcg/extension_protocols.py and the
corresponding test stubs:

  control_toks    -> control_tokens
  candidate_toks  -> candidate_tokens
  nonascii_toks   -> non_ascii_tokens   (mirrors allow_non_ascii)
  topk            -> top_k
  temp            -> temperature
  control_len     -> control_length     (docstring shape annotations)

The legacy GCGAlgorithmConfig fields (topk, temp) and the legacy attack
code (GCGPromptManager.sample_control, get_filtered_cands) keep their
existing names. Renaming those is a separate API change that belongs in
the B3 wiring PR (where GCGAlgorithmConfig is extended anyway).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Completes the parameter-name spell-out pass on the four extension
protocols (previous commit handled SamplingStrategy / CandidateFilter).
`ids` is a common ML shorthand but `token_ids` is unambiguous and
consistent with the other tokens-* parameters in the same module.

Descriptive uses of the word `ids` in surrounding docstring prose are
left as-is since they read naturally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@rlundeen2 rlundeen2 self-assigned this Jun 2, 2026
Comment thread pyrit/auxiliary_attacks/gcg/extension_protocols.py Outdated
Comment thread tests/unit/auxiliary_attacks/gcg/test_extension_protocols.py Outdated
romanlutz and others added 4 commits June 2, 2026 15:18

Per @rlundeen2's review on PR microsoft#1861:

1. Replace `References:` blocks that cited line ranges in
   `gcg_attack.py` / `attack_manager.py` with symbol-only references.
   Line numbers drift the moment the legacy attack code is touched (B3
   wiring will do exactly that); symbol names are stable across the
   refactors that follow.

2. Re-type `SamplingStrategy.sample_candidates(temperature=)` as
   `float` instead of `int`. The protocol is a brand-new surface and
   was previously mirroring the legacy `GCGAlgorithmConfig.temp: int = 1`
   field for no good reason — sampling temperatures are conceptually
   continuous. The legacy field stays as-is; B3 wiring owns deciding
   whether to widen it or coerce at the boundary.

The stub used by the runtime-checkable tests is updated to match, and the
shape-smoke test now passes `temperature=1.0`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

GCGAlgorithmConfig.temp goes from `int = 1` to `float = 1.0`. The
matching parameter on the three downstream methods that still typed it as
`int` is widened too:

  - GCGPromptManager.sample_control
  - GCGMultiPromptAttack.step
  - MultiPromptAttack.run

The other two strategy `run` overloads (ProgressiveMultiPromptAttack,
IndividualPromptAttack) were already `float = 1.0` — the pre-existing
inconsistency is now resolved.

Sampling temperature is conceptually continuous; typing it as int in a
brand-new public-API field made no sense. The module is experimental, no
deprecation cycle owed.

Also updates the SamplingStrategy protocol docstring to drop the stale
"kept for API compatibility with the legacy code path" framing in favour
of a description of why the parameter exists (the default sampler ignores
it, but custom strategies that want softmax weighting receive it).
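To illustrate why a continuous temperature is useful to a custom strategy, here is a minimal pure-Python sketch of softmax weighting over candidate scores. It is not the default sampler (which, per the description above, ignores temperature) and is not taken from the PyRIT code; the function names are hypothetical.

```python
import math
import random


def softmax_weights(scores: list[float], temperature: float) -> list[float]:
    """Turn raw scores into sampling probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def pick_candidate(scores: list[float], temperature: float, rng: random.Random) -> int:
    """Sample one candidate index with probability proportional to its softmax weight."""
    weights = softmax_weights(scores, temperature)
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


scores = [2.0, 1.0, 0.0]
sharp = softmax_weights(scores, temperature=0.5)  # concentrates mass on the best score
flat = softmax_weights(scores, temperature=2.0)   # spreads mass more evenly
assert sharp[0] > flat[0]
print(pick_candidate(scores, temperature=0.5, rng=random.Random(0)))
```

Fractional values such as 0.5 change the distribution meaningfully, which an int-typed parameter could not express.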

While here, replace seven Sphinx reST cross-reference roles
(`:class:...`, `:meth:...`, `:func:...`) in `config.py` with
plain double-backtick code spans. PyRIT renders docstrings with MyST, not
Sphinx — these roles show up as raw literal text in the built docs and
are now blocked by the `check-no-rest-roles` pre-commit hook.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The `check-no-rest-roles` pre-commit hook blocks `:class:Foo`
patterns; PyRIT renders docstrings with MyST, not Sphinx, so those roles
appear as raw literal text in the built docs. Two `:class:...` roles
in the module-level docstring (`GCG`, `GCGGenerator`,
`PromptGeneratorStrategy`) are replaced with plain double-backtick code
spans per the documented convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@romanlutz romanlutz enabled auto-merge June 2, 2026 23:04
@romanlutz romanlutz added this pull request to the merge queue Jun 2, 2026
Merged via the queue into microsoft:main with commit f58a218 Jun 2, 2026
52 checks passed
@romanlutz romanlutz deleted the romanlutz/gcg-extension-protocols branch June 2, 2026 23:34