
FEAT: Add default implementations of GCG extension protocols #1902

Open
romanlutz wants to merge 1 commit into microsoft:main from romanlutz:romanlutz/romanlutz-gcg-default-implementations


@romanlutz
Contributor

This PR adds default concrete implementations for the four GCG extension protocols introduced in #1861.

The new module pyrit/auxiliary_attacks/gcg/default_implementations.py contains four classes that byte-identically reproduce the existing GCG attack code paths:

  • StandardGCGSampling — reproduces GCGPromptManager.sample_control.
  • CrossEntropyLoss(target_weight, control_weight) — reproduces the weighted sum of AttackPrompt.target_loss and AttackPrompt.control_loss applied inside GCGMultiPromptAttack.step.
  • LengthPreservingFilter(filter) — reproduces MultiPromptAttack.get_filtered_cands.
  • LiteralStringInit(suffix) — reproduces the literal-string control_init parameter threaded through the attack constructors.
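The candidate-sampling step that StandardGCGSampling reproduces can be sketched in pure Python. This is a hedged sketch loosely modeled on the upstream llm-attacks version of sample_control: rank tokens at each suffix position by most negative gradient, spread the batch evenly across positions, and swap in one randomly chosen top-k token per candidate. The function names, the list-based signature, and the disallowed parameter (standing in for allow_non_ascii=False) are hypothetical. The real code operates on torch tensors, so this sketch is not byte-identical to it.

```python
import math
import random
from typing import Iterable, List, Sequence


def top_k_tokens(row: Sequence[float], k: int) -> List[int]:
    """Return the k token ids with the most negative gradient (largest -grad)."""
    return sorted(range(len(row)), key=lambda tok: row[tok])[:k]


def sample_candidates(
    control_toks: Sequence[int],
    grad: Sequence[Sequence[float]],  # grad[position][token_id]
    batch_size: int,
    topk: int,
    rng: random.Random,
    disallowed: Iterable[int] = (),
) -> List[List[int]]:
    """Hypothetical list-based sketch of GCG candidate sampling."""
    banned = set(disallowed)
    # Banned tokens get +inf so they never rank in the top-k
    # (mirrors the allow_non_ascii=False branch).
    masked = [
        [math.inf if tok in banned else g for tok, g in enumerate(row)]
        for row in grad
    ]
    top = [top_k_tokens(row, topk) for row in masked]
    n = len(control_toks)
    candidates: List[List[int]] = []
    for b in range(batch_size):
        # Evenly spaced positions, truncated toward zero.
        pos = int(b * (n / batch_size))
        cand = list(control_toks)
        # One random top-k substitution per candidate.
        cand[pos] = top[pos][rng.randrange(topk)]
        candidates.append(cand)
    return candidates
```

Passing an explicit random.Random instance keeps the sketch deterministic under a fixed seed, which is the property the parity tests rely on.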

The four classes are exported from the package root via the existing PEP 562 lazy-import machinery in pyrit/auxiliary_attacks/gcg/__init__.py.

What this PR does NOT do

This PR is purely additive. The existing methods (GCGPromptManager.sample_control, AttackPrompt.target_loss / control_loss, MultiPromptAttack.get_filtered_cands) and the literal control_init parameter plumbing all remain the production code path: the defaults are extracted alongside the existing code rather than replacing it.

The follow-up PR will wire these defaults into GCGAlgorithmConfig and GCGMultiPromptAttack.step. That is the PR where the existing per-step logic is replaced by dispatch through these protocol objects (with the defaults preserving today's behavior when no custom implementation is configured).
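One common way to express "use the default unless a custom implementation is configured" is an optional field resolved at call time. The sketch below is entirely hypothetical. SuffixInitializer, LiteralInit, AlgorithmConfig, and resolve_initializer are invented names, not the planned GCGAlgorithmConfig API. They only illustrate how a literal-string default could preserve today's behavior.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class SuffixInitializer(Protocol):
    """Hypothetical protocol: produce the starting adversarial suffix."""

    def initial_suffix(self) -> str: ...


class LiteralInit:
    """Hypothetical default: return a fixed string, like a literal control_init."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def initial_suffix(self) -> str:
        return self.suffix


@dataclass
class AlgorithmConfig:
    control_init: str
    initializer: Optional[SuffixInitializer] = None

    def resolve_initializer(self) -> SuffixInitializer:
        # With no custom strategy configured, fall back to the literal string
        # so existing configurations behave exactly as before.
        if self.initializer is None:
            return LiteralInit(self.control_init)
        return self.initializer
```

Because Protocol uses structural typing, any object with a matching initial_suffix method qualifies as a custom implementation without subclassing anything.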

Acceptance gate: golden-input parity

The accompanying tests in tests/unit/auxiliary_attacks/gcg/test_default_implementations.py are the acceptance gate. Each default has at least one parity test that:

  1. Sets up the default and the existing code path with the same fixed torch.manual_seed(...) and the same deterministic inputs.
  2. Runs both.
  3. Asserts byte-identical equality (torch.equal(...) for tensors, == for lists/strings).
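The seed, run, and compare steps above can be sketched without torch. In this hypothetical example, legacy_sample stands in for an existing code path and ExtractedSampler for its extracted default. Both draw from Python's global RNG, the seed is reset before each run, and the outputs are compared with ==. A real GCG parity test would seed with torch.manual_seed and compare tensors with torch.equal instead.

```python
import random
from typing import List


def legacy_sample(tokens: List[int], batch_size: int) -> List[List[int]]:
    """Stand-in for an existing code path that draws from the global RNG."""
    out = []
    for _ in range(batch_size):
        cand = list(tokens)
        pos = random.randrange(len(cand))
        cand[pos] = random.randrange(100)
        out.append(cand)
    return out


class ExtractedSampler:
    """Stand-in for the extracted default; must consume the RNG in the same order."""

    def sample(self, tokens: List[int], batch_size: int) -> List[List[int]]:
        out = []
        for _ in range(batch_size):
            cand = list(tokens)
            pos = random.randrange(len(cand))
            cand[pos] = random.randrange(100)
            out.append(cand)
        return out


def test_extracted_sampler_matches_legacy() -> None:
    tokens = [3, 1, 4, 1, 5]
    random.seed(1234)
    expected = legacy_sample(tokens, batch_size=8)
    random.seed(1234)  # reseed so both paths see the same random stream
    actual = ExtractedSampler().sample(tokens, batch_size=8)
    assert actual == expected
```

Parity depends on consuming the random stream in exactly the same order. An extraction that reorders RNG calls fails this kind of test even when the sampled distribution is unchanged, which is what makes it a strict gate.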

Branch and edge cases are also covered: allow_non_ascii=True/False, filter=True/False, loss paths where one of the two weights is zero, out-of-vocab clamping, and constructor validation.
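The weighted loss and its zero-weight branches can be sketched in pure Python. This is a hedged sketch that rests on several assumptions. The logits are already aligned with the label positions, each term is the mean per-token cross-entropy, and a term is skipped outright when its weight is zero. The constructor checks (non-negative weights, not both zero) are invented for illustration, since the description does not say what CrossEntropyLoss validates. The name WeightedCrossEntropy is hypothetical.

```python
import math
from typing import Optional, Sequence

Logits = Sequence[Sequence[float]]  # logits[position][token_id]


def mean_cross_entropy(logits: Logits, labels: Sequence[int]) -> float:
    """Mean over positions of -log softmax(logits[i])[labels[i]]."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
    return total / len(labels)


class WeightedCrossEntropy:
    """Hypothetical: target_weight * CE(target) + control_weight * CE(control)."""

    def __init__(self, target_weight: float = 1.0, control_weight: float = 0.0) -> None:
        # Assumed validation rules, for illustration only.
        if target_weight < 0 or control_weight < 0:
            raise ValueError("weights must be non-negative")
        if target_weight == 0 and control_weight == 0:
            raise ValueError("at least one weight must be non-zero")
        self.target_weight = target_weight
        self.control_weight = control_weight

    def __call__(
        self,
        target_logits: Optional[Logits],
        target_ids: Optional[Sequence[int]],
        control_logits: Optional[Logits],
        control_ids: Optional[Sequence[int]],
    ) -> float:
        loss = 0.0
        # A zero weight skips its term entirely, so its inputs are never touched.
        if self.target_weight != 0:
            assert target_logits is not None and target_ids is not None
            loss += self.target_weight * mean_cross_entropy(target_logits, target_ids)
        if self.control_weight != 0:
            assert control_logits is not None and control_ids is not None
            loss += self.control_weight * mean_cross_entropy(control_logits, control_ids)
        return loss
```

Skipping a zero-weight term, rather than multiplying it by zero, avoids computing a loss that cannot affect the result. That is also why the zero-weight paths deserve their own tests.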

Verification

  • 23/23 new parity tests pass.
  • 152/152 GCG unit suite passes.
  • Pre-commit clean.

Adds a new module `pyrit/auxiliary_attacks/gcg/default_implementations.py`
containing four concrete classes that byte-identically reproduce the legacy
GCG attack code paths:

- `StandardGCGSampling` — reproduces `GCGPromptManager.sample_control`
- `CrossEntropyLoss` — reproduces the weighted sum of
  `AttackPrompt.target_loss` and `AttackPrompt.control_loss` applied
  inside `GCGMultiPromptAttack.step`
- `LengthPreservingFilter` — reproduces
  `MultiPromptAttack.get_filtered_cands`
- `LiteralStringInit` — reproduces the literal-string `control_init`
  parameter threaded through the attack constructors
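The filtering rule that `LengthPreservingFilter` reproduces can be sketched with plain callables. This is a hedged sketch loosely modeled on the upstream llm-attacks version of `get_filtered_cands`. It decodes each candidate and keeps it only if it differs from the current suffix and re-encodes to the same number of tokens. It then pads the result back to the batch size by repeating the last kept candidate. The `decode`/`encode` callables stand in for a tokenizer and are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def filter_candidates(
    candidates: Sequence[Sequence[int]],
    decode: Callable[[Sequence[int]], str],
    encode: Callable[[str], List[int]],
    current_suffix: str,
    filter_cand: bool = True,
) -> List[str]:
    """Hypothetical sketch of length-preserving candidate filtering."""
    kept: List[str] = []
    for toks in candidates:
        text = decode(toks)
        if not filter_cand:
            kept.append(text)
        # Drop candidates that repeat the current suffix or whose decoded text
        # re-tokenizes to a different length (the round trip is not stable).
        elif text != current_suffix and len(encode(text)) == len(toks):
            kept.append(text)
    if filter_cand:
        # Pad back to the batch size by repeating the last survivor;
        # raises IndexError if no candidate survived.
        kept += [kept[-1]] * (len(candidates) - len(kept))
    return kept
```

The length check matters because the next GCG step assumes every candidate suffix has the same token length as the gradient it was sampled from.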

The defaults are exported from the package root via the existing PEP 562
lazy-import machinery in `pyrit/auxiliary_attacks/gcg/__init__.py`. They
are not yet wired into `GCGMultiPromptAttack` — the legacy code paths
remain the production path until a follow-up change.

The acceptance gate is golden-input parity. The new file
`tests/unit/auxiliary_attacks/gcg/test_default_implementations.py` contains
at least one parity test per default (`torch.equal` for tensors, `==` for
lists and strings), plus branch and edge-case coverage, each comparing the
default's output against the legacy code path called with the same seeded
inputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>