FEAT: Add default implementations of GCG extension protocols #1902
Open
romanlutz wants to merge 1 commit into
Adds a new module `pyrit/auxiliary_attacks/gcg/default_implementations.py` containing four concrete classes that byte-identically reproduce the legacy GCG attack code paths:

- `StandardGCGSampling`: reproduces `GCGPromptManager.sample_control`
- `CrossEntropyLoss`: reproduces the weighted sum of `AttackPrompt.target_loss` and `AttackPrompt.control_loss` applied inside `GCGMultiPromptAttack.step`
- `LengthPreservingFilter`: reproduces `MultiPromptAttack.get_filtered_cands`
- `LiteralStringInit`: reproduces the literal-string `control_init` parameter threaded through the attack constructors

The defaults are exported from the package root via the existing PEP 562 lazy-import machinery in `pyrit/auxiliary_attacks/gcg/__init__.py`. They are not yet wired into `GCGMultiPromptAttack`; the legacy code paths remain the production path until a follow-up change.

Acceptance gate is golden-input parity. New file `tests/unit/auxiliary_attacks/gcg/test_default_implementations.py` contains one `torch.equal` parity test per default (plus branch and edge-case coverage), comparing the default's output against the legacy code path called with the same seeded inputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
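The legacy filter is the least self-explanatory of the four code paths, so here is a minimal pure-Python sketch of the logic that `get_filtered_cands` carries in the upstream llm-attacks GCG code, assuming PyRIT's port keeps it unchanged. `filter_candidates`, the toy `VOCAB`, and the `encode`/`decode` helpers are hypothetical; candidates are plain lists of token ids rather than tensors, and `filter_cand` plays the role of the `filter` flag.

```python
from typing import Callable, Dict, List, Sequence


def filter_candidates(
    candidates: Sequence[Sequence[int]],
    decode: Callable[[Sequence[int]], str],
    encode: Callable[[str], List[int]],
    current_control: str,
    filter_cand: bool = True,
) -> List[str]:
    """Decode candidate suffixes, keeping only length-preserving ones.

    Hypothetical sketch: the real code works on torch tensors and a
    Hugging Face tokenizer rather than plain lists and callables.
    """
    kept: List[str] = []
    for cand in candidates:
        decoded = decode(cand)
        if not filter_cand:
            kept.append(decoded)
        elif decoded != current_control and len(encode(decoded)) == len(cand):
            # Reject the unchanged suffix and any candidate whose decoded text
            # re-tokenizes to a different number of tokens.
            kept.append(decoded)
    if filter_cand:
        # Pad with the last survivor so the batch keeps its original size
        # (raises IndexError if every candidate was rejected, as upstream does).
        kept += [kept[-1]] * (len(candidates) - len(kept))
    return kept


# Toy vocabulary and greedy longest-match tokenizer, for illustration only.
VOCAB: Dict[int, str] = {0: "a", 1: "b", 2: "ab", 3: "!"}


def decode(ids: Sequence[int]) -> str:
    return "".join(VOCAB[i] for i in ids)


def encode(text: str) -> List[int]:
    # Try longer pieces first so "ab" wins over "a" followed by "b".
    pieces = sorted(VOCAB.items(), key=lambda kv: -len(kv[1]))
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        for tok_id, piece in pieces:
            if text.startswith(piece, pos):
                ids.append(tok_id)
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


candidates = [[0, 1], [2, 3], [3, 3]]
print(filter_candidates(candidates, decode, encode, current_control="!!"))
# ['ab!', 'ab!', 'ab!']: "ab" re-encodes to one token, "!!" is the current suffix
```

The padding step is why the filter is "length-preserving" at the batch level as well: downstream loss evaluation always sees the same number of candidates, even when many are rejected.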
This PR adds default concrete implementations for the four GCG extension protocols introduced in #1861.
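To make the protocol/default split concrete, here is a minimal sketch that assumes the extension points are `typing.Protocol` classes. The protocol names and signatures from #1861 are not reproduced here, so `CandidateLoss`, `mean_cross_entropy`, and `WeightedCrossEntropy` are hypothetical. The loss mirrors the weighted target/control cross-entropy computed in the upstream llm-attacks `step`, but uses plain Python lists where the real code uses logit tensors, and it assumes each logit row is already aligned with the token it predicts (the upstream `target_loss` shifts its slice by one position to achieve this).

```python
import math
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class CandidateLoss(Protocol):
    """Hypothetical loss extension point; lower values mean better candidates."""

    def __call__(
        self,
        target_logits: Sequence[Sequence[float]],
        target_ids: Sequence[int],
        control_logits: Sequence[Sequence[float]],
        control_ids: Sequence[int],
    ) -> float: ...


def mean_cross_entropy(logits: Sequence[Sequence[float]], ids: Sequence[int]) -> float:
    """Mean token-level cross-entropy, using log-sum-exp for numerical stability."""
    total = 0.0
    for row, tok in zip(logits, ids):
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[tok]
    return total / len(ids)


class WeightedCrossEntropy:
    """Default-style loss: target_weight * CE(target) + control_weight * CE(control)."""

    def __init__(self, target_weight: float, control_weight: float) -> None:
        self.target_weight = target_weight
        self.control_weight = control_weight

    def __call__(self, target_logits, target_ids, control_logits, control_ids) -> float:
        loss = self.target_weight * mean_cross_entropy(target_logits, target_ids)
        if self.control_weight != 0:
            # Like the upstream step, skip the control term entirely when its weight is zero.
            loss += self.control_weight * mean_cross_entropy(control_logits, control_ids)
        return loss


loss_fn: CandidateLoss = WeightedCrossEntropy(target_weight=1.0, control_weight=0.5)
uniform = [[0.0, 0.0]]  # two-token vocabulary, equal logits
print(loss_fn(uniform, [0], uniform * 2, [1, 0]))  # 1.5 * ln 2, roughly 1.0397
```

Because `CandidateLoss` is a structural protocol, any callable with a matching signature satisfies it without inheriting from it, which is what lets a default and a custom implementation occupy the same slot.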
The new module `pyrit/auxiliary_attacks/gcg/default_implementations.py` contains four classes that byte-identically reproduce the existing GCG attack code paths:

- `StandardGCGSampling`: reproduces `GCGPromptManager.sample_control`.
- `CrossEntropyLoss(target_weight, control_weight)`: reproduces the weighted sum of `AttackPrompt.target_loss` and `AttackPrompt.control_loss` applied inside `GCGMultiPromptAttack.step`.
- `LengthPreservingFilter(filter)`: reproduces `MultiPromptAttack.get_filtered_cands`.
- `LiteralStringInit(suffix)`: reproduces the literal-string `control_init` parameter threaded through the attack constructors.

The four classes are exported from the package root via the existing PEP 562 lazy-import machinery in `pyrit/auxiliary_attacks/gcg/__init__.py`.

What this PR does NOT do
This PR is purely additive. The existing methods (`GCGPromptManager.sample_control`, `AttackPrompt.target_loss`/`control_loss`, `MultiPromptAttack.get_filtered_cands`) and the literal `control_init` parameter plumbing all remain the production code path; the defaults are extracted alongside the existing code rather than replacing it.

The follow-up PR will wire these defaults into `GCGAlgorithmConfig` and `GCGMultiPromptAttack.step`. That is the PR where the existing per-step logic is replaced by dispatch through these protocol objects, with the defaults preserving today's behavior when no custom implementation is configured.

Acceptance gate: golden-input parity
The accompanying tests in `tests/unit/auxiliary_attacks/gcg/test_default_implementations.py` are the acceptance gate. Each default has at least one parity test that:

- Calls both the default and the legacy code path under the same `torch.manual_seed(...)` and with the same deterministic inputs.
- Asserts exact equality of the outputs (`torch.equal(...)` for tensors, `==` for lists/strings).

Branch and edge cases are also covered (`allow_non_ascii=True/False`, `filter=True/False`, individual-weight-zero loss paths, out-of-vocab clamping, constructor validation).

Verification