Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
a26052b
Server VAD config for RealtimeTarget
BaedrianG May 14, 2026
dea833d
Stream PCM chunks via input_audio_buffer.append
BaedrianG May 14, 2026
22c0b54
Turn state and response cancel for RealtimeTarget
BaedrianG May 14, 2026
276c06c
Event dispatcher for RealtimeTarget
BaedrianG May 15, 2026
66bc828
Persist interrupted flag on RealtimeTargetResult and message metadata
BaedrianG May 18, 2026
69b7a1c
Add user audio committed callback to realtime dispatcher
BaedrianG May 18, 2026
d624e6a
Add wire primitive methods to RealtimeTarget for streaming attacks
BaedrianG May 19, 2026
1aaf10b
Add streaming barge-in attack with subscription and turn-future targe…
BaedrianG May 19, 2026
d1edfd5
Add convert-on-commit to streaming barge-in attack via PromptNormalizer
BaedrianG May 19, 2026
23225ad
Persist streaming barge-in turns to CentralMemory
BaedrianG May 19, 2026
836b2a9
Add barge-in demo notebook, fix dispatcher deadlock and turn-await te…
BaedrianG May 20, 2026
32dd43e
Fix review findings: insert-before-delete, CentralMemory, connect_asy…
BaedrianG May 20, 2026
892beb5
Remove redundant response.cancel (server auto-cancels on speech detec…
BaedrianG May 20, 2026
7e122e3
Trim verbose docstrings to match codebase conventions
BaedrianG May 20, 2026
258ef03
Merge remote-tracking branch 'origin/main' into adrian-gavrila/realti…
BaedrianG May 20, 2026
b1abe9c
Enable supports_streaming_barge_in in permissive probe configuration
BaedrianG May 20, 2026
2cdfe14
Replace assert with explicit ValueError in BargeInAttack
BaedrianG May 22, 2026
aa9feee
Extract audio conversion into a target-owned AudioStreamNormalizer
BaedrianG May 22, 2026
32347fd
Decompose _perform_async into named per-turn helpers
BaedrianG May 22, 2026
50aec9e
Rename 'utterance' to 'statement' in barge-in notebook
BaedrianG May 22, 2026
6f53913
Promote realtime streaming types to public and add swap_user_audio pr…
BaedrianG May 22, 2026
5056145
Fix self-review polish: ordering test, filtered-config short-circuit,…
BaedrianG May 22, 2026
fb28586
Route BargeInAttack through PromptNormalizer.send_prompt_async
BaedrianG May 26, 2026
87c6b93
Replace BargeInAttackContext.system_prompt with prepended_conversation
BaedrianG May 26, 2026
69dfe87
Bridge audio_start_ms from speech_started to committed event
BaedrianG May 26, 2026
7950de5
Persist BargeInAttack prepended_conversation via ConversationManager
BaedrianG May 27, 2026
d710772
Regenerate barge_in_attack notebook to pick up source changes
BaedrianG May 27, 2026
3b9bfad
Expose BargeInAttack post-stream wait as configurable constructor arg
BaedrianG May 27, 2026
0bf30aa
Merge branch 'main' into adrian-gavrila/realtime-server-vad
adrian-gavrila May 27, 2026
47c4d6d
Tighten BargeInAttack input validation and target typing
BaedrianG May 27, 2026
a16c9ff
Merge remote-tracking branch 'fork/adrian-gavrila/realtime-server-vad…
BaedrianG May 27, 2026
410c59c
Push BargeInAttack streaming surface onto PromptTarget and refine val…
BaedrianG May 27, 2026
8a46c39
Refactor BargeInAttack streaming surface to composition and split com…
BaedrianG May 28, 2026
db539f7
Regenerate barge_in_attack notebook to pick up source changes
BaedrianG May 28, 2026
8e949e5
Merge branch 'main' into adrian-gavrila/realtime-server-vad
adrian-gavrila May 28, 2026
b677cc5
Replace PromptTarget.streaming attribute with SupportsStreamingBargeI…
BaedrianG May 29, 2026
823f81e
Merge remote-tracking branch 'fork/adrian-gavrila/realtime-server-vad…
BaedrianG May 29, 2026
81a9877
Scaffold realtime streaming session contract
BaedrianG Jun 2, 2026
9117fea
Implement OpenAI realtime streaming session
BaedrianG Jun 2, 2026
c389217
Rewrite BargeInAttack on top of streaming session
BaedrianG Jun 2, 2026
4c5cc6d
Drop dead realtime streaming subscription scaffolding
BaedrianG Jun 2, 2026
f5af803
Drop dead realtime methods _stream_pcm_async and insert_user_text_async
BaedrianG Jun 2, 2026
b56538a
Move streaming-only methods onto _OpenAIRealtimeStreamingSession
BaedrianG Jun 2, 2026
620b998
Collapse StreamingHandle ABC; hoist members onto RealtimeTarget
BaedrianG Jun 2, 2026
3a3d5cc
Make server_vad a streaming-only concept; drop from RealtimeTarget
BaedrianG Jun 2, 2026
bff23a0
Collapse common/streaming package into realtime_audio
BaedrianG Jun 2, 2026
b050ad9
Polish realtime streaming: async suffixes, capability rename, docstri…
BaedrianG Jun 3, 2026
4b10fe9
Merge remote-tracking branch 'origin/main' into adrian-gavrila/realti…
BaedrianG Jun 3, 2026
2cd5431
Fix D420 docstring section order in open_streaming_session
BaedrianG Jun 3, 2026
8a7bbd0
Test deprecated realtime aliases and cleanup error paths
BaedrianG Jun 3, 2026
6ec0cc7
Address PR review nits: trim comment, docstring, doc reorder, async a…
BaedrianG Jun 3, 2026
caafa62
Extract realtime dispatcher to its own module; replace lifecycle asserts
BaedrianG Jun 3, 2026
27e61f5
Use save_formatted_audio_async; drop redundant PromptTarget base
BaedrianG Jun 3, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added assets/photosynthesis_question.wav
Binary file not shown.
475 changes: 475 additions & 0 deletions doc/code/executor/attack/barge_in_attack.ipynb

Large diffs are not rendered by default.

208 changes: 208 additions & 0 deletions doc/code/executor/attack/barge_in_attack.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,208 @@
# ---
# jupyter:
# jupytext:
# cell_metadata_filter: -all
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.18.1
# ---

# %% [markdown]
# # Barge-In Attack (Streaming Audio)
#
# `BargeInAttack` streams user audio to a `RealtimeTarget` and uses server-side voice-activity
# detection (VAD) to detect turn boundaries. When the user speaks while the assistant is still
# responding, server VAD cancels the in-flight response (barge-in). Interrupted turns are
# persisted with `prompt_metadata["interrupted"] = True`.
#
# Audio converters are applied per turn after VAD commits. The raw audio drives interruption
# timing while the model responds to the converted version.
#
# > **Note:** Memory must be initialized via `initialize_pyrit_async`. See the
# > [Memory Configuration Guide](../../memory/0_memory.md).

# %% [markdown]
# ## Setup
#
# `BargeInAttack` requires a `RealtimeTarget` with `server_vad=True` (or a `ServerVadConfig`
# for custom tuning).

# %%
import asyncio
import wave
from pathlib import Path

from pyrit.executor.attack import (
AttackConverterConfig,
BargeInAttack,
BargeInAttackContext,
ConsoleAttackResultPrinter,
)
from pyrit.executor.attack.core import AttackParameters
from pyrit.memory import CentralMemory
from pyrit.prompt_converter import AudioFrequencyConverter
from pyrit.prompt_normalizer import PromptConverterConfiguration
from pyrit.prompt_target import RealtimeTarget
from pyrit.setup import IN_MEMORY, initialize_pyrit_async

await initialize_pyrit_async(memory_db_type=IN_MEMORY) # type: ignore

# %% [markdown]
# ## Shared setup
#
# Both sections use a pre-recorded 24 kHz mono PCM16 question about photosynthesis. The
# format matches what the OpenAI Realtime API expects. Any async generator yielding 24 kHz
# PCM16 bytes works as a chunk source (live mic, TTS, etc.).

# %%
CHUNK_MS = 100
CHUNK_SIZE = CHUNK_MS * 48 # PCM16 @ 24 kHz mono = 48 bytes per millisecond.
SILENCE_CHUNK = b"\x00" * CHUNK_SIZE
audio_path = Path("../../../../assets/photosynthesis_question.wav").resolve()


def _load_pcm(path: Path) -> bytes:
"""Read a WAV at 24 kHz / mono / PCM16 into raw PCM bytes."""
with wave.open(str(path), "rb") as wav:
assert wav.getframerate() == 24000 and wav.getnchannels() == 1 and wav.getsampwidth() == 2
Comment thread
hannahwestra25 marked this conversation as resolved.
return wav.readframes(wav.getnframes())


async def _yield_chunks(pcm: bytes, real_time: bool = True):
"""Yield PCM in 100ms slices, optionally pacing at real-time."""
for offset in range(0, len(pcm), CHUNK_SIZE):
yield pcm[offset : offset + CHUNK_SIZE]
if real_time:
await asyncio.sleep(CHUNK_MS / 1000)


question_pcm_24k = _load_pcm(audio_path)
print(f"Loaded question: {len(question_pcm_24k) / 48 / 1000:.2f}s @ 24 kHz")

converters = PromptConverterConfiguration.from_converters(converters=[AudioFrequencyConverter(shift_value=200)])


# %% [markdown]
# ## Section 1: Single-turn streaming with a converter
#
# Streams one user statement, applies a frequency-shift converter after VAD commits the turn,
# and gets the model's response. Exercises the full pipeline (chunk push, convert-on-commit,
# item swap, response trigger, memory persistence) without barge-in.


# %%
async def single_turn_source():
async for chunk in _yield_chunks(question_pcm_24k):
yield chunk
# Trailing silence helps server VAD recognize end-of-turn.
for _ in range(25): # 2.5s trailing silence, above the 1.5s VAD threshold
yield SILENCE_CHUNK
await asyncio.sleep(CHUNK_MS / 1000)


target = RealtimeTarget()
attack = BargeInAttack(
objective_target=target,
attack_converter_config=AttackConverterConfig(request_converters=converters),
)

context = BargeInAttackContext(
params=AttackParameters(objective="Observe a single converted user turn end-to-end"),
audio_chunks=single_turn_source(),
)

result = await attack.execute_with_context_async(context=context) # type: ignore
print(f"executed_turns: {result.executed_turns}")
await ConsoleAttackResultPrinter(width=200).write_async(result=result) # type: ignore
await target.cleanup_target_async() # type: ignore

# %% [markdown]
# ## Section 2: Barge-in (interrupting the assistant mid-response)
#
# Plays the question twice with timing arranged so turn 2's speech arrives during turn 1's
# response. Server VAD detects the new speech, cancels turn 1's response, and resolves it
# with `interrupted=True`.

# %% [markdown]
# ### Reading the barge-in output
#
# After running the next cell, if barge-in fired successfully:
# - `executed_turns: 2` (two VAD-detected user turns)
# - First assistant turn shows `[INTERRUPTED]` with a truncated transcript
# - Second assistant turn completes normally
#
# If you don't see `[INTERRUPTED]`, decrease `TURN1_RESPONSE_WAIT_S` so turn 2's audio
# arrives earlier in turn 1's response window.

# %%
TURN1_RESPONSE_WAIT_S = 0.2 # how long to let the model start speaking before barging in


async def barge_in_source():
# Turn 1: speak the question, then 1.5s of silence so VAD commits.
async for chunk in _yield_chunks(question_pcm_24k):
yield chunk
for _ in range(25): # 2.5s trailing silence
yield SILENCE_CHUNK
await asyncio.sleep(CHUNK_MS / 1000)

# Let the model get partway into its response before we interrupt.
for _ in range(int(TURN1_RESPONSE_WAIT_S * 10)):
yield SILENCE_CHUNK
await asyncio.sleep(CHUNK_MS / 1000)

# Turn 2: speak the question again. VAD's speech_started fires while turn 1's response
# is still streaming → server cancels + truncates turn 1.
async for chunk in _yield_chunks(question_pcm_24k):
yield chunk
for _ in range(25): # 2.5s trailing silence
yield SILENCE_CHUNK
await asyncio.sleep(CHUNK_MS / 1000)


target2 = RealtimeTarget()
attack2 = BargeInAttack(
objective_target=target2,
attack_converter_config=AttackConverterConfig(request_converters=converters),
)

barge_in_context = BargeInAttackContext(
params=AttackParameters(objective="Demonstrate barge-in by interrupting a benign answer"),
audio_chunks=barge_in_source(),
)

barge_in_result = await attack2.execute_with_context_async(context=barge_in_context) # type: ignore
print(f"executed_turns: {barge_in_result.executed_turns}")

# Inspect memory to verify the barge-in landed in metadata.
memory = CentralMemory.get_memory_instance()
turns = memory.get_conversation(conversation_id=barge_in_result.conversation_id)
print(f"\nPersisted pieces ({len(turns)} messages):")
for message in turns:
for piece in message.message_pieces:
interrupted = piece.prompt_metadata.get("interrupted")
marker = " [INTERRUPTED]" if interrupted else ""
val = piece.converted_value
if piece.converted_value_data_type == "audio_path":
val = Path(val).name
value_preview = (val[:80] + "...") if len(val) > 80 else val
print(f" {piece._role} {piece.converted_value_data_type}{marker}: {value_preview}")

await ConsoleAttackResultPrinter(width=200).write_async(result=barge_in_result) # type: ignore
await target2.cleanup_target_async() # type: ignore

# %% [markdown]
# ## Alternate chunk sources
#
# The chunk source is the main strategy hook:
#
# - **Pre-recorded WAV** (this notebook): most common starting point
# - **TTS converter**: generate audio from text prompts dynamically
# - **Live microphone**: use `sounddevice` or similar; yield what the mic produces
#
# For feedback-driven attacks — for example, scoring each assistant turn and choosing
# to barge in with follow-up audio only when the response shows incomplete refusal —
# subclass `BargeInAttack` and override `_perform_async` to interleave turn observation
# with chunk generation.
1 change: 1 addition & 0 deletions doc/myst.yml
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@ project:
- file: code/executor/attack/role_play_attack.ipynb
- file: code/executor/attack/skeleton_key_attack.ipynb
- file: code/executor/attack/tap_attack.ipynb
- file: code/executor/attack/barge_in_attack.ipynb
- file: code/executor/attack/violent_durian_attack.ipynb
- file: code/executor/workflow/0_workflow.md
children:
Expand Down
3 changes: 3 additions & 0 deletions pyrit/executor/attack/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@
SingleTurnAttackStrategy,
SkeletonKeyAttack,
)
from pyrit.executor.attack.streaming import BargeInAttack, BargeInAttackContext

# Backward-compatibility aliases — import from pyrit.output.attack_result directly.
# TODO: Remove these re-exports in two releases (target removal: 0.16.0).
Expand All @@ -73,6 +74,8 @@
"AttackResultPrinter",
"AttackScoringConfig",
"AttackStrategy",
"BargeInAttack",
"BargeInAttackContext",
"ChunkedRequestAttack",
"ChunkedRequestAttackContext",
"ConsoleAttackResultPrinter",
Expand Down
11 changes: 11 additions & 0 deletions pyrit/executor/attack/streaming/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Streaming attack strategies (barge-in over realtime audio targets)."""

from pyrit.executor.attack.streaming.barge_in import BargeInAttack, BargeInAttackContext

__all__ = [
"BargeInAttack",
"BargeInAttackContext",
]
Loading
Loading