Skip to content

Bug: save_formatted_audio leaks temp_audio.wav on Azure upload failure #1894

@romanlutz

Description

@romanlutz

Context

In DataTypeSerializer.save_formatted_audio (pyrit/models/data_type_serializer.py, around lines 195-208), the Azure-storage branch writes audio to a local temp file, uploads it, then unlinks the temp file:

if self._is_azure_storage_url(str(file_path)):
    local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav")
    with wave.open(str(local_temp_path), "wb") as wav_file:
        ...
        wav_file.writeframes(data)

    async with aiofiles.open(local_temp_path, "rb") as f:
        audio_data = await f.read()
        if self._memory.results_storage_io is None:
            raise RuntimeError("self._memory.results_storage_io is not initialized")
        await self._memory.results_storage_io.write_file(file_path, audio_data)
    local_temp_path.unlink()

The unlink is not in a finally block, so if any of these raise the temp file leaks:

  • self._memory.results_storage_io is None (the explicit RuntimeError)
  • results_storage_io.write_file(...) (network / auth / IO error)
  • aiofiles.open(...) / f.read() (disk error)

There's also a sibling issue: local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav") is a fixed name, so concurrent calls collide and clobber each other's WAV before upload.

@jsong468 flagged this in #1877 (comment). It was deliberately left out of that PR to keep the os.path -> pathlib migration mechanical.

Suggested fix

Two reasonable options:

  1. Minimum: wrap the upload in try/finally so the unlink always runs.

    local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav")
    try:
        with wave.open(str(local_temp_path), "wb") as wav_file:
            ...
        async with aiofiles.open(local_temp_path, "rb") as f:
            audio_data = await f.read()
        if self._memory.results_storage_io is None:
            raise RuntimeError("self._memory.results_storage_io is not initialized")
        await self._memory.results_storage_io.write_file(file_path, audio_data)
    finally:
        local_temp_path.unlink(missing_ok=True)
  2. Better: use tempfile.NamedTemporaryFile(suffix=".wav", dir=DB_DATA_PATH, delete=False) so the path is unique per call, and combine with try/finally for cleanup. This also fixes the concurrent-call collision.

Acceptance criteria

  • save_formatted_audio does not leak temp_audio.wav in DB_DATA_PATH when the Azure upload raises.
  • Concurrent calls do not clobber each other's temp file.
  • A regression test exercises the failure-cleanup path (mock results_storage_io.write_file to raise, assert temp file is gone after the call).

Out of scope

  • Other temp-file lifecycles in the codebase. Filing as a single targeted fix; broader audit can be a separate issue if useful.

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions