Context
In DataTypeSerializer.save_formatted_audio (pyrit/models/data_type_serializer.py, around lines 195-208), the Azure-storage branch writes audio to a local temp file, uploads it, then unlinks the temp file:
if self._is_azure_storage_url(str(file_path)):
local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav")
with wave.open(str(local_temp_path), "wb") as wav_file:
...
wav_file.writeframes(data)
async with aiofiles.open(local_temp_path, "rb") as f:
audio_data = await f.read()
if self._memory.results_storage_io is None:
raise RuntimeError("self._memory.results_storage_io is not initialized")
await self._memory.results_storage_io.write_file(file_path, audio_data)
local_temp_path.unlink()
The unlink is not in a finally block, so if any of these raise the temp file leaks:
self._memory.results_storage_io is None (the explicit RuntimeError)
results_storage_io.write_file(...) (network / auth / IO error)
aiofiles.open(...) / f.read() (disk error)
There's also a sibling issue: local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav") is a fixed name, so concurrent calls collide and clobber each other's WAV before upload.
@jsong468 flagged this in #1877 (comment). It was deliberately left out of that PR to keep the os.path -> pathlib migration mechanical.
Suggested fix
Two reasonable options:
-
Minimum: wrap the upload in try/finally so the unlink always runs.
local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav")
try:
with wave.open(str(local_temp_path), "wb") as wav_file:
...
async with aiofiles.open(local_temp_path, "rb") as f:
audio_data = await f.read()
if self._memory.results_storage_io is None:
raise RuntimeError("self._memory.results_storage_io is not initialized")
await self._memory.results_storage_io.write_file(file_path, audio_data)
finally:
local_temp_path.unlink(missing_ok=True)
-
Better: use tempfile.NamedTemporaryFile(suffix=".wav", dir=DB_DATA_PATH, delete=False) so the path is unique per call, and combine with try/finally for cleanup. This also fixes the concurrent-call collision.
Acceptance criteria
save_formatted_audio does not leak temp_audio.wav in DB_DATA_PATH when the Azure upload raises.
- Concurrent calls do not clobber each other's temp file.
- A regression test exercises the failure-cleanup path (mock
results_storage_io.write_file to raise, assert temp file is gone after the call).
Out of scope
- Other temp-file lifecycles in the codebase. Filing as a single targeted fix; broader audit can be a separate issue if useful.
Context
In
DataTypeSerializer.save_formatted_audio(pyrit/models/data_type_serializer.py, around lines 195-208), the Azure-storage branch writes audio to a local temp file, uploads it, then unlinks the temp file:The unlink is not in a
finallyblock, so if any of these raise the temp file leaks:self._memory.results_storage_io is None(the explicitRuntimeError)results_storage_io.write_file(...)(network / auth / IO error)aiofiles.open(...)/f.read()(disk error)There's also a sibling issue:
local_temp_path = Path(DB_DATA_PATH, "temp_audio.wav")is a fixed name, so concurrent calls collide and clobber each other's WAV before upload.@jsong468 flagged this in #1877 (comment). It was deliberately left out of that PR to keep the
os.path->pathlibmigration mechanical.Suggested fix
Two reasonable options:
Minimum: wrap the upload in
try/finallyso the unlink always runs.Better: use
tempfile.NamedTemporaryFile(suffix=".wav", dir=DB_DATA_PATH, delete=False)so the path is unique per call, and combine withtry/finallyfor cleanup. This also fixes the concurrent-call collision.Acceptance criteria
save_formatted_audiodoes not leaktemp_audio.wavinDB_DATA_PATHwhen the Azure upload raises.results_storage_io.write_fileto raise, assert temp file is gone after the call).Out of scope