Avoid shell invocation in subset downloader #257

Open
resolvicomai wants to merge 2 commits into PrimeIntellect-ai:main from resolvicomai:codex/safe-subset-download

resolvicomai commented May 21, 2026

Summary

  • build dataset wget invocations as argument lists instead of shell strings
  • build the shared zeroband.utils.wget() invocation as an argument list too
  • omit the Authorization header when no Hugging Face token is available
  • add focused regression tests for both subprocess invocation paths
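
The first three points can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `build_wget_args` and `download`, the choice of the `-c` and `-O` flags, and the `Bearer` header format are assumptions made for the example.

```python
import subprocess
from typing import List, Optional


def build_wget_args(url: str, output_path: str, token: Optional[str] = None) -> List[str]:
    """Build a wget command as an argument list (hypothetical helper)."""
    # Each element reaches wget as exactly one argument; no shell parses it.
    args = ["wget", "-c", "-O", output_path]
    if token:
        # Only send the Authorization header when a token is actually available.
        args.append(f"--header=Authorization: Bearer {token}")
    args.append(url)
    return args


def download(url: str, output_path: str, token: Optional[str] = None) -> None:
    """Run wget without shell=True (hypothetical helper)."""
    subprocess.run(build_wget_args(url, output_path, token), check=True)
```

Because the header is appended conditionally, an unauthenticated download never sends an empty or malformed `Authorization` value.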

Why

The download paths currently join command strings and pass shell=True. Dataset file paths, output paths, and utility inputs can be data-derived, so building the commands as argument lists keeps shell interpretation out of these paths and avoids accidental command parsing while preserving the same wget download flow.
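
To illustrate the point about data-derived values, here is a small sketch that does not use wget at all. It passes a path containing a space and shell metacharacters to a child Python process as a list element and reads back what the child received. The `echo_argv` helper and the sample path are hypothetical and exist only for this demonstration.

```python
import json
import subprocess
import sys


def echo_argv(args):
    """Run a child Python process that reports the arguments it received."""
    child = [sys.executable, "-c", "import json, sys; print(json.dumps(sys.argv[1:]))"]
    # No shell=True: the list is handed to the child process as-is.
    result = subprocess.run(child + list(args), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# A data-derived path with a space and a shell metacharacter.
hostile_path = "shard 01.parquet; echo injected"
received = echo_argv(["--output", hostile_path])
```

With an argument list, `received` holds the path as a single unmodified element. Had the same values been joined into a string and run with `shell=True`, the shell would have been free to split on the space and treat `;` as a command separator.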

Verification

  • uv run --no-project --with pytest pytest -q tests/test_subset_data_download.py tests/test_wget_util.py
  • python3 -m py_compile src/zeroband/utils/wget.py tests/test_subset_data_download.py tests/test_wget_util.py
  • git diff --check
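
For readers unfamiliar with this style of check, a regression test for a subprocess invocation path might look like the sketch below. It is not the content of the test files named above: the `fetch` helper is a hypothetical stand-in for the code under test, and `subprocess.run` is patched with `unittest.mock` so that no network access or wget binary is needed.

```python
import subprocess
from unittest import mock


def fetch(url, output_path, token=None):
    """Hypothetical stand-in for the function under test."""
    args = ["wget", "-O", output_path]
    if token:
        args.append(f"--header=Authorization: Bearer {token}")
    args.append(url)
    subprocess.run(args, check=True)


def test_fetch_uses_argument_list_without_shell():
    # Replace subprocess.run so the call is recorded instead of executed.
    with mock.patch("subprocess.run") as run:
        fetch("https://example.invalid/a.bin", "a.bin")
    (args,), kwargs = run.call_args
    assert isinstance(args, list)  # a list, not a joined shell string
    assert kwargs.get("shell", False) is False  # shell=True is never passed
    assert not any("Authorization" in a for a in args)  # no token, no header
```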
