build: expose image build parallelism knobs to workflows #516

Merged: simonrosenberg merged 4 commits into main from fix/workflow-build-parallelism-split on Mar 13, 2026

@simonrosenberg (Collaborator) commented Mar 13, 2026

Summary

  • add explicit build-batch-size plumbing to the shared image build helper and current build entrypoints
  • expose consistent workflow inputs for SWE-Bench and SWT-Bench image build batch sizing alongside max worker controls
  • ensure explicit workflow values win over BUILD_BATCH_SIZE environment defaults, with regression coverage
  • fix benchmarks/swebenchmultilingual/build_images.py so --force-build is forwarded instead of being silently ignored
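
The precedence rule in the third bullet can be sketched as follows. This is a minimal illustration, not the code in benchmarks/utils/build_utils.py: the function name `resolve_build_batch_size` and the fallback value are assumptions made for this example. Only the `BUILD_BATCH_SIZE` variable name and the rule that explicit values win come from the PR.

```python
import os
from typing import Optional

# Hypothetical fallback for this sketch; the real helper's default may differ.
FALLBACK_BATCH_SIZE = 15


def resolve_build_batch_size(build_batch_size: Optional[int] = None) -> int:
    """Return the batch size, preferring an explicit argument over the environment."""
    if build_batch_size is not None:
        # An explicit value (for example, from a workflow input) always wins.
        return build_batch_size
    env_value = os.environ.get("BUILD_BATCH_SIZE")
    if env_value:
        # Fall back to the environment default only when nothing was passed.
        return int(env_value)
    return FALLBACK_BATCH_SIZE
```

Using `None` as the "not provided" sentinel keeps the explicit path distinguishable from the environment path, which is what a regression test for this precedence would need to exercise.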

Testing

  • uv run pre-commit run --files .github/workflows/build-swebench-images.yml .github/workflows/build-swtbench-images.yml benchmarks/utils/build_utils.py benchmarks/commit0/build_images.py benchmarks/gaia/build_images.py benchmarks/multiswebench/build_images.py benchmarks/swebench/build_images.py benchmarks/swebenchmultilingual/build_images.py benchmarks/swebenchmultimodal/build_images.py benchmarks/swegym/build_images.py benchmarks/swesmith/build_images.py benchmarks/swtbench/build_images.py tests/test_image_utils.py
  • uv run pytest tests/test_image_utils.py

Split out of #507.

Add an explicit build_batch_size parameter to the shared build helper, thread it through the current image build entrypoints, and surface the corresponding workflow inputs for SWE-Bench and SWT-Bench. This lets workflow-dispatched max worker and batch-size values reach the build logic without being overridden by environment defaults.

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot (Collaborator) left a comment

Taste Rating: 🟡 Acceptable - Solves a real operational problem with clean parameter threading, but has minor consistency issues.

Linus-Style Analysis:

This is good pragmatic work. You're exposing real knobs that matter for actual production builds, and the parameter threading is straightforward. The test proves the precedence logic works. Worth merging.

Minor warts:

  • Workflow parameter naming is inconsistent (see inline comments)
  • One unrelated bug fix snuck in that should be called out

See inline comments for details.

Rename the SWT image-build batch-size workflow input to match the SWE-Bench workflow and move the eval-env-specific knob to eval-env-build-batch-size for clarity.

Co-authored-by: openhands <openhands@all-hands.dev>
@simonrosenberg (Collaborator, Author) commented

Addressed the review feedback:

  • normalized the SWT workflow input naming so the agent-image knob is now build-batch-size, matching SWE-Bench
  • renamed the eval-env-specific SWT knob to eval-env-build-batch-size to keep the two concepts distinct
  • updated the PR description to explicitly call out the pre-existing swebenchmultilingual --force-build forwarding bug that this branch also fixes
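
For readers unfamiliar with this class of bug, here is a rough sketch of what forwarding a parsed flag looks like. It is not the actual swebenchmultilingual script: `build_all_images` is a hypothetical stand-in for the shared helper, and the image name is made up. Only the `--force-build` flag name comes from the PR.

```python
import argparse


def build_all_images(image_names, force_build=False):
    """Hypothetical stand-in for the shared build helper."""
    return {"images": list(image_names), "force_build": force_build}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--force-build", action="store_true")
    args = parser.parse_args(argv)
    # The bug class: the flag is parsed but the helper is called without it,
    # so the helper silently keeps its default of False. Forwarding it
    # explicitly, as below, is the fix.
    return build_all_images(["image-a"], force_build=args.force_build)
```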

@simonrosenberg (Collaborator, Author) commented

Merged current origin/main into this branch and resolved the conflicts by keeping both sets of changes:

  • retained the SDK sdist-cache support from #515 in benchmarks/utils/build_utils.py, tests/test_image_utils.py, uv.lock, and the vendored SDK pointer
  • kept the explicit workflow/build-script parallelism plumbing from #516, including the normalized SWT workflow input names

Validation after conflict resolution:

  • uv run pre-commit run --files .github/workflows/build-swebench-images.yml .github/workflows/build-swtbench-images.yml benchmarks/utils/build_utils.py benchmarks/commit0/build_images.py benchmarks/gaia/build_images.py benchmarks/multiswebench/build_images.py benchmarks/swebench/build_images.py benchmarks/swebenchmultilingual/build_images.py benchmarks/swebenchmultimodal/build_images.py benchmarks/swegym/build_images.py benchmarks/swesmith/build_images.py benchmarks/swtbench/build_images.py tests/test_image_utils.py
  • uv run pytest tests/test_image_utils.py

Keep the newly exposed workflow knobs, but reset their default values to match the original main-branch behavior for SWE-Bench and SWT-Bench image builds.

Co-authored-by: openhands <openhands@all-hands.dev>
@simonrosenberg (Collaborator, Author) commented

Restored the workflow defaults to match the original main behavior while keeping the new knobs exposed:

  • SWE-Bench: max-workers=12, build-batch-size=15
  • SWT-Bench agent image builds: max-workers=4, build-batch-size=15
  • SWT eval-env builds remain on their separate eval-env-build-batch-size default of 10

The PR still changes how workflow values propagate to the build logic, but it no longer changes the default runtime behavior.
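
As a rough illustration of how the two knobs could interact, the sketch below processes images in sequential batches of `build_batch_size`, with up to `max_workers` concurrent builds inside each batch. This is an assumption about the semantics, not a description of the shared helper: `build_in_batches` and `build_one` are hypothetical names, and the real implementation may schedule work differently. The default values mirror the SWE-Bench defaults listed above.

```python
from concurrent.futures import ThreadPoolExecutor


def build_in_batches(images, build_one, max_workers=12, build_batch_size=15):
    """Build images batch by batch, with at most max_workers concurrent builds per batch."""
    results = []
    for start in range(0, len(images), build_batch_size):
        batch = images[start:start + build_batch_size]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves input order within the batch.
            results.extend(pool.map(build_one, batch))
    return results
```

Under this reading, the batch size bounds how much work is in flight at once, while the worker count bounds the concurrency within that batch.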

@all-hands-bot (Collaborator) left a comment

🟢 Good taste - Clean parameter threading that solves a real operational problem (CI build parallelism control). Straightforward precedence logic (explicit arg > env var) with test coverage. The force_build fix is a good catch. No fundamental issues found.

Verdict: ✅ Ship it.

Key insight: This is how you add operational controls without over-engineering - simple parameter threading with clear precedence.

@juanmichelini (Collaborator) left a comment

LGTM!

@simonrosenberg merged commit 3082f3b into main on Mar 13, 2026
3 checks passed