build: expose image build parallelism knobs to workflows by simonrosenberg · Pull Request #516 · OpenHands/benchmarks

simonrosenberg · 2026-03-13T20:02:17Z

Summary

add explicit build-batch-size plumbing to the shared image build helper and current build entrypoints
expose consistent workflow inputs for SWE-Bench and SWT-Bench image build batch sizing alongside max worker controls
ensure explicit workflow values win over BUILD_BATCH_SIZE environment defaults, with regression coverage
fix benchmarks/swebenchmultilingual/build_images.py so --force-build is forwarded instead of being silently ignored
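
The precedence rule in the third item (explicit value first, then `BUILD_BATCH_SIZE`, then a default) can be sketched as below. This is a minimal illustration, not the code from `benchmarks/utils/build_utils.py`: the function name `resolve_build_batch_size`, the injectable `env` mapping, and the fallback of 15 are assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback; the helper's real default is not shown in this PR.
FALLBACK_BATCH_SIZE = 15


def resolve_build_batch_size(
    explicit: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
    fallback: int = FALLBACK_BATCH_SIZE,
) -> int:
    """Pick the batch size: explicit argument > BUILD_BATCH_SIZE > fallback."""
    env = os.environ if env is None else env
    if explicit is not None:
        # A value passed by the caller (e.g. from a workflow input) always wins.
        value = explicit
    elif env.get("BUILD_BATCH_SIZE"):
        # Otherwise fall back to the environment default, if it is set and non-empty.
        value = int(env["BUILD_BATCH_SIZE"])
    else:
        value = fallback
    if value < 1:
        raise ValueError(f"build batch size must be >= 1, got {value}")
    return value
```

Using `None` as the "not provided" sentinel is what lets an explicit value be distinguished from an environment default.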

Testing

uv run pre-commit run --files .github/workflows/build-swebench-images.yml .github/workflows/build-swtbench-images.yml benchmarks/utils/build_utils.py benchmarks/commit0/build_images.py benchmarks/gaia/build_images.py benchmarks/multiswebench/build_images.py benchmarks/swebench/build_images.py benchmarks/swebenchmultilingual/build_images.py benchmarks/swebenchmultimodal/build_images.py benchmarks/swegym/build_images.py benchmarks/swesmith/build_images.py benchmarks/swtbench/build_images.py tests/test_image_utils.py
uv run pytest tests/test_image_utils.py

Split out of #507.

Add an explicit build_batch_size parameter to the shared build helper, thread it through the current image build entrypoints, and surface the corresponding workflow inputs for SWE-Bench and SWT-Bench. This lets workflow-dispatched max worker and batch-size values reach the build logic without being overridden by environment defaults.

Co-authored-by: openhands <openhands@all-hands.dev>
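
For readers unfamiliar with the two knobs, here is one way they could interact. The sketch assumes that the batch size controls how many images are submitted at a time and that max workers caps concurrency within each batch; the PR text does not spell out these semantics. `chunked`, `build_all`, and `build_one` are hypothetical names, and a thread pool stands in for whatever executor the real helper uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_all(
    images: Sequence[str],
    build_one: Callable[[str], str],
    max_workers: int,
    build_batch_size: int,
) -> List[str]:
    """Build images batch by batch, with bounded concurrency inside each batch."""
    results: List[str] = []
    for batch in chunked(images, build_batch_size):
        # A fresh pool per batch means the next batch starts only after
        # every build in the current batch has finished.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results.extend(pool.map(build_one, batch))
    return results
```

Under these assumptions, a smaller batch size limits how much work is in flight at once, while max workers limits how many builds run simultaneously.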

all-hands-bot

Taste Rating: 🟡 Acceptable - Solves a real operational problem with clean parameter threading, but has minor consistency issues.

Linus-Style Analysis:

This is good pragmatic work. You're exposing real knobs that matter for actual production builds, and the parameter threading is straightforward. The test proves the precedence logic works. Worth merging.

Minor warts:

Workflow parameter naming is inconsistent (see inline comments)
One unrelated bug fix snuck in that should be called out

See inline comments for details.

.github/workflows/build-swtbench-images.yml

benchmarks/swebenchmultilingual/build_images.py

tests/test_image_utils.py

Rename the SWT image-build batch-size workflow input to match the SWE-Bench workflow and move the eval-env-specific knob to eval-env-build-batch-size for clarity.

Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2026-03-13T20:33:53Z

Addressed the review feedback:

normalized the SWT workflow input naming so the agent-image knob is now build-batch-size, matching SWE-Bench
renamed the eval-env-specific SWT knob to eval-env-build-batch-size to keep the two concepts distinct
updated the PR description to explicitly call out the pre-existing swebenchmultilingual --force-build forwarding bug that this branch also fixes
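
The `--force-build` forwarding bug mentioned in the last item follows a common pattern: a CLI flag is parsed but never passed to the function that does the work. The sketch below illustrates that pattern and its fix. It is not the actual content of `benchmarks/swebenchmultilingual/build_images.py`; `build_images` and `main` are hypothetical stand-ins.

```python
import argparse
from typing import Dict, List, Optional, Sequence


def build_images(images: Sequence[str], force_build: bool = False) -> Dict[str, object]:
    """Hypothetical stand-in for the shared build helper; it only records its inputs."""
    return {"images": list(images), "force_build": force_build}


def main(argv: Optional[List[str]] = None) -> Dict[str, object]:
    parser = argparse.ArgumentParser(description="Illustrative build entrypoint")
    parser.add_argument("--force-build", action="store_true")
    parser.add_argument("images", nargs="*")
    args = parser.parse_args(argv)
    # Bug pattern: `build_images(args.images)` would accept the flag on the
    # command line and then silently drop it. The fix is to forward it.
    return build_images(args.images, force_build=args.force_build)
```

A check as small as `main(["--force-build", "img"])["force_build"] is True` is enough to catch a regression of this kind.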

simonrosenberg · 2026-03-13T20:49:23Z

Merged current origin/main into this branch and resolved the conflicts by keeping both sets of changes:

retained the SDK sdist-cache support from #515 in benchmarks/utils/build_utils.py, tests/test_image_utils.py, uv.lock, and the vendored SDK pointer
kept the explicit workflow/build-script parallelism plumbing from #516, including the normalized SWT workflow input names

Validation after conflict resolution:

uv run pre-commit run --files .github/workflows/build-swebench-images.yml .github/workflows/build-swtbench-images.yml benchmarks/utils/build_utils.py benchmarks/commit0/build_images.py benchmarks/gaia/build_images.py benchmarks/multiswebench/build_images.py benchmarks/swebench/build_images.py benchmarks/swebenchmultilingual/build_images.py benchmarks/swebenchmultimodal/build_images.py benchmarks/swegym/build_images.py benchmarks/swesmith/build_images.py benchmarks/swtbench/build_images.py tests/test_image_utils.py
uv run pytest tests/test_image_utils.py

Keep the newly exposed workflow knobs, but reset their default values to match the original main-branch behavior for SWE-Bench and SWT-Bench image builds.

Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2026-03-13T20:50:54Z

Restored the workflow defaults to match the original main behavior while keeping the new knobs exposed:

SWE-Bench: max-workers=12, build-batch-size=15
SWT-Bench agent image builds: max-workers=4, build-batch-size=15
SWT eval-env builds remain on their separate eval-env-build-batch-size default of 10

The PR still changes propagation, but no longer changes the default runtime behavior.
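
To make the "propagation changes, defaults do not" point concrete, the sketch below records the defaults listed above and merges dispatch-time overrides on top of them. The input names and values come from this comment; the dictionary keys `swebench` and `swtbench` and the function `effective_inputs` are illustrative assumptions, not part of the workflows.

```python
from typing import Dict, Optional

# Defaults as stated in this comment; the top-level keys are hypothetical labels.
WORKFLOW_DEFAULTS: Dict[str, Dict[str, int]] = {
    "swebench": {"max-workers": 12, "build-batch-size": 15},
    "swtbench": {
        "max-workers": 4,
        "build-batch-size": 15,
        "eval-env-build-batch-size": 10,
    },
}


def effective_inputs(
    workflow: str, overrides: Optional[Dict[str, int]] = None
) -> Dict[str, int]:
    """Return the defaults for a workflow with any explicit overrides applied."""
    merged = dict(WORKFLOW_DEFAULTS[workflow])
    for name, value in (overrides or {}).items():
        if name not in merged:
            raise KeyError(f"unknown input {name!r} for workflow {workflow!r}")
        merged[name] = value
    return merged
```

With no overrides the result equals the pre-existing defaults, and an override changes only the input it names.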

all-hands-bot

🟢 Good taste - Clean parameter threading that solves a real operational problem (CI build parallelism control). Straightforward precedence logic (explicit arg > env var) with test coverage. The force_build fix is a good catch. No fundamental issues found.

Verdict: ✅ Ship it.

Key insight: This is how you add operational controls without over-engineering - simple parameter threading with clear precedence.

juanmichelini

LGTM!

all-hands-bot reviewed Mar 13, 2026

build: normalize SWT workflow batch-size input

89e3a3e

Merge origin/main into fix/workflow-build-parallelism-split

e67b531

build: restore workflow default parallelism values

03bd11d

simonrosenberg requested a review from all-hands-bot March 13, 2026 20:51

all-hands-bot reviewed Mar 13, 2026

juanmichelini approved these changes Mar 13, 2026

simonrosenberg merged commit 3082f3b into main Mar 13, 2026
3 checks passed

simonrosenberg merged 4 commits into main from fix/workflow-build-parallelism-split
