Add PAI-Bench-C reproduction guide for Cosmos3 #224
Open
trungtpham wants to merge 1 commit into
80a3c53 to 4981e86
qianlim
reviewed
Jun 22, 2026
"_eval_venv_torchrun = PAIBENCH_EVAL_ROOT / \".venv\" / \"bin\" / \"torchrun\"\n",
"if not _eval_venv_torchrun.exists():\n",
"    print(f\"Setting up physical-ai-bench venv at {PAIBENCH_EVAL_ROOT} ...\")\n",
"    !cd {PAIBENCH_EVAL_ROOT} && uv sync\n",
Collaborator
This uv sync inherits UV_PROJECT_ENVIRONMENT from the earlier Cosmos setup cell, where it was set to COSMOS3_UV_ENV. As a result, the physical-ai-bench deps are synced into the Cosmos venv instead of PAIBENCH_EVAL_ROOT/.venv, but the next cell runs .venv/bin/torchrun relative to PAIBENCH_EVAL_ROOT. Could that cause the torchrun lookup to fail?
Contributor
Author
Fixed — UV_PROJECT_ENVIRONMENT is now explicitly stripped from the environment passed to all uv sync / uv pip install calls for physical-ai-bench (line 4803), so the deps land in PAIBENCH_EVAL_ROOT/.venv as expected. Confirmed working end-to-end.
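For readers following the thread, here is a minimal sketch of the kind of fix described above, assuming the notebook shells out to uv from Python. The helper names `env_without_uv_project_environment` and `sync_eval_venv` are hypothetical and are not taken from the PR; only `UV_PROJECT_ENVIRONMENT` and `uv sync` come from the discussion. With the variable removed, uv falls back to its default of a `.venv` directory in the project root.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, Mapping, Optional


def env_without_uv_project_environment(
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with UV_PROJECT_ENVIRONMENT removed.

    Hypothetical helper: the PR's actual implementation may differ.
    """
    env = dict(os.environ if base is None else base)
    # Drop the override so uv uses the project's own .venv again.
    env.pop("UV_PROJECT_ENVIRONMENT", None)
    return env


def sync_eval_venv(project_root: Path) -> None:
    """Run `uv sync` inside project_root with the cleaned environment."""
    subprocess.run(
        ["uv", "sync"],
        cwd=project_root,
        env=env_without_uv_project_environment(),
        check=True,  # raise if uv exits with a non-zero status
    )
```

Calling `sync_eval_venv(PAIBENCH_EVAL_ROOT)` in place of the `!cd ... && uv sync` line would keep the Cosmos setup cell's environment untouched, since only the copy passed to the subprocess is modified.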
4981e86 to a76b22f
- run_paibench_c.sh: self-contained end-to-end script that downloads the model checkpoint and PAI-Bench-C dataset, runs generation across all four control modalities (edge, blur, depth, seg) with the canonical sampling parameters, and evaluates using the public physical-ai-bench library (trungtpham/pai-bench-c-cosmos3).
- run_with_cosmos_framework.ipynb: equivalent interactive notebook.
- README.md: setup and usage instructions, reference scores, and notes on the evaluation methodology (GT depth/seg recomputed on the fly to match the internal imaginaire4 evaluation pipeline).
- prompts.json: canonical opus JSON prompts for all 600 tasks.
be06e2d to b74fa49
Adds an end-to-end recipe for reproducing PAI-Bench Conditional Generation (PAI-Bench-C) results with Cosmos3 using the native Cosmos Framework PyTorch entrypoint.
New files under evaluation/cosmos3/generator/paibench_c/:
Also updates the top-level README.md Evaluation section to include the PAI-Bench-C entry alongside PhysicsIQ, PAI-Bench-G, and RBench.