Add PAI-Bench-C reproduction guide for Cosmos3 by trungtpham · Pull Request #224 · NVIDIA/cosmos

trungtpham · 2026-06-22T04:32:40Z

Adds an end-to-end recipe for reproducing PAI-Bench Conditional Generation (PAI-Bench-C) results with Cosmos3 using the native Cosmos Framework PyTorch entrypoint.

New files under evaluation/cosmos3/generator/paibench_c/:

- README.md: reference scores table, sampling settings, dataset layout, and step-by-step generation and evaluation commands.
- assets/prompts.json: 600 task entries, each with the fully upsampled opus JSON caption used in the internal evaluation run, control-signal paths, and a shared negative prompt. All 600 tasks use the exact prompts from the published benchmark.
- run_with_cosmos_framework.ipynb: self-contained notebook covering demo mode (1–N tasks, single modality) and optional full sweeps across all four modalities (edge, blur, depth, seg). Includes a demo evaluation step that runs compute_metrics.py on the generated outputs and prints the metrics JSON.
- .gitignore: excludes runtime artifacts (outputs/, .cache/, the dataset clone, executed notebooks).
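For orientation, here is a minimal sketch of how demo mode could load assets/prompts.json and pick the first N tasks for one modality. The schema is an assumption, not taken from the PR: it treats the file as a top-level JSON list of task objects with a hypothetical `control_paths` key mapping modality name to control-signal path. The real field names may differ.

```python
import json
from pathlib import Path

# Modality names as listed in the PR description.
MODALITIES = ("edge", "blur", "depth", "seg")


def load_tasks(path: Path) -> list:
    """Load the task list, assuming a top-level JSON array (hypothetical schema)."""
    with path.open(encoding="utf-8") as f:
        tasks = json.load(f)
    if not isinstance(tasks, list):
        raise ValueError(f"expected a JSON list of tasks in {path}")
    return tasks


def select_demo_tasks(tasks: list, modality: str, n: int) -> list:
    """Return up to n tasks that carry a control signal for `modality`.

    `control_paths` is a hypothetical key (modality -> control video path).
    """
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality {modality!r}; expected one of {MODALITIES}")
    if n <= 0:
        return []
    selected = []
    for task in tasks:
        if modality in task.get("control_paths", {}):
            selected.append(task)
            if len(selected) == n:
                break
    return selected
```

Usage, under the same assumptions: `demo = select_demo_tasks(load_tasks(Path("assets/prompts.json")), "edge", 2)`. Stopping at the first N matches keeps a demo run cheap while still exercising the same prompts a full sweep would use.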

Also updates the top-level README.md Evaluation section to include the PAI-Bench-C entry alongside PhysicsIQ, PAI-Bench-G, and RBench.

qianlim · 2026-06-22T20:18:16Z

+        "_eval_venv_torchrun = PAIBENCH_EVAL_ROOT / \".venv\" / \"bin\" / \"torchrun\"\n",
+        "if not _eval_venv_torchrun.exists():\n",
+        "    print(f\"Setting up physical-ai-bench venv at {PAIBENCH_EVAL_ROOT} ...\")\n",
+        "    !cd {PAIBENCH_EVAL_ROOT} && uv sync\n",


This uv sync inherits UV_PROJECT_ENVIRONMENT from the earlier Cosmos setup cell, where it was set to COSMOS3_UV_ENV. As a result, the physical-ai-bench deps are synced into the Cosmos venv instead of PAIBENCH_EVAL_ROOT/.venv, but the next cell runs .venv/bin/torchrun relative to PAIBENCH_EVAL_ROOT. Could that cause an issue?

Fixed: UV_PROJECT_ENVIRONMENT is now explicitly stripped from the environment passed to all uv sync / uv pip install calls for physical-ai-bench (line 4803), so the deps land in PAIBENCH_EVAL_ROOT/.venv as expected. Confirmed working end-to-end.
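The fix can be sketched roughly as follows. This is an illustration of the approach described in the reply, not the notebook's actual code; `eval_env` and `ensure_eval_venv` are hypothetical helper names. The point is that uv honors UV_PROJECT_ENVIRONMENT as the project environment path, so dropping it from the child process environment makes `uv sync` fall back to the project-local `.venv`.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional


def eval_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment without UV_PROJECT_ENVIRONMENT.

    With the variable unset, `uv sync` targets the project's own .venv
    instead of the Cosmos venv exported by an earlier cell.
    """
    env = dict(os.environ if base is None else base)
    env.pop("UV_PROJECT_ENVIRONMENT", None)
    return env


def ensure_eval_venv(paibench_eval_root: Path) -> Path:
    """Sync physical-ai-bench into <root>/.venv if torchrun is missing; return its path."""
    torchrun = paibench_eval_root / ".venv" / "bin" / "torchrun"
    if not torchrun.exists():
        print(f"Setting up physical-ai-bench venv at {paibench_eval_root} ...")
        # cwd selects the physical-ai-bench project; env drops the inherited override.
        subprocess.run(["uv", "sync"], cwd=paibench_eval_root, env=eval_env(), check=True)
    return torchrun
```

Any `uv pip install` call for physical-ai-bench would need the same `env=eval_env()`; otherwise it would write into the Cosmos venv again, and the torchrun path returned above would not exist.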

- run_paibench_c.sh: self-contained end-to-end script that downloads the model checkpoint and the PAI-Bench-C dataset, runs generation across all four control modalities (edge, blur, depth, seg) with the canonical sampling parameters, and evaluates using the public physical-ai-bench library (trungtpham/pai-bench-c-cosmos3).
- run_with_cosmos_framework.ipynb: equivalent interactive notebook.
- README.md: setup and usage instructions, reference scores, and notes on the evaluation methodology (GT depth/seg recomputed on the fly to match the internal imaginaire4 evaluation pipeline).
- prompts.json: canonical opus JSON prompts for all 600 tasks.

trungtpham force-pushed the features/paibenchc-reproduce branch 14 times, most recently from 80a3c53 to 4981e86, June 22, 2026 18:20

qmiao-hub requested a review from yaoxu-crypto June 22, 2026 18:44

qianlim reviewed Jun 22, 2026


trungtpham force-pushed the features/paibenchc-reproduce branch from 4981e86 to a76b22f, June 23, 2026 00:06

trungtpham force-pushed the features/paibenchc-reproduce branch from be06e2d to b74fa49, June 23, 2026 14:25

trungtpham wants to merge 1 commit into NVIDIA:main from trungtpham:features/paibenchc-reproduce
