
Add PAI-Bench-C reproduction guide for Cosmos3 #224

Open: trungtpham wants to merge 1 commit into NVIDIA:main from trungtpham:features/paibenchc-reproduce



@trungtpham (Contributor) commented Jun 22, 2026

Adds an end-to-end recipe for reproducing PAI-Bench Conditional Generation (PAI-Bench-C) results with Cosmos3 using the native Cosmos Framework PyTorch entrypoint.

New files under evaluation/cosmos3/generator/paibench_c/:

  • README.md: reference scores table, sampling settings, dataset layout, step-by-step generation and evaluation commands.
  • assets/prompts.json: 600 task entries, each with the fully-upsampled opus JSON caption used in the internal evaluation run, control-signal paths, and a shared negative prompt. All 600 tasks use the exact prompts from the published benchmark.
  • run_with_cosmos_framework.ipynb: self-contained notebook covering demo mode (1–N tasks, single modality) and optional full sweeps across all four modalities (edge, blur, depth, seg). Includes a demo evaluation step that runs compute_metrics.py on generated outputs and prints the metrics JSON.
  • .gitignore: excludes runtime artifacts (outputs/, .cache/, dataset clone, executed notebooks).

Also updates the top-level README.md Evaluation section to include the PAI-Bench-C entry alongside PhysicsIQ, PAI-Bench-G, and RBench.
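The demo and full-sweep modes described for the notebook can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `build_run_plan`, its parameters, and the task-ID format are hypothetical, and pairing every selected task with every requested modality is an assumption about how the sweep is organized. Only the four modality names come from the description above.

```python
from itertools import product

# The four control modalities named in the PR description.
MODALITIES = ("edge", "blur", "depth", "seg")


def build_run_plan(task_ids, modality=None, limit=None):
    """Return a list of (task_id, modality) pairs to generate.

    modality=None -> sweep all four modalities (full sweep).
    modality="edge" etc. -> single-modality run (demo mode).
    limit=N -> keep only the first N tasks; None keeps all of them.
    Hypothetical helper: names and behavior are illustrative only.
    """
    if modality is not None and modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    selected = list(task_ids)[:limit]
    modalities = MODALITIES if modality is None else (modality,)
    # Iterate modality-major so each modality's outputs are produced together.
    return [(task, mod) for mod, task in product(modalities, selected)]


# Hypothetical task IDs; the real IDs live in assets/prompts.json.
task_ids = [f"task_{i:03d}" for i in range(600)]
demo_plan = build_run_plan(task_ids, modality="edge", limit=2)
full_plan = build_run_plan(task_ids)
```

Keeping the plan as plain pairs makes it easy to inspect or truncate a run before any GPU time is spent.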

@trungtpham force-pushed the features/paibenchc-reproduce branch 14 times, most recently from 80a3c53 to 4981e86 (June 22, 2026 18:20)
@qmiao-hub requested a review from yaoxu-crypto (June 22, 2026 18:44)
Comment thread evaluation/cosmos3/generator/paibench_c/README.md Outdated
"_eval_venv_torchrun = PAIBENCH_EVAL_ROOT / \".venv\" / \"bin\" / \"torchrun\"\n",
"if not _eval_venv_torchrun.exists():\n",
" print(f\"Setting up physical-ai-bench venv at {PAIBENCH_EVAL_ROOT} ...\")\n",
" !cd {PAIBENCH_EVAL_ROOT} && uv sync\n",

Collaborator


This uv sync inherits UV_PROJECT_ENVIRONMENT from the earlier Cosmos setup cell, where it was set to COSMOS3_UV_ENV. As a result, the physical-ai-bench deps are synced into the Cosmos venv instead of PAIBENCH_EVAL_ROOT/.venv, but the next cell runs .venv/bin/torchrun relative to PAIBENCH_EVAL_ROOT. Could that cause an issue?
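To make the concern concrete, here is a small sketch of where `uv sync` would place a project's environment. It mirrors uv's documented handling of `UV_PROJECT_ENVIRONMENT` (an absolute path is used as-is, a relative path is resolved against the project root) rather than calling uv itself. The helper name `uv_sync_target` and the example paths are hypothetical, and treating an empty value as unset is an assumption.

```python
import os
from pathlib import Path


def uv_sync_target(project_root, environ=None):
    """Predict the directory `uv sync` would install into for project_root.

    Hypothetical helper that approximates uv's documented behavior;
    it does not invoke uv.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("UV_PROJECT_ENVIRONMENT")
    if not override:
        # Assumption: an unset (or empty) variable means the default .venv.
        return Path(project_root) / ".venv"
    override_path = Path(override)
    if override_path.is_absolute():
        return override_path
    return Path(project_root) / override_path


# Hypothetical paths standing in for PAIBENCH_EVAL_ROOT and COSMOS3_UV_ENV.
eval_root = Path("/work/physical-ai-bench")
inherited = {"UV_PROJECT_ENVIRONMENT": "/envs/cosmos3"}

default_target = uv_sync_target(eval_root, {})
inherited_target = uv_sync_target(eval_root, inherited)
```

With the variable inherited, the predicted target is the Cosmos environment, so `PAIBENCH_EVAL_ROOT/.venv/bin/torchrun` would not be created by that sync.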

Contributor Author


Fixed — UV_PROJECT_ENVIRONMENT is now explicitly stripped from the environment passed to all uv sync / uv pip install calls for physical-ai-bench (line 4803), so the deps land in PAIBENCH_EVAL_ROOT/.venv as expected. Confirmed working end-to-end.
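One way such a fix could look is sketched below, assuming the sync is launched from Python rather than through a notebook shell escape. The helpers `env_without` and `sync_eval_venv` are hypothetical and not taken from the notebook. `uv sync` and `UV_PROJECT_ENVIRONMENT` are real, and the `.venv/bin/torchrun` path follows the snippet under review.

```python
import os
import subprocess
from pathlib import Path


def env_without(names, base=None):
    """Return a copy of the environment with the given variables removed."""
    env = dict(os.environ if base is None else base)
    for name in names:
        env.pop(name, None)  # ignore variables that are not set
    return env


def sync_eval_venv(eval_root):
    """Run `uv sync` in eval_root so deps land in eval_root/.venv.

    Hypothetical helper: strips UV_PROJECT_ENVIRONMENT so an earlier
    setup cell cannot redirect the install, then checks for torchrun.
    """
    eval_root = Path(eval_root)
    subprocess.run(
        ["uv", "sync"],
        cwd=eval_root,
        env=env_without(["UV_PROJECT_ENVIRONMENT"]),
        check=True,  # raise if uv exits with a non-zero status
    )
    torchrun = eval_root / ".venv" / "bin" / "torchrun"
    if not torchrun.exists():
        raise FileNotFoundError(f"expected torchrun at {torchrun}")
    return torchrun
```

Passing a filtered copy of the environment leaves the notebook's own `os.environ` untouched, so later Cosmos cells still see `UV_PROJECT_ENVIRONMENT`.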

Comment thread evaluation/cosmos3/generator/paibench_c/README.md Outdated
Comment thread evaluation/cosmos3/generator/paibench_c/README.md Outdated
@trungtpham force-pushed the features/paibenchc-reproduce branch from 4981e86 to a76b22f (June 23, 2026 00:06)
- run_paibench_c.sh: self-contained end-to-end script that downloads
  the model checkpoint and PAI-Bench-C dataset, runs generation across
  all four control modalities (edge, blur, depth, seg) with the
  canonical sampling parameters, and evaluates using the public
  physical-ai-bench library (trungtpham/pai-bench-c-cosmos3).

- run_with_cosmos_framework.ipynb: equivalent interactive notebook.

- README.md: setup and usage instructions, reference scores, and notes
  on the evaluation methodology (GT depth/seg recomputed on the fly to
  match the internal imaginaire4 evaluation pipeline).

- prompts.json: canonical opus JSON prompts for all 600 tasks.
@trungtpham force-pushed the features/paibenchc-reproduce branch from be06e2d to b74fa49 (June 23, 2026 14:25)

2 participants