PyTorch 2.8.0 cu128 template installs 2.4.1 — missing cu128 wheels #114

@dentity007

Description

Bug

The CUDA 12.8.1 + PyTorch 2.8.0 template (runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404) installs PyTorch 2.4.1 instead of 2.8.0.

Root Cause

In official-templates/pytorch/docker-bake.hcl line 27:

{ cuda_version = "12.8.1", torch = "2.8.0", whl_src = "128" },

The Dockerfile runs:

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

PyTorch 2.8.0 stable wheels are not published for cu128. Pip silently falls back to PyTorch 2.4.1+cu124.
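
Whether a given version is actually served by that index can be checked directly. A minimal sketch (pip's `index` subcommand is experimental but has shipped since pip 21.2):

# List the torch versions the cu128 index actually serves:
$ pip index versions torch --index-url https://download.pytorch.org/whl/cu128

# Or count torch-2.8.0 wheel entries in the PEP 503 simple index itself:
$ curl -s https://download.pytorch.org/whl/cu128/torch/ | grep -c 'torch-2\.8\.0'

If 2.8.0 is absent from the listing, the exact pin in the Dockerfile cannot be satisfied from that index.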

Evidence

Deployed a 2x B200 pod using the "Runpod Pytorch 2.8.0" template:

$ nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
NVIDIA B200, 580.126.09
NVIDIA B200, 580.126.09

$ python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
2.4.1+cu124 12.4

$ python3 -c "import torch; print(torch.cuda.get_device_properties(0))"
UserWarning: NVIDIA B200 with CUDA capability sm_100 is not compatible
with the current PyTorch installation. The current PyTorch install
supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.

Pod hostname: 7ab25c9e0ebb
Pod IP: 38.80.152.146:31039

Impact

  • B200 GPUs (compute capability sm_100) are completely unusable with PyTorch 2.4.1
  • Users pay for B200 GPU time they cannot use
  • RunPod's own deployment page warns "B200s only support Pytorch 2.8 and above", yet the template labeled 2.8.0 doesn't deliver it

Comparison with Other Templates

The older templates embed the actual torch version in the image tag and work correctly:

Pytorch 2.1: runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04  ✅
Pytorch 2.2: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04  ✅
Pytorch 2.4: runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04  ✅
Pytorch 2.8: runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404            ❌ installs 2.4.1
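
Any of these images can be spot-checked locally without renting a GPU, since importing torch and printing its version string requires no device. A sketch:

$ docker run --rm runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404 \
    python3 -c "import torch; print(torch.__version__, torch.version.cuda)"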

Suggested Fix

Change the wheel source for torch 2.8.0 in docker-bake.hcl:

# Current (broken):
{ cuda_version = "12.8.1", torch = "2.8.0", whl_src = "128" },

# Fix option 1 — use cu124 wheels:
{ cuda_version = "12.8.1", torch = "2.8.0", whl_src = "124" },

# Fix option 2 — use nightly wheels (as the original commit abfb7ab did):
# TORCH = "torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128"

This likely affects the CUDA 12.9.0 and 13.0.0 rows as well, if the same wheel-source issue applies.
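
Whichever option is chosen, a post-install assertion in the Dockerfile would turn this failure mode into a broken build instead of a mislabeled image. A minimal sketch (the TORCH_VERSION build arg is hypothetical, not part of the current Dockerfile):

# Hypothetical guard placed right after the pip install step;
# fails the build if the resolved torch version is not the one requested:
ARG TORCH_VERSION=2.8.0
RUN python3 -c "import torch; assert torch.__version__.split('+')[0] == '${TORCH_VERSION}', torch.__version__"

With a guard like this, the cu128 fallback described above would have broken the image build instead of shipping 2.4.1 under a 2.8.0 label.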

Related

  • RunPod support ticket #35526 (same user, same template)
  • The same docker-bake.hcl rows for torch 2.6.0 and 2.7.1 may also be affected; it's worth validating cu128 wheel availability for all of them (see the sketch below)
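
A quick availability sweep over those rows might look like this (a sketch; assumes pip >= 21.2 for the experimental `pip index` subcommand):

# Query the cu128 index once, then check each pinned version against it:
avail=$(pip index versions torch --index-url https://download.pytorch.org/whl/cu128 2>/dev/null)
for v in 2.6.0 2.7.1 2.8.0; do
  echo "$avail" | grep -qF "$v" && echo "torch $v on cu128: found" || echo "torch $v on cu128: MISSING"
done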
