This repository was archived by the owner on Aug 15, 2025. It is now read-only.
Work-around aarch64 conda installed numpy 2.x version. #1984
Merged
atalman merged 4 commits into pytorch:main on Sep 11, 2024
atalman
reviewed
Sep 11, 2024
.github/scripts/validate_binaries.sh
Outdated
  fi
  # Please note ffmpeg is required for torchaudio, see https://github.com/pytorch/pytorch/issues/96159
- conda create -y -n ${ENV_NAME} python=${MATRIX_PYTHON_VERSION} numpy ffmpeg
+ conda create -y -n ${ENV_NAME} python=${MATRIX_PYTHON_VERSION} ffmpeg
Contributor
Please restrict this change to linux-aarch64 GPU builds only. We want to continue testing numpy from conda on all other builds.
Contributor
Author
Pushed a fix. Fingers crossed.
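One way to scope the change as the review asks is to choose the conda package list from the target platform and GPU type. The following is a minimal sketch under stated assumptions, not the fix that was pushed: the helper name conda_packages_for is hypothetical, and the values linux-aarch64 and cuda, as well as the variable names TARGET_OS and MATRIX_GPU_ARCH_TYPE in the usage comment, are assumptions about how the workflow labels its jobs.

```shell
# Hypothetical helper: print the conda packages to install for one build.
#   $1: target OS label (assumed value for the affected builds: "linux-aarch64")
#   $2: GPU arch type   (assumed value for GPU builds: "cuda")
conda_packages_for() {
    if [ "$1" = "linux-aarch64" ] && [ "$2" = "cuda" ]; then
        # Work-around: skip conda's numpy on aarch64 GPU builds only.
        echo "ffmpeg"
    else
        # All other builds keep testing numpy from conda.
        echo "numpy ffmpeg"
    fi
}

# Example use inside a validation script (not executed here; variable names
# other than ENV_NAME and MATRIX_PYTHON_VERSION are assumptions):
#   conda create -y -n "${ENV_NAME}" python="${MATRIX_PYTHON_VERSION}" \
#       $(conda_packages_for "${TARGET_OS}" "${MATRIX_GPU_ARCH_TYPE}")
```

Keeping the decision in one small function makes it easy to check which builds still install numpy from conda, which matters here because a mislabeled gpu_arch_type would silently fall through to the default branch.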
atalman
approved these changes
Sep 11, 2024
Contributor
atalman
left a comment
Lgtm! Thank you very much!
atalman
approved these changes
Sep 11, 2024
Contributor
Author
The fix is not effective, mainly because the cuda jobs are incorrectly labeled with a cpu gpu_arch_type (see https://github.com/pytorch/builder/actions/runs/10820014857/job/30020147808#step:1:71). Or perhaps,
Contributor
Author
The aarch64 cuda CI tests torch 2.4.1 by default, so it did not catch the issue, which was only tested with the main branch.
Background:
The PyTorch Nightly Binary Validation workflow and the PyTorch 2.5.0 RC1 Binary Validation workflow both failed for aarch64, which seems to correlate with the CUDA bump from 12.4.0 to 12.4.1 (see this).
Example failed GitHub Actions jobs: https://github.com/pytorch/builder/actions/runs/10794919545/job/29940441536 and, for 2.5.0 RC1, https://github.com/pytorch/builder/actions/runs/10794919545/job/29944860153
Locally reproduced this by following the critical step below:
/opt/conda/bin/conda create -y -n conda-env-10794919545 python=3.10 numpy ffmpeg
Then running pip3 install torch --index-url https://download.pytorch.org/whl/test/cu124 reproduces the following error (the same one shown in the GitHub Actions failure links above):
2024-09-10T16:08:19.4727026Z ++ python3 ./test/smoke_test/smoke_test.py --package torchonly
2024-09-10T16:08:19.4727531Z Traceback (most recent call last):
2024-09-10T16:08:19.4728089Z File "/pytorch/builder/./test/smoke_test/smoke_test.py", line 9, in <module>
2024-09-10T16:08:19.4728654Z import torch._dynamo
2024-09-10T16:08:19.4729527Z File "/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/__init__.py", line 3, in <module>
2024-09-10T16:08:19.4730459Z from . import convert_frame, eval_frame, resume_execution
2024-09-10T16:08:19.4731531Z File "/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 53, in <module>
2024-09-10T16:08:19.4732512Z from . import config, exc, trace_rules
2024-09-10T16:08:19.4733556Z File "/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 45, in <module>
2024-09-10T16:08:19.4734616Z from .utils import getfile, hashable, NP_SUPPORTED_MODULES, unwrap_if_wrapper
2024-09-10T16:08:19.4736024Z ImportError: cannot import name 'NP_SUPPORTED_MODULES' from 'torch._dynamo.utils' (/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/utils.py)
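The reproduction steps above can be collected into one function. This is a sketch only: the function name repro_numpy_import_error and the environment name numpy2-repro are hypothetical, and it imports torch._dynamo directly instead of running smoke_test.py, on the assumption that the import is the step that fails. The conda path and index URL are the ones quoted in this description.

```shell
# Sketch of the local reproduction. Nothing runs until the function is called.
# CONDA_BIN defaults to the conda path quoted above; REPRO_ENV is a hypothetical name.
CONDA_BIN="${CONDA_BIN:-/opt/conda/bin/conda}"
REPRO_ENV="${REPRO_ENV:-numpy2-repro}"

repro_numpy_import_error() {
    # Create the environment with conda's numpy, as the failing CI step does.
    "${CONDA_BIN}" create -y -n "${REPRO_ENV}" python=3.10 numpy ffmpeg || return 1
    # Install the test wheel into that environment.
    "${CONDA_BIN}" run -n "${REPRO_ENV}" pip3 install torch \
        --index-url https://download.pytorch.org/whl/test/cu124 || return 1
    # Import the module that fails in the traceback (simplification of smoke_test.py).
    "${CONDA_BIN}" run -n "${REPRO_ENV}" python3 -c "import torch._dynamo"
}

# Example use on an aarch64 machine with conda installed (not executed here):
#   repro_numpy_import_error
```

Setting CONDA_BIN=echo before calling the function prints the three commands instead of running them, which is a cheap way to review the steps first.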
Two possible workarounds identified:
I do not yet know why, on ARM64, the Anaconda numpy package does not seem to be compatible with our generated pytorch wheel. As a follow-up, we could check whether the cuda 12.4.0 arm nightly wheel is compatible with this numpy version.
Update: the cuda 12.4.0 aarch64 wheel seems to get along well with conda numpy 2.1.1, so it is likely that the cuda bump introduced the incompatibility with conda's numpy.
Since we cannot prevent users from using conda's numpy 2.x, ideally we should come up with a fix on the pytorch aarch64 cuda wheel side.
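When triaging a report like this, it helps to confirm first whether the environment actually has a numpy 2.x release. The helper below is a hypothetical sketch (the name is_numpy_2x is not from this repository); it only inspects a version string and says nothing about why the wheel and conda's numpy disagree.

```shell
# Hypothetical helper: succeed when a version string has major version 2.
is_numpy_2x() {
    case "$1" in
        2.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example use (not executed here); reads the numpy version seen by python3:
#   ver="$(python3 -c 'import numpy; print(numpy.__version__)')"
#   if is_numpy_2x "${ver}"; then echo "numpy ${ver} is 2.x"; fi
```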
cc @atalman @malfet @ptrblck @tinglvv