fix(backends/onnx): fall back to next execution provider on init failure #1708
Open
tsushanth wants to merge 1 commit into
When `device: 'auto'` is requested in Node, `deviceToExecutionProviders` returns the full supported-device list (e.g. `['cuda', 'webgpu', 'cpu']` on Linux x64). ONNX Runtime treats that list as load-or-fail per provider, so on a Linux x64 host without the CUDA shared library, session creation fails hard with `OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library` instead of falling through to the remaining providers (huggingface#1642).

Make `createInferenceSession` retry with the remaining providers when the first one fails to initialize. The retry only fires when the caller requested more than one provider. If a single provider was requested explicitly (e.g. `device: 'cuda'`), the error propagates as before, since the caller has expressed a deliberate intent. A warning is logged each time a provider is dropped, so silent fallback is visible to operators.

This generalises beyond CUDA: the same pattern handles transient or missing-driver failures for DirectML on Windows, CoreML on macOS, or WebGPU when the runtime is unavailable.

Fixes huggingface#1642
Summary
When `device: 'auto'` is requested in Node, `deviceToExecutionProviders` returns the full supported-device list (e.g. `['cuda', 'webgpu', 'cpu']` on Linux x64). ONNX Runtime treats that list as load-or-fail per provider, so on a Linux x64 host without the CUDA shared library installed, session creation fails hard with `OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library` instead of falling through to the next provider in the list. Users without CUDA can't use the SDK at all in `auto` mode (#1642).

Fix
Make `createInferenceSession` retry with the remaining providers when the first one fails to initialise.

Three properties worth calling out:
- If a single provider was requested explicitly (e.g. `device: 'cuda'`), the error propagates as before; we don't second-guess deliberate intent.
- `webInitChain` chain semantics are unchanged: the fallback only applies in Node, where the multi-provider list is what triggers the original bug.
- Each dropped provider is reported via `logger.warn`, so silent degradation is observable in logs.

The change generalises beyond CUDA: the same pattern naturally handles transient or missing-driver failures for DirectML on Windows, CoreML on macOS, or WebGPU when the runtime is unavailable.
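For reviewers who want the retry behaviour in one place, here is a minimal sketch of the pattern described above. It is not the code in this PR: `createSessionWithFallback`, the injected `createSession` factory, and `fakeCreate` are hypothetical names used only for illustration, and the sketch assumes that a failure should be attributed to the first provider in the list.

```javascript
/**
 * Hypothetical sketch of "drop the failing provider and retry".
 *
 * @param {(providers: string[]) => Promise<any>} createSession
 *   Stand-in for the real session factory (assumption, injected for testing).
 * @param {string[]} providers Ordered execution-provider list.
 * @param {(message: string) => void} [warn] Logger used when a provider is dropped.
 */
async function createSessionWithFallback(createSession, providers, warn = console.warn) {
  let remaining = [...providers];
  while (true) {
    try {
      return await createSession(remaining);
    } catch (error) {
      // A single explicit provider (or the last one left) has no fallback:
      // propagate the original error unchanged.
      if (remaining.length <= 1) throw error;

      // Assumption: the first provider in the list is the one that failed.
      const [dropped, ...rest] = remaining;
      warn(
        `Execution provider "${dropped}" failed to initialize (${error.message}); ` +
          `retrying with [${rest.join(', ')}]`,
      );
      remaining = rest;
    }
  }
}

// Stand-in factory that simulates a host without the CUDA shared library.
const fakeCreate = async (providers) => {
  if (providers.includes('cuda')) {
    throw new Error('Failed to load shared library');
  }
  return { providers };
};
```

With `fakeCreate`, a request for `['cuda', 'webgpu', 'cpu']` resolves to a session built from `['webgpu', 'cpu']` after one warning, while a request for `['cuda']` alone rejects with the original error. Injecting the factory keeps the sketch runnable without ONNX Runtime installed.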
Test plan
- `pnpm -C packages/transformers build`: clean (CJS + ESM + types)
- `pnpm format:check`: clean

Fixes #1642