fix(backends/onnx): fall back to next execution provider on init failure by tsushanth · Pull Request #1708 · huggingface/transformers.js

tsushanth · 2026-06-11T23:41:15Z

Summary

When device: 'auto' is requested in Node, deviceToExecutionProviders returns the full supported-device list (e.g. ['cuda', 'webgpu', 'cpu'] on Linux x64). ONNX Runtime treats that list as load-or-fail per provider — so on a Linux x64 host without the CUDA shared library installed, session creation fails hard with:

OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library
  at new OnnxruntimeSessionHandler (node_modules/onnxruntime-node/lib/backend)

…instead of falling through to the next provider in the list. Users without CUDA can't use the SDK at all in auto mode (#1642).

Fix

Make createInferenceSession retry with the remaining providers when the first one fails to initialise.

+    if (
+        !apis.IS_WEB_ENV &&
+        Array.isArray(session_options.executionProviders) &&
+        session_options.executionProviders.length > 1
+    ) {
+        let providers = session_options.executionProviders.slice();
+        let lastError;
+        while (providers.length > 0) {
+            try {
+                const session = await load(providers);
+                session.config = session_config;
+                return session;
+            } catch (error) {
+                lastError = error;
+                if (providers.length === 1) break;
+                logger.warn(
+                    `Execution provider "${providerName(providers[0])}" failed to initialize: ${error?.message ?? error}. Falling back to ${providers.slice(1).map(providerName).join(', ')}.`,
+                );
+                providers = providers.slice(1);
+            }
+        }
+        throw lastError;
+    }
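
The diff calls `providerName` and `load`, neither of which appears in this excerpt. As a rough sketch of what `providerName` might look like, assuming each execution-provider entry is either a plain string or an options object with a `name` field, something like the following would give readable names in the warning. The body is a guess for illustration, not the PR's actual helper.

```javascript
// Hypothetical helper: the PR's real `providerName` is not shown in this excerpt.
// Assumes a provider entry is either a string ('cuda') or an options object
// with a `name` field ({ name: 'webgpu', ... }).
function providerName(provider) {
    return typeof provider === 'string' ? provider : (provider?.name ?? String(provider));
}

// Mixed list of string and object entries, as the warning message would see them.
const providers = ['cuda', { name: 'webgpu' }, 'cpu'];
console.log(providers.slice(1).map(providerName).join(', ')); // prints "webgpu, cpu"
```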

Three properties worth calling out:

Only retries when more than one provider was requested. If a caller explicitly asks for a single provider (device: 'cuda'), the error propagates as before — we don't second-guess deliberate intent.
Web path is untouched. The web path (WASM + WebGPU) keeps its existing webInitChain semantics; the fallback applies only in Node, where the multi-provider list triggers the original bug.
Visibility. Each fallback emits a logger.warn so silent degradation is observable in logs.
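
The first two properties come down to the guard around the retry loop. A minimal sketch of that condition as a standalone predicate follows. The function name `shouldRetryProviders` is invented for illustration, `isWebEnv` stands in for `apis.IS_WEB_ENV`, and the example assumes `device: 'cuda'` maps to a single-entry provider list.

```javascript
// Hypothetical predicate mirroring the guard condition in the diff.
// `isWebEnv` stands in for `apis.IS_WEB_ENV`.
function shouldRetryProviders(sessionOptions, isWebEnv) {
    return (
        !isWebEnv &&
        Array.isArray(sessionOptions.executionProviders) &&
        sessionOptions.executionProviders.length > 1
    );
}

// device: 'auto' in Node: several providers, so the fallback path applies.
console.log(shouldRetryProviders({ executionProviders: ['cuda', 'webgpu', 'cpu'] }, false)); // true
// Single explicit provider in Node: the error propagates as before.
console.log(shouldRetryProviders({ executionProviders: ['cuda'] }, false)); // false
// Web environment: left to the existing web path regardless of list length.
console.log(shouldRetryProviders({ executionProviders: ['webgpu', 'wasm'] }, true)); // false
```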

The change generalises beyond CUDA — the same pattern naturally handles transient or missing-driver failures for DirectML on Windows, CoreML on macOS, or WebGPU when the runtime is unavailable.

Test plan

pnpm -C packages/transformers build — clean (CJS + ESM + types)
pnpm format:check — clean
No dedicated unit test added — the failure mode requires runtime mocking of ONNX session creation against specific error strings, which doesn't slot cleanly into the existing test suite. Happy to add one if you'd like to suggest a pattern that fits.

Fixes #1642

When `device: 'auto'` is requested in Node, `deviceToExecutionProviders` returns the full supported-device list (e.g. `['cuda', 'webgpu', 'cpu']` on Linux x64). ONNX Runtime treats that list as load-or-fail per provider, so on a Linux x64 host without the CUDA shared library, session creation fails hard with `OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library` instead of falling through to the remaining providers (huggingface#1642).

Make `createInferenceSession` retry with the remaining providers when the first one fails to initialize. The retry only fires when the caller requested more than one provider. If a single provider was requested explicitly (e.g. `device: 'cuda'`), the error propagates as before, since the caller has expressed a deliberate intent. A warning is logged each time a provider is dropped, so silent fallback is visible to operators.

This generalises beyond CUDA: the same pattern handles transient or missing-driver failures for DirectML on Windows, CoreML on macOS, or WebGPU when the runtime is unavailable.

Fixes huggingface#1642

tsushanth wants to merge 1 commit into huggingface:main from tsushanth:fix/auto-device-cuda-fallback-1642