Skip to content

fix(backends/onnx): fall back to next execution provider on init failure#1708

Open
tsushanth wants to merge 1 commit into
huggingface:mainfrom
tsushanth:fix/auto-device-cuda-fallback-1642
Open

fix(backends/onnx): fall back to next execution provider on init failure#1708
tsushanth wants to merge 1 commit into
huggingface:mainfrom
tsushanth:fix/auto-device-cuda-fallback-1642

Conversation

@tsushanth

Copy link
Copy Markdown

Summary

When device: 'auto' is requested in Node, deviceToExecutionProviders returns the full supported-device list (e.g. ['cuda', 'webgpu', 'cpu'] on Linux x64). ONNX Runtime treats that list as load-or-fail per provider — so on a Linux x64 host without the CUDA shared library installed, session creation fails hard with:

OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library
  at new OnnxruntimeSessionHandler (node_modules/onnxruntime-node/lib/backend)

…instead of falling through to the next provider in the list. Users without CUDA can't use the SDK at all in auto mode (#1642).

Fix

Make createInferenceSession retry with the remaining providers when the first one fails to initialise.

+    if (
+        !apis.IS_WEB_ENV &&
+        Array.isArray(session_options.executionProviders) &&
+        session_options.executionProviders.length > 1
+    ) {
+        let providers = session_options.executionProviders.slice();
+        let lastError;
+        while (providers.length > 0) {
+            try {
+                const session = await load(providers);
+                session.config = session_config;
+                return session;
+            } catch (error) {
+                lastError = error;
+                if (providers.length === 1) break;
+                logger.warn(
+                    `Execution provider \"${providerName(providers[0])}\" failed to initialize: ${error?.message ?? error}. Falling back to ${providers.slice(1).map(providerName).join(', ')}.`,
+                );
+                providers = providers.slice(1);
+            }
+        }
+        throw lastError;
+    }

Three properties worth calling out:

  1. Only retries when more than one provider was requested. If a caller explicitly asks for a single provider (device: 'cuda'), the error propagates as before — we don't second-guess deliberate intent.
  2. Web path is untouched. The web path (WASM + WebGPU) keeps its existing webInitChain chain semantics — the fallback only applies in Node, where the multi-provider list is what triggers the original bug.
  3. Visibility. Each fallback emits a logger.warn so silent degradation is observable in logs.

The change generalises beyond CUDA — the same pattern naturally handles transient or missing-driver failures for DirectML on Windows, CoreML on macOS, or WebGPU when the runtime is unavailable.

Test plan

  • pnpm -C packages/transformers build — clean (CJS + ESM + types)
  • pnpm format:check — clean
  • No dedicated unit test added — the failure mode requires runtime mocking of ONNX session creation against specific error strings, which doesn't slot cleanly into the existing test suite. Happy to add one if you'd like to suggest a pattern that fits.

Fixes #1642

When `device: 'auto'` is requested in Node, `deviceToExecutionProviders`
returns the full supported-device list (e.g. `['cuda', 'webgpu', 'cpu']`
on Linux x64). ONNX Runtime treats that list as load-or-fail per
provider — so on a Linux x64 host without the CUDA shared library,
session creation fails hard with `OrtSessionOptionsAppendExecution
Provider_Cuda: Failed to load shared library` instead of falling
through to the remaining providers (huggingface#1642).

Make `createInferenceSession` retry with the remaining providers when
the first one fails to initialize. The retry only fires when the
caller requested more than one provider — if a single provider was
requested explicitly (e.g. `device: 'cuda'`) the error propagates as
before, since the caller has expressed a deliberate intent. A warning
is logged each time a provider is dropped, so silent fallback is
visible to operators.

This generalises beyond CUDA — the same pattern handles transient or
missing-driver failures for DirectML on Windows, CoreML on macOS, or
WebGPU when the runtime is unavailable.

Fixes huggingface#1642
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto device on Linux x64 fails hard when CUDA shared library is unavailable instead of falling back

1 participant