Issue description
Can't run on Jetson Orin / Thor
Expected Behavior
Be able to run node-llama-cpp on a Jetson Orin / Thor.
Actual Behavior
Crashing:
/root/.nvm/versions/node/v24.13.1/lib/node_modules/node-llama-cpp/llama/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:97: CUDA error
[node-llama-cpp] CUDA error: an internal operation failed
[node-llama-cpp] current device: 0, in function ggml_cuda_op_mul_mat_cublas at /root/.nvm/versions/node/v24.13.1/lib/node_modules/node-llama-cpp/llama/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1363
[node-llama-cpp] cublasSgemm_v2(ctx.cublas_handle(id), CUBLAS_OP_T, CUBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha, src0_ddf_i, ne00, src1_ddf1_i, ne10, &beta, dst_dd_i, ldc)
Aborted (core dumped)
Steps to reproduce
After trying for two full days, I can't get node-llama-cpp to compile a llama.cpp build that runs. By now I have tried all sorts of compile options; this is only my last attempt:
NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_CUDA_ARCHITECTURES=87 \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_FORCE_MMQ=ON \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_NO_VMM=ON \
npx --no node-llama-cpp source build --gpu
If I don't set GGML_CUDA_NO_VMM=ON, I get memory allocation errors.
If I don't set CMAKE_CUDA_ARCHITECTURES=87, some random virtual CUDA arch is detected instead.
Here is my latest attempt, using the same options as my standalone llama.cpp build from GitHub (see Additional Context below):
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA=ON \
NODE_LLAMA_CPP_CMAKE_OPTION_DNLC_VARIANT=cuda.b8121 \
NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_CUDA_ARCHITECTURES=87 \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_CUB_3DOT2=ON \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_BACKEND_DL=ON \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_FORCE_MMQ=ON \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_NO_VMM=ON \
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUBLAS=OFF \
NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
NODE_LLAMA_CPP_CMAKE_OPTION_CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
npx --no node-llama-cpp source build --gpu
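For context on the commands above: node-llama-cpp forwards any environment variable prefixed with NODE_LLAMA_CPP_CMAKE_OPTION_ to CMake as a -D option, which is why the env vars here mirror the flags of the standalone cmake build below. A minimal sketch of that mapping (my own illustration of the naming convention, not the library's actual code):

```javascript
// Sketch: how NODE_LLAMA_CPP_CMAKE_OPTION_* env vars translate into CMake
// -D flags. The real forwarding happens inside node-llama-cpp's build tool;
// this only illustrates the convention.
const PREFIX = "NODE_LLAMA_CPP_CMAKE_OPTION_";

function envToCmakeArgs(env) {
    return Object.entries(env)
        .filter(([key]) => key.startsWith(PREFIX))
        .map(([key, value]) => `-D${key.slice(PREFIX.length)}=${value}`);
}

// Example with two of the options from the build command above:
const args = envToCmakeArgs({
    NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_CUDA_ARCHITECTURES: "87",
    NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_NO_VMM: "ON"
});
console.log(args.join(" "));
// → -DCMAKE_CUDA_ARCHITECTURES=87 -DGGML_CUDA_NO_VMM=ON
```

So the node-llama-cpp build should end up invoking CMake with essentially the same -D flags as the working compile.sh further down, which is what makes the differing result surprising.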
My Environment
| Dependency | Version |
|---|---|
| Operating System | Jetson Orin |
| CPU | ARM aarch64 |
| Node.js version | 24.13.1 |
| Typescript version | ? |
| node-llama-cpp version | 3.16.2 |
$ cat /etc/nv_tegra_release
R36 (release), REVISION: 4.7, GCID: 42132812, BOARD: generic, EABI: aarch64, DATE: Thu Sep 18 22:54:44 UTC 2025
KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
npx --yes node-llama-cpp inspect gpu output:
# npx --yes node-llama-cpp inspect gpu
OS: Ubuntu 22.04.5 LTS (arm64)
Node: 24.13.1 (arm64)
node-llama-cpp: 3.16.2
Prebuilt binaries: b8121
Cloned source: b8121
CUDA: available
Vulkan: Vulkan is detected, but using it failed
To resolve errors related to Vulkan, see the Vulkan guide: https://node-llama-cpp.withcat.ai/guide/vulkan
CUDA device: Orin
CUDA used VRAM: 5.8% (3.56GB/61.37GB)
CUDA free VRAM: 94.19% (57.81GB/61.37GB)
CPU model: Cortex-A78AE
Math cores: 12
Used RAM: 5.8% (3.56GB/61.37GB)
Free RAM: 94.19% (57.81GB/61.37GB)
Used swap: 0.67% (211MB/30.68GB)
Max swap size: 30.68GB
mmap: supported
Additional Context
Using llama.cpp compiled from source directly on the Orin works great:
$ compile.sh
cmake -B build \
    -DGGML_CUDA=ON \
    -DDNLC_VARIANT=cuda.b8121 \
    -DCMAKE_CUDA_ARCHITECTURES=87 \
    -DGGML_CUDA_CUB_3DOT2=ON \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CUDA_FORCE_MMQ=ON \
    -DGGML_CUDA_NO_VMM=ON \
    -DGGML_CUBLAS=OFF \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
test run:
root@jetson-orin:/usr/src/llama.cpp/build/bin# ./llama-cli -m /root/.cache/qmd/models/hf_tobil_qmd-query-expansion-1.7B-q4_k_m.gguf -p "Hello, how are you?" -n 128
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: no
load_backend: loaded CUDA backend from /mnt/src/llama.cpp/build/bin/libggml-cuda.so
load_backend: loaded CPU backend from /mnt/src/llama.cpp/build/bin/libggml-cpu.so
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b8136-9051663d5
model : hf_tobil_qmd-query-expansion-1.7B-q4_k_m.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> Hello, how are you?
[Start thinking]
lex: greetings from the virtual assistant
lex: how are you today?
vec: greetings from the virtual assistant
vec: how are you today?
hyde: The topic of hello, how are you? covers greetings from the virtual assistant. Proper implementation follows established patterns and best practices.
[ Prompt: 293.1 t/s | Generation: 60.9 t/s ]
Relevant Features Used
- Metal support
- CUDA support
- Vulkan support
- Grammar
- Function calling
Are you willing to resolve this issue by submitting a Pull Request?
Yes, but I have no idea how; I'm not a dev.