
Update to MLX 0.30.6 #6

Open
robert-johansson wants to merge 23 commits into frost-beta:main from robert-johansson:main


@robert-johansson

Summary

  • Bump MLX submodule from 0.25.0 to 0.30.6
  • Add ki::Type specialization for mlx::core::SmallVector (MLX >= 0.26 uses SmallVector for Shape)
  • Update API call sites for breaking changes: std::vector<int> → mx::Shape, new output_padding params in conv_transpose, extra arg in scaled_dot_product_attention
  • Split large ki::Set registration calls to stay within template parameter limits
  • Wrap mx::metal::device_info to handle its new std::unordered_map return type

Tested on macOS with Apple Silicon (M4). All existing functionality works.

🤖 Generated with Claude Code

Robert Johansson and others added 15 commits February 22, 2026 21:08
Bump MLX submodule from v0.25.0 to v0.30.6 and fix all API changes:

- Add SmallVector<T> kizunapi type specialization (Shape changed from
  std::vector<int> to SmallVector in MLX >= 0.26)
- Add PutIntoShape helper, keep PutIntoVector for std::vector<int> uses
- Update FFT wrapper function pointer types for Shape parameter
- Add output_padding parameter to conv_transpose1d/2d/3d
- Add sinks parameter to scaled_dot_product_attention calls
- Move device_info from metal:: to gpu:: namespace
- Split large ki::Set calls to stay within template argument limits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update deps/mlx with fix for compile_fuse broadcast split_one bug that
caused "unordered_map::at: key not found" on compiled functions with
~100+ operations. This is an upstream MLX bug (v0.29.4+).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update MLX submodule with improved compile_fuse fix that preserves
the broadcast fusion optimization while fixing the aliasing bug
that caused unordered_map::at crashes on large computation graphs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Points deps/mlx to ml-explore/mlx main (c8536f52) which includes
the merged compile_fuse broadcast split fix from PR #3166, plus
newer upstream fixes (Metal event leak, conv3d overflow, fence sync).

Replaces the local branch commits (65cefdef, a6d40e4a) which are
now superseded by the upstream merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update MLX submodule to include native lgamma/digamma kernels and
add Node.js bindings for both operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update deps/mlx submodule URL to robert-johansson/mlx (genmlx branch)
  with lgamma, digamma, bessel_i0e, bessel_i1e ops
- Add besselI0e/besselI1e bindings in ops.cc and type declarations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Report external memory (min 1MB per array) via napi_adjust_external_memory
so the JS GC knows about Metal GPU buffer pressure. This makes GC run
earlier, reducing the chance of hitting Metal's 499K allocation limit.

- Point kizunapi submodule to robert-johansson fork with ExternalMemorySize trait
- Specialize ExternalMemorySize for mx::array (1MB minimum cost)
- Add napi_adjust_external_memory calls in Tidy and Dispose paths
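
The accounting behind this can be sketched without N-API. The sketch below is an illustration only, not the code in this PR: `ExternalCost` and `ExternalLedger` are hypothetical names, and reading "min 1MB per array" as the larger of the real byte size and a 1 MiB floor is an assumption. In the binding, each delta would be handed to `napi_adjust_external_memory`, which takes a signed change in bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumed floor: every array is reported as at least 1 MiB.
constexpr std::int64_t kMinExternalCost = 1 << 20;

// Cost reported for one array (assumption: max of real size and the floor).
std::int64_t ExternalCost(std::size_t nbytes) {
  return std::max(static_cast<std::int64_t>(nbytes), kMinExternalCost);
}

// Hypothetical ledger mirroring what the GC would be told: a positive
// delta on creation and the matching negative delta on disposal.
class ExternalLedger {
 public:
  void OnCreate(std::size_t nbytes) { total_ += ExternalCost(nbytes); }
  void OnDispose(std::size_t nbytes) { total_ -= ExternalCost(nbytes); }
  std::int64_t total() const { return total_; }

 private:
  std::int64_t total_ = 0;
};
```

The point of the floor is that a tiny array still costs a GPU allocation, so reporting only its byte size would understate the pressure. Creation and disposal must use the same cost function, otherwise the reported total drifts.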

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a native function that bypasses the deferred N-API finalizer queue by
synchronously walking the wrapper registry and freeing arrays whose JS
wrappers have been GC'd. This is critical for synchronous inference loops
where the event loop never yields and deferred finalizers never run.

Includes kizunapi changes:
- CollectDeadWrappers<T>() in InstanceData
- ExternalMemorySize reporting on AllowPassByValue path
- Double-free guard in finalizer callbacks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kernel .h changes now take effect via JIT source string regeneration
without needing to manually delete .air/.metallib files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ed calls

The Tidy function captured `auto& top = g_tidy_arrays.top()` and passed
`[&top]` to the AwaitFunction lambda. If the lambda executed after the
stack was modified (async Promise path, or nested tidy calls), `top`
became a dangling reference → segfault at address 0x5.

Fix: move the set off the stack inside cpp_then (at execution time, not
capture time). Use a shared_ptr<bool> flag to coordinate between cpp_then
and cpp_finally so the stack is popped exactly once — cpp_then pops on
success, cpp_finally pops only on error (if cpp_then didn't run).

Verified: nested tidy (3 levels), 218K-call stress test, GenMLX test suite
(165/165 gen_clj_compat).
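
The pop-exactly-once coordination described above can be modeled in isolation. This is a simplified sketch, not the node-mlx implementation: `g_scopes`, `Callbacks`, and `BeginScope` are hypothetical names standing in for `g_tidy_arrays` and the `cpp_then`/`cpp_finally` pair, and plain `int` ids stand in for arrays.

```cpp
#include <functional>
#include <memory>
#include <set>
#include <stack>
#include <utility>

// Hypothetical stand-in for the per-scope array set kept on a global stack.
using ArraySet = std::set<int>;
std::stack<ArraySet> g_scopes;

// Hypothetical pair of callbacks, modeled on the then/finally split.
struct Callbacks {
  std::function<ArraySet()> then;  // runs only on success
  std::function<void()> finally;   // always runs
};

// Pushes a scope and returns callbacks that pop it exactly once.
// Nothing captures a reference into the stack; the set is moved out
// when `then` actually executes.
Callbacks BeginScope() {
  g_scopes.emplace();
  auto popped = std::make_shared<bool>(false);
  return {
      [popped]() {
        ArraySet owned = std::move(g_scopes.top());  // taken at execution time
        g_scopes.pop();
        *popped = true;
        return owned;
      },
      [popped]() {
        if (!*popped) {  // error path: `then` never ran
          g_scopes.pop();
          *popped = true;
        }
      }};
}
```

Capturing the `shared_ptr<bool>` by value keeps the flag alive for as long as either callback exists, which a captured reference into the stack cannot guarantee.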

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eference

ExternalMemorySize::Get(a) was called on array pointers before checking
if the pointer was still valid. If JS GC had already finalized the array
(calling TypeBridge::Finalize → delete), the pointer was dangling.

Fix: check GetWrapper/DeleteWrapper first. Only access the pointer if
the wrapper map confirms it's still alive (states 1 or 3, not state 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the JS function passed to valueAndGrad threw during tracing,
the error was silently swallowed. The traced lambda returned an
empty vector, MLX's value_and_grad continued with garbage, and
TreeUnflatten returned a stale tracer Symbol instead of a concrete
mx.array. No error was ever propagated to the caller.

Fix: track callback failure with a flag. After value_and_grad_func
returns, check the flag and throw instead of proceeding with
invalid results.

Reproducer (before fix):
  const vg = mx.valueAndGrad((w, x) => { throw new Error('oops'); });
  const [v, g] = vg(mx.array([1]), mx.array([2]));
  // v.constructor.name was 'Symbol' — should have thrown
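
The shape of the fix can be shown with a plain C++ model. This is a sketch under stated assumptions, not the binding code: `RunTransform` is a hypothetical stand-in for a transform that cannot observe callback errors, and a `std::exception_ptr` plays the role of the failure flag.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <vector>

using Values = std::vector<double>;
using Fn = std::function<Values(const Values&)>;

// Hypothetical stand-in for a transform that cannot see callback errors:
// it calls the traced function and passes along whatever comes back.
Values RunTransform(const Fn& traced, const Values& inputs) {
  return traced(inputs);
}

// Wraps a user callback so a failure during tracing is recorded and
// rethrown after the transform returns, rather than silently dropped.
Values CallChecked(const Fn& user_fn, const Values& inputs) {
  std::exception_ptr failure;  // the "flag": non-null means the callback threw
  Fn traced = [&](const Values& xs) -> Values {
    try {
      return user_fn(xs);
    } catch (...) {
      failure = std::current_exception();
      return {};  // placeholder result; must not reach the caller
    }
  };
  Values result = RunTransform(traced, inputs);
  if (failure) std::rethrow_exception(failure);
  return result;
}
```

The check sits after the transform returns, so the placeholder result is discarded before any caller can observe it.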

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update deps/mlx to genmlx-rebased branch which includes:
  - 53 upstream commits (teardown fix, split-K matmul, etc.)
  - Library cleaner: Metal shader pipelines are released when compiled
    functions are erased from the compile cache
  - Custom ops (lgamma, digamma, bessel, vmap floor_divide fix)

- Export mx.detail.compile_clear_cache as compileClearCache in JS
  bindings, allowing explicit cleanup of all compiled function caches
  and their associated Metal resources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: after mx.eval(), each array retains shared_ptr references
to its inputs through the computation graph. Under Bun/JSC, the GC is
non-deterministic and finalizers are deferred to the event loop. In
synchronous code (which nbb/ClojureScript is), finalizers never fire,
so Metal buffers accumulate monotonically — num_resources grew from 26
to 18,000+ in 60 seconds, eventually hitting the macOS 499K limit.

Fix: call array.detach() on evaluated arrays in Eval(). This severs
the graph links (primitive + inputs), allowing parent arrays and their
Metal buffers to be freed immediately. Safe because node-mlx manages
gradients via separate valueAndGrad/grad transforms that trace their
own graphs — the forward graph is never reused after eval.
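
Why severing the graph links frees parents immediately can be seen in a small ownership model. This is an illustration only: `Node`, `Detach`, and `MakeNode` are hypothetical and do not reproduce MLX's array type; they only mimic an object that holds `shared_ptr` references to its inputs.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array node: it owns a buffer and holds
// shared_ptr links to the inputs it was computed from.
struct Node {
  std::vector<float> buffer;
  std::vector<std::shared_ptr<Node>> inputs;

  // Drops the graph links; the node's own buffer is kept.
  void Detach() { inputs.clear(); }
};

// Builds a node with a small buffer and the given inputs.
std::shared_ptr<Node> MakeNode(std::vector<std::shared_ptr<Node>> inputs) {
  auto n = std::make_shared<Node>();
  n->buffer.assign(4, 0.0f);
  n->inputs = std::move(inputs);
  return n;
}
```

In this model a parent stays alive for as long as any child still lists it as an input, even after every other handle to it is gone. Clearing `inputs` releases the last reference without waiting for a finalizer.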

Also:
- Expose getNumResources/getResourceLimit for Metal buffer monitoring
- Move SweepDeadArrays to shared header for cross-file access
- Update MLX submodule with resource tracking API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Robert Johansson and others added 7 commits March 23, 2026 01:03
Wraps mx::searchsorted in node-mlx NAPI bindings.
TypeScript declaration added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AwaitFunction did not check for exceptions after calling func().
When func() threw (e.g., item() on non-scalar array inside tidy),
result was nullptr. Passing nullptr to napi_is_promise caused a
segfault at address 0x5 on Bun/JSC (V8/Node handled it gracefully).

Additionally, Bun's napi_is_exception_pending does not always report
pending exceptions, so we check for null result as a fallback.

Reproducer: any JS error inside mx.tidy() callback crashes Bun.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for Bun/JSC N-API bugs:

1. SweepDeadArrays double-check protocol:
   Bun's JSC napi_get_reference_value temporarily returns null for weak
   references to objects that are still alive and reachable (918 false
   positives observed in a single test run). Single-check sweep deleted
   these "dead" arrays causing use-after-free crashes.

   Fix: first scan marks arrays as pending, second scan confirms death
   before deletion. Arrays that "resurrect" between scans are kept.
   Adds ScanDeadWrappers to instance_data.h (scan without remove).

2. AwaitFunction null-result check:
   When a JS exception occurs inside a callback (e.g., tidy scope),
   func() returns nullptr. Bun's napi_is_exception_pending does not
   report the pending exception. Passing nullptr to napi_is_promise
   caused segfault at address 0x5.

   Fix: check for null result as fallback for exception detection.

Both bugs are Bun/JSC N-API implementation issues. Node.js/V8 is
not affected.
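
The double-check protocol from fix 1 can be sketched as a small state machine. This is a model, not the code in `instance_data.h`: `Sweeper` is a hypothetical name, integer ids stand in for wrappers, and `looks_dead` stands in for a weak-reference lookup that may report false positives.

```cpp
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Two-pass sweep: an entry is reported dead only if it looks dead on two
// consecutive scans.
class Sweeper {
 public:
  // Returns the ids confirmed dead by this scan.
  std::vector<int> Sweep(const std::vector<int>& live_ids,
                         const std::function<bool(int)>& looks_dead) {
    std::vector<int> confirmed;
    std::unordered_set<int> next_pending;
    for (int id : live_ids) {
      if (!looks_dead(id)) continue;  // alive, or "resurrected"
      if (pending_.count(id)) {
        confirmed.push_back(id);      // dead on two scans in a row
      } else {
        next_pending.insert(id);      // first sighting: only mark
      }
    }
    pending_ = std::move(next_pending);
    return confirmed;
  }

 private:
  std::unordered_set<int> pending_;
};
```

The cost of this design is one extra sweep of latency before memory is reclaimed. In exchange, a transient false positive on a single scan can no longer trigger a deletion.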

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for Bun/JSC N-API bugs:
- Double-check sweep: prevents use-after-free from false positive weak refs
- AwaitFunction null-result check: prevents segfault on exception in tidy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Synced mlx fork with latest upstream ml-explore/mlx, rebasing our
6 custom patches on top. Dropped the floor_divide fix (merged upstream
as PR #3292). Fork's main is now: upstream/main + GenMLX patches only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- fft.cc: Updated FFT function wrappers for upstream's new FFTNorm
  parameter (added in mlx fft.h). Uses explicit static_cast to
  disambiguate overloaded function pointers.
- deps/mlx: Updated to include Cholesky JVP/VJP.
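
For readers unfamiliar with the `static_cast` idiom mentioned for fft.cc, here is a generic illustration. Every name in it is hypothetical (`Norm`, `Transform`, `Register`), and the signatures are not those of any MLX function; it only shows why a cast is needed once a function name has more than one overload.

```cpp
#include <vector>

enum class Norm { Backward, Ortho };  // hypothetical option type

// Two overloads sharing a name, as happens when a parameter is added.
int Transform(const std::vector<int>& x) { return static_cast<int>(x.size()); }
int Transform(const std::vector<int>& x, Norm norm) {
  return static_cast<int>(x.size()) + (norm == Norm::Ortho ? 1 : 0);
}

// A generic registration helper deduces the pointer type from its argument,
// so a bare `Transform` would be ambiguous here.
template <typename F>
F Register(F fn) { return fn; }

// static_cast to a concrete function-pointer type selects one overload.
auto g_transform = Register(
    static_cast<int (*)(const std::vector<int>&, Norm)>(&Transform));
```

Without the cast, template argument deduction has no target type to pick an overload from, and the call to `Register` fails to compile.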

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>