
fix: drop gguf VRAM estimation (now redundant) #8325

Merged
mudler merged 1 commit into master from chore/drop-gguf-vram-estimation
Feb 1, 2026

Conversation

Owner

@mudler mudler commented Feb 1, 2026

Cleanup: this is now handled directly in llama.cpp, so there is no need to estimate it from Go.

VRAM estimation is tricky in general, but llama.cpp ( https://github.com/ggml-org/llama.cpp/blob/41ea26144e55d23f37bb765f88c07588d786567f/src/llama.cpp#L168 ) has recently added automatic "fitting" of models to available VRAM. Since we already enable that feature, we can drop the backend-specific GGUF VRAM estimation from our code instead of trying to guess:

params.fit_params = true;

Fixes: #8302
See: #8302 (comment)


netlify bot commented Feb 1, 2026

Deploy Preview for localai ready!

Name Link
🔨 Latest commit a52f1d8
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/697f26cf6dbf1d0008fdefb7
😎 Deploy Preview https://deploy-preview-8325--localai.netlify.app

@mudler mudler force-pushed the chore/drop-gguf-vram-estimation branch from 2a8bbc7 to 5162a40 Compare February 1, 2026 10:09
The commit message body matches the PR description above; LocalAI already enables the fitting here:

https://github.com/mudler/LocalAI/blob/397f7f0862d4105b874523e1a0105ae036db18ec/backend/cpp/llama-cpp/grpc-server.cpp#L393
@mudler mudler force-pushed the chore/drop-gguf-vram-estimation branch from ca2e280 to a52f1d8 Compare February 1, 2026 10:11
@mudler mudler changed the title from "fix: drop gguf VRAM estimation" to "fix: drop gguf VRAM estimation (now redundant)" Feb 1, 2026
@mudler mudler merged commit 800f749 into master Feb 1, 2026
39 checks passed
@mudler mudler deleted the chore/drop-gguf-vram-estimation branch February 1, 2026 16:33
@mudler mudler added the bug Something isn't working label Feb 7, 2026
localai-bot pushed a commit to localai-bot/LocalAI that referenced this pull request Mar 25, 2026
fix: drop gguf VRAM estimation


Labels

bug Something isn't working


Development

Successfully merging this pull request may close these issues.

Small model does not get loaded completely into VRAM to do GPU inference, slow CPU inference

1 participant