Add PyTorch-native K-quant pass#2479

Merged
jambayk merged 4 commits into main from jambayk/py-kquant on May 29, 2026
jambayk commented May 28, 2026

Describe your changes

Add KQuant pass under olive.passes.pytorch.kquant that implements the ggml K-quant weight-only search in PyTorch. A unified _kquant_search driver covers both variants:

  • Asymmetric (analogue of ggml's make_qkx2_quants, used by Q2_K/Q4_K/Q5_K): tracks the per-group (min, max) and refines (scale, offset) with a 2-D least-squares (LSQ) fit over a sweep of perturbed iscale factors.
  • Symmetric (analogue of ggml's make_qx_quants, used by Q3_K/Q6_K): uses max|x| as the normalizer, keeps the zero point fixed at midq, and refines the scale with the 1-D least-squares ratio sumlx/suml2.
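The asymmetric bullet above can be illustrated with a minimal pure-Python sketch for a single group. It is not the code from this PR: the function name `asymmetric_lsq_search` and the sweep parameters `nstep`, `rmin`, and `rdelta` are hypothetical, and the real pass presumably works on batched tensors. The sketch only shows the idea of trying several perturbed inverse scales, solving the 2-D least-squares problem for (scale, offset) on each candidate set of codes, and keeping the candidate with the smallest squared error.

```python
def asymmetric_lsq_search(x, maxq, nstep=20, rmin=-1.0, rdelta=0.1):
    """Illustrative only: fit x ~= scale * q + offset with q in [0, maxq]."""
    lo, hi = min(min(x), 0.0), max(max(x), 0.0)  # range always contains zero
    if hi == lo:
        return 1.0, 0.0, 0.0  # all-zero group: nothing to fit

    def sq_error(scale, offset, codes):
        return sum((v - (scale * c + offset)) ** 2 for v, c in zip(x, codes))

    def quantize(iscale):
        return [min(maxq, max(0, round(iscale * (v - lo)))) for v in x]

    # Baseline: plain round-to-nearest over [lo, hi].
    scale, offset = (hi - lo) / maxq, lo
    best = (scale, offset, sq_error(scale, offset, quantize(1.0 / scale)))

    n, sum_x = len(x), sum(x)
    for i in range(nstep + 1):
        # Perturbed inverse scale (assumed sweep shape, not taken from the PR).
        codes = quantize((rmin + rdelta * i + maxq) / (hi - lo))
        sum_q = sum(codes)
        sum_q2 = sum(c * c for c in codes)
        sum_xq = sum(v * c for v, c in zip(x, codes))
        det = n * sum_q2 - sum_q * sum_q
        if det <= 0:
            continue  # all codes equal: the 2-D system is singular
        # Closed-form solution of the 2-D least-squares normal equations.
        scale = (n * sum_xq - sum_x * sum_q) / det
        offset = (sum_q2 * sum_x - sum_q * sum_xq) / det
        if offset > 0.0:
            # Keep offset <= 0 and fall back to the 1-D fit through the origin.
            offset, scale = 0.0, sum_xq / sum_q2
        err = sq_error(scale, offset, codes)
        if err < best[2]:
            best = (scale, offset, err)
    return best
```

Because the round-to-nearest baseline is the starting candidate, the returned error in this sketch is never worse than plain RTN for the same group.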

Outputs (scales, zero_points) are shaped to match WeightQuantizer.find_qparams. The search is parameterized by (maxq, minq), pulled directly from each module's WeightQuantizer (via get_maxq_minq), so that the search and the finalize step agree on the integer range. The pass uses the shared prepare_model/finalize plumbing so embeddings and lm_head can be quantized and retied like the other PyTorch quant passes (RTN, GPTQ).

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

Release note: New KQuant PyTorch pass for weight-only K-quant quantization (asymmetric + symmetric, 2/4/8-bit). Rtn and KQuant now also advertise uint2/int2 precisions.

(Optional) Issue link

Add KQuant pass under olive.passes.pytorch.kquant that implements the
ggml K-quant weight-only search (the asymmetric make_qkx2_quants and the
symmetric make_qx_quants variants) using a unified driver. It produces
per-group scale and zero point compatible with WeightQuantizer, supports
both asymmetric and symmetric quantization, and reuses the shared
prepare_model/finalize plumbing so embeddings and lm_head can be quantized
and retied like the other PyTorch quant passes.
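For the symmetric variant mentioned in this commit message, a minimal single-group sketch in pure Python might look like the following. The function name `symmetric_scale_search`, the `bits` parameter, and the sweep settings `nstep` and `step` are assumptions for illustration; the actual pass takes its integer range from WeightQuantizer and is not reproduced here. Codes are kept centered around zero, which corresponds to a fixed zero point at the middle of the unsigned range, and the scale for each candidate is the 1-D least-squares ratio sumlx/suml2.

```python
def symmetric_scale_search(x, bits, nstep=9, step=0.1):
    """Illustrative only: fit x ~= scale * l with centered codes l (bits >= 2)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # e.g. [-8, 7] for 4-bit
    amax = max(abs(v) for v in x)
    if amax == 0.0:
        return 1.0, 0.0  # all-zero group: any positive scale is exact

    best_scale, best_err = None, float("inf")
    for i in range(-nstep, nstep + 1):
        # Perturbed inverse scale around hi / max|x| (assumed sweep shape).
        iscale = (hi + step * i) / amax
        codes = [min(hi, max(lo, round(iscale * v))) for v in x]
        suml2 = sum(c * c for c in codes)
        if suml2 == 0:
            continue  # every code is zero: no scale can be fitted
        sumlx = sum(v * c for v, c in zip(x, codes))
        scale = sumlx / suml2  # 1-D least-squares optimum for these codes
        err = sum((v - scale * c) ** 2 for v, c in zip(x, codes))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

For example, a 4-bit group whose values are exact multiples of 0.1 in [-0.8, 0.7] is recovered with a scale of about 0.1 and near-zero error, because one of the perturbed candidates maps every value onto its own integer code.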

Also opt in to 2-bit precisions (uint2/int2) for both KQuant and Rtn in
olive_config.json; the underlying WeightQuantizer and pack/unpack helpers
already support 2-bit, so only the registration was missing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 28, 2026 22:26

Copilot AI left a comment

Pull request overview

Adds a new PyTorch-native KQuant pass for Hugging Face/PyTorch model weight-only quantization, using shared Olive quantization preparation/finalization flow and extending pass metadata to advertise 2-bit precisions.

Changes:

  • Introduces olive.passes.pytorch.kquant with K-quant qparam search and pass implementation.
  • Registers KQuant in olive_config.json and adds 2-bit precision support to Rtn.
  • Adds PyTorch unit tests for KQuant qparam quality, pass execution, overrides, embeddings, and lm_head composition.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| olive/passes/pytorch/kquant.py | Implements KQuant search, qparam generation, config validation, and pass execution. |
| olive/olive_config.json | Registers KQuant and updates Rtn supported precisions. |
| test/passes/pytorch/test_kquant.py | Adds coverage for KQuant qparams and end-to-end pass behavior. |

jambayk and others added 3 commits May 28, 2026 22:38
For asymmetric, clamp rmin <= 0 / rmax >= 0 and substitute the
RTN-style (-1, 1) sentinel for all-zero groups so the normalizer is
never zero. For symmetric, replace a zero normalizer with 1 (data is
all zero anyway). Mirrors WeightQuantizer.find_qparams and ggml's
make_qkx2_quants min<=0 clamp. Also updates the class docstring to
mention 2-bit support.
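
The guards described in this commit message can be sketched as two small helpers. The names `safe_asymmetric_range` and `safe_symmetric_normalizer` are hypothetical and operate on plain floats for one group; the code in the PR presumably applies the same logic to whole tensors.

```python
def safe_asymmetric_range(rmin, rmax):
    """Illustrative only: make the (min, max) range of a group safe to divide by."""
    # Clamp so the range always contains zero (rmin <= 0 <= rmax).
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmin == 0.0 and rmax == 0.0:
        # All-zero group: use the (-1, 1) sentinel so rmax - rmin is never zero.
        return -1.0, 1.0
    return rmin, rmax


def safe_symmetric_normalizer(amax):
    """Illustrative only: replace a zero max|x| with 1 (the data is all zero anyway)."""
    return 1.0 if amax == 0.0 else amax
```

With these guards, a group such as (2.0, 5.0) is widened to (0.0, 5.0), and an all-zero group yields a nonzero range, so the later division by the normalizer is always defined.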

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jambayk jambayk merged commit cd47ebb into main May 29, 2026
12 checks passed
@jambayk jambayk deleted the jambayk/py-kquant branch May 29, 2026 18:47