Add PyTorch-native K-quant pass by jambayk · Pull Request #2479 · microsoft/Olive

jambayk · 2026-05-28T22:26:05Z

Describe your changes

Add KQuant pass under olive.passes.pytorch.kquant that implements the ggml K-quant weight-only search in PyTorch. A unified _kquant_search driver covers both variants:

Asymmetric (analogue of ggml's make_qkx2_quants used by Q2_K/Q4_K/Q5_K): tracks per-group (min, max), refines (scale, offset) via 2-D LSQ over a sweep of perturbed iscale factors.
Symmetric (analogue of ggml's make_qx_quants used by Q3_K/Q6_K): uses max|x| as the normalizer, fixes the zero point at midq, and refines the scale via 1-D LSQ (sumlx/suml2).
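The asymmetric search can be sketched in vectorized PyTorch. This is a minimal illustration under stated assumptions, not the pass's `_kquant_search`: each row is one group, every element gets unit weight (ggml weights the error by a per-element importance term), and rmin = -1, rdelta = 0.1, nstep = 20 are illustrative sweep settings. `kquant_asym_search` and `fake_quant_asym` are hypothetical names. The sketch returns an additive offset (ggml's "min") with x ≈ scale * q + offset; converting that to the (scales, zero_points) layout of WeightQuantizer.find_qparams is left out.

```python
import torch


def kquant_asym_search(x: torch.Tensor, maxq: int = 15, rmin: float = -1.0,
                       rdelta: float = 0.1, nstep: int = 20):
    """Hypothetical make_qkx2_quants-style search; each row of x is one group.

    Returns (scale, offset), each shaped (num_groups, 1), so that
    x ~= scale * q + offset with integer codes q in [0, maxq].
    """
    x = x.float()
    n = x.shape[1]
    # Per-group range, clamped so that xmin <= 0 <= xmax.
    xmin = x.amin(dim=1, keepdim=True).clamp(max=0)
    xmax = x.amax(dim=1, keepdim=True).clamp(min=0)
    span = (xmax - xmin).clamp(min=1e-12)  # guard against all-zero groups
    sum_x = x.sum(dim=1, keepdim=True)

    def codes(iscale):
        # Round relative to the group minimum and clamp to the integer range.
        return torch.clamp(torch.round(iscale * (x - xmin)), 0, maxq)

    def sq_err(scale, offset, q):
        return ((scale * q + offset - x) ** 2).sum(dim=1, keepdim=True)

    # Starting point: plain min/max (RTN-style) parameters.
    best_scale = span / maxq
    best_offset = xmin.clone()
    best_err = sq_err(best_scale, best_offset, codes(maxq / span))

    for step in range(nstep + 1):
        # Sweep of perturbed inverse scales around maxq / span.
        q = codes((rmin + rdelta * step + maxq) / span)
        sum_l = q.sum(dim=1, keepdim=True)
        sum_l2 = (q * q).sum(dim=1, keepdim=True)
        sum_xl = (q * x).sum(dim=1, keepdim=True)
        # 2-D least squares for (scale, offset) given the codes q.
        det = n * sum_l2 - sum_l * sum_l
        safe_det = torch.where(det > 0, det, torch.ones_like(det))
        scale = (n * sum_xl - sum_x * sum_l) / safe_det
        offset = (sum_l2 * sum_x - sum_l * sum_xl) / safe_det
        # Keep offset <= 0; refit the scale alone where it went positive.
        pos = offset > 0
        scale = torch.where(pos, sum_xl / sum_l2.clamp(min=1e-12), scale)
        offset = torch.where(pos, torch.zeros_like(offset), offset)
        # Accept the candidate only where it lowers the squared error.
        err = sq_err(scale, offset, q)
        better = (det > 0) & (err < best_err)
        best_scale = torch.where(better, scale, best_scale)
        best_offset = torch.where(better, offset, best_offset)
        best_err = torch.where(better, err, best_err)

    return best_scale, best_offset


def fake_quant_asym(x, scale, offset, maxq):
    """Quantize with the searched parameters, then dequantize again."""
    q = torch.clamp(torch.round((x - offset) / scale), 0, maxq)
    return scale * q + offset
```

Because a candidate is kept only when it lowers the squared error, and the starting point is the plain min/max fit, the searched parameters never do worse than RTN on that metric. With importance weights, the sums above would simply become weighted sums.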

Outputs (scales, zero_points) shaped to match WeightQuantizer.find_qparams, with the search parameterized by (maxq, minq) pulled directly from each module's WeightQuantizer (via get_maxq_minq) so the algorithm and finalize agree on the integer range. Uses the shared prepare_model/finalize plumbing so embeddings and lm_head can be quantized and retied like the other PyTorch quant passes (RTN, GPTQ).
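The symmetric variant, parameterized by (maxq, minq) as described above, can be sketched the same way. Assumptions: midq = (maxq + minq + 1) // 2 (8 for a 0..15 range), the starting inverse scale maps max|x| to the largest positive signed code, and the sweep covers ±0.9 in steps of 0.1 around it. `kquant_sym_search` and `fake_quant_sym` are hypothetical names, and the (num_groups, 1) output shape is an illustration rather than the exact layout find_qparams returns.

```python
import torch


def kquant_sym_search(x: torch.Tensor, maxq: int, minq: int,
                      nsweep: int = 9, step: float = 0.1):
    """Hypothetical make_qx_quants-style search; each row of x is one group.

    Returns (scale, zero_point), each shaped (num_groups, 1). The zero point
    is fixed at midq; only the scale is searched. Assumes nmax > nsweep * step.
    """
    x = x.float()
    midq = (maxq + minq + 1) // 2  # assumed definition of the midpoint code
    nmax = maxq - midq             # largest positive signed code
    amax = x.abs().amax(dim=1, keepdim=True)
    amax = torch.where(amax > 0, amax, torch.ones_like(amax))  # all-zero groups
    best_scale = amax / nmax       # RTN-style fallback
    best_score = torch.full_like(amax, -1.0)

    for k in range(-nsweep, nsweep + 1):
        iscale = (nmax + step * k) / amax
        l = torch.clamp(torch.round(iscale * x), minq - midq, maxq - midq)
        sumlx = (l * x).sum(dim=1, keepdim=True)
        suml2 = (l * l).sum(dim=1, keepdim=True)
        # For fixed codes l the best scale is sumlx / suml2, and the squared
        # error is then sum(x**2) - sumlx**2 / suml2: maximize the second term.
        valid = suml2 > 0
        safe_l2 = torch.where(valid, suml2, torch.ones_like(suml2))
        score = torch.where(valid, sumlx * sumlx / safe_l2,
                            torch.full_like(suml2, -1.0))
        better = score > best_score
        best_scale = torch.where(better, sumlx / safe_l2, best_scale)
        best_score = torch.where(better, score, best_score)

    zero_point = torch.full_like(best_scale, float(midq))
    return best_scale, zero_point


def fake_quant_sym(x, scale, zero_point, maxq, minq):
    """Quantize around the fixed zero point, then dequantize again."""
    q = torch.clamp(torch.round(x / scale) + zero_point, minq, maxq)
    return (q - zero_point) * scale
```

Ranking candidates by sumlx² / suml2 avoids recomputing the full error for each one: once the codes are fixed, the optimal scale has a closed form, so this ratio orders candidates exactly as the squared error would.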

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

Release note: New KQuant PyTorch pass for weight-only K-quant quantization (asymmetric + symmetric, 2/4/8-bit). Rtn and KQuant now also advertise uint2/int2 precisions.

(Optional) Issue link

Add KQuant pass under olive.passes.pytorch.kquant that implements the ggml K-quant weight-only search (the asymmetric make_qkx2_quants and the symmetric make_qx_quants variants) using a unified driver. It produces per-group scales and zero points compatible with WeightQuantizer, supports both asymmetric and symmetric quantization, and reuses the shared prepare_model/finalize plumbing so embeddings and lm_head can be quantized and retied like the other PyTorch quant passes. Also opts both KQuant and Rtn in to 2-bit precisions (uint2/int2) in olive_config.json; the underlying WeightQuantizer and pack/unpack helpers already support 2-bit, so only the registration was missing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
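For context on what 2-bit storage involves, here is one way four uint2 codes can share a byte. `pack_2bit` and `unpack_2bit` are hypothetical helpers, not Olive's pack/unpack functions, and the bit order (first code in the lowest bits) is an assumption; Olive's actual layout may differ.

```python
import torch


def pack_2bit(q: torch.Tensor) -> torch.Tensor:
    """Pack uint2 codes (values 0..3) along the last dim, four per byte."""
    assert q.shape[-1] % 4 == 0, "last dim must be a multiple of 4"
    q = q.to(torch.uint8).reshape(*q.shape[:-1], -1, 4)
    # Code i of each group of four lands in bits [2*i, 2*i + 1].
    return q[..., 0] | (q[..., 1] << 2) | (q[..., 2] << 4) | (q[..., 3] << 6)


def unpack_2bit(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_2bit: expand each byte back into four codes."""
    shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8, device=packed.device)
    q = (packed.unsqueeze(-1) >> shifts) & 0x3
    return q.reshape(*packed.shape[:-1], -1)
```

A signed int2 code in [-2, 1] could be stored the same way after adding an offset of 2; that offset is an illustrative choice.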

Copilot

Pull request overview

Adds a new PyTorch-native KQuant pass for weight-only quantization of Hugging Face/PyTorch models, using the shared Olive quantization preparation/finalization flow and extending pass metadata to advertise 2-bit precisions.

Changes:

Introduces olive.passes.pytorch.kquant with K-quant qparam search and pass implementation.
Registers KQuant in olive_config.json and adds 2-bit precision support to Rtn.
Adds PyTorch unit tests for KQuant qparam quality, pass execution, overrides, embeddings, and lm_head composition.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `olive/passes/pytorch/kquant.py` | Implements KQuant search, qparam generation, config validation, and pass execution. |
| `olive/olive_config.json` | Registers KQuant and updates Rtn supported precisions. |
| `test/passes/pytorch/test_kquant.py` | Adds coverage for KQuant qparams and end-to-end pass behavior. |

For the asymmetric variant, clamp rmin <= 0 and rmax >= 0, and substitute the RTN-style (-1, 1) sentinel for all-zero groups so the normalizer is never zero. For the symmetric variant, replace a zero normalizer with 1 (the data is all zero anyway). This mirrors WeightQuantizer.find_qparams and the min <= 0 clamp in ggml's make_qkx2_quants. Also updates the class docstring to mention 2-bit support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
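The degenerate-group guards described above come down to a few `torch.where` calls. A minimal sketch, assuming groups are laid out as rows; `asym_range` and `sym_normalizer` are hypothetical names, not functions from the pass.

```python
import torch


def asym_range(x: torch.Tensor):
    """Per-group (rmin, rmax) with the clamps and all-zero sentinel applied."""
    rmin = x.amin(dim=1, keepdim=True).clamp(max=0)  # rmin <= 0
    rmax = x.amax(dim=1, keepdim=True).clamp(min=0)  # rmax >= 0
    all_zero = (rmin == 0) & (rmax == 0)
    # RTN-style (-1, 1) sentinel so that rmax - rmin is never zero.
    rmin = torch.where(all_zero, torch.full_like(rmin, -1.0), rmin)
    rmax = torch.where(all_zero, torch.ones_like(rmax), rmax)
    return rmin, rmax


def sym_normalizer(x: torch.Tensor) -> torch.Tensor:
    """Per-group max|x|, with a zero normalizer replaced by 1."""
    amax = x.abs().amax(dim=1, keepdim=True)
    return torch.where(amax == 0, torch.ones_like(amax), amax)
```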

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 28, 2026 22:26

Copilot started reviewing on behalf of jambayk May 28, 2026 22:26

Copilot AI reviewed May 28, 2026


Comment threads (3) on olive/passes/pytorch/kquant.py, one marked outdated

jambayk and others added 3 commits May 28, 2026 22:38

remove int2/int4

36e4b52

Show tqdm progress bar over modules in KQuant pass

9270b6f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xiaoyu-work approved these changes May 29, 2026


jambayk merged commit cd47ebb into main May 29, 2026
12 checks passed

jambayk deleted the jambayk/py-kquant branch May 29, 2026 18:47

jambayk merged 4 commits into main from jambayk/py-kquant


3 participants
