First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.
Considering that this uses GGML and seems based directly on llama.cpp:

- Why is this a separate project from llama.cpp, given that llama.cpp already supports BitNet ternary quants (ggml-org/llama.cpp#8151)?
- Are these simply more optimised kernels?
- If so, how do they compare to llama.cpp's implementation?
- Can/should they be contributed back to llama.cpp?