We use sgemm to run matmuls for every query.
We can cut the memory footprint and bandwidth of those matmuls in half by supporting float16, which MKL and OpenBLAS both support.
Requirements:
- We need to be able to toggle this behavior, since we should expect some accuracy loss from the reduced precision.
- We need to decide where float16 computation is allowed. e.g. Are we casting our stored embeddings to float16 instead of keeping them in float32? At what point in the pipeline does that cast happen?
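A minimal sketch of one answer to the second requirement: cast the stored embeddings once at index-build time behind a toggle, so each query pays only a cheap query-vector cast. This is a NumPy illustration with hypothetical names (`build_index`, `score`); it shows the toggle and the halved storage, not the actual MKL/OpenBLAS float16 gemm path.

```python
import numpy as np

def build_index(embeddings: np.ndarray, use_f16: bool) -> np.ndarray:
    # Cast once when the index is built so per-query work stays cheap.
    return embeddings.astype(np.float16 if use_f16 else np.float32)

def score(index: np.ndarray, query: np.ndarray) -> np.ndarray:
    # Match the query dtype to the stored embeddings; the matmul then
    # runs in half precision when the toggle is on.
    return index @ query.astype(index.dtype)

rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)

idx32 = build_index(emb, use_f16=False)
idx16 = build_index(emb, use_f16=True)
s32 = score(idx32, q)
s16 = score(idx16, q)

print(idx16.nbytes, idx32.nbytes)   # float16 index is half the size
print(np.max(np.abs(s32 - s16.astype(np.float32))))  # precision gap
```

The max-absolute-difference printout is one way to quantify the accuracy loss the first requirement asks us to budget for before enabling the toggle by default.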