MB-66395: [v17] Add Request Batcher Module (for GPU indexes) #389
CascadingRadium wants to merge 27 commits into master
- Add latency-aware batching for vector search requests using a timer-driven coalescing queue.
- Merge compatible requests (same `k`) and execute them together to reduce the number of `Faiss` calls.
- Ensure bounded latency via per-request deadlines, configured through a package-level variable called `LatencyBudget`.
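For readers skimming the description, here is a minimal sketch of what timer-driven coalescing by `k` can look like. It is not the code in this PR: `coalescer`, `pendingSearch`, `batchFunc`, and `runDemo` are hypothetical names, the backend is a fake that echoes a tag from each query, and the budget is passed in as a field instead of being read from `LatencyBudget`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingSearch is a hypothetical stand-in for one queued search request.
type pendingSearch struct {
	query []float32
	done  chan []int64 // receives this request's result IDs
}

// batchFunc runs one merged search for queries that share k and returns
// one result slice per query, in the same order.
type batchFunc func(k int, queries [][]float32) [][]int64

// coalescer groups pending requests by k and flushes a group once the
// latency budget of its oldest request has elapsed.
type coalescer struct {
	mu      sync.Mutex
	budget  time.Duration
	pending map[int][]*pendingSearch
	run     batchFunc
}

func newCoalescer(budget time.Duration, run batchFunc) *coalescer {
	return &coalescer{budget: budget, pending: make(map[int][]*pendingSearch), run: run}
}

// submit enqueues a request and blocks until its batch has executed.
func (c *coalescer) submit(k int, query []float32) []int64 {
	req := &pendingSearch{query: query, done: make(chan []int64, 1)}
	c.mu.Lock()
	c.pending[k] = append(c.pending[k], req)
	if len(c.pending[k]) == 1 {
		// The first request for this k starts the deadline timer.
		time.AfterFunc(c.budget, func() { c.flush(k) })
	}
	c.mu.Unlock()
	return <-req.done
}

// flush executes everything queued for k as a single backend call.
func (c *coalescer) flush(k int) {
	c.mu.Lock()
	batch := c.pending[k]
	delete(c.pending, k)
	c.mu.Unlock()
	if len(batch) == 0 {
		return
	}
	queries := make([][]float32, len(batch))
	for i, r := range batch {
		queries[i] = r.query
	}
	results := c.run(k, queries)
	for i, r := range batch {
		r.done <- results[i]
	}
}

// runDemo submits four k=10 and two k=5 searches concurrently and reports
// the number of backend calls plus the first ID each request got back.
func runDemo() (int, []int64) {
	var mu sync.Mutex
	calls := 0
	fake := func(k int, queries [][]float32) [][]int64 {
		mu.Lock()
		calls++
		mu.Unlock()
		out := make([][]int64, len(queries))
		for i, q := range queries {
			out[i] = []int64{int64(q[0])} // echo the query tag as its "hit"
		}
		return out
	}
	c := newCoalescer(200*time.Millisecond, fake)
	ids := make([]int64, 6)
	var wg sync.WaitGroup
	for i := range ids {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			k := 10
			if i >= 4 {
				k = 5
			}
			ids[i] = c.submit(k, []float32{float32(i)})[0]
		}(i)
	}
	wg.Wait()
	return calls, ids
}

func main() {
	calls, ids := runDemo()
	fmt.Println("backend calls:", calls, "ids:", ids)
}
```

The sketch leaves out things a production batcher would likely need, such as a maximum batch size, cancellation, and error propagation to each waiting caller.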
This reverts commit 02cddd2.
Pull request overview
Adds a new latency-aware batching module intended to coalesce compatible Faiss vector search requests (same k) into fewer Faiss calls, while also introducing supporting vectorSet helpers and a small reset fix in vector index opaque state.
Changes:
- Introduces `requestBatcher`/`coalesceQueue` for timer-driven request coalescing and batched execution.
- Adds `vectorSet.clone()` and `vectorSet.mergeWith()` helpers to support request merging.
- Extends vector index interfaces with a `faissIndexBatch` abstraction and clears `vectorIndexOpaque.config` on reset.
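To illustrate why cloning and merging helpers are useful for request merging, here is a small sketch under stated assumptions. The real `vectorSet` layout is not shown in this page, so `vecBatch` below is a hypothetical stand-in that stores vectors row-major in a flat slice; its `clone` and `mergeWith` methods only mirror the helper names and may differ from the PR's signatures.

```go
package main

import "fmt"

// vecBatch is a hypothetical stand-in for vectorSet: count() vectors of
// dimension dim stored row-major in one flat slice.
type vecBatch struct {
	dim  int
	data []float32
}

// count returns the number of vectors held.
func (v *vecBatch) count() int {
	if v.dim == 0 {
		return 0
	}
	return len(v.data) / v.dim
}

// clone returns a deep copy so merging never mutates a caller's buffer.
func (v *vecBatch) clone() *vecBatch {
	return &vecBatch{dim: v.dim, data: append([]float32(nil), v.data...)}
}

// mergeWith appends other's vectors and returns the row offset at which
// they start, so batched results can later be split back per request.
func (v *vecBatch) mergeWith(other *vecBatch) (int, error) {
	if v.dim != other.dim {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", v.dim, other.dim)
	}
	offset := v.count()
	v.data = append(v.data, other.data...)
	return offset, nil
}

// mergeDemo merges a one-vector request with a two-vector request and
// returns the offset, the merged count, and the untouched original count.
func mergeDemo() (int, int, int) {
	a := &vecBatch{dim: 2, data: []float32{1, 2}}
	b := &vecBatch{dim: 2, data: []float32{3, 4, 5, 6}}
	merged := a.clone()
	offset, err := merged.mergeWith(b)
	if err != nil {
		panic(err)
	}
	return offset, merged.count(), a.count()
}

func main() {
	offset, mergedCount, origCount := mergeDemo()
	fmt.Println(offset, mergedCount, origCount) // prints "1 3 1"
}
```

Returning the offset is one possible design: it lets the batcher slice the combined result rows back into per-request results after a single search call.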
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| section_faiss_vector_index.go | Clears vectorIndexOpaque.config during Reset() to avoid retaining prior config state. |
| faiss_vector_wrapper.go | Adds vectorSet cloning/merging helpers used by batching. |
| faiss_vector_request_batcher.go | New batching/coalescing implementation for Faiss searches. |
| faiss_vector_request_batcher_test.go | Adds a benchmark harness and fake index for the batcher. |
| faiss_vector_index.go | Adds faissIndexBatch interface for batched search operations. |
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Exploring a self-tuning algorithm for higher QPS while batching. Converting to draft.
type requestBatcher struct {
	// the coalesce queue that manages the batching of incoming search requests.
	cq *coalesceQueue
}
Let's add some unit testing and/or benchmarks for how many requests we can execute in a given window of time (assuming a constant runtime for each batch).
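As a starting point for that kind of test, the expected numbers can be worked out with simple arithmetic. The sketch below assumes batches run back to back and that each batch takes the same time regardless of how many requests it carries; `requestsInWindow` and `throughputDemo` are hypothetical helpers, not part of this PR, and the 10 ms runtime and batch size of 16 are illustrative values only.

```go
package main

import (
	"fmt"
	"time"
)

// requestsInWindow estimates how many requests complete within window when
// batches run back to back, each batch takes batchRuntime regardless of its
// size, and each batch carries batchSize requests. Only whole batches count.
func requestsInWindow(window, batchRuntime time.Duration, batchSize int) int {
	if batchRuntime <= 0 || batchSize <= 0 {
		return 0
	}
	return int(window/batchRuntime) * batchSize
}

// throughputDemo compares one request per call with 16 requests per call
// for a 1 s window and a constant 10 ms runtime per call.
func throughputDemo() (int, int) {
	window := time.Second
	perBatch := 10 * time.Millisecond
	return requestsInWindow(window, perBatch, 1), requestsInWindow(window, perBatch, 16)
}

func main() {
	unbatched, batched := throughputDemo()
	fmt.Println("unbatched:", unbatched, "batched (16 per call):", batched)
	// prints "unbatched: 100 batched (16 per call): 1600"
}
```

A test against a fake index with a fixed sleep per call could then check that the measured request count lands near this estimate, minus whatever time requests spend waiting for the latency budget to expire.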
}

// Interface for batched search operations on Faiss vector indices.
type faissIndexBatch interface {
to combine compatible requests and execute them together to reduce the number of Faiss calls.