MB-66395: [v17] Add Request Batcher Module (for GPU indexes) #389

Open
CascadingRadium wants to merge 27 commits into master from batcher

Member

@CascadingRadium CascadingRadium commented Mar 31, 2026

  • Add latency-aware batching for vector search requests: a coalescing queue
    combines compatible requests and executes them together, reducing the
    number of Faiss calls.
  • Ensure adaptive batching by using Nagle's algorithm for maximum efficiency.
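
For readers unfamiliar with the Nagle-style idea mentioned above, here is a minimal sketch of one way to read it: dispatch immediately when nothing is in flight, otherwise buffer until the in-flight batch completes. `nagleBatcher`, `submit`, `complete` and `dispatch` are hypothetical names for illustration and are not taken from this PR; the model is single-goroutine and does no locking.

```go
package main

// nagleBatcher is a hypothetical, single-goroutine model of Nagle-style
// batching. It is NOT the PR's implementation and is not safe for
// concurrent use.
type nagleBatcher struct {
	inFlight bool              // true while a dispatched batch has not completed
	pending  []int             // IDs of requests buffered behind the in-flight batch
	dispatch func(batch []int) // assumed callback that executes one batch
}

// submit sends the request right away when the backend is idle,
// otherwise it buffers the request for the next batch.
func (b *nagleBatcher) submit(id int) {
	if !b.inFlight {
		b.inFlight = true
		b.dispatch([]int{id})
		return
	}
	b.pending = append(b.pending, id)
}

// complete must be called when the in-flight batch finishes. Everything
// buffered in the meantime is dispatched as a single batch.
func (b *nagleBatcher) complete() {
	if len(b.pending) == 0 {
		b.inFlight = false
		return
	}
	batch := b.pending
	b.pending = nil
	b.dispatch(batch) // a new batch is now in flight
}
```

Under light load every request goes out alone with no added delay; under heavy load the batch size grows with the backlog, which is the adaptive behaviour the bullet describes.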

- Add latency-aware batching for vector search requests using a
timer-driven coalescing queue.
- Merge compatible requests (same k) and execute them together to reduce
the number of `Faiss` calls.
- Ensure bounded latency via per-request deadlines using a package-level
variable called `LatencyBudget`.
@CascadingRadium CascadingRadium changed the base branch from master to merge_fastmerge March 31, 2026 10:08
Base automatically changed from merge_fastmerge to master April 1, 2026 09:28
Contributor

Copilot AI left a comment

Pull request overview

Adds a new latency-aware batching module intended to coalesce compatible Faiss vector search requests (same k) into fewer Faiss calls, while also introducing supporting vectorSet helpers and a small reset fix in vector index opaque state.

Changes:

  • Introduces requestBatcher / coalesceQueue for timer-driven request coalescing and batched execution.
  • Adds vectorSet.clone() and vectorSet.mergeWith() helpers to support request merging.
  • Extends vector index interfaces with a faissIndexBatch abstraction and clears vectorIndexOpaque.config on reset.
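
To illustrate what clone and merge helpers of this kind typically need to guarantee, here is a sketch over an assumed layout. The real `vectorSet` fields and the real signatures of `clone()` and `mergeWith()` are not shown on this page, so everything below (flat row-major `data`, `dims`, the returned row offset, the error) is an assumption for illustration only.

```go
package main

import "errors"

// vectorSet here is a hypothetical layout: n vectors of dims float32
// values each, stored row-major in one flat slice. dims is assumed > 0.
type vectorSet struct {
	dims int
	data []float32
}

// count returns the number of vectors held.
func (v *vectorSet) count() int { return len(v.data) / v.dims }

// clone returns a deep copy so that merging into the copy cannot
// mutate the original request's vectors.
func (v *vectorSet) clone() *vectorSet {
	return &vectorSet{dims: v.dims, data: append([]float32(nil), v.data...)}
}

// mergeWith appends other's vectors to v and returns the row offset at
// which they start, so batched results can be split back per request.
func (v *vectorSet) mergeWith(other *vectorSet) (int, error) {
	if other.dims != v.dims {
		return 0, errors.New("vectorSet: dimension mismatch")
	}
	offset := v.count()
	v.data = append(v.data, other.data...)
	return offset, nil
}
```

Returning the offset is one possible design: the batcher can record it per request and later slice the combined result rows back to each caller.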

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `section_faiss_vector_index.go` | Clears `vectorIndexOpaque.config` during `Reset()` to avoid retaining prior config state. |
| `faiss_vector_wrapper.go` | Adds `vectorSet` cloning/merging helpers used by batching. |
| `faiss_vector_request_batcher.go` | New batching/coalescing implementation for Faiss searches. |
| `faiss_vector_request_batcher_test.go` | Adds a benchmark harness and fake index for the batcher. |
| `faiss_vector_index.go` | Adds `faissIndexBatch` interface for batched search operations. |

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

CascadingRadium and others added 3 commits April 2, 2026 03:29
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

@CascadingRadium
Member Author

Exploring a self-tuning algorithm for higher QPS while batching. Converting to draft.

@CascadingRadium CascadingRadium marked this pull request as draft April 9, 2026 13:12
@CascadingRadium CascadingRadium marked this pull request as ready for review April 10, 2026 12:26
@abhinavdangeti abhinavdangeti changed the title [v17] Add Request Batcher Module MB-66395: [v17] Add Request Batcher Module Apr 10, 2026
@abhinavdangeti abhinavdangeti changed the title MB-66395: [v17] Add Request Batcher Module MB-66395: [v17] Add Request Batcher Module (for GPU indexes) Apr 10, 2026
type requestBatcher struct {
	// the coalesce queue that manages the batching of incoming search requests.
	cq *coalesceQueue
}
Member

Let's add some unit testing and/or benchmarks for how many requests we can execute in a given window of time (assuming a constant runtime for each batch).

@CascadingRadium CascadingRadium self-assigned this Apr 10, 2026
}

// Interface for batched search operations on Faiss vector indices.
type faissIndexBatch interface {
Member

nit: faissQueryBatch

@CascadingRadium CascadingRadium moved this from Todo to In Progress in GPU-Accelerated Vector Search Apr 14, 2026
5 participants