Skip to content

hnsw#7

Merged
PhantomInTheWire merged 38 commits into
masterfrom
codex/review-slices
Mar 22, 2026
Merged

hnsw#7
PhantomInTheWire merged 38 commits into
masterfrom
codex/review-slices

Conversation

@PhantomInTheWire

@PhantomInTheWire PhantomInTheWire commented Mar 22, 2026

Copy link
Copy Markdown
Owner

Open with Devin

Summary by CodeRabbit

  • New Features

    • Full HNSW vector index support with configurable hyperparameters and on-disk HNSW sidecar persistence.
  • Improvements

    • Query planner/execution now picks search strategy from collection index configuration (flat vs HNSW).
    • Collection reopen validates persisted schema/state and rejects invalid manifests.
    • Checkpoints and segment storage now include and manage HNSW sidecars alongside flat indexes.
    • CLI/creation defaults updated to the new multi-index vector schema representation.
  • Tests

    • Expanded tests for HNSW build/search, sidecar persistence, index lifecycle, and query-mode parity.

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds full HNSW vector-index support and persistence: typed HNSW params/graph types, HNSW build/search, segment-sidecar encode/decode, planner/engine search-plan dispatch (Flat vs HNSW with ef_search), storage codec/layout changes for VectorIndexState, segment refactor, and many tests and CLI defaults updated.

Changes

Cohort / File(s) Summary
HNSW Core & Tests
crates/garuda-index-hnsw/src/lib.rs, crates/garuda-index-hnsw/src/{build.rs,search.rs,compare.rs,heap.rs,search_tests.rs}, crates/garuda-index-hnsw/tests/*, crates/garuda-index-hnsw/Cargo.toml
Implements HNSW index API, deterministic level sampling, node insertion/pruning, search (entry point & layer), comparators/heaps, unit/integration tests, and adds garuda-math dependency.
Types & HNSW Params
crates/garuda-types/src/{hnsw.rs,schema.rs,query.rs,ids.rs,status.rs}, crates/garuda-types/src/lib.rs
Introduces HNSW graph/types, validated wrapper types (HnswM, Ef, etc.), VectorIndexState enum and schema changes, Query/Doc/DenseVector types, IDs and Status, and rearranges lib exports.
Segment subsystem (split + indexing)
crates/garuda-segment/src/{lib.rs,types.rs,codec.rs,index.rs,search.rs,storage.rs}, crates/garuda-segment/tests/*
Modularizes segment code, adds SegmentFile/StoredRecord/RecordState, encodes/decodes HNSW graphs, builds/loads flat & HNSW indices, search entry points, sidecar handling, and sidecar-focused tests.
Engine & Planner integration
crates/garuda-engine/src/{lib.rs,query.rs,schema_ddl.rs,segment_manager.rs,recovery_service.rs,state.rs,segment_ddl.rs,checkpoint_service.rs,schema.rs}, crates/garuda-engine/tests/*, crates/garuda-engine/tests/common.rs, garuda/src/main.rs
Threads vector_field schema through segment lifecycle, makes index enable/drop fallible (enable_vector_index/drop_vector_index), calls state.rebuild_indexes() after schema changes, updates query execution to use SegmentSearchPlan (Flat/Hnsw) and propagate ef_search/top_k, and updates tests/CLI defaults.
Planner API changes
crates/garuda-planner/src/lib.rs, crates/garuda-planner/tests/*
Replaces ExactFlat execution model with SegmentSearchPlan, adds search field to QueryPlan, and changes build_query_plan signature to accept &VectorIndexState for ef_search selection.
Storage codec & layout
crates/garuda-storage/src/{codec.rs,layout.rs,lib.rs}
Switches persisted vector-index encoding to VectorIndexState with six HNSW fields, adds HNSW sidecar filename/path helpers and re-exports, adjusts encode/decode helpers.
Segment manager & state plumbing
crates/garuda-engine/src/segment_manager.rs, crates/garuda-engine/src/state.rs, crates/garuda-engine/src/recovery_service.rs
Passes &VectorFieldSchema into writing-segment creation/append/optimize, replaces free-sync helpers with SegmentFile methods (e.g., sync_meta), and ensures persisted segments are marked/synced correctly.
Engine/segment search dispatch
crates/garuda-engine/src/query.rs, crates/garuda-segment/src/search.rs, crates/garuda-segment/src/index.rs
Query execution now builds plans with vector index state, dispatches per-segment searches to search_flat or search_hnsw based on plan, derives per-segment top_k/ef_search correctly, and loads/builds indexes per segment.
Tests & validation
multiple crates/garuda-engine/tests/*, crates/garuda-segment/tests/*, crates/garuda-planner/tests/*
Adds many integration/unit tests for HNSW parameter validation, index lifecycle, sidecar persistence/removal, query parity, planner ef_search override, and manifest/schema validation. Areas to scrutinize: manifest schema validation, sidecar lifecycle assertions, and ef_search/top_k propagation.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Engine as Engine
    participant Planner as Planner
    participant SegMgr as SegmentManager
    participant Segment as SegmentFile
    participant Flat as FlatIndex
    participant Hnsw as HnswIndex
    participant Results as Results

    Client->>Engine: Send VectorQuery (+ optional ef_search)
    Engine->>Planner: build_query_plan(query, filter, &indexes)
    Planner-->>Engine: QueryPlan { source, filter, top_k, search }
    Engine->>SegMgr: iterate segments, prepare per-segment request
    SegMgr->>Segment: prepare search (top_k, filter, ef_search)
    alt search == Flat
        Segment->>Flat: search_flat(request)
        Flat-->>Segment: flat hits
    else search == Hnsw
        Segment->>Hnsw: search_hnsw(request with ef_search)
        Hnsw->>Hnsw: select_entry_point
        Hnsw->>Hnsw: search_layer (greedy exploration)
        Hnsw-->>Segment: hnsw hits
    end
    Segment-->>SegMgr: segment results
    SegMgr-->>Engine: aggregate segment results
    Engine->>Engine: merge, truncate to plan.top_k
    Engine-->>Client: final top-k Results
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Refactor/cleanup #5: Overlapping planner ↔ engine and index-state changes—touches query planning, engine dispatch, and schema/index lifecycle.
  • Feat/storage #4: Related segment/record representation and meta-sync changes (StoredRecord, sync_meta) that intersect this diff’s segment refactor.
  • Add zvec-style binary storage runtime for GarudaDB #3: Related storage/recovery and manifest/schema validation work affecting persisted manifest validation and recovery paths.

Poem

🐰 I hopped through graphs with tiny M,

neighbors pruned and levels blend—
sidecars snug upon the ground,
ef_search guides each graceful bound,
small-world hops bring vectors friend to friend.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'hnsw' is too vague and generic; it does not convey meaningful information about the changeset's primary purpose or scope. Use a more descriptive title that summarizes the main change, such as 'Implement HNSW index support with vector search integration' or reference the commit message theme: 'Implement HNSW vector indexing and improve graph persistence.'
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/review-slices

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@PhantomInTheWire PhantomInTheWire force-pushed the codex/review-slices branch 4 times, most recently from 6ef03ab to b2ab57f Compare March 22, 2026 18:50

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Checkpoint rollback does not capture HNSW sidecar files, risking data corruption on failed checkpoint (crates/garuda-engine/src/checkpoint_service.rs:147-155)

This PR introduces persisted HNSW graph sidecar files (written by write_segment at crates/garuda-segment/src/lib.rs:273-277), but the checkpoint rollback mechanism in capture_checkpoint_files (crates/garuda-engine/src/checkpoint_service.rs:144-183) was not updated to back up these new files. It captures segment_flat_index_path for each segment but never captures segment_hnsw_index_path. If a checkpoint fails after write_all_segments writes HNSW sidecars but before the manifest is committed, the rollback restores segment data and flat sidecars but leaves behind stale/corrupt HNSW sidecar files. On the next reopen, the manifest points to the old state, but the HNSW sidecar may correspond to the failed checkpoint's state, causing a mismatch between segment records and the HNSW graph index.

View 4 additional findings in Devin Review.

Open in Devin Review

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/garuda-storage/src/codec.rs (1)

9-12: ⚠️ Potential issue | 🟠 Major

Wire format change from 3 to 6 fields without version bump breaks backward compatibility with old HNSW manifests.

The HNSW parameter encoding changed from 3 fields (m, ef_construction, ef_search) to 6 fields (max_neighbors, scaling_factor, ef_construction, prune_width, min_neighbor_count, ef_search), but FORMAT_VERSION remained at 1. Any manifests persisted with the old 3-field format will fail to decode because the decoder expects 6 u64 values.

If HNSW is newly introduced and no prior manifests exist, this is acceptable. Otherwise, consider:

  1. Bumping FORMAT_VERSION to 2 with a migration path, or
  2. Adding a sub-version/field count marker within the HNSW tag to handle both layouts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-storage/src/codec.rs` around lines 9 - 12, The FORMAT_VERSION
constant was not bumped when the HNSW wire layout changed, causing older 3-field
manifests to break because the decoder now expects 6 u64s; update FORMAT_VERSION
from 1 to 2 and implement a backwards-compatible decode path: in the HNSW
parameter decode logic (where HNSW params are read/parsed) detect the stored
field count or attempt to read 3 u64s first and fall back to 6 u64s (or add an
explicit sub-version/field-count marker in the HNSW tag) so both old (m,
ef_construction, ef_search) and new (max_neighbors, scaling_factor,
ef_construction, prune_width, min_neighbor_count, ef_search) layouts are
supported during migration.
🧹 Nitpick comments (1)
crates/garuda-segment/tests/test_hnsw_sidecar.rs (1)

101-140: Consider consolidating test helpers.

The helper functions temp_root, stored_record, and vector_field are duplicated between test_flat_sidecar.rs and test_hnsw_sidecar.rs. Consider extracting them into a shared common module within the test crate to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-segment/tests/test_hnsw_sidecar.rs` around lines 101 - 140, The
three test helper functions temp_root, stored_record, and vector_field are
duplicated; extract them into a shared test helper module (e.g., tests/common)
and have both test files import that module. Move the implementations of
temp_root, stored_record (which constructs StoredRecord/Doc/DocId/InternalDocId
and uses garuda_segment::RecordState), and vector_field (which returns
VectorFieldSchema and references FieldName, VectorDimension, DistanceMetric,
IndexParams) into the common module, export them (pub), and replace the local
definitions in test_hnsw_sidecar.rs and test_flat_sidecar.rs with uses like
common::temp_root, common::stored_record, and common::vector_field (or add mod
common and call directly) so both tests share the same helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/garuda-engine/src/query.rs`:
- Around line 178-194: Both functions currently try to move plan.search out of a
borrowed &QueryPlan; change their matches to borrow the enum instead. In
segment_top_k, replace "match plan.search" with "match &plan.search" and match
on "&SegmentSearchPlan::Flat" and "&SegmentSearchPlan::Hnsw { .. }" (returning
Ok(plan.top_k) unchanged). In segment_index_search, use "match &plan.search" and
patterns "&SegmentSearchPlan::Flat" and "&SegmentSearchPlan::Hnsw { ef_search }"
and, if ef_search is a reference, use *ef_search when constructing
SegmentIndexSearch::Hnsw { ef_search } so you don't move out of plan.search;
this preserves the borrow and fixes the compile error in segment_top_k and
segment_index_search.

In `@crates/garuda-index-hnsw/src/build.rs`:
- Around line 26-35: search_layer returns the retained WorstScoredNode heap as a
worst-first Vec, so using candidates.first() selects the weakest retained node;
change the selection to pick the best retained candidate (e.g., use
candidates.last() or select the element with the maximum score) when setting
entry_point so the next descent uses the strongest retained node (update the
code around search_layer, candidates, and entry_point to pick the max-scored
WorstScoredNode rather than the first element).

In `@crates/garuda-index-hnsw/src/lib.rs`:
- Around line 170-180: The public constructor from_parts currently trusts its
inputs; add validation at the API boundary by making from_parts return a Result
(or convert it to a non-public ctor) and check that graph.node_count() ==
entries.len() and that every HnswBuildEntry vector length equals
config.dimension before constructing Self; if a check fails, return an Err
describing the mismatch so downstream methods like entry(), vector(), and
score_doc() cannot panic or observe mismatched dimensions.

In `@crates/garuda-index-hnsw/src/search.rs`:
- Around line 149-163: The replacement logic when results is full uses only
score >= current_worst.score, which can evict the preferred smaller doc_id on
ties; change the gate so that instead of comparing scores only you use the same
compound ordering used by top_hits (the compare_score_then_doc_id comparator) to
decide whether scored_neighbor should replace the current worst (i.e., use
compare_score_then_doc_id between scored_neighbor and the peeked WorstScoredNode
to allow replacement only when the comparator says scored_neighbor is strictly
better); update the block that builds candidates/results (references:
scored_node, results, candidate_limit, WorstScoredNode::from) and add a
tied-score regression test that covers equal-score doc_id tie-breaking.

In `@crates/garuda-segment/src/lib.rs`:
- Around line 150-152: mark_persisted currently only sets self.kind =
SegmentKind::Persisted which leaves self.vector_search stale and causes
write_segment to treat the segment as still-writing (removing persisted
sidecars); update mark_persisted to also rebuild or clear vector_search
atomically: either change its signature to accept the vector_field (e.g.,
&VectorField or field id) and rebuild self.vector_search for the persisted form,
or (if vector_field is available on self) construct the persisted vector search
index and assign it to self.vector_search inside mark_persisted so the
SegmentKind and vector_search transition together; reference the mark_persisted
method, SegmentKind, vector_search, write_segment, and vector_field when making
this change.
- Around line 358-382: The match arms that call unreachable!() on plan/index
mismatches (the arms matching VectorSearchState::WritingFlat with
SegmentIndexSearch::Hnsw, VectorSearchState::WritingHnsw with
SegmentIndexSearch::Flat, VectorSearchState::UseFlatIndex with
SegmentIndexSearch::Hnsw, and VectorSearchState::UseHnswIndex with
SegmentIndexSearch::Flat) must return a Status error instead of panicking;
replace each unreachable!() with returning
Err(Status::err(StatusCode::FailedPrecondition, format!(...))) (include a short
message stating the actual VectorSearchState variant and the requested
SegmentIndexSearch variant for context), so the surrounding code that assigns
hits (which uses index.search and search_hnsw) continues to use the Result flow
and the ? operator can propagate the error; reference the VectorSearchState and
SegmentIndexSearch enum variants, the request.index usage, and the functions
index.search and search_hnsw when implementing the messages.
- Around line 298-311: The reload path always calls load_vector_search_state()
which fails for empty persisted segments because write_segment() removes
sidecars when segment.meta.doc_count == 0; change the logic in the reopen branch
(where you currently call load_vector_search_state(root, meta.id, &segment_meta,
&decoded.records, vector_field)) to check if segment_meta.doc_count == 0 and, if
so, construct an in-memory empty index by calling
build_vector_search_state(kind, vector_field, &segment_meta, &decoded.records)
instead of load_vector_search_state; this mirrors write_segment's behavior and
avoids requiring sidecars for empty persisted segments.

In `@crates/garuda-types/src/lib.rs`:
- Around line 679-692: The current loop only checks adjacency length; extend
validation inside the for (level, nodes) in self.levels.iter().enumerate() loop
(the code working with self.levels, neighbor_limits.for_level(), and returning
Status::err) to also reject malformed neighbors: for each neighbor id in
neighbors ensure the id < entry_count, ensure the neighbor actually appears on
the inspected level (or its assigned level <= inspected level depending on
invariant), and ensure no edge points to a node with a higher level than the
source node (i.e., neighbor_level <= node_level); if any of these checks fail
return a Status::err(StatusCode::Internal, ...) similar to the existing message
so HnswGraph::from_parts() will reject stale/malformed sidecars before later
dereferences.

---

Outside diff comments:
In `@crates/garuda-storage/src/codec.rs`:
- Around line 9-12: The FORMAT_VERSION constant was not bumped when the HNSW
wire layout changed, causing older 3-field manifests to break because the
decoder now expects 6 u64s; update FORMAT_VERSION from 1 to 2 and implement a
backwards-compatible decode path: in the HNSW parameter decode logic (where HNSW
params are read/parsed) detect the stored field count or attempt to read 3 u64s
first and fall back to 6 u64s (or add an explicit sub-version/field-count marker
in the HNSW tag) so both old (m, ef_construction, ef_search) and new
(max_neighbors, scaling_factor, ef_construction, prune_width,
min_neighbor_count, ef_search) layouts are supported during migration.

---

Nitpick comments:
In `@crates/garuda-segment/tests/test_hnsw_sidecar.rs`:
- Around line 101-140: The three test helper functions temp_root, stored_record,
and vector_field are duplicated; extract them into a shared test helper module
(e.g., tests/common) and have both test files import that module. Move the
implementations of temp_root, stored_record (which constructs
StoredRecord/Doc/DocId/InternalDocId and uses garuda_segment::RecordState), and
vector_field (which returns VectorFieldSchema and references FieldName,
VectorDimension, DistanceMetric, IndexParams) into the common module, export
them (pub), and replace the local definitions in test_hnsw_sidecar.rs and
test_flat_sidecar.rs with uses like common::temp_root, common::stored_record,
and common::vector_field (or add mod common and call directly) so both tests
share the same helpers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69029d51-ea97-43c4-867c-7182cd1777a1

📥 Commits

Reviewing files that changed from the base of the PR and between a186ba8 and 4a340ee.

📒 Files selected for processing (27)
  • crates/garuda-engine/src/lib.rs
  • crates/garuda-engine/src/query.rs
  • crates/garuda-engine/src/recovery_service.rs
  • crates/garuda-engine/src/schema_ddl.rs
  • crates/garuda-engine/src/segment_manager.rs
  • crates/garuda-engine/src/state.rs
  • crates/garuda-engine/tests/test_collection_create_and_open_validation.rs
  • crates/garuda-engine/tests/test_collection_hnsw_behavior.rs
  • crates/garuda-engine/tests/test_collection_index_lifecycle.rs
  • crates/garuda-engine/tests/test_collection_query_modes.rs
  • crates/garuda-index-hnsw/Cargo.toml
  • crates/garuda-index-hnsw/src/build.rs
  • crates/garuda-index-hnsw/src/compare.rs
  • crates/garuda-index-hnsw/src/heap.rs
  • crates/garuda-index-hnsw/src/lib.rs
  • crates/garuda-index-hnsw/src/search.rs
  • crates/garuda-index-hnsw/tests/test_hnsw_index.rs
  • crates/garuda-planner/src/lib.rs
  • crates/garuda-planner/tests/test_query_plan.rs
  • crates/garuda-segment/src/codec.rs
  • crates/garuda-segment/src/lib.rs
  • crates/garuda-segment/tests/test_flat_sidecar.rs
  • crates/garuda-segment/tests/test_hnsw_sidecar.rs
  • crates/garuda-storage/src/codec.rs
  • crates/garuda-storage/src/layout.rs
  • crates/garuda-storage/src/lib.rs
  • crates/garuda-types/src/lib.rs

Comment thread crates/garuda-engine/src/query.rs Outdated
Comment on lines +178 to +194
fn segment_top_k(segment: &garuda_segment::SegmentFile, plan: &QueryPlan) -> Result<TopK, Status> {
match plan.search {
SegmentSearchPlan::Flat => TopK::new(segment.meta.doc_count).map_err(|_| {
Status::err(
StatusCode::Internal,
"queryable segment must have at least one live document",
)
}),
SegmentSearchPlan::Hnsw { .. } => Ok(plan.top_k),
}
}

fn segment_index_search(plan: &QueryPlan) -> SegmentIndexSearch {
match plan.search {
SegmentSearchPlan::Flat => SegmentIndexSearch::Flat,
SegmentSearchPlan::Hnsw { ef_search } => SegmentIndexSearch::Hnsw { ef_search },
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo 'SegmentSearchPlan definition:'
sed -n '12,18p' crates/garuda-planner/src/lib.rs

echo
echo 'By-value matches in query.rs:'
sed -n '178,195p' crates/garuda-engine/src/query.rs

Repository: PhantomInTheWire/garudadb

Length of output: 964


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find HnswEfSearch definition
echo "=== HnswEfSearch definition ==="
rg "enum HnswEfSearch|struct HnswEfSearch|type HnswEfSearch" -A 3 crates/

# Check if SegmentSearchPlan might be Copy-able
echo
echo "=== Check for Copy trait on SegmentSearchPlan or related types ==="
rg "impl Copy.*SegmentSearchPlan|impl Copy.*HnswEfSearch" crates/

# Try to find if there are compilation tests or results
echo
echo "=== Check for build status or test runs ==="
fd -e rs -x grep -l "compile_fail\|should_panic" {} \; | head -5

Repository: PhantomInTheWire/garudadb

Length of output: 403


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the full HnswEfSearch definition with derives
echo "=== Full HnswEfSearch definition ==="
rg -B 5 "pub struct HnswEfSearch" crates/garuda-types/src/lib.rs | head -15

# Check if it has #[derive(Copy)]
echo
echo "=== Check HnswEfSearch derives ==="
rg "HnswEfSearch" crates/garuda-types/src/lib.rs -B 5 | grep -E "derive|Copy|Clone"

# Check if code compiles by looking at Cargo.toml or test status
echo
echo "=== Check workspace structure ==="
fd Cargo.toml | head -10

# Try to see if there are any error messages or build artifacts
echo
echo "=== Check if query.rs has any compilation issues noted ==="
rg "query\.rs" crates/ --type md

Repository: PhantomInTheWire/garudadb

Length of output: 746


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find QueryPlan definition to see the top_k field
echo "=== QueryPlan definition ==="
rg "struct QueryPlan|pub struct QueryPlan" -A 10 crates/

# Check what top_k is
echo
echo "=== top_k field type in QueryPlan ==="
rg "top_k:" crates/ -B 2 -A 1

# Check if there are any test runs that would confirm compilation
echo
echo "=== Look for tests or compilation runs ==="
fd "\.rs" crates/garuda-engine/src/query.rs -x wc -l

Repository: PhantomInTheWire/garudadb

Length of output: 5087


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find TopK definition
echo "=== TopK definition ==="
rg "pub struct TopK|pub enum TopK|type TopK" -A 5 crates/

# Check if TopK is Copy
echo
echo "=== TopK derives ==="
rg "TopK" crates/garuda-types/src/lib.rs -B 3 | grep -E "derive|pub struct TopK|pub enum TopK"

# Now let's verify the actual compile status by checking if there's a CI/build log
echo
echo "=== Check if there are compilation artifacts or status ==="
git log --oneline -n 5 2>/dev/null || echo "Git not available"

Repository: PhantomInTheWire/garudadb

Length of output: 686


Borrow plan.search instead of moving it out of &QueryPlan.

Both match plan.search sites attempt to move a non-Copy SegmentSearchPlan enum out of a shared borrow, which is a compile error. Match on &plan.search instead.

Suggested fix
 fn segment_top_k(segment: &garuda_segment::SegmentFile, plan: &QueryPlan) -> Result<TopK, Status> {
-    match plan.search {
+    match &plan.search {
         SegmentSearchPlan::Flat => TopK::new(segment.meta.doc_count).map_err(|_| {
             Status::err(
                 StatusCode::Internal,
                 "queryable segment must have at least one live document",
             )
         }),
-        SegmentSearchPlan::Hnsw { .. } => Ok(plan.top_k),
+        SegmentSearchPlan::Hnsw { .. } => Ok(plan.top_k),
     }
 }
 
 fn segment_index_search(plan: &QueryPlan) -> SegmentIndexSearch {
-    match plan.search {
+    match &plan.search {
         SegmentSearchPlan::Flat => SegmentIndexSearch::Flat,
         SegmentSearchPlan::Hnsw { ef_search } => SegmentIndexSearch::Hnsw { ef_search },
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-engine/src/query.rs` around lines 178 - 194, Both functions
currently try to move plan.search out of a borrowed &QueryPlan; change their
matches to borrow the enum instead. In segment_top_k, replace "match
plan.search" with "match &plan.search" and match on "&SegmentSearchPlan::Flat"
and "&SegmentSearchPlan::Hnsw { .. }" (returning Ok(plan.top_k) unchanged). In
segment_index_search, use "match &plan.search" and patterns
"&SegmentSearchPlan::Flat" and "&SegmentSearchPlan::Hnsw { ef_search }" and, if
ef_search is a reference, use *ef_search when constructing
SegmentIndexSearch::Hnsw { ef_search } so you don't move out of plan.search;
this preserves the borrow and fixes the compile error in segment_top_k and
segment_index_search.

Comment on lines +26 to +35
let candidates = self.search_layer(
level,
entry_point,
self.vector(node),
self.config.build_candidate_limit(level),
);

if let Some(best_candidate) = candidates.first() {
entry_point = best_candidate.index;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Choose the best retained candidate before descending.

search_layer() returns the retained WorstScoredNode heap by repeated pop()s, so this vector is worst-first. Using candidates.first() here therefore picks the weakest retained node as the next entry_point, which can send lower-layer search down a much worse path and degrade graph quality.

Suggested fix
-            if let Some(best_candidate) = candidates.first() {
+            if let Some(best_candidate) = candidates.iter().min_by(|left, right| {
+                compare_score_then_doc_id(
+                    left.score,
+                    self.doc_id(left.index),
+                    right.score,
+                    self.doc_id(right.index),
+                )
+            }) {
                 entry_point = best_candidate.index;
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-index-hnsw/src/build.rs` around lines 26 - 35, search_layer
returns the retained WorstScoredNode heap as a worst-first Vec, so using
candidates.first() selects the weakest retained node; change the selection to
pick the best retained candidate (e.g., use candidates.last() or select the
element with the maximum score) when setting entry_point so the next descent
uses the strongest retained node (update the code around search_layer,
candidates, and entry_point to pick the max-scored WorstScoredNode rather than
the first element).

Comment on lines +170 to +180
pub fn from_parts(
config: HnswIndexConfig,
entries: Vec<HnswBuildEntry>,
graph: HnswGraph,
) -> Self {
Self {
config,
entries,
graph,
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate from_parts() inputs before exposing this constructor.

This public constructor trusts graph and entries, but later methods assume graph.node_count() == entries.len() and that every entry matches config.dimension. If those drift, entry()/vector() panic on out-of-bounds nodes or score_doc() sees mismatched vectors. Making from_parts() fallible, or keeping it non-public, would enforce the invariant at the API boundary.

Suggested fix
-    pub fn from_parts(
+    pub fn from_parts(
         config: HnswIndexConfig,
         entries: Vec<HnswBuildEntry>,
         graph: HnswGraph,
-    ) -> Self {
-        Self {
+    ) -> Result<Self, Status> {
+        if graph.node_count() != entries.len() {
+            return Err(Status::err(
+                StatusCode::InvalidArgument,
+                "hnsw graph node count does not match entry count",
+            ));
+        }
+        if entries
+            .iter()
+            .any(|entry| entry.vector().len() != config.dimension.get())
+        {
+            return Err(Status::err(
+                StatusCode::InvalidArgument,
+                "hnsw entry dimension does not match index dimension",
+            ));
+        }
+
+        Ok(Self {
             config,
             entries,
             graph,
-        }
+        })
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn from_parts(
config: HnswIndexConfig,
entries: Vec<HnswBuildEntry>,
graph: HnswGraph,
) -> Self {
Self {
config,
entries,
graph,
}
}
pub fn from_parts(
config: HnswIndexConfig,
entries: Vec<HnswBuildEntry>,
graph: HnswGraph,
) -> Result<Self, Status> {
if graph.node_count() != entries.len() {
return Err(Status::err(
StatusCode::InvalidArgument,
"hnsw graph node count does not match entry count",
));
}
if entries
.iter()
.any(|entry| entry.vector().len() != config.dimension.get())
{
return Err(Status::err(
StatusCode::InvalidArgument,
"hnsw entry dimension does not match index dimension",
));
}
Ok(Self {
config,
entries,
graph,
})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-index-hnsw/src/lib.rs` around lines 170 - 180, The public
constructor from_parts currently trusts its inputs; add validation at the API
boundary by making from_parts return a Result (or convert it to a non-public
ctor) and check that graph.node_count() == entries.len() and that every
HnswBuildEntry vector length equals config.dimension before constructing Self;
if a check fails, return an Err describing the mismatch so downstream methods
like entry(), vector(), and score_doc() cannot panic or observe mismatched
dimensions.

Comment thread crates/garuda-index-hnsw/src/search.rs
Comment thread crates/garuda-segment/src/lib.rs Outdated
Comment on lines +150 to +152
pub fn mark_persisted(&mut self) {
self.kind = SegmentKind::Persisted;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep vector_search aligned with kind during promotion.

mark_persisted() only flips kind. If a caller writes immediately after that, write_segment() still sees WritingFlat/WritingHnsw, skips the persisted-sidecar branches, and removes any sidecar files. Reopen then fails for non-empty persisted segments. This transition should rebuild vector_search too, or require vector_field here so the state change is atomic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-segment/src/lib.rs` around lines 150 - 152, mark_persisted
currently only sets self.kind = SegmentKind::Persisted which leaves
self.vector_search stale and causes write_segment to treat the segment as
still-writing (removing persisted sidecars); update mark_persisted to also
rebuild or clear vector_search atomically: either change its signature to accept
the vector_field (e.g., &VectorField or field id) and rebuild self.vector_search
for the persisted form, or (if vector_field is available on self) construct the
persisted vector search index and assign it to self.vector_search inside
mark_persisted so the SegmentKind and vector_search transition together;
reference the mark_persisted method, SegmentKind, vector_search, write_segment,
and vector_field when making this change.

Comment thread crates/garuda-segment/src/lib.rs Outdated
Comment on lines +298 to +311
if matches!(kind, SegmentKind::Writing) {
let vector_search =
build_vector_search_state(kind, vector_field, &segment_meta, &decoded.records);

return Ok(SegmentFile {
kind,
meta: segment_meta,
records: decoded.records,
vector_search,
});
}

let vector_search =
load_vector_search_state(root, meta.id, &segment_meta, &decoded.records, vector_field)?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not require sidecars for empty persisted segments.

write_segment() deliberately removes both sidecars when segment.meta.doc_count == 0, but this reload path always calls load_vector_search_state() for persisted segments. A valid empty persisted segment will therefore fail to reopen on a missing sidecar. Rebuild an empty in-memory index instead when segment_meta.doc_count == 0.

Suggested fix
-    let vector_search =
-        load_vector_search_state(root, meta.id, &segment_meta, &decoded.records, vector_field)?;
+    let vector_search = if segment_meta.doc_count == 0 {
+        build_vector_search_state(kind, vector_field, &segment_meta, &decoded.records)
+    } else {
+        load_vector_search_state(root, meta.id, &segment_meta, &decoded.records, vector_field)?
+    };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if matches!(kind, SegmentKind::Writing) {
let vector_search =
build_vector_search_state(kind, vector_field, &segment_meta, &decoded.records);
return Ok(SegmentFile {
kind,
meta: segment_meta,
records: decoded.records,
vector_search,
});
}
let vector_search =
load_vector_search_state(root, meta.id, &segment_meta, &decoded.records, vector_field)?;
if matches!(kind, SegmentKind::Writing) {
let vector_search =
build_vector_search_state(kind, vector_field, &segment_meta, &decoded.records);
return Ok(SegmentFile {
kind,
meta: segment_meta,
records: decoded.records,
vector_search,
});
}
let vector_search = if segment_meta.doc_count == 0 {
build_vector_search_state(kind, vector_field, &segment_meta, &decoded.records)
} else {
load_vector_search_state(root, meta.id, &segment_meta, &decoded.records, vector_field)?
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-segment/src/lib.rs` around lines 298 - 311, The reload path
always calls load_vector_search_state() which fails for empty persisted segments
because write_segment() removes sidecars when segment.meta.doc_count == 0;
change the logic in the reopen branch (where you currently call
load_vector_search_state(root, meta.id, &segment_meta, &decoded.records,
vector_field)) to check if segment_meta.doc_count == 0 and, if so, construct an
in-memory empty index by calling build_vector_search_state(kind, vector_field,
&segment_meta, &decoded.records) instead of load_vector_search_state; this
mirrors write_segment's behavior and avoids requiring sidecars for empty
persisted segments.

Comment thread crates/garuda-segment/src/lib.rs Outdated
Comment on lines +358 to +382
let hits = match (&segment.vector_search, request.index) {
(VectorSearchState::WritingFlat(index), SegmentIndexSearch::Flat) => {
index.search(request.metric, request.query_vector, search_top_k)?
}
(VectorSearchState::WritingFlat(_), SegmentIndexSearch::Hnsw { .. }) => {
unreachable!("writing flat search state requires flat execution");
}
(VectorSearchState::WritingHnsw(index), SegmentIndexSearch::Hnsw { ef_search }) => {
search_hnsw(index, request.query_vector, search_top_k, ef_search)?
}
(VectorSearchState::WritingHnsw(_), SegmentIndexSearch::Flat) => {
unreachable!("writing hnsw search state requires hnsw execution");
}
(VectorSearchState::UseFlatIndex(index), SegmentIndexSearch::Flat) => {
index.search(request.metric, request.query_vector, search_top_k)?
}
(VectorSearchState::UseFlatIndex(_), SegmentIndexSearch::Hnsw { .. }) => {
unreachable!("flat vector search state requires flat execution");
}
(VectorSearchState::UseHnswIndex(index), SegmentIndexSearch::Hnsw { ef_search }) => {
search_hnsw(index, request.query_vector, search_top_k, ef_search)?
}
(VectorSearchState::UseHnswIndex(_), SegmentIndexSearch::Flat) => {
unreachable!("hnsw vector search requires hnsw search parameters");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return Status on plan/index mismatches instead of panicking.

These unreachable!() arms sit on a public request boundary. A stale planner result or direct API misuse turns a bad request into a process panic instead of a normal error response. Please convert the mismatch cases into Err(Status::err(StatusCode::FailedPrecondition, ...)).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-segment/src/lib.rs` around lines 358 - 382, The match arms that
call unreachable!() on plan/index mismatches (the arms matching
VectorSearchState::WritingFlat with SegmentIndexSearch::Hnsw,
VectorSearchState::WritingHnsw with SegmentIndexSearch::Flat,
VectorSearchState::UseFlatIndex with SegmentIndexSearch::Hnsw, and
VectorSearchState::UseHnswIndex with SegmentIndexSearch::Flat) must return a
Status error instead of panicking; replace each unreachable!() with returning
Err(Status::err(StatusCode::FailedPrecondition, format!(...))) (include a short
message stating the actual VectorSearchState variant and the requested
SegmentIndexSearch variant for context), so the surrounding code that assigns
hits (which uses index.search and search_hnsw) continues to use the Result flow
and the ? operator can propagate the error; reference the VectorSearchState and
SegmentIndexSearch enum variants, the request.index usage, and the functions
index.search and search_hnsw when implementing the messages.

Comment thread crates/garuda-types/src/lib.rs Outdated
Comment on lines +679 to +692
for (level, nodes) in self.levels.iter().enumerate() {
let max_neighbors = neighbor_limits.for_level(HnswLevel::new(level));

for neighbors in nodes {
if neighbors.len() <= max_neighbors {
continue;
}

return Err(Status::err(
StatusCode::Internal,
"persisted hnsw graph exceeds configured neighbor limit",
));
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Strengthen persisted-graph validation.

This loop only enforces max adjacency length. It never rejects neighbor IDs >= entry_count, edges above a node's own node_level, or neighbors that do not exist on the inspected level. crates/garuda-segment/src/codec.rs:167-221 relies on HnswGraph::from_parts() to validate reloaded sidecars, so malformed or stale HNSW sidecars can still pass here and later panic in crates/garuda-index-hnsw/src/lib.rs:198-207 when search dereferences entries[node.get()].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/lib.rs` around lines 679 - 692, The current loop only
checks adjacency length; extend validation inside the for (level, nodes) in
self.levels.iter().enumerate() loop (the code working with self.levels,
neighbor_limits.for_level(), and returning Status::err) to also reject malformed
neighbors: for each neighbor id in neighbors ensure the id < entry_count, ensure
the neighbor actually appears on the inspected level (or its assigned level <=
inspected level depending on invariant), and ensure no edge points to a node
with a higher level than the source node (i.e., neighbor_level <= node_level);
if any of these checks fail return a Status::err(StatusCode::Internal, ...)
similar to the existing message so HnswGraph::from_parts() will reject
stale/malformed sidecars before later dereferences.

devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Checkpoint rollback does not capture HNSW sidecar files, risking data corruption on failed checkpoint (crates/garuda-engine/src/checkpoint_service.rs:147-155)

This PR introduces persisted HNSW graph sidecar files (written by write_segment at crates/garuda-segment/src/lib.rs:273-277), but the checkpoint rollback mechanism in capture_checkpoint_files (crates/garuda-engine/src/checkpoint_service.rs:144-183) was not updated to back up these new files. It captures segment_flat_index_path for each segment but never captures segment_hnsw_index_path. If a checkpoint fails after write_all_segments writes HNSW sidecars but before the manifest is committed, the rollback restores segment data and flat sidecars but leaves behind stale/corrupt HNSW sidecar files. On the next reopen, the manifest points to the old state, but the HNSW sidecar may correspond to the failed checkpoint's state, causing a mismatch between segment records and the HNSW graph index.

View 7 additional findings in Devin Review.

Open in Devin Review

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
crates/garuda-index-hnsw/src/search.rs (1)

149-163: ⚠️ Potential issue | 🟠 Major

Replacement logic does not respect doc-id tie-breaking.

The improves_worst check only compares score >= current_worst.score. When scores are equal, a later candidate with a larger doc_id can evict a smaller doc_id that compare_score_then_doc_id would prefer. This contradicts the tie-breaking behavior in top_hits.

Consider using the same compound ordering for replacement:

Suggested fix
                 let scored_neighbor = self.scored_node(query_vector, neighbor);
                 let has_capacity = results.len() < candidate_limit;
-                let improves_worst = results
-                    .peek()
-                    .is_some_and(|current_worst| scored_neighbor.score >= current_worst.score);
+                let improves_worst = results.peek().is_some_and(|current_worst| {
+                    // Use same ordering as top_hits: higher score wins, then lower doc_id
+                    let neighbor_doc_id = self.doc_id(scored_neighbor.index);
+                    let worst_doc_id = self.doc_id(current_worst.index);
+                    compare_score_then_doc_id(
+                        scored_neighbor.score,
+                        neighbor_doc_id,
+                        current_worst.score,
+                        worst_doc_id,
+                    ) == std::cmp::Ordering::Less
+                });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-index-hnsw/src/search.rs` around lines 149 - 163, The
replacement logic uses only score >= current_worst.score and can evict lower
doc_id ties incorrectly; change the condition that computes improves_worst to
use the same compound ordering used by top_hits (compare_score_then_doc_id) so
ties are broken by doc_id consistently: when evaluating a scored candidate from
scored_node against results.peek(), compare via compare_score_then_doc_id (or
the equivalent comparator used to order WorstScoredNode) and only treat it as
improving the worst when that comparator indicates the new candidate is strictly
better (or not worse according to the same tie-break rules); ensure all places
that push/pop into results (and creation via WorstScoredNode::from) follow that
same comparator logic with candidate_limit.
crates/garuda-engine/src/query.rs (1)

132-151: ⚠️ Potential issue | 🔴 Critical

Borrow plan.search instead of moving it out of &QueryPlan.

The match plan.search on line 132 attempts to move a non-Copy SegmentSearchPlan enum out of a shared borrow &QueryPlan, which will cause a compile error. Match on &plan.search instead.

Suggested fix
-    let hits = match plan.search {
-        SegmentSearchPlan::Flat => search_flat(
+    let hits = match &plan.search {
+        SegmentSearchPlan::Flat => search_flat(
             segment,
             FlatSearchRequest {
                 metric: state.catalog.schema.vector.metric,
                 query_vector,
                 top_k: segment_top_k(segment),
                 filter,
             },
         )?,
-        SegmentSearchPlan::Hnsw { ef_search } => search_hnsw(
+        SegmentSearchPlan::Hnsw { ef_search } => search_hnsw(
             segment,
             HnswSegmentSearchRequest {
                 query_vector,
                 top_k: plan.top_k,
-                ef_search,
+                ef_search: *ef_search,
                 filter,
             },
         )?,
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-engine/src/query.rs` around lines 132 - 151, The code is moving
`plan.search` out of a borrowed `&QueryPlan`; change `match plan.search` to
`match &plan.search` and update the patterns to match by reference (e.g.,
`SegmentSearchPlan::Flat` -> `SegmentSearchPlan::Flat` becomes
`&SegmentSearchPlan::Flat`, and `SegmentSearchPlan::Hnsw { ef_search }` ->
`&SegmentSearchPlan::Hnsw { ef_search }`), and when building the
`HnswSegmentSearchRequest` pass the dereferenced or cloned `ef_search` (e.g.,
`ef_search: *ef_search` or `ef_search: ef_search.clone()` as appropriate) so you
no longer move out of `&QueryPlan` while still extracting needed values for
`search_flat` and `search_hnsw`.
crates/garuda-types/src/lib.rs (1)

683-695: ⚠️ Potential issue | 🔴 Critical

Reject impossible neighbor references during graph validation.

This loop still only enforces adjacency length. A persisted graph can point to NodeIndex >= entry_count, store edges on a level above the source node’s node_level, or reference a neighbor that does not exist on the inspected level. HnswGraph::from_parts() will currently accept those sidecars, and later search code assumes those invariants hold.

🔒 Tighten the structural checks here
-        for (level, nodes) in self.levels.iter().enumerate() {
-            let max_neighbors = neighbor_limits.for_level(HnswLevel::new(level));
-
-            for neighbors in nodes {
-                if neighbors.len() <= max_neighbors {
-                    continue;
-                }
-
-                return Err(Status::err(
-                    StatusCode::Internal,
-                    "persisted hnsw graph exceeds configured neighbor limit",
-                ));
-            }
-        }
+        for (level_idx, nodes) in self.levels.iter().enumerate() {
+            let level = HnswLevel::new(level_idx);
+            let max_neighbors = neighbor_limits.for_level(level);
+
+            for (node_idx, neighbors) in nodes.iter().enumerate() {
+                let node = NodeIndex::new(node_idx);
+
+                if self.node_level(node).get() < level.get() && !neighbors.is_empty() {
+                    return Err(Status::err(
+                        StatusCode::Internal,
+                        "persisted hnsw graph stores edges above a node's level",
+                    ));
+                }
+
+                if neighbors.len() > max_neighbors {
+                    return Err(Status::err(
+                        StatusCode::Internal,
+                        "persisted hnsw graph exceeds configured neighbor limit",
+                    ));
+                }
+
+                for &neighbor in neighbors {
+                    if neighbor.get() >= entry_count {
+                        return Err(Status::err(
+                            StatusCode::Internal,
+                            "persisted hnsw graph references a missing node",
+                        ));
+                    }
+
+                    if self.node_level(neighbor).get() < level.get() {
+                        return Err(Status::err(
+                            StatusCode::Internal,
+                            "persisted hnsw graph references a node above its level",
+                        ));
+                    }
+                }
+            }
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/lib.rs` around lines 683 - 695, The current loop only
enforces adjacency length but must also validate each referenced neighbor is
structurally valid: inside the per-level loop (iterating self.levels.enumerate()
and using neighbor_limits.for_level(HnswLevel::new(level))), for each neighbor
index verify (1) neighbor < entry_count, (2) the source node actually has at
least this level (its node_level >= level), and (3) the neighbor node exists on
this inspected level (neighbor's node_level >= level); return a Status::err with
a clear message if any check fails. Use the existing symbols self.levels,
entry_count, node_level (or node_levels) and HnswLevel::new(level) to locate and
implement these checks before accepting the adjacency list.
🧹 Nitpick comments (2)
crates/garuda-planner/src/lib.rs (1)

50-65: The expect() on hnsw_params() is safe given the match arm, but consider a debug assertion.

When default_kind() returns IndexKind::Hnsw, hnsw_params() should always return Some. The expect() guards against broken invariants but could be clearer.

Alternative: Use if-let or clearer expect message
         IndexKind::Hnsw => SegmentSearchPlan::Hnsw {
             ef_search: query_ef_search.unwrap_or(
                 indexes
                     .hnsw_params()
-                    .expect("hnsw default requires hnsw params")
+                    .expect("VectorIndexState with Hnsw default_kind must have hnsw_params")
                     .ef_search,
             ),
         },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-planner/src/lib.rs` around lines 50 - 65, In search_plan,
relying on hnsw_params().expect inside the IndexKind::Hnsw arm hides the
invariant; replace it with a clearer guard such as a debug_assert that
indexes.hnsw_params().is_some() before the match or use if let to unwrap with a
descriptive expect message referencing hnsw_params and IndexKind::Hnsw; update
the SegmentSearchPlan::Hnsw construction in search_plan to use the
safely-unwrapped ef_search (or give a clearer expect text) so the invariant
failure is explicit and debuggable.
crates/garuda-engine/src/query.rs (1)

185-188: Consider preserving Result return type for defensive safety.

The expect() relies on callers skipping empty segments (line 102-104). If this function is called from a different code path without that guard, it will panic. The previous Result<TopK, Status> approach was more defensive.

Alternative: Keep Result-based approach
-fn segment_top_k(segment: &garuda_segment::SegmentFile) -> TopK {
-    TopK::new(segment.meta.doc_count)
-        .expect("queryable segment must have at least one live document")
+fn segment_top_k(segment: &garuda_segment::SegmentFile) -> Result<TopK, Status> {
+    TopK::new(segment.meta.doc_count).map_err(|_| {
+        Status::err(
+            StatusCode::Internal,
+            "queryable segment must have at least one live document",
+        )
+    })
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-engine/src/query.rs` around lines 185 - 188, Change
segment_top_k to return Result<TopK, Status> instead of unwrapping with expect:
replace TopK::new(...).expect(...) with
TopK::new(segment.meta.doc_count).map_err(|e|
Status::invalid_argument(format!("empty segment: {}", e)))? (or another
appropriate Status variant) and update any callers to propagate or handle the
Result (previous guard that skipped empty segments should still work but callers
must call ?). Reference: function segment_top_k, TopK::new, and Status; ensure
all call sites that assumed a guaranteed TopK are adjusted to handle the Result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/garuda-segment/src/index.rs`:
- Around line 109-114: The persisted HNSW graph must be validated against the
rebuilt entries: after calling hnsw_build_entries(...) and
decode_hnsw_graph(...), check that graph.node_count() == entries.len() and
reject (return Err or rebuild) if they differ instead of blindly calling
HnswIndex::from_parts(...). Locate the block around hnsw_build_entries,
decode_hnsw_graph, and HnswIndex::from_parts and add a guard that either returns
an error or triggers a full index rebuild when graph.node_count() !=
entries.len() to avoid node-id/entry mismatches.

In `@crates/garuda-types/src/lib.rs`:
- Around line 811-840: Make HnswIndexParams enforce min_neighbor_count <=
max_neighbors at construction: change HnswIndexParams so fields are private and
add a fallible constructor pub fn new(max_neighbors: HnswM, min_neighbor_count:
HnswMinNeighborCount, scaling_factor: HnswScalingFactor, ef_construction:
HnswEfConstruction, prune_width: HnswPruneWidth, ef_search: HnswEfSearch) ->
Result<Self, Status> that validates min <= max and returns an Err(Status) on
violation; update neighbor_config() to use the internal fields and remove
panicking call sites (or keep neighbor_config returning Result as-is), and
ensure deserialization uses the new constructor (implement Deserialize via a
custom visitor or implement TryFrom<Intermediate> and hook it into serde) so
deserializing invalid params fails instead of producing an instance with invalid
invariants.

---

Duplicate comments:
In `@crates/garuda-engine/src/query.rs`:
- Around line 132-151: The code is moving `plan.search` out of a borrowed
`&QueryPlan`; change `match plan.search` to `match &plan.search` and update the
patterns to match by reference (e.g., `SegmentSearchPlan::Flat` ->
`SegmentSearchPlan::Flat` becomes `&SegmentSearchPlan::Flat`, and
`SegmentSearchPlan::Hnsw { ef_search }` -> `&SegmentSearchPlan::Hnsw { ef_search
}`), and when building the `HnswSegmentSearchRequest` pass the dereferenced or
cloned `ef_search` (e.g., `ef_search: *ef_search` or `ef_search:
ef_search.clone()` as appropriate) so you no longer move out of `&QueryPlan`
while still extracting needed values for `search_flat` and `search_hnsw`.

In `@crates/garuda-index-hnsw/src/search.rs`:
- Around line 149-163: The replacement logic uses only score >=
current_worst.score and can evict lower doc_id ties incorrectly; change the
condition that computes improves_worst to use the same compound ordering used by
top_hits (compare_score_then_doc_id) so ties are broken by doc_id consistently:
when evaluating a scored candidate from scored_node against results.peek(),
compare via compare_score_then_doc_id (or the equivalent comparator used to
order WorstScoredNode) and only treat it as improving the worst when that
comparator indicates the new candidate is strictly better (or not worse
according to the same tie-break rules); ensure all places that push/pop into
results (and creation via WorstScoredNode::from) follow that same comparator
logic with candidate_limit.

In `@crates/garuda-types/src/lib.rs`:
- Around line 683-695: The current loop only enforces adjacency length but must
also validate each referenced neighbor is structurally valid: inside the
per-level loop (iterating self.levels.enumerate() and using
neighbor_limits.for_level(HnswLevel::new(level))), for each neighbor index
verify (1) neighbor < entry_count, (2) the source node actually has at least
this level (its node_level >= level), and (3) the neighbor node exists on this
inspected level (neighbor's node_level >= level); return a Status::err with a
clear message if any check fails. Use the existing symbols self.levels,
entry_count, node_level (or node_levels) and HnswLevel::new(level) to locate and
implement these checks before accepting the adjacency list.

---

Nitpick comments:
In `@crates/garuda-engine/src/query.rs`:
- Around line 185-188: Change segment_top_k to return Result<TopK, Status>
instead of unwrapping with expect: replace TopK::new(...).expect(...) with
TopK::new(segment.meta.doc_count).map_err(|e|
Status::invalid_argument(format!("empty segment: {}", e)))? (or another
appropriate Status variant) and update any callers to propagate or handle the
Result (previous guard that skipped empty segments should still work but callers
must call ?). Reference: function segment_top_k, TopK::new, and Status; ensure
all call sites that assumed a guaranteed TopK are adjusted to handle the Result.

In `@crates/garuda-planner/src/lib.rs`:
- Around line 50-65: In search_plan, relying on hnsw_params().expect inside the
IndexKind::Hnsw arm hides the invariant; replace it with a clearer guard such as
a debug_assert that indexes.hnsw_params().is_some() before the match or use if
let to unwrap with a descriptive expect message referencing hnsw_params and
IndexKind::Hnsw; update the SegmentSearchPlan::Hnsw construction in search_plan
to use the safely-unwrapped ef_search (or give a clearer expect text) so the
invariant failure is explicit and debuggable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f48fdb2-9ea5-4040-b7a7-e350a40378bf

📥 Commits

Reviewing files that changed from the base of the PR and between 29c718f and 922f595.

📒 Files selected for processing (25)
  • crates/garuda-engine/src/checkpoint_service.rs
  • crates/garuda-engine/src/lib.rs
  • crates/garuda-engine/src/query.rs
  • crates/garuda-engine/src/schema.rs
  • crates/garuda-engine/src/schema_ddl.rs
  • crates/garuda-engine/src/state.rs
  • crates/garuda-engine/tests/common.rs
  • crates/garuda-engine/tests/test_collection_ddl.rs
  • crates/garuda-engine/tests/test_collection_index_lifecycle.rs
  • crates/garuda-index-hnsw/src/lib.rs
  • crates/garuda-index-hnsw/src/search.rs
  • crates/garuda-index-hnsw/src/search_tests.rs
  • crates/garuda-index-hnsw/tests/test_hnsw_index.rs
  • crates/garuda-planner/src/lib.rs
  • crates/garuda-planner/tests/test_query_plan.rs
  • crates/garuda-segment/src/codec.rs
  • crates/garuda-segment/src/index.rs
  • crates/garuda-segment/src/lib.rs
  • crates/garuda-segment/src/search.rs
  • crates/garuda-segment/src/storage.rs
  • crates/garuda-segment/tests/test_flat_sidecar.rs
  • crates/garuda-segment/tests/test_hnsw_sidecar.rs
  • crates/garuda-storage/src/codec.rs
  • crates/garuda-types/src/lib.rs
  • garuda/src/main.rs
✅ Files skipped from review due to trivial changes (3)
  • crates/garuda-planner/tests/test_query_plan.rs
  • crates/garuda-index-hnsw/tests/test_hnsw_index.rs
  • crates/garuda-index-hnsw/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/garuda-engine/src/state.rs
  • crates/garuda-segment/src/search.rs
  • crates/garuda-segment/src/storage.rs
  • crates/garuda-engine/src/schema_ddl.rs

Comment thread crates/garuda-segment/src/index.rs
Comment thread crates/garuda-types/src/lib.rs Outdated
Comment on lines +811 to +840
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct HnswIndexParams {
pub m: usize,
pub ef_construction: usize,
pub ef_search: usize,
pub max_neighbors: HnswM,
pub scaling_factor: HnswScalingFactor,
pub ef_construction: HnswEfConstruction,
pub prune_width: HnswPruneWidth,
pub min_neighbor_count: HnswMinNeighborCount,
pub ef_search: HnswEfSearch,
}

impl Default for HnswIndexParams {
fn default() -> Self {
Self {
m: 16,
ef_construction: 200,
ef_search: 64,
max_neighbors: HnswM::new(16).expect("default hnsw max_neighbors should be valid"),
scaling_factor: HnswScalingFactor::new(50)
.expect("default hnsw scaling_factor should be valid"),
ef_construction: HnswEfConstruction::new(200)
.expect("default hnsw ef_construction should be valid"),
prune_width: HnswPruneWidth::new(16).expect("default hnsw prune_width should be valid"),
min_neighbor_count: HnswMinNeighborCount::new(8)
.expect("default hnsw min_neighbor_count should be valid"),
ef_search: HnswEfSearch::new(64).expect("default hnsw ef_search should be valid"),
}
}
}

impl HnswIndexParams {
pub fn neighbor_config(&self) -> Result<HnswNeighborConfig, Status> {
HnswNeighborConfig::new(self.max_neighbors, self.min_neighbor_count)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Seal HnswIndexParams invariants at construction time.

The individual wrapper types validate their own ranges, but HnswIndexParams can still be created or deserialized with min_neighbor_count > max_neighbors because every field is public. The first hard check is neighbor_config(), and crates/garuda-segment/src/index.rs currently unwraps that with expect(...), so a bad manifest crashes the process instead of returning Status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/lib.rs` around lines 811 - 840, Make HnswIndexParams
enforce min_neighbor_count <= max_neighbors at construction: change
HnswIndexParams so fields are private and add a fallible constructor pub fn
new(max_neighbors: HnswM, min_neighbor_count: HnswMinNeighborCount,
scaling_factor: HnswScalingFactor, ef_construction: HnswEfConstruction,
prune_width: HnswPruneWidth, ef_search: HnswEfSearch) -> Result<Self, Status>
that validates min <= max and returns an Err(Status) on violation; update
neighbor_config() to use the internal fields and remove panicking call sites (or
keep neighbor_config returning Result as-is), and ensure deserialization uses
the new constructor (implement Deserialize via a custom visitor or implement
TryFrom<Intermediate> and hook it into serde) so deserializing invalid params
fails instead of producing an instance with invalid invariants.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/garuda-types/src/lib.rs (1)

15-33: ⚠️ Potential issue | 🟠 Major

Zero VectorDimension and TopK can bypass validation during deserialization.

new() rejects zero, but derived Deserialize accepts it. This allows invalid schema/query payloads to construct VectorDimension(0) or TopK(0) without hitting the validation guard.

Implement a custom Deserialize using #[serde(try_from = "usize")] or a manual Visitor that calls new(), so deserialization fails if the value is invalid.

Also applies to: lines 35-53 (TopK)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/lib.rs` around lines 15 - 33, The derived Deserialize
allows zero values to bypass the validation in VectorDimension::new (same for
TopK), so replace the auto-derived Deserialize by using serde(try_from =
"usize") on the VectorDimension and TopK structs and implement TryFrom<usize>
for each to call their existing new() constructor (returning an error mapped to
a serde::de::Error). Remove or keep Serialize derive but ensure Deserialize is
routed through TryFrom so deserialization fails for zero values; reference
VectorDimension, TopK, and the new() constructors and get() accessors when
updating the types.
♻️ Duplicate comments (2)
crates/garuda-segment/src/index.rs (2)

59-73: ⚠️ Potential issue | 🟠 Major

Propagate invalid HNSW params instead of panicking.

neighbor_config().expect(...) turns bad persisted schema/config into a process abort during segment open/build. This path already uses Status everywhere else, so invalid params should be returned the same way. The build/load callers can then bubble the failure with ?.

Suggested shape
-pub(crate) fn hnsw_index_config(
+pub(crate) fn hnsw_index_config(
     vector_field: &VectorFieldSchema,
     params: &garuda_types::HnswIndexParams,
-) -> HnswIndexConfig {
-    HnswIndexConfig::new(
+) -> Result<HnswIndexConfig, Status> {
+    Ok(HnswIndexConfig::new(
         vector_field.dimension,
         vector_field.metric,
         HnswBuildConfig::new(
-            params.neighbor_config().expect("validated hnsw params"),
+            params.neighbor_config()?,
             params.scaling_factor,
             params.ef_construction,
             params.prune_width,
         ),
-    )
+    ))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-segment/src/index.rs` around lines 59 - 73, hnsw_index_config
currently calls params.neighbor_config().expect(...) which panics on invalid
persisted params; change hnsw_index_config to return a Result (e.g.,
Result<HnswIndexConfig, Status>) and propagate the neighbor_config error instead
of panicking by mapping the Option/Result from params.neighbor_config() into an
Err(Status::...) with a clear message, then construct and return
Ok(HnswIndexConfig::new(...)) using HnswBuildConfig::new; update callers that
open/build segments to use the ? operator so failures bubble properly.

98-121: ⚠️ Potential issue | 🟠 Major

Count equality does not prove this sidecar matches the current entries.

Matching graph.node_count() and entries.len() only proves arity. If the segment is rewritten with a different live entry set but the same count, from_parts(...) still pairs the old graph with new entries, so the graph is no longer built for the vectors/doc IDs it is traversing and HNSW results can drift silently after reopen.

Persist and validate an entry fingerprint (at least doc-id order, ideally doc-id plus vector hash), or rebuild when the live set changes. Please add a same-count mutation regression test here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-segment/src/index.rs` around lines 98 - 121, The current
load_hnsw_search_state only checks node_count() vs entries.len(), which allows a
same-count-but-different-live-set mismatch; change load_hnsw_search_state to
validate a persisted fingerprint from the on-disk graph (e.g., doc-id order hash
or doc-id+vector-hash) against a fingerprint computed from
hnsw_build_entries(records, ...) before calling HnswIndex::from_parts; if the
fingerprint is missing or mismatches, treat the sidecar as stale and rebuild
(i.e., return Ok(None> or trigger rebuild flow) instead of loading, and ensure
decode_hnsw_graph/segment_hnsw_index_path storage format is extended to include
that fingerprint; also add a regression test that creates two segments with
identical doc counts but different doc-id orders/vectors and asserts that
load_hnsw_search_state rejects the stale graph and rebuilds rather than silently
loading mismatched parts.
🧹 Nitpick comments (1)
crates/garuda-types/src/hnsw.rs (1)

390-411: Methods may panic on empty graphs created via new().

max_level() (line 395) and entry_point() (line 403) will panic if called on an empty graph. While from_parts() validates before returning, new() allows creating an empty graph (with zero node_levels) that would panic when these methods are called.

If empty graphs are intentionally unsupported at runtime, consider adding validation in new() or documenting the precondition. Otherwise, these methods could return Option types.

💡 Option 1: Add precondition check in `new()`
 pub fn new(node_levels: Vec<HnswLevel>) -> Self {
+    assert!(!node_levels.is_empty(), "hnsw graph must have at least one node");
     let level_count = max(&node_levels).map_or(1, |level| level.get() + 1);
💡 Option 2: Return `Option` from accessors
-pub fn max_level(&self) -> HnswLevel {
-    assert!(!self.node_levels.is_empty(), "hnsw graph max level");
-    HnswLevel::new(self.levels.len() - 1)
-}
+pub fn max_level(&self) -> Option<HnswLevel> {
+    if self.node_levels.is_empty() {
+        return None;
+    }
+    Some(HnswLevel::new(self.levels.len() - 1))
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/hnsw.rs` around lines 390 - 411, The current
accessors max_level() and entry_point() can panic for graphs created by new()
because node_levels may be empty; fix by either (A) adding a precondition in
new() (or from_parts) to validate node_levels is non-empty and return a Result
or panic early, ensuring max_level() and entry_point() remain infallible, or (B)
change max_level() to return Option<HnswLevel> and entry_point() to return
Option<NodeIndex> (and update callers) so empty graphs are handled without
panics; locate functions named new(), from_parts(), max_level(), entry_point(),
and node_levels to implement the chosen approach consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/garuda-types/src/ids.rs`:
- Around line 165-167: The current next() helpers (InternalDocId::next,
SegmentId::next, ManifestVersionId::next, SnapshotId::next) use raw `+ 1` on
their u64 payload and can overflow; change each signature from `fn next(self) ->
Self` to `fn next(self) -> Option<Self>` and implement using
`self.0.checked_add(1).map(|v| Self(v))` (returning None on overflow) so callers
must explicitly handle the overflow case; update any call sites to handle the
Option result.
- Around line 6-32: Add serde-driven validation by annotating the wrapper types
with #[serde(try_from = "String")] (for string-based types: CollectionName,
FieldName, DocId) and #[serde(try_from = "u64")] (for InternalDocId) and
implement the corresponding TryFrom conversions that call the existing
validators (e.g., implement TryFrom<String> for CollectionName that delegates to
CollectionName::parse, TryFrom<String> for FieldName and DocId delegating to
their constructors, and TryFrom<u64> for InternalDocId delegating to its
new/parse method), using Status as the associated Error type so deserialization
routes through your validation logic.

In `@crates/garuda-types/src/query.rs`:
- Around line 45-65: DenseVector currently derives Deserialize which bypasses
validation in parse(); replace the derive-based deserialization by implementing
a custom serde::Deserialize for DenseVector (or annotate the field with
#[serde(deserialize_with = "...")]) so that deserialization reads a Vec<f32>,
performs the same checks as DenseVector::parse (non-empty and all
values.is_finite()), and returns Err(serde::de::Error::custom(...)) on failure;
update/remove the derived Deserialize on DenseVector and ensure places
constructing from QueryVectorSource::Vector now receive a validated DenseVector
via this new deserializer.

---

Outside diff comments:
In `@crates/garuda-types/src/lib.rs`:
- Around line 15-33: The derived Deserialize allows zero values to bypass the
validation in VectorDimension::new (same for TopK), so replace the auto-derived
Deserialize by using serde(try_from = "usize") on the VectorDimension and TopK
structs and implement TryFrom<usize> for each to call their existing new()
constructor (returning an error mapped to a serde::de::Error). Remove or keep
Serialize derive but ensure Deserialize is routed through TryFrom so
deserialization fails for zero values; reference VectorDimension, TopK, and the
new() constructors and get() accessors when updating the types.

---

Duplicate comments:
In `@crates/garuda-segment/src/index.rs`:
- Around line 59-73: hnsw_index_config currently calls
params.neighbor_config().expect(...) which panics on invalid persisted params;
change hnsw_index_config to return a Result (e.g., Result<HnswIndexConfig,
Status>) and propagate the neighbor_config error instead of panicking by mapping
the Option/Result from params.neighbor_config() into an Err(Status::...) with a
clear message, then construct and return Ok(HnswIndexConfig::new(...)) using
HnswBuildConfig::new; update callers that open/build segments to use the ?
operator so failures bubble properly.
- Around line 98-121: The current load_hnsw_search_state only checks
node_count() vs entries.len(), which allows a same-count-but-different-live-set
mismatch; change load_hnsw_search_state to validate a persisted fingerprint from
the on-disk graph (e.g., doc-id order hash or doc-id+vector-hash) against a
fingerprint computed from hnsw_build_entries(records, ...) before calling
HnswIndex::from_parts; if the fingerprint is missing or mismatches, treat the
sidecar as stale and rebuild (i.e., return Ok(None> or trigger rebuild flow)
instead of loading, and ensure decode_hnsw_graph/segment_hnsw_index_path storage
format is extended to include that fingerprint; also add a regression test that
creates two segments with identical doc counts but different doc-id
orders/vectors and asserts that load_hnsw_search_state rejects the stale graph
and rebuilds rather than silently loading mismatched parts.

---

Nitpick comments:
In `@crates/garuda-types/src/hnsw.rs`:
- Around line 390-411: The current accessors max_level() and entry_point() can
panic for graphs created by new() because node_levels may be empty; fix by
either (A) adding a precondition in new() (or from_parts) to validate
node_levels is non-empty and return a Result or panic early, ensuring
max_level() and entry_point() remain infallible, or (B) change max_level() to
return Option<HnswLevel> and entry_point() to return Option<NodeIndex> (and
update callers) so empty graphs are handled without panics; locate functions
named new(), from_parts(), max_level(), entry_point(), and node_levels to
implement the chosen approach consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 006c3391-8f4b-4f95-96a2-22f4c3ca31c8

📥 Commits

Reviewing files that changed from the base of the PR and between 922f595 and 2bfd8b1.

📒 Files selected for processing (12)
  • crates/garuda-index-hnsw/src/build.rs
  • crates/garuda-index-hnsw/src/search.rs
  • crates/garuda-index-hnsw/tests/test_hnsw_index.rs
  • crates/garuda-segment/src/index.rs
  • crates/garuda-segment/src/search.rs
  • crates/garuda-segment/tests/test_hnsw_sidecar.rs
  • crates/garuda-types/src/hnsw.rs
  • crates/garuda-types/src/ids.rs
  • crates/garuda-types/src/lib.rs
  • crates/garuda-types/src/query.rs
  • crates/garuda-types/src/schema.rs
  • crates/garuda-types/src/status.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/garuda-index-hnsw/src/build.rs
  • crates/garuda-segment/src/search.rs
  • crates/garuda-index-hnsw/tests/test_hnsw_index.rs

Comment on lines +6 to +32
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
pub struct CollectionName(String);

impl CollectionName {
pub fn parse(value: impl Into<String>) -> Result<Self, Status> {
let value = value.into();

if value.is_empty() {
return Err(Status::err(
StatusCode::InvalidArgument,
"collection name cannot be empty",
));
}

let valid = value
.chars()
.all(|ch| ch.is_ascii_alphanumeric() || ch == '_' || ch == '-');

if !valid {
return Err(Status::err(
StatusCode::InvalidArgument,
"collection name may only use letters, numbers, '_' and '-'",
));
}

Ok(Self(value))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Serde, does #[derive(Deserialize)]for newtype/tuple structs call custom constructors likeparse()/new(), or does it deserialize the inner field directly? What is the idiomatic way to force validation via serde(try_from)or a manualDeserialize impl?

💡 Result:

In Serde, #[derive(Deserialize)] on newtype structs (e.g., struct Foo(usize);) and tuple structs (e.g., struct Foo(usize, bool);) deserializes the inner field(s) directly. It does not call custom constructors like parse or new. The derive macro generates code that treats newtypes as newtype structs (deserializing the inner type transparently) and tuple structs as tuples (deserializing each field in order), bypassing any user-defined constructors to directly construct the struct with the deserialized inner values. This can break invariants enforced by constructors, as noted in community discussions. The idiomatic way to force validation is using the container attribute #[serde(try_from = "FromType")] on the newtype/tuple struct. This deserializes into an intermediate FromType (which derives Deserialize and has no invariants), then calls TryFrom for the type (requiring TryFrom::Error: Display). Validation occurs in the TryFrom impl (e.g., calling new or custom checks). For structs with cross-field validation, use a similar unchecked struct pattern. Manual Deserialize impls are an alternative for complex cases but less idiomatic than #[serde(try_from)] for validation. Crates like serde_with (e.g., TryFromInto) or nutype provide helpers, but core Serde's try_from is standard. Example for newtype: #[derive(Deserialize)] #[serde(try_from = "String")] pub struct Email(String); impl TryFrom for Email { type Error = String; fn try_from(value: String) -> Result<Self, Self::Error> { // validation logic, e.g., Email::new(value) } } For tuple structs, use an unchecked tuple struct similarly.

Citations:


🏁 Script executed:

cat -n crates/garuda-types/src/ids.rs | head -200

Repository: PhantomInTheWire/garudadb

Length of output: 6150


Add validation to derived Deserialize using #[serde(try_from)].

These types enforce validation through parse() / new() constructors, but #[derive(Deserialize)] deserializes the inner field directly, bypassing those validators. This allows invalid states (empty strings, zero IDs) to be constructed from deserialized data.

Use #[serde(try_from = "InnerType")] with a corresponding TryFrom implementation to route deserialization through the existing validators:

#[derive(Serialize)]
#[serde(try_from = "String")]
pub struct CollectionName(String);

impl TryFrom<String> for CollectionName {
    type Error = Status;
    fn try_from(value: String) -> Result<Self, Self::Error> {
        CollectionName::parse(value)
    }
}

Apply to all three string-based types (CollectionName, FieldName, DocId) and the numeric InternalDocId.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/ids.rs` around lines 6 - 32, Add serde-driven
validation by annotating the wrapper types with #[serde(try_from = "String")]
(for string-based types: CollectionName, FieldName, DocId) and #[serde(try_from
= "u64")] (for InternalDocId) and implement the corresponding TryFrom
conversions that call the existing validators (e.g., implement TryFrom<String>
for CollectionName that delegates to CollectionName::parse, TryFrom<String> for
FieldName and DocId delegating to their constructors, and TryFrom<u64> for
InternalDocId delegating to its new/parse method), using Status as the
associated Error type so deserialization routes through your validation logic.

Comment on lines +165 to +167
pub fn next(self) -> Self {
Self(self.0 + 1)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Rust, what happens when u64 + 1overflows in debug vs release builds? For monotonic ID counters, ischecked_add the idiomatic way to avoid wraparound?

💡 Result:

In Rust, for a u64 + 1 operation that overflows (e.g., u64::MAX + 1): - In debug builds, it panics at runtime due to overflow checks. - In release builds, it wraps around using two's complement semantics (resulting in 0 for u64::MAX + 1), as overflow checks are disabled for performance. For monotonic ID counters, checked_add is the idiomatic way to explicitly handle potential overflow and avoid unintended wraparound. It returns Some(new_value) if no overflow, or None if overflow would occur, allowing you to handle the error (e.g., via expect, unwrap_or, or pattern matching). Relying on debug panics or release wrapping is not recommended, as it leads to inconsistent behavior across builds. Use wrapping_add only if wraparound is explicitly desired.

Citations:


🏁 Script executed:

cat -n crates/garuda-types/src/ids.rs | head -250

Repository: PhantomInTheWire/garudadb

Length of output: 6980


Fix overflow handling in next() methods for monotonic ID counters.

These helper methods use raw + 1 on u64 fields, which panics in debug builds and wraps to 0 in release builds when reaching u64::MAX. This causes IDs to wrap back and duplicate.

Replace with checked_add(1) returning Option<Self> to explicitly handle overflow:

Locations to fix
  • InternalDocId::next() (lines 165-167)
  • SegmentId::next() (lines 182-184)
  • ManifestVersionId::next() (lines 199-201)
  • SnapshotId::next() (lines 216-218)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/ids.rs` around lines 165 - 167, The current next()
helpers (InternalDocId::next, SegmentId::next, ManifestVersionId::next,
SnapshotId::next) use raw `+ 1` on their u64 payload and can overflow; change
each signature from `fn next(self) -> Self` to `fn next(self) -> Option<Self>`
and implement using `self.0.checked_add(1).map(|v| Self(v))` (returning None on
overflow) so callers must explicitly handle the overflow case; update any call
sites to handle the Option result.

Comment on lines +45 to +65
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
pub struct DenseVector(Vec<f32>);

impl DenseVector {
pub fn parse(values: Vec<f32>) -> Result<Self, Status> {
if values.is_empty() {
return Err(Status::err(
StatusCode::InvalidArgument,
"vector cannot be empty",
));
}

if values.iter().any(|value| !value.is_finite()) {
return Err(Status::err(
StatusCode::InvalidArgument,
"vector values must be finite",
));
}

Ok(Self(values))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Serde/Rust, if struct DenseVector(Vec)derivesDefaultandDeserialize, will that bypass a separate validating constructor like DenseVector::parse? What is the idiomatic way to deserialize a vector type while rejecting empty or non-finite values?

💡 Result:

No, deriving Default and Deserialize on struct DenseVector(Vec) will not bypass a separate validating constructor like DenseVector::parse. The Deserialize derive generates code that deserializes directly into the Vec field using Serde's standard Deserialize impl for Vec, which accepts empty vectors and any valid f32 values (including NaN and Inf). It does not call any custom constructor like parse. A separate constructor is only used if you manually implement Deserialize to invoke it or use #[serde(from = "Vec")], but even then validation would need to be integrated. The idiomatic way to deserialize while rejecting empty or non-finite values is to use a custom deserializer with #[serde(deserialize_with = "...")] on a newtype wrapper struct, or implement Deserialize manually for DenseVector. For example: use serde::{Deserialize, Deserializer, de::{self, Visitor, SeqAccess}}; use std::fmt; #[derive(Debug)] pub struct DenseVector(Vec); impl<'de> Deserialize<'de> for DenseVector { fn deserialize(deserializer: D) -> Result<Self, D::Error> where D: Deserializer<'de>, { struct VecVisitor; impl<'de> Visitor<'de> for VecVisitor { type Value = Vec; fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result { formatter.write_str("array of finite f32") } fn visit_seq(self, mut seq: S) -> Result<Vec, S::Error> where S: SeqAccess<'de>, { let mut vec = Vec::new; while let Some(v) = seq.next_element? { if v.is_nan || v.is_infinite { return Err(de::Error::custom("non-finite f32 value")); } vec.push(v); } if vec.is_empty { return Err(de::Error::custom("empty vector")); } Ok(vec) } } deserializer.deserialize_seq(VecVisitor).map(DenseVector) } } This rejects empty arrays and non-finite f32 values during deserialization, per Serde docs on custom deserializers (serde.rs/stream-array.html, field-attrs.html) and StackOverflow examples for non-empty Vec (result 6) and float validation (results 11-12). Libraries like serde_valid offer additional validation options.

Citations:


🏁 Script executed:

cat -n crates/garuda-types/src/query.rs | sed -n '40,70p'

Repository: PhantomInTheWire/garudadb

Length of output: 1002


🏁 Script executed:

cat -n crates/garuda-engine/src/query.rs | sed -n '68,92p'

Repository: PhantomInTheWire/garudadb

Length of output: 1022


🏁 Script executed:

rg "DenseVector::default" --type rust

Repository: PhantomInTheWire/garudadb

Length of output: 367


🏁 Script executed:

rg "QueryVectorSource" --type rust -A 3 -B 1

Repository: PhantomInTheWire/garudadb

Length of output: 3739


DenseVector can be deserialized with invalid states, bypassing validation.

parse() enforces empty and non-finite vector rejection, but the derived Deserialize impl bypasses this entirely. When DenseVector is deserialized from request payloads (via QueryVectorSource::Vector), malformed vectors pass through unchecked. This data can then be cloned directly into query execution without re-validation.

Implement a custom Deserialize impl for DenseVector that applies the same validation logic from parse(). Alternatively, use #[serde(deserialize_with = "...")] with a validating deserializer. Do not rely on parse() being called during deserialization—it will not be.

Note: Default is actively used elsewhere in the codebase and is a separate concern; focus the fix on deserialization validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/garuda-types/src/query.rs` around lines 45 - 65, DenseVector currently
derives Deserialize which bypasses validation in parse(); replace the
derive-based deserialization by implementing a custom serde::Deserialize for
DenseVector (or annotate the field with #[serde(deserialize_with = "...")]) so
that deserialization reads a Vec<f32>, performs the same checks as
DenseVector::parse (non-empty and all values.is_finite()), and returns
Err(serde::de::Error::custom(...)) on failure; update/remove the derived
Deserialize on DenseVector and ensure places constructing from
QueryVectorSource::Vector now receive a validated DenseVector via this new
deserializer.

@PhantomInTheWire PhantomInTheWire merged commit 3261d51 into master Mar 22, 2026
2 checks passed
@PhantomInTheWire PhantomInTheWire deleted the codex/review-slices branch April 2, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant