Feat/storage#4
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces primitive id/deletion storage with Changes
Sequence DiagramsequenceDiagram
participant Client as Client
participant API as Collection API
participant Parser as Query Parser
participant Eval as garuda-meta::Evaluator
participant State as CollectionState
participant Index as DeleteStore/Index
Client->>API: delete_by_filter(raw_filter)
API->>Parser: parse_required_filter(raw_filter, schema)
Parser-->>API: FilterExpr
API->>State: all_live_records()
loop per live record
State->>Eval: evaluate_filter(expr, record.fields)
Eval-->>State: bool (match?)
alt match
State-->>API: collect doc_id
end
end
API->>State: apply_delete_batch(collected_doc_ids)
loop per doc_id
State->>State: append WalOp::Delete
State->>State: mark record.state = Deleted
State->>Index: delete_store.insert(internal_id)
end
State-->>API: WriteResults
API-->>Client: Ok / Err
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
0a90786 to
f72bb04
Compare
💡 Codex Reviewgarudadb/crates/garuda-engine/src/bootstrap.rs Lines 50 to 51 in 0a90786
garudadb/crates/garuda-storage/src/io.rs Lines 57 to 58 in 0a90786
garudadb/crates/garuda-storage/src/io.rs Lines 54 to 58 in 0a90786
ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
f72bb04 to
a126386
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/garuda-engine/src/bootstrap.rs (1)
46-63:⚠️ Potential issue | 🟠 MajorRebuild the derived stores after WAL replay.
id_mapanddelete_storeare restored from snapshots here, butCollectionState::apply_wal_opincrates/garuda-engine/src/state.rsLines 29-31 can skip redundant WAL entries without callingfinish_mutation(). If the loaded writing segment already contains mutations newer than the snapshots, recovery returns stale indexes even though the segment data is present.🩹 Proposed fix
let mut state = CollectionState { path, manifest, persisted_segments, writing_segment, id_map, delete_store, }; replay_writing_segment_wal(&mut state)?; + state.rebuild_indexes(); Ok(state)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-engine/src/bootstrap.rs` around lines 46 - 63, The code restores id_map and delete_store from snapshots before calling replay_writing_segment_wal, which can leave indexes stale if the writing segment contains newer mutations; after replay_writing_segment_wal(&mut state) you must rebuild the derived stores so they reflect replayed WAL entries: invoke the appropriate rebuild routines (e.g. call IdMap::rebuild_from_segments or a similar function using state.persisted_segments and state.writing_segment to recreate id_map, and call DeleteStore::rebuild_from_segments or equivalent to recreate delete_store) or iterate segments to reapply mutations and call finish_mutation() where needed so CollectionState.id_map and CollectionState.delete_store are reconstructed after WAL replay (refer to CollectionState, apply_wal_op, replay_writing_segment_wal, finish_mutation, IdMap, and DeleteStore).
🧹 Nitpick comments (4)
crates/garuda-engine/tests/test_collection_recovery_crash_semantics.rs (1)
229-229: Avoid hardcoding WAL segment directory"0"in testsUsing
"0"directly makes the tests brittle if the writing segment id changes. Prefer deriving the path fromWRITING_SEGMENT_IDto keep intent and behavior aligned with engine constants.♻️ Proposed refactor
- let wal_path = collection.path().join("0").join("data.wal"); + let wal_path = collection + .path() + .join(WRITING_SEGMENT_ID.to_string()) + .join("data.wal");Also applies to: 251-251
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-engine/tests/test_collection_recovery_crash_semantics.rs` at line 229, Avoid hardcoding the "0" WAL segment directory: replace the literal "0" used when building wal_path (and the other occurrence) with the engine constant WRITING_SEGMENT_ID converted to a string (e.g. WRITING_SEGMENT_ID.to_string()) and ensure WRITING_SEGMENT_ID is imported into the test so the path is derived from the canonical constant rather than a hardcoded value.crates/garuda-storage/src/snapshots.rs (1)
46-49: Canonicalize delete snapshot payload by removing duplicatesAfter widening input on Line 46, duplicate IDs can now be encoded. Since delete state is logically set-based, deduping after sort keeps snapshot bytes canonical and avoids unnecessary growth/churn.
Suggested fix
pub fn write_delete_snapshot( root: &Path, snapshot_id: SnapshotId, deleted_doc_ids: impl IntoIterator<Item = u64>, ) -> Result<(), Status> { let mut ids = deleted_doc_ids.into_iter().collect::<Vec<_>>(); ids.sort_unstable(); + ids.dedup(); let bytes = encode_delete_snapshot(&ids)?; write_file_atomically(&delete_snapshot_path(root, snapshot_id), &bytes) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-storage/src/snapshots.rs` around lines 46 - 49, The collected IDs from deleted_doc_ids can contain duplicates; after collecting into ids and calling ids.sort_unstable() you should canonicalize by removing duplicates (e.g., call ids.dedup()) so the delete snapshot is set-like and produces stable bytes; update the code around the ids variable (the collection, sort_unstable call and subsequent use) to perform deduplication after sorting.crates/garuda-storage/src/layout.rs (1)
148-155: Consider adding a defensive check for empty parent path.
path.parent()returnsSome("")(notNone) for relative single-component paths like"foo". While this is unlikely given that callers always use properly constructed absolute paths, a defensive check would make the function more robust if used in other contexts later.🛡️ Optional defensive fix
fn sync_parent_directory(path: &Path) -> Result<(), Status> { let Some(parent) = path.parent() else { return Ok(()); }; + if parent.as_os_str().is_empty() { + return Ok(()); + } sync_directory(parent) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-storage/src/layout.rs` around lines 148 - 155, The current sync_parent_directory function treats path.parent() == Some(parent) as always valid, but parent can be an empty path for single-component inputs; update sync_parent_directory to defensively check for an empty parent (e.g. parent.as_os_str().is_empty() or parent.components().next().is_none()) and return Ok(()) in that case before calling sync_directory(parent); keep references to sync_parent_directory, sync_directory and the path.parent() call so the check is added right after obtaining parent.crates/garuda-engine/src/lib.rs (1)
224-243: Consider documenting the TOCTOU behavior.Between collecting matching IDs (line 231) and executing deletions (line 234), the read lock is dropped (line 232). This allows concurrent modifications where:
- Another thread could delete the same documents →
NotFounderrors.- Another thread could insert new matching documents → those are missed.
The current behavior (propagating the first error) is acceptable for best-effort deletion. However, if atomic "delete all matching" semantics are required, consider holding a write lock throughout or documenting this as eventual-consistency behavior.
📝 Optional: Add a doc comment clarifying semantics
+ /// Deletes all documents matching the given filter expression. + /// + /// Note: This operation is not strictly atomic. Documents are matched under a + /// read lock, then deleted under a write lock. Concurrent modifications between + /// these phases may cause some matched documents to be already deleted (returning + /// an error) or new matching documents to be missed. pub fn delete_by_filter(&self, raw_filter: &str) -> Result<(), Status> {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-engine/src/lib.rs` around lines 224 - 243, The delete_by_filter function has a TOCTOU race: it calls collect_matching_doc_ids after acquiring a read via read_state then drops the state before calling delete(ids), so concurrent modifications (via delete or inserts) can cause NotFound errors or missed docs; either enforce atomicity by holding a write lock for the whole operation (acquire write state instead of read_state and do not drop until after delete) or explicitly document the current eventual-consistency behavior in a doc comment on delete_by_filter mentioning collect_matching_doc_ids, delete, read_state and the dropped state so callers know deletions are best-effort and may see NotFound or miss concurrently-inserted matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/garuda-engine/src/state.rs`:
- Around line 34-35: The upsert replay path uses insert_doc(…,
WriteMode::Upsert) which applies schema defaults/normalization but the
redundancy check (live_doc_matches) compares against the raw payload, causing
non-idempotent replays when only defaulted fields differ; to fix, normalize the
incoming WalOp::Upsert document (apply the same schema defaults/normalization
used by insert_doc) before calling live_doc_matches so the comparison is between
normalized forms, and ensure the same change is applied for any other places
that perform upsert redundancy checks (e.g., the other live_doc_matches usage in
the replay branch).
In `@crates/garuda-storage/src/snapshots.rs`:
- Around line 19-22: The new iterator-based API can allow duplicate document IDs
to be written; after collecting and sorting ordered_entries (the entries
variable -> ordered_entries Vec), scan the sorted list for adjacent entries with
identical keys and reject the snapshot by returning an
Err(Status::invalid_argument(...)) that includes the duplicate key before any
encoding/writing occurs (i.e., before the code path that uses HashMap::insert).
This preserves the previous uniqueness guarantee and prevents silent collapse on
read; include the key in the error message for debugging.
---
Outside diff comments:
In `@crates/garuda-engine/src/bootstrap.rs`:
- Around line 46-63: The code restores id_map and delete_store from snapshots
before calling replay_writing_segment_wal, which can leave indexes stale if the
writing segment contains newer mutations; after replay_writing_segment_wal(&mut
state) you must rebuild the derived stores so they reflect replayed WAL entries:
invoke the appropriate rebuild routines (e.g. call IdMap::rebuild_from_segments
or a similar function using state.persisted_segments and state.writing_segment
to recreate id_map, and call DeleteStore::rebuild_from_segments or equivalent to
recreate delete_store) or iterate segments to reapply mutations and call
finish_mutation() where needed so CollectionState.id_map and
CollectionState.delete_store are reconstructed after WAL replay (refer to
CollectionState, apply_wal_op, replay_writing_segment_wal, finish_mutation,
IdMap, and DeleteStore).
---
Nitpick comments:
In `@crates/garuda-engine/src/lib.rs`:
- Around line 224-243: The delete_by_filter function has a TOCTOU race: it calls
collect_matching_doc_ids after acquiring a read via read_state then drops the
state before calling delete(ids), so concurrent modifications (via delete or
inserts) can cause NotFound errors or missed docs; either enforce atomicity by
holding a write lock for the whole operation (acquire write state instead of
read_state and do not drop until after delete) or explicitly document the
current eventual-consistency behavior in a doc comment on delete_by_filter
mentioning collect_matching_doc_ids, delete, read_state and the dropped state so
callers know deletions are best-effort and may see NotFound or miss
concurrently-inserted matches.
In `@crates/garuda-engine/tests/test_collection_recovery_crash_semantics.rs`:
- Line 229: Avoid hardcoding the "0" WAL segment directory: replace the literal
"0" used when building wal_path (and the other occurrence) with the engine
constant WRITING_SEGMENT_ID converted to a string (e.g.
WRITING_SEGMENT_ID.to_string()) and ensure WRITING_SEGMENT_ID is imported into
the test so the path is derived from the canonical constant rather than a
hardcoded value.
In `@crates/garuda-storage/src/layout.rs`:
- Around line 148-155: The current sync_parent_directory function treats
path.parent() == Some(parent) as always valid, but parent can be an empty path
for single-component inputs; update sync_parent_directory to defensively check
for an empty parent (e.g. parent.as_os_str().is_empty() or
parent.components().next().is_none()) and return Ok(()) in that case before
calling sync_directory(parent); keep references to sync_parent_directory,
sync_directory and the path.parent() call so the check is added right after
obtaining parent.
In `@crates/garuda-storage/src/snapshots.rs`:
- Around line 46-49: The collected IDs from deleted_doc_ids can contain
duplicates; after collecting into ids and calling ids.sort_unstable() you should
canonicalize by removing duplicates (e.g., call ids.dedup()) so the delete
snapshot is set-like and produces stable bytes; update the code around the ids
variable (the collection, sort_unstable call and subsequent use) to perform
deduplication after sorting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 53e1872f-ffb7-4174-82fc-6b6656dc3fd6
📒 Files selected for processing (14)
crates/garuda-engine/src/bootstrap.rscrates/garuda-engine/src/filter.rscrates/garuda-engine/src/lib.rscrates/garuda-engine/src/persistence.rscrates/garuda-engine/src/state.rscrates/garuda-engine/tests/test_collection_create_and_open_validation.rscrates/garuda-engine/tests/test_collection_filter_behavior.rscrates/garuda-engine/tests/test_collection_recovery_crash_semantics.rscrates/garuda-engine/tests/test_storage_id_map.rscrates/garuda-meta/src/lib.rscrates/garuda-segment/src/codec.rscrates/garuda-segment/src/lib.rscrates/garuda-storage/src/layout.rscrates/garuda-storage/src/snapshots.rs
💤 Files with no reviewable changes (1)
- crates/garuda-engine/src/filter.rs
| WalOp::Insert(doc) => self.insert_doc(doc.clone(), WriteMode::Insert), | ||
| WalOp::Upsert(doc) => self.insert_doc(doc.clone(), WriteMode::Upsert), |
There was a problem hiding this comment.
Keep upsert replay idempotency aligned with the normalized write path.
insert_doc() applies schema defaults before it decides what gets stored, but the upsert redundancy check still compares against the raw payload. A replayed upsert that only differs by defaulted fields will delete the current live record and append the same logical document again.
🩹 One way to fix this is to normalize before `live_doc_matches()` compares
fn live_doc_matches(&self, doc: &Doc) -> bool {
- let Some(record) = self.find_live_record(&doc.id) else {
+ let mut normalized = doc.clone();
+ apply_schema_defaults(&self.manifest.schema, &mut normalized);
+
+ let Some(record) = self.find_live_record(&normalized.id) else {
return false;
};
- record.doc == *doc
+ record.doc == normalized
}Also applies to: 62-76
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/garuda-engine/src/state.rs` around lines 34 - 35, The upsert replay
path uses insert_doc(…, WriteMode::Upsert) which applies schema
defaults/normalization but the redundancy check (live_doc_matches) compares
against the raw payload, causing non-idempotent replays when only defaulted
fields differ; to fix, normalize the incoming WalOp::Upsert document (apply the
same schema defaults/normalization used by insert_doc) before calling
live_doc_matches so the comparison is between normalized forms, and ensure the
same change is applied for any other places that perform upsert redundancy
checks (e.g., the other live_doc_matches usage in the replay branch).
| entries: impl IntoIterator<Item = (String, u64)>, | ||
| ) -> Result<(), Status> { | ||
| let mut ordered_entries = entries | ||
| .iter() | ||
| .map(|(doc_id, internal_doc_id)| (doc_id.as_str().to_string(), *internal_doc_id)) | ||
| .collect::<Vec<_>>(); | ||
| let mut ordered_entries = entries.into_iter().collect::<Vec<_>>(); | ||
| ordered_entries.sort_by(|lhs, rhs| lhs.0.cmp(&rhs.0)); |
There was a problem hiding this comment.
Validate duplicate doc IDs before encoding id-map snapshots
Line 19 removes the uniqueness guarantee that &HashMap used to provide. With the new iterator input, duplicate keys can be written, then silently collapsed during read at Line 37 (HashMap::insert), which can drop mappings without error.
Suggested fix
pub fn write_id_map_snapshot(
root: &Path,
snapshot_id: SnapshotId,
entries: impl IntoIterator<Item = (String, u64)>,
) -> Result<(), Status> {
let mut ordered_entries = entries.into_iter().collect::<Vec<_>>();
ordered_entries.sort_by(|lhs, rhs| lhs.0.cmp(&rhs.0));
+ for pair in ordered_entries.windows(2) {
+ if pair[0].0 == pair[1].0 {
+ return Err(Status::err(
+ StatusCode::Internal,
+ format!("duplicate doc_id in id map snapshot input: {}", pair[0].0),
+ ));
+ }
+ }
let bytes = encode_id_map(&ordered_entries)?;
write_file_atomically(&id_map_snapshot_path(root, snapshot_id), &bytes)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| entries: impl IntoIterator<Item = (String, u64)>, | |
| ) -> Result<(), Status> { | |
| let mut ordered_entries = entries | |
| .iter() | |
| .map(|(doc_id, internal_doc_id)| (doc_id.as_str().to_string(), *internal_doc_id)) | |
| .collect::<Vec<_>>(); | |
| let mut ordered_entries = entries.into_iter().collect::<Vec<_>>(); | |
| ordered_entries.sort_by(|lhs, rhs| lhs.0.cmp(&rhs.0)); | |
| pub fn write_id_map_snapshot( | |
| root: &Path, | |
| snapshot_id: SnapshotId, | |
| entries: impl IntoIterator<Item = (String, u64)>, | |
| ) -> Result<(), Status> { | |
| let mut ordered_entries = entries.into_iter().collect::<Vec<_>>(); | |
| ordered_entries.sort_by(|lhs, rhs| lhs.0.cmp(&rhs.0)); | |
| for pair in ordered_entries.windows(2) { | |
| if pair[0].0 == pair[1].0 { | |
| return Err(Status::err( | |
| StatusCode::Internal, | |
| format!("duplicate doc_id in id map snapshot input: {}", pair[0].0), | |
| )); | |
| } | |
| } | |
| let bytes = encode_id_map(&ordered_entries)?; | |
| write_file_atomically(&id_map_snapshot_path(root, snapshot_id), &bytes) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/garuda-storage/src/snapshots.rs` around lines 19 - 22, The new
iterator-based API can allow duplicate document IDs to be written; after
collecting and sorting ordered_entries (the entries variable -> ordered_entries
Vec), scan the sorted list for adjacent entries with identical keys and reject
the snapshot by returning an Err(Status::invalid_argument(...)) that includes
the duplicate key before any encoding/writing occurs (i.e., before the code path
that uses HashMap::insert). This preserves the previous uniqueness guarantee and
prevents silent collapse on read; include the key in the error message for
debugging.
a126386 to
e230ba6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e230ba6b10
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let ids = collect_matching_doc_ids(&state, &filter); | ||
| drop(state); | ||
|
|
||
| for result in self.delete(ids) { |
There was a problem hiding this comment.
Make
delete_by_filter atomic with respect to writers
When the collection has concurrent writers, this two-phase implementation can delete the wrong document set: it snapshots matching DocIds under a read lock, drops the lock, and then resolves those ids again inside delete(). If another thread upserts/reinserts the same id in between, delete_by_filter will delete the replacement row even when it no longer matches the filter, and it can also miss rows that started matching after the read phase. Because Collection is cloneable and concurrent access is already exercised elsewhere in this crate, callers can observe incorrect deletes in production.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/garuda-engine/src/state.rs (1)
242-248:⚠️ Potential issue | 🟠 MajorNormalize before upsert redundancy comparison.
Line 242 compares the raw incoming doc, while
insert_docnormalizes with schema defaults first. This still makes upsert WAL replay non-idempotent when only defaulted fields differ (downstream at Line 54).Suggested fix
fn live_doc_matches(&self, doc: &Doc) -> bool { - let Some(record) = self.find_live_record(&doc.id) else { + let mut normalized = doc.clone(); + apply_schema_defaults(&self.manifest.schema, &mut normalized); + + let Some(record) = self.find_live_record(&normalized.id) else { return false; }; - record.doc == *doc + record.doc == normalized }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-engine/src/state.rs` around lines 242 - 248, live_doc_matches currently compares the raw incoming Doc against the stored record, causing non-idempotent upsert replay when schema defaults differ; change live_doc_matches to normalize the incoming doc the same way insert_doc does (apply schema defaults/normalization) before comparing to record.doc by calling the normalization routine used in insert_doc (or reuse the same helper used during upsert/insert) so comparisons are made on the canonical, defaulted form; ensure you reference live_doc_matches, insert_doc, and find_live_record when locating where to apply the normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/garuda-engine/src/state.rs`:
- Around line 242-248: live_doc_matches currently compares the raw incoming Doc
against the stored record, causing non-idempotent upsert replay when schema
defaults differ; change live_doc_matches to normalize the incoming doc the same
way insert_doc does (apply schema defaults/normalization) before comparing to
record.doc by calling the normalization routine used in insert_doc (or reuse the
same helper used during upsert/insert) so comparisons are made on the canonical,
defaulted form; ensure you reference live_doc_matches, insert_doc, and
find_live_record when locating where to apply the normalization.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4e78aa6d-46b3-4236-93f8-29c704dbc743
📒 Files selected for processing (15)
crates/garuda-engine/src/bootstrap.rscrates/garuda-engine/src/filter.rscrates/garuda-engine/src/lib.rscrates/garuda-engine/src/persistence.rscrates/garuda-engine/src/state.rscrates/garuda-engine/tests/test_collection_create_and_open_validation.rscrates/garuda-engine/tests/test_collection_filter_behavior.rscrates/garuda-engine/tests/test_collection_recovery_crash_semantics.rscrates/garuda-engine/tests/test_storage_id_map.rscrates/garuda-meta/src/lib.rscrates/garuda-segment/src/codec.rscrates/garuda-segment/src/lib.rscrates/garuda-storage/src/layout.rscrates/garuda-storage/src/snapshots.rsgaruda/src/main.rs
💤 Files with no reviewable changes (1)
- crates/garuda-engine/src/filter.rs
✅ Files skipped from review due to trivial changes (1)
- crates/garuda-engine/tests/test_collection_filter_behavior.rs
🚧 Files skipped from review as they are similar to previous changes (8)
- crates/garuda-engine/tests/test_collection_create_and_open_validation.rs
- crates/garuda-engine/src/persistence.rs
- crates/garuda-storage/src/layout.rs
- crates/garuda-segment/src/lib.rs
- crates/garuda-segment/src/codec.rs
- crates/garuda-engine/tests/test_storage_id_map.rs
- crates/garuda-engine/src/lib.rs
- crates/garuda-storage/src/snapshots.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b874554a34
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Commands::Init => { | ||
| println!("{}", root.display()); | ||
| Ok(()) |
There was a problem hiding this comment.
Restore root creation/validation in
init
Because Database::open(&root) is now only called inside create and stats, the init branch no longer creates or validates the workspace root at all; it just prints the path and exits successfully. On a nonexistent or unwritable --root, garuda init now reports success even though the workspace was never initialized, so any automation using init as a provisioning/readiness step will get a false positive and fail later on the first real command.
Useful? React with 👍 / 👎.
b874554 to
005d669
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
garuda/src/main.rs (2)
44-45: Consider centralizing DB open + contextual error text.
Database::open(&root).map_err(|status| status.message)?is duplicated and loses operation context. A small helper would reduce repetition and make failures clearer.♻️ Proposed refactor
fn main() -> Result<(), String> { let Cli { root, command } = Cli::parse(); match command { @@ Commands::Create { name, dimension } => { - let db = Database::open(&root).map_err(|status| status.message)?; + let db = open_db(&root)?; @@ Commands::Stats { name } => { - let db = Database::open(&root).map_err(|status| status.message)?; + let db = open_db(&root)?; let collection = db.open_collection(&name).map_err(|status| status.message)?; @@ } + +fn open_db(root: &PathBuf) -> Result<Database, String> { + Database::open(root) + .map_err(|status| format!("failed to open database at {}: {}", root.display(), status.message)) +}Also applies to: 67-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@garuda/src/main.rs` around lines 44 - 45, Centralize the repeated Database::open(&root).map_err(|status| status.message)? pattern by adding a small helper (e.g., open_database_with_context or Database::open_with_context) that calls Database::open(&root), maps the error to a more descriptive message (including operation context like "opening DB at root" or the caller name) and returns the Database or the mapped error; then replace the direct calls in the places that currently call Database::open(&root).map_err(|status| status.message)? (references: Database::open and the CollectionSchema initialization sites) with calls to this new helper to avoid duplication and provide contextual error text.
81-83: Consider returningResultfor defensive programming.
field_namecurrently panics on invalid input, which is safe for the constant values it receives (PRIMARY_KEY_FIELDandVECTOR_FIELD). However, since the function is not exported and could be reused with runtime-provided input in the future, returningResult<FieldName, String>would prevent panics. This is optional but recommended if the function scope expands.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@garuda/src/main.rs` around lines 81 - 83, Change field_name to return a Result<FieldName, String> instead of panicking: replace the current expect-based implementation so it calls FieldName::parse(value) and maps the parse error into a descriptive String (e.g. with map_err or to_string). Update all internal call sites (e.g. places that pass PRIMARY_KEY_FIELD and VECTOR_FIELD into field_name) to handle the Result (unwrap or propagate with ? where appropriate) so existing behavior remains unchanged while preventing a hard panic if field_name is ever called with runtime input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@garuda/src/main.rs`:
- Around line 44-45: Centralize the repeated
Database::open(&root).map_err(|status| status.message)? pattern by adding a
small helper (e.g., open_database_with_context or Database::open_with_context)
that calls Database::open(&root), maps the error to a more descriptive message
(including operation context like "opening DB at root" or the caller name) and
returns the Database or the mapped error; then replace the direct calls in the
places that currently call Database::open(&root).map_err(|status|
status.message)? (references: Database::open and the CollectionSchema
initialization sites) with calls to this new helper to avoid duplication and
provide contextual error text.
- Around line 81-83: Change field_name to return a Result<FieldName, String>
instead of panicking: replace the current expect-based implementation so it
calls FieldName::parse(value) and maps the parse error into a descriptive String
(e.g. with map_err or to_string). Update all internal call sites (e.g. places
that pass PRIMARY_KEY_FIELD and VECTOR_FIELD into field_name) to handle the
Result (unwrap or propagate with ? where appropriate) so existing behavior
remains unchanged while preventing a hard panic if field_name is ever called
with runtime input.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e09b6dd7-e14e-4e3d-80e5-ce73870bb888
📒 Files selected for processing (2)
garuda/src/main.rsgaruda/tests/test_cli_contract.rs
💤 Files with no reviewable changes (1)
- garuda/tests/test_cli_contract.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 005d669559
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| root: PathBuf, | ||
| #[command(subcommand)] | ||
| command: Option<Commands>, | ||
| command: Commands, |
There was a problem hiding this comment.
Restore the
doctor/no-arg health-check command
Making the subcommand mandatory here, together with removing Doctor from the enum below, turns both garuda-cli and garuda-cli doctor from successful health checks into clap usage errors. The deleted doctor_should_exit_successfully_for_a_healthy_workspace test in garuda/tests/test_cli_contract.rs shows this was part of the existing CLI contract, so any scripts or readiness probes using the current health-check entrypoint will start failing even when the workspace is fine.
Useful? React with 👍 / 👎.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
1 similar comment
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/garuda-engine/src/lib.rs (1)
359-374: Consider using iterator combinators for conciseness.The logic is correct. For improved readability, you could simplify with iterator combinators:
♻️ Optional: Use iterator combinators
fn collect_matching_doc_ids( state: &CollectionState, filter: &garuda_types::FilterExpr, ) -> Vec<DocId> { - let mut ids = Vec::new(); - - for record in state.all_live_records() { - if !evaluate_filter(filter, &record.doc.fields) { - continue; - } - - ids.push(record.doc.id); - } - - ids + state + .all_live_records() + .filter(|record| evaluate_filter(filter, &record.doc.fields)) + .map(|record| record.doc.id.clone()) + .collect() }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/garuda-engine/src/lib.rs` around lines 359 - 374, The function collect_matching_doc_ids is using an explicit for loop to filter and collect doc IDs; replace the loop with iterator combinators for conciseness by calling state.all_live_records().into_iter().filter(|record| evaluate_filter(filter, &record.doc.fields)).map(|record| record.doc.id).collect::<Vec<DocId>>() (or equivalent using filter_map) so the logic stays the same but is expressed declaratively; keep references to evaluate_filter, state.all_live_records(), and record.doc.id when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/garuda-engine/src/lib.rs`:
- Around line 359-374: The function collect_matching_doc_ids is using an
explicit for loop to filter and collect doc IDs; replace the loop with iterator
combinators for conciseness by calling
state.all_live_records().into_iter().filter(|record| evaluate_filter(filter,
&record.doc.fields)).map(|record| record.doc.id).collect::<Vec<DocId>>() (or
equivalent using filter_map) so the logic stays the same but is expressed
declaratively; keep references to evaluate_filter, state.all_live_records(), and
record.doc.id when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d1f2174e-5281-4f4b-b1c0-39d644dbad10
⛔ Files ignored due to path filters (1)
scorecard.pngis excluded by!**/*.png
📒 Files selected for processing (9)
crates/garuda-engine/src/lib.rscrates/garuda-engine/src/query.rscrates/garuda-engine/src/schema.rscrates/garuda-engine/src/validation.rscrates/garuda-segment/src/codec.rscrates/garuda-segment/src/wal.rscrates/garuda-storage/src/layout.rsgaruda/src/main.rsgaruda/tests/test_cli_contract.rs
✅ Files skipped from review due to trivial changes (1)
- garuda/tests/test_cli_contract.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/garuda-storage/src/layout.rs
- crates/garuda-segment/src/codec.rs
- garuda/src/main.rs
66239de to
2013208
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary by CodeRabbit
New Features
Bug Fixes
Tests