feat(people): contact resolution + scoring + CNContactStore (A5)#821
feat(people): contact resolution + scoring + CNContactStore (A5)#821jwalin-shah wants to merge 47 commits into
Conversation
Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/ iMessage into a local personal_index.db, with a Today brief and retrieval- tuned chat as the two surfaces. Three composable privacy modes (Convenience / Hybrid / Fully local) sharing one code path. Track 1 unblocks the ship pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from ~/projects/inbox.
Two implementation plans drafted from the life-capture spec: - Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds. - Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db, Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread stripping, hybrid retrieval (vector + keyword + recency), controller schema + RPC. End-to-end test with synthetic items. No ingestion or UI yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and verify_asset_reachable. Adds scripts/test_install.sh that exercises the resolver against a committed fixture latest.json. Failures now report the resolved URL and the parse error instead of dying silently.
…le, keyword projection column
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.
Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
Adds src/openhuman/life_capture/ module tree (stubs for embedder, index, migrations, quote_strip, redact, rpc, schemas, types, plus tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs. Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing rusqlite/regex/once_cell/async-trait/tempfile already in tree (Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex
Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
…te, char-bounded)
… table Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets WAL+foreign_keys, open_in_memory() is for tests. vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
…_id) dedupe Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>. Exposes ensure_vec_extension_registered() crate-wide so the migrations test can register vec0 before opening its in-memory connection.
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …), plus the user:local ACL EXISTS clause so the same query shape works for the multi-token team v2 ACL without rewrites. - vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement) and ORDER BY distance; score = 1/(1 + distance) so callers can blend it with keyword bm25 on the same monotonic scale. Shared ItemRow + into_hit() helper lets both queries reuse the same row shape; rusqlite query_map closures hand-build it because rusqlite has no sqlx FromRow equivalent.
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs, normalises each independently to [0,1], then re-ranks with 0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30). Documents present in only one leg get 0 for the missing signal but still compete on the others; only items with neither signal are dropped. Same-vector twin items now break their tie by recency.
…rch) Adds two controllers exposed via the registry: - life_capture.get_stats — total items, per-source counts, last-ingest ts. - life_capture.search — hybrid (vector + keyword + recency) search with embed-then-rank, optional k (default 10, capped at 100). Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>) and have no per-call context object. F14 will call runtime::init() at app startup; until then handlers return a structured 'not initialised' error so the failure mode is loud, not silent. Schemas registered in core/all.rs alongside cron.
… → search) FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector so the same text round-trips identically through vec0 — keeps the e2e test hermetic, no network call. Verifies that: - quote-strip drops the 'On … wrote:' block before indexing, - redact masks the email address before indexing, - the upserted item resurfaces via hybrid_search with both transformations preserved on the returned text.
After Config::load_or_init succeeds, open the personal_index.db at config.workspace_dir and register the life_capture runtime. Embedder is env-gated: OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1) OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small) If no key is set we still open the index file (so Plan #2's ingestion worker can write to it) but skip runtime registration — controllers then return the 'not initialised' error from runtime::get() instead of silently calling a misconfigured embedder. Config-schema integration deferred to a follow-up; env-driven keeps F14 non-invasive while we land the rest of the foundation.
…ection snapshot_pair(memory, user) returns a MemorySnapshot containing plain strings — no reference back to the stores — so taking a snapshot at session start and reusing it across turns gives a stable system-prompt prefix and lets the LLM prefix-cache hit on every subsequent turn. Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader, but openhuman assembles agent prompts per-agent under src/openhuman/agent/agents/*/prompt.rs (not the plan's stale src-tauri/src/commands/chat.rs path); the prompt-builder integration is deferred to Plan #2 alongside the agent-context refactor.
… controllers Adds four controllers under the memory_curated namespace (the 'memory' namespace is already owned by the long-term memory subsystem): - memory_curated.read — read MEMORY.md or USER.md - memory_curated.add — append entry, char-bounded - memory_curated.replace — substring replace, char-bounded - memory_curated.remove — drop entries containing a needle 'file' input is a typed enum (memory|user) so adapters reject anything else at validation time. Runtime state lives in curated_memory::runtime behind a tokio OnceCell, mirroring life_capture's pattern: startup in jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes' char limits (memory: 2200, user: 1375) and registers the runtime. Handlers return 'not initialised' until init runs, so failure is loud.
…ings Findings from Codex + Gemini second-opinion review: 1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On (source, external_id) conflict the row's id was kept but the caller's fresh UUID was used for upsert_vector — the vector wrote under an id no row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same transaction; mutates caller's Item.id to the canonical id (so the next upsert_vector lands on the right row) and orphan-deletes any vector already written under the wrong id. Signature change: upsert(&[Item]) -> upsert(&mut [Item]). 2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a failed INSERT permanently lost the item's vector. Now wrapped in tx. 3. Runtime over-gated get_stats on the embedder (Codex). Split runtime into separate INDEX + EMBEDDER OnceCells: get_stats only requires index, search requires both. Index initialises whenever the workspace dir is reachable; embedder is env-gated as before. 4. Startup race (both reviewers): runtime init lived inside the post- serve tokio::spawn block, so axum::serve was already accepting before the OnceCells were set. Hoisted both bootstraps (curated_memory and life_capture) into helpers called inline before axum::serve. 5. runtime::get error message lied (Codex) — said 'set embeddings.api_key in config' but startup actually reads env vars. Fixed text. Bonus: rename controller namespace memory_curated -> curated_memory (Codex preference; nothing depends on it yet so renamed before clients do). Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable). 4696 lib tests pass.
…emory namespace rename Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical: - curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`, which could rewrite MEMORY.md / USER.md from an empty baseline on transient I/O or permission failures. Reject empty needle in replace/remove. - curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC boundary (remove with "" deletes every entry). - life_capture/index.rs: upsert_vector rejects orphan ids (items row missing), which would have inserted vectors that never join in vector_search. - life_capture/embedder.rs: validate response length matches input, indices are contiguous 0..n, and every vector matches dim() — prevents silent misalignment or wrong-dimensional vectors flowing into the 1536-wide sqlite-vec table. Added 30s request + 10s connect timeouts. Retrieval hardening: - life_capture/index.rs: keyword_search now escapes FTS5 operators by tokenizing on whitespace and wrapping each token as a quoted literal. Prevents errors or unintended matches from stray quotes, AND/OR/NEAR, column specifiers in user input. - life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until via post-filter (single consistent pass across keyword+vector fusion). - life_capture/rpc.rs: handle_search validates embedder dim and response length against the fixed 1536-wide index column — clear RPC error instead of a cryptic sqlite-vec failure when the embeddings model is swapped. Docs: - docs/event-bus.md: remove erroneous `.await` on register_native_global (it is sync; the handler closure is async). - life-capture design spec: soften PII redaction claim — only regex is implemented today; light NER is flagged as a future enhancement.
Rendered prompts now surface runtime curated-memory writes: - `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`. - `UserFilesSection` prefers the snapshot over the workspace-file loader when one is attached, and injects `USER.md` alongside `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper. - `Session` carries `Option<Arc<MemorySnapshot>>`, populated by `ensure_curated_snapshot` on the first turn from `curated_memory::runtime::get()`. Reused across turns so prompt bytes stay frozen (KV-cache prefix contract) while mid-session `curated_memory.add/replace/remove` writes land on the NEXT session. - `ParentExecutionContext` inherits the snapshot so sub-agents render identical `MEMORY.md`/`USER.md` blocks as the parent. - Legacy workspace-file fallback preserved for embeds that don't initialise the curated-memory runtime (pure unit tests).
Loads a 32-item corpus across gmail/calendar/imessage/slack into an in-memory PersonalIndex and runs 12 queries through keyword / vector / hybrid paths, asserting must_contain / must_not_contain within per-query top-K prefixes. The vector leg uses a deterministic FNV-1a feature-hashing embedder (1536 dims, L2-normalized) so the test is offline and reproducible; real embedder swaps behind one call. Fixture reserves `relevant` and `pending` fields so recall@k / MRR / nDCG bolt onto the same JSON later. q-src-01 is marked pending: hybrid_search currently ignores Query.sources/since/until. Flip to false once filtering lands.
Previously every read grabbed the same `Arc<Mutex<Connection>>` as the writer, so `IndexReader` calls serialised behind in-flight ingests even though WAL would allow parallel readers. Split `PersonalIndex` into: - `writer: Arc<Mutex<Connection>>` — unchanged single-writer model. - `reader_pool: Option<Arc<r2d2::Pool<SqliteConnectionManager>>>` — four-connection read pool on file-backed indexes, built alongside the writer after migrations run. Each pooled connection gets `query_only=ON` as a belt-and-suspenders guard; sqlite-vec is loaded automatically via the process-wide auto-extension. `IndexReader` routes `keyword_search` / `vector_search` through a new `with_read_conn` helper that picks the pool when present and falls back to the writer lock otherwise. In-memory handles keep the single-connection layout — shared-cache URIs buy nothing at test-fixture scale and would force every test to grow a unique name.
- Add `life_capture.ingest` RPC controller (idempotent upsert by
`(source, external_id)`, embeds text + atomic vector replace).
- Switch iMessage scanner's `ingest_group()` to POST
`openhuman.life_capture_ingest` instead of `openhuman.memory_doc_ingest`,
so chat-day transcripts land in the PersonalIndex (the same store
`life_capture.search` reads from).
- Fix `life_capture.search` envelope: wrap hits in `{"hits": [...]}` per
the schema (previously returned a bare array — obs 3165).
- New `#[ignore]`d e2e at `tests/imessage_ingest_e2e.rs` proves the bridge:
ingest → search → re-ingest is idempotent (`total_items` unchanged,
`replaced=true`).
New self-contained module `src/openhuman/people/` for A5 of the v1 plan.
Exposes people.list / people.resolve / people.score via the controller
registry; no branches added in core/cli.rs or core/jsonrpc.rs.
Files added
-----------
src/openhuman/people/mod.rs
src/openhuman/people/types.rs PersonId, Handle{IMessage,Email,
DisplayName}, Person, Interaction,
ScoreComponents, AddressBookContact
src/openhuman/people/resolver.rs HandleResolver: resolve / resolve_or_create
/ link — deterministic across case +
whitespace via Handle::canonicalize()
src/openhuman/people/scorer.rs score(interactions, now) → ScoreComponents
src/openhuman/people/store.rs SQLite store (in-memory or file-backed);
plus process-global OnceCell handle for
the schema adapters
src/openhuman/people/migrations.rs idempotent migrator (life_capture style)
src/openhuman/people/migrations/0001_init.sql
people, handle_aliases, interactions
src/openhuman/people/address_book.rs macOS Address Book read (see below)
src/openhuman/people/rpc.rs typed domain handlers
src/openhuman/people/schemas.rs ControllerSchema + adapters
src/openhuman/people/tests.rs cross-file integration tests
Wired into src/core/all.rs: registered_controllers, declared schemas, and
namespace_description("people"). Added `pub mod people;` in
src/openhuman/mod.rs. Nothing else touched.
Scoring
-------
Each component clamped to [0,1]; composite = product (also clamped).
recency exp(-ln2 * age_days / 14) half-life 14 days
frequency count / 50, saturated cap 50 interactions
reciprocity min(out, in) / max(out, in) 0 when one-sided
depth mean(length) / 500, saturated cap 500 chars
Constants are module-level (RECENCY_HALF_LIFE_DAYS / FREQUENCY_CAP /
DEPTH_CAP_CHARS) so the first pass can be tuned from config later without
breaking the API.
Address Book read path
----------------------
Read the macOS Address Book directly from its SQLite stores at
`~/Library/Application Support/AddressBook/Sources/*/AddressBook-v22.abcddb`
via the existing `rusqlite` dep (opened with
SQLITE_OPEN_READ_ONLY | SQLITE_OPEN_NO_MUTEX). Returns `Vec<AddressBookContact>`
with display_name + emails + phones. A missing AddressBook dir or per-source
db is treated as zero contacts, not an error.
Pragmatic divergence from the plan: the original A5 spec said "objc2". objc2
is not currently in Cargo.toml; adding it plus Contacts.framework linkage is
a heavier dep surface than the SQLite read, which gives the same data, zero
new deps, and is trivially testable. If a future caller needs live change
notifications (KVO) we can revisit.
On non-mac, `address_book::read()` is a stub returning `Ok(vec![])`.
Tests (25 passing, cargo test people --lib)
------------------------------------------
types::* — Handle canonicalization, key mapping
migrations::* — tables created, idempotent
store::* — insert / lookup / list / interaction round-trip
resolver::* — deterministic across case/whitespace; same email
+ different display_name → same PersonId;
distinct handles → distinct ids
scorer::* — empty=zero; half-life; unit-interval clamp;
one-sided → 0 reciprocity; deterministic
schemas::* — three schemas + three handlers; required fields
rpc::* — list orders by score desc; resolve(create=false)
returns null for unknown
address_book::* — non-mac returns empty; mac read never panics
tests::* — resolver + store cooperate across handle kinds;
PersonId string round-trip
Scope
-----
Self-contained. No life_capture / chronicle / nudges / UI integration —
those land in A6+ / A9 per the v1 plan.
…A5 review blockers)
- address_book.rs: replace direct rusqlite/SQLite read with CNContactStore
(objc2-contacts 0.3.2). Properly triggers TCC Contacts permission prompt
under signed/sandboxed builds. Returns structured AddressBookError with
PermissionDenied variant so callers can surface actionable messages.
Error downgrades promoted from debug! to warn!. Adds ContactsSource trait
as a mockable seam; SystemContactsSource wraps real FFI; non-mac stub
unchanged (returns empty vec).
- resolver.rs: add seed_from_address_book(source: &dyn ContactsSource) which
wires the address book read into the people store. Iterates contacts, picks
primary handle (email > phone > display name), calls resolve_or_create, then
links aliases. Idempotent. Returns (seeded, skipped) counts and propagates
AddressBookError. Tests cover: canned list, permission denied, idempotency,
display-name-only, no-field skipped.
- rpc.rs: add handle_refresh_address_book exposing the seeding path over RPC.
Returns { seeded, skipped, permission_denied } — never errors on permission
denied (surfaces as a flag so the UI can prompt the user).
- schemas.rs: register people.refresh_address_book controller (4th controller).
Update stale test counts (3 → 4).
- scorer.rs: replace total-ever FREQUENCY_CAP with a rolling window approach.
Adds FREQUENCY_WINDOW_DAYS = 30 constant. frequency = interactions within
last 30 days / FREQUENCY_CAP. Old-burst contacts no longer get inflated
scores. New tests: old_burst_does_not_inflate_frequency_score,
interactions_exactly_at_window_boundary_are_included.
- Cargo.toml: add objc2 = "0.6", objc2-foundation (NSArray/NSError/NSString/
NSPredicate features), objc2-contacts = "0.3.2" (CNContact/CNContactFetchRequest/
CNContactStore/CNLabeledValue/CNPhoneNumber), block2 = "0.6" under
[target.'cfg(target_os = "macos")'.dependencies], matching A7 versions.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (36)
📝 WalkthroughWalkthroughThis PR implements foundational infrastructure for life-capture (a personal index with SQLite+sqlite-vec storage, hybrid search, and local embedding support), curated-memory (session-scoped file-backed stores injected into agent prompts), and a people module (contact resolution and interaction-based scoring). It updates the iMessage scanner to use the new life_capture_ingest endpoint, hardens the installer with asset verification, bootstraps both systems at server startup, and adds comprehensive planning documentation. ChangesFoundation Milestone: Life Capture, Curated Memory, People, and Agent Integration
Installer Hardening and CI Preparation
Sequence Diagram(s)sequenceDiagram
participant Client as Client/iMessage<br/>Scanner
participant LifeCapture as life_capture RPC<br/>(ingest/search)
participant Index as PersonalIndex<br/>(writer/reader)
participant Embedder as Embedder<br/>(OpenAI)
participant SQLite as SQLite<br/>personal_index.db
Client->>LifeCapture: POST ingest(source,<br/>external_id, text, ts)
LifeCapture->>LifeCapture: Redact PII,<br/>strip quoted reply
LifeCapture->>Index: upsert Item<br/>via IndexWriter
Index->>SQLite: SELECT existing by<br/>(source, external_id)
SQLite-->>Index: canonical UUID
Index->>SQLite: INSERT/UPDATE items<br/>row
LifeCapture->>Embedder: embed_batch([text])
Embedder-->>LifeCapture: Vec<Vec<f32>>
LifeCapture->>Index: upsert_vector<br/>(item_id, vector)
Index->>SQLite: INSERT item_vectors
LifeCapture-->>Client: RpcOutcome<br/>{item_id, replaced}
Client->>LifeCapture: POST search(text, k)
LifeCapture->>Embedder: embed_batch([text])
Embedder-->>LifeCapture: query_vector
LifeCapture->>Index: hybrid_search<br/>(query_vector, Query)
Index->>SQLite: keyword_search (FTS5)
Index->>SQLite: vector_search (vec0)
Index->>Index: RRF fusion +<br/>recency scoring
Index-->>LifeCapture: Vec<Hit>
LifeCapture-->>Client: RpcOutcome<br/>{hits: [...]}
sequenceDiagram
participant Agent as Agent<br/>Session
participant CuratedMem as curated_memory<br/>runtime
participant MemStore as MemoryStore<br/>(file-backed)
participant Prompt as Prompt<br/>Builder
Agent->>Agent: new Agent()<br/>curated_snapshot=None
Agent->>Agent: turn() first turn
Agent->>Agent: ensure_curated_snapshot()
Agent->>CuratedMem: get() runtime
CuratedMem->>MemStore: read MEMORY.md
MemStore-->>CuratedMem: String
CuratedMem->>MemStore: read USER.md
MemStore-->>CuratedMem: String
CuratedMem-->>Agent: MemorySnapshot<br/>{memory, user}
Agent->>Agent: store as Arc
Agent->>Prompt: build system prompt
Prompt->>Prompt: UserFilesSection
Prompt->>Agent: curated_snapshot()?
Agent-->>Prompt: Some(snapshot)
Prompt->>Prompt: inject snapshot<br/>MEMORY.md + USER.md
Prompt-->>Agent: enriched prompt
Agent-->>Agent: send to LLM
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes The PR introduces three major new modules (life_capture, curated_memory, people) with substantial implementation density: ~2,500 lines of schema/migration definitions, ~1,500 lines of index implementation with complex reader logic (FTS5/vec0/hybrid search/RRF fusion), ~1,000+ lines of store implementations, extensive RPC/handler wiring across all three modules, server bootstrap logic, agent session modifications, and comprehensive test coverage with fixtures and e2e tests. The variety of concerns (SQLite schema/migrations, Rust async/sync patterns, macOS FFI for address book, embeddings API client, text processing, scoring algorithms) demands separate mental models for each section. The changes are heterogeneous across file types and domains, requiring reviewers to context-switch between database design, Rust concurrency patterns, API contract definitions, and integration semantics. Possibly related PRs
|
# Conflicts: # src/openhuman/mod.rs
…bview NotificationSettings
There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/core/all.rs (1)
88-190:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPeople RPC controllers not wired into the global registry.
The
peoplenamespace is declared innamespace_description()(line 324) with description "Contact resolution and recency × frequency × reciprocity × depth scoring," butbuild_registered_controllers()lacks a call tocontrollers.extend(crate::openhuman::people::all_people_registered_controllers()). Without this wiring, the four RPC endpoints (people.list,people.resolve,people.score,people.refresh_address_book) remain unreachable viatry_invoke_registered_rpc.The
all_people_registered_controllersfunction exists insrc/openhuman/people/mod.rs, so add the wiring tobuild_registered_controllers()following the established pattern.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/core/all.rs` around lines 88 - 190, build_registered_controllers() is missing the people controllers so the RPCs declared in namespace_description() (people.*) are not reachable; add a call to extend the global registry with crate::openhuman::people::all_people_registered_controllers() (following the same pattern as other modules) so try_invoke_registered_rpc can find people.list/resolve/score/refresh_address_book; locate the function build_registered_controllers and insert controllers.extend(crate::openhuman::people::all_people_registered_controllers()) in the appropriate alphabetical/section spot.
🧹 Nitpick comments (4)
scripts/test_install.sh (1)
23-27: 💤 Low valueConsider verifying the specific exit code for missing platform.
The test correctly expects a non-zero exit, but
resolve_asset_urlreturns exit code 3 specifically for missing platforms. Verifying this would catch regressions where the function might fail for a different reason.Proposed enhancement
# Also test a missing platform produces exit code 3 -if resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null; then - echo "FAIL: expected non-zero exit for missing platform linux-aarch64" +rc=0 +resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null || rc=$? +if [[ "$rc" -ne 3 ]]; then + echo "FAIL: expected exit code 3 for missing platform linux-aarch64, got $rc" exit 1 fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/test_install.sh` around lines 23 - 27, The test currently only asserts that resolve_asset_url "$FIXTURE" "linux" "aarch64" exits non-zero; change it to assert the specific exit code 3 returned for missing platforms by capturing the command's exit status (run resolve_asset_url with the same args, save $? into a variable) and then check that the variable equals 3, failing the test if it does not — reference the resolve_asset_url call and the "$FIXTURE" variable to locate and update the assertion logic.scripts/install.sh (1)
203-217: 💤 Low value
verify_asset_reachableis defined but not called in the install flow.The function is well-implemented with exponential backoff, but it's not wired into
resolve_from_latest_jsonor the main install path. The planning doc (Task A3 Step 3) shows intent to call it after resolving the URL.If this is intentional (deferred to a follow-up), consider adding a TODO comment. Otherwise, wire it in before the download step around line 415.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/install.sh` around lines 203 - 217, The verify_asset_reachable function is implemented but never invoked; call verify_asset_reachable("$asset_url") immediately after resolve_from_latest_json returns the asset URL and before the actual download step, check its return code and abort with a clear error/log if it fails (propagate the non-zero exit or exit 4), and if you intentionally want to defer this, add a TODO comment in the main install flow mentioning resolve_from_latest_json and verify_asset_reachable so it’s not lost.src/openhuman/life_capture/quote_strip.rs (1)
4-7: ⚡ Quick win
ON_DATE_WROTEmisses multi-line attribution headers common in mobile clients.
.{1,200}with no(?s)flag cannot cross a newline, so Gmail's typical two-line attribution:On Mon, Apr 21, 2026 at 9:14 AM, Sarah Smith <sarah@example.com> wrote:is never matched. The
>filter below still strips the quoted lines, but the attribution header leaks into the returned content.♻️ Proposed fix — use DOTALL with a non-greedy quantifier
-static ON_DATE_WROTE: Lazy<Regex> = Lazy::new(|| { - // "On <date>, <name> <email|>(?) wrote:" — match start of any line. - Regex::new(r"(?m)^On .{1,200}\bwrote:\s*$").unwrap() -}); +static ON_DATE_WROTE: Lazy<Regex> = Lazy::new(|| { + // "On <date>, <name> <email|>(?) wrote:" — allow header to span 1-3 lines. + // (?s) makes `.` cross newlines; non-greedy `.{1,300}?` prevents over-capture. + Regex::new(r"(?ms)^On .{1,300}?\bwrote:\s*$").unwrap() +});🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/life_capture/quote_strip.rs` around lines 4 - 7, The ON_DATE_WROTE regex doesn't match multi-line attribution headers because DOT (.) doesn't cross newlines; update the pattern used in ON_DATE_WROTE to enable DOTALL and make the quantifier non-greedy so it can span at most ~200 chars across lines (e.g., add the inline (?s) or combined (?ms) flag and use .{1,200}? ) — modify the Regex::new call that constructs ON_DATE_WROTE to use the DOTALL-enabled, non-greedy form (refer to the ON_DATE_WROTE static and its Regex::new invocation) so multi-line attribution blocks like Gmail's two-line header are matched and stripped.src/openhuman/people/rpc.rs (1)
16-54: 🏗️ Heavy liftN+1 query pattern: one
interactions_forround-trip per person
handle_listloads the full person list and then issues a separateinteractions_forquery for every person in the result set. With the 500-person cap that is up to 501 sequential DB round-trips before any sorting occurs. Adding abatch_interactions_for(person_ids)method toPeopleStore(a singleSELECT … WHERE person_id IN (…)keyed byPersonId) would collapse this to two queries.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/people/rpc.rs` around lines 16 - 54, handle_list currently calls PeopleStore::interactions_for(p.id) inside the loop, causing N+1 DB round-trips; add a new PeopleStore::batch_interactions_for(person_ids: &[PersonId]) -> Result<HashMap<PersonId, Vec<Interaction>>, _> (or similar) that does one SELECT WHERE person_id IN (...), call it once before the for p in people loop with the collected person IDs, replace per-person interactions_for calls with lookups into the returned map, and preserve existing error mapping (map_err(...)) and use the same score(&interactions, now) path so scoring and JSON construction (in handle_list) remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 908-912: The doc comment "Initializes long-lived socket/event-bus
infrastructure." was accidentally moved to bootstrap_curated_memory; move that
doc line (and any other lines belonging to the original comment block) back
above bootstrap_skill_runtime so bootstrap_skill_runtime regains its intended
documentation, and replace or remove the misplaced comment on
bootstrap_curated_memory so its own doc comment correctly describes
curated-memory initialization; locate the functions by their names
bootstrap_curated_memory and bootstrap_skill_runtime to make the change.
In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 337-339: PromptContext is missing the curated_snapshot field used
in mod.rs; add pub curated_snapshot: Option<&'a
crate::openhuman::curated_memory::MemorySnapshot> to the PromptContext<'a>
struct in prompts/types.rs, then update the functions that construct
PromptContext—empty_prompt_context_for_static_sections() and
build_system_prompt() in turn.rs—to accept/propagate the curated snapshot (set
to Some(...) when available or None otherwise) so ctx.curated_snapshot is
populated before code calls inject_snapshot_content; reference PromptContext,
empty_prompt_context_for_static_sections, build_system_prompt, and the existing
Agent implementation in session/types.rs for the intended field shape.
In `@src/openhuman/curated_memory/rpc.rs`:
- Around line 19-72: Change the four handler signatures (handle_read,
handle_add, handle_replace, handle_remove) to return RpcOutcome<Value> instead
of Result<RpcOutcome<Value>, String>, and convert all early Err(...) returns
into RpcOutcome::error(...) returns; specifically, replace uses of the `?`
operator on pick(&file, rt) and map_err(...) on
store.read()/add()/replace()/remove() with explicit error handling that returns
RpcOutcome::error(format!(...)) (include the same error text like "read: {e}",
"add: {e}", etc.), and for validation failures (e.g., empty needle) return
RpcOutcome::error("needle must not be empty".into()); keep successful paths
returning RpcOutcome::new(...) as before.
In `@src/openhuman/curated_memory/store.rs`:
- Around line 88-95: snapshot_pair currently calls memory_store.read().await and
user_store.read().await separately, which can interleave with concurrent writes
and produce a mixed snapshot; fix by acquiring exclusive locks on both
MemoryStore instances before reading (use the stores' write().await or other
exclusive lock API) and always acquire them in a deterministic order to avoid
deadlocks (e.g., compare the pointer/address or some stable id of memory_store
vs user_store and lock the lower one first, then the other), then read the
in-memory contents into the MemorySnapshot and release both locks.
In `@src/openhuman/life_capture/embedder.rs`:
- Around line 19-33: The current HostedEmbedder::new uses .unwrap_or_else(|_|
reqwest::Client::new()) which silently drops configured timeouts if the builder
fails; change this to call .expect(...) with a clear message (e.g., "failed to
build reqwest client with timeouts") so the program fails loud on
misconfiguration instead of returning a timeout-less client; update the builder
call in HostedEmbedder::new (the reqwest::Client::builder()...build()
invocation) to use .expect and include a descriptive message.
- Around line 97-99: The dim() method is hardcoded to 1536 and must reflect the
model used by HostedEmbedder; update the HostedEmbedder struct to store an
embedding dimension (add a field like embedding_dim set during new(base_url,
api_key, model)), compute or map the correct dimension for the chosen model when
constructing HostedEmbedder, change fn dim(&self) -> usize to return
self.embedding_dim, and ensure embed_batch and any callers use this stored value
so models like text-embedding-3-large (3072) work correctly.
In `@src/openhuman/life_capture/index.rs`:
- Around line 545-547: The hybrid_search currently truncates candidates by using
oversample before apply_query_filters, causing filtered-out top results to leave
too few matches; update the logic around oversample/q.k and the calls to
keyword_search and vector_search so that filtering runs before final truncation:
either have keyword_search/vector_search return an untruncated candidate pool
(or accept a much larger dynamic oversample) and then call
apply_query_filters(query_vector, q, ...) to filter, and only after filtering
take the top q.k results to return; implement a loop or adaptive oversample
(e.g., double oversample up to a max) if necessary to ensure enough filtered
candidates, and apply the same change to the other similar block referenced
(around the vc/kw usage at the later 581-582 region) so both legs are filtered
prior to truncation.
In `@src/openhuman/life_capture/rpc.rs`:
- Around line 154-170: The code currently calls writer.upsert(&mut items) before
calling embedder.embed_batch(...) so a failed embed leaves the stored row with
an out-of-date vector; change the flow so embedding is performed and a valid
vector obtained before mutating storage: call embedder.embed_batch(...) and
obtain `vector` for the input text (using the same `text`, `items`, and
`canonical_id`/`requested_id` logic) first, then call writer.upsert(&mut items)
followed immediately by writer.upsert_vector(&canonical_id, &vector) (or perform
both upserts in a single atomic operation/transaction if supported) to ensure
the row is only updated when embedding has succeeded. Ensure references to
`IndexWriter::new`, `writer.upsert`, `embedder.embed_batch`,
`writer.upsert_vector`, `items`, `canonical_id`, and `requested_id` are updated
accordingly.
In `@src/openhuman/life_capture/schemas.rs`:
- Around line 188-193: The FieldSchema for "metadata" currently declares
TypeSchema::Option(Box::new(TypeSchema::String)) but the handler treats it as a
raw JSON object and defaults missing input to {} — change the schema to an
optional JSON/object type (e.g. TypeSchema::Option(Box::new(TypeSchema::Object))
or the project’s equivalent JSON/map type) so it matches the handler, and/or
update the handler to either accept a string and parse it into JSON or stop
defaulting missing values to {} (use None) so wire format and stored value
agree; apply the same fix to the other occurrence of "metadata" mentioned (lines
254-257) ensuring both FieldSchema(name: "metadata", ...) and the handler use
the same JSON object representation.
In `@src/openhuman/people/resolver.rs`:
- Around line 28-33: The resolve and link functions skip canonicalization and
should canonicalize handles/aliases before using the store: in resolve, call
handle.canonicalize() (or otherwise obtain the canonical Handle) and pass that
to store.lookup instead of the raw handle; in link, canonicalize the other alias
(e.g., call other.canonicalize()) before calling store.add_alias(pid, ...) so
aliases are stored in canonical form; ensure this mirrors resolve_or_create's
use of handle.canonicalize() so lookups and alias additions are consistent and
avoid duplicate PersonIds.
- Around line 40-74: The current resolve_or_create implementation races: it
calls store.lookup(&canonical) then creates a new PersonId with PersonId::new
and calls store.insert_person(&person, &[canonical]) which can produce duplicate
people under concurrent requests; fix by making the create-or-get atomic —
either implement an upsert at the store layer (e.g., an INSERT ... ON CONFLICT
DO UPDATE/RETURNING that returns the existing PersonId) and expose a
store.upsert_person_or_get_id(...) used by resolve_or_create, or after
insert_person call re-query store.lookup(&canonical) and return that id
(discarding the locally generated id) to ensure the returned id is the canonical
stored id; modify resolve_or_create, lookup, and insert_person/upsert methods
accordingly so the function always returns the database-authoritative PersonId.
In `@src/openhuman/people/rpc.rs`:
- Around line 120-143: handle_score currently treats a missing person the same
as a person with zero interactions because interactions_for returns an empty
Vec; update handle_score to first verify the person exists (e.g. call a
PeopleStore method like get_person, find_person, or person_exists on the
provided person_id) and return an Err (with a clear message including the
person_id) when the person is not found; only call interactions_for and compute
the score when the person lookup succeeds, preserving the existing RpcOutcome
construction for valid persons.
- Around line 68-79: The "created" flag is incorrect because resolve_or_create
returns Some for both existing and newly-created persons; update the logic in
the RPC handler (the block using resolver.resolve_or_create, resolver.resolve
and building RpcOutcome) to first check existence (e.g., call
resolver.resolve(&handle).await? or resolver.exists(&handle).await?) and only
call resolver.resolve_or_create(&handle).await? when absent — then set "created"
to true only when you actually inserted a new record (i.e., when the pre-check
returned None/false and resolve_or_create returned Some), otherwise set
"created" to false while still returning the existing person_id.
In `@src/openhuman/people/schemas.rs`:
- Around line 60-104: The output schemas for the people RPCs are missing the
"components" object and "handles" array that the RPC responses actually return;
update the FieldSchema entries inside the outputs vector (the TypeSchema::Array
of TypeSchema::Object) to include a "components" FieldSchema (TypeSchema::Object
with fields "recency", "frequency", "reciprocity", "depth" as TypeSchema::F64)
and a "handles" FieldSchema (TypeSchema::Array of TypeSchema::Object describing
per-handle fields returned by handle_list), matching required flags and comments
to the actual responses used by handle_score and handle_list so schema
introspection matches rpc.rs behavior.
In `@tests/imessage_ingest_e2e.rs`:
- Around line 9-13: The tests share a process-global SQLite DB so the assertion
that total_items == 1 in ingest_then_search_then_reingest_is_idempotent (and the
similar assertions around lines 63-76 and 160-166) is flaky under parallel test
execution; either enforce serial runs in the test invocation or, preferably,
make the assertions relative/robust by checking the expected delta or using >=1
(e.g. assert total_items >= previous_count + 1 or compare counts before/after to
assert an increment) and update the assertions in
search_response_shape_matches_schema and
ingest_then_search_then_reingest_is_idempotent accordingly so they no longer
assume an absolute count of 1.
---
Outside diff comments:
In `@src/core/all.rs`:
- Around line 88-190: build_registered_controllers() is missing the people
controllers so the RPCs declared in namespace_description() (people.*) are not
reachable; add a call to extend the global registry with
crate::openhuman::people::all_people_registered_controllers() (following the
same pattern as other modules) so try_invoke_registered_rpc can find
people.list/resolve/score/refresh_address_book; locate the function
build_registered_controllers and insert
controllers.extend(crate::openhuman::people::all_people_registered_controllers())
in the appropriate alphabetical/section spot.
---
Nitpick comments:
In `@scripts/install.sh`:
- Around line 203-217: The verify_asset_reachable function is implemented but
never invoked; call verify_asset_reachable("$asset_url") immediately after
resolve_from_latest_json returns the asset URL and before the actual download
step, check its return code and abort with a clear error/log if it fails
(propagate the non-zero exit or exit 4), and if you intentionally want to defer
this, add a TODO comment in the main install flow mentioning
resolve_from_latest_json and verify_asset_reachable so it’s not lost.
In `@scripts/test_install.sh`:
- Around line 23-27: The test currently only asserts that resolve_asset_url
"$FIXTURE" "linux" "aarch64" exits non-zero; change it to assert the specific
exit code 3 returned for missing platforms by capturing the command's exit
status (run resolve_asset_url with the same args, save $? into a variable) and
then check that the variable equals 3, failing the test if it does not —
reference the resolve_asset_url call and the "$FIXTURE" variable to locate and
update the assertion logic.
In `@src/openhuman/life_capture/quote_strip.rs`:
- Around line 4-7: The ON_DATE_WROTE regex doesn't match multi-line attribution
headers because DOT (.) doesn't cross newlines; update the pattern used in
ON_DATE_WROTE to enable DOTALL and make the quantifier non-greedy so it can span
at most ~200 chars across lines (e.g., add the inline (?s) or combined (?ms)
flag and use .{1,200}? ) — modify the Regex::new call that constructs
ON_DATE_WROTE to use the DOTALL-enabled, non-greedy form (refer to the
ON_DATE_WROTE static and its Regex::new invocation) so multi-line attribution
blocks like Gmail's two-line header are matched and stripped.
In `@src/openhuman/people/rpc.rs`:
- Around line 16-54: handle_list currently calls
PeopleStore::interactions_for(p.id) inside the loop, causing N+1 DB round-trips;
add a new PeopleStore::batch_interactions_for(person_ids: &[PersonId]) ->
Result<HashMap<PersonId, Vec<Interaction>>, _> (or similar) that does one SELECT
WHERE person_id IN (...), call it once before the for p in people loop with the
collected person IDs, replace per-person interactions_for calls with lookups
into the returned map, and preserve existing error mapping (map_err(...)) and
use the same score(&interactions, now) path so scoring and JSON construction (in
handle_list) remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d116a681-d12d-464b-9acb-52bdb04c8137
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (52)
Cargo.tomlapp/src-tauri/src/imessage_scanner/mod.rsdocs/superpowers/plans/2026-04-21-imessage-live-harness.mddocs/superpowers/plans/2026-04-22-life-capture-01-foundation.mddocs/superpowers/plans/2026-04-22-track1-ship-pipeline.mddocs/superpowers/specs/2026-04-22-life-capture-layer-design.mdscripts/fixtures/latest.jsonscripts/install.shscripts/test_install.shsrc/core/all.rssrc/core/jsonrpc.rssrc/openhuman/agent/harness/session/builder.rssrc/openhuman/agent/harness/session/runtime.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/session/types.rssrc/openhuman/agent/prompts/mod.rssrc/openhuman/curated_memory/mod.rssrc/openhuman/curated_memory/rpc.rssrc/openhuman/curated_memory/runtime.rssrc/openhuman/curated_memory/schemas.rssrc/openhuman/curated_memory/store.rssrc/openhuman/curated_memory/types.rssrc/openhuman/life_capture/embedder.rssrc/openhuman/life_capture/index.rssrc/openhuman/life_capture/migrations.rssrc/openhuman/life_capture/migrations/0001_init.sqlsrc/openhuman/life_capture/migrations/0002_vec.sqlsrc/openhuman/life_capture/mod.rssrc/openhuman/life_capture/quote_strip.rssrc/openhuman/life_capture/redact.rssrc/openhuman/life_capture/rpc.rssrc/openhuman/life_capture/runtime.rssrc/openhuman/life_capture/schemas.rssrc/openhuman/life_capture/tests/e2e.rssrc/openhuman/life_capture/tests/mod.rssrc/openhuman/life_capture/types.rssrc/openhuman/mod.rssrc/openhuman/people/address_book.rssrc/openhuman/people/migrations.rssrc/openhuman/people/migrations/0001_init.sqlsrc/openhuman/people/mod.rssrc/openhuman/people/resolver.rssrc/openhuman/people/rpc.rssrc/openhuman/people/schemas.rssrc/openhuman/people/scorer.rssrc/openhuman/people/store.rssrc/openhuman/people/tests.rssrc/openhuman/people/types.rssrc/openhuman/webview_notifications/types.rstests/fixtures/life_capture/corpus.jsontests/imessage_ingest_e2e.rstests/life_capture_retrieval_eval.rs
| /// Initializes long-lived socket/event-bus infrastructure. | ||
| /// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the | ||
| /// curated-memory runtime singleton. Idempotent — second-init is a no-op. | ||
| async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) { | ||
| use std::sync::Arc; |
There was a problem hiding this comment.
Stale doc comment misattributed to bootstrap_curated_memory.
The existing line 908 — /// Initializes long-lived socket/event-bus infrastructure. — was originally the doc comment for bootstrap_skill_runtime. Inserting bootstrap_curated_memory immediately after it caused that line to become the first line of the new function's doc comment, which is incorrect. bootstrap_skill_runtime (line 1003) is now doc-comment-free as a side effect.
📝 Suggested fix
-/// Initializes long-lived socket/event-bus infrastructure.
-/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
-/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
+/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
+/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {And restore the comment above bootstrap_skill_runtime:
+/// Initializes long-lived socket/event-bus infrastructure.
pub async fn bootstrap_skill_runtime() {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/core/jsonrpc.rs` around lines 908 - 912, The doc comment "Initializes
long-lived socket/event-bus infrastructure." was accidentally moved to
bootstrap_curated_memory; move that doc line (and any other lines belonging to
the original comment block) back above bootstrap_skill_runtime so
bootstrap_skill_runtime regains its intended documentation, and replace or
remove the misplaced comment on bootstrap_curated_memory so its own doc comment
correctly describes curated-memory initialization; locate the functions by their
names bootstrap_curated_memory and bootstrap_skill_runtime to make the change.
| pub async fn handle_read( | ||
| rt: &CuratedMemoryRuntime, | ||
| file: String, | ||
| ) -> Result<RpcOutcome<Value>, String> { | ||
| let store = pick(&file, rt)?; | ||
| let body = store.read().await.map_err(|e| format!("read: {e}"))?; | ||
| Ok(RpcOutcome::new( | ||
| json!({ "file": file, "body": body }), | ||
| vec![], | ||
| )) | ||
| } | ||
|
|
||
| pub async fn handle_add( | ||
| rt: &CuratedMemoryRuntime, | ||
| file: String, | ||
| entry: String, | ||
| ) -> Result<RpcOutcome<Value>, String> { | ||
| let store = pick(&file, rt)?; | ||
| store.add(&entry).await.map_err(|e| format!("add: {e}"))?; | ||
| Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![])) | ||
| } | ||
|
|
||
| pub async fn handle_replace( | ||
| rt: &CuratedMemoryRuntime, | ||
| file: String, | ||
| needle: String, | ||
| replacement: String, | ||
| ) -> Result<RpcOutcome<Value>, String> { | ||
| if needle.is_empty() { | ||
| return Err("needle must not be empty".into()); | ||
| } | ||
| let store = pick(&file, rt)?; | ||
| store | ||
| .replace(&needle, &replacement) | ||
| .await | ||
| .map_err(|e| format!("replace: {e}"))?; | ||
| Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![])) | ||
| } | ||
|
|
||
| pub async fn handle_remove( | ||
| rt: &CuratedMemoryRuntime, | ||
| file: String, | ||
| needle: String, | ||
| ) -> Result<RpcOutcome<Value>, String> { | ||
| if needle.is_empty() { | ||
| return Err("needle must not be empty".into()); | ||
| } | ||
| let store = pick(&file, rt)?; | ||
| store | ||
| .remove(&needle) | ||
| .await | ||
| .map_err(|e| format!("remove: {e}"))?; | ||
| Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![])) | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Handler return types violate the RpcOutcome<T> guideline.
All four handlers return Result<RpcOutcome<Value>, String> instead of RpcOutcome<Value>. The Err(String) path is being handled by the adapter in schemas.rs at the call site, but the handler signatures themselves do not comply.
Per coding guidelines: "Use RpcOutcome<T> for RPC controller return types per the AGENTS.md specification" (src/openhuman/*/rpc.rs).
♻️ Proposed fix — representative handler, apply same pattern to all four
-pub async fn handle_read(
+pub async fn handle_read(
rt: &CuratedMemoryRuntime,
file: String,
-) -> Result<RpcOutcome<Value>, String> {
- let store = pick(&file, rt)?;
- let body = store.read().await.map_err(|e| format!("read: {e}"))?;
- Ok(RpcOutcome::new(
- json!({ "file": file, "body": body }),
- vec![],
- ))
+) -> RpcOutcome<Value> {
+ let store = match pick(&file, rt) {
+ Ok(s) => s,
+ Err(e) => return RpcOutcome::error(e),
+ };
+ match store.read().await {
+ Ok(body) => RpcOutcome::new(json!({ "file": file, "body": body }), vec![]),
+ Err(e) => RpcOutcome::error(format!("read: {e}")),
+ }
}Apply the same transformation to handle_add, handle_replace, and handle_remove, converting each Err(...) return to RpcOutcome::error(...).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/curated_memory/rpc.rs` around lines 19 - 72, Change the four
handler signatures (handle_read, handle_add, handle_replace, handle_remove) to
return RpcOutcome<Value> instead of Result<RpcOutcome<Value>, String>, and
convert all early Err(...) returns into RpcOutcome::error(...) returns;
specifically, replace uses of the `?` operator on pick(&file, rt) and
map_err(...) on store.read()/add()/replace()/remove() with explicit error
handling that returns RpcOutcome::error(format!(...)) (include the same error
text like "read: {e}", "add: {e}", etc.), and for validation failures (e.g.,
empty needle) return RpcOutcome::error("needle must not be empty".into()); keep
successful paths returning RpcOutcome::new(...) as before.
| pub async fn snapshot_pair( | ||
| memory_store: &MemoryStore, | ||
| user_store: &MemoryStore, | ||
| ) -> std::io::Result<crate::openhuman::curated_memory::types::MemorySnapshot> { | ||
| Ok(crate::openhuman::curated_memory::types::MemorySnapshot { | ||
| memory: memory_store.read().await?, | ||
| user: user_store.read().await?, | ||
| }) |
There was a problem hiding this comment.
snapshot_pair is not actually point-in-time.
It reads the two files independently without holding either store's write lock, so a concurrent add/replace/remove can land between those awaits and produce a mixed snapshot from two different moments.
Suggested fix
pub async fn snapshot_pair(
memory_store: &MemoryStore,
user_store: &MemoryStore,
) -> std::io::Result<crate::openhuman::curated_memory::types::MemorySnapshot> {
+ let _memory_guard = memory_store.write_lock.lock().await;
+ let _user_guard = user_store.write_lock.lock().await;
+
Ok(crate::openhuman::curated_memory::types::MemorySnapshot {
- memory: memory_store.read().await?,
- user: user_store.read().await?,
+ memory: tokio::fs::read_to_string(&memory_store.file_path).await?,
+ user: tokio::fs::read_to_string(&user_store.file_path).await?,
})
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/curated_memory/store.rs` around lines 88 - 95, snapshot_pair
currently calls memory_store.read().await and user_store.read().await
separately, which can interleave with concurrent writes and produce a mixed
snapshot; fix by acquiring exclusive locks on both MemoryStore instances before
reading (use the stores' write().await or other exclusive lock API) and always
acquire them in a deterministic order to avoid deadlocks (e.g., compare the
pointer/address or some stable id of memory_store vs user_store and lock the
lower one first, then the other), then read the in-memory contents into the
MemorySnapshot and release both locks.
| impl HostedEmbedder { | ||
| pub fn new(base_url: String, api_key: String, model: String) -> Self { | ||
| let http = reqwest::Client::builder() | ||
| .timeout(std::time::Duration::from_secs(30)) | ||
| .connect_timeout(std::time::Duration::from_secs(10)) | ||
| .build() | ||
| .unwrap_or_else(|_| reqwest::Client::new()); | ||
| Self { | ||
| base_url, | ||
| api_key, | ||
| model, | ||
| http, | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
unwrap_or_else(|_| reqwest::Client::new()) silently drops the configured timeouts.
If the builder fails (rare, but possible on TLS init issues), the fallback client has no request or connect timeout, meaning embed_batch can hang indefinitely. Since a builder failure here would be a fatal misconfiguration, .expect is more appropriate.
🛡️ Proposed fix
- .build()
- .unwrap_or_else(|_| reqwest::Client::new());
+ .build()
+ .expect("failed to build HostedEmbedder HTTP client");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| impl HostedEmbedder { | |
| pub fn new(base_url: String, api_key: String, model: String) -> Self { | |
| let http = reqwest::Client::builder() | |
| .timeout(std::time::Duration::from_secs(30)) | |
| .connect_timeout(std::time::Duration::from_secs(10)) | |
| .build() | |
| .unwrap_or_else(|_| reqwest::Client::new()); | |
| Self { | |
| base_url, | |
| api_key, | |
| model, | |
| http, | |
| } | |
| } | |
| } | |
| impl HostedEmbedder { | |
| pub fn new(base_url: String, api_key: String, model: String) -> Self { | |
| let http = reqwest::Client::builder() | |
| .timeout(std::time::Duration::from_secs(30)) | |
| .connect_timeout(std::time::Duration::from_secs(10)) | |
| .build() | |
| .expect("failed to build HostedEmbedder HTTP client"); | |
| Self { | |
| base_url, | |
| api_key, | |
| model, | |
| http, | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/life_capture/embedder.rs` around lines 19 - 33, The current
HostedEmbedder::new uses .unwrap_or_else(|_| reqwest::Client::new()) which
silently drops configured timeouts if the builder fails; change this to call
.expect(...) with a clear message (e.g., "failed to build reqwest client with
timeouts") so the program fails loud on misconfiguration instead of returning a
timeout-less client; update the builder call in HostedEmbedder::new (the
reqwest::Client::builder()...build() invocation) to use .expect and include a
descriptive message.
| if let Some(id) = self | ||
| .store | ||
| .lookup(&canonical) | ||
| .await | ||
| .map_err(|e| format!("lookup: {e}"))? | ||
| { | ||
| return Ok(id); | ||
| } | ||
| let id = PersonId::new(); | ||
| let (display_name, primary_email, primary_phone) = match &canonical { | ||
| Handle::DisplayName(s) => (Some(s.clone()), None, None), | ||
| Handle::Email(s) => (None, Some(s.clone()), None), | ||
| Handle::IMessage(s) => { | ||
| if s.contains('@') { | ||
| (None, Some(s.clone()), None) | ||
| } else { | ||
| (None, None, Some(s.clone())) | ||
| } | ||
| } | ||
| }; | ||
| let now = Utc::now(); | ||
| let person = Person { | ||
| id, | ||
| display_name, | ||
| primary_email, | ||
| primary_phone, | ||
| handles: vec![canonical.clone()], | ||
| created_at: now, | ||
| updated_at: now, | ||
| }; | ||
| self.store | ||
| .insert_person(&person, &[canonical]) | ||
| .await | ||
| .map_err(|e| format!("insert_person: {e}"))?; | ||
| Ok(id) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether PeopleStore::insert_person uses an atomic upsert /
# ON CONFLICT IGNORE on the handle table, which would make the TOCTOU benign.
rg -n "insert_person\|INSERT\|UNIQUE\|ON CONFLICT" --type=rust -A3 -B3 \
src/openhuman/people/store.rsRepository: tinyhumansai/openhuman
Length of output: 48
🏁 Script executed:
# Find store-related files in the people subdirectory
fd -t f "store" src/openhuman/people/Repository: tinyhumansai/openhuman
Length of output: 96
🏁 Script executed:
# Also check what files exist in the people directory
ls -la src/openhuman/people/Repository: tinyhumansai/openhuman
Length of output: 859
🏁 Script executed:
wc -l src/openhuman/people/store.rsRepository: tinyhumansai/openhuman
Length of output: 100
🏁 Script executed:
cat -n src/openhuman/people/store.rsRepository: tinyhumansai/openhuman
Length of output: 17945
🏁 Script executed:
head -100 src/openhuman/people/migrations.rsRepository: tinyhumansai/openhuman
Length of output: 2718
🏁 Script executed:
cat src/openhuman/people/migrations/0001_init.sqlRepository: tinyhumansai/openhuman
Length of output: 1608
Non-atomic check-then-insert: concurrent calls can mint duplicate PersonIds
The lookup on lines 40–47 and the insert_person on lines 70–73 are separate async operations. Two concurrent resolve_or_create calls for the same canonical handle can both observe None from lookup and then race to insert. Both will insert distinct Person records into the database; the INSERT OR IGNORE on the handle_aliases table prevents a constraint violation there, but the people table rows are already created with different PersonIds, breaking the invariant that the same handle always returns the same id.
The fix requires either an atomic upsert (e.g., INSERT...ON CONFLICT DO UPDATE RETURNING) or a re-select of the canonical handle after insertion to return the actual inserted PersonId rather than the locally generated one.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/people/resolver.rs` around lines 40 - 74, The current
resolve_or_create implementation races: it calls store.lookup(&canonical) then
creates a new PersonId with PersonId::new and calls store.insert_person(&person,
&[canonical]) which can produce duplicate people under concurrent requests; fix
by making the create-or-get atomic — either implement an upsert at the store
layer (e.g., an INSERT ... ON CONFLICT DO UPDATE/RETURNING that returns the
existing PersonId) and expose a store.upsert_person_or_get_id(...) used by
resolve_or_create, or after insert_person call re-query store.lookup(&canonical)
and return that id (discarding the locally generated id) to ensure the returned
id is the canonical stored id; modify resolve_or_create, lookup, and
insert_person/upsert methods accordingly so the function always returns the
database-authoritative PersonId.
| outputs: vec![FieldSchema { | ||
| name: "people", | ||
| ty: TypeSchema::Array(Box::new(TypeSchema::Object { | ||
| fields: vec![ | ||
| FieldSchema { | ||
| name: "person_id", | ||
| ty: TypeSchema::String, | ||
| comment: "Stable UUID for this person.", | ||
| required: true, | ||
| }, | ||
| FieldSchema { | ||
| name: "display_name", | ||
| ty: TypeSchema::Option(Box::new(TypeSchema::String)), | ||
| comment: "Best-known display name, when set.", | ||
| required: false, | ||
| }, | ||
| FieldSchema { | ||
| name: "primary_email", | ||
| ty: TypeSchema::Option(Box::new(TypeSchema::String)), | ||
| comment: "Primary email, when set.", | ||
| required: false, | ||
| }, | ||
| FieldSchema { | ||
| name: "primary_phone", | ||
| ty: TypeSchema::Option(Box::new(TypeSchema::String)), | ||
| comment: "Primary phone, when set.", | ||
| required: false, | ||
| }, | ||
| FieldSchema { | ||
| name: "score", | ||
| ty: TypeSchema::F64, | ||
| comment: "Composite person-score in [0,1].", | ||
| required: true, | ||
| }, | ||
| FieldSchema { | ||
| name: "interaction_count", | ||
| ty: TypeSchema::U64, | ||
| comment: "Observed interactions contributing to the score.", | ||
| required: true, | ||
| }, | ||
| ], | ||
| })), | ||
| comment: "Ranked people, highest score first.", | ||
| required: true, | ||
| }], |
There was a problem hiding this comment.
components (and handles) are absent from both output schemas but present in the RPC responses
handle_score returns "components": { "recency", "frequency", "reciprocity", "depth" } (rpc.rs lines 133–138), but the score output schema only declares person_id, score, and interaction_count.
handle_list additionally returns a "handles" array per person (rpc.rs lines 28–35) and per-person "components" (rpc.rs lines 44–49); neither appears in the list output schema.
These omissions make the schemas inaccurate as documentation and would cause any code-generation or LLM tool-use path relying on schema introspection to ignore the most useful fields.
Also applies to: 158-177
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/people/schemas.rs` around lines 60 - 104, The output schemas
for the people RPCs are missing the "components" object and "handles" array that
the RPC responses actually return; update the FieldSchema entries inside the
outputs vector (the TypeSchema::Array of TypeSchema::Object) to include a
"components" FieldSchema (TypeSchema::Object with fields "recency", "frequency",
"reciprocity", "depth" as TypeSchema::F64) and a "handles" FieldSchema
(TypeSchema::Array of TypeSchema::Object describing per-handle fields returned
by handle_list), matching required flags and comments to the actual responses
used by handle_score and handle_list so schema introspection matches rpc.rs
behavior.
| //! same process-global `life_capture::runtime` OnceCells (index + embedder). | ||
| //! That's by design: the OnceCells model the production startup contract, and | ||
| //! the tests run serially so a single shared init is correct. | ||
| //! | ||
| //! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture` |
There was a problem hiding this comment.
Parallel execution makes the total_items == 1 assertion non-deterministic.
The file comment says "the tests run serially", but the run command cargo test --test imessage_ingest_e2e -- --ignored --nocapture uses the default multi-thread runner. Both tests share the same process-global SQLite DB (via the OnceCell / leaked tempdir). If search_response_shape_matches_schema wins the race and seeds "shape-guard:2026-04-22" before ingest_then_search_then_reingest_is_idempotent hits the stats assertion, total_items will be 2 and the test fails.
Two equivalent fixes:
Option A — enforce serial execution in the run instructions:
-//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture`
+//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture --test-threads=1`Option B — make the assertion relative (robust to parallel runs and more future-proof):
📝 Relative-count assertion
+ // Capture the baseline item count before ingesting anything.
+ let env_stats0 = post_rpc(&base, "openhuman.life_capture_get_stats", json!({})).await;
+ let stats0_total = rpc_result_body(&env_stats0)["total_items"]
+ .as_u64()
+ .expect("total_items u64");
+
// First ingest — should INSERT.
// ... (existing ingest call) ...
// Stats — should show exactly one new item.
let env_stats1 = post_rpc(&base, "openhuman.life_capture_get_stats", json!({})).await;
let stats1 = rpc_result_body(&env_stats1);
assert_eq!(
stats1["total_items"],
- json!(1),
- "expected total_items=1 after first ingest, got {stats1}"
+ json!(stats0_total + 1),
+ "expected exactly one new item after first ingest, got {stats1}"
);Also applies to: 63-76, 160-166
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/imessage_ingest_e2e.rs` around lines 9 - 13, The tests share a
process-global SQLite DB so the assertion that total_items == 1 in
ingest_then_search_then_reingest_is_idempotent (and the similar assertions
around lines 63-76 and 160-166) is flaky under parallel test execution; either
enforce serial runs in the test invocation or, preferably, make the assertions
relative/robust by checking the expected delta or using >=1 (e.g. assert
total_items >= previous_count + 1 or compare counts before/after to assert an
increment) and update the assertions in search_response_shape_matches_schema and
ingest_then_search_then_reingest_is_idempotent accordingly so they no longer
assume an absolute count of 1.
…xtures Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…uplicates - Introduce CuratedMemoryPromptSnapshot in prompt types so CI builds without the curated_memory domain still resolve PromptContext. - Use that type for ParentExecutionContext, Agent session storage, and PromptContext; map from curated_memory::MemorySnapshot when taking snapshots. - Remove duplicate curated_snapshot fields from bad merges (struct literals and PromptContext construction). - Fix learning tests PromptContext initializer. Co-authored-by: Cursor <cursoragent@cursor.com>
CI failed with E0063 after ParentExecutionContext gained curated_snapshot; triage escalation, session turn builder, subagent tests, and integration stubs were still missing the field. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Closing as superseded. The contacts/person resolution work we still need is split into #1259. The remaining memory/life-capture/curated-memory portions are intentionally not being carried forward here because that work is covered separately. |
Summary
Adds the A5 piece of the v1 plan: the
peoplemodule — handle resolution, scoring, and macOS Address Book seeding.src/openhuman/people/—mod,types,resolver,scorer,store,address_book,rpc,schemas,migrations,testsHandleResolvermaps (iMessage handle | email | display name) →PersonIdobjc2-contactsCNContactStore— triggers macOS Contacts TCC prompt, propagatesPermissionDeniedstructurally through the RPC response (doesn't fail silently)people.list,people.resolve,people.score,people.refresh_address_bookDependency
feat/life-capture-foundation). Diff includes A1 (be1112f9) on top of A5. Rebases off once #817 lands.Shell counterpart:
NSContactsUsageDescriptionstring added toapp/src-tauri/Info.plistontest/imessage-live-e2e(commitb9c4233a). Without it, CNContactStore returns empty rows silently under a signed build.Test plan
MockContactsSourcecovers happy / permission-denied / empty / display-name-only pathscargo build --testscleancargo test --lib people— 36 passedKnown follow-ups (not blockers)
add_aliasis INSERT OR IGNORE; Contact renames leave staleDisplayNamealiases. Resolution stays deterministic;Person.handlesaccrues over time. Follow-up PR.Summary by CodeRabbit
New Features
MEMORY.mdandUSER.mdfiles injected into conversation context.Documentation
Chores