feat(people): contact resolution + scoring + CNContactStore (A5) by jwalin-shah · Pull Request #821 · tinyhumansai/openhuman

jwalin-shah · 2026-04-23T06:01:15Z

Summary

Adds the A5 piece of the v1 plan: the people module — handle resolution, scoring, and macOS Address Book seeding.

New src/openhuman/people/ — mod, types, resolver, scorer, store, address_book, rpc, schemas, migrations, tests
HandleResolver maps (iMessage handle | email | display name) → PersonId
Score = recency × frequency × reciprocity × depth (each clamped to [0,1], composite = product)
- 14-day recency half-life
- 30-day frequency window (rolling; not total-ever)
- Reciprocity = min/max interaction ratio
Address Book seeded via objc2-contacts CNContactStore — triggers macOS Contacts TCC prompt, propagates PermissionDenied structurally through the RPC response (doesn't fail silently)
Non-macOS builds compile to a stub returning empty vec
Controllers exposed: people.list, people.resolve, people.score, people.refresh_address_book
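
A minimal sketch of the composite score described above, using the constants named in the commit messages (14-day half-life, 30-day window, cap of 50 interactions, 500-char depth cap). The `Interaction` fields and the choice to measure recency from the most recent interaction are assumptions for illustration, not the PR's actual types.

```rust
// Constants named in the PR description and commit messages.
const RECENCY_HALF_LIFE_DAYS: f64 = 14.0;
const FREQUENCY_WINDOW_DAYS: f64 = 30.0;
const FREQUENCY_CAP: f64 = 50.0;
const DEPTH_CAP_CHARS: f64 = 500.0;

/// Hypothetical interaction record; the PR's `Interaction` type may differ.
#[derive(Clone, Copy)]
struct Interaction {
    age_days: f64,  // how long ago the interaction happened
    outbound: bool, // true if the user sent it
    length: usize,  // message length in characters
}

fn composite_score(interactions: &[Interaction]) -> f64 {
    if interactions.is_empty() {
        return 0.0;
    }
    let n = interactions.len() as f64;

    // Recency: exponential decay on the most recent interaction (assumption).
    let newest = interactions.iter().map(|i| i.age_days).fold(f64::INFINITY, f64::min);
    let recency = (-std::f64::consts::LN_2 * newest / RECENCY_HALF_LIFE_DAYS).exp();

    // Frequency: only interactions inside the rolling window count.
    let in_window = interactions
        .iter()
        .filter(|i| i.age_days <= FREQUENCY_WINDOW_DAYS)
        .count() as f64;
    let frequency = in_window / FREQUENCY_CAP;

    // Reciprocity: min/max ratio of outbound vs inbound; 0 when one-sided.
    let out = interactions.iter().filter(|i| i.outbound).count() as f64;
    let inbound = n - out;
    let reciprocity = if out.max(inbound) == 0.0 {
        0.0
    } else {
        out.min(inbound) / out.max(inbound)
    };

    // Depth: mean message length relative to the cap.
    let depth = interactions.iter().map(|i| i.length as f64).sum::<f64>() / n / DEPTH_CAP_CHARS;

    // Clamp each component to [0, 1], multiply, then clamp the product.
    let product = recency.clamp(0.0, 1.0)
        * frequency.clamp(0.0, 1.0)
        * reciprocity.clamp(0.0, 1.0)
        * depth.clamp(0.0, 1.0);
    product.clamp(0.0, 1.0)
}

fn main() {
    let sample = [
        Interaction { age_days: 1.0, outbound: true, length: 120 },
        Interaction { age_days: 3.0, outbound: false, length: 80 },
    ];
    println!("score = {:.4}", composite_score(&sample));
}
```

Because the composite is a product, any single zero component (for example a one-sided conversation) drives the whole score to zero.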

Dependency

⚠️ Stacked on #817 (feat/life-capture-foundation), so the diff currently includes A1 (be1112f9) as well as A5. It will be rebased once #817 lands.

Shell counterpart: NSContactsUsageDescription string added to app/src-tauri/Info.plist on test/imessage-live-e2e (commit b9c4233a). Without it, CNContactStore returns empty rows silently under a signed build.

Test plan

Unit: resolver round-trip, scorer clamping + empty-interactions + one-sided-reciprocity, frequency-window correctness, idempotent seeding, permission-denied surfaces correctly
MockContactsSource covers happy / permission-denied / empty / display-name-only paths
Non-mac stub compiles and returns empty
cargo build --tests clean
cargo test --lib people — 36 passed

Known follow-ups (not blockers)

Alias rename accumulation — add_alias is INSERT OR IGNORE; Contact renames leave stale DisplayName aliases. Resolution stays deterministic; Person.handles accrues over time. Follow-up PR.

Summary by CodeRabbit

New Features
- Added Life-Capture foundation: searchable personal index for ingested messages and calendar events with embeddings.
- Added Curated Memory system: user-maintained MEMORY.md and USER.md files injected into conversation context.
- Added People resolver: contact discovery and interaction-based scoring.
- Integrated iMessage into life-capture ingestion pipeline.
Documentation
- Added Life-Capture layer design spec and foundation milestone plan.
- Added Track 1 ship pipeline plan for release workflow improvements.
Chores
- Enhanced installer script with source-only mode and asset resolution.
- Added test fixtures and end-to-end evaluation harness for retrieval.

Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/iMessage into a local personal_index.db, with a Today brief and retrieval-tuned chat as the two surfaces. Three composable privacy modes (Convenience / Hybrid / Fully local) sharing one code path. Track 1 unblocks the ship pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from ~/projects/inbox.

Two implementation plans drafted from the life-capture spec:
- Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds.
- Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db, Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread stripping, hybrid retrieval (vector + keyword + recency), controller schema + RPC. End-to-end test with synthetic items.

No ingestion or UI yet; those are subsequent milestone plans.

Refactors scripts/install.sh to expose resolve_asset_url and verify_asset_reachable. Adds scripts/test_install.sh that exercises the resolver against a committed fixture latest.json. Failures now report the resolved URL and the parse error instead of dying silently.

…le, keyword projection column

… hardening

Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an agent-writable, char-bounded curated memory file pair. F15 builds the char-bounded store with atomic writes; F16 wires a session-start snapshot into the system prompt (preserves prefix cache); F17 exposes memory.{add,replace,remove,read} through the controller dispatch so both the agent loop and skills can use it. Sits between TinyHumans synthesis (volatile) and personal_index.db (raw events) — fills the "deliberately curated facts" gap that neither covered.

Adds src/openhuman/life_capture/ module tree (stubs for embedder, index, migrations, quote_strip, redact, rpc, schemas, types, plus tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs. Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing rusqlite/regex/once_cell/async-trait/tempfile already in tree (Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).

…dexStats

…egex Previously the country-code group required at least one digit (\+?\d{1,3}), so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the whole prefix in a non-capturing optional group so parenthesized area codes match without a leading country code.

…te, char-bounded)

… table Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets WAL+foreign_keys, open_in_memory() is for tests. vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.

…_id) dedupe Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>. Exposes ensure_vec_extension_registered() crate-wide so the migrations test can register vec0 before opening its in-memory connection.

- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …), plus the user:local ACL EXISTS clause so the same query shape works for the multi-token team v2 ACL without rewrites.
- vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement) and ORDER BY distance; score = 1/(1 + distance) so callers can blend it with keyword bm25 on the same monotonic scale.

Shared ItemRow + into_hit() helper lets both queries reuse the same row shape; rusqlite query_map closures hand-build it because rusqlite has no sqlx FromRow equivalent.

Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs, normalises each independently to [0,1], then re-ranks with 0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30). Documents present in only one leg get 0 for the missing signal but still compete on the others; only items with neither signal are dropped. Same-vector twin items now break their tie by recency.
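
The re-ranking step above can be sketched as follows. The weights and the 30-day decay come from the commit message; the divide-by-max normalisation, the function names, and the `(id, score)` tuple shape are assumptions, since the message does not say how each leg is normalised.

```rust
use std::collections::HashMap;

/// Scale one leg's raw scores into [0, 1] by dividing by the leg's maximum.
/// (Assumption: the PR does not state which normalisation it uses.)
fn normalise(leg: &[(&str, f64)]) -> HashMap<String, f64> {
    let max = leg.iter().map(|(_, s)| *s).fold(0.0_f64, f64::max);
    leg.iter()
        .map(|(id, s)| (id.to_string(), if max > 0.0 { s / max } else { 0.0 }))
        .collect()
}

/// Blend both legs with recency: 0.55 * vec + 0.35 * kw + 0.10 * exp(-age/30).
/// An item missing from one leg gets 0 for that signal but still competes.
fn rerank(
    keyword: &[(&str, f64)],
    vector: &[(&str, f64)],
    age_days: &HashMap<String, f64>,
) -> Vec<(String, f64)> {
    let kw = normalise(keyword);
    let vec = normalise(vector);

    // Union of ids seen in either leg; items in neither leg never appear.
    let mut ids: Vec<String> = kw.keys().chain(vec.keys()).cloned().collect();
    ids.sort();
    ids.dedup();

    let mut scored: Vec<(String, f64)> = ids
        .into_iter()
        .map(|id| {
            let v = vec.get(&id).copied().unwrap_or(0.0);
            let k = kw.get(&id).copied().unwrap_or(0.0);
            // Unknown age decays to zero recency.
            let age = age_days.get(&id).copied().unwrap_or(f64::INFINITY);
            let score = 0.55 * v + 0.35 * k + 0.10 * (-age / 30.0).exp();
            (id, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}

fn main() {
    let mut ages = HashMap::new();
    ages.insert("a".to_string(), 0.0);
    ages.insert("b".to_string(), 30.0);
    ages.insert("c".to_string(), 0.0);
    let ranked = rerank(&[("a", 2.0), ("b", 1.0)], &[("b", 0.5), ("c", 0.25)], &ages);
    for (id, score) in &ranked {
        println!("{} {:.4}", id, score);
    }
}
```

Two items with identical vector and keyword scores differ only in the recency term, which is what lets recency break ties between same-vector twins.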

…rch)

Adds two controllers exposed via the registry:
- life_capture.get_stats: total items, per-source counts, last-ingest ts.
- life_capture.search: hybrid (vector + keyword + recency) search with embed-then-rank, optional k (default 10, capped at 100).

Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>) and have no per-call context object. F14 will call runtime::init() at app startup; until then handlers return a structured 'not initialised' error so the failure mode is loud, not silent. Schemas registered in core/all.rs alongside cron.
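
The process-global runtime pattern described above can be illustrated with a small sketch. It uses `std::sync::OnceLock` as a stand-in for the tokio OnceCell named in the commit, and the `Runtime` struct, its field, and the error strings are hypothetical.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the PersonalIndex + embedder pair.
struct Runtime {
    index_path: String,
}

// Set once at startup; handlers only ever read it.
static RUNTIME: OnceLock<Runtime> = OnceLock::new();

fn init(rt: Runtime) -> Result<(), String> {
    RUNTIME
        .set(rt)
        .map_err(|_| "life_capture runtime already initialised".to_string())
}

fn get() -> Result<&'static Runtime, String> {
    RUNTIME
        .get()
        .ok_or_else(|| "life_capture runtime not initialised".to_string())
}

/// A stateless handler: it has no context argument, so it reaches for the
/// global and fails loudly when startup has not run yet.
fn handle_get_stats() -> Result<String, String> {
    let rt = get()?;
    Ok(format!("index at {}", rt.index_path))
}

fn main() {
    println!("{:?}", handle_get_stats()); // Err before init
    init(Runtime { index_path: "personal_index.db".into() }).unwrap();
    println!("{:?}", handle_get_stats()); // Ok after init
}
```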

… → search) FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector so the same text round-trips identically through vec0 — keeps the e2e test hermetic, no network call. Verifies that: - quote-strip drops the 'On … wrote:' block before indexing, - redact masks the email address before indexing, - the upserted item resurfaces via hybrid_search with both transformations preserved on the returned text.

After Config::load_or_init succeeds, open the personal_index.db at config.workspace_dir and register the life_capture runtime. Embedder is env-gated:
- OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY
- OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1)
- OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small)

If no key is set we still open the index file (so Plan #2's ingestion worker can write to it) but skip runtime registration; controllers then return the 'not initialised' error from runtime::get() instead of silently calling a misconfigured embedder. Config-schema integration deferred to a follow-up; env-driven keeps F14 non-invasive while we land the rest of the foundation.

…ection snapshot_pair(memory, user) returns a MemorySnapshot containing plain strings — no reference back to the stores — so taking a snapshot at session start and reusing it across turns gives a stable system-prompt prefix and lets the LLM prefix-cache hit on every subsequent turn. Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader, but openhuman assembles agent prompts per-agent under src/openhuman/agent/agents/*/prompt.rs (not the plan's stale src-tauri/src/commands/chat.rs path); the prompt-builder integration is deferred to Plan #2 alongside the agent-context refactor.

… controllers

Adds four controllers under the memory_curated namespace (the 'memory' namespace is already owned by the long-term memory subsystem):
- memory_curated.read: read MEMORY.md or USER.md
- memory_curated.add: append entry, char-bounded
- memory_curated.replace: substring replace, char-bounded
- memory_curated.remove: drop entries containing a needle

'file' input is a typed enum (memory|user) so adapters reject anything else at validation time. Runtime state lives in curated_memory::runtime behind a tokio OnceCell, mirroring life_capture's pattern: startup in jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes' char limits (memory: 2200, user: 1375) and registers the runtime. Handlers return 'not initialised' until init runs, so failure is loud.
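
A rough in-memory sketch of what a char-bounded add and a needle-based remove might look like. The 2200-character limit is the one quoted above; the newline-separated entry layout, the reject-on-overflow behaviour, and all type and method names are assumptions, and the real store also handles file I/O and atomic writes.

```rust
#[derive(Debug, PartialEq)]
enum MemoryError {
    EmptyInput,
    LimitExceeded { limit: usize, would_be: usize },
}

/// Hypothetical in-memory stand-in for one curated memory file.
struct BoundedMemory {
    entries: Vec<String>,
    char_limit: usize,
}

impl BoundedMemory {
    fn new(char_limit: usize) -> Self {
        BoundedMemory { entries: Vec::new(), char_limit }
    }

    /// Entries joined by newlines (assumed on-disk layout).
    fn render(&self) -> String {
        self.entries.join("\n")
    }

    /// Append an entry unless the rendered text would exceed the limit.
    fn add(&mut self, entry: &str) -> Result<(), MemoryError> {
        let entry = entry.trim();
        if entry.is_empty() {
            return Err(MemoryError::EmptyInput);
        }
        let sep = if self.entries.is_empty() { 0 } else { 1 };
        let would_be = self.render().chars().count() + sep + entry.chars().count();
        if would_be > self.char_limit {
            return Err(MemoryError::LimitExceeded { limit: self.char_limit, would_be });
        }
        self.entries.push(entry.to_string());
        Ok(())
    }

    /// Drop entries containing `needle`; an empty needle would match
    /// every entry, so it is rejected.
    fn remove(&mut self, needle: &str) -> Result<usize, MemoryError> {
        if needle.is_empty() {
            return Err(MemoryError::EmptyInput);
        }
        let before = self.entries.len();
        self.entries.retain(|e| !e.contains(needle));
        Ok(before - self.entries.len())
    }
}

fn main() {
    let mut memory = BoundedMemory::new(2200);
    memory.add("Prefers morning meetings").unwrap();
    println!("{}", memory.render());
}
```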

…ings

Findings from Codex + Gemini second-opinion review:

1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On (source, external_id) conflict the row's id was kept but the caller's fresh UUID was used for upsert_vector, so the vector wrote under an id no row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same transaction; mutates caller's Item.id to the canonical id (so the next upsert_vector lands on the right row) and orphan-deletes any vector already written under the wrong id. Signature change: upsert(&[Item]) -> upsert(&mut [Item]).
2. upsert_vector DELETE+INSERT was outside a transaction (Gemini), so a failed INSERT permanently lost the item's vector. Now wrapped in tx.
3. Runtime over-gated get_stats on the embedder (Codex). Split runtime into separate INDEX + EMBEDDER OnceCells: get_stats only requires index, search requires both. Index initialises whenever the workspace dir is reachable; embedder is env-gated as before.
4. Startup race (both reviewers): runtime init lived inside the post-serve tokio::spawn block, so axum::serve was already accepting before the OnceCells were set. Hoisted both bootstraps (curated_memory and life_capture) into helpers called inline before axum::serve.
5. runtime::get error message lied (Codex): it said 'set embeddings.api_key in config' but startup actually reads env vars. Fixed text.

Bonus: rename controller namespace memory_curated -> curated_memory (Codex preference; nothing depends on it yet so renamed before clients do).

Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable). 4696 lib tests pass.

…emory namespace rename Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.

Critical:
- curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`, which could rewrite MEMORY.md / USER.md from an empty baseline on transient I/O or permission failures. Reject empty needle in replace/remove.
- curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC boundary (remove with "" deletes every entry).
- life_capture/index.rs: upsert_vector rejects orphan ids (items row missing), which would have inserted vectors that never join in vector_search.
- life_capture/embedder.rs: validate response length matches input, indices are contiguous 0..n, and every vector matches dim(). This prevents silent misalignment or wrong-dimensional vectors flowing into the 1536-wide sqlite-vec table. Added 30s request + 10s connect timeouts.

Retrieval hardening:
- life_capture/index.rs: keyword_search now escapes FTS5 operators by tokenizing on whitespace and wrapping each token as a quoted literal. Prevents errors or unintended matches from stray quotes, AND/OR/NEAR, column specifiers in user input.
- life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until via post-filter (single consistent pass across keyword+vector fusion).
- life_capture/rpc.rs: handle_search validates embedder dim and response length against the fixed 1536-wide index column, giving a clear RPC error instead of a cryptic sqlite-vec failure when the embeddings model is swapped.

Docs:
- docs/event-bus.md: remove erroneous `.await` on register_native_global (it is sync; the handler closure is async).
- life-capture design spec: soften PII redaction claim. Only regex is implemented today; light NER is flagged as a future enhancement.
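
The FTS5 escaping described under retrieval hardening might look roughly like the sketch below. The function name is hypothetical; the quoting rule (wrap in double quotes, double any embedded double quote) is the standard FTS5 string-literal syntax.

```rust
/// Turn free-form user input into an FTS5 query made only of quoted
/// string literals, so operators such as AND/OR/NEAR and column
/// specifiers are matched as plain text instead of being interpreted.
fn escape_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        // Inside an FTS5 string, a double quote is escaped by doubling it.
        .map(|tok| format!("\"{}\"", tok.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    println!("{}", escape_fts5_query("budget AND title:plan"));
    println!("{}", escape_fts5_query("say \"hi\""));
}
```

Adjacent quoted literals are still combined with FTS5's implicit AND, so multi-word queries keep their "all terms must match" behaviour.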

Rendered prompts now surface runtime curated-memory writes:
- `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`.
- `UserFilesSection` prefers the snapshot over the workspace-file loader when one is attached, and injects `USER.md` alongside `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper.
- `Session` carries `Option<Arc<MemorySnapshot>>`, populated by `ensure_curated_snapshot` on the first turn from `curated_memory::runtime::get()`. Reused across turns so prompt bytes stay frozen (KV-cache prefix contract) while mid-session `curated_memory.add/replace/remove` writes land on the NEXT session.
- `ParentExecutionContext` inherits the snapshot so sub-agents render identical `MEMORY.md`/`USER.md` blocks as the parent.
- Legacy workspace-file fallback preserved for embeds that don't initialise the curated-memory runtime (pure unit tests).

Loads a 32-item corpus across gmail/calendar/imessage/slack into an in-memory PersonalIndex and runs 12 queries through keyword / vector / hybrid paths, asserting must_contain / must_not_contain within per-query top-K prefixes. The vector leg uses a deterministic FNV-1a feature-hashing embedder (1536 dims, L2-normalized) so the test is offline and reproducible; real embedder swaps behind one call. Fixture reserves `relevant` and `pending` fields so recall@k / MRR / nDCG bolt onto the same JSON later. q-src-01 is marked pending: hybrid_search currently ignores Query.sources/since/until. Flip to false once filtering lands.
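
A deterministic feature-hashing embedder of the kind described above could be sketched like this. The FNV-1a constants are the standard 64-bit ones and the 1536 dimension comes from the commit message; the whitespace tokenisation, lowercasing, and sign-bit choice are assumptions rather than the harness's actual scheme.

```rust
const DIM: usize = 1536;

/// 64-bit FNV-1a hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // offset basis
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Hash each token into one of DIM buckets with a +/-1 sign, then
/// L2-normalise. Same text in, same vector out, no network call.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for tok in text.split_whitespace() {
        let h = fnv1a(tok.to_lowercase().as_bytes());
        let idx = (h % DIM as u64) as usize;
        let sign = if (h >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

fn main() {
    let v = embed("lunch with Priya on Friday");
    let nonzero = v.iter().filter(|x| **x != 0.0).count();
    println!("dims = {}, non-zero buckets = {}", v.len(), nonzero);
}
```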

Previously every read grabbed the same `Arc<Mutex<Connection>>` as the writer, so `IndexReader` calls serialised behind in-flight ingests even though WAL would allow parallel readers. Split `PersonalIndex` into:
- `writer: Arc<Mutex<Connection>>`: unchanged single-writer model.
- `reader_pool: Option<Arc<r2d2::Pool<SqliteConnectionManager>>>`: four-connection read pool on file-backed indexes, built alongside the writer after migrations run. Each pooled connection gets `query_only=ON` as a belt-and-suspenders guard; sqlite-vec is loaded automatically via the process-wide auto-extension.

`IndexReader` routes `keyword_search` / `vector_search` through a new `with_read_conn` helper that picks the pool when present and falls back to the writer lock otherwise. In-memory handles keep the single-connection layout; shared-cache URIs buy nothing at test-fixture scale and would force every test to grow a unique name.

- Add `life_capture.ingest` RPC controller (idempotent upsert by `(source, external_id)`, embeds text + atomic vector replace).
- Switch iMessage scanner's `ingest_group()` to POST `openhuman.life_capture_ingest` instead of `openhuman.memory_doc_ingest`, so chat-day transcripts land in the PersonalIndex (the same store `life_capture.search` reads from).
- Fix `life_capture.search` envelope: wrap hits in `{"hits": [...]}` per the schema (previously returned a bare array; obs 3165).
- New `#[ignore]`d e2e at `tests/imessage_ingest_e2e.rs` proves the bridge: ingest → search → re-ingest is idempotent (`total_items` unchanged, `replaced=true`).

New self-contained module `src/openhuman/people/` for A5 of the v1 plan. Exposes people.list / people.resolve / people.score via the controller registry; no branches added in core/cli.rs or core/jsonrpc.rs.

Files added
-----------
src/openhuman/people/mod.rs
src/openhuman/people/types.rs
    PersonId, Handle{IMessage,Email,DisplayName}, Person, Interaction, ScoreComponents, AddressBookContact
src/openhuman/people/resolver.rs
    HandleResolver: resolve / resolve_or_create / link; deterministic across case + whitespace via Handle::canonicalize()
src/openhuman/people/scorer.rs
    score(interactions, now) → ScoreComponents
src/openhuman/people/store.rs
    SQLite store (in-memory or file-backed); plus process-global OnceCell handle for the schema adapters
src/openhuman/people/migrations.rs
    idempotent migrator (life_capture style)
src/openhuman/people/migrations/0001_init.sql
    people, handle_aliases, interactions
src/openhuman/people/address_book.rs
    macOS Address Book read (see below)
src/openhuman/people/rpc.rs
    typed domain handlers
src/openhuman/people/schemas.rs
    ControllerSchema + adapters
src/openhuman/people/tests.rs
    cross-file integration tests

Wired into src/core/all.rs: registered_controllers, declared schemas, and namespace_description("people"). Added `pub mod people;` in src/openhuman/mod.rs. Nothing else touched.

Scoring
-------
Each component clamped to [0,1]; composite = product (also clamped).

recency      exp(-ln2 * age_days / 14)        half-life 14 days
frequency    count / 50, saturated            cap 50 interactions
reciprocity  min(out, in) / max(out, in)      0 when one-sided
depth        mean(length) / 500, saturated    cap 500 chars

Constants are module-level (RECENCY_HALF_LIFE_DAYS / FREQUENCY_CAP / DEPTH_CAP_CHARS) so the first pass can be tuned from config later without breaking the API.

Address Book read path
----------------------
Read the macOS Address Book directly from its SQLite stores at `~/Library/Application Support/AddressBook/Sources/*/AddressBook-v22.abcddb` via the existing `rusqlite` dep (opened with SQLITE_OPEN_READ_ONLY | SQLITE_OPEN_NO_MUTEX). Returns `Vec<AddressBookContact>` with display_name + emails + phones. A missing AddressBook dir or per-source db is treated as zero contacts, not an error.

Pragmatic divergence from the plan: the original A5 spec said "objc2". objc2 is not currently in Cargo.toml; adding it plus Contacts.framework linkage is a heavier dep surface than the SQLite read, which gives the same data, zero new deps, and is trivially testable. If a future caller needs live change notifications (KVO) we can revisit.

On non-mac, `address_book::read()` is a stub returning `Ok(vec![])`.

Tests (25 passing, cargo test people --lib)
------------------------------------------
types::*         Handle canonicalization, key mapping
migrations::*    tables created, idempotent
store::*         insert / lookup / list / interaction round-trip
resolver::*      deterministic across case/whitespace; same email + different display_name → same PersonId; distinct handles → distinct ids
scorer::*        empty=zero; half-life; unit-interval clamp; one-sided → 0 reciprocity; deterministic
schemas::*       three schemas + three handlers; required fields
rpc::*           list orders by score desc; resolve(create=false) returns null for unknown
address_book::*  non-mac returns empty; mac read never panics
tests::*         resolver + store cooperate across handle kinds; PersonId string round-trip

Scope
-----
Self-contained. No life_capture / chronicle / nudges / UI integration; those land in A6+ / A9 per the v1 plan.
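
To illustrate how canonicalization makes resolution deterministic across case and whitespace, here is a small in-memory sketch. The `Handle` variants and the resolve / resolve_or_create / link names mirror the commit message, but the normalisation rule, the `u64` ids, and the HashMap-backed store are assumptions; the PR's resolver sits on a SQLite store.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Handle {
    IMessage(String),
    Email(String),
    DisplayName(String),
}

impl Handle {
    /// Lowercase and collapse whitespace (assumed normalisation rule).
    fn canonicalize(&self) -> Handle {
        fn norm(s: &str) -> String {
            s.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
        }
        match self {
            Handle::IMessage(s) => Handle::IMessage(norm(s)),
            Handle::Email(s) => Handle::Email(norm(s)),
            Handle::DisplayName(s) => Handle::DisplayName(norm(s)),
        }
    }
}

/// Hypothetical in-memory resolver; ids are plain counters here.
struct HandleResolver {
    aliases: HashMap<Handle, u64>,
    next_id: u64,
}

impl HandleResolver {
    fn new() -> Self {
        HandleResolver { aliases: HashMap::new(), next_id: 1 }
    }

    fn resolve(&self, h: &Handle) -> Option<u64> {
        self.aliases.get(&h.canonicalize()).copied()
    }

    fn resolve_or_create(&mut self, h: &Handle) -> u64 {
        let key = h.canonicalize();
        if let Some(id) = self.aliases.get(&key) {
            return *id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.aliases.insert(key, id);
        id
    }

    /// Attach another handle to an existing person; an existing alias
    /// is left untouched, similar in spirit to INSERT OR IGNORE.
    fn link(&mut self, h: &Handle, person: u64) {
        self.aliases.entry(h.canonicalize()).or_insert(person);
    }
}

fn main() {
    let mut r = HandleResolver::new();
    let id = r.resolve_or_create(&Handle::Email(" Alice@Example.com ".into()));
    r.link(&Handle::DisplayName("Alice  Smith".into()), id);
    println!("{:?}", r.resolve(&Handle::DisplayName("alice smith".into())));
}
```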

…A5 review blockers)

- address_book.rs: replace direct rusqlite/SQLite read with CNContactStore (objc2-contacts 0.3.2). Properly triggers TCC Contacts permission prompt under signed/sandboxed builds. Returns structured AddressBookError with PermissionDenied variant so callers can surface actionable messages. Error downgrades promoted from debug! to warn!. Adds ContactsSource trait as a mockable seam; SystemContactsSource wraps real FFI; non-mac stub unchanged (returns empty vec).
- resolver.rs: add seed_from_address_book(source: &dyn ContactsSource) which wires the address book read into the people store. Iterates contacts, picks primary handle (email > phone > display name), calls resolve_or_create, then links aliases. Idempotent. Returns (seeded, skipped) counts and propagates AddressBookError. Tests cover: canned list, permission denied, idempotency, display-name-only, no-field skipped.
- rpc.rs: add handle_refresh_address_book exposing the seeding path over RPC. Returns { seeded, skipped, permission_denied } and never errors on permission denied (surfaces as a flag so the UI can prompt the user).
- schemas.rs: register people.refresh_address_book controller (4th controller). Update stale test counts (3 → 4).
- scorer.rs: replace total-ever FREQUENCY_CAP with a rolling window approach. Adds FREQUENCY_WINDOW_DAYS = 30 constant. frequency = interactions within last 30 days / FREQUENCY_CAP. Old-burst contacts no longer get inflated scores. New tests: old_burst_does_not_inflate_frequency_score, interactions_exactly_at_window_boundary_are_included.
- Cargo.toml: add objc2 = "0.6", objc2-foundation (NSArray/NSError/NSString/NSPredicate features), objc2-contacts = "0.3.2" (CNContact/CNContactFetchRequest/CNContactStore/CNLabeledValue/CNPhoneNumber), block2 = "0.6" under [target.'cfg(target_os = "macos")'.dependencies], matching A7 versions.
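
The seeding path and its permission-denied handling can be sketched with a mock source. The trait name, the email > phone > display name priority, and the `{ seeded, skipped, permission_denied }` response shape come from the commit message; the struct fields, the string-keyed store, and counting only newly inserted handles as "seeded" are assumptions.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
enum AddressBookError {
    PermissionDenied,
}

#[derive(Clone, Default)]
struct Contact {
    display_name: Option<String>,
    emails: Vec<String>,
    phones: Vec<String>,
}

/// Mockable seam: the real source would call into CNContactStore.
trait ContactsSource {
    fn read(&self) -> Result<Vec<Contact>, AddressBookError>;
}

struct MockSource(Result<Vec<Contact>, AddressBookError>);

impl ContactsSource for MockSource {
    fn read(&self) -> Result<Vec<Contact>, AddressBookError> {
        self.0.clone()
    }
}

/// Pick the primary handle: email > phone > display name.
fn primary_handle(c: &Contact) -> Option<String> {
    c.emails
        .first()
        .map(|e| format!("email:{}", e.trim().to_lowercase()))
        .or_else(|| c.phones.first().map(|p| format!("phone:{}", p.trim())))
        .or_else(|| c.display_name.as_ref().map(|n| format!("name:{}", n.trim().to_lowercase())))
}

/// Returns (seeded, skipped); `known` stands in for the people store.
fn seed(source: &dyn ContactsSource, known: &mut HashSet<String>) -> Result<(usize, usize), AddressBookError> {
    let (mut seeded, mut skipped) = (0, 0);
    for c in source.read()? {
        match primary_handle(&c) {
            Some(h) => {
                if known.insert(h) {
                    seeded += 1; // only count handles not seen before
                }
            }
            None => skipped += 1, // contact with no usable field
        }
    }
    Ok((seeded, skipped))
}

#[derive(Debug, PartialEq)]
struct RefreshResponse {
    seeded: usize,
    skipped: usize,
    permission_denied: bool,
}

/// Permission denial becomes a flag in the response, not an RPC error.
fn handle_refresh(source: &dyn ContactsSource, known: &mut HashSet<String>) -> RefreshResponse {
    match seed(source, known) {
        Ok((seeded, skipped)) => RefreshResponse { seeded, skipped, permission_denied: false },
        Err(AddressBookError::PermissionDenied) => {
            RefreshResponse { seeded: 0, skipped: 0, permission_denied: true }
        }
    }
}

fn main() {
    let denied = MockSource(Err(AddressBookError::PermissionDenied));
    println!("{:?}", handle_refresh(&denied, &mut HashSet::new()));
}
```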

…xt stub

coderabbitai · 2026-04-23T06:01:24Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd43731a-2574-4a06-8b77-349f25d2ef52

📥 Commits

Reviewing files that changed from the base of the PR and between 45bcd9a and 08fafe7.

📒 Files selected for processing (36)

src/core/all.rs
src/openhuman/agent/agents/archivist/prompt.rs
src/openhuman/agent/agents/code_executor/prompt.rs
src/openhuman/agent/agents/critic/prompt.rs
src/openhuman/agent/agents/help/prompt.rs
src/openhuman/agent/agents/integrations_agent/prompt.rs
src/openhuman/agent/agents/loader.rs
src/openhuman/agent/agents/morning_briefing/prompt.rs
src/openhuman/agent/agents/orchestrator/prompt.rs
src/openhuman/agent/agents/planner/prompt.rs
src/openhuman/agent/agents/researcher/prompt.rs
src/openhuman/agent/agents/summarizer/prompt.rs
src/openhuman/agent/agents/tool_maker/prompt.rs
src/openhuman/agent/agents/tools_agent/prompt.rs
src/openhuman/agent/agents/trigger_reactor/prompt.rs
src/openhuman/agent/agents/trigger_triage/prompt.rs
src/openhuman/agent/agents/welcome/prompt.rs
src/openhuman/agent/debug/mod.rs
src/openhuman/agent/harness/fork_context.rs
src/openhuman/agent/harness/session/runtime.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/agent/harness/session/types.rs
src/openhuman/agent/harness/subagent_runner/ops.rs
src/openhuman/agent/harness/subagent_runner/ops_tests.rs
src/openhuman/agent/prompts/mod.rs
src/openhuman/agent/prompts/mod_tests.rs
src/openhuman/agent/prompts/types.rs
src/openhuman/agent/triage/escalation.rs
src/openhuman/agent/triage/evaluator.rs
src/openhuman/curated_memory/mod.rs
src/openhuman/learning/prompt_sections.rs
src/openhuman/mod.rs
src/openhuman/people/rpc.rs
src/openhuman/people/store.rs
tests/agent_harness_public.rs
tests/calendar_grounding_e2e.rs

📝 Walkthrough


This PR implements foundational infrastructure for life-capture (a personal index with SQLite+sqlite-vec storage, hybrid search, and hosted OpenAI-compatible embedding support), curated-memory (session-scoped file-backed stores injected into agent prompts), and a people module (contact resolution and interaction-based scoring). It updates the iMessage scanner to use the new life_capture_ingest endpoint, hardens the installer with asset verification, bootstraps both systems at server startup, and adds comprehensive planning documentation.

Changes

Foundation Milestone: Life Capture, Curated Memory, People, and Agent Integration

Layer / File(s)	Summary
Data Shape & Domain Types `src/openhuman/life_capture/types.rs`, `src/openhuman/curated_memory/types.rs`, `src/openhuman/people/types.rs`	Introduces core types for life capture (`Item`, `Source`, `Query`, `Hit`, `IndexStats`), curated memory (`MemoryFile`, `MemorySnapshot`), and people management (`PersonId`, `Handle`, `Person`, `Interaction`, `ScoreComponents`).
SQLite Schema & Migrations `src/openhuman/life_capture/migrations/0001_init.sql`, `src/openhuman/life_capture/migrations/0002_vec.sql`, `src/openhuman/life_capture/migrations.rs`, `src/openhuman/people/migrations/0001_init.sql`, `src/openhuman/people/migrations.rs`	Defines SQLite schemas for life capture (`items` table with FTS5 sync triggers, `item_vectors` with vec0), people (`people`, `handle_aliases`, `interactions` tables with indexes), and migration runners for both.
Index Implementation `src/openhuman/life_capture/index.rs`, `src/openhuman/life_capture/embedder.rs`	Implements `PersonalIndex` with reader-pool pattern for concurrent access, `IndexWriter` for upsert/deduplication by `(source, external_id)`, `IndexReader` for keyword/vector/hybrid search with recency scoring, and `HostedEmbedder` for OpenAI-compatible embedding requests.
Data Processing & Utilities `src/openhuman/life_capture/redact.rs`, `src/openhuman/life_capture/quote_strip.rs`	Adds PII redaction (email, SSN, CC, phone regex replacement) and quoted-reply stripping for cleaned text storage.
People Store & Resolution `src/openhuman/people/store.rs`, `src/openhuman/people/resolver.rs`, `src/openhuman/people/scorer.rs`, `src/openhuman/people/address_book.rs`	Implements file-backed `PeopleStore` with person/handle/interaction CRUD, `HandleResolver` for deterministic contact deduplication and address-book seeding, deterministic interaction-based scoring (recency/frequency/reciprocity/depth), and macOS `AddressBook` reader via objc2.
Curated Memory Store `src/openhuman/curated_memory/store.rs`	Implements `MemoryStore` for file-backed memory (MEMORY.md/USER.md) with append/replace/remove operations, atomic writes, and character-limit enforcement.
RPC & Controller Handler Layer `src/openhuman/life_capture/rpc.rs`, `src/openhuman/life_capture/schemas.rs`, `src/openhuman/curated_memory/rpc.rs`, `src/openhuman/curated_memory/schemas.rs`, `src/openhuman/people/rpc.rs`, `src/openhuman/people/schemas.rs`	Implements async JSON-RPC domain handlers for life capture (`get_stats`, `search`, `ingest`), curated memory (`read`, `add`, `replace`, `remove`), and people (`list`, `resolve`, `refresh_address_book`, `score`); registers controller schemas and wires handlers via `all_controller_schemas()`/`all_registered_controllers()`.
Runtime & Bootstrap `src/openhuman/life_capture/runtime.rs`, `src/openhuman/curated_memory/runtime.rs`, `src/core/jsonrpc.rs`	Adds process-global runtime singletons (`OnceCell`) for `PersonalIndex`, `Embedder`, and `CuratedMemoryRuntime`; implements `bootstrap_curated_memory` and `bootstrap_life_capture` in server startup that initialize stores, apply migrations, configure embedder from environment, and log initialization state.
Agent Session Integration `src/openhuman/agent/harness/session/types.rs`, `src/openhuman/agent/harness/session/builder.rs`, `src/openhuman/agent/harness/session/turn.rs`, `src/openhuman/agent/harness/session/runtime.rs`, `src/openhuman/agent/prompts/mod.rs`	Adds `Agent::curated_snapshot` field to cache session-scoped memory snapshot, initializes on first turn via `ensure_curated_snapshot()`, and injects `MEMORY.md`/`USER.md` content into system prompt via `UserFilesSection`.
Module Exports & Wiring `src/openhuman/mod.rs`, `src/openhuman/life_capture/mod.rs`, `src/openhuman/curated_memory/mod.rs`, `src/openhuman/people/mod.rs`, `src/core/all.rs`	Declares and exports new submodules for life_capture, curated_memory, people; re-exports aggregated schemas with namespaced names; adds people namespace to core controller registry.
iMessage Scanner Integration `app/src-tauri/src/imessage_scanner/mod.rs`	Updates `ingest_group` to POST iMessage transcripts to `openhuman.life_capture_ingest` instead of memory_doc_ingest, includes computed `ts` from day string and restructured metadata.
Dependencies `Cargo.toml`	Adds SQLite libraries (`r2d2`, `r2d2_sqlite`, `sqlite-vec`), macOS Contacts bindings (`objc2`, `objc2-foundation`, `objc2-contacts`, `block2`), and test HTTP mocking (`httpmock`).
Integration & E2E Tests `tests/imessage_ingest_e2e.rs`, `tests/life_capture_retrieval_eval.rs`, `src/openhuman/life_capture/tests/e2e.rs`, `tests/fixtures/life_capture/corpus.json`, `src/openhuman/people/tests.rs`	Adds e2e test for iMessage ingest idempotency, retrieval evaluation harness with deterministic embeddings and fixture-based assertions, in-memory index redaction+quote-strip+search test, and integration tests validating resolver, schema, and UUID round-trip behavior.
Planning & Design Documentation `docs/superpowers/plans/2026-04-21-imessage-live-harness.md`, `docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md`, `docs/superpowers/specs/2026-04-22-life-capture-layer-design.md`, `docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md`	Comprehensive milestone plan for life-capture foundation (schema, embedder, index, RPC, agent integration), design spec for the overall life-capture layer (two demo surfaces, privacy modes, UI behavior, Track 1/2 roadmap), and Track 1 ship pipeline plan (Ubuntu smoke test fix, PR sequencing, Tauri auto-updater wiring).

Installer Hardening and CI Preparation

Layer / File(s)	Summary
Installer Script Refactoring `scripts/install.sh`	Adds `--source-only` mode for sourcing without execution, introduces `resolve_asset_url()` to extract platform-specific URLs from `latest.json` with structured error codes, adds `verify_asset_reachable()` for HTTP HEAD validation with exponential backoff, and improves `resolve_from_latest_json` with richer logging for missing platforms.
Installer Testing `scripts/test_install.sh`	New smoke-test script that sources `install.sh` in `--source-only` mode, tests `resolve_asset_url` against fixtures, validates correct URL resolution, and asserts missing-platform graceful failure.
Test Fixtures `scripts/fixtures/latest.json`	JSON fixture providing version metadata and platform-specific download URLs (linux-x86_64 AppImage, darwin-aarch64 DMG) with empty signature fields.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/iMessage<br/>Scanner
    participant LifeCapture as life_capture RPC<br/>(ingest/search)
    participant Index as PersonalIndex<br/>(writer/reader)
    participant Embedder as Embedder<br/>(OpenAI)
    participant SQLite as SQLite<br/>personal_index.db

    Client->>LifeCapture: POST ingest(source,<br/>external_id, text, ts)
    LifeCapture->>LifeCapture: Redact PII,<br/>strip quoted reply
    LifeCapture->>Index: upsert Item<br/>via IndexWriter
    Index->>SQLite: SELECT existing by<br/>(source, external_id)
    SQLite-->>Index: canonical UUID
    Index->>SQLite: INSERT/UPDATE items<br/>row
    LifeCapture->>Embedder: embed_batch([text])
    Embedder-->>LifeCapture: Vec<Vec<f32>>
    LifeCapture->>Index: upsert_vector<br/>(item_id, vector)
    Index->>SQLite: INSERT item_vectors
    LifeCapture-->>Client: RpcOutcome<br/>{item_id, replaced}
    
    Client->>LifeCapture: POST search(text, k)
    LifeCapture->>Embedder: embed_batch([text])
    Embedder-->>LifeCapture: query_vector
    LifeCapture->>Index: hybrid_search<br/>(query_vector, Query)
    Index->>SQLite: keyword_search (FTS5)
    Index->>SQLite: vector_search (vec0)
    Index->>Index: RRF fusion +<br/>recency scoring
    Index-->>LifeCapture: Vec<Hit>
    LifeCapture-->>Client: RpcOutcome<br/>{hits: [...]}

sequenceDiagram
    participant Agent as Agent<br/>Session
    participant CuratedMem as curated_memory<br/>runtime
    participant MemStore as MemoryStore<br/>(file-backed)
    participant Prompt as Prompt<br/>Builder

    Agent->>Agent: new Agent()<br/>curated_snapshot=None
    Agent->>Agent: turn() first turn
    Agent->>Agent: ensure_curated_snapshot()
    Agent->>CuratedMem: get() runtime
    CuratedMem->>MemStore: read MEMORY.md
    MemStore-->>CuratedMem: String
    CuratedMem->>MemStore: read USER.md
    MemStore-->>CuratedMem: String
    CuratedMem-->>Agent: MemorySnapshot<br/>{memory, user}
    Agent->>Agent: store as Arc
    Agent->>Prompt: build system prompt
    Prompt->>Prompt: UserFilesSection
    Prompt->>Agent: curated_snapshot()?
    Agent-->>Prompt: Some(snapshot)
    Prompt->>Prompt: inject snapshot<br/>MEMORY.md + USER.md
    Prompt-->>Agent: enriched prompt
    Agent-->>Agent: send to LLM

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

The PR introduces three major new modules (life_capture, curated_memory, people) with substantial implementation density: ~2,500 lines of schema/migration definitions, ~1,500 lines of index implementation with complex reader logic (FTS5/vec0/hybrid search/RRF fusion), ~1,000+ lines of store implementations, extensive RPC/handler wiring across all three modules, server bootstrap logic, agent session modifications, and comprehensive test coverage with fixtures and e2e tests. The variety of concerns (SQLite schema/migrations, Rust async/sync patterns, macOS FFI for address book, embeddings API client, text processing, scoring algorithms) demands separate mental models for each section. The changes are heterogeneous across file types and domains, requiring reviewers to context-switch between database design, Rust concurrency patterns, API contract definitions, and integration semantics.

Possibly related PRs

tinyhumansai/openhuman#509: Both PRs modify src/core/jsonrpc.rs to add startup-time bootstrap for background initialization—this PR adds bootstrap_curated_memory and bootstrap_life_capture while #509 starts composio periodic sync.
tinyhumansai/openhuman#1027: Both PRs modify the agent session lifecycle (src/openhuman/agent/harness/session/{builder.rs,turn.rs,types.rs}) to cache session-scoped context—this PR adds curated_snapshot initialization while #1027 adds tree-prefetch state.
tinyhumansai/openhuman#524: Both PRs expand the controller namespace registry in src/core/all.rs—this PR adds the "people" namespace entry alongside namespace descriptions.

🐰 Hops excitedly

A brand new life-capture spine is here!
With memories to store and embeddings so clear,
People to resolve, scores to compute,
Curated snapshots in prompts to boot!
The index is blazing, full-text and vec,
This foundation's so ready—what's next on the deck? ✨

# Conflicts: # src/openhuman/mod.rs

…bview NotificationSettings

coderabbitai

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/core/all.rs (1)
88-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

People RPC controllers not wired into the global registry.

The people namespace is declared in namespace_description() (line 324) with description "Contact resolution and recency × frequency × reciprocity × depth scoring," but build_registered_controllers() lacks a call to controllers.extend(crate::openhuman::people::all_people_registered_controllers()). Without this wiring, the four RPC endpoints (people.list, people.resolve, people.score, people.refresh_address_book) remain unreachable via try_invoke_registered_rpc.

The all_people_registered_controllers function exists in src/openhuman/people/mod.rs, so add the wiring to build_registered_controllers() following the established pattern.
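A minimal sketch of why the missing `extend` call makes the endpoints unreachable. `RegisteredController`, `all_other_registered_controllers`, `is_reachable`, and the `include_people` switch are hypothetical stand-ins, not the project's real types; only the four `people.*` method names come from the PR summary.

```rust
// Hypothetical stand-in for the project's registered-controller type.
#[derive(Debug, Clone, PartialEq)]
pub struct RegisteredController {
    pub name: &'static str,
}

// Stand-in for `all_people_registered_controllers()`; names taken from the PR summary.
pub fn all_people_registered_controllers() -> Vec<RegisteredController> {
    ["people.list", "people.resolve", "people.score", "people.refresh_address_book"]
        .iter()
        .map(|&name| RegisteredController { name })
        .collect()
}

// Stand-in for any other module's controllers (the name is illustrative only).
pub fn all_other_registered_controllers() -> Vec<RegisteredController> {
    vec![RegisteredController { name: "example.ping" }]
}

// `include_people` exists only so the sketch can show both states of the registry.
pub fn build_registered_controllers(include_people: bool) -> Vec<RegisteredController> {
    let mut controllers = Vec::new();
    controllers.extend(all_other_registered_controllers());
    if include_people {
        // The wiring the review reports as missing.
        controllers.extend(all_people_registered_controllers());
    }
    controllers
}

// A lookup by method name, roughly what an RPC dispatcher has to do.
pub fn is_reachable(registry: &[RegisteredController], method: &str) -> bool {
    registry.iter().any(|c| c.name == method)
}
```

With `include_people = false` the namespace can still be described elsewhere, but a lookup for `people.list` finds nothing, which matches the symptom reported above.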
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/all.rs` around lines 88 - 190, build_registered_controllers() is
missing the people controllers so the RPCs declared in namespace_description()
(people.*) are not reachable; add a call to extend the global registry with
crate::openhuman::people::all_people_registered_controllers() (following the
same pattern as other modules) so try_invoke_registered_rpc can find
people.list/resolve/score/refresh_address_book; locate the function
build_registered_controllers and insert
controllers.extend(crate::openhuman::people::all_people_registered_controllers())
in the appropriate alphabetical/section spot.

🧹 Nitpick comments (4)

scripts/test_install.sh (1)
23-27: 💤 Low value

Consider verifying the specific exit code for missing platform.

The test correctly expects a non-zero exit, but resolve_asset_url returns exit code 3 specifically for missing platforms. Verifying this would catch regressions where the function might fail for a different reason.
Proposed enhancement
 # Also test a missing platform produces exit code 3
-if resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null; then
-  echo "FAIL: expected non-zero exit for missing platform linux-aarch64"
+rc=0
+resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null || rc=$?
+if [[ "$rc" -ne 3 ]]; then
+  echo "FAIL: expected exit code 3 for missing platform linux-aarch64, got $rc"
   exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_install.sh` around lines 23 - 27, The test currently only
asserts that resolve_asset_url "$FIXTURE" "linux" "aarch64" exits non-zero;
change it to assert the specific exit code 3 returned for missing platforms by
capturing the command's exit status (run resolve_asset_url with the same args,
save $? into a variable) and then check that the variable equals 3, failing the
test if it does not — reference the resolve_asset_url call and the "$FIXTURE"
variable to locate and update the assertion logic.
scripts/install.sh (1)
203-217: 💤 Low value

verify_asset_reachable is defined but not called in the install flow.

The function is well-implemented with exponential backoff, but it's not wired into resolve_from_latest_json or the main install path. The planning doc (Task A3 Step 3) shows intent to call it after resolving the URL.

If this is intentional (deferred to a follow-up), consider adding a TODO comment. Otherwise, wire it in before the download step around line 415.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/install.sh` around lines 203 - 217, The verify_asset_reachable
function is implemented but never invoked; call
verify_asset_reachable "$asset_url" immediately after resolve_from_latest_json
returns the asset URL and before the actual download step, check its return code
and abort with a clear error/log if it fails (propagate the non-zero exit or
exit 4), and if you intentionally want to defer this, add a TODO comment in the
main install flow mentioning resolve_from_latest_json and verify_asset_reachable
so it’s not lost.
src/openhuman/life_capture/quote_strip.rs (1)
4-7: ⚡ Quick win

ON_DATE_WROTE misses multi-line attribution headers common in mobile clients.

.{1,200} with no (?s) flag cannot cross a newline, so Gmail's typical two-line attribution:
On Mon, Apr 21, 2026 at 9:14 AM, Sarah Smith
<sarah@example.com> wrote:
is never matched. The > filter below still strips the quoted lines, but the attribution header leaks into the returned content.
♻️ Proposed fix — use DOTALL with a non-greedy quantifier
-static ON_DATE_WROTE: Lazy<Regex> = Lazy::new(|| {
-    // "On <date>, <name> <email|>(?) wrote:" — match start of any line.
-    Regex::new(r"(?m)^On .{1,200}\bwrote:\s*$").unwrap()
-});
+static ON_DATE_WROTE: Lazy<Regex> = Lazy::new(|| {
+    // "On <date>, <name> <email|>(?) wrote:" — allow header to span 1-3 lines.
+    // (?s) makes `.` cross newlines; non-greedy `.{1,300}?` prevents over-capture.
+    Regex::new(r"(?ms)^On .{1,300}?\bwrote:\s*$").unwrap()
+});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/life_capture/quote_strip.rs` around lines 4 - 7, The
ON_DATE_WROTE regex doesn't match multi-line attribution headers because `.`
doesn't cross newlines by default; update the pattern used in ON_DATE_WROTE to
enable DOTALL and make the quantifier non-greedy so it spans only a bounded
number of characters across lines (e.g., use the combined (?ms) flag with
.{1,300}?, as in the proposed fix). Modify the Regex::new call that constructs
ON_DATE_WROTE (refer to the ON_DATE_WROTE static and its Regex::new invocation)
so multi-line attribution blocks like Gmail's two-line header are matched and
stripped.
src/openhuman/people/rpc.rs (1)
16-54: 🏗️ Heavy lift

N+1 query pattern: one interactions_for round-trip per person

handle_list loads the full person list and then issues a separate interactions_for query for every person in the result set. With the 500-person cap that is up to 501 sequential DB round-trips before any sorting occurs. Adding a batch_interactions_for(person_ids) method to PeopleStore (a single SELECT … WHERE person_id IN (…) keyed by PersonId) would collapse this to two queries.
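A rough sketch of the batched shape suggested above, using only the standard library. The `PersonId` and `Interaction` definitions, the table name, and the column names are assumptions made for illustration; the real types live in the people module and the real query goes through the store's database layer.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real PersonId / Interaction types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PersonId(pub u64);

#[derive(Debug, Clone, PartialEq)]
pub struct Interaction {
    pub ts_unix: i64,
    pub outbound: bool,
}

/// Builds one statement with a `?` placeholder per id.
/// Returns None for an empty list, because `IN ()` is not valid SQL.
/// Table and column names are assumed, not taken from the PR.
pub fn batch_interactions_sql(id_count: usize) -> Option<String> {
    if id_count == 0 {
        return None;
    }
    let placeholders = vec!["?"; id_count].join(", ");
    Some(format!(
        "SELECT person_id, ts, outbound FROM interactions WHERE person_id IN ({})",
        placeholders
    ))
}

/// Groups the flat rows of that single query by person.
/// Every requested id gets an entry, so people with no interactions
/// still map to an empty Vec instead of being absent.
pub fn group_by_person(
    person_ids: &[PersonId],
    rows: Vec<(PersonId, Interaction)>,
) -> HashMap<PersonId, Vec<Interaction>> {
    let mut map: HashMap<PersonId, Vec<Interaction>> =
        person_ids.iter().map(|id| (*id, Vec::new())).collect();
    for (id, interaction) in rows {
        map.entry(id).or_insert_with(Vec::new).push(interaction);
    }
    map
}
```

Pre-seeding the map keeps the per-person loop in `handle_list` unchanged: each lookup yields a (possibly empty) slice that can be passed to the existing scoring call.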
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/people/rpc.rs` around lines 16 - 54, handle_list currently
calls PeopleStore::interactions_for(p.id) inside the loop, causing N+1 DB
round-trips; add a new PeopleStore::batch_interactions_for(person_ids:
&[PersonId]) -> Result<HashMap<PersonId, Vec<Interaction>>, _> (or similar) that
does one SELECT WHERE person_id IN (...), call it once before the for p in
people loop with the collected person IDs, replace per-person interactions_for
calls with lookups into the returned map, and preserve existing error mapping
(map_err(...)) and use the same score(&interactions, now) path so scoring and
JSON construction (in handle_list) remain unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 908-912: The doc comment "Initializes long-lived socket/event-bus
infrastructure." was accidentally moved to bootstrap_curated_memory; move that
doc line (and any other lines belonging to the original comment block) back
above bootstrap_skill_runtime so bootstrap_skill_runtime regains its intended
documentation, and replace or remove the misplaced comment on
bootstrap_curated_memory so its own doc comment correctly describes
curated-memory initialization; locate the functions by their names
bootstrap_curated_memory and bootstrap_skill_runtime to make the change.

In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 337-339: PromptContext is missing the curated_snapshot field used
in mod.rs; add pub curated_snapshot: Option<&'a
crate::openhuman::curated_memory::MemorySnapshot> to the PromptContext<'a>
struct in prompts/types.rs, then update the functions that construct
PromptContext—empty_prompt_context_for_static_sections() and
build_system_prompt() in turn.rs—to accept/propagate the curated snapshot (set
to Some(...) when available or None otherwise) so ctx.curated_snapshot is
populated before code calls inject_snapshot_content; reference PromptContext,
empty_prompt_context_for_static_sections, build_system_prompt, and the existing
Agent implementation in session/types.rs for the intended field shape.

In `@src/openhuman/curated_memory/rpc.rs`:
- Around line 19-72: Change the four handler signatures (handle_read,
handle_add, handle_replace, handle_remove) to return RpcOutcome<Value> instead
of Result<RpcOutcome<Value>, String>, and convert all early Err(...) returns
into RpcOutcome::error(...) returns; specifically, replace uses of the `?`
operator on pick(&file, rt) and map_err(...) on
store.read()/add()/replace()/remove() with explicit error handling that returns
RpcOutcome::error(format!(...)) (include the same error text like "read: {e}",
"add: {e}", etc.), and for validation failures (e.g., empty needle) return
RpcOutcome::error("needle must not be empty".into()); keep successful paths
returning RpcOutcome::new(...) as before.

In `@src/openhuman/curated_memory/store.rs`:
- Around line 88-95: snapshot_pair currently calls memory_store.read().await and
user_store.read().await separately, which can interleave with concurrent writes
and produce a mixed snapshot; fix by acquiring exclusive locks on both
MemoryStore instances before reading (use the stores' write().await or other
exclusive lock API) and always acquire them in a deterministic order to avoid
deadlocks (e.g., compare the pointer/address or some stable id of memory_store
vs user_store and lock the lower one first, then the other), then read the
in-memory contents into the MemorySnapshot and release both locks.
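The lock-ordering idea in the snapshot_pair item can be sketched as follows. This is a simplified, synchronous stand-in: `MemoryStore` here wraps a `std::sync::Mutex<String>`, whereas the real store is file-backed and async, so treat the types and method shapes as assumptions.

```rust
use std::sync::Mutex;

// Hypothetical stand-in: the real MemoryStore is file-backed and async.
pub struct MemoryStore {
    body: Mutex<String>,
}

impl MemoryStore {
    pub fn new(body: &str) -> Self {
        MemoryStore { body: Mutex::new(body.to_string()) }
    }
}

#[derive(Debug, PartialEq)]
pub struct MemorySnapshot {
    pub memory: String,
    pub user: String,
}

pub fn snapshot_pair(memory_store: &MemoryStore, user_store: &MemoryStore) -> MemorySnapshot {
    let mem_addr = memory_store as *const MemoryStore as usize;
    let user_addr = user_store as *const MemoryStore as usize;

    if mem_addr == user_addr {
        // Same store passed twice: lock once, a second lock would deadlock.
        let guard = memory_store.body.lock().unwrap();
        return MemorySnapshot { memory: guard.clone(), user: guard.clone() };
    }

    // Always lock the lower address first so concurrent callers agree on the order.
    let (first, second) = if mem_addr < user_addr {
        (memory_store, user_store)
    } else {
        (user_store, memory_store)
    };
    let first_guard = first.body.lock().unwrap();
    let second_guard = second.body.lock().unwrap();

    // Both locks are held here, so the two reads describe one point in time.
    let (memory, user) = if mem_addr < user_addr {
        (first_guard.clone(), second_guard.clone())
    } else {
        (second_guard.clone(), first_guard.clone())
    };
    MemorySnapshot { memory, user }
}
```

The ordering only prevents mixed snapshots if writers take the same locks; a writer that bypasses them could still interleave.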

In `@src/openhuman/life_capture/embedder.rs`:
- Around line 19-33: The current HostedEmbedder::new uses .unwrap_or_else(|_|
reqwest::Client::new()) which silently drops configured timeouts if the builder
fails; change this to call .expect(...) with a clear message (e.g., "failed to
build reqwest client with timeouts") so the program fails loud on
misconfiguration instead of returning a timeout-less client; update the builder
call in HostedEmbedder::new (the reqwest::Client::builder()...build()
invocation) to use .expect and include a descriptive message.
- Around line 97-99: The dim() method is hardcoded to 1536 and must reflect the
model used by HostedEmbedder; update the HostedEmbedder struct to store an
embedding dimension (add a field like embedding_dim set during new(base_url,
api_key, model)), compute or map the correct dimension for the chosen model when
constructing HostedEmbedder, change fn dim(&self) -> usize to return
self.embedding_dim, and ensure embed_batch and any callers use this stored value
so models like text-embedding-3-large (3072) work correctly.
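One way the stored dimension could look, as a sketch. The constructor here takes only the model name and returns a `Result`, which is a simplification of the `new(base_url, api_key, model)` signature mentioned above; `embedding_dim_for` is a hypothetical helper. The sizes listed are the default output widths of those OpenAI models.

```rust
/// Default output width per model; None for models this sketch does not know.
/// `embedding_dim_for` is a hypothetical helper, not part of the PR.
pub fn embedding_dim_for(model: &str) -> Option<usize> {
    match model {
        "text-embedding-3-small" | "text-embedding-ada-002" => Some(1536),
        "text-embedding-3-large" => Some(3072),
        _ => None,
    }
}

// Simplified stand-in for HostedEmbedder (no HTTP client, no credentials).
pub struct HostedEmbedder {
    model: String,
    embedding_dim: usize,
}

impl HostedEmbedder {
    /// Resolves the dimension once at construction and fails loudly on an
    /// unknown model instead of silently assuming 1536.
    pub fn new(model: &str) -> Result<Self, String> {
        let embedding_dim = embedding_dim_for(model)
            .ok_or_else(|| format!("unknown embedding model: {}", model))?;
        Ok(HostedEmbedder { model: model.to_string(), embedding_dim })
    }

    pub fn dim(&self) -> usize {
        self.embedding_dim
    }

    pub fn model(&self) -> &str {
        &self.model
    }
}
```

If the vector table is declared with a fixed width, that width would also need to agree with whatever `dim()` reports for the configured model.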

In `@src/openhuman/life_capture/index.rs`:
- Around line 545-547: The hybrid_search currently truncates candidates by using
oversample before apply_query_filters, causing filtered-out top results to leave
too few matches; update the logic around oversample/q.k and the calls to
keyword_search and vector_search so that filtering runs before final truncation:
either have keyword_search/vector_search return an untruncated candidate pool
(or accept a much larger dynamic oversample) and then call
apply_query_filters(query_vector, q, ...) to filter, and only after filtering
take the top q.k results to return; implement a loop or adaptive oversample
(e.g., double oversample up to a max) if necessary to ensure enough filtered
candidates, and apply the same change to the other similar block referenced
(around the vc/kw usage at the later 581-582 region) so both legs are filtered
prior to truncation.
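The filter-before-truncate ordering can be sketched with a small reciprocal rank fusion (RRF) example. The `Hit` type, the `source` filter, and the constant 60 are assumptions for illustration; the real `hybrid_search` also applies recency scoring, which is left out here.

```rust
use std::collections::HashMap;

// Hypothetical hit type; the real Hit carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: u32,
    pub source: &'static str,
}

/// Reciprocal rank fusion: score(id) = sum over legs of 1 / (k0 + rank),
/// with rank starting at 1 within each leg.
pub fn rrf_fuse(legs: &[Vec<Hit>], k0: f64) -> Vec<Hit> {
    let mut scored: HashMap<u32, (f64, Hit)> = HashMap::new();
    for leg in legs {
        for (i, hit) in leg.iter().enumerate() {
            let entry = scored.entry(hit.id).or_insert((0.0, hit.clone()));
            entry.0 += 1.0 / (k0 + (i as f64) + 1.0);
        }
    }
    let mut fused: Vec<(f64, Hit)> = scored.into_iter().map(|(_, v)| v).collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap().then(a.1.id.cmp(&b.1.id)));
    fused.into_iter().map(|(_, h)| h).collect()
}

/// Filter first, truncate last, so filtered-out hits never consume a top-k slot.
pub fn top_k_filtered<F: Fn(&Hit) -> bool>(fused: Vec<Hit>, keep: F, k: usize) -> Vec<Hit> {
    fused.into_iter().filter(|h| keep(h)).take(k).collect()
}
```

For example, if the fused order is ids 1, 3, 2, 4 and only id 1 fails the filter, truncating to two before filtering leaves a single hit (id 3), while filtering first returns ids 3 and 2.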

In `@src/openhuman/life_capture/rpc.rs`:
- Around line 154-170: The code currently calls writer.upsert(&mut items) before
calling embedder.embed_batch(...) so a failed embed leaves the stored row with
an out-of-date vector; change the flow so embedding is performed and a valid
vector obtained before mutating storage: call embedder.embed_batch(...) and
obtain `vector` for the input text (using the same `text`, `items`, and
`canonical_id`/`requested_id` logic) first, then call writer.upsert(&mut items)
followed immediately by writer.upsert_vector(&canonical_id, &vector) (or perform
both upserts in a single atomic operation/transaction if supported) to ensure
the row is only updated when embedding has succeeded. Ensure references to
`IndexWriter::new`, `writer.upsert`, `embedder.embed_batch`,
`writer.upsert_vector`, `items`, `canonical_id`, and `requested_id` are updated
accordingly.
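A minimal sketch of the embed-then-write ordering, with an in-memory `Index` and a closure in place of the real embedder. Both are hypothetical stand-ins; the real flow is async and goes through `IndexWriter`.

```rust
use std::collections::HashMap;

// In-memory stand-in for the personal index (assumption, not the real schema).
#[derive(Default)]
pub struct Index {
    pub items: HashMap<String, String>,     // item_id -> text
    pub vectors: HashMap<String, Vec<f32>>, // item_id -> embedding
}

/// Embeds first, then writes the row and its vector back to back.
/// If embedding fails, storage is left exactly as it was.
pub fn ingest<E>(index: &mut Index, item_id: &str, text: &str, embed: E) -> Result<(), String>
where
    E: Fn(&str) -> Result<Vec<f32>, String>,
{
    // Step 1: nothing is mutated until a vector is in hand.
    let vector = embed(text).map_err(|e| format!("embed: {}", e))?;

    // Step 2: row and vector are updated together.
    index.items.insert(item_id.to_string(), text.to_string());
    index.vectors.insert(item_id.to_string(), vector);
    Ok(())
}
```

With a real database the two writes in step 2 would ideally share one transaction, so a crash between them cannot leave a row paired with a stale vector.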

In `@src/openhuman/life_capture/schemas.rs`:
- Around line 188-193: The FieldSchema for "metadata" currently declares
TypeSchema::Option(Box::new(TypeSchema::String)) but the handler treats it as a
raw JSON object and defaults missing input to {} — change the schema to an
optional JSON/object type (e.g. TypeSchema::Option(Box::new(TypeSchema::Object))
or the project’s equivalent JSON/map type) so it matches the handler, and/or
update the handler to either accept a string and parse it into JSON or stop
defaulting missing values to {} (use None) so wire format and stored value
agree; apply the same fix to the other occurrence of "metadata" mentioned (lines
254-257) ensuring both FieldSchema(name: "metadata", ...) and the handler use
the same JSON object representation.

In `@src/openhuman/people/resolver.rs`:
- Around line 28-33: The resolve and link functions skip canonicalization and
should canonicalize handles/aliases before using the store: in resolve, call
handle.canonicalize() (or otherwise obtain the canonical Handle) and pass that
to store.lookup instead of the raw handle; in link, canonicalize the other alias
(e.g., call other.canonicalize()) before calling store.add_alias(pid, ...) so
aliases are stored in canonical form; ensure this mirrors resolve_or_create's
use of handle.canonicalize() so lookups and alias additions are consistent and
avoid duplicate PersonIds.
- Around line 40-74: The current resolve_or_create implementation races: it
calls store.lookup(&canonical) then creates a new PersonId with PersonId::new
and calls store.insert_person(&person, &[canonical]) which can produce duplicate
people under concurrent requests; fix by making the create-or-get atomic —
either implement an upsert at the store layer (e.g., an INSERT ... ON CONFLICT
DO UPDATE/RETURNING that returns the existing PersonId) and expose a
store.upsert_person_or_get_id(...) used by resolve_or_create, or after
insert_person call re-query store.lookup(&canonical) and return that id
(discarding the locally generated id) to ensure the returned id is the canonical
stored id; modify resolve_or_create, lookup, and insert_person/upsert methods
accordingly so the function always returns the database-authoritative PersonId.
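Both resolver items can be illustrated together: canonicalize on every path, and make lookup plus insert a single atomic step. In this sketch a `Mutex` around a `HashMap` stands in for an `INSERT ... ON CONFLICT` style upsert, and `canonicalize` (trim and lowercase) is only an assumed normal form; the real `Handle::canonicalize` may differ.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-in for the real PersonId.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PersonId(pub u64);

/// Assumed canonical form for illustration only: trimmed and lowercased.
pub fn canonicalize(handle: &str) -> String {
    handle.trim().to_lowercase()
}

pub struct Store {
    // (next id to hand out, canonical handle -> person)
    inner: Mutex<(u64, HashMap<String, PersonId>)>,
}

impl Store {
    pub fn new() -> Self {
        Store { inner: Mutex::new((1, HashMap::new())) }
    }

    /// Lookup and insert happen under one lock, so two concurrent callers
    /// with the same handle cannot both create a person. The bool reports
    /// whether this call inserted the row.
    pub fn resolve_or_create(&self, handle: &str) -> (PersonId, bool) {
        let key = canonicalize(handle);
        let mut guard = self.inner.lock().unwrap();
        if let Some(id) = guard.1.get(&key) {
            return (*id, false);
        }
        let id = PersonId(guard.0);
        guard.0 += 1;
        guard.1.insert(key, id);
        (id, true)
    }

    /// Read-only path; canonicalizes the same way as resolve_or_create.
    pub fn resolve(&self, handle: &str) -> Option<PersonId> {
        let key = canonicalize(handle);
        self.inner.lock().unwrap().1.get(&key).copied()
    }
}
```

Because every path funnels through `canonicalize`, `"Sarah@Example.com "` and `"sarah@example.com"` resolve to the same id, and the id returned is always the one actually stored.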

In `@src/openhuman/people/rpc.rs`:
- Around line 120-143: handle_score currently treats a missing person the same
as a person with zero interactions because interactions_for returns an empty
Vec; update handle_score to first verify the person exists (e.g. call a
PeopleStore method like get_person, find_person, or person_exists on the
provided person_id) and return an Err (with a clear message including the
person_id) when the person is not found; only call interactions_for and compute
the score when the person lookup succeeds, preserving the existing RpcOutcome
construction for valid persons.
- Around line 68-79: The "created" flag is incorrect because resolve_or_create
returns Some for both existing and newly-created persons; update the logic in
the RPC handler (the block using resolver.resolve_or_create, resolver.resolve
and building RpcOutcome) to first check existence (e.g., call
resolver.resolve(&handle).await? or resolver.exists(&handle).await?) and only
call resolver.resolve_or_create(&handle).await? when absent — then set "created"
to true only when you actually inserted a new record (i.e., when the pre-check
returned None/false and resolve_or_create returned Some), otherwise set
"created" to false while still returning the existing person_id.

In `@src/openhuman/people/schemas.rs`:
- Around line 60-104: The output schemas for the people RPCs are missing the
"components" object and "handles" array that the RPC responses actually return;
update the FieldSchema entries inside the outputs vector (the TypeSchema::Array
of TypeSchema::Object) to include a "components" FieldSchema (TypeSchema::Object
with fields "recency", "frequency", "reciprocity", "depth" as TypeSchema::F64)
and a "handles" FieldSchema (TypeSchema::Array of TypeSchema::Object describing
per-handle fields returned by handle_list), matching required flags and comments
to the actual responses used by handle_score and handle_list so schema
introspection matches rpc.rs behavior.

In `@tests/imessage_ingest_e2e.rs`:
- Around line 9-13: The tests share a process-global SQLite DB so the assertion
that total_items == 1 in ingest_then_search_then_reingest_is_idempotent (and the
similar assertions around lines 63-76 and 160-166) is flaky under parallel test
execution; either enforce serial runs in the test invocation or, preferably,
make the assertions relative/robust by checking the expected delta or using >=1
(e.g. assert total_items >= previous_count + 1 or compare counts before/after to
assert an increment) and update the assertions in
search_response_shape_matches_schema and
ingest_then_search_then_reingest_is_idempotent accordingly so they no longer
assume an absolute count of 1.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d116a681-d12d-464b-9acb-52bdb04c8137

📥 Commits

Reviewing files that changed from the base of the PR and between ea97e64 and 45bcd9a.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (52)

Cargo.toml
app/src-tauri/src/imessage_scanner/mod.rs
docs/superpowers/plans/2026-04-21-imessage-live-harness.md
docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md
docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
docs/superpowers/specs/2026-04-22-life-capture-layer-design.md
scripts/fixtures/latest.json
scripts/install.sh
scripts/test_install.sh
src/core/all.rs
src/core/jsonrpc.rs
src/openhuman/agent/harness/session/builder.rs
src/openhuman/agent/harness/session/runtime.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/agent/harness/session/types.rs
src/openhuman/agent/prompts/mod.rs
src/openhuman/curated_memory/mod.rs
src/openhuman/curated_memory/rpc.rs
src/openhuman/curated_memory/runtime.rs
src/openhuman/curated_memory/schemas.rs
src/openhuman/curated_memory/store.rs
src/openhuman/curated_memory/types.rs
src/openhuman/life_capture/embedder.rs
src/openhuman/life_capture/index.rs
src/openhuman/life_capture/migrations.rs
src/openhuman/life_capture/migrations/0001_init.sql
src/openhuman/life_capture/migrations/0002_vec.sql
src/openhuman/life_capture/mod.rs
src/openhuman/life_capture/quote_strip.rs
src/openhuman/life_capture/redact.rs
src/openhuman/life_capture/rpc.rs
src/openhuman/life_capture/runtime.rs
src/openhuman/life_capture/schemas.rs
src/openhuman/life_capture/tests/e2e.rs
src/openhuman/life_capture/tests/mod.rs
src/openhuman/life_capture/types.rs
src/openhuman/mod.rs
src/openhuman/people/address_book.rs
src/openhuman/people/migrations.rs
src/openhuman/people/migrations/0001_init.sql
src/openhuman/people/mod.rs
src/openhuman/people/resolver.rs
src/openhuman/people/rpc.rs
src/openhuman/people/schemas.rs
src/openhuman/people/scorer.rs
src/openhuman/people/store.rs
src/openhuman/people/tests.rs
src/openhuman/people/types.rs
src/openhuman/webview_notifications/types.rs
tests/fixtures/life_capture/corpus.json
tests/imessage_ingest_e2e.rs
tests/life_capture_retrieval_eval.rs

coderabbitai · 2026-05-04T18:54:32Z

 /// Initializes long-lived socket/event-bus infrastructure.
+/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
+/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
+async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
+    use std::sync::Arc;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale doc comment misattributed to bootstrap_curated_memory.

The existing line 908 — /// Initializes long-lived socket/event-bus infrastructure. — was originally the doc comment for bootstrap_skill_runtime. Inserting bootstrap_curated_memory immediately after it caused that line to become the first line of the new function's doc comment, which is incorrect. bootstrap_skill_runtime (line 1003) is now doc-comment-free as a side effect.

📝 Suggested fix

-/// Initializes long-lived socket/event-bus infrastructure.
-/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
-/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
+/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
+/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
 async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {

And restore the comment above bootstrap_skill_runtime:

+/// Initializes long-lived socket/event-bus infrastructure.
 pub async fn bootstrap_skill_runtime() {


coderabbitai · 2026-05-04T18:54:32Z

+pub async fn handle_read(
+    rt: &CuratedMemoryRuntime,
+    file: String,
+) -> Result<RpcOutcome<Value>, String> {
+    let store = pick(&file, rt)?;
+    let body = store.read().await.map_err(|e| format!("read: {e}"))?;
+    Ok(RpcOutcome::new(
+        json!({ "file": file, "body": body }),
+        vec![],
+    ))
+}
+
+pub async fn handle_add(
+    rt: &CuratedMemoryRuntime,
+    file: String,
+    entry: String,
+) -> Result<RpcOutcome<Value>, String> {
+    let store = pick(&file, rt)?;
+    store.add(&entry).await.map_err(|e| format!("add: {e}"))?;
+    Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
+}
+
+pub async fn handle_replace(
+    rt: &CuratedMemoryRuntime,
+    file: String,
+    needle: String,
+    replacement: String,
+) -> Result<RpcOutcome<Value>, String> {
+    if needle.is_empty() {
+        return Err("needle must not be empty".into());
+    }
+    let store = pick(&file, rt)?;
+    store
+        .replace(&needle, &replacement)
+        .await
+        .map_err(|e| format!("replace: {e}"))?;
+    Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
+}
+
+pub async fn handle_remove(
+    rt: &CuratedMemoryRuntime,
+    file: String,
+    needle: String,
+) -> Result<RpcOutcome<Value>, String> {
+    if needle.is_empty() {
+        return Err("needle must not be empty".into());
+    }
+    let store = pick(&file, rt)?;
+    store
+        .remove(&needle)
+        .await
+        .map_err(|e| format!("remove: {e}"))?;
+    Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
+}


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Handler return types violate the RpcOutcome<T> guideline.

All four handlers return Result<RpcOutcome<Value>, String> instead of RpcOutcome<Value>. The Err(String) path is being handled by the adapter in schemas.rs at the call site, but the handler signatures themselves do not comply.

Per coding guidelines: "Use RpcOutcome<T> for RPC controller return types per the AGENTS.md specification" (src/openhuman/*/rpc.rs).

♻️ Proposed fix — representative handler, apply same pattern to all four

-pub async fn handle_read(
+pub async fn handle_read(
     rt: &CuratedMemoryRuntime,
     file: String,
-) -> Result<RpcOutcome<Value>, String> {
-    let store = pick(&file, rt)?;
-    let body = store.read().await.map_err(|e| format!("read: {e}"))?;
-    Ok(RpcOutcome::new(
-        json!({ "file": file, "body": body }),
-        vec![],
-    ))
+) -> RpcOutcome<Value> {
+    let store = match pick(&file, rt) {
+        Ok(s) => s,
+        Err(e) => return RpcOutcome::error(e),
+    };
+    match store.read().await {
+        Ok(body) => RpcOutcome::new(json!({ "file": file, "body": body }), vec![]),
+        Err(e) => RpcOutcome::error(format!("read: {e}")),
+    }
 }

Apply the same transformation to handle_add, handle_replace, and handle_remove, converting each Err(...) return to RpcOutcome::error(...).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/curated_memory/rpc.rs` around lines 19 - 72, Change the four handler signatures (handle_read, handle_add, handle_replace, handle_remove) to return RpcOutcome<Value> instead of Result<RpcOutcome<Value>, String>, and convert all early Err(...) returns into RpcOutcome::error(...) returns; specifically, replace uses of the `?` operator on pick(&file, rt) and map_err(...) on store.read()/add()/replace()/remove() with explicit error handling that returns RpcOutcome::error(format!(...)) (include the same error text like "read: {e}", "add: {e}", etc.), and for validation failures (e.g., empty needle) return RpcOutcome::error("needle must not be empty".into()); keep successful paths returning RpcOutcome::new(...) as before.
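For illustration, the same transformation applied to a handler with a validation step might look like the sketch below. It is a minimal, synchronous stand-in, not the project's code: the `RpcOutcome` enum, its `new`/`error` constructors, and the in-memory `Store` are all assumptions made so the example compiles on its own, and the real `RpcOutcome<T>` may be shaped differently.

```rust
/// Hypothetical stand-in for the project's `RpcOutcome<T>`.
#[derive(Debug, PartialEq)]
pub enum RpcOutcome<T> {
    Success { value: T, logs: Vec<String> },
    Failure { message: String },
}

impl<T> RpcOutcome<T> {
    pub fn new(value: T, logs: Vec<String>) -> Self {
        RpcOutcome::Success { value, logs }
    }

    pub fn error(message: impl Into<String>) -> Self {
        RpcOutcome::Failure { message: message.into() }
    }
}

/// Hypothetical in-memory store standing in for the curated-memory file store.
pub struct Store {
    pub body: String,
}

impl Store {
    /// Replaces the first occurrence of `needle`, failing if it is absent.
    pub fn replace(&mut self, needle: &str, replacement: &str) -> Result<(), String> {
        if !self.body.contains(needle) {
            return Err(format!("needle not found: {needle}"));
        }
        self.body = self.body.replacen(needle, replacement, 1);
        Ok(())
    }
}

/// Every failure path returns `RpcOutcome::error(...)` instead of `Err(...)`,
/// so the signature no longer needs `Result`.
pub fn handle_replace(store: &mut Store, needle: &str, replacement: &str) -> RpcOutcome<String> {
    if needle.is_empty() {
        return RpcOutcome::error("needle must not be empty");
    }
    match store.replace(needle, replacement) {
        Ok(()) => RpcOutcome::new(store.body.clone(), vec![]),
        Err(e) => RpcOutcome::error(format!("replace: {e}")),
    }
}
```

The trade-off is that `?` is no longer available inside the handler, so each fallible call needs an explicit `match` or an early return.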

coderabbitai · 2026-05-04T18:54:32Z

+pub async fn snapshot_pair(
+    memory_store: &MemoryStore,
+    user_store: &MemoryStore,
+) -> std::io::Result<crate::openhuman::curated_memory::types::MemorySnapshot> {
+    Ok(crate::openhuman::curated_memory::types::MemorySnapshot {
+        memory: memory_store.read().await?,
+        user: user_store.read().await?,
+    })


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

snapshot_pair is not actually point-in-time.

It reads the two files independently without holding either store's write lock, so a concurrent add/replace/remove can land between those awaits and produce a mixed snapshot from two different moments.

Suggested fix

 pub async fn snapshot_pair(
     memory_store: &MemoryStore,
     user_store: &MemoryStore,
 ) -> std::io::Result<crate::openhuman::curated_memory::types::MemorySnapshot> {
+    let _memory_guard = memory_store.write_lock.lock().await;
+    let _user_guard = user_store.write_lock.lock().await;
+
     Ok(crate::openhuman::curated_memory::types::MemorySnapshot {
-        memory: memory_store.read().await?,
-        user: user_store.read().await?,
+        memory: tokio::fs::read_to_string(&memory_store.file_path).await?,
+        user: tokio::fs::read_to_string(&user_store.file_path).await?,
     })
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/curated_memory/store.rs` around lines 88 - 95, snapshot_pair currently calls memory_store.read().await and user_store.read().await separately, which can interleave with concurrent writes and produce a mixed snapshot; fix by acquiring exclusive locks on both MemoryStore instances before reading (use the stores' write().await or other exclusive lock API) and always acquire them in a deterministic order to avoid deadlocks (e.g., compare the pointer/address or some stable id of memory_store vs user_store and lock the lower one first, then the other), then read the in-memory contents into the MemorySnapshot and release both locks.
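The deterministic lock ordering described above can be sketched as follows. This is a simplified model under stated assumptions, not the project's store: it uses `std::sync::Mutex` rather than an async lock, keeps each body in memory instead of on disk, and the `id` field used as the ordering key is hypothetical.

```rust
use std::sync::Mutex;

/// Hypothetical simplified store: the body lives behind a mutex, and `id`
/// is an assumed stable key used only to decide lock order.
pub struct MemoryStore {
    pub id: u32,
    body: Mutex<String>,
}

impl MemoryStore {
    pub fn new(id: u32, body: &str) -> Self {
        MemoryStore { id, body: Mutex::new(body.to_string()) }
    }
}

#[derive(Debug, PartialEq)]
pub struct MemorySnapshot {
    pub memory: String,
    pub user: String,
}

/// Holds both locks while copying, so no write can land between the two reads.
pub fn snapshot_pair(memory_store: &MemoryStore, user_store: &MemoryStore) -> MemorySnapshot {
    // Locking the same std mutex twice would deadlock, so require distinct stores.
    assert_ne!(memory_store.id, user_store.id, "stores must be distinct");

    // Always lock the lower id first, regardless of argument order, so two
    // callers passing the stores in opposite order cannot deadlock each other.
    let memory_first = memory_store.id < user_store.id;
    let (first, second) = if memory_first {
        (memory_store, user_store)
    } else {
        (user_store, memory_store)
    };
    let first_guard = first.body.lock().unwrap();
    let second_guard = second.body.lock().unwrap();

    // Map the guards back to their roles before copying the contents out.
    let (memory, user) = if memory_first {
        ((*first_guard).clone(), (*second_guard).clone())
    } else {
        ((*second_guard).clone(), (*first_guard).clone())
    };
    MemorySnapshot { memory, user }
}
```

Writers would need to take the same per-store lock for the snapshot to be consistent; the sketch only shows the reader side.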

coderabbitai · 2026-05-04T18:54:32Z

+impl HostedEmbedder {
+    pub fn new(base_url: String, api_key: String, model: String) -> Self {
+        let http = reqwest::Client::builder()
+            .timeout(std::time::Duration::from_secs(30))
+            .connect_timeout(std::time::Duration::from_secs(10))
+            .build()
+            .unwrap_or_else(|_| reqwest::Client::new());
+        Self {
+            base_url,
+            api_key,
+            model,
+            http,
+        }
+    }
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

unwrap_or_else(|_| reqwest::Client::new()) silently drops the configured timeouts.

If the builder fails (rare, but possible on TLS init issues), the fallback client has no request or connect timeout, meaning embed_batch can hang indefinitely. Since a builder failure here would be a fatal misconfiguration, .expect is more appropriate.

🛡️ Proposed fix

-            .build()
-            .unwrap_or_else(|_| reqwest::Client::new());
+            .build()
+            .expect("failed to build HostedEmbedder HTTP client");

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-impl HostedEmbedder {
-    pub fn new(base_url: String, api_key: String, model: String) -> Self {
-        let http = reqwest::Client::builder()
-            .timeout(std::time::Duration::from_secs(30))
-            .connect_timeout(std::time::Duration::from_secs(10))
-            .build()
-            .unwrap_or_else(|_| reqwest::Client::new());
-        Self {
-            base_url,
-            api_key,
-            model,
-            http,
-        }
-    }
-}
+impl HostedEmbedder {
+    pub fn new(base_url: String, api_key: String, model: String) -> Self {
+        let http = reqwest::Client::builder()
+            .timeout(std::time::Duration::from_secs(30))
+            .connect_timeout(std::time::Duration::from_secs(10))
+            .build()
+            .expect("failed to build HostedEmbedder HTTP client");
+        Self {
+            base_url,
+            api_key,
+            model,
+            http,
+        }
+    }
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/life_capture/embedder.rs` around lines 19 - 33, The current HostedEmbedder::new uses .unwrap_or_else(|_| reqwest::Client::new()) which silently drops configured timeouts if the builder fails; change this to call .expect(...) with a clear message (e.g., "failed to build reqwest client with timeouts") so the program fails loud on misconfiguration instead of returning a timeout-less client; update the builder call in HostedEmbedder::new (the reqwest::Client::builder()...build() invocation) to use .expect and include a descriptive message.

coderabbitai · 2026-05-04T18:54:32Z

+        if let Some(id) = self
+            .store
+            .lookup(&canonical)
+            .await
+            .map_err(|e| format!("lookup: {e}"))?
+        {
+            return Ok(id);
+        }
+        let id = PersonId::new();
+        let (display_name, primary_email, primary_phone) = match &canonical {
+            Handle::DisplayName(s) => (Some(s.clone()), None, None),
+            Handle::Email(s) => (None, Some(s.clone()), None),
+            Handle::IMessage(s) => {
+                if s.contains('@') {
+                    (None, Some(s.clone()), None)
+                } else {
+                    (None, None, Some(s.clone()))
+                }
+            }
+        };
+        let now = Utc::now();
+        let person = Person {
+            id,
+            display_name,
+            primary_email,
+            primary_phone,
+            handles: vec![canonical.clone()],
+            created_at: now,
+            updated_at: now,
+        };
+        self.store
+            .insert_person(&person, &[canonical])
+            .await
+            .map_err(|e| format!("insert_person: {e}"))?;
+        Ok(id)


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether PeopleStore::insert_person uses an atomic upsert /
# ON CONFLICT IGNORE on the handle table, which would make the TOCTOU benign.
rg -n "insert_person\|INSERT\|UNIQUE\|ON CONFLICT" --type=rust -A3 -B3 \
  src/openhuman/people/store.rs

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Find store-related files in the people subdirectory
fd -t f "store" src/openhuman/people/

Repository: tinyhumansai/openhuman

Length of output: 96

🏁 Script executed:

# Also check what files exist in the people directory
ls -la src/openhuman/people/

Repository: tinyhumansai/openhuman

Length of output: 859

🏁 Script executed:

wc -l src/openhuman/people/store.rs

Repository: tinyhumansai/openhuman

Length of output: 100

🏁 Script executed:

cat -n src/openhuman/people/store.rs

Repository: tinyhumansai/openhuman

Length of output: 17945

🏁 Script executed:

head -100 src/openhuman/people/migrations.rs

Repository: tinyhumansai/openhuman

Length of output: 2718

🏁 Script executed:

cat src/openhuman/people/migrations/0001_init.sql

Repository: tinyhumansai/openhuman

Length of output: 1608

Non-atomic check-then-insert: concurrent calls can mint duplicate PersonIds

The lookup on lines 40–47 and the insert_person on lines 70–73 are separate async operations. Two concurrent resolve_or_create calls for the same canonical handle can both observe None from lookup and then race to insert. Both will insert distinct Person records into the database; the INSERT OR IGNORE on the handle_aliases table prevents a constraint violation there, but the people table rows are already created with different PersonIds, breaking the invariant that the same handle always returns the same id.

The fix requires either an atomic upsert (e.g., INSERT...ON CONFLICT DO UPDATE RETURNING) or a re-select of the canonical handle after insertion to return the actual inserted PersonId rather than the locally generated one.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/people/resolver.rs` around lines 40 - 74, The current resolve_or_create implementation races: it calls store.lookup(&canonical) then creates a new PersonId with PersonId::new and calls store.insert_person(&person, &[canonical]) which can produce duplicate people under concurrent requests; fix by making the create-or-get atomic — either implement an upsert at the store layer (e.g., an INSERT ... ON CONFLICT DO UPDATE/RETURNING that returns the existing PersonId) and expose a store.upsert_person_or_get_id(...) used by resolve_or_create, or after insert_person call re-query store.lookup(&canonical) and return that id (discarding the locally generated id) to ensure the returned id is the canonical stored id; modify resolve_or_create, lookup, and insert_person/upsert methods accordingly so the function always returns the database-authoritative PersonId.
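To make the atomic create-or-get idea concrete, here is a minimal in-memory analogue. It is a sketch under assumptions, not the store's SQL: a `HashMap` behind a `std::sync::Mutex` stands in for the database, `PersonId` is a plain counter rather than a UUID, and the method names are hypothetical. The point it shows is that the lookup and the insert happen inside one critical section, so concurrent callers for the same handle all get the same id.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical id type; the real `PersonId` is described as a UUID.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PersonId(pub u64);

#[derive(Default)]
struct Inner {
    next_id: u64,
    by_handle: HashMap<String, PersonId>,
}

/// In-memory stand-in for the people store.
#[derive(Default)]
pub struct PeopleStore {
    inner: Mutex<Inner>,
}

impl PeopleStore {
    /// Looks up and, if needed, inserts while holding a single lock, so there
    /// is no window between "not found" and "insert" for another caller.
    pub fn resolve_or_create(&self, canonical: &str) -> PersonId {
        let mut inner = self.inner.lock().unwrap();
        if let Some(id) = inner.by_handle.get(canonical) {
            return *id;
        }
        inner.next_id += 1;
        let id = PersonId(inner.next_id);
        inner.by_handle.insert(canonical.to_string(), id);
        id
    }

    /// Number of distinct people created so far.
    pub fn person_count(&self) -> usize {
        self.inner.lock().unwrap().by_handle.len()
    }
}
```

In a SQL-backed store the equivalent would be a single transaction or an upsert that returns the stored id, as the comment above suggests; the mutex here only models that atomicity.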

coderabbitai · 2026-05-04T18:54:32Z

+            outputs: vec![FieldSchema {
+                name: "people",
+                ty: TypeSchema::Array(Box::new(TypeSchema::Object {
+                    fields: vec![
+                        FieldSchema {
+                            name: "person_id",
+                            ty: TypeSchema::String,
+                            comment: "Stable UUID for this person.",
+                            required: true,
+                        },
+                        FieldSchema {
+                            name: "display_name",
+                            ty: TypeSchema::Option(Box::new(TypeSchema::String)),
+                            comment: "Best-known display name, when set.",
+                            required: false,
+                        },
+                        FieldSchema {
+                            name: "primary_email",
+                            ty: TypeSchema::Option(Box::new(TypeSchema::String)),
+                            comment: "Primary email, when set.",
+                            required: false,
+                        },
+                        FieldSchema {
+                            name: "primary_phone",
+                            ty: TypeSchema::Option(Box::new(TypeSchema::String)),
+                            comment: "Primary phone, when set.",
+                            required: false,
+                        },
+                        FieldSchema {
+                            name: "score",
+                            ty: TypeSchema::F64,
+                            comment: "Composite person-score in [0,1].",
+                            required: true,
+                        },
+                        FieldSchema {
+                            name: "interaction_count",
+                            ty: TypeSchema::U64,
+                            comment: "Observed interactions contributing to the score.",
+                            required: true,
+                        },
+                    ],
+                })),
+                comment: "Ranked people, highest score first.",
+                required: true,
+            }],


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

components (and handles) are absent from both output schemas but present in the RPC responses

handle_score returns "components": { "recency", "frequency", "reciprocity", "depth" } (rpc.rs lines 133–138), but the score output schema only declares person_id, score, and interaction_count.

handle_list additionally returns a "handles" array per person (rpc.rs lines 28–35) and per-person "components" (rpc.rs lines 44–49); neither appears in the list output schema.

These omissions make the schemas inaccurate as documentation and would cause any code-generation or LLM tool-use path relying on schema introspection to ignore the most useful fields.

Also applies to: 158-177

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/people/schemas.rs` around lines 60 - 104, The output schemas for the people RPCs are missing the "components" object and "handles" array that the RPC responses actually return; update the FieldSchema entries inside the outputs vector (the TypeSchema::Array of TypeSchema::Object) to include a "components" FieldSchema (TypeSchema::Object with fields "recency", "frequency", "reciprocity", "depth" as TypeSchema::F64) and a "handles" FieldSchema (TypeSchema::Array of TypeSchema::Object describing per-handle fields returned by handle_list), matching required flags and comments to the actual responses used by handle_score and handle_list so schema introspection matches rpc.rs behavior.

coderabbitai · 2026-05-04T18:54:33Z

+//! same process-global `life_capture::runtime` OnceCells (index + embedder).
+//! That's by design: the OnceCells model the production startup contract, and
+//! the tests run serially so a single shared init is correct.
+//!
+//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture`


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parallel execution makes the total_items == 1 assertion non-deterministic.

The file comment says "the tests run serially", but the run command cargo test --test imessage_ingest_e2e -- --ignored --nocapture uses the default multi-thread runner. Both tests share the same process-global SQLite DB (via the OnceCell / leaked tempdir). If search_response_shape_matches_schema wins the race and seeds "shape-guard:2026-04-22" before ingest_then_search_then_reingest_is_idempotent hits the stats assertion, total_items will be 2 and the test fails.

Two equivalent fixes:

Option A — enforce serial execution in the run instructions:

-//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture`
+//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture --test-threads=1`

Option B — make the assertion relative (robust to parallel runs and more future-proof):

📝 Relative-count assertion

+    // Capture the baseline item count before ingesting anything.
+    let env_stats0 = post_rpc(&base, "openhuman.life_capture_get_stats", json!({})).await;
+    let stats0_total = rpc_result_body(&env_stats0)["total_items"]
+        .as_u64()
+        .expect("total_items u64");
+
     // First ingest — should INSERT.
     // ... (existing ingest call) ...

     // Stats — should show exactly one new item.
     let env_stats1 = post_rpc(&base, "openhuman.life_capture_get_stats", json!({})).await;
     let stats1 = rpc_result_body(&env_stats1);
     assert_eq!(
         stats1["total_items"],
-        json!(1),
-        "expected total_items=1 after first ingest, got {stats1}"
+        json!(stats0_total + 1),
+        "expected exactly one new item after first ingest, got {stats1}"
     );

Also applies to: 63-76, 160-166

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/imessage_ingest_e2e.rs` around lines 9 - 13, The tests share a process-global SQLite DB so the assertion that total_items == 1 in ingest_then_search_then_reingest_is_idempotent (and the similar assertions around lines 63-76 and 160-166) is flaky under parallel test execution; either enforce serial runs in the test invocation or, preferably, make the assertions relative/robust by checking the expected delta or using >=1 (e.g. assert total_items >= previous_count + 1 or compare counts before/after to assert an increment) and update the assertions in search_response_shape_matches_schema and ingest_then_search_then_reingest_is_idempotent accordingly so they no longer assume an absolute count of 1.


…uplicates

- Introduce CuratedMemoryPromptSnapshot in prompt types so CI builds without the curated_memory domain still resolve PromptContext.
- Use that type for ParentExecutionContext, Agent session storage, and PromptContext; map from curated_memory::MemorySnapshot when taking snapshots.
- Remove duplicate curated_snapshot fields from bad merges (struct literals and PromptContext construction).
- Fix learning tests PromptContext initializer.

Co-authored-by: Cursor <cursoragent@cursor.com>

CI failed with E0063 after ParentExecutionContext gained curated_snapshot; triage escalation, session turn builder, subagent tests, and integration stubs were still missing the field. Co-authored-by: Cursor <cursoragent@cursor.com>

jwalin-shah · 2026-05-05T23:53:21Z

Closing as superseded. The contacts/person resolution work we still need is split into #1259. The remaining memory/life-capture/curated-memory portions are intentionally not being carried forward here because that work is covered separately.

jwalinsshah added 30 commits April 22, 2026 20:14

docs(spec): cut Slack from life-capture v1, defer to v1.1

36eb444

docs(plans): incorporate Onyx insights — ACL on items, sync_state tab…

e160e58

…le, keyword projection column

docs(plans): drop ubuntu from CI matrix in Track 1; preserve resolver…

d7f2284

… hardening

feat(life_capture): core types — Item, Source, Person, Query, Hit, In…

1d45ffb

…dexStats

feat(life_capture): PII redaction (email, phone, SSN, CC)

da09f01

feat(life_capture): strip quoted replies from email bodies

a938f90

feat(life_capture): SQLite schema + migration loader (rusqlite)

24d8264

feat(life_capture): Embedder trait + HostedEmbedder (OpenAI-compatible)

ae57f28

feat(curated_memory): MemoryStore for MEMORY.md + USER.md (atomic wri…

37da5e1

…te, char-bounded)

style(life_capture, curated_memory): apply rustfmt + finish curated_m…

2efab6a

…emory namespace rename

Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.

jwalinsshah added 4 commits April 22, 2026 20:15

fix(test): add missing curated_snapshot field to ParentExecutionConte…

b9c3cd6

…xt stub

style(people): apply rustfmt

d7772f3

This was referenced Apr 23, 2026

feat(life_capture): chronicle dispatcher + parser (A3) #825

Closed

feat(life_capture, curated_memory): personal index + curated memory scratchpad (Plan #1 foundation) #817

Closed

jwalinsshah added 2 commits April 23, 2026 11:35

Merge remote-tracking branch 'upstream/main' into feat/a5-people

d93cbd4

# Conflicts:
#   src/openhuman/mod.rs

fix(clippy): remove PersonId::to_string shadow + derive Default on we…

02a9aba

…bview NotificationSettings

jwalin-shah marked this pull request as ready for review April 25, 2026 08:03

jwalin-shah requested a review from a team April 25, 2026 08:03

jwalinsshah added 2 commits May 3, 2026 23:50

merge upstream/main into feat/a5-people

6bbcdc3

merge upstream/main into feat/a5-people

45bcd9a

coderabbitai Bot requested changes May 4, 2026

jwalinsshah and others added 8 commits May 4, 2026 12:03

fix(agent-prompts): add curated_snapshot to PromptContext and test fi…

f01117b

…xtures Co-authored-by: Cursor <cursoragent@cursor.com>

style(rust): format curated_memory and eventkit modules

c0ba19d

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(prompts): borrow curated_snapshot from PromptContext

bd55b19

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(people): address registry and prompt review feedback

7170677

fix(people): expose people module

3a2511e

fix(people): apply rustfmt wrapping

08fafe7

This was referenced May 5, 2026

fix(installer): harden release asset resolution #1258

Merged

feat(people): add contact resolution and scoring #1259

Merged

feat(life_capture): add personal index and retrieval #1260

Closed

jwalin-shah closed this May 5, 2026
