Skip to content

feat(people): contact resolution + scoring + CNContactStore (A5)#821

Closed
jwalin-shah wants to merge 47 commits into
tinyhumansai:mainfrom
jwalin-shah:feat/a5-people
Closed

feat(people): contact resolution + scoring + CNContactStore (A5)#821
jwalin-shah wants to merge 47 commits into
tinyhumansai:mainfrom
jwalin-shah:feat/a5-people

Conversation

@jwalin-shah
Copy link
Copy Markdown
Contributor

@jwalin-shah jwalin-shah commented Apr 23, 2026

Summary

Adds the A5 piece of the v1 plan: the people module — handle resolution, scoring, and macOS Address Book seeding.

  • New src/openhuman/people/mod, types, resolver, scorer, store, address_book, rpc, schemas, migrations, tests
  • HandleResolver maps (iMessage handle | email | display name) → PersonId
  • Score = recency × frequency × reciprocity × depth (each clamped to [0,1], composite = product)
    • 14-day recency half-life
    • 30-day frequency window (rolling; not total-ever)
    • Reciprocity = min/max interaction ratio
  • Address Book seeded via objc2-contacts CNContactStore — triggers macOS Contacts TCC prompt, propagates PermissionDenied structurally through the RPC response (doesn't fail silently)
  • Non-macOS builds compile to a stub returning empty vec
  • Controllers exposed: people.list, people.resolve, people.score, people.refresh_address_book

Dependency

⚠️ Stacked on #817 (feat/life-capture-foundation). Diff includes A1 (be1112f9) on top of A5. Rebases off once #817 lands.

Shell counterpart: NSContactsUsageDescription string added to app/src-tauri/Info.plist on test/imessage-live-e2e (commit b9c4233a). Without it, CNContactStore returns empty rows silently under a signed build.

Test plan

  • Unit: resolver round-trip, scorer clamping + empty-interactions + one-sided-reciprocity, frequency-window correctness, idempotent seeding, permission-denied surfaces correctly
  • MockContactsSource covers happy / permission-denied / empty / display-name-only paths
  • Non-mac stub compiles and returns empty
  • cargo build --tests clean
  • cargo test --lib people — 36 passed

Known follow-ups (not blockers)

  • Alias rename accumulation — add_alias is INSERT OR IGNORE; Contact renames leave stale DisplayName aliases. Resolution stays deterministic; Person.handles accrues over time. Follow-up PR.

Summary by CodeRabbit

  • New Features

    • Added Life-Capture foundation: searchable personal index for ingested messages and calendar events with embeddings.
    • Added Curated Memory system: user-maintained MEMORY.md and USER.md files injected into conversation context.
    • Added People resolver: contact discovery and interaction-based scoring.
    • Integrated iMessage into life-capture ingestion pipeline.
  • Documentation

    • Added Life-Capture layer design spec and foundation milestone plan.
    • Added Track 1 ship pipeline plan for release workflow improvements.
  • Chores

    • Enhanced installer script with source-only mode and asset resolution.
    • Added test fixtures and end-to-end evaluation harness for retrieval.

Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/
iMessage into a local personal_index.db, with a Today brief and retrieval-
tuned chat as the two surfaces. Three composable privacy modes (Convenience /
Hybrid / Fully local) sharing one code path. Track 1 unblocks the ship
pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from
~/projects/inbox.
Two implementation plans drafted from the life-capture spec:

- Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs
  tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds.
- Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db,
  Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread
  stripping, hybrid retrieval (vector + keyword + recency), controller
  schema + RPC. End-to-end test with synthetic items. No ingestion or UI
  yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and
verify_asset_reachable. Adds scripts/test_install.sh that exercises
the resolver against a committed fixture latest.json. Failures now
report the resolved URL and the parse error instead of dying silently.
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.

Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
Adds src/openhuman/life_capture/ module tree (stubs for embedder,
index, migrations, quote_strip, redact, rpc, schemas, types, plus
tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs.

Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing
rusqlite/regex/once_cell/async-trait/tempfile already in tree
(Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex

Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
… table

Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every
new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite
Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets
WAL+foreign_keys, open_in_memory() is for tests.

vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
…_id) dedupe

Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS
stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has
no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in
spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>.

Exposes ensure_vec_extension_registered() crate-wide so the migrations test
can register vec0 before opening its in-memory connection.
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …),
  plus the user:local ACL EXISTS clause so the same query shape works for the
  multi-token team v2 ACL without rewrites.
- vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement)
  and ORDER BY distance; score = 1/(1 + distance) so callers can blend it
  with keyword bm25 on the same monotonic scale.

Shared ItemRow + into_hit() helper lets both queries reuse the same row
shape; rusqlite query_map closures hand-build it because rusqlite has no
sqlx FromRow equivalent.
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs,
normalises each independently to [0,1], then re-ranks with
0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30).

Documents present in only one leg get 0 for the missing signal but still
compete on the others; only items with neither signal are dropped. Same-vector
twin items now break their tie by recency.
…rch)

Adds two controllers exposed via the registry:
- life_capture.get_stats — total items, per-source counts, last-ingest ts.
- life_capture.search    — hybrid (vector + keyword + recency) search with
  embed-then-rank, optional k (default 10, capped at 100).

Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime
behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>)
and have no per-call context object. F14 will call runtime::init() at app
startup; until then handlers return a structured 'not initialised' error so
the failure mode is loud, not silent.

Schemas registered in core/all.rs alongside cron.
… → search)

FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector
so the same text round-trips identically through vec0 — keeps the e2e test
hermetic, no network call. Verifies that:
- quote-strip drops the 'On … wrote:' block before indexing,
- redact masks the email address before indexing,
- the upserted item resurfaces via hybrid_search with both transformations
  preserved on the returned text.
After Config::load_or_init succeeds, open the personal_index.db at
config.workspace_dir and register the life_capture runtime. Embedder is
env-gated:
  OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY
  OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1)
  OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small)

If no key is set we still open the index file (so Plan #2's ingestion
worker can write to it) but skip runtime registration — controllers then
return the 'not initialised' error from runtime::get() instead of silently
calling a misconfigured embedder.

Config-schema integration deferred to a follow-up; env-driven keeps F14
non-invasive while we land the rest of the foundation.
…ection

snapshot_pair(memory, user) returns a MemorySnapshot containing plain
strings — no reference back to the stores — so taking a snapshot at session
start and reusing it across turns gives a stable system-prompt prefix and
lets the LLM prefix-cache hit on every subsequent turn.

Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader,
but openhuman assembles agent prompts per-agent under
src/openhuman/agent/agents/*/prompt.rs (not the plan's stale
src-tauri/src/commands/chat.rs path); the prompt-builder integration is
deferred to Plan #2 alongside the agent-context refactor.
… controllers

Adds four controllers under the memory_curated namespace (the 'memory'
namespace is already owned by the long-term memory subsystem):
- memory_curated.read    — read MEMORY.md or USER.md
- memory_curated.add     — append entry, char-bounded
- memory_curated.replace — substring replace, char-bounded
- memory_curated.remove  — drop entries containing a needle

'file' input is a typed enum (memory|user) so adapters reject anything
else at validation time. Runtime state lives in curated_memory::runtime
behind a tokio OnceCell, mirroring life_capture's pattern: startup in
jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes'
char limits (memory: 2200, user: 1375) and registers the runtime.
Handlers return 'not initialised' until init runs, so failure is loud.
…ings

Findings from Codex + Gemini second-opinion review:

1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On
   (source, external_id) conflict the row's id was kept but the caller's
   fresh UUID was used for upsert_vector — the vector wrote under an id no
   row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same
   transaction; mutates caller's Item.id to the canonical id (so the next
   upsert_vector lands on the right row) and orphan-deletes any vector
   already written under the wrong id. Signature change: upsert(&[Item])
   -> upsert(&mut [Item]).

2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a
   failed INSERT permanently lost the item's vector. Now wrapped in tx.

3. Runtime over-gated get_stats on the embedder (Codex). Split runtime
   into separate INDEX + EMBEDDER OnceCells: get_stats only requires
   index, search requires both. Index initialises whenever the workspace
   dir is reachable; embedder is env-gated as before.

4. Startup race (both reviewers): runtime init lived inside the post-
   serve tokio::spawn block, so axum::serve was already accepting before
   the OnceCells were set. Hoisted both bootstraps (curated_memory and
   life_capture) into helpers called inline before axum::serve.

5. runtime::get error message lied (Codex) — said 'set embeddings.api_key
   in config' but startup actually reads env vars. Fixed text.

Bonus: rename controller namespace memory_curated -> curated_memory
(Codex preference; nothing depends on it yet so renamed before clients
do).

Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable).
4696 lib tests pass.
…emory namespace rename

Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical:
- curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`,
  which could rewrite MEMORY.md / USER.md from an empty baseline on transient
  I/O or permission failures. Reject empty needle in replace/remove.
- curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC
  boundary (remove with "" deletes every entry).
- life_capture/index.rs: upsert_vector rejects orphan ids (items row missing),
  which would have inserted vectors that never join in vector_search.
- life_capture/embedder.rs: validate response length matches input, indices
  are contiguous 0..n, and every vector matches dim() — prevents silent
  misalignment or wrong-dimensional vectors flowing into the 1536-wide
  sqlite-vec table. Added 30s request + 10s connect timeouts.

Retrieval hardening:
- life_capture/index.rs: keyword_search now escapes FTS5 operators by
  tokenizing on whitespace and wrapping each token as a quoted literal.
  Prevents errors or unintended matches from stray quotes, AND/OR/NEAR,
  column specifiers in user input.
- life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until
  via post-filter (single consistent pass across keyword+vector fusion).
- life_capture/rpc.rs: handle_search validates embedder dim and response
  length against the fixed 1536-wide index column — clear RPC error instead
  of a cryptic sqlite-vec failure when the embeddings model is swapped.

Docs:
- docs/event-bus.md: remove erroneous `.await` on register_native_global
  (it is sync; the handler closure is async).
- life-capture design spec: soften PII redaction claim — only regex is
  implemented today; light NER is flagged as a future enhancement.
Rendered prompts now surface runtime curated-memory writes:

- `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`.
- `UserFilesSection` prefers the snapshot over the workspace-file
  loader when one is attached, and injects `USER.md` alongside
  `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper.
- `Session` carries `Option<Arc<MemorySnapshot>>`, populated by
  `ensure_curated_snapshot` on the first turn from
  `curated_memory::runtime::get()`. Reused across turns so prompt
  bytes stay frozen (KV-cache prefix contract) while mid-session
  `curated_memory.add/replace/remove` writes land on the NEXT session.
- `ParentExecutionContext` inherits the snapshot so sub-agents render
  identical `MEMORY.md`/`USER.md` blocks as the parent.
- Legacy workspace-file fallback preserved for embeds that don't
  initialise the curated-memory runtime (pure unit tests).
Loads a 32-item corpus across gmail/calendar/imessage/slack into an
in-memory PersonalIndex and runs 12 queries through keyword / vector /
hybrid paths, asserting must_contain / must_not_contain within per-query
top-K prefixes.

The vector leg uses a deterministic FNV-1a feature-hashing embedder
(1536 dims, L2-normalized) so the test is offline and reproducible; real
embedder swaps behind one call. Fixture reserves `relevant` and
`pending` fields so recall@k / MRR / nDCG bolt onto the same JSON later.

q-src-01 is marked pending: hybrid_search currently ignores
Query.sources/since/until. Flip to false once filtering lands.
Previously every read grabbed the same `Arc<Mutex<Connection>>` as the
writer, so `IndexReader` calls serialised behind in-flight ingests even
though WAL would allow parallel readers. Split `PersonalIndex` into:

- `writer: Arc<Mutex<Connection>>` — unchanged single-writer model.
- `reader_pool: Option<Arc<r2d2::Pool<SqliteConnectionManager>>>` —
  four-connection read pool on file-backed indexes, built alongside the
  writer after migrations run. Each pooled connection gets
  `query_only=ON` as a belt-and-suspenders guard; sqlite-vec is loaded
  automatically via the process-wide auto-extension.

`IndexReader` routes `keyword_search` / `vector_search` through a new
`with_read_conn` helper that picks the pool when present and falls
back to the writer lock otherwise. In-memory handles keep the
single-connection layout — shared-cache URIs buy nothing at
test-fixture scale and would force every test to grow a unique name.
- Add `life_capture.ingest` RPC controller (idempotent upsert by
  `(source, external_id)`, embeds text + atomic vector replace).
- Switch iMessage scanner's `ingest_group()` to POST
  `openhuman.life_capture_ingest` instead of `openhuman.memory_doc_ingest`,
  so chat-day transcripts land in the PersonalIndex (the same store
  `life_capture.search` reads from).
- Fix `life_capture.search` envelope: wrap hits in `{"hits": [...]}` per
  the schema (previously returned a bare array — obs 3165).
- New `#[ignore]`d e2e at `tests/imessage_ingest_e2e.rs` proves the bridge:
  ingest → search → re-ingest is idempotent (`total_items` unchanged,
  `replaced=true`).
New self-contained module `src/openhuman/people/` for A5 of the v1 plan.
Exposes people.list / people.resolve / people.score via the controller
registry; no branches added in core/cli.rs or core/jsonrpc.rs.

Files added
-----------
  src/openhuman/people/mod.rs
  src/openhuman/people/types.rs         PersonId, Handle{IMessage,Email,
                                        DisplayName}, Person, Interaction,
                                        ScoreComponents, AddressBookContact
  src/openhuman/people/resolver.rs      HandleResolver: resolve / resolve_or_create
                                        / link — deterministic across case +
                                        whitespace via Handle::canonicalize()
  src/openhuman/people/scorer.rs        score(interactions, now) → ScoreComponents
  src/openhuman/people/store.rs         SQLite store (in-memory or file-backed);
                                        plus process-global OnceCell handle for
                                        the schema adapters
  src/openhuman/people/migrations.rs    idempotent migrator (life_capture style)
  src/openhuman/people/migrations/0001_init.sql
                                        people, handle_aliases, interactions
  src/openhuman/people/address_book.rs  macOS Address Book read (see below)
  src/openhuman/people/rpc.rs           typed domain handlers
  src/openhuman/people/schemas.rs       ControllerSchema + adapters
  src/openhuman/people/tests.rs         cross-file integration tests

Wired into src/core/all.rs: registered_controllers, declared schemas, and
namespace_description("people"). Added `pub mod people;` in
src/openhuman/mod.rs. Nothing else touched.

Scoring
-------
Each component clamped to [0,1]; composite = product (also clamped).
  recency     exp(-ln2 * age_days / 14)            half-life 14 days
  frequency   count / 50, saturated                 cap 50 interactions
  reciprocity min(out, in) / max(out, in)           0 when one-sided
  depth       mean(length) / 500, saturated         cap 500 chars
Constants are module-level (RECENCY_HALF_LIFE_DAYS / FREQUENCY_CAP /
DEPTH_CAP_CHARS) so the first pass can be tuned from config later without
breaking the API.

Address Book read path
----------------------
Read the macOS Address Book directly from its SQLite stores at
`~/Library/Application Support/AddressBook/Sources/*/AddressBook-v22.abcddb`
via the existing `rusqlite` dep (opened with
SQLITE_OPEN_READ_ONLY | SQLITE_OPEN_NO_MUTEX). Returns `Vec<AddressBookContact>`
with display_name + emails + phones. A missing AddressBook dir or per-source
db is treated as zero contacts, not an error.

Pragmatic divergence from the plan: the original A5 spec said "objc2". objc2
is not currently in Cargo.toml; adding it plus Contacts.framework linkage is
a heavier dep surface than the SQLite read, which gives the same data, zero
new deps, and is trivially testable. If a future caller needs live change
notifications (KVO) we can revisit.

On non-mac, `address_book::read()` is a stub returning `Ok(vec![])`.

Tests (25 passing, cargo test people --lib)
------------------------------------------
  types::*                 — Handle canonicalization, key mapping
  migrations::*            — tables created, idempotent
  store::*                 — insert / lookup / list / interaction round-trip
  resolver::*              — deterministic across case/whitespace; same email
                             + different display_name → same PersonId;
                             distinct handles → distinct ids
  scorer::*                — empty=zero; half-life; unit-interval clamp;
                             one-sided → 0 reciprocity; deterministic
  schemas::*               — three schemas + three handlers; required fields
  rpc::*                   — list orders by score desc; resolve(create=false)
                             returns null for unknown
  address_book::*          — non-mac returns empty; mac read never panics
  tests::*                 — resolver + store cooperate across handle kinds;
                             PersonId string round-trip

Scope
-----
Self-contained. No life_capture / chronicle / nudges / UI integration —
those land in A6+ / A9 per the v1 plan.
…A5 review blockers)

- address_book.rs: replace direct rusqlite/SQLite read with CNContactStore
  (objc2-contacts 0.3.2). Properly triggers TCC Contacts permission prompt
  under signed/sandboxed builds. Returns structured AddressBookError with
  PermissionDenied variant so callers can surface actionable messages.
  Error downgrades promoted from debug! to warn!. Adds ContactsSource trait
  as a mockable seam; SystemContactsSource wraps real FFI; non-mac stub
  unchanged (returns empty vec).

- resolver.rs: add seed_from_address_book(source: &dyn ContactsSource) which
  wires the address book read into the people store. Iterates contacts, picks
  primary handle (email > phone > display name), calls resolve_or_create, then
  links aliases. Idempotent. Returns (seeded, skipped) counts and propagates
  AddressBookError. Tests cover: canned list, permission denied, idempotency,
  display-name-only, no-field skipped.

- rpc.rs: add handle_refresh_address_book exposing the seeding path over RPC.
  Returns { seeded, skipped, permission_denied } — never errors on permission
  denied (surfaces as a flag so the UI can prompt the user).

- schemas.rs: register people.refresh_address_book controller (4th controller).
  Update stale test counts (3 → 4).

- scorer.rs: replace total-ever FREQUENCY_CAP with a rolling window approach.
  Adds FREQUENCY_WINDOW_DAYS = 30 constant. frequency = interactions within
  last 30 days / FREQUENCY_CAP. Old-burst contacts no longer get inflated
  scores. New tests: old_burst_does_not_inflate_frequency_score,
  interactions_exactly_at_window_boundary_are_included.

- Cargo.toml: add objc2 = "0.6", objc2-foundation (NSArray/NSError/NSString/
  NSPredicate features), objc2-contacts = "0.3.2" (CNContact/CNContactFetchRequest/
  CNContactStore/CNLabeledValue/CNPhoneNumber), block2 = "0.6" under
  [target.'cfg(target_os = "macos")'.dependencies], matching A7 versions.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@jwalin-shah has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 30 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd43731a-2574-4a06-8b77-349f25d2ef52

📥 Commits

Reviewing files that changed from the base of the PR and between 45bcd9a and 08fafe7.

📒 Files selected for processing (36)
  • src/core/all.rs
  • src/openhuman/agent/agents/archivist/prompt.rs
  • src/openhuman/agent/agents/code_executor/prompt.rs
  • src/openhuman/agent/agents/critic/prompt.rs
  • src/openhuman/agent/agents/help/prompt.rs
  • src/openhuman/agent/agents/integrations_agent/prompt.rs
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/morning_briefing/prompt.rs
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/agent/agents/planner/prompt.rs
  • src/openhuman/agent/agents/researcher/prompt.rs
  • src/openhuman/agent/agents/summarizer/prompt.rs
  • src/openhuman/agent/agents/tool_maker/prompt.rs
  • src/openhuman/agent/agents/tools_agent/prompt.rs
  • src/openhuman/agent/agents/trigger_reactor/prompt.rs
  • src/openhuman/agent/agents/trigger_triage/prompt.rs
  • src/openhuman/agent/agents/welcome/prompt.rs
  • src/openhuman/agent/debug/mod.rs
  • src/openhuman/agent/harness/fork_context.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/prompts/mod_tests.rs
  • src/openhuman/agent/prompts/types.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/curated_memory/mod.rs
  • src/openhuman/learning/prompt_sections.rs
  • src/openhuman/mod.rs
  • src/openhuman/people/rpc.rs
  • src/openhuman/people/store.rs
  • tests/agent_harness_public.rs
  • tests/calendar_grounding_e2e.rs
📝 Walkthrough

Walkthrough

This PR implements foundational infrastructure for life-capture (a personal index with SQLite+sqlite-vec storage, hybrid search, and local embedding support), curated-memory (session-scoped file-backed stores injected into agent prompts), and a people module (contact resolution and interaction-based scoring). It updates the iMessage scanner to use the new life_capture_ingest endpoint, hardens the installer with asset verification, bootstraps both systems at server startup, and adds comprehensive planning documentation.

Changes

Foundation Milestone: Life Capture, Curated Memory, People, and Agent Integration

Layer / File(s) Summary
Data Shape & Domain Types
src/openhuman/life_capture/types.rs, src/openhuman/curated_memory/types.rs, src/openhuman/people/types.rs
Introduces core types for life capture (Item, Source, Query, Hit, IndexStats), curated memory (MemoryFile, MemorySnapshot), and people management (PersonId, Handle, Person, Interaction, ScoreComponents).
SQLite Schema & Migrations
src/openhuman/life_capture/migrations/0001_init.sql, src/openhuman/life_capture/migrations/0002_vec.sql, src/openhuman/life_capture/migrations.rs, src/openhuman/people/migrations/0001_init.sql, src/openhuman/people/migrations.rs
Defines SQLite schemas for life capture (items table with FTS5 sync triggers, item_vectors with vec0), people (people, handle_aliases, interactions tables with indexes), and migration runners for both.
Index Implementation
src/openhuman/life_capture/index.rs, src/openhuman/life_capture/embedder.rs
Implements PersonalIndex with reader-pool pattern for concurrent access, IndexWriter for upsert/deduplication by (source, external_id), IndexReader for keyword/vector/hybrid search with recency scoring, and HostedEmbedder for OpenAI-compatible embedding requests.
Data Processing & Utilities
src/openhuman/life_capture/redact.rs, src/openhuman/life_capture/quote_strip.rs
Adds PII redaction (email, SSN, CC, phone regex replacement) and quoted-reply stripping for cleaned text storage.
People Store & Resolution
src/openhuman/people/store.rs, src/openhuman/people/resolver.rs, src/openhuman/people/scorer.rs, src/openhuman/people/address_book.rs
Implements file-backed PeopleStore with person/handle/interaction CRUD, HandleResolver for deterministic contact deduplication and address-book seeding, deterministic interaction-based scoring (recency/frequency/reciprocity/depth), and macOS AddressBook reader via objc2.
Curated Memory Store
src/openhuman/curated_memory/store.rs
Implements MemoryStore for file-backed memory (MEMORY.md/USER.md) with append/replace/remove operations, atomic writes, and character-limit enforcement.
RPC & Controller Handler Layer
src/openhuman/life_capture/rpc.rs, src/openhuman/life_capture/schemas.rs, src/openhuman/curated_memory/rpc.rs, src/openhuman/curated_memory/schemas.rs, src/openhuman/people/rpc.rs, src/openhuman/people/schemas.rs
Implements async JSON-RPC domain handlers for life capture (get_stats, search, ingest), curated memory (read, add, replace, remove), and people (list, resolve, refresh_address_book, score); registers controller schemas and wires handlers via all_controller_schemas()/all_registered_controllers().
Runtime & Bootstrap
src/openhuman/life_capture/runtime.rs, src/openhuman/curated_memory/runtime.rs, src/core/jsonrpc.rs
Adds process-global runtime singletons (OnceCell) for PersonalIndex, Embedder, and CuratedMemoryRuntime; implements bootstrap_curated_memory and bootstrap_life_capture in server startup that initialize stores, apply migrations, configure embedder from environment, and log initialization state.
Agent Session Integration
src/openhuman/agent/harness/session/types.rs, src/openhuman/agent/harness/session/builder.rs, src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/session/runtime.rs, src/openhuman/agent/prompts/mod.rs
Adds Agent::curated_snapshot field to cache session-scoped memory snapshot, initializes on first turn via ensure_curated_snapshot(), and injects MEMORY.md/USER.md content into system prompt via UserFilesSection.
Module Exports & Wiring
src/openhuman/mod.rs, src/openhuman/life_capture/mod.rs, src/openhuman/curated_memory/mod.rs, src/openhuman/people/mod.rs, src/core/all.rs
Declares and exports new submodules for life_capture, curated_memory, people; re-exports aggregated schemas with namespaced names; adds people namespace to core controller registry.
iMessage Scanner Integration
app/src-tauri/src/imessage_scanner/mod.rs
Updates ingest_group to POST iMessage transcripts to openhuman.life_capture_ingest instead of memory_doc_ingest, includes computed ts from day string and restructured metadata.
Dependencies
Cargo.toml
Adds SQLite libraries (r2d2, r2d2_sqlite, sqlite-vec), macOS Contacts bindings (objc2, objc2-foundation, objc2-contacts, block2), and test HTTP mocking (httpmock).
Integration & E2E Tests
tests/imessage_ingest_e2e.rs, tests/life_capture_retrieval_eval.rs, src/openhuman/life_capture/tests/e2e.rs, tests/fixtures/life_capture/corpus.json, src/openhuman/people/tests.rs
Adds e2e test for iMessage ingest idempotency, retrieval evaluation harness with deterministic embeddings and fixture-based assertions, in-memory index redaction+quote-strip+search test, and integration tests validating resolver, schema, and UUID round-trip behavior.
Planning & Design Documentation
docs/superpowers/plans/2026-04-21-imessage-live-harness.md, docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md, docs/superpowers/specs/2026-04-22-life-capture-layer-design.md, docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
Comprehensive milestone plan for life-capture foundation (schema, embedder, index, RPC, agent integration), design spec for the overall life-capture layer (two demo surfaces, privacy modes, UI behavior, Track 1/2 roadmap), and Track 1 ship pipeline plan (Ubuntu smoke test fix, PR sequencing, Tauri auto-updater wiring).

Installer Hardening and CI Preparation

Layer / File(s) Summary
Installer Script Refactoring
scripts/install.sh
Adds --source-only mode for sourcing without execution, introduces resolve_asset_url() to extract platform-specific URLs from latest.json with structured error codes, adds verify_asset_reachable() for HTTP HEAD validation with exponential backoff, and improves resolve_from_latest_json with richer logging for missing platforms.
Installer Testing
scripts/test_install.sh
New smoke-test script that sources install.sh in --source-only mode, tests resolve_asset_url against fixtures, validates correct URL resolution, and asserts missing-platform graceful failure.
Test Fixtures
scripts/fixtures/latest.json
JSON fixture providing version metadata and platform-specific download URLs (linux-x86_64 AppImage, darwin-aarch64 DMG) with empty signature fields.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/iMessage<br/>Scanner
    participant LifeCapture as life_capture RPC<br/>(ingest/search)
    participant Index as PersonalIndex<br/>(writer/reader)
    participant Embedder as Embedder<br/>(OpenAI)
    participant SQLite as SQLite<br/>personal_index.db

    Client->>LifeCapture: POST ingest(source,<br/>external_id, text, ts)
    LifeCapture->>LifeCapture: Redact PII,<br/>strip quoted reply
    LifeCapture->>Index: upsert Item<br/>via IndexWriter
    Index->>SQLite: SELECT existing by<br/>(source, external_id)
    SQLite-->>Index: canonical UUID
    Index->>SQLite: INSERT/UPDATE items<br/>row
    LifeCapture->>Embedder: embed_batch([text])
    Embedder-->>LifeCapture: Vec<Vec<f32>>
    LifeCapture->>Index: upsert_vector<br/>(item_id, vector)
    Index->>SQLite: INSERT item_vectors
    LifeCapture-->>Client: RpcOutcome<br/>{item_id, replaced}
    
    Client->>LifeCapture: POST search(text, k)
    LifeCapture->>Embedder: embed_batch([text])
    Embedder-->>LifeCapture: query_vector
    LifeCapture->>Index: hybrid_search<br/>(query_vector, Query)
    Index->>SQLite: keyword_search (FTS5)
    Index->>SQLite: vector_search (vec0)
    Index->>Index: RRF fusion +<br/>recency scoring
    Index-->>LifeCapture: Vec<Hit>
    LifeCapture-->>Client: RpcOutcome<br/>{hits: [...]}
Loading
sequenceDiagram
    participant Agent as Agent<br/>Session
    participant CuratedMem as curated_memory<br/>runtime
    participant MemStore as MemoryStore<br/>(file-backed)
    participant Prompt as Prompt<br/>Builder

    Agent->>Agent: new Agent()<br/>curated_snapshot=None
    Agent->>Agent: turn() first turn
    Agent->>Agent: ensure_curated_snapshot()
    Agent->>CuratedMem: get() runtime
    CuratedMem->>MemStore: read MEMORY.md
    MemStore-->>CuratedMem: String
    CuratedMem->>MemStore: read USER.md
    MemStore-->>CuratedMem: String
    CuratedMem-->>Agent: MemorySnapshot<br/>{memory, user}
    Agent->>Agent: store as Arc
    Agent->>Prompt: build system prompt
    Prompt->>Prompt: UserFilesSection
    Prompt->>Agent: curated_snapshot()?
    Agent-->>Prompt: Some(snapshot)
    Prompt->>Prompt: inject snapshot<br/>MEMORY.md + USER.md
    Prompt-->>Agent: enriched prompt
    Agent-->>Agent: send to LLM
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

The PR introduces three major new modules (life_capture, curated_memory, people) with substantial implementation density: ~2,500 lines of schema/migration definitions, ~1,500 lines of index implementation with complex reader logic (FTS5/vec0/hybrid search/RRF fusion), ~1,000+ lines of store implementations, extensive RPC/handler wiring across all three modules, server bootstrap logic, agent session modifications, and comprehensive test coverage with fixtures and e2e tests. The variety of concerns (SQLite schema/migrations, Rust async/sync patterns, macOS FFI for address book, embeddings API client, text processing, scoring algorithms) demands separate mental models for each section. The changes are heterogeneous across file types and domains, requiring reviewers to context-switch between database design, Rust concurrency patterns, API contract definitions, and integration semantics.

Possibly related PRs

  • tinyhumansai/openhuman#509: Both PRs modify src/core/jsonrpc.rs to add startup-time bootstrap for background initialization—this PR adds bootstrap_curated_memory and bootstrap_life_capture while #509 starts composio periodic sync.
  • tinyhumansai/openhuman#1027: Both PRs modify the agent session lifecycle (src/openhuman/agent/harness/session/{builder.rs,turn.rs,types.rs}) to cache session-scoped context—this PR adds curated_snapshot initialization while #1027 adds tree-prefetch state.
  • tinyhumansai/openhuman#524: Both PRs expand the controller namespace registry in src/core/all.rs—this PR adds the "people" namespace entry alongside namespace descriptions.

🐰 Hops excitedly

A brand new life-capture spine is here!
With memories to store and embeddings so clear,
People to resolve, scores to compute,
Curated snapshots in prompts to boot!
The index is blazing, full-text and vec,
This foundation's so ready—what's next on the deck? ✨

@jwalin-shah jwalin-shah marked this pull request as ready for review April 25, 2026 08:03
@jwalin-shah jwalin-shah requested a review from a team April 25, 2026 08:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/core/all.rs (1)

88-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

People RPC controllers not wired into the global registry.

The people namespace is declared in namespace_description() (line 324) with description "Contact resolution and recency × frequency × reciprocity × depth scoring," but build_registered_controllers() lacks a call to controllers.extend(crate::openhuman::people::all_people_registered_controllers()). Without this wiring, the four RPC endpoints (people.list, people.resolve, people.score, people.refresh_address_book) remain unreachable via try_invoke_registered_rpc.

The all_people_registered_controllers function exists in src/openhuman/people/mod.rs, so add the wiring to build_registered_controllers() following the established pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/all.rs` around lines 88 - 190, build_registered_controllers() is
missing the people controllers so the RPCs declared in namespace_description()
(people.*) are not reachable; add a call to extend the global registry with
crate::openhuman::people::all_people_registered_controllers() (following the
same pattern as other modules) so try_invoke_registered_rpc can find
people.list/resolve/score/refresh_address_book; locate the function
build_registered_controllers and insert
controllers.extend(crate::openhuman::people::all_people_registered_controllers())
in the appropriate alphabetical/section spot.
🧹 Nitpick comments (4)
scripts/test_install.sh (1)

23-27: 💤 Low value

Consider verifying the specific exit code for missing platform.

The test correctly expects a non-zero exit, but resolve_asset_url returns exit code 3 specifically for missing platforms. Verifying this would catch regressions where the function might fail for a different reason.

Proposed enhancement
 # Also test a missing platform produces exit code 3
-if resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null; then
-  echo "FAIL: expected non-zero exit for missing platform linux-aarch64"
+rc=0
+resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null || rc=$?
+if [[ "$rc" -ne 3 ]]; then
+  echo "FAIL: expected exit code 3 for missing platform linux-aarch64, got $rc"
   exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_install.sh` around lines 23 - 27, The test currently only
asserts that resolve_asset_url "$FIXTURE" "linux" "aarch64" exits non-zero;
change it to assert the specific exit code 3 returned for missing platforms by
capturing the command's exit status (run resolve_asset_url with the same args,
save $? into a variable) and then check that the variable equals 3, failing the
test if it does not — reference the resolve_asset_url call and the "$FIXTURE"
variable to locate and update the assertion logic.
scripts/install.sh (1)

203-217: 💤 Low value

verify_asset_reachable is defined but not called in the install flow.

The function is well-implemented with exponential backoff, but it's not wired into resolve_from_latest_json or the main install path. The planning doc (Task A3 Step 3) shows intent to call it after resolving the URL.

If this is intentional (deferred to a follow-up), consider adding a TODO comment. Otherwise, wire it in before the download step around line 415.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/install.sh` around lines 203 - 217, The verify_asset_reachable
function is implemented but never invoked; call
verify_asset_reachable("$asset_url") immediately after resolve_from_latest_json
returns the asset URL and before the actual download step, check its return code
and abort with a clear error/log if it fails (propagate the non-zero exit or
exit 4), and if you intentionally want to defer this, add a TODO comment in the
main install flow mentioning resolve_from_latest_json and verify_asset_reachable
so it’s not lost.
src/openhuman/life_capture/quote_strip.rs (1)

4-7: ⚡ Quick win

ON_DATE_WROTE misses multi-line attribution headers common in mobile clients.

.{1,200} with no (?s) flag cannot cross a newline, so Gmail's typical two-line attribution:

On Mon, Apr 21, 2026 at 9:14 AM, Sarah Smith
<sarah@example.com> wrote:

is never matched. The > filter below still strips the quoted lines, but the attribution header leaks into the returned content.

♻️ Proposed fix — use DOTALL with a non-greedy quantifier
-static ON_DATE_WROTE: Lazy<Regex> = Lazy::new(|| {
-    // "On <date>, <name> <email|>(?) wrote:" — match start of any line.
-    Regex::new(r"(?m)^On .{1,200}\bwrote:\s*$").unwrap()
-});
+static ON_DATE_WROTE: Lazy<Regex> = Lazy::new(|| {
+    // "On <date>, <name> <email|>(?) wrote:" — allow header to span 1-3 lines.
+    // (?s) makes `.` cross newlines; non-greedy `.{1,300}?` prevents over-capture.
+    Regex::new(r"(?ms)^On .{1,300}?\bwrote:\s*$").unwrap()
+});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/life_capture/quote_strip.rs` around lines 4 - 7, The
ON_DATE_WROTE regex doesn't match multi-line attribution headers because DOT (.)
doesn't cross newlines; update the pattern used in ON_DATE_WROTE to enable
DOTALL and make the quantifier non-greedy so it can span at most ~200 chars
across lines (e.g., add the inline (?s) or combined (?ms) flag and use .{1,200}?
) — modify the Regex::new call that constructs ON_DATE_WROTE to use the
DOTALL-enabled, non-greedy form (refer to the ON_DATE_WROTE static and its
Regex::new invocation) so multi-line attribution blocks like Gmail's two-line
header are matched and stripped.
src/openhuman/people/rpc.rs (1)

16-54: 🏗️ Heavy lift

N+1 query pattern: one interactions_for round-trip per person

handle_list loads the full person list and then issues a separate interactions_for query for every person in the result set. With the 500-person cap that is up to 501 sequential DB round-trips before any sorting occurs. Adding a batch_interactions_for(person_ids) method to PeopleStore (a single SELECT … WHERE person_id IN (…) keyed by PersonId) would collapse this to two queries.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/people/rpc.rs` around lines 16 - 54, handle_list currently
calls PeopleStore::interactions_for(p.id) inside the loop, causing N+1 DB
round-trips; add a new PeopleStore::batch_interactions_for(person_ids:
&[PersonId]) -> Result<HashMap<PersonId, Vec<Interaction>>, _> (or similar) that
does one SELECT WHERE person_id IN (...), call it once before the for p in
people loop with the collected person IDs, replace per-person interactions_for
calls with lookups into the returned map, and preserve existing error mapping
(map_err(...)) and use the same score(&interactions, now) path so scoring and
JSON construction (in handle_list) remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 908-912: The doc comment "Initializes long-lived socket/event-bus
infrastructure." was accidentally moved to bootstrap_curated_memory; move that
doc line (and any other lines belonging to the original comment block) back
above bootstrap_skill_runtime so bootstrap_skill_runtime regains its intended
documentation, and replace or remove the misplaced comment on
bootstrap_curated_memory so its own doc comment correctly describes
curated-memory initialization; locate the functions by their names
bootstrap_curated_memory and bootstrap_skill_runtime to make the change.

In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 337-339: PromptContext is missing the curated_snapshot field used
in mod.rs; add pub curated_snapshot: Option<&'a
crate::openhuman::curated_memory::MemorySnapshot> to the PromptContext<'a>
struct in prompts/types.rs, then update the functions that construct
PromptContext—empty_prompt_context_for_static_sections() and
build_system_prompt() in turn.rs—to accept/propagate the curated snapshot (set
to Some(...) when available or None otherwise) so ctx.curated_snapshot is
populated before code calls inject_snapshot_content; reference PromptContext,
empty_prompt_context_for_static_sections, build_system_prompt, and the existing
Agent implementation in session/types.rs for the intended field shape.

In `@src/openhuman/curated_memory/rpc.rs`:
- Around line 19-72: Change the four handler signatures (handle_read,
handle_add, handle_replace, handle_remove) to return RpcOutcome<Value> instead
of Result<RpcOutcome<Value>, String>, and convert all early Err(...) returns
into RpcOutcome::error(...) returns; specifically, replace uses of the `?`
operator on pick(&file, rt) and map_err(...) on
store.read()/add()/replace()/remove() with explicit error handling that returns
RpcOutcome::error(format!(...)) (include the same error text like "read: {e}",
"add: {e}", etc.), and for validation failures (e.g., empty needle) return
RpcOutcome::error("needle must not be empty".into()); keep successful paths
returning RpcOutcome::new(...) as before.

In `@src/openhuman/curated_memory/store.rs`:
- Around line 88-95: snapshot_pair currently calls memory_store.read().await and
user_store.read().await separately, which can interleave with concurrent writes
and produce a mixed snapshot; fix by acquiring exclusive locks on both
MemoryStore instances before reading (use the stores' write().await or other
exclusive lock API) and always acquire them in a deterministic order to avoid
deadlocks (e.g., compare the pointer/address or some stable id of memory_store
vs user_store and lock the lower one first, then the other), then read the
in-memory contents into the MemorySnapshot and release both locks.

In `@src/openhuman/life_capture/embedder.rs`:
- Around line 19-33: The current HostedEmbedder::new uses .unwrap_or_else(|_|
reqwest::Client::new()) which silently drops configured timeouts if the builder
fails; change this to call .expect(...) with a clear message (e.g., "failed to
build reqwest client with timeouts") so the program fails loud on
misconfiguration instead of returning a timeout-less client; update the builder
call in HostedEmbedder::new (the reqwest::Client::builder()...build()
invocation) to use .expect and include a descriptive message.
- Around line 97-99: The dim() method is hardcoded to 1536 and must reflect the
model used by HostedEmbedder; update the HostedEmbedder struct to store an
embedding dimension (add a field like embedding_dim set during new(base_url,
api_key, model)), compute or map the correct dimension for the chosen model when
constructing HostedEmbedder, change fn dim(&self) -> usize to return
self.embedding_dim, and ensure embed_batch and any callers use this stored value
so models like text-embedding-3-large (3072) work correctly.

In `@src/openhuman/life_capture/index.rs`:
- Around line 545-547: The hybrid_search currently truncates candidates by using
oversample before apply_query_filters, causing filtered-out top results to leave
too few matches; update the logic around oversample/q.k and the calls to
keyword_search and vector_search so that filtering runs before final truncation:
either have keyword_search/vector_search return an untruncated candidate pool
(or accept a much larger dynamic oversample) and then call
apply_query_filters(query_vector, q, ...) to filter, and only after filtering
take the top q.k results to return; implement a loop or adaptive oversample
(e.g., double oversample up to a max) if necessary to ensure enough filtered
candidates, and apply the same change to the other similar block referenced
(around the vc/kw usage at the later 581-582 region) so both legs are filtered
prior to truncation.

In `@src/openhuman/life_capture/rpc.rs`:
- Around line 154-170: The code currently calls writer.upsert(&mut items) before
calling embedder.embed_batch(...) so a failed embed leaves the stored row with
an out-of-date vector; change the flow so embedding is performed and a valid
vector obtained before mutating storage: call embedder.embed_batch(...) and
obtain `vector` for the input text (using the same `text`, `items`, and
`canonical_id`/`requested_id` logic) first, then call writer.upsert(&mut items)
followed immediately by writer.upsert_vector(&canonical_id, &vector) (or perform
both upserts in a single atomic operation/transaction if supported) to ensure
the row is only updated when embedding has succeeded. Ensure references to
`IndexWriter::new`, `writer.upsert`, `embedder.embed_batch`,
`writer.upsert_vector`, `items`, `canonical_id`, and `requested_id` are updated
accordingly.

In `@src/openhuman/life_capture/schemas.rs`:
- Around line 188-193: The FieldSchema for "metadata" currently declares
TypeSchema::Option(Box::new(TypeSchema::String)) but the handler treats it as a
raw JSON object and defaults missing input to {} — change the schema to an
optional JSON/object type (e.g. TypeSchema::Option(Box::new(TypeSchema::Object))
or the project’s equivalent JSON/map type) so it matches the handler, and/or
update the handler to either accept a string and parse it into JSON or stop
defaulting missing values to {} (use None) so wire format and stored value
agree; apply the same fix to the other occurrence of "metadata" mentioned (lines
254-257) ensuring both FieldSchema(name: "metadata", ...) and the handler use
the same JSON object representation.

In `@src/openhuman/people/resolver.rs`:
- Around line 28-33: The resolve and link functions skip canonicalization and
should canonicalize handles/aliases before using the store: in resolve, call
handle.canonicalize() (or otherwise obtain the canonical Handle) and pass that
to store.lookup instead of the raw handle; in link, canonicalize the other alias
(e.g., call other.canonicalize()) before calling store.add_alias(pid, ...) so
aliases are stored in canonical form; ensure this mirrors resolve_or_create's
use of handle.canonicalize() so lookups and alias additions are consistent and
avoid duplicate PersonIds.
- Around line 40-74: The current resolve_or_create implementation races: it
calls store.lookup(&canonical) then creates a new PersonId with PersonId::new
and calls store.insert_person(&person, &[canonical]) which can produce duplicate
people under concurrent requests; fix by making the create-or-get atomic —
either implement an upsert at the store layer (e.g., an INSERT ... ON CONFLICT
DO UPDATE/RETURNING that returns the existing PersonId) and expose a
store.upsert_person_or_get_id(...) used by resolve_or_create, or after
insert_person call re-query store.lookup(&canonical) and return that id
(discarding the locally generated id) to ensure the returned id is the canonical
stored id; modify resolve_or_create, lookup, and insert_person/upsert methods
accordingly so the function always returns the database-authoritative PersonId.

In `@src/openhuman/people/rpc.rs`:
- Around line 120-143: handle_score currently treats a missing person the same
as a person with zero interactions because interactions_for returns an empty
Vec; update handle_score to first verify the person exists (e.g. call a
PeopleStore method like get_person, find_person, or person_exists on the
provided person_id) and return an Err (with a clear message including the
person_id) when the person is not found; only call interactions_for and compute
the score when the person lookup succeeds, preserving the existing RpcOutcome
construction for valid persons.
- Around line 68-79: The "created" flag is incorrect because resolve_or_create
returns Some for both existing and newly-created persons; update the logic in
the RPC handler (the block using resolver.resolve_or_create, resolver.resolve
and building RpcOutcome) to first check existence (e.g., call
resolver.resolve(&handle).await? or resolver.exists(&handle).await?) and only
call resolver.resolve_or_create(&handle).await? when absent — then set "created"
to true only when you actually inserted a new record (i.e., when the pre-check
returned None/false and resolve_or_create returned Some), otherwise set
"created" to false while still returning the existing person_id.

In `@src/openhuman/people/schemas.rs`:
- Around line 60-104: The output schemas for the people RPCs are missing the
"components" object and "handles" array that the RPC responses actually return;
update the FieldSchema entries inside the outputs vector (the TypeSchema::Array
of TypeSchema::Object) to include a "components" FieldSchema (TypeSchema::Object
with fields "recency", "frequency", "reciprocity", "depth" as TypeSchema::F64)
and a "handles" FieldSchema (TypeSchema::Array of TypeSchema::Object describing
per-handle fields returned by handle_list), matching required flags and comments
to the actual responses used by handle_score and handle_list so schema
introspection matches rpc.rs behavior.

In `@tests/imessage_ingest_e2e.rs`:
- Around line 9-13: The tests share a process-global SQLite DB so the assertion
that total_items == 1 in ingest_then_search_then_reingest_is_idempotent (and the
similar assertions around lines 63-76 and 160-166) is flaky under parallel test
execution; either enforce serial runs in the test invocation or, preferably,
make the assertions relative/robust by checking the expected delta or using >=1
(e.g. assert total_items >= previous_count + 1 or compare counts before/after to
assert an increment) and update the assertions in
search_response_shape_matches_schema and
ingest_then_search_then_reingest_is_idempotent accordingly so they no longer
assume an absolute count of 1.

---

Outside diff comments:
In `@src/core/all.rs`:
- Around line 88-190: build_registered_controllers() is missing the people
controllers so the RPCs declared in namespace_description() (people.*) are not
reachable; add a call to extend the global registry with
crate::openhuman::people::all_people_registered_controllers() (following the
same pattern as other modules) so try_invoke_registered_rpc can find
people.list/resolve/score/refresh_address_book; locate the function
build_registered_controllers and insert
controllers.extend(crate::openhuman::people::all_people_registered_controllers())
in the appropriate alphabetical/section spot.

---

Nitpick comments:
In `@scripts/install.sh`:
- Around line 203-217: The verify_asset_reachable function is implemented but
never invoked; call verify_asset_reachable("$asset_url") immediately after
resolve_from_latest_json returns the asset URL and before the actual download
step, check its return code and abort with a clear error/log if it fails
(propagate the non-zero exit or exit 4), and if you intentionally want to defer
this, add a TODO comment in the main install flow mentioning
resolve_from_latest_json and verify_asset_reachable so it’s not lost.

In `@scripts/test_install.sh`:
- Around line 23-27: The test currently only asserts that resolve_asset_url
"$FIXTURE" "linux" "aarch64" exits non-zero; change it to assert the specific
exit code 3 returned for missing platforms by capturing the command's exit
status (run resolve_asset_url with the same args, save $? into a variable) and
then check that the variable equals 3, failing the test if it does not —
reference the resolve_asset_url call and the "$FIXTURE" variable to locate and
update the assertion logic.

In `@src/openhuman/life_capture/quote_strip.rs`:
- Around line 4-7: The ON_DATE_WROTE regex doesn't match multi-line attribution
headers because DOT (.) doesn't cross newlines; update the pattern used in
ON_DATE_WROTE to enable DOTALL and make the quantifier non-greedy so it can span
at most ~200 chars across lines (e.g., add the inline (?s) or combined (?ms)
flag and use .{1,200}? ) — modify the Regex::new call that constructs
ON_DATE_WROTE to use the DOTALL-enabled, non-greedy form (refer to the
ON_DATE_WROTE static and its Regex::new invocation) so multi-line attribution
blocks like Gmail's two-line header are matched and stripped.

In `@src/openhuman/people/rpc.rs`:
- Around line 16-54: handle_list currently calls
PeopleStore::interactions_for(p.id) inside the loop, causing N+1 DB round-trips;
add a new PeopleStore::batch_interactions_for(person_ids: &[PersonId]) ->
Result<HashMap<PersonId, Vec<Interaction>>, _> (or similar) that does one SELECT
WHERE person_id IN (...), call it once before the for p in people loop with the
collected person IDs, replace per-person interactions_for calls with lookups
into the returned map, and preserve existing error mapping (map_err(...)) and
use the same score(&interactions, now) path so scoring and JSON construction (in
handle_list) remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d116a681-d12d-464b-9acb-52bdb04c8137

📥 Commits

Reviewing files that changed from the base of the PR and between ea97e64 and 45bcd9a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (52)
  • Cargo.toml
  • app/src-tauri/src/imessage_scanner/mod.rs
  • docs/superpowers/plans/2026-04-21-imessage-live-harness.md
  • docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md
  • docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
  • docs/superpowers/specs/2026-04-22-life-capture-layer-design.md
  • scripts/fixtures/latest.json
  • scripts/install.sh
  • scripts/test_install.sh
  • src/core/all.rs
  • src/core/jsonrpc.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/curated_memory/mod.rs
  • src/openhuman/curated_memory/rpc.rs
  • src/openhuman/curated_memory/runtime.rs
  • src/openhuman/curated_memory/schemas.rs
  • src/openhuman/curated_memory/store.rs
  • src/openhuman/curated_memory/types.rs
  • src/openhuman/life_capture/embedder.rs
  • src/openhuman/life_capture/index.rs
  • src/openhuman/life_capture/migrations.rs
  • src/openhuman/life_capture/migrations/0001_init.sql
  • src/openhuman/life_capture/migrations/0002_vec.sql
  • src/openhuman/life_capture/mod.rs
  • src/openhuman/life_capture/quote_strip.rs
  • src/openhuman/life_capture/redact.rs
  • src/openhuman/life_capture/rpc.rs
  • src/openhuman/life_capture/runtime.rs
  • src/openhuman/life_capture/schemas.rs
  • src/openhuman/life_capture/tests/e2e.rs
  • src/openhuman/life_capture/tests/mod.rs
  • src/openhuman/life_capture/types.rs
  • src/openhuman/mod.rs
  • src/openhuman/people/address_book.rs
  • src/openhuman/people/migrations.rs
  • src/openhuman/people/migrations/0001_init.sql
  • src/openhuman/people/mod.rs
  • src/openhuman/people/resolver.rs
  • src/openhuman/people/rpc.rs
  • src/openhuman/people/schemas.rs
  • src/openhuman/people/scorer.rs
  • src/openhuman/people/store.rs
  • src/openhuman/people/tests.rs
  • src/openhuman/people/types.rs
  • src/openhuman/webview_notifications/types.rs
  • tests/fixtures/life_capture/corpus.json
  • tests/imessage_ingest_e2e.rs
  • tests/life_capture_retrieval_eval.rs

Comment thread src/core/jsonrpc.rs
Comment on lines 908 to +912
/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
use std::sync::Arc;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale doc comment misattributed to bootstrap_curated_memory.

The existing line 908 — /// Initializes long-lived socket/event-bus infrastructure. — was originally the doc comment for bootstrap_skill_runtime. Inserting bootstrap_curated_memory immediately after it caused that line to become the first line of the new function's doc comment, which is incorrect. bootstrap_skill_runtime (line 1003) is now doc-comment-free as a side effect.

📝 Suggested fix
-/// Initializes long-lived socket/event-bus infrastructure.
-/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
-/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
+/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
+/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
 async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {

And restore the comment above bootstrap_skill_runtime:

+/// Initializes long-lived socket/event-bus infrastructure.
 pub async fn bootstrap_skill_runtime() {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/jsonrpc.rs` around lines 908 - 912, The doc comment "Initializes
long-lived socket/event-bus infrastructure." was accidentally moved to
bootstrap_curated_memory; move that doc line (and any other lines belonging to
the original comment block) back above bootstrap_skill_runtime so
bootstrap_skill_runtime regains its intended documentation, and replace or
remove the misplaced comment on bootstrap_curated_memory so its own doc comment
correctly describes curated-memory initialization; locate the functions by their
names bootstrap_curated_memory and bootstrap_skill_runtime to make the change.

Comment thread src/openhuman/agent/prompts/mod.rs Outdated
Comment on lines +19 to +72
pub async fn handle_read(
rt: &CuratedMemoryRuntime,
file: String,
) -> Result<RpcOutcome<Value>, String> {
let store = pick(&file, rt)?;
let body = store.read().await.map_err(|e| format!("read: {e}"))?;
Ok(RpcOutcome::new(
json!({ "file": file, "body": body }),
vec![],
))
}

pub async fn handle_add(
rt: &CuratedMemoryRuntime,
file: String,
entry: String,
) -> Result<RpcOutcome<Value>, String> {
let store = pick(&file, rt)?;
store.add(&entry).await.map_err(|e| format!("add: {e}"))?;
Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
}

pub async fn handle_replace(
rt: &CuratedMemoryRuntime,
file: String,
needle: String,
replacement: String,
) -> Result<RpcOutcome<Value>, String> {
if needle.is_empty() {
return Err("needle must not be empty".into());
}
let store = pick(&file, rt)?;
store
.replace(&needle, &replacement)
.await
.map_err(|e| format!("replace: {e}"))?;
Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
}

pub async fn handle_remove(
rt: &CuratedMemoryRuntime,
file: String,
needle: String,
) -> Result<RpcOutcome<Value>, String> {
if needle.is_empty() {
return Err("needle must not be empty".into());
}
let store = pick(&file, rt)?;
store
.remove(&needle)
.await
.map_err(|e| format!("remove: {e}"))?;
Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Handler return types violate the RpcOutcome<T> guideline.

All four handlers return Result<RpcOutcome<Value>, String> instead of RpcOutcome<Value>. The Err(String) path is being handled by the adapter in schemas.rs at the call site, but the handler signatures themselves do not comply.

Per coding guidelines: "Use RpcOutcome<T> for RPC controller return types per the AGENTS.md specification" (src/openhuman/*/rpc.rs).

♻️ Proposed fix — representative handler, apply same pattern to all four
-pub async fn handle_read(
+pub async fn handle_read(
     rt: &CuratedMemoryRuntime,
     file: String,
-) -> Result<RpcOutcome<Value>, String> {
-    let store = pick(&file, rt)?;
-    let body = store.read().await.map_err(|e| format!("read: {e}"))?;
-    Ok(RpcOutcome::new(
-        json!({ "file": file, "body": body }),
-        vec![],
-    ))
+) -> RpcOutcome<Value> {
+    let store = match pick(&file, rt) {
+        Ok(s) => s,
+        Err(e) => return RpcOutcome::error(e),
+    };
+    match store.read().await {
+        Ok(body) => RpcOutcome::new(json!({ "file": file, "body": body }), vec![]),
+        Err(e) => RpcOutcome::error(format!("read: {e}")),
+    }
 }

Apply the same transformation to handle_add, handle_replace, and handle_remove, converting each Err(...) return to RpcOutcome::error(...).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/curated_memory/rpc.rs` around lines 19 - 72, Change the four
handler signatures (handle_read, handle_add, handle_replace, handle_remove) to
return RpcOutcome<Value> instead of Result<RpcOutcome<Value>, String>, and
convert all early Err(...) returns into RpcOutcome::error(...) returns;
specifically, replace uses of the `?` operator on pick(&file, rt) and
map_err(...) on store.read()/add()/replace()/remove() with explicit error
handling that returns RpcOutcome::error(format!(...)) (include the same error
text like "read: {e}", "add: {e}", etc.), and for validation failures (e.g.,
empty needle) return RpcOutcome::error("needle must not be empty".into()); keep
successful paths returning RpcOutcome::new(...) as before.

Comment on lines +88 to +95
pub async fn snapshot_pair(
memory_store: &MemoryStore,
user_store: &MemoryStore,
) -> std::io::Result<crate::openhuman::curated_memory::types::MemorySnapshot> {
Ok(crate::openhuman::curated_memory::types::MemorySnapshot {
memory: memory_store.read().await?,
user: user_store.read().await?,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

snapshot_pair is not actually point-in-time.

It reads the two files independently without holding either store's write lock, so a concurrent add/replace/remove can land between those awaits and produce a mixed snapshot from two different moments.

Suggested fix
 pub async fn snapshot_pair(
     memory_store: &MemoryStore,
     user_store: &MemoryStore,
 ) -> std::io::Result<crate::openhuman::curated_memory::types::MemorySnapshot> {
+    let _memory_guard = memory_store.write_lock.lock().await;
+    let _user_guard = user_store.write_lock.lock().await;
+
     Ok(crate::openhuman::curated_memory::types::MemorySnapshot {
-        memory: memory_store.read().await?,
-        user: user_store.read().await?,
+        memory: tokio::fs::read_to_string(&memory_store.file_path).await?,
+        user: tokio::fs::read_to_string(&user_store.file_path).await?,
     })
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/curated_memory/store.rs` around lines 88 - 95, snapshot_pair
currently calls memory_store.read().await and user_store.read().await
separately, which can interleave with concurrent writes and produce a mixed
snapshot; fix by acquiring exclusive locks on both MemoryStore instances before
reading (use the stores' write().await or other exclusive lock API) and always
acquire them in a deterministic order to avoid deadlocks (e.g., compare the
pointer/address or some stable id of memory_store vs user_store and lock the
lower one first, then the other), then read the in-memory contents into the
MemorySnapshot and release both locks.

Comment on lines +19 to +33
impl HostedEmbedder {
pub fn new(base_url: String, api_key: String, model: String) -> Self {
let http = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.connect_timeout(std::time::Duration::from_secs(10))
.build()
.unwrap_or_else(|_| reqwest::Client::new());
Self {
base_url,
api_key,
model,
http,
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

unwrap_or_else(|_| reqwest::Client::new()) silently drops the configured timeouts.

If the builder fails (rare, but possible on TLS init issues), the fallback client has no request or connect timeout, meaning embed_batch can hang indefinitely. Since a builder failure here would be a fatal misconfiguration, .expect is more appropriate.

🛡️ Proposed fix
-        .build()
-        .unwrap_or_else(|_| reqwest::Client::new());
+        .build()
+        .expect("failed to build HostedEmbedder HTTP client");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
impl HostedEmbedder {
pub fn new(base_url: String, api_key: String, model: String) -> Self {
let http = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.connect_timeout(std::time::Duration::from_secs(10))
.build()
.unwrap_or_else(|_| reqwest::Client::new());
Self {
base_url,
api_key,
model,
http,
}
}
}
impl HostedEmbedder {
pub fn new(base_url: String, api_key: String, model: String) -> Self {
let http = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.connect_timeout(std::time::Duration::from_secs(10))
.build()
.expect("failed to build HostedEmbedder HTTP client");
Self {
base_url,
api_key,
model,
http,
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/life_capture/embedder.rs` around lines 19 - 33, The current
HostedEmbedder::new uses .unwrap_or_else(|_| reqwest::Client::new()) which
silently drops configured timeouts if the builder fails; change this to call
.expect(...) with a clear message (e.g., "failed to build reqwest client with
timeouts") so the program fails loud on misconfiguration instead of returning a
timeout-less client; update the builder call in HostedEmbedder::new (the
reqwest::Client::builder()...build() invocation) to use .expect and include a
descriptive message.

Comment on lines +40 to +74
if let Some(id) = self
.store
.lookup(&canonical)
.await
.map_err(|e| format!("lookup: {e}"))?
{
return Ok(id);
}
let id = PersonId::new();
let (display_name, primary_email, primary_phone) = match &canonical {
Handle::DisplayName(s) => (Some(s.clone()), None, None),
Handle::Email(s) => (None, Some(s.clone()), None),
Handle::IMessage(s) => {
if s.contains('@') {
(None, Some(s.clone()), None)
} else {
(None, None, Some(s.clone()))
}
}
};
let now = Utc::now();
let person = Person {
id,
display_name,
primary_email,
primary_phone,
handles: vec![canonical.clone()],
created_at: now,
updated_at: now,
};
self.store
.insert_person(&person, &[canonical])
.await
.map_err(|e| format!("insert_person: {e}"))?;
Ok(id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether PeopleStore::insert_person uses an atomic upsert /
# ON CONFLICT IGNORE on the handle table, which would make the TOCTOU benign.
rg -n "insert_person\|INSERT\|UNIQUE\|ON CONFLICT" --type=rust -A3 -B3 \
    src/openhuman/people/store.rs

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Find store-related files in the people subdirectory
fd -t f "store" src/openhuman/people/

Repository: tinyhumansai/openhuman

Length of output: 96


🏁 Script executed:

# Also check what files exist in the people directory
ls -la src/openhuman/people/

Repository: tinyhumansai/openhuman

Length of output: 859


🏁 Script executed:

wc -l src/openhuman/people/store.rs

Repository: tinyhumansai/openhuman

Length of output: 100


🏁 Script executed:

cat -n src/openhuman/people/store.rs

Repository: tinyhumansai/openhuman

Length of output: 17945


🏁 Script executed:

head -100 src/openhuman/people/migrations.rs

Repository: tinyhumansai/openhuman

Length of output: 2718


🏁 Script executed:

cat src/openhuman/people/migrations/0001_init.sql

Repository: tinyhumansai/openhuman

Length of output: 1608


Non-atomic check-then-insert: concurrent calls can mint duplicate PersonIds

The lookup on lines 40–47 and the insert_person on lines 70–73 are separate async operations. Two concurrent resolve_or_create calls for the same canonical handle can both observe None from lookup and then race to insert. Both will insert distinct Person records into the database; the INSERT OR IGNORE on the handle_aliases table prevents a constraint violation there, but the people table rows are already created with different PersonIds, breaking the invariant that the same handle always returns the same id.

The fix requires either an atomic upsert (e.g., INSERT...ON CONFLICT DO UPDATE RETURNING) or a re-select of the canonical handle after insertion to return the actual inserted PersonId rather than the locally generated one.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/people/resolver.rs` around lines 40 - 74, The current
resolve_or_create implementation races: it calls store.lookup(&canonical) then
creates a new PersonId with PersonId::new and calls store.insert_person(&person,
&[canonical]) which can produce duplicate people under concurrent requests; fix
by making the create-or-get atomic — either implement an upsert at the store
layer (e.g., an INSERT ... ON CONFLICT DO UPDATE/RETURNING that returns the
existing PersonId) and expose a store.upsert_person_or_get_id(...) used by
resolve_or_create, or after insert_person call re-query store.lookup(&canonical)
and return that id (discarding the locally generated id) to ensure the returned
id is the canonical stored id; modify resolve_or_create, lookup, and
insert_person/upsert methods accordingly so the function always returns the
database-authoritative PersonId.

Comment thread src/openhuman/people/rpc.rs Outdated
Comment thread src/openhuman/people/rpc.rs
Comment on lines +60 to +104
outputs: vec![FieldSchema {
name: "people",
ty: TypeSchema::Array(Box::new(TypeSchema::Object {
fields: vec![
FieldSchema {
name: "person_id",
ty: TypeSchema::String,
comment: "Stable UUID for this person.",
required: true,
},
FieldSchema {
name: "display_name",
ty: TypeSchema::Option(Box::new(TypeSchema::String)),
comment: "Best-known display name, when set.",
required: false,
},
FieldSchema {
name: "primary_email",
ty: TypeSchema::Option(Box::new(TypeSchema::String)),
comment: "Primary email, when set.",
required: false,
},
FieldSchema {
name: "primary_phone",
ty: TypeSchema::Option(Box::new(TypeSchema::String)),
comment: "Primary phone, when set.",
required: false,
},
FieldSchema {
name: "score",
ty: TypeSchema::F64,
comment: "Composite person-score in [0,1].",
required: true,
},
FieldSchema {
name: "interaction_count",
ty: TypeSchema::U64,
comment: "Observed interactions contributing to the score.",
required: true,
},
],
})),
comment: "Ranked people, highest score first.",
required: true,
}],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

components (and handles) are absent from both output schemas but present in the RPC responses

handle_score returns "components": { "recency", "frequency", "reciprocity", "depth" } (rpc.rs lines 133–138), but the score output schema only declares person_id, score, and interaction_count.

handle_list additionally returns a "handles" array per person (rpc.rs lines 28–35) and per-person "components" (rpc.rs lines 44–49); neither appears in the list output schema.

These omissions make the schemas inaccurate as documentation and would cause any code-generation or LLM tool-use path relying on schema introspection to ignore the most useful fields.

Also applies to: 158-177

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/people/schemas.rs` around lines 60 - 104, The output schemas
for the people RPCs are missing the "components" object and "handles" array that
the RPC responses actually return; update the FieldSchema entries inside the
outputs vector (the TypeSchema::Array of TypeSchema::Object) to include a
"components" FieldSchema (TypeSchema::Object with fields "recency", "frequency",
"reciprocity", "depth" as TypeSchema::F64) and a "handles" FieldSchema
(TypeSchema::Array of TypeSchema::Object describing per-handle fields returned
by handle_list), matching required flags and comments to the actual responses
used by handle_score and handle_list so schema introspection matches rpc.rs
behavior.

Comment on lines +9 to +13
//! same process-global `life_capture::runtime` OnceCells (index + embedder).
//! That's by design: the OnceCells model the production startup contract, and
//! the tests run serially so a single shared init is correct.
//!
//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parallel execution makes the total_items == 1 assertion non-deterministic.

The file comment says "the tests run serially", but the run command cargo test --test imessage_ingest_e2e -- --ignored --nocapture uses the default multi-thread runner. Both tests share the same process-global SQLite DB (via the OnceCell / leaked tempdir). If search_response_shape_matches_schema wins the race and seeds "shape-guard:2026-04-22" before ingest_then_search_then_reingest_is_idempotent hits the stats assertion, total_items will be 2 and the test fails.

Two equivalent fixes:

Option A — enforce serial execution in the run instructions:

-//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture`
+//! Run with: `cargo test --test imessage_ingest_e2e -- --ignored --nocapture --test-threads=1`

Option B — make the assertion relative (robust to parallel runs and more future-proof):

📝 Relative-count assertion
+    // Capture the baseline item count before ingesting anything.
+    let env_stats0 = post_rpc(&base, "openhuman.life_capture_get_stats", json!({})).await;
+    let stats0_total = rpc_result_body(&env_stats0)["total_items"]
+        .as_u64()
+        .expect("total_items u64");
+
     // First ingest — should INSERT.
     // ... (existing ingest call) ...

     // Stats — should show exactly one new item.
     let env_stats1 = post_rpc(&base, "openhuman.life_capture_get_stats", json!({})).await;
     let stats1 = rpc_result_body(&env_stats1);
     assert_eq!(
         stats1["total_items"],
-        json!(1),
-        "expected total_items=1 after first ingest, got {stats1}"
+        json!(stats0_total + 1),
+        "expected exactly one new item after first ingest, got {stats1}"
     );

Also applies to: 63-76, 160-166

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/imessage_ingest_e2e.rs` around lines 9 - 13, The tests share a
process-global SQLite DB so the assertion that total_items == 1 in
ingest_then_search_then_reingest_is_idempotent (and the similar assertions
around lines 63-76 and 160-166) is flaky under parallel test execution; either
enforce serial runs in the test invocation or, preferably, make the assertions
relative/robust by checking the expected delta or using >=1 (e.g. assert
total_items >= previous_count + 1 or compare counts before/after to assert an
increment) and update the assertions in search_response_shape_matches_schema and
ingest_then_search_then_reingest_is_idempotent accordingly so they no longer
assume an absolute count of 1.

jwalinsshah and others added 8 commits May 4, 2026 12:03
…xtures

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…uplicates

- Introduce CuratedMemoryPromptSnapshot in prompt types so CI builds without
  the curated_memory domain still resolve PromptContext.
- Use that type for ParentExecutionContext, Agent session storage, and
  PromptContext; map from curated_memory::MemorySnapshot when taking snapshots.
- Remove duplicate curated_snapshot fields from bad merges (struct literals
  and PromptContext construction).
- Fix learning tests PromptContext initializer.

Co-authored-by: Cursor <cursoragent@cursor.com>
CI failed with E0063 after ParentExecutionContext gained curated_snapshot;
triage escalation, session turn builder, subagent tests, and integration
stubs were still missing the field.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jwalin-shah
Copy link
Copy Markdown
Contributor Author

Closing as superseded. The contacts/person resolution work we still need is split into #1259. The remaining memory/life-capture/curated-memory portions are intentionally not being carried forward here because that work is covered separately.

@jwalin-shah jwalin-shah closed this May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants