Skip to content

fix(orchestrator): prefer live integrations over memory_tree for inbox/doc queries#1731

Merged
senamakel merged 12 commits into
tinyhumansai:mainfrom
senamakel:fix/orchestrator-prefer-live-integrations
May 14, 2026
Merged

fix(orchestrator): prefer live integrations over memory_tree for inbox/doc queries#1731
senamakel merged 12 commits into
tinyhumansai:mainfrom
senamakel:fix/orchestrator-prefer-live-integrations

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented May 14, 2026

Summary

  • Orchestrator was answering "any new emails?" / "summarise the notion doc I edited" from the retrospective memory_tree index instead of calling the live integration. Users had to explicitly say "search my gmail" to force the right behaviour.
  • Reworked the orchestrator prompt to disambiguate delegate_to_integrations_agent (live SaaS) from memory_tree (historical ingest), with an explicit decision-tree step that catches inbox/doc/calendar/chat phrasings before memory tools are considered.
  • No tool-surface changes — prompt-only fix, plus a test update for the new decision-tree wording.

Problem

Users complained they had to spell out "search my gmail" or "check my inbox" to get the orchestrator to actually hit the Gmail integration. Otherwise it would happily answer from memory_tree — which is a stale, retrospective summary of already-ingested content, not a live view of the inbox.

Root cause in the prompt:

  1. The "Delegation Decision Tree" listed memory_* under "direct tools" in Step 2, before the "specialised execution" step that mentions delegate_to_integrations_agent. Combined with the bias toward direct tools, the model preferred memory.
  2. The "Memory tree retrieval" section described memory_tree as a tool to "query the user's ingested email/chat/document history" with a mode literally documented as "Use for 'in my email last week…' intents" — exactly the phrasing users were tripping on.
  3. The Connected Integrations and Response Style example sections never gave a concrete live-integration delegation pattern to imitate.

Solution

src/openhuman/agent/agents/orchestrator/prompt.md:

  • Inserted a new Step 2 in the decision tree that explicitly routes any request naming or implying a connected external service (email/inbox/gmail/calendar/notion/drive/slack/whatsapp/telegram/linear/etc.) to delegate_to_integrations_agent before memory or other direct tools are considered. Explicit guidance: "Do this even if memory_tree could plausibly answer."
  • If the relevant toolkit is not in Connected Integrations, the orchestrator must point the user at Settings → Connections → [Service] rather than silently falling back to memory_tree.
  • Reframed the Memory tree retrieval section as historical-only. It is now described as a retrospective index over already-ingested content, not a live API. Listed the phrasings that genuinely warrant it ("what did we discuss last month", "summarise my recent activity"). Removed misleading language about "in my email last week…".
  • Added a concrete Gmail example to the Response Style section so the model has a delegation pattern to imitate, and annotated the existing Notion example as a live delegation.

src/openhuman/agent/agents/orchestrator/prompt.rs:

  • Updated build_includes_direct_first_decision_tree to assert the new decision-tree wording (live-integration step + the "do this even if memory_tree could plausibly answer" guidance) so a regression that removed Step 2 would fail the build.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • N/A: prompt-only change with no executable diff coverage signal; the existing orchestrator::prompt::tests suite was extended to assert the new decision-tree wording.
  • N/A: behaviour-only change — no new/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md.
  • N/A: no matrix feature IDs involved.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: no release-cut surfaces touched (orchestrator prompt only).
  • N/A: no linked issue — user-reported behaviour, fixed directly.

Impact

  • Runtime: desktop only (orchestrator agent prompt). No platform-specific code paths touched.
  • Behaviour: users asking about inbox / docs / calendar / chat state in natural language ("any new emails from alice today?", "summarise the latest notion doc") will now get a live integration call instead of a stale memory summary.
  • Compatibility: no schema or tool-surface change. memory_tree still available for retrospective queries.
  • Risk: prompt regressions are caught by the updated unit tests in prompt.rs. Worst-case behaviour mismatch is recoverable via a follow-up prompt tweak — no migrations or persisted state involved.

Related

  • Closes:
  • Follow-up PR(s)/TODOs:

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/orchestrator-prefer-live-integrations
  • Commit SHA: 74c2b0d

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no app/ files touched.
  • pnpm typecheck — N/A: no TS files touched.
  • Focused tests: cargo test --manifest-path Cargo.toml --lib orchestrator::prompt — 7 passed, 0 failed.
  • Rust fmt/check (if changed): orchestrator prompt module compiles via the focused test run above.
  • Tauri fmt/check (if changed): N/A — no app/src-tauri changes.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: orchestrator routes live inbox/doc/calendar/chat requests to delegate_to_integrations_agent instead of memory_tree.
  • User-visible effect: "any new emails?" hits Gmail; "what did we discuss last month" still hits memory.

Parity Contract

  • Legacy behavior preserved: memory_tree still callable for retrospective queries; all existing modes (search_entities, query_topic, query_source, query_global, drill_down, fetch_leaves) unchanged.
  • Guard/fallback/dispatch parity checks: orchestrator tool registry (agent.toml) untouched; agents::loader invariant "orchestrator must have memory_tree" still holds.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution: N/A

Summary by CodeRabbit

  • New Features

    • Improved orchestrator agent routing to prioritize live external services and explicit integration delegation; examples and guidance updated.
  • Chores

    • Bumped core dependency versions and adjusted a dependency feature for compatibility.
    • Mock API delay handling now enforces a configurable maximum cap to prevent excessive simulated delays.

Review Change Stack

rafaelfiguereod-stack and others added 11 commits May 10, 2026 14:39
…dates

Bumps the npm_and_yarn group with 1 update in the / directory: [basic-ftp](https://github.com/patrickjuchli/basic-ftp).


Updates `basic-ftp` from 5.3.0 to 5.3.1
- [Release notes](https://github.com/patrickjuchli/basic-ftp/releases)
- [Changelog](https://github.com/patrickjuchli/basic-ftp/blob/master/CHANGELOG.md)
- [Commits](patrickjuchli/basic-ftp@v5.3.0...v5.3.1)

Updates `fast-xml-builder` from 1.1.5 to 1.2.0
- [Changelog](https://github.com/NaturalIntelligence/fast-xml-builder/blob/main/CHANGELOG.md)
- [Commits](NaturalIntelligence/fast-xml-builder@v1.1.5...v1.2.0)

Updates `ip-address` from 10.1.0 to 10.2.0
- [Commits](https://github.com/beaugunderson/ip-address/commits)

---
updated-dependencies:
- dependency-name: basic-ftp
  dependency-version: 5.3.1
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: fast-xml-builder
  dependency-version: 1.2.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: ip-address
  dependency-version: 10.2.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the cargo group with 2 updates in the / directory: [openssl](https://github.com/rust-openssl/rust-openssl) and [rustls-webpki](https://github.com/rustls/webpki).


Updates `openssl` from 0.10.77 to 0.10.79
- [Release notes](https://github.com/rust-openssl/rust-openssl/releases)
- [Commits](rust-openssl/rust-openssl@openssl-v0.10.77...openssl-v0.10.79)

Updates `rustls-webpki` from 0.103.12 to 0.103.13
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](rustls/webpki@v/0.103.12...v/0.103.13)

---
updated-dependencies:
- dependency-name: openssl
  dependency-version: 0.10.79
  dependency-type: indirect
  dependency-group: cargo
- dependency-name: rustls-webpki
  dependency-version: 0.103.13
  dependency-type: indirect
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…tofix-5

Potential fix for code scanning alert no. 5: Resource exhaustion
…arn/npm_and_yarn-9401a92e25

build(deps): bump the npm_and_yarn group across 1 directory with 3 updates
…go-83c3bdb6f7

build(deps): bump the cargo group across 1 directory with 2 updates
…ck delay cap

Addresses CI failures and reviewer feedback on PR tinyhumansai#1462:

- fix(rand): update rand API calls to 0.10 compat
  - src/core/auth.rs: RngCore::fill_bytes → RngExt::fill
  - src/openhuman/security/pairing.rs: same
  - src/openhuman/memory/tree/tree_source/registry.rs: thread_rng().gen() → rand::random()
  - src/openhuman/tools/impl/computer/human_path.rs: Rng → RngExt bound, fix float type ambiguity
- fix(sentry): remove "test" feature from production sentry dep (Cargo.toml:113)
- fix(mock-api): add MAX_MOCK_DELAY_MS=30_000 cap in getDelayMs() (scripts/mock-api/state.mjs)
- fix(mock-api): add re-export shim at scripts/mock-api-core.mjs for backward compat
- chore: resolve merge conflicts in Cargo.lock, pnpm-lock.yaml, app/src-tauri/Cargo.lock
The upstream repo has CodeQL default setup enabled, which rejects SARIF
uploads from advanced configurations with: "CodeQL analyses from advanced
configurations cannot be processed when the default setup is enabled".

Removing the workflow restores green CI; default setup already provides
the equivalent scanning at the repo level.
Per CodeRabbit suggestion + CLAUDE.md "Debug logging" policy: emit
entry/exit `log::trace!` markers around the token generation flow with
a stable `[auth]` prefix. No secret material is logged — only counts.
…x/doc queries

Users were having to explicitly tell the model to "search my gmail" because
the orchestrator was answering inbox/doc/calendar queries from the
retrospective memory_tree index instead of delegating to the live
integration. The prompt had two overlapping sections (Connected
Integrations + Memory tree retrieval) and the decision tree didn't
disambiguate them.

- Insert a new Step 2 in the decision tree that routes any request
  naming or implying a connected service (inbox/gmail/calendar/notion/
  drive/slack/etc.) to delegate_to_integrations_agent BEFORE memory or
  other direct tools are considered. Explicit "do this even if memory_tree
  could plausibly answer."
- Reframe the memory_tree section as historical-only: it's a retrospective
  index over ingested content, not a live API. Listed the phrasings that
  actually warrant it ("what did we discuss last month", "summarise recent
  activity").
- Added a gmail example to Response Style so the model has a concrete
  delegate_to_integrations_agent pattern to imitate, and noted on the
  existing notion example that it's a live delegation.
- Updated the decision-tree test in prompt.rs to assert the new wording.
@senamakel senamakel requested a review from a team May 14, 2026 12:01
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 840ab314-ab83-44f0-b3c6-5890cd09e8a7

📥 Commits

Reviewing files that changed from the base of the PR and between 74c2b0d and 6a27575.

📒 Files selected for processing (1)
  • src/openhuman/agent/agents/orchestrator/prompt.md

📝 Walkthrough

Walkthrough

Upgrade rand to 0.10 and modernize RNG APIs; add a 30s cap to mock API delays; update orchestrator prompt and tests to delegate live external-service requests to the integrations agent instead of memory_tree.

Changes

Rand 0.10 Upgrade & RNG API Modernization

Layer / File(s) Summary
Dependency version and feature updates
Cargo.toml
rand bumped from 0.9 to 0.10; removed test feature from main sentry dependency.
Security token generation modernization
src/core/auth.rs, src/openhuman/security/pairing.rs
Token generators switch from RngCore::fill_bytes() to RngExt::fill(); auth.rs adds trace-level logging around token generation.
RNG trait updates in utility and sampling modules
src/openhuman/memory/tree/tree_source/registry.rs, src/openhuman/tools/impl/computer/human_path.rs
new_summary_id uses rand::random(); human_path(), dwell_ms(), and sample_normal() now require RngExt instead of Rng.

Mock API Delay Clamping

Layer / File(s) Summary
Delay maximum and validation logic
scripts/mock-api/state.mjs
Added MAX_MOCK_DELAY_MS = 30_000; getDelayMs() validates numeric input and clamps positive finite delays to the maximum.

Orchestrator Agent Delegation Logic

Layer / File(s) Summary
Delegation decision tree and memory tree clarification
src/openhuman/agent/agents/orchestrator/prompt.md
Decision tree now detects service-implied requests, looks up connected toolkits, and delegates to delegate_to_integrations_agent; memory_tree limited to retrospective/ingested data and not used as live source-of-truth.
Prompt test assertion updates
src/openhuman/agent/agents/orchestrator/prompt.rs
Tests updated to match revised decision-tree wording and to assert routing of connected external-service requests to delegate_to_integrations_agent.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Orchestrator
  participant ConnectedIntegrations as IntegrationsAgent
  participant MemoryTree

  User->>Orchestrator: request implying live external service (e.g., "check my Gmail")
  Orchestrator->>Orchestrator: detect service-implied request
  Orchestrator->>IntegrationsAgent: lookup toolkit & delegate_to_integrations_agent
  IntegrationsAgent-->>Orchestrator: toolkit response / action result
  Orchestrator-->>User: return live-service result
  Note over Orchestrator,MemoryTree: MemoryTree only used for retrospective/ingested context or if live toolkit is missing (explicitly notify user)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I nibble bytes and seeds anew,

rand refreshed, the traits accrue.
Delays trimmed short — thirty grand,
Agents now send calls by hand,
A rabbit cheers: delegation true!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the primary change: updating the orchestrator prompt to prefer delegating inbox/doc/calendar queries to live integrations instead of relying on the retrospective memory_tree index.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/openhuman/agent/agents/orchestrator/prompt.rs (1)

163-167: ⚡ Quick win

Add one assertion for the unconnected-integration branch.

These tests cover the live-routing branch well, but they don’t lock in the “Settings → Connections → [Service]” / “do not silently fall back” instruction. Adding a contains(...) assertion would guard the exact regression this PR is fixing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/agents/orchestrator/prompt.rs` around lines 163 - 167,
The test is missing an assertion that locks in the unconnected-integration
branch text, so add an assertion alongside the existing ones that checks the
rendered prompt body for the “Settings → Connections → [Service]” / “do not
silently fall back” instruction; specifically, in the same test where `body` is
asserted to contain the live-routing lines (references: `body`,
`delegate_to_integrations_agent`, `memory_tree`), add something like
assert!(body.contains("Settings → Connections → [Service]") or
assert!(body.contains("do not silently fall back")) to ensure the
unconnected-integration guidance is present.
src/openhuman/tools/impl/computer/human_path.rs (1)

122-122: ⚡ Quick win

Remove duplicated trait bound RngExt + RngExt at line 122.

The generic bound lists RngExt twice, which is redundant. Simplify to a single bound for clearer intent and to avoid lint noise.

Proposed fix
-fn sample_normal<R: RngExt + RngExt>(mean: f64, stddev: f64, rng: &mut R) -> f64 {
+fn sample_normal<R: RngExt>(mean: f64, stddev: f64, rng: &mut R) -> f64 {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/computer/human_path.rs` at line 122, The generic
bound on function sample_normal incorrectly repeats RngExt twice (R: RngExt +
RngExt); update the signature to use a single trait bound (e.g., R: RngExt) so
the function declaration reads with one RngExt bound while keeping the rng
parameter type (&mut R) unchanged; locate sample_normal and remove the duplicate
RngExt in the generic bounds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Around line 117-124: The example in the orchestrator prompt uses em dashes
(e.g., in the inline response lines containing "— wants to grab food" and
similar) which contradicts the prompt rule banning em dashes; update the example
text used with the delegate_to_integrations_agent/toolkit: "gmail" snippet to
replace em dashes with allowed punctuation (e.g., a hyphen '-' or a colon ':')
and ensure the revised lines preserve meaning and tone while conforming to the
no-em-dash style rule.

---

Nitpick comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.rs`:
- Around line 163-167: The test is missing an assertion that locks in the
unconnected-integration branch text, so add an assertion alongside the existing
ones that checks the rendered prompt body for the “Settings → Connections →
[Service]” / “do not silently fall back” instruction; specifically, in the same
test where `body` is asserted to contain the live-routing lines (references:
`body`, `delegate_to_integrations_agent`, `memory_tree`), add something like
assert!(body.contains("Settings → Connections → [Service]") or
assert!(body.contains("do not silently fall back")) to ensure the
unconnected-integration guidance is present.

In `@src/openhuman/tools/impl/computer/human_path.rs`:
- Line 122: The generic bound on function sample_normal incorrectly repeats
RngExt twice (R: RngExt + RngExt); update the signature to use a single trait
bound (e.g., R: RngExt) so the function declaration reads with one RngExt bound
while keeping the rng parameter type (&mut R) unchanged; locate sample_normal
and remove the duplicate RngExt in the generic bounds.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 599d7ca9-39f7-4f17-9618-b4427595a684

📥 Commits

Reviewing files that changed from the base of the PR and between 23fbaec and 74c2b0d.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • Cargo.toml
  • scripts/mock-api/state.mjs
  • src/core/auth.rs
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/memory/tree/tree_source/registry.rs
  • src/openhuman/security/pairing.rs
  • src/openhuman/tools/impl/computer/human_path.rs

Comment thread src/openhuman/agent/agents/orchestrator/prompt.md Outdated
@senamakel senamakel merged commit 4d5e0d5 into tinyhumansai:main May 14, 2026
26 of 27 checks passed
senamakel added a commit to senamakel/openhuman that referenced this pull request May 14, 2026
Resolves a conflict in `orchestrator/prompt.md` where upstream's
tinyhumansai#1731 added a new step-2 "live external service" branch and
renumbered the decision tree (3 → 4). Keeps both upstream's
integrations-first ordering and the crypto branch from this PR
under the renumbered step 4 ("Does this need other specialised
execution?").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants