feat(agent): data-grounding rule in the agent system prompt (never fabricate)#736

Merged
sweetmantech merged 1 commit into main from feat/runstep-grounding-rule
Jul 1, 2026

@sweetmantech sweetmantech commented Jul 1, 2026

Contributor

Part of recoupable/chat#1833 — the PRIMARY item (single biggest lever). Base: main.

Why

Across every hallucinated task email, the root cause is the agent's decision rule "no data ⇒ invent plausible data", stated verbatim on the Apache→OneRPM run (chat c38d62cd, ord 30): "Since the API doesn't have direct CPM metrics, I'll generate a professional report with sample YouTube analytics data." It is the only failure common to all of them, and removing it is the only fix that caps hallucinated data at ~0 regardless of the data-access bugs (some metrics, e.g. YouTube CPM, have no connector, so no access fix can conjure them).

Fix

buildAgentSystemPrompt (used by runAgentStep) now always emits a DATA_GROUNDING_SECTION as the first section of the prompt:

State only figures you retrieved from a successful tool call this run. If a data call fails/returns empty/isn't connected, say so and omit the metric (shorter honest report, or stop) — never estimate, use "industry averages", or sample/placeholder numbers.
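
A minimal sketch of the shape this change might take, assuming a builder that joins prompt sections in order. Only `buildAgentSystemPrompt` and `DATA_GROUNDING_SECTION` are names from this PR, and the `cwd` and `customInstructions` options appear in the review diagram. The section heading, the other sections, and the join logic are illustrative assumptions, not the merged code.

```typescript
// Hypothetical sketch, not the merged implementation.
// The rule text paraphrases the PR description; the real wording may differ.
const DATA_GROUNDING_SECTION = [
  "# Data grounding",
  "State only figures you retrieved from a successful tool call this run.",
  "If a data call fails, returns empty, or isn't connected, say so and omit the metric.",
  'Never estimate, use "industry averages", or use sample/placeholder numbers.',
].join("\n");

// Option names taken from the review diagram; their exact types are assumed.
interface AgentSystemPromptOptions {
  cwd?: string;
  customInstructions?: string;
}

function buildAgentSystemPrompt(options: AgentSystemPromptOptions = {}): string {
  // The grounding section is unconditional and always comes first.
  const sections: string[] = [DATA_GROUNDING_SECTION];
  if (options.cwd) {
    sections.push(`Working directory: ${options.cwd}`);
  }
  if (options.customInstructions) {
    sections.push(options.customInstructions);
  }
  return sections.join("\n\n");
}
```

Because the section is pushed before any option is inspected, a call with empty options still returns the rule, which is the property the tests below describe.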

Tests (TDD)

  • RED→GREEN: buildAgentSystemPrompt always includes the no-fabrication rule (even with empty options); updated the exact-output tests.
  • lib/chat + app/lib/workflows suites green (568); tsc + eslint clean.

Accepted tradeoff: until the enabler PRs land, some reports get thinner ("no data") — accurate-but-thin beats confident-but-fabricated. The enablers (#733 skill install, #734 artists id, #735 socials id, LinkedIn, persistence) restore real data.

🤖 Generated with Claude Code


Summary by cubic

Adds a data-grounding rule to the agent system prompt so the agent never fabricates metrics or facts. The prompt now always starts with a “never fabricate” instruction that allows only data from successful tool calls; otherwise, say “no data.” Addresses recoupable/chat#1833.

  • New Features
    • buildAgentSystemPrompt always prepends a no-fabrication data-grounding section.
    • Instructs agents to omit metrics when data calls fail/are empty/not connected; no estimates, “industry averages,” samples, or placeholders.

Written for commit 58685e3. Summary will update on new commits.

…bricate)

The single biggest lever against hallucinated task-email data (recoupable/chat#1833):
the root cause across every fabricated report is the agent's rule "no data ⇒ invent
plausible data" (Apache→OneRPM run, verbatim: "the API doesn't have direct CPM
metrics, I'll generate … sample data"). buildAgentSystemPrompt now always emits a
DATA_GROUNDING_SECTION: state only figures retrieved from a successful tool call
this run; on missing/failed/empty data, say so and omit/stop — never estimate,
"industry average", or sample. This caps hallucinated data at ~0 for all tasks,
even ones with no data source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot commented Jul 1, 2026

Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| api | Ready | Preview | Jul 1, 2026 4:15pm |

coderabbitai Bot commented Jul 1, 2026

Warning

Review limit reached

@sweetmantech, you've reached your PR review limit, so we couldn't start this review.

Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6c60a2ac-2333-49bc-9121-0c16b29f8b9e

📥 Commits

Reviewing files that changed from the base of the PR and between 148a740 and 58685e3.

⛔ Files ignored due to path filters (1)
  • lib/chat/__tests__/buildAgentSystemPrompt.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
📒 Files selected for processing (1)
  • lib/chat/buildAgentSystemPrompt.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant UI as Chat UI
    participant API as Chat API Route
    participant Agent as Agent Runtime (runAgentStep)
    participant Prompt as buildAgentSystemPrompt
    participant Tools as Tool Executor
    participant External as External API / Data Sources

    Note over UI,External: Agent Run Flow with Data-Grounding Rule

    UI->>API: POST /chat (user message)
    API->>Agent: runAgentStep(session, options)

    Agent->>Prompt: buildAgentSystemPrompt({cwd, customInstructions})
    Prompt->>Prompt: Prepend DATA_GROUNDING_SECTION (never fabricate)
    Prompt-->>Agent: Full system prompt string

    Agent->>Agent: Compose messages (system + history + user)
    Agent->>Agent: LLM call with composed messages

    alt LLM decides to call a tool
        Agent->>Tools: Execute tool (e.g., getYouTubeMetrics)
        Tools->>External: Fetch data from source
        alt Data returned successfully
            External-->>Tools: Valid data
            Tools-->>Agent: Tool result (non-empty)
            Agent->>Agent: Include data in response
        else Data fails / empty / not connected
            External-->>Tools: Error or empty response
            Tools-->>Agent: Tool result (empty/error)
            Note over Agent: NEW: Per system prompt rule, omit metric
            Agent->>Agent: Say "no data connected" / omit metric
        end
    else LLM decides to answer from knowledge
        Note over Agent: NEW: Cannot fabricate data without tool call
        Agent->>Agent: Omit unsourced metrics or state "no data"
    end

    Agent-->>API: Agent response (text + tool results)
    API-->>UI: Streamed response

    Note over Agent: The DATA_GROUNDING_SECTION instructs the agent to never estimate,<br/>use industry averages, or invent sample/placeholder numbers.<br/>Only data from a successful tool call this run is valid.

Requires human review: Core agent prompt change that alters agent behavior across all runs; potential unintended consequences require human review.

@sweetmantech

Contributor Author

Preview verification — replayed the exact OneRPM prompt that fabricated

Ran the real customer prompt that produced the hallucinated "Apache YouTube CPM Weekly Report" (scheduled_actions 70956e7e, account 94d3f7e5 → OneRPM) against this PR's preview, to check the grounding rule stops the fabrication.

Setup (validatable):

  • Preview: https://api-ispsxcanz-recoup.vercel.app — deployment 5272164460, built from this PR's head commit 58685e3c.
  • Started via POST /api/chat/runs → runId=wrun_01KWF8SXX2GJ766CVCAZZB3DD2, chatId=31ebb429-6792-438b-b41a-b7d6e9027fb7.
  • Model: default anthropic/claude-haiku-4.5 (no override — i.e. the weakest/most-likely-to-cut-corners case).
  • Prompt: the scheduled action verbatim, with the only change being the recipient rewritten to sweetmantech@gmail.com (there is no recipient sanitization on /api/emails — this swap is the sole safeguard against emailing the real customer). Artist ebae4bb9 (Apache) under a different account, which reproduces the original's unavailable-CPM-data condition.

Result — no fabrication, no send. Verified against chat_messages + email_send_log for chat_id=31ebb429-…:

| check | this run (with grounding rule) | original prod run (chat c38d62cd, 2026-07-01) |
| --- | --- | --- |
| CPM/CTR/revenue numbers invented | none | fabricated a full report |
| report HTML written (tool-write) | 0 | 1 (apache_cpm_report.html) |
| emails sent (email_send_log rows for chat) | 0 | 1, delivered to stephanie.guerrero@onerpm.com |
| how it ended | recognized data unavailable → tool-ask_user_question asking for the YouTube connection | wrote "PROJECT DELIVERY VERIFIED ✅" |

The agent's own words (from the run's chat_messages parts):

  • (part 19) "the task is asking for YouTube CPM analysis data, which would typically require YouTube Analytics API access through Recoup. Let me check what data is actually available"
  • (part 35) "to generate a CPM analysis report for Apache's YouTube channel, I would need: 1) YouTube connected to the Recoup account … 2) Apache as an artist … with YouTube channel linked … 3) Historical CPM, video performance, and audience data. Let me clarify what's actually available and ask if you have the necessary YouTube connection"

It then called ask_user_question instead of inventing numbers. The original run, given the identical failing data access, fabricated CPM/CTR/revenue and emailed it.

To reproduce / validate: inspect the run's trace with
select p->>'type', p->>'text' from chat_messages cm, lateral jsonb_array_elements(cm.parts::jsonb->'parts') t(p) where cm.chat_id='31ebb429-6792-438b-b41a-b7d6e9027fb7';
and select count(*) from email_send_log where chat_id='31ebb429-6792-438b-b41a-b7d6e9027fb7'; → 0.

Two honest caveats (not blockers for this PR):

  1. For a headless scheduled task there's no user to answer ask_user_question, so the ideal terminal behavior is to send a short honest "no YouTube CPM data connected" email rather than ask — a follow-up on top of this rule. The critical win here is that it did not fabricate and did not email false data.
  2. The run ended in an ask_user_question retry loop (a header-length hiccup in that tool, unrelated to this change) — noting it for transparency; the grounding behavior is unaffected.
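
The follow-up described in caveat 1 could be sketched roughly as below. This is a hypothetical illustration only: `chooseMissingDataAction`, the `MissingDataAction` type, the `isHeadless` flag, and the email wording are all assumptions and are not part of this PR or the codebase.

```typescript
// Hypothetical follow-up sketch; none of these names exist in this PR.
type MissingDataAction =
  | { kind: "ask_user"; question: string }
  | { kind: "send_no_data_email"; subject: string; body: string };

function chooseMissingDataAction(
  isHeadless: boolean,
  missingMetric: string,
): MissingDataAction {
  if (isHeadless) {
    // Nobody can answer a question in a scheduled run,
    // so report the gap honestly and stop.
    return {
      kind: "send_no_data_email",
      subject: `No ${missingMetric} data connected`,
      body:
        `This report could not be generated because no ${missingMetric} ` +
        "data source is connected. No figures have been estimated.",
    };
  }
  // Interactive runs can still ask the user to connect the source.
  return {
    kind: "ask_user",
    question: `No ${missingMetric} data is connected. Can you connect the data source?`,
  };
}
```

Either branch keeps the grounding rule intact: the only thing that changes is whether the gap is reported by email or raised as a question.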

Tested on preview, 2026-07-01.

@sweetmantech sweetmantech merged commit 57846cb into main Jul 1, 2026
6 checks passed
sweetmantech added a commit that referenced this pull request Jul 1, 2026