feat(agent): data-grounding rule in the agent system prompt (never fabricate) #736
…bricate)

The single biggest lever against hallucinated task-email data (recoupable/chat#1833): the root cause across every fabricated report is the agent's implicit rule "no data ⇒ invent plausible data" (Apache→OneRPM run, verbatim: "the API doesn't have direct CPM metrics, I'll generate … sample data").

buildAgentSystemPrompt now always emits a DATA_GROUNDING_SECTION: state only figures retrieved from a successful tool call this run; on missing, failed, or empty data, say so and omit the metric or stop. Never estimate, fall back to an "industry average", or use sample data. This caps hallucinated data at ~0 for all tasks, even ones with no data source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
No issues found across 2 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant UI as Chat UI
participant API as Chat API Route
participant Agent as Agent Runtime (runAgentStep)
participant Prompt as buildAgentSystemPrompt
participant Tools as Tool Executor
participant External as External API / Data Sources
Note over UI,External: Agent Run Flow with Data-Grounding Rule
UI->>API: POST /chat (user message)
API->>Agent: runAgentStep(session, options)
Agent->>Prompt: buildAgentSystemPrompt({cwd, customInstructions})
Prompt->>Prompt: Prepend DATA_GROUNDING_SECTION (never fabricate)
Prompt-->>Agent: Full system prompt string
Agent->>Agent: Compose messages (system + history + user)
Agent->>Agent: LLM call with composed messages
alt LLM decides to call a tool
Agent->>Tools: Execute tool (e.g., getYouTubeMetrics)
Tools->>External: Fetch data from source
alt Data returned successfully
External-->>Tools: Valid data
Tools-->>Agent: Tool result (non-empty)
Agent->>Agent: Include data in response
else Data fails / empty / not connected
External-->>Tools: Error or empty response
Tools-->>Agent: Tool result (empty/error)
Note over Agent: NEW: Per system prompt rule, omit metric
Agent->>Agent: Say "no data connected" / omit metric
end
else LLM decides to answer from knowledge
Note over Agent: NEW: Cannot fabricate data without tool call
Agent->>Agent: Omit unsourced metrics or state "no data"
end
Agent-->>API: Agent response (text + tool results)
API-->>UI: Streamed response
Note over Agent: The DATA_GROUNDING_SECTION instructs the agent to never estimate,<br/>use industry averages, or invent sample/placeholder numbers.<br/>Only data from a successful tool call this run is valid.
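The rule in this PR lives in the system prompt, so the model applies the branch shown in the diagram, not application code. Purely as an illustration of that decision, the sketch below expresses "only data from a successful tool call this run is valid" as a small function. The `ToolResult` shape and the `groundedMetric` helper are hypothetical and do not appear in the PR.

```typescript
// Hypothetical tool-result shape; the real type is not shown in this PR.
type ToolResult =
  | { ok: true; data: Record<string, number> }
  | { ok: false; error: string };

// Returns a report line for one metric, mirroring the diagram's branches:
// a value is reported only when the tool call succeeded and returned it.
function groundedMetric(result: ToolResult, metric: string): string {
  if (!result.ok) {
    // Failed call: say so instead of estimating.
    return `${metric}: no data (tool call failed: ${result.error})`;
  }
  if (!(metric in result.data)) {
    // Successful call, but the metric is absent (e.g. no connector).
    return `${metric}: no data connected`;
  }
  return `${metric}: ${result.data[metric]}`;
}
```

For a metric with no connector, such as the YouTube CPM case discussed in this PR, the second branch is the one that applies: the line states that no data is connected and carries no number.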
Requires human review: Core agent prompt change that alters agent behavior across all runs; potential unintended consequences require human review.
Preview verification: replayed the exact OneRPM prompt that fabricated
Ran the real customer prompt that produced the hallucinated "Apache YouTube CPM Weekly Report" (
Setup (validatable):
Result: no fabrication, no send. Verified against
The agent's own words (from the run's
It then called
To reproduce / validate: inspect the run's trace with
Two honest caveats (not blockers for this PR):
Tested on preview, |
Part of recoupable/chat#1833: the PRIMARY item (single biggest lever). Base: main.

Why
Across every hallucinated task email, the root cause is the agent's decision rule "no data ⇒ invent plausible data", stated verbatim on the Apache→OneRPM run (chat c38d62cd, ord 30): "Since the API doesn't have direct CPM metrics, I'll generate a professional report with sample YouTube analytics data." It's the only failure common to all of them, and removing it is the only fix that caps hallucinated data at ~0 regardless of the data-access bugs (some metrics, e.g. YouTube CPM, have no connector, so no access fix can conjure them).

Fix
buildAgentSystemPrompt (used by runAgentStep) now always emits a DATA_GROUNDING_SECTION, first:

Tests (TDD)
buildAgentSystemPrompt always includes the no-fabrication rule (even with empty options); updated the exact-output tests. lib/chat + app/lib/workflows suites green (568); tsc + eslint clean.

Accepted tradeoff: until the enabler PRs land, some reports get thinner ("no data"). Accurate-but-thin beats confident-but-fabricated. The enablers (#733 skill install, #734 artists id, #735 socials id, LinkedIn, persistence) restore real data.
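The "always emits, first" behavior described under Fix and Tests can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the section wording paraphrases the rule as described here, and the option handling (`cwd`, `customInstructions`, joined with blank lines) is an assumption based on the call shown in the architecture diagram.

```typescript
// Hypothetical option shape, inferred from buildAgentSystemPrompt({cwd, customInstructions}).
interface AgentSystemPromptOptions {
  cwd?: string;
  customInstructions?: string;
}

// Paraphrase of the rule described in this PR, not the shipped wording.
const DATA_GROUNDING_SECTION = [
  "# Data grounding",
  "State only figures retrieved from a successful tool call in this run.",
  "If data is missing, failed, or empty, say so and omit the metric or stop.",
  "Never estimate, cite an industry average, or invent sample data.",
].join("\n");

function buildAgentSystemPrompt(options: AgentSystemPromptOptions = {}): string {
  // The grounding section is unconditional and always comes first.
  const sections: string[] = [DATA_GROUNDING_SECTION];
  if (options.cwd) sections.push(`Working directory: ${options.cwd}`);
  if (options.customInstructions) sections.push(options.customInstructions);
  return sections.join("\n\n");
}
```

Under these assumptions, calling `buildAgentSystemPrompt()` with no options still returns a prompt that begins with the grounding section, which is the property the updated tests are described as checking.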
🤖 Generated with Claude Code
Summary by cubic
Adds a data-grounding rule to the agent system prompt so the agent never fabricates metrics or facts. The prompt now always starts with a "never fabricate" instruction that allows only data from successful tool calls; otherwise, the agent must say "no data." Addresses recoupable/chat#1833.
buildAgentSystemPrompt always prepends a no-fabrication data-grounding section.
Written for commit 58685e3. Summary will update on new commits.