Description
Parent
Part of #204 (Phase 4: Hardening)
Problem
Users frequently send messages to the wrong agent: asking a polecat or refinery agent a question meant for the Mayor. The terminal tabs for agents and the Mayor are visually similar, and it's easy to send a message to whichever tab happens to be focused. The polecat either ignores the message, tries to interpret it as a work instruction, or responds confusingly.
Solution
Add a detection heuristic to polecat and refinery system prompts that identifies when an incoming message looks like it was meant for the Mayor, and responds with a helpful redirect instead of trying to act on it.
What to detect
Messages that:
- Ask about town-wide status ("how's the convoy going?", "what are the agents working on?")
- Request work delegation ("can you start working on the auth module?", "sling this to a polecat")
- Ask about other agents ("what is Toast doing?", "is the refinery busy?")
- Ask general questions about the codebase or project direction ("what's the architecture of this repo?", "what should we work on next?")
- Use language that implies talking to a coordinator ("can you assign...", "create a convoy for...", "what's the plan for...")
What NOT to flag
Messages that are clearly directed at the current agent's work:
- Feedback on the agent's current bead ("that's wrong, the endpoint should be POST not GET")
- Instructions related to the hooked bead ("also add tests for the edge case")
- Rework requests ("the build is failing, fix the import")
- Direct questions about what the agent is doing ("what are you working on?", "show me the diff")
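The two lists above can double as a small evaluation set for the prompt change. The sketch below is hypothetical: `LabeledMessage`, `scoreDetector`, and the `detector` callback are illustrative names, not part of Gastown, and in practice the callback would wrap a real model call made with the polecat or refinery prompt. It counts false negatives separately from false positives because the Notes treat a missed misdirected message as the worse failure.

```typescript
// Hypothetical eval fixture built from the "What to detect" / "What NOT to flag" examples.
type Expected = "redirect" | "handle";

interface LabeledMessage {
  text: string;
  expected: Expected;
}

const examples: LabeledMessage[] = [
  // Meant for the Mayor: the agent should redirect.
  { text: "how's the convoy going?", expected: "redirect" },
  { text: "what are the agents working on?", expected: "redirect" },
  { text: "sling this to a polecat", expected: "redirect" },
  { text: "what is Toast doing?", expected: "redirect" },
  { text: "what should we work on next?", expected: "redirect" },
  // About the agent's own work: the agent should handle it.
  { text: "that's wrong, the endpoint should be POST not GET", expected: "handle" },
  { text: "also add tests for the edge case", expected: "handle" },
  { text: "the build is failing, fix the import", expected: "handle" },
  { text: "what are you working on?", expected: "handle" },
];

interface Score {
  falseNegatives: number; // misdirected messages the agent acted on anyway
  falsePositives: number; // legitimate work messages the agent redirected
}

// Assumption: `detector` stands in for sending the message to the agent
// and reporting whether its reply was a redirect to the Mayor.
function scoreDetector(
  cases: LabeledMessage[],
  detector: (text: string) => Expected,
): Score {
  const score: Score = { falseNegatives: 0, falsePositives: 0 };
  for (const c of cases) {
    const actual = detector(c.text);
    if (c.expected === "redirect" && actual === "handle") score.falseNegatives++;
    if (c.expected === "handle" && actual === "redirect") score.falsePositives++;
  }
  return score;
}
```

One possible gate, given the stated preference, is to require `falseNegatives === 0` while tolerating a small number of false positives.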
Prompt addition
Add a section to both buildPolecatSystemPrompt and buildRefinerySystemPrompt:
## Misdirected Messages
If you receive a message that seems intended for the Mayor (asking about town status,
requesting work delegation, asking about other agents, or asking broad project questions),
do not try to act on it. Instead, respond with something like:
"It looks like this message might have been meant for the Mayor. I'm just a polecat
working on [current bead title]. I can mail the Mayor for you if you'd like, or you
can switch to the Mayor tab and talk to him directly."
If the user confirms they meant to talk to you, proceed normally. If they ask you to
forward it, use gt_mail_send to send the message to the Mayor.
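A minimal sketch of how one section might be shared between the two builders, assuming each builder can append a string to the prompt it returns. `AgentRole`, `misdirectedMessagesSection`, `withMisdirectedSection`, and the `beadTitle` parameter are hypothetical; the real buildPolecatSystemPrompt and buildRefinerySystemPrompt signatures are not shown in this issue. The refinery variant appends one extra line (assumed wording) to reflect the note that the refinery should be even more aggressive about detection.

```typescript
// Hypothetical role type and helpers; the real builder signatures may differ.
type AgentRole = "polecat" | "refinery";

function misdirectedMessagesSection(role: AgentRole, beadTitle: string): string {
  const lines: string[] = [
    "## Misdirected Messages",
    "If you receive a message that seems intended for the Mayor (asking about town status,",
    "requesting work delegation, asking about other agents, or asking broad project questions),",
    "do not try to act on it. Instead, respond with something like:",
    `"It looks like this message might have been meant for the Mayor. I'm just a ${role}`,
    `working on ${beadTitle}. I can mail the Mayor for you if you'd like, or you`,
    `can switch to the Mayor tab and talk to him directly."`,
    "If the user confirms they meant to talk to you, proceed normally. If they ask you to",
    "forward it, use gt_mail_send to send the message to the Mayor.",
  ];
  if (role === "refinery") {
    // Assumed wording: users almost never need to message the refinery,
    // so bias harder toward asking whether the Mayor was intended.
    lines.push(
      "Users almost never need to message the refinery directly. When in doubt, assume the message was meant for the Mayor and ask before acting.",
    );
  }
  return lines.join("\n");
}

// Append the section after whatever the builder already produces.
function withMisdirectedSection(basePrompt: string, role: AgentRole, beadTitle: string): string {
  return `${basePrompt}\n\n${misdirectedMessagesSection(role, beadTitle)}`;
}
```

Inside buildPolecatSystemPrompt, for example, the final return could become `withMisdirectedSection(prompt, "polecat", beadTitle)`, where `beadTitle` is a placeholder for however the hooked bead is exposed. Interpolating the title at build time is one choice; keeping the literal `[current bead title]` and letting the model fill it in is the other.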
Acceptance Criteria
- Polecat system prompt includes misdirected message detection guidance
- Refinery system prompt includes misdirected message detection guidance
- Agent identifies the current bead in its redirect response (so the user has context on which agent they're talking to)
- Agent offers to mail the Mayor as an option (via gt_mail_send)
- Agent does not flag legitimate work-related messages as misdirected
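The prompt-level criteria can be checked mechanically against the built prompt strings. The sketch below is hypothetical: `checkMisdirectedGuidance` and the substrings it looks for are assumptions based on the proposed prompt text. The last criterion, not flagging legitimate work messages, depends on model behavior and needs an evaluation against real responses rather than a string check.

```typescript
// Hypothetical check; the substrings mirror the proposed prompt text.
// Returns a list of unmet criteria (empty means the prompt passes).
function checkMisdirectedGuidance(prompt: string, beadTitle: string): string[] {
  const problems: string[] = [];
  if (!prompt.includes("## Misdirected Messages")) {
    problems.push("missing Misdirected Messages section");
  }
  if (!prompt.includes("gt_mail_send")) {
    problems.push("does not offer to mail the Mayor via gt_mail_send");
  }
  // Accept either an interpolated bead title or the literal placeholder.
  // Caveat: the title may also appear elsewhere in the prompt, so this can pass spuriously.
  if (!prompt.includes(beadTitle) && !prompt.includes("[current bead title]")) {
    problems.push("redirect does not identify the current bead");
  }
  return problems;
}
```

Running it over the outputs of both builders for a sample bead would cover the polecat and refinery criteria in one test.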
Notes
- No data migration needed: cloud Gastown hasn't deployed to production
- This is prompt-only: no new tools, endpoints, or schema changes
- The detection is heuristic (LLM judgment), not rule-based. False positives are preferable to false negatives: it's better to ask "did you mean the Mayor?" than to silently misinterpret a misdirected message as a work instruction
- The refinery prompt should be even more aggressive about detection since users almost never need to directly message the refinery