Detect misdirected messages to polecats/refinery: redirect to Mayor #995

@jrf0110

Description

Parent

Part of #204 (Phase 4: Hardening)

Problem

Users frequently send messages to the wrong agent, asking a polecat or refinery agent a question meant for the Mayor. The terminal tabs for agents and the Mayor are visually similar, and it's easy to send a message to whichever tab happens to be focused. The polecat either ignores the message, tries to interpret it as a work instruction, or responds confusingly.

Solution

Add a detection heuristic to polecat and refinery system prompts that identifies when an incoming message looks like it was meant for the Mayor, and responds with a helpful redirect instead of trying to act on it.

What to detect

Messages that:

  • Ask about town-wide status ("how's the convoy going?", "what are the agents working on?")
  • Request work delegation ("can you start working on the auth module?", "sling this to a polecat")
  • Ask about other agents ("what is Toast doing?", "is the refinery busy?")
  • Ask general questions about the codebase or project direction ("what's the architecture of this repo?", "what should we work on next?")
  • Use language that implies talking to a coordinator ("can you assign...", "create a convoy for...", "what's the plan for...")

What NOT to flag

Messages that are clearly directed at the current agent's work:

  • Feedback on the agent's current bead ("that's wrong, the endpoint should be POST not GET")
  • Instructions related to the hooked bead ("also add tests for the edge case")
  • Rework requests ("the build is failing, fix the import")
  • Direct questions about what the agent is doing ("what are you working on?", "show me the diff")
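The examples in the two lists above could be captured as a small labeled fixture for spot-checking prompt changes. This is only a sketch: the names `LabeledMessage`, `labeledMessages`, and `scoreClassifier` are hypothetical and not part of the codebase, and the `classify` callback stands in for however the agent's actual behaviour gets observed.

```typescript
// Hypothetical evaluation fixture; none of these names exist in the codebase.
type Expected = "redirect" | "handle";

interface LabeledMessage {
  text: string;
  expected: Expected;
}

// Example messages taken from the "What to detect" and "What NOT to flag" lists.
const labeledMessages: LabeledMessage[] = [
  { text: "how's the convoy going?", expected: "redirect" },
  { text: "sling this to a polecat", expected: "redirect" },
  { text: "what is Toast doing?", expected: "redirect" },
  { text: "what should we work on next?", expected: "redirect" },
  { text: "create a convoy for...", expected: "redirect" },
  { text: "that's wrong, the endpoint should be POST not GET", expected: "handle" },
  { text: "also add tests for the edge case", expected: "handle" },
  { text: "the build is failing, fix the import", expected: "handle" },
  { text: "what are you working on?", expected: "handle" },
];

// Counts disagreements between a classifier and the labels.
// A false positive is a legitimate work message flagged as misdirected;
// a false negative is a misdirected message that was acted on.
function scoreClassifier(
  classify: (text: string) => Expected,
  cases: LabeledMessage[],
): { falsePositives: number; falseNegatives: number } {
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const c of cases) {
    const actual = classify(c.text);
    if (c.expected === "handle" && actual === "redirect") falsePositives++;
    if (c.expected === "redirect" && actual === "handle") falseNegatives++;
  }
  return { falsePositives, falseNegatives };
}
```

Tracking the two error counts separately makes it possible to tolerate a few false positives while keeping false negatives near zero.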

Prompt addition

Add a section to both buildPolecatSystemPrompt and buildRefinerySystemPrompt:

## Misdirected Messages

If you receive a message that seems intended for the Mayor (asking about town status,
requesting work delegation, asking about other agents, or asking broad project questions),
do not try to act on it. Instead, respond with something like:

"It looks like this message might have been meant for the Mayor. I'm just a polecat
working on [current bead title]. I can mail the Mayor for you if you'd like, or you
can switch to the Mayor tab and talk to him directly."

If the user confirms they meant to talk to you, proceed normally. If they ask you to 
forward it, use gt_mail_send to send the message to the Mayor.
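Since the same section goes into both prompts, one option is a shared helper that both builders call. The sketch below is an assumption about how that might look, not the existing implementation: `buildMisdirectedMessageSection`, `AgentRole`, and the `currentBeadTitle` parameter are hypothetical, and the real `buildPolecatSystemPrompt` and `buildRefinerySystemPrompt` may be structured differently.

```typescript
// Hypothetical helper; the real prompt builders may take different inputs.
type AgentRole = "polecat" | "refinery";

interface MisdirectedSectionOptions {
  role: AgentRole;
  // Title of the bead the agent is hooked to (assumed known at prompt-build time).
  currentBeadTitle: string;
}

function buildMisdirectedMessageSection(opts: MisdirectedSectionOptions): string {
  const { role, currentBeadTitle } = opts;
  const selfDescription = role === "polecat" ? "a polecat" : "the refinery";

  const lines = [
    "## Misdirected Messages",
    "",
    "If you receive a message that seems intended for the Mayor (asking about town status,",
    "requesting work delegation, asking about other agents, or asking broad project questions),",
    "do not try to act on it. Instead, respond with something like:",
    "",
    `"It looks like this message might have been meant for the Mayor. I'm just ${selfDescription}`,
    `working on ${currentBeadTitle}. I can mail the Mayor for you if you'd like, or you`,
    'can switch to the Mayor tab and talk to him directly."',
    "",
    "If the user confirms they meant to talk to you, proceed normally. If they ask you to",
    "forward it, use gt_mail_send to send the message to the Mayor.",
  ];

  // The refinery variant leans harder toward redirecting, per the Notes section.
  if (role === "refinery") {
    lines.push(
      "",
      "Users almost never need to message the refinery directly, so when in doubt,",
      "ask whether the message was meant for the Mayor.",
    );
  }

  return lines.join("\n");
}
```

Each builder would append the returned string to its existing prompt text, for example `buildMisdirectedMessageSection({ role: "polecat", currentBeadTitle })`.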

Acceptance Criteria

  • Polecat system prompt includes misdirected message detection guidance
  • Refinery system prompt includes misdirected message detection guidance
  • Agent identifies the current bead in its redirect response (so the user has context on which agent they're talking to)
  • Agent offers to mail the Mayor as an option (via gt_mail_send)
  • Agent does not flag legitimate work-related messages as misdirected
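The criteria about the redirect response could be spot-checked with a simple string test while reviewing transcripts. This is a rough sketch under stated assumptions: `checkRedirectResponse` and `RedirectCheck` are hypothetical names, and substring matching is only a crude proxy for what a reviewer would judge by reading the reply.

```typescript
// Hypothetical review aid; not part of the codebase.
interface RedirectCheck {
  mentionsMayor: boolean;
  namesCurrentBead: boolean;
  offersMail: boolean;
}

function checkRedirectResponse(response: string, currentBeadTitle: string): RedirectCheck {
  const lower = response.toLowerCase();
  const bead = currentBeadTitle.trim().toLowerCase();
  return {
    // The reply should point the user at the Mayor.
    mentionsMayor: lower.includes("mayor"),
    // The reply should name the current bead so the user knows which agent this is.
    // An empty title never counts as a match.
    namesCurrentBead: bead.length > 0 && lower.includes(bead),
    // The reply should offer to mail the Mayor.
    offersMail: /\bmail\b/.test(lower),
  };
}
```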

Notes

  • No data migration needed: cloud Gastown hasn't deployed to production
  • This is prompt-only: no new tools, endpoints, or schema changes
  • The detection is heuristic (LLM judgment), not rule-based. False positives are preferable to false negatives: it's better to ask "did you mean the Mayor?" than to silently misinterpret a misdirected message as a work instruction
  • The refinery prompt should be even more aggressive about detection since users almost never need to directly message the refinery

Metadata


    Labels

    • P2: Post-launch
    • enhancement: New feature or request
    • gt:container: Container management, agent processes, SDK, heartbeat
    • kilo-auto-fix: Auto-generated label by Kilo
    • kilo-triaged: Auto-generated label by Kilo
