A dataset-backed supervisor for coding agents. It watches for the agent stopping prematurely — asking permission for work it could just do, listing "next steps" then halting, or stalling mid-task — and re-prompts it to finish. Always-on, low-risk (it approves legitimate stops and waits), and it learns from your own sessions.
Successor to the (now archived) dzianisv/opencode-plugins reflection plugin.
Works on two runtimes from one shared core:
| Runtime | Mechanism | Surface |
|---|---|---|
| Claude Code | Stop hook (bin/on-stop.mjs) | plugin + /supervisor:train, /supervisor:status, /supervisor:goal, /supervisor:retry |
| OpenCode | session.idle event (opencode/supervisor.ts) | tools supervisor, set_supervisor, supervisor_train, supervisor_goal, supervisor_retry + /supervisor, /supervisor:train, /supervisor:goal, /supervisor:retry |
The classification taxonomy (6 categories + mined anti-patterns) and its feedback
templates live in a single source of truth, core/patterns.json,
so both runtimes — and the trainer — share one brain.
/plugin marketplace add dzianisv/agents-supervisor
/plugin install supervisor
Done — the Stop hook is active immediately. Then:
- /supervisor:status shows effective patterns + recent verdicts
- /supervisor:train learns from your sessions (updates your local patterns)
Switch off / on:
# whole plugin (all sessions)
claude plugin disable supervisor@agents-supervisor
claude plugin enable supervisor@agents-supervisor
# this session only
echo "$(cat .supervisor/current_session)" >> .supervisor/disabled # off
grep -vxF "$(cat .supervisor/current_session)" .supervisor/disabled > .supervisor/disabled.tmp; mv .supervisor/disabled.tmp .supervisor/disabled # on (grep exits 1 when no other sessions remain, so chain with ; rather than &&)
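The per-session switch above treats .supervisor/disabled as a list of session IDs, one per line. A minimal sketch of the same logic in JavaScript, assuming that file layout; the helper names are hypothetical and the real hook may read the file differently:

```javascript
// Hypothetical helpers mirroring the shell commands above.
// Assumption: the disabled file holds one session ID per line.

// Split the file text into a clean list of session IDs.
function parseDisabled(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
}

// True when the given session is listed as disabled.
function isDisabled(text, sessionId) {
  return parseDisabled(text).includes(sessionId);
}

// Return new file text with the session added (no duplicates).
function disableSession(text, sessionId) {
  const ids = parseDisabled(text);
  if (!ids.includes(sessionId)) ids.push(sessionId);
  return ids.join("\n") + "\n";
}

// Return new file text with the session removed.
// Works even when it was the only entry, which is the edge case
// that an `&&`-chained grep -v would trip over.
function enableSession(text, sessionId) {
  const ids = parseDisabled(text).filter((id) => id !== sessionId);
  return ids.length > 0 ? ids.join("\n") + "\n" : "";
}
```

Matching whole lines rather than substrings keeps one session ID from accidentally disabling another that merely contains it.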
(Local dev install: /plugin marketplace add /path/to/agents-supervisor instead of the GitHub slug.)
Add to opencode.json:
{ "$schema": "https://opencode.ai/config.json", "plugin": ["opencode-supervisor"] }
or drop opencode/supervisor.ts into ~/.config/opencode/plugin/. Then:
- /supervisor shows status · /supervisor:train learns (web app: call the supervisor_train tool; file commands don't expand in the web UI)
Switch off / on:
/supervisor off # disable for this session
/supervisor on # re-enable
Whole plugin off: remove "opencode-supervisor" from opencode.json, or launch with opencode --pure.
Beyond catching premature stops, you can point the supervisor at a goal it must demonstrably meet before it lets the agent stop — it keeps re-prompting (up to a retry budget) until the goal's met or the budget runs out.
/supervisor:goal all tests in test/auth pass and the PR is open with green CI
/supervisor:goal # check status (condition, attempts, last reason)
/supervisor:goal clear # clear (aliases: stop, off, reset, none, cancel)
/supervisor:retry 24 # set this session's retry budget (1–100, default 16)
The goal is injected into the judge as a mandatory completion requirement: the
agent is not allowed to stop until the goal is met (or the budget is exhausted). While a
goal is active the retry budget rises to 16 (vs 3 normally). State is per-session at
.supervisor/goals/<sessionId>.json (mode 0600); budget is spent only when a
continuation actually fires.
- Claude Code: /supervisor:goal and /supervisor:retry skills (the Stop hook enforces the goal).
- OpenCode: same commands in the TUI; in the web app call the supervisor_goal and supervisor_retry tools.
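The goal commands and budget rules described above can be sketched as a few pure functions. This is an illustration only: the budgets (3 and 16), the 1–100 range, and the clear aliases come from the text, while the function names, the state shape (goal, attempts, retryOverride), and the case-insensitive alias match are assumptions.

```javascript
// Values taken from the text above.
const DEFAULT_BUDGET = 3; // normal retry budget
const GOAL_BUDGET = 16; // retry budget while a goal is active
const CLEAR_ALIASES = new Set(["clear", "stop", "off", "reset", "none", "cancel"]);

// Interpret the argument string of /supervisor:goal (hypothetical parser).
function parseGoalCommand(arg) {
  const text = (arg ?? "").trim();
  if (text === "") return { action: "status" };
  if (CLEAR_ALIASES.has(text.toLowerCase())) return { action: "clear" };
  return { action: "set", condition: text };
}

// Validate a /supervisor:retry value against the documented 1-100 range.
function parseRetryBudget(arg) {
  const n = Number.parseInt(arg, 10);
  if (!Number.isInteger(n) || n < 1 || n > 100) {
    throw new RangeError("retry budget must be an integer from 1 to 100");
  }
  return n;
}

// Effective budget: explicit override first, else 16 with a goal, else 3.
function effectiveBudget(state) {
  if (state.retryOverride != null) return state.retryOverride;
  return state.goal ? GOAL_BUDGET : DEFAULT_BUDGET;
}

// Budget is spent only when a continuation actually fires.
function recordVerdict(state, continuationFired) {
  if (!continuationFired) return state;
  return { ...state, attempts: state.attempts + 1 };
}

// The supervisor may keep re-prompting while attempts remain.
function canContinue(state) {
  return state.attempts < effectiveBudget(state);
}
```

Keeping the state immutable (recordVerdict returns a new object) makes it straightforward to persist each update as a per-session JSON file.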
Configurable rubric (OpenCode): drop a .supervisor/rubric.md (or
~/.config/opencode/supervisor/rubric.md) with ## Patterns / ## Antipatterns
sections to override the judge's rubric; otherwise the shipped default is used.
(Claude Code reads its rubric from core/patterns.json ⊕ your user-local patterns.)
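To make the rubric override concrete, here is a minimal sketch of how a rubric.md with ## Patterns and ## Antipatterns sections could be split apart. Only the two section headings come from the text; the parser, its name, and the line-per-entry treatment of each section's body are assumptions, not the plugin's actual loader.

```javascript
// Hypothetical parser: collects the non-empty lines under each known heading.
function parseRubric(markdown) {
  const sections = { patterns: [], antipatterns: [] };
  let current = null;

  for (const line of markdown.split(/\r?\n/)) {
    // Match level-2 headings only ("## Name"); deeper headings are treated as content.
    const heading = /^##\s+(.+?)\s*$/.exec(line);
    if (heading) {
      const name = heading[1].toLowerCase();
      // Content under unknown headings is ignored.
      current = Object.prototype.hasOwnProperty.call(sections, name) ? name : null;
      continue;
    }
    if (current !== null && line.trim() !== "") {
      sections[current].push(line.trim());
    }
  }
  return sections;
}

// Example rubric text (contents are illustrative, not the shipped default).
const exampleRubric = [
  "## Patterns",
  "- Agent finished the task and verified it.",
  "",
  "## Antipatterns",
  "- Agent asks permission for work it could just do.",
  "- Agent lists next steps and then halts.",
].join("\n");

console.log(parseRubric(exampleRubric));
```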
A judge LLM classifies each stop into one of:
complete, waiting_for_user_legitimate, tool_available_punt,
summary_drift_stop, genuinely_stuck, working. Only the middle three (tool_available_punt,
summary_drift_stop, genuinely_stuck) inject a continuation nudge (escalating over up to
3 attempts); the rest are left alone.
The anti-pattern rules that sharpen these (permission-seeking, stopped-with-todos,
false-complete, legitimate-stop) were mined from real agent stops where the user
had to reply.
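The decision that follows classification can be sketched as below. The category names and the 3-attempt cap come from the text; reading "the middle three" as tool_available_punt, summary_drift_stop, and genuinely_stuck is an interpretation, and the function name, return shape, and numeric escalation level are hypothetical.

```javascript
// Categories assumed to trigger a continuation nudge.
const NUDGE_CATEGORIES = new Set([
  "tool_available_punt",
  "summary_drift_stop",
  "genuinely_stuck",
]);

// Decide whether a classified stop should be nudged.
// `attempts` is how many nudges this session has already received.
function decideNudge(category, attempts, maxAttempts = 3) {
  if (!NUDGE_CATEGORIES.has(category)) {
    return { nudge: false, reason: "category is left alone" };
  }
  if (attempts >= maxAttempts) {
    return { nudge: false, reason: "retry budget exhausted" };
  }
  // Escalation level rises with each attempt: 1, then 2, then 3.
  return { nudge: true, level: attempts + 1 };
}

console.log(decideNudge("summary_drift_stop", 0));
console.log(decideNudge("waiting_for_user_legitimate", 0));
```

A caller would pick a feedback template by category and level, so later nudges can be worded more firmly than the first.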
/supervisor:train # mine last 14d, update your local patterns
/supervisor:train --since=30d
/supervisor:train --dry-run # preview the pattern diff, write nothing
/supervisor:train --push-hf # also archive the private dataset to HuggingFace
It mines agent stopped → user followed up pairs from your OpenCode DBs and Claude
transcripts, derives refreshed anti-pattern weights + provenance, and writes them to
your user-local patterns file:
~/.config/agents-supervisor/patterns.json # learned overrides (deep-merged over shipped)
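The mining step can be pictured as a scan for an agent message that is immediately followed by a user reply, restricted to the --since window. This is a sketch under stated assumptions: the message shape ({ role, text, ts }), the adjacency rule, and both function names are hypothetical, and parseSince only understands day values such as 14d.

```javascript
// Parse a window such as "14d" or "30d" into milliseconds.
// Assumption: only whole-day values are supported in this sketch.
function parseSince(spec) {
  const match = /^(\d+)d$/.exec(spec);
  if (!match) throw new Error(`unsupported --since value: ${spec}`);
  return Number(match[1]) * 24 * 60 * 60 * 1000;
}

// Collect "agent stopped -> user followed up" pairs.
// messages: chronological array of { role: "assistant" | "user", text, ts }
// where ts is a millisecond timestamp (hypothetical shape).
function minePairs(messages, since, now) {
  const cutoff = now - parseSince(since);
  const pairs = [];
  for (let i = 0; i + 1 < messages.length; i++) {
    const stop = messages[i];
    const followUp = messages[i + 1];
    const isPair = stop.role === "assistant" && followUp.role === "user";
    if (isPair && stop.ts >= cutoff) {
      pairs.push({ agentStop: stop.text, userFollowUp: followUp.text });
    }
  }
  return pairs;
}
```

Passing `now` explicitly rather than calling Date.now() inside keeps the function deterministic, which helps when previewing results with --dry-run style tooling or in unit tests.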
Guarantees:
- Never commits to this repo / upstream — learning is user-side only.
- Dataset stays private: mined data lands in .dataset/ (git-ignored) and, with --push-hf, a private HuggingFace dataset repo ($SUPERVISOR_HF_DATASET, default dzianisv/agent-supervisor-stops). Never in git.
- A .bak of the prior patterns is kept; revert with the printed command.
First --push-hf run needs hf auth login and pip install -U huggingface_hub.
core/patterns.mjs deep-merges, later overriding earlier:
- shipped core/patterns.json (read-only defaults)
- ~/.config/agents-supervisor/patterns.json (user, written by train)
- <project>/.supervisor/patterns.json (project, --scope=project)
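The layering above is a standard deep merge in which later layers win. A minimal sketch of that technique follows; it is not the code in core/patterns.mjs, the field names in the example are invented for illustration, and replacing arrays wholesale (rather than concatenating them) is an assumption.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Recursively merge `override` into `base` without mutating either.
// Objects merge key by key; arrays and scalars are replaced wholesale.
function deepMerge(base, override) {
  if (!isPlainObject(base) || !isPlainObject(override)) return override;
  const out = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const hasKey = Object.prototype.hasOwnProperty.call(base, key);
    out[key] = hasKey ? deepMerge(base[key], value) : value;
  }
  return out;
}

// Merge layers in order: shipped, then user, then project.
// Missing layers (for example a file that does not exist) can be passed as null.
function mergeLayers(...layers) {
  return layers
    .filter(isPlainObject)
    .reduce((merged, layer) => deepMerge(merged, layer), {});
}

// Illustrative data only; the real patterns.json schema may differ.
const shipped = { antipatterns: { permission_seeking: { weight: 1, template: "Just do it." } } };
const user = { antipatterns: { permission_seeking: { weight: 2 } } };
const project = { maxAttempts: 5 };

console.log(JSON.stringify(mergeLayers(shipped, user, project)));
```

In this example the user layer changes only the weight, so the shipped template survives the merge, which is the practical benefit of deep-merging over replacing whole files.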
npm test # unit tests (core + hook + train derivation), node:test
npm run test:cc # Claude Code end-to-end (real claude -p, no mocks)
npm run eval # OpenCode judge eval (promptfoo)
MIT © dzianisv