Autonomous agent framework. Give it a goal; it decomposes, executes, learns, and reports. No hand-holding required.
Works standalone or alongside OpenClaw, Telegram, Slack, or any other interface you wire in.
Status: personal infrastructure / active development. This is a working system, not a polished library. APIs change, features land fast, and some edges are still sharp. It runs continuously on a headless Ubuntu box and gets iterated on daily. If you're reading this and it seems useful, it probably is — just go in with eyes open.
- Python 3.10+ (tested on 3.12–3.14)
- Linux or macOS (Linux preferred for always-on deployments)
- At least one LLM API key: `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY`
- Optional: `claude` CLI (Claude Code), `gh` CLI (GitHub), Telegram bot token
- Autonomous loops: goal → plan → execute steps → done|stuck, with stuck detection, roadblock recovery, and progress logging
- Multi-agent delegation: Director plans, Workers execute (research / build / ops), Inspector validates — no Worker grades its own output
- Persistent memory: lessons extracted from every run, injected into future prompts; tiered decay (short/medium/long); spaced repetition
- Self-improvement: meta-evolver reviews failure patterns every 10 minutes and proposes prompt/guardrail/skill changes
- Skill library: reusable step patterns extracted from successful runs; scored, tested, and promoted automatically
- Interface-agnostic: Telegram, Slack, CLI, or call `run_agent_loop()` directly from Python — same behavior regardless of how a goal arrives
- Token-efficient research: pre-fetch layer intercepts URLs before LLM calls, uses Jina Reader for clean markdown, authenticated X/Twitter access via CLI
- Cost reporting: summarize `memory/step-costs.jsonl` into grouped latency/token/cost tables instead of eyeballing raw JSONL
```mermaid
flowchart TD
    IN["Goal arrives\nTelegram / Slack / CLI / Python API"]
    IN --> H["handle.py\nNOW / AGENDA classification"]
    H --> NOW["NOW lane\n1-shot response"]
    H --> AL["agent_loop.py\nAutonomous executor"]
    subgraph LOOP["Agent Loop"]
        AL --> RC["Rules check\nrules.jsonl — zero-cost match"]
        RC -->|hit| STEPS
        RC -->|miss| DC["_decompose()\nLLM step planner"]
        ID["poe_self.py\nidentity block"] -->|prepend| DC
        DC --> STEPS["Steps queue\nparallel fan-out where independent"]
        STEPS --> WK["Workers\nresearch · build · ops · reporter"]
        WK --> INS["Inspector\nvalidates output"]
        INS -->|pass| REC["Record outcome\noutcomes.jsonl"]
        INS -->|fail| WK
        REC --> CKP["checkpoint.py\nper-step JSON state"]
        CKP --> MEM["Memory\nlessons · rules · decisions · skills"]
    end
    MEM --> EV["Evolver\nmeta-improvement every ~10 heartbeats"]
    EV -->|new skill| MEM
    EV -->|guardrail| CON["constraint.py\nHITL gating"]
    LOOP --> CEO["poe.py — CEO layer\ndistil + report"]
    NOW --> CEO
    CEO --> OUT["Telegram / Slack / stdout"]
    HB["Heartbeat\nsheriff · mission drain\nmorning briefing"] -.->|monitors| LOOP
```
All adapters share one interface: `LLMAdapter.complete(messages, tools) → LLMResponse`
| Backend | When active |
|---|---|
| `AnthropicSDKAdapter` | `ANTHROPIC_API_KEY` set |
| `ClaudeSubprocessAdapter` | `claude` binary in PATH (Claude Code OAuth) |
| `OpenRouterAdapter` | `OPENROUTER_API_KEY` set |
| `OpenAIAdapter` | `OPENAI_API_KEY` set |
| `CodexCLIAdapter` | `codex` binary available (ChatGPT OAuth) |
`build_adapter("auto")` selects the best available backend. `MODEL_CHEAP`/`MODEL_MID`/`MODEL_POWER` abstract model names across backends.
```bash
# 1. Clone and install
git clone https://github.com/slycrel/openclaw-orchestration.git
cd openclaw-orchestration
pip install -e ".[dev]"

# 2. Set your API key (at minimum, one of these)
export ANTHROPIC_API_KEY=sk-ant-...
# or: export OPENROUTER_API_KEY=...
# or: export OPENAI_API_KEY=...

# 3. Bootstrap workspace (creates ~/.poe/workspace/, systemd services)
python3 src/cli.py poe-bootstrap install

# 4. Run your first goal
PYTHONPATH=src python3 -m handle "what time is it in Tokyo?"          # quick answer (NOW lane)
PYTHONPATH=src python3 -m handle "research the top 3 LLM frameworks"  # multi-step (AGENDA lane)

# Or use the autonomous loop directly
python3 src/agent_loop.py "research winning polymarket strategies"
```

No OpenClaw installation required. Set `POE_WORKSPACE` to any directory to use a custom workspace root.
```bash
# Telegram listener (requires TELEGRAM_BOT_TOKEN)
python3 src/telegram_listener.py          # run forever
python3 src/telegram_listener.py --once   # process pending and exit

# System health
python3 src/cli.py sheriff health
python3 src/cli.py poe-observe

# Memory status
python3 src/cli.py memory context
python3 src/cli.py poe-memory status
```

Structured logging via stdlib `logging`. All loggers live under the `poe.*` namespace.
```bash
# Quiet (default) — only warnings and errors
python3 src/agent_loop.py "your goal"

# Step lifecycle, timing, tokens, block reasons
POE_LOG_LEVEL=INFO python3 src/agent_loop.py "your goal"

# Full detail — constraint checks, adapter type, content lengths
POE_LOG_LEVEL=DEBUG python3 src/agent_loop.py "your goal"
```

The `--verbose` CLI flag is equivalent to `POE_LOG_LEVEL=DEBUG`. Output goes to stderr so it doesn't interfere with result output.
```bash
# Summarize step telemetry
poe-tool-costs --metrics memory/step-costs.jsonl

# Write markdown + JSON reports
poe-tool-costs \
  --metrics memory/step-costs.jsonl \
  --write-report output/benchmarks/tool-cost-report-live.md \
  --write-json output/benchmarks/tool-cost-report-live.json

# Run fixture benchmarks
poe-tool-costs --run-fixtures --fixtures benchmarks/fixture-workloads.json --output-dir output/benchmarks

# Backend benchmarks (memory append/read, filtered lookup, concurrent contention)
poe-benchmark --slice memory-backend --output-dir output/benchmarks
poe-benchmark --slice memory-backend-filtered-lookup --output-dir output/benchmarks
poe-benchmark --slice memory-backend-append-contention --output-dir output/benchmarks --workers 2 4
```

Reports include: task class grouping, ok/error split, median/p95 latency and tokens, total cost, and contention analysis.
| Logger | What it covers |
|---|---|
| `poe.loop` | Step start/done/blocked, adapter timing, USD cost per step, loop lifecycle |
| `poe.planner` | Multi-plan decomposition, dependency graph, execution levels |
| `poe.persona` | Persona spawn, adapter resolution, spawn completion |
| `poe.evolver` | Meta-evolution cycles, suggestion apply, skill synthesis |
| `poe.introspect` | Failure diagnosis, lens analysis, recovery planning |
Deploy `deploy/poe-telegram.service` to listen 24/7:

```bash
sudo cp deploy/poe-telegram.service /etc/systemd/system/
sudo systemctl enable --now poe-telegram
```

Slash commands:
| Command | What it does |
|---|---|
| `/status` | System health, heartbeat, stuck projects |
| `/research <goal or URL>` | Autonomous research loop with live step progress |
| `/director <directive>` | Full Director/Worker pipeline |
| `/build <goal>` | Build worker |
| `/ops <command>` | Ops worker |
| `/map` | Goal relationship map |
| `/ancestry <project>` | Goal ancestry chain |
| `/stop` | Stop running loop |
| `/help` | Command list |
Natural language is auto-routed (NOW = fast, AGENDA = multi-step loop). Messages during an active loop are routed as interrupts.
Mirror of the Telegram interface using Socket Mode (no public endpoint):

```bash
pip install slack-sdk
export SLACK_BOT_TOKEN=xoxb-... SLACK_APP_TOKEN=xapp-...
python3 src/slack_listener.py
```

Or drive the loop directly from Python:

```python
from agent_loop import run_agent_loop

result = run_agent_loop(
    "research the three main benefits of prediction markets",
    project="polymarket-research",
    step_callback=lambda n, text, summary, status: print(f"step {n}: {summary}"),
)
print(result.summary())
```

```bash
# Heartbeat (health + meta-evolver, 60s interval)
sudo cp deploy/poe-heartbeat.service /etc/systemd/system/
sudo systemctl enable --now poe-heartbeat

# Inspector (quality validation, runs every 20 heartbeat ticks)
sudo cp deploy/poe-inspector.service /etc/systemd/system/
sudo systemctl enable --now poe-inspector
```

Heartbeat recovery tiers:
- Scripted: disk warn, API key missing, gateway down → log suggestion
- LLM diagnosis: stuck projects → cheap LLM recovery action
- Telegram escalation: critical health → alert Jeremy
Credentials are read in priority order:

1. Environment variables: `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `TELEGRAM_BOT_TOKEN`
2. `$POE_ENV_FILE` or `<workspace>/secrets/.env`
3. `~/.openclaw/openclaw.json` (OpenClaw config, if present)
Workspace root resolves as: `POE_WORKSPACE` → `OPENCLAW_WORKSPACE` → `WORKSPACE_ROOT` → `~/.poe/workspace`
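That fallback chain is straightforward to sketch (a guess at the behavior of `config.py`, not its actual code):

```python
import os
from pathlib import Path

def resolve_workspace_root() -> Path:
    """First environment variable that is set wins; default is ~/.poe/workspace."""
    for var in ("POE_WORKSPACE", "OPENCLAW_WORKSPACE", "WORKSPACE_ROOT"):
        value = os.environ.get(var)
        if value:
            return Path(value).expanduser()
    return Path("~/.poe/workspace").expanduser()
```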
OpenClaw is fully optional. The system runs standalone on any machine with Python 3.10+ and a Claude/OpenAI API key.
The workspace (~/.poe/workspace/ by default) holds all runtime state, learning data, and self-evolved artifacts. It is not checked into git — the repo ships defaults, and the workspace accumulates improvements over time.
```
~/.poe/workspace/
├── memory/       # Outcomes, lessons, knowledge nodes, captain's log, diagnoses
├── skills/       # Self-created/evolved skill .md files (override repo defaults)
├── personas/     # Self-created/evolved persona specs (override repo defaults)
├── playbook.md   # Director's operational wisdom (auto-maintained by evolver)
├── output/       # Run artifacts, operator status, research outputs
├── projects/     # Per-project NEXT.md, decisions, risks
├── config.yml    # Workspace-level config overrides
└── secrets/
    └── .env      # API keys (auto-discovered by config.py)
```
Resolution order for skills and personas: workspace → repo. When the system evolves a better version of a shipped skill or persona, the workspace version wins. Repo versions are the shipped defaults.
Two-tier YAML config (like git's `~/.gitconfig` vs `.git/config`):
| File | Scope | What goes here |
|---|---|---|
| `~/.poe/config.yml` | User-level | API keys, model prefs, yolo mode, notifications |
| `~/.poe/workspace/config.yml` | Workspace-level | Evolver, inspector thresholds, constraint settings |
Workspace inherits from user; workspace keys override. Access in code: `from config import get; get("inspector.breach_threshold", 0.30)`
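The override semantics can be sketched like this — `merge_config` is a hypothetical helper, and this `get` takes an explicit dict where the real one reads the resolved config itself:

```python
from typing import Any

def merge_config(user: dict, workspace: dict) -> dict:
    """Workspace keys override user keys; nested dicts merge recursively."""
    merged = dict(user)
    for key, value in workspace.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

def get(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Dotted-path lookup, e.g. get(cfg, 'inspector.breach_threshold', 0.30)."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```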
| Module | What it does |
|---|---|
| `agent_loop.py` | Autonomous loop: decompose goal → execute steps → done\|stuck |
| `llm.py` | Platform-agnostic LLM adapters (Anthropic, OpenRouter, OpenAI, subprocess, Codex) |
| `web_fetch.py` | URL pre-fetch: Jina Reader clean markdown, X/Twitter auth, t.co resolution |
| `memory.py` | Outcome recording, lesson extraction, tiered decay, Reflexion injection |
| `skills.py` | Reusable step patterns: extract, score, test-gate, promote |
| `persona.py` | Composable agent identities (researcher, builder, ops, companion, psyche-researcher) |
| `hooks.py` | Pluggable callbacks at step/loop/mission level |
| `poe.py` | CEO layer: distill active missions → executive summary |
| `handle.py` | Entry point: classify intent → route → execute → respond |
| `telegram_listener.py` | Telegram polling, slash commands, ack+edit UX, live step progress |
| `slack_listener.py` | Slack Socket Mode, mirrors Telegram commands |
| `director.py` | Director: plan → delegate to workers → review output |
| `workers.py` | Worker agents: research, build, ops, general |
| `sheriff.py` | Loop Sheriff: detect stuck loops, system health checks |
| `heartbeat.py` | Periodic health + tiered recovery + Telegram escalation |
| `evolver.py` | Meta-evolver: analyze outcomes → propose improvements |
| `inspector.py` | Quality agent: friction detection, alignment scoring, evolver feed |
| `mission.py` | Mission hierarchy: Mission → Milestone → Feature → Worker Session |
| `ancestry.py` | Goal ancestry chain: parent_id, ancestry.json, prompt injection |
| `metrics.py` | Success rate, cost, token usage per task type; pass@k / pass^k |
| `constraint.py` | Pre-execution action validator: 5 pattern groups (destructive/secret/path/network/exec), HIGH blocks, MEDIUM warns, pluggable registry |
| `security.py` | Prompt injection detection on external content (pre-loop scanning) |
| `config.py` | Workspace resolution, credential discovery, env var priority |
| `bootstrap.py` | `poe-bootstrap install`: dirs, services, smoke test |
| `orch.py` | Core file-first state: NEXT.md tasks, run records, project lifecycle |
| `poe_self.py` | Persistent identity block: `load_poe_identity()`, `with_poe_identity()` — injected into every decompose call |
| `checkpoint.py` | Per-step loop checkpointing: `write_checkpoint()`, `resume_from()`, `delete_checkpoint()` — enables loop resume |
| `claim_verifier.py` | Hallucination detection: file-path and Python symbol existence checking on step results; `annotate_result()` surfaces NOT_FOUND / SYMBOL_CLAIMS_NOT_FOUND |
```
Run completes
→ memory.py records outcome + extracts 1-3 lessons
→ tiered JSONL: short (session) / medium (weeks) / long (months)
→ decay applied daily; lessons promoted on score + reuse threshold
```
Promotion cycle (three-tier):

```
→ observe_pattern(lesson) → hypothesis (1 confirmation)
→ observe_pattern(lesson) again → StandingRule promoted (2+ confirmations)
→ contradict_pattern(lesson) → demotes hypothesis if contradictions > confirmations
→ inject_standing_rules() → applied unconditionally to every decompose call
```
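A toy version of the confirmation/contradiction counters (in-memory only; the real system persists patterns to JSONL and attaches scores):

```python
from collections import defaultdict

# Hypothetical store: lesson text -> confirmation/contradiction counts.
_patterns: dict[str, dict] = defaultdict(lambda: {"confirm": 0, "contradict": 0})

def observe_pattern(lesson: str) -> str:
    """One confirmation -> hypothesis; two or more (net positive) -> standing rule."""
    p = _patterns[lesson]
    p["confirm"] += 1
    if p["confirm"] >= 2 and p["confirm"] > p["contradict"]:
        return "standing_rule"
    return "hypothesis"

def contradict_pattern(lesson: str) -> str:
    """Demote once contradictions outnumber confirmations."""
    p = _patterns[lesson]
    p["contradict"] += 1
    if p["contradict"] > p["confirm"]:
        return "demoted"
    return "standing_rule" if p["confirm"] >= 2 else "hypothesis"
```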
Decision journal:

```
→ record_decision(decision, rationale, alternatives) on architectural choices
→ search_decisions(goal) → TF-IDF ranked relevant priors injected before planning
→ prevents re-litigating settled decisions
```
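A dependency-free sketch of TF-IDF ranking over decision records (the real `search_decisions` likely differs in tokenization and storage):

```python
import math
from collections import Counter

def _tf_idf_vectors(docs: list[str]) -> list[dict]:
    """Naive whitespace tokenization; returns one term->weight dict per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        {t: (c / len(tokens)) * math.log((1 + n) / (1 + df[t]))
         for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]

def search_decisions(goal: str, decisions: list[str]) -> list[str]:
    """Rank prior decision texts by cosine similarity to the goal."""
    vecs = _tf_idf_vectors([goal] + decisions)
    query, rest = vecs[0], vecs[1:]

    def cosine(a: dict, b: dict) -> float:
        num = sum(a[t] * b.get(t, 0.0) for t in a)
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    ranked = sorted(zip(decisions, rest), key=lambda p: cosine(query, p[1]), reverse=True)
    return [d for d, _ in ranked]
```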
Every 10 heartbeat ticks (~10 min):

```
→ evolver analyzes last 50 outcomes
→ identifies failure patterns
→ generates suggestions: prompt_tweak | new_guardrail | skill_pattern
```
Next run with similar task:

```
→ inject_standing_rules() — unconditional rules prepended
→ inject_decisions(goal) — relevant prior decisions appended
→ inject_tiered_lessons() — ranked lessons (long-tier first)
→ ancestry context loaded
→ router.py picks skills by predicted success probability (not just keyword match)
→ TF-IDF fallback when router not trained — relevance-ranked, not just keyword substring
```
Constraint harness (`constraint.py`) — fires before every step execution, no LLM round-trip required:

- Blocks destructive patterns (`rm -rf`, `DROP TABLE`, `format /`)
- Blocks secret exposure (`/etc/passwd`, `~/.ssh/`, env dumps)
- Blocks path escape (writes outside workspace)
- Warns on unsafe network ops and shell exec patterns
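The zero-LLM check boils down to regex matching against severity-tiered pattern groups. The patterns below are illustrative, not the shipped rule set:

```python
import re

# Illustrative patterns only; the real constraint.py ships 5 pattern groups.
HIGH = [r"\brm\s+-rf\b", r"\bDROP\s+TABLE\b", r"/etc/passwd", r"~/\.ssh/"]
MEDIUM = [r"\bcurl\b.*\|\s*(?:ba)?sh\b", r"\beval\b"]

def check_action(command: str) -> str:
    """Return 'block', 'warn', or 'ok' — pure regex, no LLM round-trip."""
    for pattern in HIGH:
        if re.search(pattern, command, re.IGNORECASE):
            return "block"
    for pattern in MEDIUM:
        if re.search(pattern, command, re.IGNORECASE):
            return "warn"
    return "ok"
```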
Skill circuit breaker — distinguishes a network blip from a broken skill:
- 1-2 failures → circuit stays CLOSED, no action (blip tolerance)
- 3+ consecutive failures → circuit OPEN → skill queued for LLM rewrite
- After rewrite → HALF_OPEN (probationary, needs 2 successes to close)
- Failure during HALF_OPEN → immediately back to OPEN
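The state machine above, as a compact sketch (thresholds of 3 failures and 2 probation successes taken from the list; class and method names are hypothetical):

```python
class SkillCircuitBreaker:
    """CLOSED -> OPEN (queued for rewrite) -> HALF_OPEN (probation) -> CLOSED."""

    def __init__(self, open_after: int = 3, close_after: int = 2):
        self.state = "CLOSED"
        self.failures = 0
        self.probation_successes = 0
        self.open_after = open_after
        self.close_after = close_after

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self.state = "OPEN"          # failure on probation reopens immediately
            self.probation_successes = 0
            return
        self.failures += 1
        if self.failures >= self.open_after:
            self.state = "OPEN"          # skill would be queued for LLM rewrite

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.probation_successes += 1
            if self.probation_successes >= self.close_after:
                self.state = "CLOSED"
                self.failures = 0
        else:
            self.failures = 0            # any success resets the blip counter

    def mark_rewritten(self) -> None:
        if self.state == "OPEN":
            self.state = "HALF_OPEN"
            self.probation_successes = 0
```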
Prompt injection detection (`security.py`) — scans external content before it enters the agent context.
```bash
# Run tests (4278 passing, all LLM calls mocked)
python3 -m pytest tests/ -q

# Dry-run (no LLM calls)
python3 src/agent_loop.py "test goal" --dry-run --verbose
python3 src/cli.py poe-heartbeat --dry-run
python3 src/cli.py poe-eval --dry-run
```

- OpenClaw: reads `~/.openclaw/openclaw.json` for credentials/tokens; can coordinate via the OpenClaw gateway (`src/gateway.py`)
- Telegram: first-class interface via Bot API polling
- Slack: Socket Mode, no public endpoint needed
- macOS + Linux: `bootstrap.py` generates systemd (Linux) or launchd (macOS) service files
- Docker: `Dockerfile` + `docker-compose.yml` for isolated deployment