Poe Orchestration

Autonomous agent framework. Give it a goal; it decomposes, executes, learns, and reports. No hand-holding required.

Works standalone or alongside OpenClaw, Telegram, Slack, or any other interface you wire in.

Status: personal infrastructure / active development. This is a working system, not a polished library. APIs change, features are added fast, and some things are still sharp edges. It runs continuously on a headless Ubuntu box and gets iterated on daily. If you're reading this and it seems useful, it probably is — just go in eyes open.

Prerequisites

Python 3.10+ (tested on 3.12–3.14)
Linux or macOS (Linux preferred for always-on deployments)
At least one LLM API key: ANTHROPIC_API_KEY, OPENROUTER_API_KEY, or OPENAI_API_KEY
Optional: claude CLI (Claude Code), gh CLI (GitHub), Telegram bot token

What it does

Autonomous loops: goal → plan → execute steps → done|stuck, with stuck detection, roadblock recovery, and progress logging
Multi-agent delegation: Director plans, Workers execute (research / build / ops), Inspector validates — no Worker grades its own output
Persistent memory: lessons extracted from every run, injected into future prompts; tiered decay (short/medium/long); spaced repetition
Self-improvement: meta-evolver reviews failure patterns every 10 minutes and proposes prompt/guardrail/skill changes
Skill library: reusable step patterns extracted from successful runs; scored, tested, and promoted automatically
Interface-agnostic: Telegram, Slack, CLI, or call run_agent_loop() directly from Python — same behavior regardless of how a goal arrives
Token-efficient research: pre-fetch layer intercepts URLs before LLM calls, uses Jina Reader for clean markdown, authenticated X/Twitter access via CLI
Cost reporting: summarize memory/step-costs.jsonl into grouped latency/token/cost tables instead of eyeballing raw JSONL
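
To make "stuck detection" concrete, here is a toy sketch of the goal → plan → execute → done|stuck cycle. It assumes a caller-supplied planner and step executor (hypothetical stand-ins for `_decompose()` and the Workers). It also assumes a simple rule: a step counts as stuck once it has failed on every allowed attempt. The real agent_loop.py layers roadblock recovery, parallel fan-out, and progress logging on top of this.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    status: str                                   # "done" or "stuck"
    completed: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def toy_agent_loop(
    goal: str,
    decompose: Callable[[str], list[str]],        # hypothetical planner
    execute_step: Callable[[str], bool],          # hypothetical worker call
    max_retries: int = 2,
) -> LoopResult:
    """Plan steps, run each one, and report 'stuck' when a step keeps failing."""
    result = LoopResult(status="done")
    for step in decompose(goal):
        for attempt in range(1 + max_retries):
            ok = execute_step(step)
            result.log.append(f"{step}: attempt {attempt + 1} {'ok' if ok else 'failed'}")
            if ok:
                result.completed.append(step)
                break
        else:
            # Every attempt failed: stop and surface the blockage instead of spinning.
            result.status = "stuck"
            return result
    return result
```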

Architecture

flowchart TD
    IN["Goal arrives\nTelegram / Slack / CLI / Python API"]
    IN --> H[handle.py\nNOW / AGENDA classification]

    H --> NOW["NOW lane\n1-shot response"]
    H --> AL[agent_loop.py\nAutonomous executor]

    subgraph LOOP["Agent Loop"]
        AL --> RC["Rules check\nrules.jsonl — zero-cost match"]
        RC -->|hit| STEPS
        RC -->|miss| DC["_decompose()\nLLM step planner"]
        ID["poe_self.py\nidentity block"] -->|prepend| DC
        DC --> STEPS["Steps queue\nparallel fan-out where independent"]

        STEPS --> WK["Workers\nresearch · build · ops · reporter"]
        WK --> INS[Inspector\nvalidates output]
        INS -->|pass| REC["Record outcome\noutcomes.jsonl"]
        INS -->|fail| WK
        REC --> CKP["checkpoint.py\nper-step JSON state"]
        CKP --> MEM["Memory\nlessons · rules · decisions · skills"]
    end

    MEM --> EV["Evolver\nmeta-improvement every ~10 heartbeats"]
    EV -->|new skill| MEM
    EV -->|guardrail| CON[constraint.py\nHITL gating]

    LOOP --> CEO["poe.py — CEO layer\ndistil + report"]
    NOW --> CEO
    CEO --> OUT["Telegram / Slack / stdout"]

    HB["Heartbeat\nsheriff · mission drain\nmorning briefing"] -.->|monitors| LOOP

LLM backends (`llm.py`)

All share one interface: LLMAdapter.complete(messages, tools) → LLMResponse

Backend	When active
`AnthropicSDKAdapter`	`ANTHROPIC_API_KEY` set
`ClaudeSubprocessAdapter`	`claude` binary in PATH (Claude Code OAuth)
`OpenRouterAdapter`	`OPENROUTER_API_KEY` set
`OpenAIAdapter`	`OPENAI_API_KEY` set
`CodexCLIAdapter`	`codex` binary available (ChatGPT OAuth)

build_adapter("auto") selects the best available backend. MODEL_CHEAP/MID/POWER abstract model names across backends.

Quickstart

# 1. Clone and install
git clone https://github.com/slycrel/openclaw-orchestration.git
cd openclaw-orchestration
pip install -e ".[dev]"

# 2. Set your API key (at minimum, one of these)
export ANTHROPIC_API_KEY=sk-ant-...
# or: export OPENROUTER_API_KEY=...
# or: export OPENAI_API_KEY=...

# 3. Bootstrap workspace (creates ~/.poe/workspace/ and systemd or launchd service files)
python3 src/cli.py poe-bootstrap install

# 4. Run your first goal
PYTHONPATH=src python3 -m handle "what time is it in Tokyo?"          # quick answer (NOW lane)
PYTHONPATH=src python3 -m handle "research the top 3 LLM frameworks"  # multi-step (AGENDA lane)

# Or use the autonomous loop directly
python3 src/agent_loop.py "research winning polymarket strategies"

No OpenClaw installation required. Set POE_WORKSPACE to any directory to use a custom workspace root.

More commands

# Telegram listener (requires TELEGRAM_BOT_TOKEN)
python3 src/telegram_listener.py           # run forever
python3 src/telegram_listener.py --once    # process pending and exit

# System health
python3 src/cli.py sheriff health
python3 src/cli.py poe-observe

# Memory status
python3 src/cli.py memory context
python3 src/cli.py poe-memory status

Logging

Structured logging via stdlib logging. All loggers live under the poe.* namespace.

# Quiet (default) — only warnings and errors
python3 src/agent_loop.py "your goal"

# Step lifecycle, timing, tokens, block reasons
POE_LOG_LEVEL=INFO python3 src/agent_loop.py "your goal"

# Full detail — constraint checks, adapter type, content lengths
POE_LOG_LEVEL=DEBUG python3 src/agent_loop.py "your goal"

The --verbose CLI flag is equivalent to POE_LOG_LEVEL=DEBUG. Output goes to stderr so it doesn't interfere with result output.
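
The sketch below reproduces the logging behavior described above. It attaches one handler to the `poe` parent logger, defaults to a quiet WARNING level, honors `POE_LOG_LEVEL` as an override, and writes to stderr. `configure_poe_logging` is a hypothetical name, and the project's actual setup code may differ.

```python
import logging
import os
import sys


def configure_poe_logging(verbose: bool = False) -> logging.Logger:
    """Route every poe.* logger through a single stderr handler."""
    # --verbose behaves like POE_LOG_LEVEL=DEBUG; otherwise default to WARNING.
    level_name = "DEBUG" if verbose else os.environ.get("POE_LOG_LEVEL", "WARNING").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):               # ignore unknown level names
        level = logging.WARNING
    root = logging.getLogger("poe")              # parent of poe.loop, poe.planner, ...
    root.setLevel(level)
    if not root.handlers:                        # avoid duplicate handlers on re-init
        handler = logging.StreamHandler(sys.stderr)  # stderr keeps stdout clean for results
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
        root.addHandler(handler)
    return root


# Child loggers set no level of their own, so they inherit it from "poe".
log = logging.getLogger("poe.loop")
```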

Logger	What it covers
`poe.loop`	Step start/done/blocked, adapter timing, USD cost per step, loop lifecycle
`poe.planner`	Multi-plan decomposition, dependency graph, execution levels
`poe.persona`	Persona spawn, adapter resolution, spawn completion
`poe.evolver`	Meta-evolution cycles, suggestion apply, skill synthesis
`poe.introspect`	Failure diagnosis, lens analysis, recovery planning

Benchmarking and cost reporting

# Summarize step telemetry
poe-tool-costs --metrics memory/step-costs.jsonl

# Write markdown + JSON reports
poe-tool-costs \
  --metrics memory/step-costs.jsonl \
  --write-report output/benchmarks/tool-cost-report-live.md \
  --write-json output/benchmarks/tool-cost-report-live.json

# Run fixture benchmarks
poe-tool-costs --run-fixtures --fixtures benchmarks/fixture-workloads.json --output-dir output/benchmarks

# Backend benchmarks (memory append/read, filtered lookup, concurrent contention)
poe-benchmark --slice memory-backend --output-dir output/benchmarks
poe-benchmark --slice memory-backend-filtered-lookup --output-dir output/benchmarks
poe-benchmark --slice memory-backend-append-contention --output-dir output/benchmarks --workers 2 4

Reports include: task class grouping, ok/error split, median/p95 latency and tokens, total cost, and contention analysis.
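
The following stdlib-only sketch shows what the grouped report computes. It groups step records by task class and derives the ok/error split, median and nearest-rank p95 latency, median tokens, and total cost. The record fields (`task_class`, `ok`, `latency_s`, `tokens`, `cost_usd`) are assumptions about step-costs.jsonl, not its documented schema; use `poe-tool-costs` for real reports.

```python
import json
import math
import statistics
from collections import defaultdict


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))   # 1-based rank
    return ordered[rank - 1]


def summarize(lines: list[str]) -> dict[str, dict]:
    """Group JSONL step records by task class (field names are assumptions)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        if line.strip():                     # skip blank lines
            rec = json.loads(line)
            groups[rec.get("task_class", "unknown")].append(rec)
    report = {}
    for cls, recs in sorted(groups.items()):
        latencies = [r["latency_s"] for r in recs]
        tokens = [r["tokens"] for r in recs]
        report[cls] = {
            "ok": sum(1 for r in recs if r.get("ok")),
            "error": sum(1 for r in recs if not r.get("ok")),
            "median_latency_s": statistics.median(latencies),
            "p95_latency_s": p95(latencies),
            "median_tokens": statistics.median(tokens),
            "total_cost_usd": round(sum(r.get("cost_usd", 0.0) for r in recs), 6),
        }
    return report
```

Usage, under the same schema assumption: `summarize(open("memory/step-costs.jsonl").read().splitlines())`.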

Interfaces

Telegram

Deploy deploy/poe-telegram.service to listen 24/7:

sudo cp deploy/poe-telegram.service /etc/systemd/system/
sudo systemctl enable --now poe-telegram

Slash commands:

Command	What it does
`/status`	System health, heartbeat, stuck projects
`/research <goal or URL>`	Autonomous research loop with live step progress
`/director <directive>`	Full Director/Worker pipeline
`/build <goal>`	Build worker
`/ops <command>`	Ops worker
`/map`	Goal relationship map
`/ancestry <project>`	Goal ancestry chain
`/stop`	Stop running loop
`/help`	Command list

Natural language is auto-routed (NOW = fast, AGENDA = multi-step loop). Messages during an active loop are routed as interrupts.

Slack

Mirror of the Telegram interface using Socket Mode (no public endpoint):

pip install slack-sdk
export SLACK_BOT_TOKEN=xoxb-... SLACK_APP_TOKEN=xapp-...
python3 src/slack_listener.py

Python API

# Assumes src/ is on the import path (e.g. run with PYTHONPATH=src)
from agent_loop import run_agent_loop

result = run_agent_loop(
    "research the three main benefits of prediction markets",
    project="polymarket-research",
    step_callback=lambda n, text, summary, status: print(f"step {n}: {summary}"),
)
print(result.summary())

Always-on services

# Heartbeat (health + meta-evolver, 60s interval)
sudo cp deploy/poe-heartbeat.service /etc/systemd/system/
sudo systemctl enable --now poe-heartbeat

# Inspector (quality validation, runs every 20 heartbeat ticks)
sudo cp deploy/poe-inspector.service /etc/systemd/system/
sudo systemctl enable --now poe-inspector

Heartbeat recovery tiers:

Scripted: disk warn, API key missing, gateway down → log suggestion
LLM diagnosis: stuck projects → cheap LLM recovery action
Telegram escalation: critical health → alert Jeremy
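
Here is a sketch of the tier routing, assuming each health check emits a finding tagged with a kind string. The kind names, the `Finding` type, and the callback parameters are hypothetical; they only illustrate the "cheapest tier that can handle it" ordering.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    kind: str        # e.g. "disk_warn", "stuck_project", "critical_health"
    detail: str


# Assumed mapping of finding kinds to tiers (names are illustrative).
SCRIPTED = {"disk_warn", "api_key_missing", "gateway_down"}
LLM_DIAGNOSIS = {"stuck_project"}
ESCALATE = {"critical_health"}


def route_finding(
    f: Finding,
    log: Callable[[str], None],
    diagnose: Callable[[str], str],   # stand-in for a cheap LLM call
    alert: Callable[[str], None],     # stand-in for a Telegram message
) -> str:
    """Send a finding to the tier that should handle it; return the tier used."""
    if f.kind in ESCALATE:
        alert(f"CRITICAL: {f.detail}")                     # tier 3: human alert
        return "escalate"
    if f.kind in LLM_DIAGNOSIS:
        log(f"suggested recovery: {diagnose(f.detail)}")   # tier 2: cheap LLM
        return "llm"
    if f.kind in SCRIPTED:
        log(f"suggestion: check {f.kind} ({f.detail})")    # tier 1: no LLM needed
        return "scripted"
    return "ignored"
```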

Configuration

Credentials are read in priority order:

Environment variables: ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY, TELEGRAM_BOT_TOKEN
$POE_ENV_FILE or <workspace>/secrets/.env
~/.openclaw/openclaw.json (OpenClaw config, if present)

Workspace root resolves as: POE_WORKSPACE → OPENCLAW_WORKSPACE → WORKSPACE_ROOT → ~/.poe/workspace

OpenClaw is fully optional. The system runs standalone on any machine with Python 3.10+ and an API key for at least one supported backend (Anthropic, OpenRouter, or OpenAI).

Workspace layout

The workspace (~/.poe/workspace/ by default) holds all runtime state, learning data, and self-evolved artifacts. It is not checked into git — the repo ships defaults, and the workspace accumulates improvements over time.

~/.poe/workspace/
├── memory/           # Outcomes, lessons, knowledge nodes, captain's log, diagnoses
├── skills/           # Self-created/evolved skill .md files (override repo defaults)
├── personas/         # Self-created/evolved persona specs (override repo defaults)
├── playbook.md       # Director's operational wisdom (auto-maintained by evolver)
├── output/           # Run artifacts, operator status, research outputs
├── projects/         # Per-project NEXT.md, decisions, risks
├── config.yml        # Workspace-level config overrides
└── secrets/
    └── .env          # API keys (auto-discovered by config.py)

Resolution order for skills and personas: workspace → repo. When the system evolves a better version of a shipped skill or persona, the workspace version wins. Repo versions are the shipped defaults.

Two-tier YAML config (like git's ~/.gitconfig vs .git/config):

File	Scope	What goes here
`~/.poe/config.yml`	User-level	API keys, model prefs, yolo mode, notifications
`~/.poe/workspace/config.yml`	Workspace-level	Evolver, inspector thresholds, constraint settings

Workspace inherits from user; workspace keys override. Access in code: from config import get; get("inspector.breach_threshold", 0.30)
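
The sketch below shows the two-tier merge and dotted-key lookup. It assumes both YAML files have already been parsed into dicts (for example with PyYAML's `safe_load`). The real `config.get(key, default)` takes no config argument; this illustrative `get` takes the merged dict explicitly as its first argument instead.

```python
from typing import Any


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override's keys win, merging nested dicts recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def get(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up 'a.b.c' in nested dicts, returning default if any segment is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

Merging recursively rather than with `dict.update` lets a workspace file override one inspector threshold without discarding the user's other inspector settings.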

Source modules

Module	What it does
`agent_loop.py`	Autonomous loop: decompose goal → execute steps → done|stuck
`llm.py`	Platform-agnostic LLM adapters (Anthropic, OpenRouter, OpenAI, subprocess, Codex)
`web_fetch.py`	URL pre-fetch: Jina Reader clean markdown, X/Twitter auth, t.co resolution
`memory.py`	Outcome recording, lesson extraction, tiered decay, Reflexion injection
`skills.py`	Reusable step patterns: extract, score, test-gate, promote
`persona.py`	Composable agent identities (researcher, builder, ops, companion, psyche-researcher)
`hooks.py`	Pluggable callbacks at step/loop/mission level
`poe.py`	CEO layer: distill active missions → executive summary
`handle.py`	Entry point: classify intent → route → execute → respond
`telegram_listener.py`	Telegram polling, slash commands, ack+edit UX, live step progress
`slack_listener.py`	Slack Socket Mode, mirrors Telegram commands
`director.py`	Director: plan → delegate to workers → review output
`workers.py`	Worker agents: research, build, ops, general
`sheriff.py`	Loop Sheriff: detect stuck loops, system health checks
`heartbeat.py`	Periodic health + tiered recovery + Telegram escalation
`evolver.py`	Meta-evolver: analyze outcomes → propose improvements
`inspector.py`	Quality agent: friction detection, alignment scoring, evolver feed
`mission.py`	Mission hierarchy: Mission → Milestone → Feature → Worker Session
`ancestry.py`	Goal ancestry chain: parent_id, ancestry.json, prompt injection
`metrics.py`	Success rate, cost, token usage per task type; pass@k / pass^k
`constraint.py`	Pre-execution action validator: 5 pattern groups (destructive/secret/path/network/exec), HIGH blocks, MEDIUM warns, pluggable registry
`security.py`	Prompt injection detection on external content (pre-loop scanning)
`config.py`	Workspace resolution, credential discovery, env var priority
`bootstrap.py`	`poe-bootstrap install`: dirs, services, smoke test
`orch.py`	Core file-first state: NEXT.md tasks, run records, project lifecycle
`poe_self.py`	Persistent identity block: `load_poe_identity()`, `with_poe_identity()` — injected into every decompose call
`checkpoint.py`	Per-step loop checkpointing: `write_checkpoint()`, `resume_from()`, `delete_checkpoint()` — enables loop resume
`claim_verifier.py`	Hallucination detection: file-path and Python symbol existence checking on step results; `annotate_result()` surfaces `NOT_FOUND` / `SYMBOL_CLAIMS_NOT_FOUND`

Memory and self-improvement

Run completes
    → memory.py records outcome + extracts 1-3 lessons
    → tiered JSONL: short (session) / medium (weeks) / long (months)
    → decay applied daily; lessons promoted on score + reuse threshold

Promotion cycle (three-tier):
    → observe_pattern(lesson) → hypothesis (1 confirmation)
    → observe_pattern(lesson) again → StandingRule promoted (2+ confirmations)
    → contradict_pattern(lesson) → demotes hypothesis if contradictions > confirmations
    → inject_standing_rules() → applied unconditionally to every decompose call

Decision journal:
    → record_decision(decision, rationale, alternatives) on architectural choices
    → search_decisions(goal) → TF-IDF ranked relevant priors injected before planning
    → prevents re-litigating settled decisions

Every 10 heartbeat ticks (~10 min):
    → evolver analyzes last 50 outcomes
    → identifies failure patterns
    → generates suggestions: prompt_tweak | new_guardrail | skill_pattern

Next run with similar task:
    → inject_standing_rules() — unconditional rules prepended
    → inject_decisions(goal) — relevant prior decisions appended
    → inject_tiered_lessons() — ranked lessons (long-tier first)
    → ancestry context loaded
    → router.py picks skills by predicted success probability (not just keyword match)
    → TF-IDF fallback when router not trained — relevance-ranked, not just keyword substring
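
Here is a toy model of the three-tier promotion cycle, using the thresholds above. One confirmation makes a hypothesis and two promote it to a standing rule. A hypothesis is dropped once its contradictions exceed its confirmations. `PatternBook` is hypothetical. Its method names mirror the functions listed above, but the real memory.py functions are module-level, and their signatures and storage are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class PatternBook:
    confirmations: dict[str, int] = field(default_factory=dict)
    contradictions: dict[str, int] = field(default_factory=dict)
    standing_rules: list[str] = field(default_factory=list)

    def observe_pattern(self, lesson: str) -> str:
        n = self.confirmations.get(lesson, 0) + 1
        self.confirmations[lesson] = n
        if n >= 2 and lesson not in self.standing_rules:
            self.standing_rules.append(lesson)        # 2+ confirmations: promote
        return "rule" if lesson in self.standing_rules else "hypothesis"

    def contradict_pattern(self, lesson: str) -> None:
        self.contradictions[lesson] = self.contradictions.get(lesson, 0) + 1
        is_hypothesis = lesson in self.confirmations and lesson not in self.standing_rules
        if is_hypothesis and self.contradictions[lesson] > self.confirmations[lesson]:
            del self.confirmations[lesson]            # demote: drop the hypothesis
            self.contradictions.pop(lesson, None)

    def inject_standing_rules(self, prompt: str) -> str:
        """Prepend every standing rule, unconditionally, to a planning prompt."""
        if not self.standing_rules:
            return prompt
        rules = "\n".join(f"- {r}" for r in self.standing_rules)
        return f"Standing rules:\n{rules}\n\n{prompt}"
```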
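
Here is a stdlib-only sketch of TF-IDF ranking with cosine similarity, the technique named for `search_decisions()` and the router fallback. The tokenizer, the smoothed IDF formula, and the `rank_by_tfidf` name are assumptions; memory.py may weight terms differently.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_tfidf(query: str, docs: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score docs against the query with TF-IDF weights and cosine similarity."""
    doc_tokens = [_tokens(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in doc_tokens for t in set(toks))         # document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed IDF

    def vec(tokens: list[str]) -> dict[str, float]:
        tf = Counter(tokens)
        return {t: cnt * idf.get(t, 0.0) for t, cnt in tf.items()}      # unseen terms weigh 0

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(_tokens(query))
    scored = [(cosine(q, vec(toks)), doc) for toks, doc in zip(doc_tokens, docs)]
    scored = [s for s in scored if s[0] > 0]                             # drop irrelevant docs
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
```

Unlike a keyword-substring match, a rare shared term (here "sqlite") outweighs common words, which is why relevance ranking helps when the decision journal grows.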

Safety and reliability

Constraint harness (constraint.py) — fires before every step execution, no LLM round-trip required:

Blocks destructive patterns (rm -rf, DROP TABLE, format /)
Blocks secret exposure (/etc/passwd, ~/.ssh/, env dumps)
Blocks path escape (writes outside workspace)
Warns on unsafe network ops and shell exec patterns
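
Here is a sketch of a regex-based, LLM-free pre-execution check. The patterns are illustrative samples for four of the five groups. The path-escape group appears as a separate path check because it needs the workspace root. `Rule`, `check_action`, and `escapes_workspace` are hypothetical names, not constraint.py's registry.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Rule:
    group: str
    severity: str            # "HIGH" blocks, "MEDIUM" warns
    pattern: re.Pattern


# Illustrative patterns only; the real pattern groups are broader.
RULES = [
    Rule("destructive", "HIGH", re.compile(r"\brm\s+-rf\b|\bDROP\s+TABLE\b", re.I)),
    Rule("secret", "HIGH", re.compile(r"/etc/passwd|~/\.ssh/|\bprintenv\b")),
    Rule("network", "MEDIUM", re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b")),
    Rule("exec", "MEDIUM", re.compile(r"\beval\(|\bos\.system\(")),
]


def check_action(action: str) -> tuple[str, list[str]]:
    """Return ("block" | "warn" | "allow", matched groups) without any LLM call."""
    hits = [r for r in RULES if r.pattern.search(action)]
    groups = [r.group for r in hits]
    if any(r.severity == "HIGH" for r in hits):
        return "block", groups
    if hits:
        return "warn", groups
    return "allow", groups


def escapes_workspace(target: str, workspace: Path) -> bool:
    """Path-escape check: True if target resolves outside the workspace root."""
    resolved = (workspace / target).resolve()
    return not resolved.is_relative_to(workspace.resolve())
```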

Skill circuit breaker — distinguishes a network blip from a broken skill:

1-2 failures → circuit stays CLOSED, no action (blip tolerance)
3+ consecutive failures → circuit OPEN → skill queued for LLM rewrite
After rewrite → HALF_OPEN (probationary, needs 2 successes to close)
Failure during HALF_OPEN → immediately back to OPEN
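
Here is a sketch of the breaker as a small state machine. It uses the thresholds above: 3 consecutive failures open the circuit, and 2 probationary successes close it. `SkillBreaker` and its methods are hypothetical, and the LLM rewrite step itself is outside the sketch.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class SkillBreaker:
    OPEN_AFTER = 3    # consecutive failures that open the circuit
    CLOSE_AFTER = 2   # probationary successes needed to close it again

    def __init__(self) -> None:
        self.state = State.CLOSED
        self.failures = 0     # consecutive failures while CLOSED
        self.successes = 0    # successes while HALF_OPEN

    def record(self, ok: bool) -> State:
        if self.state is State.HALF_OPEN:
            if not ok:
                self.state = State.OPEN              # probation failed: reopen immediately
            else:
                self.successes += 1
                if self.successes >= self.CLOSE_AFTER:
                    self.state, self.failures = State.CLOSED, 0
        elif self.state is State.CLOSED:
            self.failures = 0 if ok else self.failures + 1   # a success resets the streak
            if self.failures >= self.OPEN_AFTER:
                self.state = State.OPEN              # queue the skill for rewrite
        return self.state

    def rewritten(self) -> None:
        """Call once the skill has been rewritten; starts probation."""
        if self.state is State.OPEN:
            self.state, self.successes = State.HALF_OPEN, 0
```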

Prompt injection detection (security.py) — scans external content before it enters the agent context.
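
Here is a sketch of heuristic scanning for prompt injection in fetched content. The patterns and the `scan_external`/`quarantine` helpers are illustrative assumptions, not security.py's rules. Pattern lists like this catch only obvious attacks.

```python
import re

# Illustrative heuristics only; the real detector's rules may differ.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?(system|developer)\s+prompt", re.I),
    re.compile(r"you\s+are\s+now\s+(a|an|in)\b", re.I),
    re.compile(r"reveal\s+(your\s+)?(system\s+prompt|api\s+key|secrets?)", re.I),
]


def scan_external(text: str) -> list[str]:
    """Return the suspicious snippets found in fetched content (empty list = clean)."""
    return [m.group(0) for p in INJECTION_PATTERNS for m in p.finditer(text)]


def quarantine(text: str, source: str) -> str:
    """Wrap flagged content so the planner treats it as data, not instructions."""
    hits = scan_external(text)
    if not hits:
        return text
    return (f"[UNTRUSTED CONTENT from {source}; {len(hits)} injection marker(s) detected]\n"
            f"<<<\n{text}\n>>>")
```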

Development

# Run tests (4278 passing, all LLM calls mocked)
python3 -m pytest tests/ -q

# Dry-run (no LLM calls)
python3 src/agent_loop.py "test goal" --dry-run --verbose
python3 src/cli.py poe-heartbeat --dry-run
python3 src/cli.py poe-eval --dry-run

Compatibility

OpenClaw: reads ~/.openclaw/openclaw.json for credentials/tokens; can coordinate via OpenClaw gateway (src/gateway.py)
Telegram: first-class interface via Bot API polling
Slack: Socket Mode, no public endpoint needed
macOS + Linux: bootstrap.py generates systemd (Linux) or launchd (macOS) service files
Docker: Dockerfile + docker-compose.yml for isolated deployment
