Gastown User-Scoped Admin Panel — Inspect & Intervene via GastownUserDO → TownDO (#897)

## Parent

Part of #204 (Phase 4: Hardening)

## Problem

When a Gastown user's town breaks — agents stall, beads get stuck, containers die, merges fail, credentials expire — there is currently **no way for a Kilo admin to diagnose or fix it** without SSH-level access to Cloudflare infrastructure. All state is locked inside Durable Object SQLite databases and KV storage, errors are scattered across `console.error` logs with no correlation, and intervention requires writing ad-hoc scripts against internal APIs.

This issue covers a **user-scoped** admin panel: given a specific user, look up their GastownUserDO, see their towns, and drill into any TownDO to inspect and intervene. Fleet-wide views (all towns across all users) are out of scope here and will be addressed alongside #228 (Observability) once a secondary index exists.

---

## User Failure Scenarios & Required Admin Capabilities

### 1. "My agent isn't doing anything" — Stuck Agent

**Root causes**: Container died mid-session, git clone failed (bad credentials, private repo), dispatch budget exhausted (5 attempts), polecat name pool exhausted, GUPP timeout, agent process crashed inside container.

**Admin needs**:
- See agent status timeline: `idle` → `working` → (stalled? dead?)
- See dispatch attempt history with error messages (currently only in `console.error`)
- See the agent's last SDK events (from AgentDO) — did it receive its prompt? did it call any tools? did the session error out?
- See container health: is the container running? when did it last restart? what's its process list?
- **Intervention**: Force-reset agent to `idle`, force-unhook from bead, force-kill agent process in container, retry dispatch
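The first triage questions above are "has dispatch given up?" and "is a working agent silent?". Both can be answered from data that TownDO already holds. Below is a minimal sketch. It assumes a hypothetical `AgentSnapshot` shape, an admin-chosen stall threshold, and an attempt counter tracked per agent. Only the 5-attempt budget comes from this issue.

```typescript
// Hypothetical snapshot of an agent row as the admin panel might read it from TownDO.
interface AgentSnapshot {
  id: string;
  status: "idle" | "working";
  dispatchAttempts: number;
  lastActivityAt: number; // epoch ms of the last SDK event or status change
}

type AgentDiagnosis = "ok" | "dispatch_budget_exhausted" | "stalled";

// Dispatch budget described in the root causes above (5 attempts).
const MAX_DISPATCH_ATTEMPTS = 5;

function diagnoseAgent(agent: AgentSnapshot, now: number, stallThresholdMs: number): AgentDiagnosis {
  // Dispatch has stopped retrying; only an admin retry or reset will move this agent.
  if (agent.dispatchAttempts >= MAX_DISPATCH_ATTEMPTS) return "dispatch_budget_exhausted";
  // A working agent with no activity for longer than the threshold is presumed stalled.
  if (agent.status === "working" && now - agent.lastActivityAt > stallThresholdMs) return "stalled";
  return "ok";
}
```

A "stalled" result only tells the admin where to look next: the AgentDO event stream and container health.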

### 2. "My bead has been open forever" — Stuck Bead

**Root causes**: No idle agent available, all polecats stalled, dispatch keeps failing, bead assigned to dead agent, convoy dependency blocking it, review queue backed up.

**Admin needs**:
- See bead state history (bead_events timeline)
- See which agent is hooked (if any) and that agent's status
- See dependency graph: is this bead blocked by another? part of a convoy? waiting for review?
- See dispatch attempts against this bead specifically
- **Intervention**: Force-close bead, force-fail bead, reassign to different agent, manually unhook current agent, change bead priority

### 3. "Code won't merge" — Refinery Failure

**Root causes**: Merge conflict, git push credentials expired, refinery agent hallucinated bad PR URL, PR-strategy entry stuck waiting for webhook, review queue bottlenecked behind a single item.

**Admin needs**:
- See review queue depth and processing rate
- See per-MR-bead status: who's reviewing, how long in `in_progress`, branch names, PR URL
- See refinery agent's reasoning (conversation/tool calls)
- See git operation logs: what merge commands ran, what failed, stderr output
- **Intervention**: Force-retry a failed review, force-close an MR bead, skip review and direct-merge, clear the review queue, change merge strategy on the fly

### 4. "Container keeps dying" — Container Instability

**Root causes**: OOM from too many concurrent agents, Cloudflare container platform issues, sleep/wake cycle problems, environment variable misconfiguration.

**Admin needs**:
- Container lifecycle timeline: start, stop (with exit code + reason), error events
- Container resource usage (memory, and CPU if Cloudflare exposes it)
- Current process list inside container
- Environment variable inspection (redacted secrets)
- Sleep/wake history
- **Intervention**: Force-restart container, force-stop container, update environment variables, change sleep timeout
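Environment variable inspection is only safe if redaction happens before values leave the server. Below is a minimal sketch of key-based redaction. The secret-key pattern is an assumption, and the variable names in the usage are hypothetical.

```typescript
// Key fragments that mark a variable as secret. This pattern is an assumption; an explicit
// allow-list of known non-secret keys would fail safer than a deny-list like this one.
const SECRET_KEY_PATTERN = /(TOKEN|SECRET|KEY|PASSWORD|CREDENTIAL|AUTH)/i;

// Returns a copy of the container environment with secret-looking values replaced.
function redactEnv(env: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    out[key] = SECRET_KEY_PATTERN.test(key) ? "[redacted]" : value;
  }
  return out;
}
```

Showing the key with a redacted value still answers "is this variable set at all?". That question is often enough to diagnose misconfiguration.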

### 5. "GitHub says permission denied" — Credential Issues

**Root causes**: GitHub App token expired, integration revoked, wrong integration linked to rig, credential store in container has stale token, dynamic credential resolution endpoint returning errors.

**Admin needs**:
- See current credential state: which integration is linked, when was the token last resolved, is the token valid?
- See git credential verification results (currently only a `console.warn`)
- See all git push/pull failures across agents in the town
- **Intervention**: Force-refresh credentials from integration, manually set a git token, re-link integration

### 6. "My convoy never finished" — Convoy Progress Stall

**Root causes**: A tracked bead was deleted instead of closed (counter never catches up), a tracked bead's merge failed, intermediate merge to feature branch conflicted, landing MR failed.

**Admin needs**:
- Convoy bead graph: all tracked beads with their statuses
- Progress counters: `closed_beads` / `total_beads`, with specific identification of which beads are not yet closed
- Landing MR status
- **Intervention**: Force-land convoy, add/remove tracked beads, force-close stuck child beads
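Because a deleted tracked bead leaves the `closed_beads` counter permanently short, the admin view should recompute progress from the tracked-bead list rather than trust the counter. Below is a minimal sketch. The `BeadStatus` values and input shapes are assumptions.

```typescript
// Hypothetical bead statuses; "failed" assumes the force-fail action produces a distinct status.
type BeadStatus = "open" | "in_progress" | "closed" | "failed";

interface ConvoyProgress {
  closed: number;
  total: number;
  notClosed: string[]; // tracked beads that still exist but are not closed
  missing: string[]; // tracked beads that no longer exist (e.g. deleted instead of closed)
  counterDrift: boolean; // stored closed_beads counter disagrees with the recomputed count
}

function recomputeConvoyProgress(
  trackedBeadIds: string[],
  beads: Map<string, BeadStatus>,
  storedClosedCount: number,
): ConvoyProgress {
  const notClosed: string[] = [];
  const missing: string[] = [];
  let closed = 0;
  for (const id of trackedBeadIds) {
    const status = beads.get(id);
    if (status === undefined) missing.push(id);
    else if (status === "closed") closed++;
    else notClosed.push(id);
  }
  return {
    closed,
    total: trackedBeadIds.length,
    notClosed,
    missing,
    counterDrift: closed !== storedClosedCount,
  };
}
```

A non-empty `missing` list shows that the convoy can never reach `total` without intervention. In that case, removing the tracked bead or force-landing are the relevant actions.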

### 7. "The Mayor isn't responding" — Mayor Dysfunction

**Root causes**: Mayor agent crashed, container sleeping, Mayor stuck in a tool call loop, Mayor's system prompt is stale/wrong, Mayor's session errored.

**Admin needs**:
- Mayor agent status and last activity timestamp
- Mayor's current session state (active? idle? errored?)
- Mayor's recent conversation (messages + tool calls)
- Mayor's system prompt (rendered, with all dynamic context)
- **Intervention**: Force-restart Mayor, clear Mayor conversation, resend last user message

### 8. "I'm being charged but nothing is happening" — Silent Resource Consumption

**Root causes**: Zombie agents consuming LLM tokens, agents in retry loops, refinery re-reviewing the same MR, container running with no active work.

**Admin needs**:
- LLM token usage per agent per bead (requires #228 metrics)
- Active agent count vs. actual work being done
- Container uptime with no dispatched agents
- Review queue churn: how many times has the same MR been retried?
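Review queue churn can be measured by counting how often each MR bead re-enters review. Below is a minimal sketch. It assumes that every refinery pickup emits a bead event of hypothetical type `review_started`. This issue does not say whether such an event exists today.

```typescript
// Hypothetical bead event shape.
interface BeadEvent {
  beadId: string;
  type: string;
  at: number; // epoch ms
}

// Returns [beadId, reviewCount] for MR beads reviewed more than `maxReviews` times, most-churned first.
function findReviewChurn(events: BeadEvent[], maxReviews: number): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.type === "review_started") counts.set(e.beadId, (counts.get(e.beadId) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n > maxReviews)
    .sort((a, b) => b[1] - a[1]);
}
```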

---

## Data Access Model — No Central Database

Gastown has no central Postgres table of towns or agents. All state is distributed across Durable Objects:

- **GastownUserDO** (keyed by userId) — owns `user_towns` and `user_rigs` tables. This is the only way to discover which towns a user has.
- **TownDO** (keyed by townId) — owns all beads, agents, rigs, events, config. This is where all the operational data lives.
- **AgentDO** (keyed by agentId) — owns high-volume SDK event streams per agent.

This means the admin panel **cannot start from a global town list** — there is no table to query. The entry point must be **user-scoped**: look up a user, query their GastownUserDO for their towns, then drill into a specific TownDO.

### Navigation flow

```
Admin searches for user (by email, userId, or name from kilocode_users in Postgres)
  → Hits GastownUserDO for that userId
    → Gets list of towns (user_towns) and rigs (user_rigs)
      → Selects a town → enters Town Inspector (queries TownDO)
        → Drills into beads, agents, container, config, events
```
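The flow comes down to two hops: one call to the user's GastownUserDO, then one call per TownDO. Below is a minimal sketch. It assumes hypothetical `GastownUserClient.listTowns` and `TownClient.getSummary` wrappers around the Durable Object stubs; neither method name comes from this issue.

```typescript
// Hypothetical RPC surfaces standing in for GastownUserDO and TownDO stubs.
interface UserTown { townId: string; name: string }
interface GastownUserClient { listTowns(userId: string): Promise<UserTown[]> }

interface TownSummary { activeAgents: number; openBeads: number; containerRunning: boolean; lastActivityAt: number }
interface TownClient { getSummary(townId: string): Promise<TownSummary> }

type TownRow = UserTown & ({ ok: true; summary: TownSummary } | { ok: false; error: string });

// userId comes from the existing admin user page (kilocode_users in Postgres).
async function loadUserGastownSection(
  userId: string,
  users: GastownUserClient,
  towns: TownClient,
): Promise<TownRow[]> {
  const userTowns = await users.listTowns(userId);
  // Query every TownDO in parallel. A failing town must not hide the others,
  // because a broken town is usually the one the admin came to investigate.
  const results = await Promise.allSettled(userTowns.map((t) => towns.getSummary(t.townId)));
  return userTowns.map((t, i): TownRow => {
    const r = results[i];
    return r.status === "fulfilled"
      ? { ...t, ok: true, summary: r.value }
      : { ...t, ok: false, error: String(r.reason) };
  });
}
```

Rendering an unreachable TownDO as an error row, rather than failing the whole section, is itself a useful health signal.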

This user-scoped lookup should live inside the **existing Kilo admin dashboard's user detail page** — not in a separate Gastown-only admin route. When an admin is already looking at a user (e.g., for a support case), they should see a "Gastown" section showing that user's towns with health indicators, and be able to drill straight into the Town Inspector from there.

---

## Admin Panel Sections

### User → Gastown Section (entry point, on existing admin user detail page)
- List of user's towns (from GastownUserDO) with health indicators (green/yellow/red)
- Per-town summary: active agents, open beads, container status, last activity
- Quick actions: jump to Town Inspector, force-restart container
- List of user's rigs with linked repo and integration status
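This issue specifies green/yellow/red indicators but not the rules behind them. Below is one possible mapping, tied to the failure scenarios above. The input fields and thresholds are placeholders to be tuned, not agreed rules.

```typescript
type Health = "green" | "yellow" | "red";

// Hypothetical per-town inputs, e.g. derived from a TownDO summary call.
interface TownHealthInput {
  containerRunning: boolean;
  activeAgents: number;
  stalledAgents: number;
  openBeads: number;
  lastActivityAt: number; // epoch ms
}

function townHealth(t: TownHealthInput, now: number, idleWarnMs: number): Health {
  // Red: work is queued but nothing can run it (scenarios 1, 2, 4).
  if (t.openBeads > 0 && (!t.containerRunning || t.activeAgents === 0)) return "red";
  // Yellow: some agent is stalled.
  if (t.stalledAgents > 0) return "yellow";
  // Yellow: agents are busy with no open work, possibly zombies burning tokens (scenario 8).
  if (t.activeAgents > 0 && t.openBeads === 0) return "yellow";
  // Yellow: open work has seen no activity for a while.
  if (t.openBeads > 0 && now - t.lastActivityAt > idleWarnMs) return "yellow";
  return "green";
}
```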

### Town Inspector (single town deep-dive)
- **State tab**: All beads with current status, assigned agent, and last event timestamp. Filterable by type/status. Each row links to the Bead Inspector.
- **Agents tab**: All agents with role, status, hooked bead, dispatch attempts, and last activity. Each row links to the Agent Inspector, including its SDK event stream.
- **Review Queue tab**: Pending/in-progress MR beads, refinery assignment, PR URLs, time-in-queue.
- **Container tab**: Health status, process list, environment variables (redacted), lifecycle timeline, resource usage.
- **Config tab**: Town config, rig configs, integration links, credential status. Editable by admin.
- **Events tab**: Unified timeline merging bead_events + agent events + container events, correlated by bead/agent ID.
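The Events tab needs the three sources merged into one chronological stream. Below is a minimal sketch. It assumes each source has already been mapped into a hypothetical `TimelineEvent` shape, and it matches events to a bead or agent with OR semantics.

```typescript
// Hypothetical normalized event; bead_events, AgentDO events and container events
// would each be mapped into this shape before merging.
interface TimelineEvent {
  source: "bead" | "agent" | "container";
  at: number; // epoch ms
  beadId?: string;
  agentId?: string;
  message: string;
}

// Merges per-source streams chronologically, optionally keeping only events that mention
// the given bead or agent. Array.prototype.sort is stable, so events with equal
// timestamps keep the order in which the streams were passed.
function unifiedTimeline(
  streams: TimelineEvent[][],
  filter?: { beadId?: string; agentId?: string },
): TimelineEvent[] {
  const matches = (e: TimelineEvent): boolean =>
    !filter ||
    (filter.beadId !== undefined && e.beadId === filter.beadId) ||
    (filter.agentId !== undefined && e.agentId === filter.agentId);
  return streams.flat().filter(matches).sort((a, b) => a.at - b.at);
}
```

Container events usually carry neither ID, so a filtered view drops them. Showing them alongside as context is probably more useful than hiding them.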

### Bead Inspector (single bead deep-dive)
- Full state history (bead_events)
- Dependency graph visualization (what blocks this, what this blocks, convoy membership)
- Assigned agent history (who worked on this, for how long)
- Related MR beads (review submissions)
- Agent conversation for each assignment (pulled from AgentDO)
- Admin actions: force-close, force-fail, reassign, change priority, unhook agent

### Agent Inspector (single agent deep-dive)
- Status timeline
- SDK event stream (from AgentDO, with search)
- Current/past hooked beads
- Dispatch attempt history with error details
- Git operations performed
- LLM token usage (when #228 metrics available)
- Admin actions: force-reset, force-kill, delete agent

### Intervention Log
- All admin actions taken (who, when, what, on which town/bead/agent)
- Immutable audit trail — admins must not be able to intervene without a record
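One way to enforce "no intervention without a record" is to route every admin action through a wrapper. The wrapper writes the audit entry before the action runs and records the outcome afterwards. Below is a minimal sketch. The in-memory `AuditLog` stands in for the new table this issue calls for, and all field names are hypothetical.

```typescript
// Hypothetical audit record.
interface AuditEntry {
  id: number;
  adminUserId: string;
  townId: string;
  targetType: "town" | "bead" | "agent" | "container";
  targetId: string;
  action: string;
  at: number; // epoch ms
  outcome: "pending" | "succeeded" | "failed";
  error?: string;
}

type AuditMeta = Omit<AuditEntry, "id" | "outcome" | "error">;

// Append-only: entries are never deleted, and each outcome can be set exactly once.
class AuditLog {
  private entries: AuditEntry[] = [];

  begin(meta: AuditMeta): number {
    const id = this.entries.length + 1;
    this.entries.push({ ...meta, id, outcome: "pending" });
    return id;
  }

  finish(id: number, outcome: "succeeded" | "failed", error?: string): void {
    const entry = this.entries[id - 1];
    if (!entry || entry.outcome !== "pending") throw new Error(`audit entry ${id} is not pending`);
    entry.outcome = outcome;
    if (error !== undefined) entry.error = error;
  }

  list(): AuditEntry[] {
    return this.entries.map((e) => ({ ...e })); // copies, so callers cannot mutate history
  }
}

// The entry is written before the action runs, so an action can never execute unrecorded.
async function withAudit<T>(log: AuditLog, meta: AuditMeta, action: () => Promise<T>): Promise<T> {
  const id = log.begin(meta);
  try {
    const result = await action();
    log.finish(id, "succeeded");
    return result;
  } catch (err) {
    log.finish(id, "failed", String(err));
    throw err;
  }
}
```

Recording failed attempts matters as much as recording successes. A failed force-kill is often the first clue to a deeper container problem.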

---

## Data Requirements

Several pieces of data that admins need are currently **not persisted anywhere**:

1. **Dispatch attempt details** — currently `console.error` only. Need to persist dispatch errors (container response, error message, timestamp) in a `dispatch_attempts` table or as bead events.
2. **Container lifecycle events** — currently `console.log` only. Need to emit bead events or a dedicated container event table for start/stop/error/sleep/wake.
3. **Git operation results** — currently container-side `console.log` only. Need to report clone/push/merge outcomes back to TownDO as events.
4. **Admin intervention audit log** — does not exist. Need a new table.
5. **Credential resolution results** — currently `console.warn` only. Need to persist token refresh success/failure.

These data gaps should be addressed in #228 or as a prerequisite to this issue.
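For item 1, the change amounts to writing a row where the dispatcher currently only calls `console.error`. Below is a minimal sketch. The column names are assumptions, and the SQL targets SQLite. `SqlExec` mirrors the shape of the Durable Object SQL API's `sql.exec(query, ...bindings)`, so in TownDO it could plausibly be passed as `(q, ...b) => ctx.storage.sql.exec(q, ...b)`.

```typescript
// Hypothetical schema for the proposed dispatch_attempts table.
const DISPATCH_ATTEMPTS_DDL = `
CREATE TABLE IF NOT EXISTS dispatch_attempts (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  bead_id          TEXT NOT NULL,
  agent_id         TEXT NOT NULL,
  attempt          INTEGER NOT NULL,
  attempted_at     INTEGER NOT NULL,
  container_status INTEGER,
  error_message    TEXT
)`;

// Minimal executor type: a query plus positional bindings.
type SqlExec = (query: string, ...bindings: unknown[]) => void;

interface DispatchFailure {
  beadId: string;
  agentId: string;
  attempt: number;
  at: number; // epoch ms
  containerStatus: number | null; // HTTP status from the container, if it responded at all
  error: string;
}

// Persists a failed dispatch so it can be queried later from the Bead and Agent Inspectors.
function recordDispatchFailure(exec: SqlExec, f: DispatchFailure): void {
  exec(
    `INSERT INTO dispatch_attempts
       (bead_id, agent_id, attempt, attempted_at, container_status, error_message)
     VALUES (?, ?, ?, ?, ?, ?)`,
    f.beadId, f.agentId, f.attempt, f.at, f.containerStatus, f.error,
  );
}
```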

## Acceptance Criteria

- [ ] User → Gastown section on existing admin user detail page (GastownUserDO lookup, town list with health indicators)
- [ ] Town inspector with state, agents, review queue, container, config, and events tabs
- [ ] Bead inspector with full state history, dependency graph, and agent conversation replay
- [ ] Agent inspector with SDK event stream, dispatch history, and status timeline
- [ ] Admin interventions: force-reset agent, force-close/fail bead, force-restart container, force-retry review, credential refresh
- [ ] All admin interventions recorded in an immutable audit log
- [ ] Unified event timeline correlating bead events, agent events, and container events
- [ ] Admin panel gated to `is_admin` users (extends #537)
- [ ] Data persistence gaps addressed (dispatch attempts, container lifecycle, git operations, credential resolution)

## Notes

- No data migration needed: cloud Gastown hasn't been deployed to production yet
- This is the **admin** panel for Kilo operators, not the user-facing dashboard (which is #346 / #225)
- The entry point is the existing admin user detail page — add a Gastown section there, not a separate `/gastown/admin/` route tree
- Town Inspector and deeper views can be standalone admin pages, linked from the user detail page
- Fleet-wide overview (all towns across all users) is out of scope — will be addressed alongside #228 (Observability) once a secondary index exists