feat(gastown): implement town reconciler with event-driven state management#1336
Merged
feat(gastown): implement town reconciler with event-driven state management#1336
Conversation
Contributor
Code Review SummaryStatus: 3 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (1 files)
Fix these issues in Kilo Cloud Reviewed by gpt-5.4-20260305 · 471,905 tokens |
c4556f0 to
448b5c3
Compare
448b5c3 to
9e1ac33
Compare
36ee61b to
93b10c8
Compare
jrf0110
commented
Mar 20, 2026
jrf0110
commented
Mar 20, 2026
93b10c8 to
591b7be
Compare
4 tasks
591b7be to
8399380
Compare
6a266de to
1bf9abc
Compare
…gement Replace the imperative patrol/scheduling/review-queue alarm phases with a declarative reconciler that drains events, computes desired state, and applies corrective actions. Reconciler architecture (Phases 1-5): - Event table (town_events) with 8 event types, dual-write in RPC handlers - 6 reconciler functions: agents, beads, review queue, convoys, GUPP, GC - applyEvent/applyAction for all event and action types - Event-only agentDone/agentCompleted (no direct mutations) - Lazy assignment for slingConvoy/startConvoy (#1249) - Container status observation pre-phase (replaces witnessPatrol) - Agent activity watermark via enriched heartbeat - Invariant checker + reconciler metrics in getAlarmStatus - Rework flow: gt_request_changes tool for refinery change requests - Refinery system prompt wired in (buildRefinerySystemPrompt) Bug fixes: - Convoy landing MR cycling (reconciler created duplicates) - Multi-agent hook mutual exclusion (hookBead unhooks stale agents) - agentCompleted no longer closes beads when gt_done was not called - GUPP NaN bug for agents with null last_activity_at - Container status event flood (upsert instead of insert per tick) - PR-strategy MR beads no longer block the refinery queue - poll_pr refreshes updated_at to prevent false orphan kills - Failed events retry on next tick instead of being dropped Also includes prerequisite fixes from #1244: - MR bead failure lifecycle fixes - Review queue recovery improvements - Convoy progress tracking fixes Dead code removed: ~1,578 lines from patrol.ts, scheduling.ts, review-queue.ts, and Town.do.ts.
1bf9abc to
6c769f9
Compare
| @@ -518,7 +532,17 @@ export async function startAgent( | |||
| ): Promise<ManagedAgent> { | |||
| const existing = agents.get(request.agentId); | |||
| if (existing && (existing.status === 'running' || existing.status === 'starting')) { | |||
Contributor
There was a problem hiding this comment.
WARNING: Restarting a starting agent can leak a duplicate session
This branch now treats existing.status === 'starting' the same as an idle running agent, but stopAgent() cannot actually cancel a startup before sessionId exists. If a retry lands while the first startAgent() call is still between sessionCount++ and session.create(), the original call keeps going, subscribes to events, and can leave an extra live session that is no longer tracked in agents.
jeanduplessis
approved these changes
Mar 20, 2026
This was referenced Mar 20, 2026
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the imperative patrol/scheduling/review-queue alarm phases with a declarative reconciler that drains events, computes desired state, and applies corrective actions. This is the complete implementation of the reconciliation spec (Phases 1-5), plus critical bug fixes discovered during production rollout.
Architecture: The alarm loop is now: container status observation → event drain → reconcile → apply actions → side effects → housekeeping. RPC handlers (
agentDone,agentCompleted) are event-only — they insert events intotown_eventsand the reconciler applies state transitions on the next alarm tick.New files:
src/dos/town/reconciler.ts— 6 reconciler functions (agents, beads, review queue, convoys, GUPP, GC) + invariant checker + event applicationsrc/dos/town/actions.ts— 22-variant action type +applyAction()with sync SQL mutations and deferred async side effectssrc/dos/town/events.ts— Event recording, draining, pruning, and container status upsertsrc/db/tables/town-events.table.ts— Event table schematest/integration/reconciler.test.ts— 12 integration tests for reconciler rulesKey behavioral changes:
slingConvoy/startConvoyno longer eagerly create agents for all beads. The reconciler assigns agents only to unblocked beads (Lazy agent assignment — scheduler assigns agents when beads are ready, not at creation #1249)agentCompleted('completed')no longer closes beads whengt_donewasn't called (prevents idle-timer-killed polecats from falsely closing work)witnessPatrol— polls container health before each reconciler ticklast_event_type,last_event_at,active_tools)gt_request_changestool lets the refinery create rework beads that block the MR beadbuildRefinerySystemPromptis now wired in with branch, target branch, and merge strategyBug fixes:
hookBeadnow has mutual exclusion (unhooks stale agents)last_activity_atare handled instead of silently skippedin_progressonly (open PR-strategy beads no longer erroneously failed)Dead code removed: ~1,578 lines from
patrol.ts,scheduling.ts,review-queue.ts, andTown.do.ts.Verification
pnpm typecheck: clean (0 errors)pnpm test: 118/118 unit tests passingreconciler.test.ts,convoy-dag.test.ts,review-failure.test.tswrangler tailVisual Changes
N/A
Reviewer Notes
docs/gt/reconciliation-deviations.md(outside this repo, in the docs directory) tracks 24 deliberate deviations from the original spec with rationale.http-api.test.ts,rig-do.test.ts, andtown-container.test.tsare partially addressed (route URL patterns,bead.id→bead.bead_id). Some bead-events ordering failures inrig-do.test.tsremain from before this work.STALE_IN_PROGRESS_TIMEOUT_MSis set to 5 minutes (up from 2) to avoid racing with the container idle timer (2 min) + alarm tick. This means orphaned in-progress beads take 5 min to recover instead of 2..kilo/worktrees/files from other agents.--no-verifywas used for the push.