feat(gastown): replace binary is_admin gate with PostHog feature flags #904
Code Review Summary
Status: 1 Issue Found | Recommendation: Address before merge
No new inline-commentable issues in the current diff.
Other Observations (not in diff): issues found in unchanged code that cannot receive inline comments.
Files Reviewed (50 files)
feea622 to ebe2373
b6584cc to bdb3a53
0a435ce to e511ea6
agentBeadId: input.source_agent_id ?? null,
title: `Escalation (${input.severity}): ${input.message.slice(0, 80)}`,
context: {
  escalation_bead_id: beadId,
WARNING: Linked escalation ID is stored under a different metadata path than the resolver reads
createTriageRequest() wraps this object under metadata.context, but resolveTriage() later checks triageBead.metadata?.escalation_bead_id before deciding whether to close the original escalation bead. Because this value is nested here, every resolved escalation triage request leaves its escalation bead open indefinitely.
Fixed. The lookup now reads from triageBead.metadata.context.escalation_bead_id (matching the TriageRequestMetadata structure where context is nested) instead of triageBead.metadata.escalation_bead_id.
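A minimal sketch of the corrected lookup, assuming a heavily simplified shape for TriageRequestMetadata (the real type likely carries more fields). The helper name linkedEscalationId is hypothetical and only illustrates the nested read path:

```typescript
// Simplified, assumed shapes; the real TriageRequestMetadata may differ.
type TriageRequestMetadata = {
  context?: { escalation_bead_id?: string | null };
};

type TriageBead = { metadata?: TriageRequestMetadata | null };

// Hypothetical helper: returns the linked escalation bead id, if any.
function linkedEscalationId(triageBead: TriageBead): string | null {
  // createTriageRequest() nests the caller's object under metadata.context,
  // so the id is read from there rather than from metadata directly.
  return triageBead.metadata?.context?.escalation_bead_id ?? null;
}
```

With this shape, a triage bead that has no linked escalation yields null, and the resolver can skip closing anything.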
);
}

if (error && !data) {
WARNING: WebSocket failures never reach the documented polling fallback
If the status socket fails before the first message, this branch renders only the error and the pane never calls the authenticated getAlarmStatus query. That leaves the entire Status tab unusable on browsers or networks where WebSockets are blocked, even though the backend added a polling endpoint for the same data.
Fixed. AlarmStatusPane now uses the tRPC getAlarmStatus query as a polling fallback (5s interval) when the WebSocket fails. The connection indicator shows 'Live' (green) for WS, 'Polling' (blue) for fallback, or 'Reconnecting...' (yellow) when neither is working. Also regenerated the type declarations to include getAlarmStatus.
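The indicator logic described in this reply could be sketched roughly as follows. Both helpers are hypothetical pure functions for illustration; the actual component presumably derives these values from its WebSocket and tRPC query hooks:

```typescript
type ConnectionIndicator = 'Live' | 'Polling' | 'Reconnecting...';

// Hypothetical helper: picks the label shown next to the Status tab.
function connectionIndicator(
  wsConnected: boolean,
  pollingHasData: boolean,
): ConnectionIndicator {
  if (wsConnected) return 'Live'; // WebSocket push is working
  if (pollingHasData) return 'Polling'; // fallback query is delivering data
  return 'Reconnecting...'; // neither source is working yet
}

// Hypothetical helper: prefer pushed data, fall back to the polled result.
function pickStatus<T>(wsData: T | undefined, polledData: T | undefined): T | undefined {
  return wsData ?? polledData;
}
```

Keeping the decision in pure functions like these makes the three states easy to unit test without a browser.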
@@ -2009,6 +2665,10 @@ export class TownDO extends DurableObject<Env> {
// polecat assignee preserved.
agents.hookBead(this.sql, refineryAgent.id, entry.id);

// Mark as working before the async container start (same I/O gate
// rationale as dispatchAgent — see comment there).
agents.updateAgentStatus(this.sql, refineryAgent.id, 'working');
WARNING: Failed refinery launches now leave the per-rig refinery stuck as working
This new pre-start status flip needs a matching rollback in the !started path below. Right now a single container-start failure unhooks the bead and fails the review, but the refinery row stays working, so every later review for that rig hits the status !== 'idle' guard above and gets re-queued forever.
Fixed. The !started path in processReviewQueue now calls updateAgentStatus(refineryAgent.id, 'idle') alongside the existing unhook and review failure, so the per-rig refinery singleton is available for future reviews.
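The reserve-then-roll-back pattern discussed in this thread can be illustrated with a small sketch. It uses an in-memory map as a stand-in for the agents table, and the function names reserveRefinery and settleLaunch are hypothetical; the real code calls agents.updateAgentStatus against SQL:

```typescript
type AgentStatus = 'idle' | 'working';

// In-memory stand-in for the agents table (assumption for illustration).
const refineryStatus = new Map<string, AgentStatus>();

// Claim the refinery before the async container start.
// Returns false when it is already busy, in which case the review is re-queued.
function reserveRefinery(id: string): boolean {
  if ((refineryStatus.get(id) ?? 'idle') !== 'idle') return false;
  refineryStatus.set(id, 'working');
  return true;
}

// Called once the container start has settled.
function settleLaunch(id: string, started: boolean): void {
  if (!started) {
    // Roll back the pre-start flip so later reviews pass the idle guard.
    refineryStatus.set(id, 'idle');
  }
}
```

Without the rollback in settleLaunch, a single failed start would leave the entry at 'working' and every later reserveRefinery call for that rig would return false.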
#901)

Replace the binary is_admin check from #537 with PostHog feature flags for progressive rollout. Flag management (allowlists, percentage rollout, kill-switch) is handled entirely through the PostHog dashboard; no custom DB tables or admin UI needed.

Gate points updated:
- 9 Next.js pages use isFeatureFlagEnabled('gastown-access', user.id)
- Sidebar uses useFeatureFlagEnabled('gastown-access')
- Token endpoint evaluates the flag and embeds gastownAccess in the JWT
- Worker checks gastownAccess JWT claim (isAdmin fallback for compat)

Sub-feature flag names defined: gastown-convoys, gastown-pr-merge, gastown-multi-rig (to be created in PostHog when needed).

Closes #901
- Switch from isFeatureFlagEnabled to isReleaseToggleEnabled for strict boolean auth checks (prevents multivariate variants from granting access)
- Remove dev-mode bypass; gate in dev too via PostHog
- Abstract requireGastownAccess into gastownProcedure composable tRPC middleware in init.ts, replacing manual requireGastownAccess(ctx) calls
- Remove sub-feature flags (convoys, pr_merge, multi_rig); only gastown-access remains
Use useFeatureFlagVariantKey === true instead of useFeatureFlagEnabled to align the sidebar with the server-side isReleaseToggleEnabled check. This prevents multivariate string variants from showing the nav item when server-side access would be denied.
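The difference between a truthy check and a strict boolean check can be shown in a short sketch. It assumes, as the commit messages imply, that a flag value may be a boolean, a variant key string, or undefined; both function names are hypothetical:

```typescript
// Assumed flag value shape: boolean for simple flags, a string for
// multivariate variants, undefined when the flag is missing or not loaded.
type FlagValue = boolean | string | undefined;

// Truthy check: any non-empty variant string also grants access.
function isEnabledLoose(value: FlagValue): boolean {
  return Boolean(value);
}

// Strict check: only a literal boolean true grants access.
function isEnabledStrict(value: FlagValue): boolean {
  return value === true;
}
```

Under these assumptions, a multivariate value such as 'control' passes the loose check but fails the strict one, which is the mismatch this commit removes between sidebar and server.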
Switch all gate points from isReleaseToggleEnabled back to isFeatureFlagEnabled. Add a DEV_ENABLED_FLAGS set to posthog-feature-flags.ts that returns true for gastown-access in non-production environments so local dev works without PostHog configuration. Sidebar reverts to useFeatureFlagEnabled.
…ture-flags.ts

Move the dev-mode override out of the shared posthog-feature-flags.ts module and into isGastownEnabled in src/lib/gastown/feature-flags.ts. All pages and the token endpoint now call isGastownEnabled(user.id), which returns true in non-production and delegates to isFeatureFlagEnabled in production. The sidebar uses useFeatureFlagEnabled || isDevelopment for the same effect client-side.
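A rough sketch of the delegation this commit describes. To keep it self-contained, the flag lookup is injected and synchronous; the real isFeatureFlagEnabled is presumably asynchronous, and the factory name makeIsGastownEnabled is hypothetical:

```typescript
// Injected stand-in for the PostHog-backed lookup (assumed signature).
type FlagLookup = (flag: string, userId: string) => boolean;

// Hypothetical factory so the environment and lookup can be swapped in tests.
function makeIsGastownEnabled(isProduction: boolean, isFeatureFlagEnabled: FlagLookup) {
  return (userId: string): boolean => {
    // Non-production: always enabled, so local dev needs no PostHog setup.
    if (!isProduction) return true;
    // Production: defer to the feature flag.
    return isFeatureFlagEnabled('gastown-access', userId);
  };
}
```

Scoping the override to the gastown module leaves the shared flag helper free of environment-specific branches.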
… improvements (#442) (#924)

Alarm-driven patrol system (witness & deacon):
- Tiered GUPP violation handling (30min warn, 1h escalate+triage, 2h force-stop)
- Orphaned work detection, stale hook recovery, agent GC, crash loop detection
- Per-bead timeout enforcement with agent container termination
- On-demand LLM triage agent for ambiguous situations
- Triage action validation, access control, and snapshot-based resolution
- Stranded convoy feeding with immediate dispatch eligibility

Mayor codebase browsing:
- Browse worktrees at /workspace/rigs/<rigId>/browse/ for read-only access
- POST /repos/setup container endpoint for proactive repo cloning
- System prompt written to AGENTS.md so mayor and sub-agents share context
- Git credential race fix: refreshGitCredentials runs before configureRig
- GIT_TERMINAL_PROMPT=0 to prevent credential prompt hangs

Agent dispatch improvements:
- startPoint parameter for convoy agents to branch from feature branch
- platformIntegrationId and KILOCODE_TOKEN plumbed through repo setup
- Existing users arm watchdog on DO init
- RESTART_WITH_BACKOFF uses dispatch cooldown delay

Rig deletion fix:
- tRPC deleteRig now calls TownDO.removeRig (was missing)
- addRig handles stale name conflicts via catch-and-retry

Real-time alarm status UI:
- Hibernatable WebSocket for live alarm status push
- Status tab in terminal bar with agent/bead/patrol cards

Other UI:
- Convoy title and branch use flex-based truncation instead of fixed max-width
- Status pane card padding normalized to p-2
- Legacy agent roles accepted in Zod schemas for backward compat
- PostHog feature flag integration for gastown access gating
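The tiered GUPP handling listed in this commit (30min warn, 1h escalate+triage, 2h force-stop) might map onto a threshold function like the sketch below. The action names, the function name, and the inclusive boundaries are assumptions for illustration, not the patrol's actual code:

```typescript
type GuppAction = 'none' | 'warn' | 'escalate_and_triage' | 'force_stop';

const MINUTE_MS = 60_000;

// Hypothetical helper: maps time since last activity to a patrol action.
// Thresholds come from the commit message; boundary handling is assumed.
function guppAction(inactiveMs: number): GuppAction {
  if (inactiveMs >= 120 * MINUTE_MS) return 'force_stop';
  if (inactiveMs >= 60 * MINUTE_MS) return 'escalate_and_triage';
  if (inactiveMs >= 30 * MINUTE_MS) return 'warn';
  return 'none';
}
```

Checking the largest threshold first keeps each tier exclusive, so an agent inactive for two hours is force-stopped rather than warned again.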
…apshot bead

CLOSE_BEAD and REASSIGN_BEAD now check that the agent's current hook matches the snapshot bead from the triage request before calling stopAgentInContainer. If the agent has moved on to different work, stopping it would abort unrelated sessions.
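The guard described in this commit reduces to a single comparison. A minimal sketch, assuming a simplified agent record with a hypothetical currentHookBeadId field:

```typescript
// Assumed, simplified agent record; field names are illustrative.
type AgentRecord = { id: string; currentHookBeadId: string | null };

// Only stop the agent if it is still hooked to the bead that the
// triage request captured in its snapshot.
function shouldStopAgent(agent: AgentRecord, snapshotBeadId: string): boolean {
  return agent.currentHookBeadId === snapshotBeadId;
}
```

An agent that has been unhooked or re-hooked to another bead fails the comparison, so its current session is left alone.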
gt_bead_close only marks the bead closed without unhooking the agent or resetting it to idle, leaking agent records. gt_done triggers the agentDone path which has the patrol-created triage fast-path that properly closes the batch, unhooks, and returns the agent to idle.
… queue safety
- Treat refinery as per-rig singleton in getOrCreateAgent to prevent UNIQUE constraint on identity when a refinery already exists
- Re-queue review entry (reset to open) when refinery is busy instead of leaving it stuck in in_progress
- Return 'not_found' (not 'unknown') from checkAgentContainerStatus on 404, so witnessPatrol immediately resets and redispatches agents after container eviction instead of waiting for the 2-hour GUPP timeout
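The 404 handling in the last bullet could look roughly like the following sketch. Only 'not_found' and 'unknown' come from the commit message; the 'running' value and the function name are assumptions:

```typescript
// 'not_found' and 'unknown' are named in the commit; 'running' is assumed.
type ContainerStatus = 'running' | 'not_found' | 'unknown';

// Hypothetical mapping from the container's HTTP response status.
function containerStatusFromHttp(httpStatus: number): ContainerStatus {
  // 404 means the container no longer knows the agent (e.g. after eviction),
  // so the patrol can reset and redispatch immediately.
  if (httpStatus === 404) return 'not_found';
  if (httpStatus >= 200 && httpStatus < 300) return 'running';
  // Anything else is ambiguous and left to the timeout-based handling.
  return 'unknown';
}
```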
…refresh
- Remove remaining gt_bead_close reference in triage prompt (line 72) that contradicted the gt_done instruction on line 49
- Use strftime with ISO format in orphanedHooks SQL query to match the toISOString() format stored in last_activity_at
- Resolve git credentials per-rig in mayor browse setup instead of sharing one credential set across all rigs
- Browse worktree refresh uses fetch+reset instead of checkout to avoid wrong-branch errors (worktree is on synthetic browse branch)
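The timestamp-format bullet matters because the comparison is done on strings. The sketch below shows how a space-separated cutoff misorders against a toISOString() value; the concrete timestamps and the strftime format string in the comment are illustrative assumptions:

```typescript
// Timestamps are compared as strings, so both sides must share one layout.
function isBefore(storedIso: string, cutoff: string): boolean {
  return storedIso < cutoff;
}

// Stored the way the commit describes: Date.prototype.toISOString().
const lastActivityAt = new Date(Date.UTC(2024, 0, 1, 11, 0, 0)).toISOString();

// A cutoff one hour later, in a space-separated "YYYY-MM-DD HH:MM:SS" layout.
const cutoffSpaceSeparated = '2024-01-01 12:00:00';
// The same cutoff in ISO layout, e.g. from strftime('%Y-%m-%dT%H:%M:%fZ', ...).
const cutoffIso = '2024-01-01T12:00:00.000Z';

// 'T' sorts after ' ', so the mixed-format comparison is wrong.
const mixedFormats = isBefore(lastActivityAt, cutoffSpaceSeparated); // false
const sameFormat = isBefore(lastActivityAt, cutoffIso); // true
```

With mixed layouts every stored value on the cutoff's date compares as later than the cutoff, so matching rows on that date are silently skipped.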
…llow-up

Previously, gt_escalate created an escalation bead and optionally notified the mayor, but nothing automated acted on it. Escalation beads sat open with no assignee indefinitely.

Now routeEscalation creates a triage request alongside the escalation bead, feeding the escalation into the patrol→triage→resolve loop. The triage agent can then RESTART, REASSIGN, CLOSE, or ESCALATE_TO_MAYOR with the full context of the original escalation. When a triage request linked to an escalation is resolved, the escalation bead is also closed automatically.

Also adds 'escalation' to the TriageType union and enriches the ESCALATE_TO_MAYOR mayor message with agent and bead context.
When an agent escalates from within a convoy, the escalation bead and its triage request now carry convoy_id and source_bead_id in their metadata. This associates escalations with their convoy for display purposes and lays groundwork for Phase 4 convoy-aware triage handling.
…llback, regen types
- Fix escalation_bead_id lookup in resolveTriage to read from metadata.context (matching createTriageRequest's structure)
- Add polling fallback to AlarmStatusPane via tRPC getAlarmStatus query when WebSocket fails, with 5s refetch interval
- Reset refinery to idle when container start fails in processReviewQueue
- Regenerate gastown type declarations to include getAlarmStatus
572a193 to fbe0bc3
Summary
Replace the binary is_admin gate from #537 with PostHog feature flags for progressive rollout. All flag management (allowlists, percentage rollout, kill-switch) is handled through the PostHog dashboard. No custom DB tables, admin UI, or flag evaluation infrastructure needed.

Gate points updated (all 4 layers):
- Next.js pages: isFeatureFlagEnabled('gastown-access', user.id) with dev-mode bypass
- Sidebar: useFeatureFlagEnabled('gastown-access') (client-side PostHog hook, same pattern as auto-triage-feature)
- Token endpoint (/api/gastown/token): evaluates PostHog flag, embeds gastownAccess: true in JWT, returns 403 if denied
- Worker: requireGastownAccess() checks gastownAccess JWT claim, falls back to isAdmin for backward compatibility

Sub-feature flag names defined (in src/lib/gastown/feature-flags.ts):
- gastown-access: top-level gate
- gastown-convoys: convoy creation
- gastown-pr-merge: PR merge strategy
- gastown-multi-rig: second rig per town

These follow the existing PostHog flag pattern used by auto-triage-feature, code-review-cloud-agent-next, etc.

Backward compatibility: When the gastown-access flag doesn't exist in PostHog (or PostHog is unavailable in non-production), the dev-mode bypass allows local development. In production, isFeatureFlagEnabled returns false when the flag is missing, so access is denied by default (same as the old is_admin gate for non-admins). The worker accepts both gastownAccess and isAdmin JWT claims, so old tokens continue to work.

Closes #901
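The worker-side compatibility rule in the summary can be sketched as a claim check. The claim names gastownAccess and isAdmin come from the description; the type and function names are hypothetical, and the real requireGastownAccess() presumably also handles the rejection response:

```typescript
// Assumed subset of the JWT payload relevant to the gate.
type GastownClaims = { gastownAccess?: boolean; isAdmin?: boolean };

// New tokens carry gastownAccess; tokens issued before this change carry
// only isAdmin, so either claim is accepted.
function hasGastownAccess(claims: GastownClaims): boolean {
  return claims.gastownAccess === true || claims.isAdmin === true;
}
```

Once old tokens have expired, the isAdmin branch could be dropped without touching the callers.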
Verification
- pnpm typecheck: all 29 workspace packages pass
- ENABLE_GASTOWN_FEATURE has zero remaining references after removal
- requireAdmin has zero remaining references in gastown worker after rename

Visual Changes
N/A
Reviewer Notes
- The ENABLE_GASTOWN_FEATURE compile-time constant has been removed; it had no consumers after this change. The PostHog kill-switch is strictly better (no redeploy needed).
- The cohorts column on kilocode_users was investigated and found to be completely dead code (never populated, read, or used). It is not relevant to this feature.
- The gastown-access flag must be created in PostHog before this can be deployed. Initially configure it to target users where is_admin is true to preserve the current behavior, then progressively widen access.
- The dev-mode bypass (IS_DEVELOPMENT) follows the same pattern as auto-fix/page.tsx to ensure local development works without PostHog.