fix(cloud-agent-next): recover orphaned review sessions #877

Open
alex-alecu wants to merge 5 commits into main from fix/review-release-session

@alex-alecu alex-alecu commented Mar 6, 2026

Summary

Recover V2 review sessions when a running execution loses active_execution_id.

The production case here looks like a single hidden execution, not intended parallel work in one session. Once the marker was lost, stale reaping, interrupts, alarm cadence, idle Kilo cleanup, and the terminal callback path could all treat the session as idle. This change repairs the active marker from non-terminal execution state and makes those lifecycle checks use the repaired view. The broader scan of non-terminal executions is defensive hardening if marker loss ever lets session state drift.
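The repair described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SessionState` shape, the terminal status names, and the helpers `resolveActiveExecutionId` and `isSessionIdle` are all assumptions made for the example.

```typescript
// Hypothetical types: the real session schema is not shown in this PR excerpt.
type ExecutionStatus = 'pending' | 'running' | 'completed' | 'failed';

interface ExecutionRecord {
  id: string;
  status: ExecutionStatus;
  startedAt: number; // epoch milliseconds
}

interface SessionState {
  activeExecutionId: string | null;
  executions: ExecutionRecord[];
}

const isNonTerminal = (e: ExecutionRecord): boolean =>
  e.status === 'pending' || e.status === 'running';

/**
 * Returns the "repaired view" of the active marker: the stored marker when
 * present, otherwise the newest non-terminal execution, otherwise null.
 */
function resolveActiveExecutionId(state: SessionState): string | null {
  if (state.activeExecutionId !== null) {
    return state.activeExecutionId;
  }
  // filter() returns a new array, so sorting here does not mutate state.
  const [newest] = state.executions
    .filter(isNonTerminal)
    .sort((a, b) => b.startedAt - a.startedAt);
  return newest ? newest.id : null;
}

/** Lifecycle checks consult the repaired view rather than the raw marker. */
function isSessionIdle(state: SessionState): boolean {
  return resolveActiveExecutionId(state) === null;
}
```

With this shape, a session whose marker is `null` but which still has a `running` record is no longer reported as idle, which is the incident the summary describes.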

Verification

  • pnpm --dir cloud-agent-next exec tsc -p tsconfig.json --noEmit
  • pnpm --dir cloud-agent-next test:integration test/integration/session/disconnect-and-reaper.test.ts test/integration/session/start-execution-v2.test.ts

Visual Changes

N/A

Reviewer Notes

The observed incident is explained by one running execution with a missing marker. Handling multiple non-terminal records here is secondary hardening, not an assumption that parallel executions in one session are expected.

@alex-alecu alex-alecu self-assigned this Mar 6, 2026
@alex-alecu alex-alecu requested a review from eshurakov March 6, 2026 09:57
const executions = await this.executionQueries.getAll();
const candidates = executions
  .filter(execution => execution.status === 'running' || execution.status === 'pending')
  .sort((a, b) => b.startedAt - a.startedAt);

WARNING: Recovering only the newest non-terminal execution can leave older orphaned executions stuck

If active_execution_id is lost and another execution starts before the alarm fires, both executions can be non-terminal. Picking the newest candidate here restores that marker and causes the older orphan to be ignored on subsequent alarms, because cleanupStaleExecutions() goes back to following the active marker once one exists. That leaves the stale original execution stranded until the newer execution finishes.
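One way to avoid stranding the older orphan is to have the stale check scan every non-terminal execution instead of following only the active marker. The sketch below illustrates that idea under stated assumptions: `lastHeartbeatAt`, `staleAfterMs`, and `findStaleExecutions` are hypothetical names, and the actual `cleanupStaleExecutions()` logic is not shown in this excerpt.

```typescript
// Hypothetical record shape; lastHeartbeatAt is an assumed staleness signal.
type ExecutionStatus = 'pending' | 'running' | 'completed' | 'failed';

interface ExecutionRecord {
  id: string;
  status: ExecutionStatus;
  startedAt: number; // epoch milliseconds
  lastHeartbeatAt: number; // epoch milliseconds
}

/**
 * Returns every non-terminal execution whose last heartbeat is older than
 * staleAfterMs, regardless of which execution the active marker points at.
 */
function findStaleExecutions(
  executions: ExecutionRecord[],
  now: number,
  staleAfterMs: number,
): ExecutionRecord[] {
  return executions
    .filter(e => e.status === 'pending' || e.status === 'running')
    .filter(e => now - e.lastHeartbeatAt > staleAfterMs);
}
```

In the scenario from this comment, the newer execution would hold the restored marker, while the older orphan would still be returned by the scan once its heartbeat ages out, so it could be reaped without waiting for the newer execution to finish.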


kilo-code-bot bot commented Mar 6, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| cloud-agent-next/src/persistence/CloudAgentSession.ts | 1104 | Returning the orphaned execution without restoring the active marker leaves interrupt, idle-server cleanup, and status reads inconsistent |
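The inconsistency flagged here comes from recovery returning a value that other readers of the marker never see. A minimal sketch of writing the marker back during recovery is shown below; `InMemoryMarkerStore` and `recoverActiveExecution` are hypothetical stand-ins, and the real persistence layer in `CloudAgentSession.ts` may look quite different.

```typescript
// Hypothetical in-memory stand-in for the persisted active marker.
class InMemoryMarkerStore {
  private activeExecutionId: string | null = null;

  get(): string | null {
    return this.activeExecutionId;
  }

  set(id: string | null): void {
    this.activeExecutionId = id;
  }
}

interface ExecutionRecord {
  id: string;
  status: 'pending' | 'running' | 'completed' | 'failed';
  startedAt: number; // epoch milliseconds
}

/**
 * Recovers the orphaned execution AND restores the marker, so that any code
 * path reading the marker directly sees the same execution afterwards.
 */
function recoverActiveExecution(
  store: InMemoryMarkerStore,
  executions: ExecutionRecord[],
): ExecutionRecord | null {
  const markerId = store.get();
  if (markerId !== null) {
    return executions.find(e => e.id === markerId) ?? null;
  }
  const [orphan] = executions
    .filter(e => e.status === 'pending' || e.status === 'running')
    .sort((a, b) => b.startedAt - a.startedAt);
  if (!orphan) {
    return null;
  }
  store.set(orphan.id); // write back so later marker reads agree
  return orphan;
}
```

Persisting inside the recovery step means interrupt handling, idle cleanup, and status reads that consult the marker would all agree with what recovery returned, rather than each needing its own fallback scan.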


Other Observations (not in diff)

N/A

Files Reviewed (3 files)
  • cloud-agent-next/src/persistence/CloudAgentSession.ts - 1 issue
  • cloud-agent-next/test/integration/session/disconnect-and-reaper.test.ts - 0 issues
  • cloud-agent-next/test/integration/session/start-execution-v2.test.ts - 0 issues


2 participants