What happened
After 26h of daemon uptime, agentctl daemon status reported 394 active sessions and 79 active locks. After restart, it showed 0. The 394 sessions were mostly stale OpenClaw hook sessions (cron jobs) from 16-23h ago that had already completed on the gateway.
Root cause
Three related issues in session-tracker reaping logic:
1. OpenClaw sessions are never reaped from state (primary bug)
reapStaleEntries() only cleans up sessions based on PID liveness:
```ts
if ((record.status === 'running' || record.status === 'idle') && record.pid) {
  // Only reaps if PID is dead
}
```
OpenClaw sessions have no PID (they're remote gateway sessions). When the gateway stops returning a session (it completed), the session-tracker never notices — the session just sits in state as running forever.
The discover-first approach correctly queries the gateway each poll cycle, but only adds/updates sessions that appear in discover results. It never removes sessions that disappear from discover results. This is the core gap.
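A minimal sketch of the missing reconciliation step. It assumes a hypothetical `SessionRecord` shape and a per-adapter `DiscoverResult` that records whether `discover()` succeeded; neither is taken from the agentctl source. Running/idle sessions are marked stopped when their adapter answered this cycle but no longer lists them, and adapters whose `discover()` failed are skipped entirely:

```typescript
// Hypothetical types: the real session-tracker record and adapter interface
// may differ. Only the fields needed for reconciliation are modeled here.
type SessionStatus = "running" | "idle" | "stopped";

interface SessionRecord {
  id: string;
  adapter: string;
  status: SessionStatus;
  stoppedAt?: number;
}

// Outcome of one adapter's discover() call in a poll cycle (hypothetical shape).
type DiscoverResult =
  | { adapter: string; ok: true; sessionIds: string[] }
  | { adapter: string; ok: false };

// Marks running/idle sessions stopped when their adapter returned results
// this cycle but no longer lists them. Returns the IDs that were marked stopped.
function reconcileDiscovered(
  state: Map<string, SessionRecord>,
  results: DiscoverResult[],
  now: number = Date.now(),
): string[] {
  // Discovered IDs, keyed only by adapters that answered successfully.
  const seenByAdapter = new Map<string, Set<string>>();
  for (const r of results) {
    if (r.ok) seenByAdapter.set(r.adapter, new Set(r.sessionIds));
  }

  const reaped: string[] = [];
  for (const record of state.values()) {
    if (record.status !== "running" && record.status !== "idle") continue;
    const seen = seenByAdapter.get(record.adapter);
    // Adapter failed or was not polled: leave its sessions untouched so a
    // transient gateway error does not mass-reap them.
    if (!seen) continue;
    if (!seen.has(record.id)) {
      record.status = "stopped";
      record.stoppedAt = now;
      reaped.push(record.id);
    }
  }
  return reaped;
}
```

One design consequence: an adapter that succeeds with an empty list reaps all of its running/idle sessions. That is correct when the gateway genuinely has no sessions, so `ok: true` must only be reported for a real response, never for a swallowed error.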
2. Auto-locks are never released for sessions that die without session.stop
autoUnlock() is only called in the session.stop RPC handler (server.ts:421,433). When reapStaleEntries() marks a session as stopped (PID dead), it does NOT call autoUnlock(). This means auto-locks for crashed Claude Code sessions accumulate too.
For the 79 locks specifically: these are likely a mix of stale auto-locks from dead sessions and manual locks that were never released.
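A sketch of `reapStaleEntries()` with the missing `autoUnlock()` call, assuming a hypothetical `LockManagerLike` interface and record shape rather than the real agentctl types. The PID liveness check is passed in as a parameter; in Node it is commonly implemented with `process.kill(pid, 0)`, treating an `EPERM` error as "process exists":

```typescript
// Hypothetical minimal shapes; the real LockManager and state store differ.
interface LockManagerLike {
  autoUnlock(sessionId: string): void;
}

interface PidSessionRecord {
  id: string;
  status: "running" | "idle" | "stopped";
  pid?: number;
}

// Marks sessions with a dead PID as stopped and releases their auto-locks.
// Returns the IDs that were reaped.
function reapStaleEntries(
  sessions: Iterable<PidSessionRecord>,
  lockManager: LockManagerLike,
  pidAlive: (pid: number) => boolean,
): string[] {
  const reaped: string[] = [];
  for (const record of sessions) {
    // Sessions without a PID (e.g. remote OpenClaw sessions) are never
    // matched here; that is the separate discover-reconciliation gap.
    if ((record.status === "running" || record.status === "idle") && record.pid) {
      if (!pidAlive(record.pid)) {
        record.status = "stopped";
        // The missing step: release auto-locks, as the session.stop handler does.
        lockManager.autoUnlock(record.id);
        reaped.push(record.id);
      }
    }
  }
  return reaped;
}
```

The same `autoUnlock()` call belongs on any other path that marks a session stopped, so that lock cleanup does not depend on which path noticed the session ending.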
3. pruneOldSessions() only runs on startup
The 7-day prune (STOPPED_SESSION_PRUNE_AGE_MS) runs only once, in startPolling(). For long-running daemons, stopped sessions accumulate until the next restart. It should also run periodically (e.g., every hour).
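A sketch of running the prune on a timer as well as at startup. `STOPPED_SESSION_PRUNE_AGE_MS` and `pruneOldSessions` are named in this issue, but the bodies and signatures below are assumptions; `PRUNE_INTERVAL_MS`, `StoppedRecord`, and `startPruneTimer` are hypothetical:

```typescript
const STOPPED_SESSION_PRUNE_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days
const PRUNE_INTERVAL_MS = 60 * 60 * 1000; // hourly (assumed interval)

interface StoppedRecord {
  status: "running" | "idle" | "stopped";
  stoppedAt?: number;
}

// Deletes stopped sessions older than the prune age; returns how many were removed.
// Deleting the current entry while iterating a Map is safe in JavaScript.
function pruneOldSessions(state: Map<string, StoppedRecord>, now: number = Date.now()): number {
  let removed = 0;
  for (const [id, rec] of state) {
    if (
      rec.status === "stopped" &&
      rec.stoppedAt !== undefined &&
      now - rec.stoppedAt > STOPPED_SESSION_PRUNE_AGE_MS
    ) {
      state.delete(id);
      removed++;
    }
  }
  return removed;
}

// Prunes once immediately (the current startup behavior), then on an interval.
// Returns a function that cancels the timer, e.g. on daemon shutdown.
function startPruneTimer(
  state: Map<string, StoppedRecord>,
  intervalMs: number = PRUNE_INTERVAL_MS,
): () => void {
  pruneOldSessions(state);
  const timer = setInterval(() => pruneOldSessions(state), intervalMs);
  return () => clearInterval(timer);
}
```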
Expected behavior
- Sessions that disappear from adapter discover() results should be marked stopped and eventually pruned
- Auto-locks should be released when a session is reaped (not just on explicit stop)
- Periodic pruning should run during daemon lifetime, not just on startup
Fix suggestions
- In poll(): After discover, build a set of all discovered session IDs. Any session in state with status=running/idle whose adapter matches but whose ID is NOT in the discovered set → mark stopped. (Be careful: only reap sessions from adapters that successfully returned results, to avoid mass-reaping on transient adapter failures.)
- In reapStaleEntries(): When marking a session stopped, also call lockManager.autoUnlock(sessionId).
- Periodic prune: Run pruneOldSessions() on a timer (e.g., hourly) in addition to startup.
How to reproduce
- Start daemon, wait for OpenClaw adapter to discover sessions
- Wait for those sessions to complete on the gateway side
- Run agentctl daemon status: the session count grows monotonically
- Run agentctl list -a: shows hundreds of stale OpenClaw sessions as running/idle
Environment
- agentctl (current main)
- macOS, OpenClaw gateway with frequent hook sessions (cron)