fix(gastown): Container token refresh fires every minute on idle towns due to in-memory throttle #1409

@jrf0110

Bug

Idle towns POST /refresh-token to the container every ~1 minute, keeping containers alive unnecessarily. The token refresh is supposed to be throttled to once per hour, but the throttle resets every time the DO is evicted from memory.

Root Cause

Town.do.ts:3052 stores the throttle timestamp as an instance property:

private lastContainerTokenRefreshAt = 0;

Durable Objects are evicted from memory between alarm ticks when idle. On re-instantiation, the property resets to 0, so the next alarm tick evaluates now - 0 > TOKEN_REFRESH_INTERVAL_MS as true and fires the refresh. With IDLE_ALARM_INTERVAL_MS = 60_000 (1 minute), every idle alarm tick re-instantiates the DO, resets the throttle, and POSTs /refresh-token to the container.

The container receives the POST, wakes up (if sleeping), processes it, and stays alive waiting for more requests. This prevents idle containers from being reclaimed.
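The reset can be reproduced outside the Workers runtime with a minimal sketch. The class InMemoryThrottle, the helper countRefreshes, and the way eviction is simulated (constructing a fresh instance before each tick) are assumptions made for illustration; none of them come from Town.do.ts.

```typescript
// Hypothetical stand-in for the DO: the throttle lives in an instance field.
const TOKEN_REFRESH_INTERVAL_MS = 60 * 60_000; // 1 hour
const IDLE_ALARM_INTERVAL_MS = 60_000; // 1 minute

class InMemoryThrottle {
  private lastContainerTokenRefreshAt = 0;

  // Returns true when this tick would POST /refresh-token.
  tick(now: number): boolean {
    if (now - this.lastContainerTokenRefreshAt < TOKEN_REFRESH_INTERVAL_MS) return false;
    this.lastContainerTokenRefreshAt = now;
    return true;
  }
}

// Counts refreshes over `ticks` idle alarm ticks, one tick per minute.
// Eviction is simulated by replacing the instance before each tick.
function countRefreshes(ticks: number, evictBetweenTicks: boolean): number {
  const start = 1_700_000_000_000; // arbitrary fixed epoch milliseconds
  let instance = new InMemoryThrottle();
  let refreshes = 0;
  for (let i = 0; i < ticks; i++) {
    if (evictBetweenTicks) instance = new InMemoryThrottle();
    if (instance.tick(start + i * IDLE_ALARM_INTERVAL_MS)) refreshes++;
  }
  return refreshes;
}

const refreshesWithEviction = countRefreshes(60, true); // 60: one per tick
const refreshesWithoutEviction = countRefreshes(60, false); // 1: throttle holds
```

Under these assumptions, sixty one-minute ticks produce sixty refreshes when the instance is recreated each time, and a single refresh when it stays in memory.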

Fix

Part A: Persist the throttle in DO storage

Replace the in-memory timestamp with ctx.storage.get/put:

private async refreshContainerToken(): Promise<void> {
  const TOKEN_REFRESH_INTERVAL_MS = 60 * 60_000; // 1 hour
  const now = Date.now();
  // Read the throttle from DO storage so it survives eviction; 0 means "never refreshed".
  const lastRefresh = (await this.ctx.storage.get<number>('lastContainerTokenRefreshAt')) ?? 0;
  if (now - lastRefresh < TOKEN_REFRESH_INTERVAL_MS) return;

  const townId = this.townId;
  if (!townId) return;
  const townConfig = await this.getTownConfig();
  const userId = townConfig.owner_user_id ?? townId;
  await dispatch.refreshContainerToken(this.env, townId, userId);
  // Written only after the dispatch resolves: if it throws, the next tick retries.
  await this.ctx.storage.put('lastContainerTokenRefreshAt', now);
}

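Values written with ctx.storage.put survive DO eviction because they are persisted in the DO's SQLite storage, so ctx.storage.get returns the last refresh time even after the object is re-instantiated.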
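The effect of Part A can be sketched in the same style. FakeStorage, PersistedThrottle, and refreshTickIndices are hypothetical names, and the store here is a synchronous plain object standing in for ctx.storage (which is asynchronous in the real runtime); the only property that matters for the sketch is that the store outlives any single instance.

```typescript
// Hypothetical stand-in for durable storage: it outlives any single instance.
type FakeStorage = Record<string, number>;

const STORAGE_KEY = 'lastContainerTokenRefreshAt';
const HOUR_MS = 60 * 60_000;
const MINUTE_MS = 60_000;

class PersistedThrottle {
  constructor(private readonly storage: FakeStorage) {}

  // Returns true when this tick would POST /refresh-token.
  tick(now: number): boolean {
    const lastRefresh = this.storage[STORAGE_KEY] ?? 0;
    if (now - lastRefresh < HOUR_MS) return false;
    this.storage[STORAGE_KEY] = now;
    return true;
  }
}

// Runs `ticks` one-minute idle ticks, constructing a fresh instance for every
// tick to simulate eviction, and returns the indices of ticks that refreshed.
function refreshTickIndices(ticks: number): number[] {
  const storage: FakeStorage = {};
  const start = 1_700_000_000_000; // arbitrary fixed epoch milliseconds
  const fired: number[] = [];
  for (let i = 0; i < ticks; i++) {
    const instance = new PersistedThrottle(storage); // fresh instance, same storage
    if (instance.tick(start + i * MINUTE_MS)) fired.push(i);
  }
  return fired;
}

const firedAt = refreshTickIndices(121); // ticks 0, 60 and 120
```

Even with an eviction before every tick, the refresh fires only when a full hour has passed since the stored timestamp.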

Part B: Increase idle alarm interval to 5 minutes

const IDLE_ALARM_INTERVAL_MS = 5 * 60_000; // 5m when idle

Nothing in the idle tick needs to run every minute. Container observation, event drain, reconciliation, and housekeeping are all no-ops when there's no active work. The alarm re-arms to 5s immediately when new work arrives via armAlarmIfNeeded, so latency for new beads is unaffected.
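As a rough sketch of what Part B changes, the snippet below picks the next alarm delay and counts idle wake-ups per day for both intervals. The names ACTIVE_ALARM_INTERVAL_MS, nextAlarmDelayMs, and idleTicksPerDay are assumptions for illustration; the 5s, 1 minute, and 5 minute values come from the text above, and the real scheduling lives in armAlarmIfNeeded, whose implementation is not shown here.

```typescript
const ACTIVE_ALARM_INTERVAL_MS = 5_000; // assumed name; 5s value from the text
const OLD_IDLE_ALARM_INTERVAL_MS = 60_000; // current: 1 minute
const NEW_IDLE_ALARM_INTERVAL_MS = 5 * 60_000; // proposed: 5 minutes

// Hypothetical helper: short delay while there is work, long delay when idle.
function nextAlarmDelayMs(hasActiveWork: boolean, idleIntervalMs: number): number {
  return hasActiveWork ? ACTIVE_ALARM_INTERVAL_MS : idleIntervalMs;
}

// Number of alarm ticks an idle town produces in 24 hours.
function idleTicksPerDay(idleIntervalMs: number): number {
  const DAY_MS = 24 * 60 * 60_000;
  return DAY_MS / idleIntervalMs;
}

const oldTicksPerDay = idleTicksPerDay(OLD_IDLE_ALARM_INTERVAL_MS); // 1440
const newTicksPerDay = idleTicksPerDay(NEW_IDLE_ALARM_INTERVAL_MS); // 288
```

An idle town therefore wakes 288 times per day instead of 1440, while the delay used when work is present is unchanged.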

Files

  • src/dos/Town.do.ts: lastContainerTokenRefreshAt (line 3052), refreshContainerToken (line 3053), IDLE_ALARM_INTERVAL_MS (line 122)

Impact

Medium — wastes container resources and Cloudflare billing on idle towns. Does not affect correctness.

Metadata

Labels

  • P2 (Post-launch)
  • bug (Something isn't working)
  • gt:container (Container management, agent processes, SDK, heartbeat)
  • kilo-auto-fix (Auto-generated label by Kilo)
  • kilo-triaged (Auto-generated label by Kilo)
