Bug: PocketIC silently reloads canister execution state from stale checkpoint after macOS sleep
Summary
When dfx start (v0.30.2, PocketIC-based) runs on macOS and the laptop sleeps for ≥ ~90 minutes, certain canisters have their Wasm heap reset and stable memory rolled back to a stale on-disk checkpoint — even though the PocketIC process and all sandbox processes remain alive throughout. In-memory page deltas that should still be in the process's address space are lost. This causes canister state to silently regress to an earlier point in time.
Environment
- dfx: 0.30.2
- pocket-ic: bundled with dfx 0.30.2 (running under Rosetta on Apple Silicon)
- macOS: Apple Silicon, macOS Sequoia (15.x)
- Rust canister toolchain: ic-stable-structures 0.7, ic-cdk 0.20, ic-cdk-timers 1.0, candid 0.10
Reproduction Steps
Minimal reproduction canister
A trivial canister (push_stress) with zero application logic — only ic-stable-structures MemoryManager + StableBTreeMap + StableCell:
# Cargo.toml
[dependencies]
candid = "0.10"
ic-cdk = "0.20"
ic-cdk-timers = "1.0"
ic-stable-structures = "0.7"
The canister:
- Uses MemoryManager with a StableBTreeMap and two StableCells
- Maintains a heap counter (heap_tick) incremented every 30 seconds via timer
- Maintains a stable counter (push_count in a StableCell) incremented alongside heap_tick
- Has stable memory grown to 2,177 pages (~136 MiB at 64 KiB per page) via Memory::grow()
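The page count can be sanity-checked with a little arithmetic. The sketch below assumes the standard 64 KiB Wasm page size. It also assumes, as our reading of the ic-stable-structures defaults rather than something verified here, that MemoryManager lays memory out as one header page followed by 128-page buckets. The constant and function names are illustrative only.

```rust
/// Size of one Wasm / stable memory page in bytes (64 KiB).
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// ASSUMPTION: default MemoryManager bucket size, in pages.
const BUCKET_PAGES: u64 = 128;

/// ASSUMPTION: pages reserved by MemoryManager for its header.
const HEADER_PAGES: u64 = 1;

/// Total bytes occupied by `pages` stable memory pages.
fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_BYTES
}

/// Number of whole buckets behind a page count, or None if the count does
/// not match the assumed "header + n buckets" layout.
fn bucket_count(pages: u64) -> Option<u64> {
    let body = pages.checked_sub(HEADER_PAGES)?;
    if body % BUCKET_PAGES == 0 {
        Some(body / BUCKET_PAGES)
    } else {
        None
    }
}

fn main() {
    // Page counts taken from the report.
    for pages in [513u64, 2_177] {
        let bytes = pages_to_bytes(pages);
        let mib = bytes as f64 / (1024.0 * 1024.0);
        println!(
            "{} pages = {} bytes (~{:.1} MiB), buckets = {:?}",
            pages,
            bytes,
            mib,
            bucket_count(pages)
        );
    }
}
```

Under these assumptions, 2,177 pages corresponds to 17 full buckets plus the header page, and 513 pages to 4 buckets plus the header page.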
Steps
- Run dfx start on macOS
- Deploy the canister and verify: heap_tick > 0, push_count > 0, stable_pages = 2177
- Close the laptop lid for ≥ 90 minutes
- Open the lid
- Query get_status → heap_tick has reset to a small number (counting from 0), push_count continues from its pre-sleep value
The heap_tick reset shows that the Wasm heap was re-initialized. push_count continuing from its StableCell value shows that stable memory is at least partially restored (the StableCell's MemoryId region was recovered), but writes from the period before the sleep may be lost.
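The before/after comparison in the last step can be expressed as a small check. This is a minimal sketch: the Status struct is a hypothetical mirror of what get_status returns (the report does not show its exact type), and the sample numbers in main are made up for illustration. It relies on both counters only ever increasing while the canister runs, so any decrease across a sleep indicates a regression.

```rust
/// Hypothetical mirror of the fields returned by get_status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Status {
    heap_tick: u64,
    push_count: u64,
    stable_pages: u64,
}

/// Outcome of comparing a pre-sleep snapshot with a post-wake snapshot.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Healthy,
    HeapReset,
    StableRollback,
    HeapResetAndStableRollback,
}

/// Classifies a sleep/wake cycle from two snapshots.
fn classify(before: Status, after: Status) -> Verdict {
    // The heap counter going backwards means the Wasm heap was re-initialized.
    let heap_reset = after.heap_tick < before.heap_tick;
    // The stable counter or page count going backwards means stable memory regressed.
    let stable_rollback =
        after.push_count < before.push_count || after.stable_pages < before.stable_pages;
    match (heap_reset, stable_rollback) {
        (false, false) => Verdict::Healthy,
        (true, false) => Verdict::HeapReset,
        (false, true) => Verdict::StableRollback,
        (true, true) => Verdict::HeapResetAndStableRollback,
    }
}

fn main() {
    // Illustrative values only, not measurements from the report.
    let before = Status { heap_tick: 240, push_count: 240, stable_pages: 2_177 };
    let after = Status { heap_tick: 3, push_count: 243, stable_pages: 2_177 };
    println!("{:?}", classify(before, after));
}
```

A HeapReset verdict with an intact push_count matches the symptom described above. It does not rule out rollback in other stable memory regions, which this check cannot see.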
Key observations
- The PocketIC process (pocket-ic --port-file ... --ttl 2592000 --log-levels error) remains alive throughout, verified via ps -p <pid> -o lstart showing continuous uptime across multiple sleep events.
- All canister sandbox processes (pocket-ic --run-as-canister-sandbox ...) also remain alive with their original PIDs and start dates.
- macOS power log (pmset -g log) confirms normal Clamshell Sleep → DarkWake → Full Wake transitions, NOT process termination.
- Despite the process surviving, canister execution state is reloaded from a stale on-disk checkpoint.
Trigger Conditions
Through controlled experiments over 7 sleep/wake events (Apr 1–3, 2026), we identified the conditions that trigger the bug:
| Canister | Stable pages | Uses MemoryManager | Affected? | Sleep events tested |
|---|---|---|---|---|
| Service map (production) | 2,177 | Yes | Always (7/7 events ≥ 90 min) | 7 |
| push_stress (test) | 513 | Yes | Never (0/5) | 5 |
| push_stress (test, grown) | 2,177 | Yes | Yes (1/1), first event after growing | 1 |
| raw_stable_probe (test) | 2,177 | No (DefaultMemoryImpl) | Never (0/1) | 1 |
| stable_memory_probe (test) | 385 | No | Never (0/7) | 7 |
Two conditions appear necessary:
- Stable memory ≥ ~2,177 pages (~136 MiB): push_stress at 513 pages was immune across 5 events; at 2,177 pages it was affected on the first event. Page counts between 513 and 2,177 were not tested, so the actual threshold may be lower.
- MemoryManager usage: push_stress (MemoryManager, 2,177 pages) was affected; raw_stable_probe (DefaultMemoryImpl, 2,177 pages) was not. (Based on 1 event; needs more data to confirm.)
Sleep duration threshold: ~90 minutes. Sleeps < ~90 minutes do not trigger the issue. The PocketIC embedder config shows max_sandbox_idle_time: 30 seconds, which may be related — all sandboxes exceed this during any sleep.
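The hypothesis above can be written down as a predicate and checked against the table rows. This is a sketch for bookkeeping, not a claim about PocketIC internals. The thresholds are the approximate values from this report. The Observation struct and function names are hypothetical, and the check assumes every tabulated event was a sleep of at least 90 minutes.

```rust
/// One row of the trigger-conditions table.
struct Observation {
    name: &'static str,
    stable_pages: u64,
    uses_memory_manager: bool,
    affected: bool,
}

/// Rows copied from the table in this report.
const OBSERVATIONS: [Observation; 5] = [
    Observation { name: "Service map (production)", stable_pages: 2_177, uses_memory_manager: true, affected: true },
    Observation { name: "push_stress (test)", stable_pages: 513, uses_memory_manager: true, affected: false },
    Observation { name: "push_stress (test, grown)", stable_pages: 2_177, uses_memory_manager: true, affected: true },
    Observation { name: "raw_stable_probe (test)", stable_pages: 2_177, uses_memory_manager: false, affected: false },
    Observation { name: "stable_memory_probe (test)", stable_pages: 385, uses_memory_manager: false, affected: false },
];

/// Approximate thresholds from the report; the true values may differ.
const PAGE_THRESHOLD: u64 = 2_177;
const SLEEP_THRESHOLD_MINUTES: u64 = 90;

/// The working hypothesis: all three conditions must hold.
fn predicts_affected(stable_pages: u64, uses_memory_manager: bool, sleep_minutes: u64) -> bool {
    stable_pages >= PAGE_THRESHOLD
        && uses_memory_manager
        && sleep_minutes >= SLEEP_THRESHOLD_MINUTES
}

/// Names of table rows that contradict the hypothesis for a given sleep length.
fn mismatches(sleep_minutes: u64) -> Vec<&'static str> {
    OBSERVATIONS
        .iter()
        .filter(|o| {
            predicts_affected(o.stable_pages, o.uses_memory_manager, sleep_minutes) != o.affected
        })
        .map(|o| o.name)
        .collect()
}

fn main() {
    // ASSUMPTION: every tabulated event was a sleep of at least 90 minutes.
    println!("rows contradicting the hypothesis: {:?}", mismatches(90));
}
```

An empty result only means the hypothesis is consistent with the five rows collected so far; several of them rest on a single sleep event.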
Observed Behavior
Heap reset
All heap (Wasm linear memory) variables reset to their default values, as if the canister had been freshly installed. This is observed via:
- LAST_HEARTBEAT_NS (heap variable) resetting to 0
- heap_tick (heap counter) resetting to 0
- Ring buffer contents (stored in a MemoryManager-backed VM region) being wiped
Stable memory rollback
Stable memory is restored from an old on-disk checkpoint, not from the in-memory page deltas:
- The service map's B-tree registry shows tree_len values from an older checkpoint (e.g., 23 entries when it should have 44).
- Writes made between the last on-disk checkpoint and the sleep are lost. In one test, a deduplication operation successfully removed 21 entries from the B-tree; after sleep, all 21 removed entries reappeared.
- StableCell values (small, single-page structures) survive, suggesting that some stable memory regions are recovered while others are not.
Late reconciliation
Minutes after wake, the correct state sometimes becomes visible — old entries that were "missing" reappear. This creates duplicate entries when the canister has already created new entries based on the stale state.
Why This Shouldn't Happen
The PocketIC process is not killed by macOS sleep. All in-memory state — including page deltas accumulated since the last checkpoint — should be in the process's address space when it resumes. The IC execution controller's execute() method passes execution_state.wasm_memory and execution_state.stable_memory to sandboxes, which should contain all accumulated page deltas.
Something inside PocketIC's state management is discarding or bypassing the in-memory page deltas and reloading canister execution state from on-disk checkpoints during or after the sleep/wake cycle. Possible mechanisms:
- Checkpoint cycle triggered by round catch-up: After waking, PocketIC processes many accumulated rounds. If this triggers a checkpoint cycle (tip_to_checkpoint_and_switch + reset_tip_to), and the checkpoint flush doesn't fully capture all canisters' page deltas, the subsequent tip reset would discard the missing deltas.
- Sandbox eviction + stale execution state reload: The max_sandbox_idle_time: 30s would mark all sandboxes as idle during sleep. If the eviction path reloads execution state from on-disk checkpoints rather than the main process's canonical in-memory state, the stale checkpoint would be used.
- heap_delta_estimate overflow: If the accumulated heap deltas across all canisters exceed a threshold during catch-up processing, the system might force a checkpoint that doesn't properly flush all deltas.
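To make the suspected failure mode concrete, here is a toy model of a checkpoint plus in-memory page deltas. It is not PocketIC's implementation: ToyMemory and its methods are hypothetical, and a single u64 per page stands in for real page contents. It only illustrates why dropping deltas without flushing them makes reads fall back to stale checkpoint data, using the 44 → 23 dedup example from this report.

```rust
use std::collections::BTreeMap;

/// Toy model of one canister memory: an on-disk checkpoint plus the
/// in-memory page deltas written since that checkpoint.
#[derive(Default)]
struct ToyMemory {
    /// Page index -> value as of the last checkpoint.
    checkpoint: BTreeMap<u64, u64>,
    /// Page index -> value written since the last checkpoint.
    deltas: BTreeMap<u64, u64>,
}

impl ToyMemory {
    /// Writes land in the delta overlay, not in the checkpoint.
    fn write(&mut self, page: u64, value: u64) {
        self.deltas.insert(page, value);
    }

    /// Reads prefer the delta overlay and fall back to the checkpoint.
    fn read(&self, page: u64) -> Option<u64> {
        self.deltas
            .get(&page)
            .or_else(|| self.checkpoint.get(&page))
            .copied()
    }

    /// Correct checkpointing: fold every delta into the checkpoint.
    fn flush_all(&mut self) {
        let deltas = std::mem::take(&mut self.deltas);
        self.checkpoint.extend(deltas);
    }

    /// Suspected faulty path: drop the deltas without flushing them.
    fn reset_to_checkpoint(&mut self) {
        self.deltas.clear();
    }
}

fn main() {
    let mut mem = ToyMemory::default();
    // Page 0 stands in for tree_len: 44 entries at the last checkpoint.
    mem.checkpoint.insert(0, 44);
    // Dedup removes 21 entries; the new length lives only in the deltas.
    mem.write(0, 23);
    println!("before reset: {:?}", mem.read(0));
    // Dropping the deltas makes the 21 removed entries "reappear".
    mem.reset_to_checkpoint();
    println!("after reset:  {:?}", mem.read(0));
}
```

If the deltas were flushed before the reset (flush_all in this model), the post-reset read would return the deduplicated value instead.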
Impact
This is a local-development-only issue; it cannot affect mainnet (where consensus and replication ensure state consistency). However, it is highly disruptive for local development:
- Silent data loss — writes to stable memory are rolled back without any error or warning
- Canister heap resets break any canister that relies on heap-resident state (counters, caches, flags)
- The "late reconciliation" phase creates duplicate state that is difficult to detect and clean up
- Cleanup operations (like dedup) are themselves subject to rollback on the next sleep event
Diagnostic Data Available
We have extensive logs, overlay file histories, and canister state snapshots from 7 sleep/wake events. The full investigation log (500+ lines) is available if useful. We can also provide the push_stress canister source code as a minimal reproduction case.