Fix offload_state_dict CPU staging, FirstBlockCache crash, FasterCache callback validation#17

Draft
cursor[bot] wants to merge 2 commits into main from cursor/critical-bug-management-829a



cursor[bot] commented Jun 26, 2026


Summary

Fixes three critical correctness bugs found during automated review.

Bug 1: CPU-staged weights never loaded when offload_state_dict=True (without disk offloads)

Impact: Silent weight corruption — models loaded with device_map + offload_state_dict=True leave CPU-staged parameters uninitialized (random/meta values).

Root cause: load_offloaded_weights() was only called inside the disk-offload branch (offload_index non-empty). CPU-staged weights written to the temp folder via state_dict_index were never loaded back.

Trigger:

UNet2DModel.from_pretrained(path, device_map={"": "cpu"}, offload_state_dict=True, low_cpu_mem_usage=True)

Fix: Call load_offloaded_weights() whenever state_dict_index has entries, independent of disk offloads. Always clean up the temp folder.

Validation: pytest tests/models/test_offload_state_dict.py — 1 passed
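The control flow described in the fix can be sketched roughly as follows. This is a toy illustration, not the library's code: `finish_loading`, `load_staged`, and the JSON-file staging format are hypothetical stand-ins, and only the names `state_dict_index` and `offload_index` come from the description above.

```python
import json
import os
import shutil
import tempfile


def load_staged(params, index, folder):
    # Hypothetical loader: each staged parameter is a JSON file named after its key.
    for name in index:
        with open(os.path.join(folder, f"{name}.json")) as f:
            params[name] = json.load(f)


def finish_loading(params, state_dict_folder, state_dict_index, offload_index):
    # offload_index is accepted but deliberately NOT consulted here: the bug was
    # that staged weights were only loaded when disk offloads were present.
    try:
        if state_dict_index:
            load_staged(params, state_dict_index, state_dict_folder)
    finally:
        # Always clean up the temp folder, whether or not loading succeeded.
        if state_dict_folder is not None:
            shutil.rmtree(state_dict_folder, ignore_errors=True)
    return params


def demo_cpu_staging():
    # Stage one "weight" in a temp folder, with no disk offloads at all.
    folder = tempfile.mkdtemp()
    with open(os.path.join(folder, "weight.json"), "w") as f:
        json.dump([1.0, 2.0], f)
    params = {"weight": None}
    finish_loading(params, folder, {"weight": {}}, offload_index={})
    return params, os.path.exists(folder)
```

Calling `demo_cpu_staging()` exercises the case from the trigger: an empty `offload_index` with a non-empty `state_dict_index`.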

Bug 2: FirstBlockCache crash on single-block transformers

Impact: IndexError when applying FirstBlockCache to transformers with exactly one block.

Root cause: apply_first_block_cache always calls pop(0) and then pop(-1) without handling the single-block case, where the second pop hits an empty list (MagCache already handles this).

Trigger: apply_first_block_cache(single_block_transformer, FirstBlockCacheConfig(...)) raises IndexError.

Fix: Mirror MagCache's single-block path — apply head+tail hooks to the same block.

Validation: pytest tests/hooks/test_first_block_cache.py — 1 passed
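A minimal sketch of the single-block handling, assuming the blocks are held in a plain list. `split_head_tail` is a hypothetical helper used only for illustration; the actual hook registration is not shown in this description and is omitted here.

```python
def split_head_tail(blocks):
    """Return (head, middle_blocks, tail) for a non-empty list of blocks.

    Hypothetical helper: with one block, head and tail are the same object,
    so both the head and tail hooks would be applied to that single block.
    """
    if not blocks:
        raise ValueError("expected at least one block")
    remaining = list(blocks)
    head = remaining.pop(0)
    # After popping the head of a single-block list, `remaining` is empty,
    # so an unconditional pop(-1) would raise IndexError.
    tail = remaining.pop(-1) if remaining else head
    return head, remaining, tail
```

For a one-element list the helper returns the same block as both head and tail; for longer lists it behaves like the original pop(0) / pop(-1) sequence.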

Bug 3: FasterCache crashes when current_timestep_callback is omitted

Impact: Immediate TypeError: 'NoneType' object is not callable on first forward when using default weight callbacks.

Root cause: Default low_frequency_weight_callback / high_frequency_weight_callback call current_timestep_callback(), but apply_faster_cache never validates it (Pyramid Attention Broadcast does).

Trigger: apply_faster_cache(model, FasterCacheConfig(spatial_attention_block_skip_range=2)) without current_timestep_callback.

Fix: Raise ValueError at apply time if current_timestep_callback is missing.

Validation: pytest tests/hooks/test_faster_cache.py — 1 passed
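The apply-time check might look roughly like this. `ToyCacheConfig` and `apply_toy_cache` are hypothetical stand-ins for the real config and apply function, and the error message wording is an assumption; only the field names are taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyCacheConfig:
    # Field names mirror the description; everything else is illustrative.
    spatial_attention_block_skip_range: int = 2
    current_timestep_callback: Optional[Callable[[], int]] = None


def apply_toy_cache(config):
    # Fail fast at apply time instead of hitting
    # "TypeError: 'NoneType' object is not callable" on the first forward pass.
    if config.current_timestep_callback is None:
        raise ValueError(
            "current_timestep_callback must be provided; the default weight "
            "callbacks call it to obtain the current timestep."
        )
    return config.current_timestep_callback
```

Validating when the cache is applied moves the failure to the call site that made the mistake, which is easier to diagnose than an error raised from inside a weight callback during inference.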

Tooling

  • utils/check_copies.py: invoke ruff via python -m ruff so pre-commit copy checking works when ruff is not on PATH.

Existing open PRs (unchanged)


cursoragent and others added 2 commits June 26, 2026 22:07
…h, FasterCache callback validation

- Load CPU-staged weights when offload_state_dict=True even without disk offloads
- Handle single-block transformers in apply_first_block_cache (mirror MagCache)
- Require current_timestep_callback in apply_faster_cache (match PAB behavior)
- Add regression tests for all three fixes

Co-authored-by: Simon Lynch <srlynch1@users.noreply.github.com>