Fix offload_state_dict CPU staging, FirstBlockCache crash, FasterCache callback validation by cursor[bot] · Pull Request #17 · agentdevsl/diffusers

cursor · 2026-06-26T22:10:23Z

Summary

Fixes three critical correctness bugs found during automated review.

Bug 1: CPU-staged weights never loaded when `offload_state_dict=True` (without disk offloads)

Impact: Silent weight corruption — models loaded with device_map + offload_state_dict=True leave CPU-staged parameters uninitialized (random/meta values).

Root cause: load_offloaded_weights() was only called inside the disk-offload branch (offload_index non-empty). CPU-staged weights written to the temp folder via state_dict_index were never loaded back.

Trigger:

UNet2DModel.from_pretrained(path, device_map={"": "cpu"}, offload_state_dict=True, low_cpu_mem_usage=True)

Fix: Call load_offloaded_weights() whenever state_dict_index has entries, independent of disk offloads. Always clean up the temp folder.

Validation: pytest tests/models/test_offload_state_dict.py — 1 passed
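The control flow behind the Bug 1 fix can be sketched with a toy loader. This is a minimal illustration, not the actual diffusers code: `stage_to_folder`, `load_staged`, and `finish_loading` are hypothetical names, and JSON files stand in for the staged tensors. It shows only the two points the fix describes: staged weights are loaded whenever the index has entries, and the temp folder is always removed.

```python
import json
import shutil
import tempfile
from pathlib import Path


def stage_to_folder(state_dict, folder):
    """Write each value to its own file and return an index of what was staged."""
    index = {}
    for name, value in state_dict.items():
        path = Path(folder) / f"{name}.json"
        path.write_text(json.dumps(value))
        index[name] = {"file": path.name}
    return index


def load_staged(model_params, index, folder):
    """Copy every staged value back into the (toy) model parameters."""
    for name, entry in index.items():
        model_params[name] = json.loads((Path(folder) / entry["file"]).read_text())


def finish_loading(model_params, state_dict_index, state_dict_folder, offload_index):
    try:
        if offload_index:
            pass  # disk-offload bookkeeping would go here (omitted in this sketch)
        # Key point: this check sits outside the disk-offload branch above,
        # so CPU-staged weights are restored even when offload_index is empty.
        if state_dict_index:
            load_staged(model_params, state_dict_index, state_dict_folder)
    finally:
        # The temp folder is removed whether or not loading succeeded.
        if state_dict_folder is not None:
            shutil.rmtree(state_dict_folder, ignore_errors=True)
    return model_params


folder = tempfile.mkdtemp()
index = stage_to_folder({"conv.weight": [1.0, 2.0]}, folder)
params = finish_loading({"conv.weight": None}, index, folder, offload_index={})
```

With an empty `offload_index`, `params` still ends up holding the staged values, which is the case the original code path missed.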

Bug 2: FirstBlockCache crash on single-block transformers

Impact: IndexError when applying FirstBlockCache to transformers with exactly one block.

Root cause: apply_first_block_cache always calls pop(0) and then pop(-1) on the block list without handling the single-block case, so the second pop hits an empty list (MagCache already handles this).

Trigger: apply_first_block_cache(single_block_transformer, FirstBlockCacheConfig(...)) → IndexError.

Fix: Mirror MagCache's single-block path — apply head+tail hooks to the same block.

Validation: pytest tests/hooks/test_first_block_cache.py — 1 passed
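The single-block handling for Bug 2 can be illustrated in isolation. The helper below is a hypothetical sketch (`split_head_tail` is not a diffusers function) that works on a plain list; it only demonstrates why `pop(0)` followed by `pop(-1)` fails with one block and how reusing the head block as the tail avoids that.

```python
def split_head_tail(blocks):
    """Return (head, tail, middle) for a non-empty list of blocks."""
    remaining = list(blocks)  # copy so the caller's list is not mutated
    if not remaining:
        raise ValueError("expected at least one transformer block")
    head = remaining.pop(0)
    # With a single block the list is now empty, so an unconditional
    # pop(-1) would raise IndexError. Reuse the head block as the tail.
    tail = remaining.pop(-1) if remaining else head
    return head, tail, remaining


single = split_head_tail(["block0"])
several = split_head_tail(["block0", "block1", "block2"])
```

In the single-block case both the head and tail hooks would then be attached to the same block, matching the fix described above.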

Bug 3: FasterCache crashes when `current_timestep_callback` is omitted

Impact: Immediate TypeError: 'NoneType' object is not callable on the first forward pass when using the default weight callbacks.

Root cause: Default low_frequency_weight_callback / high_frequency_weight_callback call current_timestep_callback(), but apply_faster_cache never validates it (Pyramid Attention Broadcast does).

Trigger: apply_faster_cache(model, FasterCacheConfig(spatial_attention_block_skip_range=2)) without current_timestep_callback.

Fix: Raise ValueError at apply time if current_timestep_callback is missing.

Validation: pytest tests/hooks/test_faster_cache.py — 1 passed
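The Bug 3 fix amounts to validating a required callback at apply time instead of failing later inside a default callback. The sketch below assumes a toy config and apply function (`ToyCacheConfig` and `apply_toy_cache` are hypothetical, and the weighting formula is a placeholder rather than the real default).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyCacheConfig:
    # Hypothetical stand-in for a cache config; only the relevant field is kept.
    current_timestep_callback: Optional[Callable[[], int]] = None


def apply_toy_cache(config: ToyCacheConfig) -> Callable[[], float]:
    # Validate when the cache is applied, not on the first forward pass.
    if config.current_timestep_callback is None:
        raise ValueError(
            "current_timestep_callback must be provided because the default "
            "weight callbacks call it."
        )

    def default_weight_callback() -> float:
        # Placeholder weighting; the real defaults are not reproduced here.
        timestep = config.current_timestep_callback()
        return float(timestep) / 1000.0

    return default_weight_callback


weight_fn = apply_toy_cache(ToyCacheConfig(current_timestep_callback=lambda: 500))
```

Failing early gives the caller a message naming the missing argument, rather than a `TypeError` raised deep inside a hook.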

Tooling

utils/check_copies.py: invoke ruff via python -m ruff so pre-commit copy checking works when ruff is not on PATH.
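As a rough sketch of the tooling change, a script can build the ruff command from the running interpreter so it does not depend on a `ruff` executable being on PATH. The helper names here are hypothetical and do not reproduce the code in utils/check_copies.py.

```python
import subprocess
import sys


def ruff_command(*args: str) -> list[str]:
    """Build a ruff invocation that goes through the current interpreter."""
    # sys.executable is the interpreter running this script, so the ruff
    # module installed in that environment is the one that gets used.
    return [sys.executable, "-m", "ruff", *args]


def run_ruff(*args: str) -> subprocess.CompletedProcess:
    """Run ruff and capture its output (requires ruff to be installed)."""
    return subprocess.run(ruff_command(*args), capture_output=True, text=True)


command = ruff_command("format", "-")
```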

Existing open PRs (unchanged)

PR #5 (Fix variant detection crash and parallel shard loading race): variant detection crash + parallel shard loading race
PR #7 (Fix swapped tensor order in TransformerBlockSkipHook for SkipLayerGuidance): TransformerBlockSkipHook swapped tensor order
PR #10 (Fix LayerSkipConfig shorthand crash in SkipLayerGuidance and AutoGuidance): LayerSkipConfig shorthand int vs list
PR #13 (Fix FasterCache crash and SmoothedEnergyGuidance hook install bugs): FasterCache None cache + SmoothedEnergyGuidance
PR #15 (Fix PAG int shorthand crash in LayerSkipConfig.indices): PAG int shorthand crash

…h, FasterCache callback validation

- Load CPU-staged weights when offload_state_dict=True even without disk offloads
- Handle single-block transformers in apply_first_block_cache (mirror MagCache)
- Require current_timestep_callback in apply_faster_cache (match PAB behavior)
- Add regression tests for all three fixes

Co-authored-by: Simon Lynch <srlynch1@users.noreply.github.com>

cursoragent and others added 2 commits June 26, 2026 22:07

Fix regression tests and invoke ruff via python -m in check_copies

906dd9f

Co-authored-by: Simon Lynch <srlynch1@users.noreply.github.com>

cursor Bot mentioned this pull request Jun 27, 2026

Fix modular pipeline crash when using stateful inference caches #19

Draft

1 task

cursor[bot] wants to merge 2 commits into main from cursor/critical-bug-management-829a
