Fix offload_state_dict CPU staging, FirstBlockCache crash, FasterCache callback validation #17
Draft
cursor[bot] wants to merge 2 commits into
…h, FasterCache callback validation
- Load CPU-staged weights when offload_state_dict=True even without disk offloads
- Handle single-block transformers in apply_first_block_cache (mirror MagCache)
- Require current_timestep_callback in apply_faster_cache (match PAB behavior)
- Add regression tests for all three fixes
Co-authored-by: Simon Lynch <srlynch1@users.noreply.github.com>
Summary
Fixes three critical correctness bugs found during automated review.
Bug 1: CPU-staged weights never loaded when offload_state_dict=True (without disk offloads)
Impact: Silent weight corruption. Models loaded with device_map + offload_state_dict=True leave CPU-staged parameters uninitialized (random/meta values).
Root cause: load_offloaded_weights() was only called inside the disk-offload branch (offload_index non-empty). CPU-staged weights written to the temp folder via state_dict_index were never loaded back.
Trigger:
Fix: Call load_offloaded_weights() whenever state_dict_index has entries, independent of disk offloads. Always clean up the temp folder.
Validation: pytest tests/models/test_offload_state_dict.py (1 passed)

Bug 2: FirstBlockCache crash on single-block transformers
Impact: IndexError when applying FirstBlockCache to transformers with exactly one block.
Root cause: apply_first_block_cache always calls pop(0) then pop(-1) without handling the single-block case (MagCache already handles this).
Trigger: apply_first_block_cache(single_block_transformer, FirstBlockCacheConfig(...)) → IndexError.
Fix: Mirror MagCache's single-block path by applying the head and tail hooks to the same block.
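The failure mode and the guard can be illustrated with plain Python lists standing in for transformer blocks. This is only a sketch: the helper names naive_split and split_head_tail are hypothetical and are not functions from the library, and the actual hook registration is left out.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def naive_split(blocks: List[T]) -> Tuple[T, List[T], T]:
    """Unguarded pattern: pop the head, then pop the tail."""
    remaining = list(blocks)   # copy so the caller's list is untouched
    head = remaining.pop(0)
    tail = remaining.pop(-1)   # raises IndexError if only one block existed
    return head, remaining, tail


def split_head_tail(blocks: List[T]) -> Tuple[T, List[T], T]:
    """Guarded pattern (hypothetical helper): handle the single-block case."""
    if not blocks:
        raise ValueError("expected at least one block")
    if len(blocks) == 1:
        # Single block: the same block plays both the head and the tail role.
        only = blocks[0]
        return only, [], only
    return blocks[0], blocks[1:-1], blocks[-1]
```

With one block, naive_split(["a"]) raises IndexError on the second pop, while split_head_tail(["a"]) returns the same element as both head and tail, which is where both hooks would then be attached.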
Validation: pytest tests/hooks/test_first_block_cache.py (1 passed)

Bug 3: FasterCache crashes when current_timestep_callback is omitted
Impact: Immediate TypeError: 'NoneType' object is not callable on first forward when using default weight callbacks.
Root cause: Default low_frequency_weight_callback / high_frequency_weight_callback call current_timestep_callback(), but apply_faster_cache never validates it (Pyramid Attention Broadcast does).
Trigger: apply_faster_cache(model, FasterCacheConfig(spatial_attention_block_skip_range=2)) without current_timestep_callback.
Fix: Raise ValueError at apply time if current_timestep_callback is missing.
Validation: pytest tests/hooks/test_faster_cache.py (1 passed)

Tooling
utils/check_copies.py: invoke ruff via python -m ruff so pre-commit copy checking works when ruff is not on PATH.

Existing open PRs (unchanged)