Autoregressive diffusion techniques such as Self Forcing rely on a rolling KV Cache across video frame chunks to carry information from past context frames to the frames currently being denoised.
This rolling KV Cache design (or a variant of it) is likely to show up in other long video generation and world models, so it would be good to see whether we can support it natively in Diffusers.
Tasks
- Implement the rolling KV Cache used in Self Forcing using Diffusers' cache hooks design.
- Add a Modular Block to Wan Modular Pipelines that uses this rolling KV Cache to perform autoregressive inference.
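To make the rolling behaviour concrete, here is a minimal, framework-free sketch of the idea. It is not the Self Forcing implementation and does not use Diffusers' hook API: `RollingKVCache`, `max_chunks`, and `visible_context_per_chunk` are hypothetical names, and the string "keys"/"values" stand in for the per-layer attention tensors a real model would cache. It only assumes that the cache keeps the most recent chunks and evicts the oldest one once the window is full.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class RollingKVCache:
    """Hypothetical sketch: keeps keys/values for the most recent chunks only."""

    def __init__(self, max_chunks: int) -> None:
        if max_chunks < 1:
            raise ValueError("max_chunks must be at least 1")
        # deque(maxlen=...) silently drops the oldest entry when a new one
        # is appended to a full window, which gives the "rolling" behaviour.
        self._chunks: Deque[Tuple[int, Any, Any]] = deque(maxlen=max_chunks)

    def append(self, chunk_index: int, keys: Any, values: Any) -> None:
        """Store the keys/values of a chunk that has finished denoising."""
        self._chunks.append((chunk_index, keys, values))

    def context(self) -> Tuple[List[Any], List[Any]]:
        """Return cached keys and values, oldest chunk first."""
        keys = [k for _, k, _ in self._chunks]
        values = [v for _, _, v in self._chunks]
        return keys, values

    def chunk_indices(self) -> List[int]:
        """Return the indices of the chunks currently held in the window."""
        return [i for i, _, _ in self._chunks]


def visible_context_per_chunk(num_chunks: int, max_chunks: int) -> List[List[int]]:
    """Record which past chunks each new chunk could attend to."""
    cache = RollingKVCache(max_chunks)
    seen: List[List[int]] = []
    for i in range(num_chunks):
        # Context available while chunk i is being denoised.
        seen.append(cache.chunk_indices())
        # Placeholder for denoising: a real model would compute attention
        # keys/values for chunk i here, attending to cache.context().
        cache.append(i, f"k{i}", f"v{i}")
    return seen


if __name__ == "__main__":
    print(visible_context_per_chunk(num_chunks=4, max_chunks=2))
```

With a window of two chunks, the first chunk sees no context, the second sees chunk 0, and from then on each chunk sees only the two chunks immediately before it. A hook-based version would presumably keep one such window per attention layer and concatenate the cached keys/values with the current chunk's before attention, but that wiring is left open here.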