8 changes: 8 additions & 0 deletions docs/source/en/api/cache.md
@@ -46,3 +46,11 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate
[[autodoc]] MagCacheConfig

[[autodoc]] apply_mag_cache

## TeaCacheConfig

[[autodoc]] TeaCacheConfig

[[autodoc]] apply_teacache

[[autodoc]] FLUX_TEACACHE_COEFFICIENTS
30 changes: 30 additions & 0 deletions docs/source/en/optimization/cache.md
@@ -163,3 +163,33 @@ image = pipe("A cat playing chess", num_inference_steps=4).images[0]

> [!TIP]
> For pipelines that run Classifier-Free Guidance in a **batched** manner (like SDXL or Flux), the `hidden_states` processed by the model contain both conditional and unconditional branches concatenated together. The calibration process automatically accounts for this, producing a single array of ratios that represents the joint behavior. You can use this resulting array directly without modification.

## TeaCache

[TeaCache](https://huggingface.co/papers/2411.19108) accelerates FLUX inference by skipping the full transformer block stack when the timestep-modulated inputs of consecutive denoising steps are sufficiently similar. At each denoising step, TeaCache extracts the modulated input at the first transformer block, computes its relative L1 distance from the previous step's modulated input, rescales that distance with model-specific polynomial coefficients, and adds the result to a running total. While the accumulated distance stays below `rel_l1_thresh`, the hook replays the cached full-stack residual instead of running the blocks. The first and last denoising steps are always computed in full.

```python
import torch
from diffusers import FluxPipeline, TeaCacheConfig

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

config = TeaCacheConfig(rel_l1_thresh=0.4, num_inference_steps=28)
pipe.transformer.enable_cache(config)

image = pipe("A cat playing chess", num_inference_steps=28).images[0]
```
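
The skip decision described above can be sketched in plain Python. This is an illustrative sketch only, not the hook's actual code: `TeaCacheSketch`, `rel_l1`, and `polyval` are hypothetical names, the inputs are flat lists rather than tensors, the coefficients are placeholders (an identity polynomial, not the FLUX values), and resetting the accumulator after a full compute is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def rel_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """Mean absolute change, relative to the mean absolute value of the previous input."""
    change = sum(abs(c - p) for c, p in zip(current, previous)) / len(previous)
    scale = sum(abs(p) for p in previous) / len(previous)
    return change / scale


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial (highest-degree coefficient first) with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


@dataclass
class TeaCacheSketch:
    """Hypothetical stand-in for the per-step decision; not the diffusers API."""

    rel_l1_thresh: float
    num_inference_steps: int
    coefficients: Sequence[float]  # placeholder values, NOT the FLUX coefficients
    step: int = 0
    accumulated: float = 0.0
    previous_input: Optional[List[float]] = None

    def should_compute(self, modulated_input: Sequence[float]) -> bool:
        first_or_last = self.step in (0, self.num_inference_steps - 1)
        if first_or_last or self.previous_input is None:
            # The first and last steps always run the full block stack.
            compute = True
            self.accumulated = 0.0
        else:
            distance = rel_l1(modulated_input, self.previous_input)
            self.accumulated += polyval(self.coefficients, distance)
            compute = self.accumulated >= self.rel_l1_thresh
            if compute:
                # Assumption: the running total restarts after a full compute.
                self.accumulated = 0.0
        self.previous_input = list(modulated_input)
        self.step += 1
        return compute


# Toy run: each step's input grows by 10%, so every rescaled distance is about 0.1.
cache = TeaCacheSketch(rel_l1_thresh=0.25, num_inference_steps=5, coefficients=[1.0, 0.0])
inputs = [[1.0, 1.0], [1.1, 1.1], [1.21, 1.21], [1.331, 1.331], [1.4641, 1.4641]]
decisions = [cache.should_compute(x) for x in inputs]
print(decisions)  # [True, False, False, True, True]
```

In this toy run, steps 1 and 2 are skipped because the running total (about 0.1, then 0.2) is below the threshold; step 3 pushes it to about 0.3 and forces a recompute, and step 4 computes because it is the last step. A larger `rel_l1_thresh` lets more consecutive steps be skipped before a recompute is forced.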

> [!NOTE]
> TeaCache v1 supports [`FluxTransformer2DModel`] only. FLUX polynomial coefficients are vendored as `FLUX_TEACACHE_COEFFICIENTS` in `diffusers.hooks.teacache`.

### Cache comparison

| Technique | Skip signal | Residual replay | FLUX support |
| --- | --- | --- | --- |
| FirstBlockCache | First-block output delta vs previous step | Reuses tail output from first block | Model-agnostic |
| MagCache | Pre-computed `mag_ratios` error budget on residual magnitude | Full-stack `input + previous_residual` | FLUX (with calibration) |
| TeaCache | Polynomial-rescaled relative L1 on modulated input | Full-stack `input + previous_residual` | FLUX only (v1) |
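
The full-stack `input + previous_residual` replay listed in the table can be illustrated with a small sketch. The helper `make_replayer` and the toy `block_stack` are hypothetical and operate on plain lists; they only show the arithmetic, not how the MagCache or TeaCache hooks are actually implemented.

```python
from typing import Callable, List, Optional

Vector = List[float]


def make_replayer(block_stack: Callable[[Vector], Vector]) -> Callable[[Vector, bool], Vector]:
    """Wrap a block stack so that skipped steps reuse the last cached residual."""
    cached_residual: Optional[Vector] = None

    def forward(hidden: Vector, compute: bool) -> Vector:
        nonlocal cached_residual
        if compute or cached_residual is None:
            # Full compute: run the blocks and remember output - input.
            output = block_stack(hidden)
            cached_residual = [o - h for o, h in zip(output, hidden)]
            return output
        # Skip: approximate the output as input + previous_residual.
        return [h + r for h, r in zip(hidden, cached_residual)]

    return forward


# Toy "block stack" standing in for the transformer blocks.
forward = make_replayer(lambda hidden: [2.0 * v + 1.0 for v in hidden])

full = forward([1.0, 2.0], True)       # runs the blocks, caches residual [2.0, 3.0]
replayed = forward([1.5, 2.5], False)  # reuses the cached residual
print(full, replayed)  # [3.0, 5.0] [3.5, 5.5]
```

The replayed output differs from what the toy blocks would return for the new input (`[4.0, 6.0]`), which is the approximation error that the skip signal in each technique is meant to keep small.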
6 changes: 6 additions & 0 deletions src/diffusers/__init__.py
@@ -177,19 +177,22 @@
[
"FasterCacheConfig",
"FirstBlockCacheConfig",
"FLUX_TEACACHE_COEFFICIENTS",
"HookRegistry",
"LayerSkipConfig",
"MagCacheConfig",
"PyramidAttentionBroadcastConfig",
"SmoothedEnergyGuidanceConfig",
"TaylorSeerCacheConfig",
"TeaCacheConfig",
"TextKVCacheConfig",
"apply_faster_cache",
"apply_first_block_cache",
"apply_layer_skip",
"apply_mag_cache",
"apply_pyramid_attention_broadcast",
"apply_taylorseer_cache",
"apply_teacache",
"apply_text_kv_cache",
]
)
@@ -1039,19 +1042,22 @@
from .hooks import (
FasterCacheConfig,
FirstBlockCacheConfig,
FLUX_TEACACHE_COEFFICIENTS,
HookRegistry,
LayerSkipConfig,
MagCacheConfig,
PyramidAttentionBroadcastConfig,
SmoothedEnergyGuidanceConfig,
TaylorSeerCacheConfig,
TeaCacheConfig,
TextKVCacheConfig,
apply_faster_cache,
apply_first_block_cache,
apply_layer_skip,
apply_mag_cache,
apply_pyramid_attention_broadcast,
apply_taylorseer_cache,
apply_teacache,
apply_text_kv_cache,
)
from .image_processor import (
1 change: 1 addition & 0 deletions src/diffusers/hooks/__init__.py
@@ -27,4 +27,5 @@
from .pyramid_attention_broadcast import PyramidAttentionBroadcastConfig, apply_pyramid_attention_broadcast
from .smoothed_energy_guidance_utils import SmoothedEnergyGuidanceConfig
from .taylorseer_cache import TaylorSeerCacheConfig, apply_taylorseer_cache
from .teacache import FLUX_TEACACHE_COEFFICIENTS, TeaCacheConfig, apply_teacache
from .text_kv_cache import TextKVCacheConfig, apply_text_kv_cache