diff --git a/docs/ideation/2026-06-23-teacache.md b/docs/ideation/2026-06-23-teacache.md
new file mode 100644
index 0000000..66a67bb
--- /dev/null
+++ b/docs/ideation/2026-06-23-teacache.md
@@ -0,0 +1,174 @@
+---
+date: 2026-06-23
+status: active
+topic: teacache
+focus: GitHub issue #12589 — implement TeaCache in diffusers
+mode: repo-grounded
+---
+
+# Ideation: TeaCache for diffusers
+
+**Recommendation:** Ship **FLUX-only TeaCache** using the existing **MagCache block-hook scaffold** (`src/diffusers/hooks/mag_cache.py`), wired through `CacheMixin.enable_cache()`. Add other architectures incrementally as copy-paste blocks with model-specific polynomial coefficients — do not land the full four-model monolith from PR #12652 as-is.
+
+**Source issue:** [#12589 — implement TeaCache](https://github.com/huggingface/diffusers/issues/12589) (3 upvotes, on roadmap, contributions-welcome)
+
+**In-flight work:** [PR #12652](https://github.com/huggingface/diffusers/pull/12652) (open, ~1085 lines, FLUX/Mochi/Lumina2/CogVideoX; maintainer feedback pending)
+
+## Grounding Context
+
+### Codebase Context
+
+Diffusers already ships several training-free inference caches under `src/diffusers/hooks/`, all reachable via `CacheMixin.enable_cache()` in `src/diffusers/models/cache_utils.py`:
+
+| Technique | File | Skip signal | What gets reused |
+|-----------|------|-------------|------------------|
+| FirstBlockCache | `first_block_cache.py` | First-block output residual delta (absmean ratio) | Tail-block residuals replayed through middle blocks |
+| MagCache | `mag_cache.py` | Precomputed `mag_ratios` + accumulated ratio error | Full-stack residual at head (`input + previous_residual`) |
+| FasterCache | `faster_cache.py` | Timestep-indexed attention approximation | Cached attention states (CFG-aware denoiser hook) |
+| TaylorSeer | `taylorseer_cache.py` | Fixed cache interval + Taylor expansion | Predicted module outputs |
+
+**FirstBlockCache explicitly cites TeaCache** as inspiration but implements a simpler, model-agnostic heuristic:
+
+```199:200:src/diffusers/hooks/first_block_cache.py
+    First Block Cache builds on the ideas of [TeaCache](https://huggingface.co/papers/2411.19108). It is much simpler
+    to implement generically for a wide range of models and has been integrated first for experimental purposes.
+```
+
+**MagCache** is the closest structural prior art: head/tail block hooks, `TransformerBlockRegistry` for block I/O, `StateManager` for cross-step state, and model-specific constants (`FLUX_MAG_RATIOS` in `mag_cache.py:35-66`).
+
+**TeaCache algorithm delta (paper / issue):** extract timestep-modulated input from the first transformer block, compute relative L1 distance vs the previous step, apply model-specific polynomial rescaling, accumulate across steps, and skip full forward when accumulated distance < threshold — reusing cached residuals instead.
+
+**Issue thread consensus:**
+- Hooks-based integration (like FasterCache), not monkey-patching model code
+- Prototype on **FLUX first** (sayakpaul); contributor opened PR #12652
+- Maintainer DN6: prefer **standalone forward functions** keyed by class name in a `_MODEL_CONFIG` map, utility functions for cache state — avoid adapter indirection
+
+**Test precedent:** `tests/hooks/test_mag_cache.py` (dummy transformer + `TransformerBlockRegistry`, skip vs compute assertions) and `tests/models/testing_utils/cache.py` (`MagCacheTesterMixin` for pipeline integration).
+
+## Topic Axes
+
+1. **Hook placement** — block-level head/tail hooks (MagCache/FBC pattern) vs transformer-root forward interception (PR #12652 approach)
+2. **Model scope** — FLUX-only MVP vs multi-model day one
+3. **Algorithm fidelity** — true TeaCache metric (polynomial-rescaled modulated-input L1) vs extending existing caches
+4. **Landing path** — finish PR #12652 vs fresh minimal PR vs docs-only deferral
+5. **Validation** — unit hook tests vs pipeline speed/quality benchmarks against paper claims (1.5–2.6×)
+
+### How existing caches relate to TeaCache
+
+```mermaid
+flowchart LR
+  subgraph signal["Skip decision signal"]
+    TC["TeaCache\nmodulated-input L1\n+ polynomial rescale\n+ accumulate"]
+    FBC["FirstBlockCache\nfirst-block residual delta"]
+    MC["MagCache\nmag_ratios budget"]
+  end
+  subgraph reuse["Reuse mechanism"]
+    RES["Cached residual\ninput + previous_residual"]
+    TAIL["Tail residuals\nthrough middle blocks"]
+  end
+  TC --> RES
+  MC --> RES
+  FBC --> TAIL
+```
+
+TeaCache shares **reuse shape** with MagCache but **decision logic** is distinct — a wrapper merging the two would hide unlike policies.
+
+## Ranked Ideas
+
+Jump list: [1. FLUX-only via MagCache scaffold](#1-flux-only-teacache-via-mag_cache-block-hook-pattern-recommended) · [2. Revise PR #12652 FLUX slice](#2-revise-and-land-pr-12652--flux-slice-only) · [3. Extend FirstBlockCache](#3-add-teacache-metric-to-firstblockcache-as-opt-in-mode) · [4. Unblock #12652 with benchmark gate](#4-unblock-pr-12652-with-maintainer-pairing--benchmark-gate) · [5. Document cache relationships](#5-document-teacache-relationship-in-cache-docs--ship-flux-example)
+
+### 1. FLUX-only TeaCache via `mag_cache` block-hook pattern *(recommended)*
+
+**Description:** Add `TeaCacheConfig` + `apply_teacache()` in a new self-contained `src/diffusers/hooks/teacache.py` (~300–400 lines for v1). Copy the head/tail hook skeleton from `mag_cache.py` (lines 171–441): walk `_ALL_TRANSFORMER_BLOCK_IDENTIFIERS`, register head hook for skip decision + residual replay, middle/tail hooks for pass-through or residual capture. Replace MagCache's ratio-budget logic with TeaCache's polynomial-rescaled modulated-input L1 accumulator. Ship FLUX polynomial coefficients and a FLUX modulated-input extractor only; raise `ValueError` for unsupported model classes.
+
+**Axis:** Hook placement · Model scope · Algorithm fidelity
+
+**Basis:** `direct:` `mag_cache.py:171-280` (head skip + residual replay), `first_block_cache.py:199-200` (TeaCache lineage), `cache_utils.py:39-102` (`enable_cache` dispatch pattern); `external:` DN6 review on PR #12652 (standalone functions, class-name map)
+
+**Rationale:** Matches diffusers' single-file hook convention, keeps model forwards in model files (not copied into hooks), delivers the real TeaCache algorithm for the maintainer-preferred prototype model, and leaves a clear incremental path to add CogVideoX/Wan/etc. as separate copy-paste coefficient blocks.
+
+**Downsides:** Only FLUX on day one; still requires a FLUX-specific modulated-input extraction path (cannot be fully model-agnostic).
+
+**Confidence:** 85%
+
+**Complexity:** Medium
+
+### 2. Revise and land PR #12652 — FLUX slice only
+
+**Description:** Take the existing contributor PR (#12652, +1085 lines, tests in `tests/hooks/test_teacache.py`), strip Mochi/Lumina2/CogVideoX paths, apply DN6's refactor (standalone utility functions, `_MODEL_CONFIG` keyed by class name, no adapter indirection), and merge FLUX + tests. Defer additional models to follow-up PRs.
+
+**Axis:** Landing path
+
+**Basis:** `external:` PR #12652 file list (`hooks/teacache.py`, `tests/hooks/test_teacache.py`, `cache_utils.py` wiring); issue comment — "prototype on flux first" (sayakpaul)
+
+**Rationale:** Fastest path to close #12589 if the contributor remains active; reuses months of iteration including bugfixes (CogVideoX fallback, `torch.compile`, state management).
+
+**Downsides:** Large diff to review; risk that refactor is shallow and full `FluxTransformer2DModel.forward()` copies remain in the hook layer — the main philosophy objection to landing as-is.
+
+**Confidence:** 70%
+
+**Complexity:** Medium–High
+
+### 3. Add TeaCache metric to FirstBlockCache as opt-in mode
+
+**Description:** Extend `FirstBlockCacheConfig` with an optional TeaCache mode: when enabled, the head hook compares polynomial-rescaled modulated-input L1 instead of raw residual absmean. Models register an extractor callback alongside existing `TransformerBlockRegistry` metadata.
+
+**Axis:** Algorithm fidelity · Hook placement
+
+**Basis:** `direct:` `first_block_cache.py:133-142` (residual comparison hook point), `first_block_cache.py:199-200` (already TeaCache-inspired)
+
+**Rationale:** One cache API surface; reuses the generic block-walk that already works across many `CacheMixin` transformers.
+
+**Downsides:** Blurs FirstBlockCache vs TeaCache semantics in one config; still needs per-model extractors for true fidelity; increases complexity of an intentionally simple cache.
+
+**Confidence:** 60%
+
+**Complexity:** Medium
+
+### 4. Unblock PR #12652 with maintainer pairing + benchmark gate
+
+**Description:** Treat #12589 as a coordination task: assign a maintainer co-reviewer, define a FLUX benchmark table (steps, threshold, speedup vs quality metric), and merge #12652 once it passes. No greenfield implementation.
+
+**Axis:** Landing path · Validation
+
+**Basis:** `external:` issue body — "propose a design first in this thread"; PR open since Nov 2025 with design feedback but no merge
+
+**Rationale:** Respects contributor investment; converts stale issue into an actionable review queue item with measurable acceptance criteria.
+
+**Downsides:** Process-only — depends on maintainer bandwidth; does not resolve architectural concerns if review stalls again.
+
+**Confidence:** 75%
+
+**Complexity:** Low (coordination)
+
+### 5. Document TeaCache relationship in cache docs + ship FLUX example
+
+**Description:** Update `docs/source/en/optimization/cache.md` (and cross-links from `CacheMixin` docstring) to explain when to use FirstBlockCache vs MagCache vs TeaCache, with a FLUX `enable_cache(TeaCacheConfig(...))` example. Pair with whichever implementation option (1 or 2) ships code.
+
+**Axis:** Validation · Landing path
+
+**Basis:** `direct:` `cache_utils.py:27-31` (supported techniques list); existing cache optimization docs referenced in issue body
+
+**Rationale:** Users searching for "TeaCache" need a named entry point; docs clarify that FBC is TeaCache-*inspired* but not TeaCache-identical.
+
+**Downsides:** Documentation alone does not close #12589.
+
+**Confidence:** 90%
+
+**Complexity:** Low
+
+## Rejection Summary
+
+| # | Idea | Reason Rejected |
+|---|------|-----------------|
+| 1 | Land PR #12652 as-is (4 models) | ~1085-line monolith with copied transformer forwards in hook layer; violates single-file/self-contained philosophy |
+| 2 | TeaCache wrapper delegating to MagCache | Different skip algorithms (polynomial L1 vs mag-ratio budget); magic facade hides unlike policies |
+| 3 | Close issue — FBC/MagCache sufficient | Under-delivers on named TeaCache request and `roadmap` label |
+| 4 | `hooks/teacache/` subdirectory package | Extra structure vs established flat `hooks/*.py` convention |
+| 5 | External TeaCache package only | Misses `CacheMixin.enable_cache()` integration users expect |
+| 6 | Modular-pipeline-only TeaCache | Narrow surface; standard pipeline users left out |
+| 7 | FasterCache denoiser-hook clone | FasterCache targets CFG/uncond branch skip, not TeaCache residual metric |
+| 8 | Multi-model day-one in fresh PR | High review cost; maintainers asked for agnostic *structure*, not four models at once |
+| 9 | CogVideoX-first prototype | Maintainers preferred FLUX (original repo results + popularity) |
+| 10 | Auto-detect all CacheMixin models | Too magic; each architecture needs explicit polynomial coefficients |
+| - | axis: Hook placement — root forward only | Block hooks are the established pattern in `mag_cache.py`; root-forward copies belong in model files, not hooks |
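
Reviewer note: the skip decision summarized under "TeaCache algorithm delta" (relative L1 distance on the modulated input, polynomial rescaling, accumulation against a threshold, residual replay on skip) can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the proposed `teacache.py` and not the diffusers hook API: `TeaCacheState`, `forward_with_cache`, and `run_blocks` are hypothetical names, the coefficients `[1.0, 0.0]` are an identity placeholder rather than real FLUX coefficients, and details such as forcing a full compute on the final step are left out.

```python
import numpy as np


class TeaCacheState:
    """Hypothetical cross-step state for a TeaCache-style skip decision."""

    def __init__(self, coefficients, threshold, eps=1e-8):
        # Polynomial coefficients, highest degree first (np.polyval order).
        # Real values are model-specific; callers must supply them.
        self.coefficients = coefficients
        self.threshold = threshold
        self.eps = eps
        self.previous_input = None  # modulated input from the previous step
        self.accumulated = 0.0      # rescaled distance since the last full compute
        self.residual = None        # cached (output - input) of the block stack

    def should_skip(self, modulated_input):
        """Return True when the block stack can be skipped on this step."""
        if self.previous_input is None:
            # First step: nothing to compare against, always compute.
            self.previous_input = modulated_input
            return False
        # Relative L1 distance between this step's and the previous step's input.
        diff = np.abs(modulated_input - self.previous_input).mean()
        rel_l1 = diff / (np.abs(self.previous_input).mean() + self.eps)
        # Rescale with the model-specific polynomial, then accumulate.
        self.accumulated += float(np.polyval(self.coefficients, rel_l1))
        self.previous_input = modulated_input
        if self.accumulated < self.threshold:
            return True
        # Budget exhausted: run the full stack and restart the accumulator.
        self.accumulated = 0.0
        return False


def forward_with_cache(state, hidden, modulated_input, run_blocks):
    """Replay the cached residual on skip steps, otherwise run the blocks."""
    if state.should_skip(modulated_input):
        return hidden + state.residual
    output = run_blocks(hidden)
    state.residual = output - hidden  # capture the residual for later replay
    return output
```

With the placeholder identity polynomial and a threshold of 0.25, a modulated input that grows by 10% per step is computed on the first step, skipped on the next two, and recomputed on the fourth, when the accumulated distance first reaches the threshold. Keeping the accumulator and residual in one state object mirrors the `StateManager` role described for MagCache, though the actual wiring would depend on the hook scaffold.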