
Speed up FlowMatchEulerDiscreteScheduler index_for_timestep (#9417) #2

Open
srlynch1 wants to merge 1 commit into main from e2e/diffusers-9417


srlynch1 (Owner) commented Jun 21, 2026

Summary

  • Vectorize FlowMatchEulerDiscreteScheduler.index_for_timestep to remove per-element nonzero() calls in the scale_noise training hot path
  • Replace list comprehension in scale_noise with batched index lookup
  • Add parity tests (shift=1.0 and shift=3.0) and a training-batch speedup test
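For context, the per-element lookup that the first two bullets target looks roughly like the sketch below. It is modeled on the pre-change method but written as free functions over plain tensors, so legacy_index_for_timestep and legacy_step_indices are hypothetical names, not diffusers API. Each batch element triggers its own comparison, nonzero(), and .item(); on CUDA the last two force a host-device synchronization, so the loop's cost grows linearly with batch size.

```python
import torch


def legacy_index_for_timestep(timestep, schedule_timesteps):
    # Hypothetical free-function version of the pre-change lookup.
    # One full comparison plus one nonzero() per call.
    indices = (schedule_timesteps == timestep).nonzero()
    # Duplicated timestep: take the second match; otherwise the only one.
    pos = 1 if len(indices) > 1 else 0
    return indices[pos].item()


def legacy_step_indices(timesteps, schedule_timesteps):
    # The Python-level loop used by scale_noise before this change:
    # B separate calls, each with its own nonzero() and .item().
    return [legacy_index_for_timestep(t, schedule_timesteps) for t in timesteps]
```

With a toy schedule such as [1000, 750, 500, 500, 250], looking up 500 returns index 3, the second occurrence.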

Resolves huggingface#9417 (eval e2e run 2026-06-21-r2).

Test plan

  • pytest tests/schedulers/test_scheduler_flow_match_euler_discrete.py (4/4 pass)
  • ruff check / ruff format --check on changed files
  • python utils/check_copies.py (0 drift)

Note

Low Risk
Changes are localized to one scheduler’s index lookup and noise scaling; behavior is guarded by parity tests against the prior implementation.

Overview
Speeds up training-style batch calls to FlowMatchEulerDiscreteScheduler.scale_noise by replacing per-timestep index lookups with a single batched path.

index_for_timestep now resolves schedule indices with a vectorized equality/argmax over a 1-D timestep tensor (still returning a scalar int for scalar inputs) and preserves the existing rule of picking the second matching index when a timestep appears more than once. scale_noise calls that helper once for the full batch instead of building indices in a Python loop.
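A minimal sketch of the batched lookup and the single call from scale_noise, written as free functions over plain tensors rather than as the scheduler class (the names and signatures here are illustrative, not the diffusers API). The second-match rule is implemented with a running match count via cumsum; that is one way to express it with equality/argmax, and the PR's exact formulation may differ. The noise step uses the flow-matching interpolation sigma * noise + (1 - sigma) * sample.

```python
import torch


def index_for_timestep(timestep, schedule_timesteps):
    # Hypothetical free-function version of the vectorized lookup.
    t = torch.as_tensor(timestep, dtype=schedule_timesteps.dtype,
                        device=schedule_timesteps.device)
    is_scalar = t.ndim == 0
    t = t.reshape(-1)  # shape (B,)

    # One broadcasted comparison instead of B separate nonzero() calls: (B, N).
    matches = schedule_timesteps.unsqueeze(0) == t.unsqueeze(1)
    counts = matches.sum(dim=1)
    if bool((counts == 0).any()):
        raise ValueError("timestep not found in schedule_timesteps")

    # Running match count per row: 1 at the first hit, 2 at the second, ...
    running = matches.to(torch.int64).cumsum(dim=1)
    # Legacy rule: second hit if the timestep is duplicated, else the first.
    target = 1 + (counts > 1).to(torch.int64)
    hit = matches & (running == target.unsqueeze(1))
    # Exactly one True per row; argmax returns its position.
    indices = hit.to(torch.int64).argmax(dim=1)
    return int(indices.item()) if is_scalar else indices


def scale_noise(sample, timestep, noise, sigmas, schedule_timesteps):
    # Single batched lookup replaces the per-element list comprehension.
    step_indices = index_for_timestep(timestep, schedule_timesteps)
    sigma = sigmas[step_indices].flatten()
    # Broadcast sigma over the non-batch dimensions of `sample`.
    while sigma.ndim < sample.ndim:
        sigma = sigma.unsqueeze(-1)
    # Flow-matching interpolation between the clean sample and noise.
    return sigma * noise + (1.0 - sigma) * sample
```

The design trades memory for launches: the (B, N) match matrix holds B·N booleans (about 4.1 million for B = 4096 and N = 1000), but the lookup runs as a fixed handful of tensor ops with no per-element host sync.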

New scheduler tests compare against a legacy nonzero() reference for several shift settings, verify batched scale_noise output matches the old behavior, and assert the optimized path is faster on a large training-like batch.
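The parity and speed checks described above could be approximated with a harness like the following. It is not the PR's test file: legacy_index, vectorized_index, shifted_schedule, check_parity, and time_both are hypothetical names, and shifted_schedule (linear sigmas warped by shift * s / (1 + (shift - 1) * s), then scaled to timestep units) is an assumption about how the scheduler builds its timesteps. Timings are returned rather than asserted, since wall-clock comparisons are noisy on shared machines.

```python
import time

import torch


def legacy_index(t, sched):
    # Reference: per-element nonzero(), second match on duplicates.
    idx = (sched == t).nonzero()
    return idx[1 if len(idx) > 1 else 0].item()


def vectorized_index(ts, sched):
    # Candidate: one (B, N) comparison, then pick the 1st or 2nd hit per row.
    matches = sched.unsqueeze(0) == ts.unsqueeze(1)
    target = 1 + (matches.sum(dim=1) > 1).to(torch.int64)
    hit = matches & (matches.to(torch.int64).cumsum(dim=1) == target.unsqueeze(1))
    return hit.to(torch.int64).argmax(dim=1)


def shifted_schedule(shift, n=1000):
    # Assumed schedule shape (hypothetical helper).
    sigmas = torch.linspace(1.0, 1.0 / n, n)
    sigmas = shift * sigmas / (1 + (shift - 1) * sigmas)
    return sigmas * n


def check_parity(sched, batch_size=256, seed=0):
    g = torch.Generator().manual_seed(seed)
    # Training-style batch: timesteps drawn (with repeats) from the schedule.
    ts = sched[torch.randint(len(sched), (batch_size,), generator=g)]
    ref = torch.tensor([legacy_index(t, sched) for t in ts])
    return torch.equal(vectorized_index(ts, sched), ref)


def time_both(sched, batch_size=4096):
    ts = sched[torch.randint(len(sched), (batch_size,))]
    start = time.perf_counter()
    for t in ts:
        legacy_index(t, sched)
    legacy_s = time.perf_counter() - start
    start = time.perf_counter()
    vectorized_index(ts, sched)
    return legacy_s, time.perf_counter() - start
```

Running check_parity on shifted_schedule(1.0), shifted_schedule(3.0), and a toy schedule with a repeated timestep exercises both the plain and the duplicate-match paths.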

Reviewed by Cursor Bugbot for commit 3feb4d4. Bugbot is set up for automated code reviews on this repo.



Development

Successfully merging this pull request may close these issues.

Suggestion for speeding up index_for_timestep by removing sequential nonzero() calls in samplers
