Add PixArt-Alpha modular pipeline #14087
Open
cgloriacc wants to merge 1 commit into
This ports the PixArt-Alpha text-to-image pipeline into the Modular Diffusers framework. It adds the text-encoder, before-denoise, denoise (guider), and decode blocks, assembles them into PixArtAlphaAutoBlocks and PixArtAlphaModularPipeline, and adds a pipeline-level test.
What does this PR do?
This ports the PixArt-Alpha text-to-image pipeline into Modular Diffusers, following the same structure as the existing `qwenimage` and `stable_diffusion_3` modular pipelines.

New block files under `src/diffusers/modular_pipelines/pixart_alpha/`:

- `encoders.py`: the T5 text-encoder step. It emits the prompt embeddings and attention mask, plus the negative pair when the guider needs classifier-free guidance, and cleans captions with bs4/ftfy.
- `before_denoise.py`: per-prompt input expansion, timestep setup, latent preparation, and PixArt micro-conditions. Resolution and aspect-ratio conditions are emitted only when the model's sample size is 128.
- `denoise.py`: the denoise loop built on the guider abstraction. It also handles the PixArt learned-sigma split, taking the first chunk when `out_channels` is twice `in_channels`.
- `decoders.py`: VAE decode and image post-processing.
- `modular_blocks_pixart_alpha.py` and `modular_pipeline.py`: the blocks assembled into `PixArtAlphaAutoBlocks` and `PixArtAlphaModularPipeline`, registered in the modular pipeline mapping.

A pipeline-level test is added under `tests/modular_pipelines/pixart_alpha/`.
Coordination
Fixes #13301. This is tracked under the Modular Diffusers umbrella #13295, which @sayakpaul approved.
Tests run
Run on CPU with `CUDA_VISIBLE_DEVICES=""`, which matches the CPU container the modular fast-test CI uses in `pr_modular_tests.yml`.
Full pytest output: 11 passed, 3 skipped
Why the 3 skips. `test_to_device`, `test_inference_is_not_nan`, and `test_components_auto_cpu_offload_inference_consistent` all carry the `@require_accelerator` decorator, so they skip on a CPU-only run. This is expected because the modular fast-test CI runs on CPU. The not-NaN check still runs through its CPU companion `test_inference_is_not_nan_cpu`, which passed.
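For reviewers less familiar with how these skips arise, the following is a minimal sketch of the pattern, using only the standard library. `require_accelerator_sketch`, `ACCELERATOR_AVAILABLE`, and `SkipDemo` are hypothetical names for illustration; the real `@require_accelerator` decorator detects hardware itself and may be implemented differently.

```python
import math
import unittest


def require_accelerator_sketch(accelerator_available):
    """Hypothetical stand-in: skip the test unless an accelerator is present."""
    return unittest.skipUnless(
        accelerator_available, "test requires a hardware accelerator"
    )


# Assumption: this flag models a CPU-only run such as CUDA_VISIBLE_DEVICES="".
ACCELERATOR_AVAILABLE = False


class SkipDemo(unittest.TestCase):
    @require_accelerator_sketch(ACCELERATOR_AVAILABLE)
    def test_inference_is_not_nan(self):
        # Never reached on a CPU-only run; the decorator skips it first.
        self.fail("should have been skipped")

    def test_inference_is_not_nan_cpu(self):
        # CPU companion: always runs, so the not-NaN check is still exercised.
        self.assertFalse(math.isnan(0.5))


def run_demo():
    """Run SkipDemo quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_demo()` reports one skipped test and no failures, which mirrors why the accelerator-gated tests show up as skips rather than failures in a CPU run.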
Why the warnings, none of which come from this PR's code. The `SwigPyPacked`/`SwigPyObject` warnings come from a SWIG-based dependency at import time on Python 3.10. The `local_dir_use_symlinks` warning comes from `huggingface_hub` while the test downloads the tiny checkpoint. The numpy 2.0 array warning is raised inside the existing `scheduling_dpmsolver_multistep.py` scheduler; it is pre-existing and surfaces only because the pipeline uses `DPMSolverMultistepScheduler`.
The repo-consistency and quality checks that `pr_modular_tests.yml` runs also pass locally:
Notes
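Two behaviours described for `before_denoise.py` and `denoise.py` may be easier to review with a concrete sketch: the micro-conditions that are emitted only for a sample size of 128, and the learned-sigma split. The code below is an illustration under stated assumptions, not the code in this PR: it uses NumPy arrays in place of torch tensors, the function names `build_micro_conditions` and `drop_learned_sigma` are hypothetical, and `batch_size` is assumed to be the already-expanded batch.

```python
import numpy as np


def build_micro_conditions(sample_size, height, width, batch_size):
    """Return resolution and aspect-ratio conditions, or None when not used.

    Hypothetical helper: the conditions are only built for sample_size == 128.
    """
    conds = {"resolution": None, "aspect_ratio": None}
    if sample_size == 128:
        # One (height, width) row and one aspect-ratio row per batch element.
        conds["resolution"] = np.tile(
            np.array([[height, width]], dtype=np.float32), (batch_size, 1)
        )
        conds["aspect_ratio"] = np.tile(
            np.array([[height / width]], dtype=np.float32), (batch_size, 1)
        )
    return conds


def drop_learned_sigma(noise_pred, in_channels, out_channels):
    """Keep only the noise half when the model also predicts a learned sigma.

    Hypothetical helper: channels are assumed to sit on axis 1.
    """
    if out_channels == 2 * in_channels:
        # The first chunk is the noise prediction; the second is discarded.
        noise_pred, _ = np.split(noise_pred, 2, axis=1)
    return noise_pred
```

For example, a prediction of shape `(2, 8, 16, 16)` with `in_channels=4` and `out_channels=8` comes back with shape `(2, 4, 16, 16)`, while a model whose `out_channels` equals `in_channels` is passed through unchanged.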
Before submitting
.ai/review-rules.md?
Who can review?
@yiyixuxu @sayakpaul @asomoza