Custom ComfyUI nodes for QwenImage ControlNet, plus a few quality-of-life (QoL) nodes. They are designed to achieve 100% output compatibility with VideoX-Fun's diffusers pipeline while leveraging ComfyUI's efficient model loading system.
We integrate with ComfyUI's model loading nodes (Load Diffusion Model, Load CLIP, Load VAE) but use our own sampler and conditioning nodes. This approach was chosen because:
- ComfyUI's model loading is highly optimized - fast loading, memory efficient, supports quantized models (fp8, GGUF)
- VideoX's sampling pipeline has specific requirements - custom RoPE calculation, True CFG with norm rescaling, and a packed 3D latent format, all of which differ from ComfyUI's standard sampler
- Output matching - by replicating VideoX's exact forward logic while using ComfyUI's loaded weights, we achieve near-identical outputs with the same seed
Our nodes act as a bridge: ComfyUI handles the heavy lifting of model management, while we ensure the inference process matches VideoX exactly.
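To make "True CFG with norm rescaling" concrete, here is a minimal sketch for a single token vector. It assumes one common form of the technique: the guided prediction is rescaled so that its L2 norm matches the norm of the conditional prediction. The function name `true_cfg_rescaled` is hypothetical, and the real sampler operates on tensors, not Python lists; treat this as an illustration, not as this project's code.

```python
import math


def true_cfg_rescaled(cond, uncond, scale):
    """Combine conditional and unconditional predictions for one token vector.

    Assumption: the result is rescaled to the L2 norm of `cond`.
    """
    # Standard CFG combination: move away from the unconditional prediction.
    combined = [u + scale * (c - u) for c, u in zip(cond, uncond)]

    # Norm rescaling: keep the direction of `combined`, use the magnitude of `cond`.
    cond_norm = math.sqrt(sum(c * c for c in cond))
    comb_norm = math.sqrt(sum(x * x for x in combined))
    if comb_norm == 0.0:
        return combined  # nothing to rescale
    factor = cond_norm / comb_norm
    return [x * factor for x in combined]


cond = [3.0, 4.0]    # toy conditional prediction
uncond = [1.0, 0.0]  # toy unconditional prediction
guided = true_cfg_rescaled(cond, uncond, scale=4.0)
print(guided)
```

The rescaling step limits how far a large guidance scale can inflate the magnitude of the prediction, which is one reason a standard CFG sampler would not reproduce the same outputs.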
- VideoX-Fun - The original QwenImage ControlNet implementation. Our pipeline logic is derived from their excellent work.
- ComfyUI - The powerful and modular diffusion model GUI that makes this integration possible.
Prerequisites - Install these custom node packs first:
- VideoX-Fun - Required for model components and utilities
- ComfyUI-GGUF - Required if using GGUF quantized models
Install ComfyUI-Gen2:

    cd ComfyUI/custom_nodes
    git clone https://github.com/petmycat/ComfyUI-gen2.git

Tokenizer - Download from Qwen-Image-2512 on HuggingFace:
- Navigate to the model's files and download all files from the tokenizer/ folder
- Place them in: ComfyUI/models/gen2/qwen_2512_tokenizer/
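A quick way to confirm the tokenizer files landed in the right place is a small check script. This is a sketch only: the helper name `list_tokenizer_files` is hypothetical, the ComfyUI root is assumed to be a folder named `ComfyUI` in the current directory, and the script does not know which file names the tokenizer should contain, so it only reports what it finds.

```python
from pathlib import Path

# Relative location taken from the installation instructions.
TOKENIZER_SUBDIR = Path("models") / "gen2" / "qwen_2512_tokenizer"


def list_tokenizer_files(comfy_root):
    """Return the sorted file names in the tokenizer folder, or [] if it is missing."""
    folder = Path(comfy_root) / TOKENIZER_SUBDIR
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


# Assumption: the script is run from the directory that contains ComfyUI/.
files = list_tokenizer_files("ComfyUI")
print(files if files else "tokenizer folder is missing or empty")
```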
Example workflow and reference images are located in:
- workflows/qwen_control_example_workflow.json - Example ComfyUI workflow
- assets/ - Reference images for testing (example (1).png, example (2).png)
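Before loading the example workflow, it can help to see which node types it uses, for instance to spot a missing custom node pack. The sketch below assumes the two JSON layouts ComfyUI commonly exports: a UI export with a top-level `nodes` list whose entries carry a `type` field, and an API export that maps node ids to objects with a `class_type` field. The layout of the workflow file in this repository has not been verified here, and the node type strings in the demo are made up.

```python
import json
from collections import Counter


def load_workflow(path):
    """Parse a workflow JSON file into a Python dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_node_types(workflow):
    """Count node types in a parsed workflow dict (UI or API layout, both assumed)."""
    if isinstance(workflow.get("nodes"), list):
        # Assumed UI export: {"nodes": [{"type": "..."}, ...]}
        types = [n.get("type") for n in workflow["nodes"] if isinstance(n, dict)]
    else:
        # Assumed API export: {"<id>": {"class_type": "..."}, ...}
        types = [n.get("class_type") for n in workflow.values() if isinstance(n, dict)]
    return Counter(t for t in types if t)


# Demo with a tiny hand-written workflow; the type names are hypothetical.
demo = {"nodes": [{"type": "ExampleLoader"}, {"type": "ExampleSampler"}, {"type": "ExampleLoader"}]}
for node_type, count in sorted(count_node_types(demo).items()):
    print(node_type, count)
```

To inspect the real file, pass the result of `load_workflow("workflows/qwen_control_example_workflow.json")` to `count_node_types`.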
QwenImage nodes:

| Node | Description |
|---|---|
| Gen2 Load QwenImage ControlNet | Load ControlNet weights |
| Gen2 Load QwenImage VAE | Load VAE with VideoX-compatible config |
| Gen2 Apply QwenImage ControlNet | Prepare control context and wrap model |
| Gen2 QwenImage Text Encode | VideoX-style text encoding (use instead of CLIPTextEncode) |
| Gen2 Load QwenImage LoRA | Load LoRA for VideoX-style merging |
| Gen2 QwenImage Control Sampler | VideoX-compatible sampling with True CFG |

Miscellaneous (QoL) nodes:

| Node | Description |
|---|---|
| Gen2 DWpose with Threshold | DWpose detector with configurable confidence thresholds for body/hand/face keypoints |
| Gen2 StringReplace | Replace all occurrences of a search string with a replacement string (case-sensitive) |
| Gen2 Checkerboard | Generate a checkerboard pattern image (1px black & white squares) at specified width × height |
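The behavior described for the string-replace and checkerboard nodes can be sketched in plain Python. These are illustrations of the described behavior, not the nodes' actual code: the function names are hypothetical, the image is represented as nested lists of grayscale values, and which color sits at the top-left pixel is an assumption.

```python
def string_replace(text, search, replacement):
    """Replace every occurrence of `search` in `text` (case-sensitive)."""
    if search == "":
        # str.replace with an empty search string inserts between every
        # character, which is rarely intended, so return the text unchanged.
        return text
    return text.replace(search, replacement)


def checkerboard(width, height, first=0, second=255):
    """Return `height` rows of `width` values that alternate every pixel.

    Assumption: the top-left pixel uses `first` (black by default).
    """
    return [
        [first if (x + y) % 2 == 0 else second for x in range(width)]
        for y in range(height)
    ]


print(string_replace("Cat cat", "cat", "dog"))
for row in checkerboard(4, 2):
    print(row)
```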
Supports multiple precision modes:
- bf16/fp16 - Unquantized (16-bit) models
- fp8 - Quantized models (automatic compute dtype detection)
- GGUF - Quantized models via ComfyUI-GGUF
- Add node parameter explanations for better user support (document what each parameter does in every node)
- Integrate custom Load VAE node into ComfyUI system and add latent image input to sampler node
- Decouple ControlNet node and sampler node
- Add start and end step parameters to sampler node
- Reorganize code for better maintainability: split into qwenimage/ (core + nodes) and misc_nodes/ (pose, string utils)
This project is licensed under the Apache License 2.0. It also follows the licensing requirements of its dependencies (VideoX-Fun, ComfyUI).