GitHub - G-U-N/UniRL: a unified reinforcement learning toolbox for joint RL on language models and diffusion models

PromptRL

Overview

PromptRL is a framework that jointly trains language models (LMs) and flow-matching models (FMs) within a unified reinforcement learning loop for text-to-image generation. By incorporating LMs as adaptive prompt refiners, PromptRL addresses two critical limitations of current flow-based RL pipelines: exploration collapse caused by insufficient generation diversity, and prompt overfitting, in which models memorize the specific prompt formulations seen during training.
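
To make the shape of such a joint loop concrete, here is a toy sketch of one rollout step. It is not the repository's implementation: `refine_prompt`, `generate_image`, and `reward_fn` are hypothetical stand-ins, and the group-normalized advantage is an assumption borrowed from GRPO-style methods rather than something this README specifies. The only point illustrated is that one reward signal can score both the refined prompt and the generated image.

```python
import random
import statistics

def refine_prompt(prompt, rng):
    """Hypothetical stand-in for the LM prompt refiner: appends a sampled hint."""
    hints = ["", ", highly detailed", ", studio lighting", ", wide shot"]
    return prompt + rng.choice(hints)

def generate_image(prompt, rng):
    """Hypothetical stand-in for the flow-matching sampler: returns a fake 'image' (a float)."""
    return len(prompt) + rng.gauss(0.0, 1.0)

def reward_fn(image):
    """Hypothetical scalar reward; a real pipeline would score the image against the prompt."""
    return image / 100.0

def rollout_group(prompt, group_size, seed=0):
    """Sample a group of (refined prompt, image, reward) and normalize rewards within the group."""
    rng = random.Random(seed)
    samples = []
    for _ in range(group_size):
        refined = refine_prompt(prompt, rng)
        image = generate_image(refined, rng)
        samples.append((refined, image, reward_fn(image)))
    rewards = [r for _, _, r in samples]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero for identical rewards
    # Assumption: the same advantage would weight the updates of both the LM and the FM.
    advantages = [(r - mean) / std for r in rewards]
    return samples, advantages

if __name__ == "__main__":
    samples, advantages = rollout_group("a red cube on a blue sphere", group_size=4)
    for (refined, _, reward), adv in zip(samples, advantages):
        print(f"{adv:+.2f}  reward={reward:.3f}  prompt={refined!r}")
```

In this sketch the diversity comes from the refiner sampling different prompt variants, which is the role the overview assigns to the LM.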

Installation

conda env create -f environment.yml
conda activate unirl
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/huggingface/diffusers.git
pip install flash-attn==2.7.4.post1 --no-build-isolation

# run gen.sh for evaluation
# bash gen.sh
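
After installing, a quick sanity check can confirm that the extra packages are visible in the active environment. The sketch below assumes the usual import names (`clip` for openai/CLIP, `diffusers`, and `flash_attn` for flash-attn); it only looks the modules up and does not import them.

```python
import importlib.util

# Import names assumed for the packages installed by the pip commands above.
REQUIRED_MODULES = ["clip", "diffusers", "flash_attn"]

def check_modules(names):
    """Return {module_name: bool} indicating whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```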

Qualitative Results

Text-to-Image Generation

Instructional Image Editing

Key Results

PromptRL achieves 2× the sample efficiency of flow-only RL while also yielding an adaptive prompt refinement agent that improves test-time performance.

Summary

Benchmark	Metric	PromptRL w/ PE	Best Baseline
GenEval	Avg. Score ↑	0.97	0.92 (FlowGRPO)
Aesthetic	PickScore ↑	24.05	23.63 (DiffusionNFT)
Aesthetic	HPS ↑	32.03	31.79 (DiffusionNFT)
OCR	OCR-1k ↑	0.98	0.89 (FlowGRPO)
Image Editing	EditReward Avg. ↑	1.43	1.44 (ReasonEdit-Think)
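
The margins in the summary table can be read off with a few lines of arithmetic. The snippet below copies the numbers from the table (all five metrics are higher-is-better) and computes PromptRL w/ PE minus the best baseline.

```python
# (metric, PromptRL w/ PE, best baseline score, best baseline name), copied from the summary table.
SUMMARY = [
    ("GenEval Avg. Score", 0.97, 0.92, "FlowGRPO"),
    ("PickScore", 24.05, 23.63, "DiffusionNFT"),
    ("HPS", 32.03, 31.79, "DiffusionNFT"),
    ("OCR-1k", 0.98, 0.89, "FlowGRPO"),
    ("EditReward Avg.", 1.43, 1.44, "ReasonEdit-Think"),
]

def margins(rows):
    """Return {metric: PromptRL score minus best-baseline score}, rounded to 2 decimals."""
    return {metric: round(ours - base, 2) for metric, ours, base, _ in rows}

if __name__ == "__main__":
    for metric, delta in margins(SUMMARY).items():
        print(f"{metric}: {delta:+.2f}")
```

By these numbers PromptRL w/ PE leads on four of the five metrics and trails ReasonEdit-Think by 0.01 on the EditReward average.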

📊 GenEval Benchmark (Full Results)

Model	Single Obj.	Two Obj.	Counting	Colors	Position	Attr.	Avg. ↑
Show-o	0.95	0.52	0.49	0.82	0.11	0.28	0.53
Emu3-Gen	0.98	0.71	0.34	0.81	0.17	0.21	0.54
SD3 Medium	0.98	0.74	0.63	0.67	0.34	0.36	0.62
FLUX.1-dev	0.98	0.81	0.74	0.79	0.22	0.45	0.66
SD3.5 Large	0.98	0.89	0.73	0.83	0.34	0.47	0.71
JanusFlow	0.97	0.59	0.45	0.83	0.53	0.42	0.63
Janus-Pro-7B	0.99	0.89	0.59	0.90	0.79	0.66	0.80
HiDream	1.00	0.98	0.79	0.91	0.60	0.72	0.83
Seedream 3.0	0.99	0.96	0.91	0.93	0.47	0.80	0.84
Qwen-Image	0.99	0.92	0.89	0.88	0.76	0.77	0.87
RL-based
RePrompt	0.98	0.87	0.77	0.85	0.62	0.49	0.76
FlowGRPO	1.00	0.99	0.91	0.89	0.95	0.80	0.92
DiffusionNFT	1.00	0.98	0.74	0.92	0.85	0.80	0.88
PromptRL w/o PE	1.00	0.96	0.95	0.95	0.93	0.85	0.94
PromptRL w/ PE	1.00	0.99	0.99	0.96	0.99	0.90	0.97
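
For the RL-based rows, the reported average appears consistent with an unweighted mean of the six category scores. That is an inference from the numbers, not something the README states; the check below allows a tolerance of 0.01 because every cell is already rounded to two decimals.

```python
# Six category scores and the reported average, copied from the RL-based rows of the GenEval table.
GENEVAL_RL = {
    "RePrompt": ([0.98, 0.87, 0.77, 0.85, 0.62, 0.49], 0.76),
    "FlowGRPO": ([1.00, 0.99, 0.91, 0.89, 0.95, 0.80], 0.92),
    "DiffusionNFT": ([1.00, 0.98, 0.74, 0.92, 0.85, 0.80], 0.88),
    "PromptRL w/o PE": ([1.00, 0.96, 0.95, 0.95, 0.93, 0.85], 0.94),
    "PromptRL w/ PE": ([1.00, 0.99, 0.99, 0.96, 0.99, 0.90], 0.97),
}

def mean_gap(scores, reported):
    """Absolute difference between the unweighted mean of `scores` and the reported average."""
    return abs(sum(scores) / len(scores) - reported)

if __name__ == "__main__":
    for model, (scores, reported) in GENEVAL_RL.items():
        print(f"{model}: mean={sum(scores) / len(scores):.4f} reported={reported:.2f}")
```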

🎨 Aesthetic & OCR Metrics (Full Results)

Model	PickScore	HPS	U.R.	OCR-1k	TMDB	OpenLib
SD1.5	20.92	23.71	2.00	0.05	0.13	0.08
SDXL	22.14	26.67	2.78	0.13	0.20	0.09
SD3 Medium	22.38	28.56	3.09	—	0.44	0.33
FLUX.1-schnell	22.64	29.39	3.25	0.54	0.66	0.50
FLUX.2-klein	22.79	29.03	3.29	0.55	0.22	0.46
Z-Image	20.14	28.22	3.51	0.70	0.71	0.83
Qwen-Image	23.05	30.40	3.53	0.65	0.79	0.94
Qwen-Image-2512	23.16	30.79	3.40	0.72	0.81	0.87
RL-based
FlowGRPO	23.33	29.80	3.33	0.89	0.83	0.73
DiffusionNFT	23.63	31.79	3.39	0.89	0.91	0.86
PromptRL w/o PE	24.01	31.79	3.38	0.97	0.92	0.95
PromptRL w/ PE	24.05	32.03	3.44	0.98	0.91	0.95

✏️ Image Editing - EditReward (Full Results)

Model	Swap	Style	Add.	Attr.	Env.	Removal	Avg.↑
InstructPix2Pix	-0.24	0.91	-0.45	0.45	0.48	-0.80	0.02
MagicBrush	-0.38	0.36	-0.78	-0.80	0.91	-0.85	-0.27
LEDITS++	-0.81	-0.32	-0.30	-0.60	-0.37	-0.97	-0.60
Qwen-Image-Edit	1.11	1.14	0.95	0.90	1.39	0.61	1.03
FLUX.2-klein	1.42	1.73	1.29	1.42	1.80	0.32	1.34
Nano Banana	1.58	1.20	1.28	1.18	1.61	1.13	1.37
Step1X-Edit	1.39	1.58	1.19	1.34	1.57	0.22	1.24
ReasonEdit	1.51	1.43	1.19	1.47	1.58	1.14	1.40
ReasonEdit-Think	1.52	1.47	1.19	1.44	1.69	1.27	1.44
FLUX.1-Kontext	1.35	1.36	1.16	1.15	1.44	0.55	1.19
FLUX.1-Kontext w/ PE	1.35	0.97	1.04	0.48	1.22	0.65	1.01
PromptRL w/o PE	1.45	1.46	1.28	1.35	1.56	0.98	1.36
PromptRL w/ PE	1.47	1.43	1.29	1.39	1.72	1.24	1.43
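
The table also lets one compare what adding PE does to the base editor versus the PromptRL-trained one. The snippet below copies the four relevant rows and computes the per-category change (w/ PE minus w/o PE); it makes no assumption about how the Avg. column is aggregated.

```python
CATEGORIES = ["Swap", "Style", "Add.", "Attr.", "Env.", "Removal"]

# Per-category EditReward scores, copied from the table above.
ROWS = {
    "FLUX.1-Kontext": [1.35, 1.36, 1.16, 1.15, 1.44, 0.55],
    "FLUX.1-Kontext w/ PE": [1.35, 0.97, 1.04, 0.48, 1.22, 0.65],
    "PromptRL w/o PE": [1.45, 1.46, 1.28, 1.35, 1.56, 0.98],
    "PromptRL w/ PE": [1.47, 1.43, 1.29, 1.39, 1.72, 1.24],
}

def pe_delta(without_pe, with_pe):
    """Return {category: score with PE minus score without PE}, rounded to 2 decimals."""
    return {
        cat: round(b - a, 2)
        for cat, a, b in zip(CATEGORIES, ROWS[without_pe], ROWS[with_pe])
    }

if __name__ == "__main__":
    print("FLUX.1-Kontext:", pe_delta("FLUX.1-Kontext", "FLUX.1-Kontext w/ PE"))
    print("PromptRL:      ", pe_delta("PromptRL w/o PE", "PromptRL w/ PE"))
```

On these numbers, PE lowers four of six categories for FLUX.1-Kontext, while for PromptRL it raises five of six, with the largest gains in Env. (+0.16) and Removal (+0.26).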

Citation

@article{wang2025promptrl,
  title={PromptRL: Prompt Matters in RL for Flow-Based Image Generation},
  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
  journal={arXiv preprint arXiv:2602.01382},
  year={2026}
}

@article{wang2025unirl,
  title={UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts},
  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
  journal={arXiv preprint arXiv:2510.17937},
  year={2025}
}

Acknowledgments

This codebase builds upon UniRL-Zero.

