PromptRL is a framework that jointly trains language models (LMs) and flow-matching models (FMs) within a unified reinforcement learning loop for text-to-image generation. By incorporating LMs as adaptive prompt refiners, PromptRL addresses two critical limitations in current flow-based RL pipelines: exploration collapse due to insufficient generation diversity, and prompt overfitting where models memorize specific training formulations.
PromptRL achieves 2× the sample efficiency of flow-only RL and also yields an adaptive prompt refinement agent that improves test-time performance.
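The joint loop described above can be pictured with a toy sketch. Everything here is a hypothetical stand-in rather than the repository's code: `refine_prompt` plays the role of the LM prompt refiner, `generate_image` the flow-matching model, and `reward_fn` a reward model. The group-normalized advantage is an assumption borrowed from GRPO-style training; the text above does not state which estimator PromptRL uses.

```python
import random
import statistics


def refine_prompt(prompt, rng):
    """Hypothetical LM policy: sample one refinement of the user prompt."""
    suffixes = ["", ", highly detailed", ", studio lighting", ", wide shot"]
    return prompt + rng.choice(suffixes)


def generate_image(refined_prompt, rng):
    """Hypothetical FM policy: a noisy scalar stands in for an image."""
    return len(refined_prompt) + rng.gauss(0.0, 1.0)


def reward_fn(image):
    """Hypothetical reward model: any scalar score of the generated image."""
    return -abs(image - 40.0)


def group_advantages(rewards):
    """Normalize rewards within one prompt's group (assumed, GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-6) for r in rewards]


def joint_rollout(prompt, group_size, rng):
    """Sample refined prompts and images, then score them with one reward."""
    refined = [refine_prompt(prompt, rng) for _ in range(group_size)]
    images = [generate_image(p, rng) for p in refined]
    rewards = [reward_fn(img) for img in images]
    advantages = group_advantages(rewards)
    # In a joint setup, one plausible choice is to use the same advantage to
    # weight both the LM's refined-prompt sample and the FM's image sample.
    return list(zip(refined, rewards, advantages))


rollout = joint_rollout(
    "a red cube on a blue sphere", group_size=4, rng=random.Random(0)
)
for refined, reward, adv in rollout:
    print(f"{adv:+.2f}  {reward:+.2f}  {refined}")
```

Because the LM samples a different refinement for each rollout, the images in a group come from varied prompts, which is the diversity mechanism the paragraph above credits with avoiding exploration collapse.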
Summary
| Benchmark | Metric | PromptRL w/ PE | Best Baseline |
|---|---|---|---|
| GenEval | Avg. Score ↑ | 0.97 | 0.92 (FlowGRPO) |
| Aesthetic | PickScore ↑ | 24.05 | 23.63 (DiffusionNFT) |
| Aesthetic | HPS ↑ | 32.03 | 31.79 (DiffusionNFT) |
| OCR | OCR-1k ↑ | 0.98 | 0.89 (FlowGRPO) |
| Image Editing | EditReward Avg. ↑ | 1.43 | 1.44 (ReasonEdit-Think) |
📊 GenEval Benchmark (Full Results)
| Model | 1 Obj. | 2 Obj. | Cnt. | Clr. | Pos. | Attr. | Avg.↑ |
|---|---|---|---|---|---|---|---|
| Show-o | 0.95 | 0.52 | 0.49 | 0.82 | 0.11 | 0.28 | 0.53 |
| Emu3-Gen | 0.98 | 0.71 | 0.34 | 0.81 | 0.17 | 0.21 | 0.54 |
| SD3 Medium | 0.98 | 0.74 | 0.63 | 0.67 | 0.34 | 0.36 | 0.62 |
| FLUX.1-dev | 0.98 | 0.81 | 0.74 | 0.79 | 0.22 | 0.45 | 0.66 |
| SD3.5 Large | 0.98 | 0.89 | 0.73 | 0.83 | 0.34 | 0.47 | 0.71 |
| JanusFlow | 0.97 | 0.59 | 0.45 | 0.83 | 0.53 | 0.42 | 0.63 |
| Janus-Pro-7B | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 | 0.80 |
| HiDream | 1.00 | 0.98 | 0.79 | 0.91 | 0.60 | 0.72 | 0.83 |
| Seedream 3.0 | 0.99 | 0.96 | 0.91 | 0.93 | 0.47 | 0.80 | 0.84 |
| Qwen-Image | 0.99 | 0.92 | 0.89 | 0.88 | 0.76 | 0.77 | 0.87 |
| RL-based | | | | | | | |
| RePrompt | 0.98 | 0.87 | 0.77 | 0.85 | 0.62 | 0.49 | 0.76 |
| FlowGRPO | 1.00 | 0.99 | 0.91 | 0.89 | 0.95 | 0.80 | 0.92 |
| DiffusionNFT | 1.00 | 0.98 | 0.74 | 0.92 | 0.85 | 0.80 | 0.88 |
| PromptRL w/o PE | 1.00 | 0.96 | 0.95 | 0.95 | 0.93 | 0.85 | 0.94 |
| PromptRL w/ PE | 1.00 | 0.99 | 0.99 | 0.96 | 0.99 | 0.90 | 0.97 |
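As a quick sanity check on the GenEval numbers, the reported Avg. column for the RL-based rows is consistent with an unweighted mean of the six per-category scores, rounded to two decimals. This is an observation from the tabulated values, not a statement from the authors about how the average is defined. The snippet below recomputes it for four rows copied from the table.

```python
# Per-category GenEval scores (1 Obj., 2 Obj., Cnt., Clr., Pos., Attr.)
# and the reported average, copied from the table above.
rows = {
    "FlowGRPO": ([1.00, 0.99, 0.91, 0.89, 0.95, 0.80], 0.92),
    "DiffusionNFT": ([1.00, 0.98, 0.74, 0.92, 0.85, 0.80], 0.88),
    "PromptRL w/o PE": ([1.00, 0.96, 0.95, 0.95, 0.93, 0.85], 0.94),
    "PromptRL w/ PE": ([1.00, 0.99, 0.99, 0.96, 0.99, 0.90], 0.97),
}


def mean_score(scores):
    """Unweighted mean of the category scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


recomputed = {name: mean_score(scores) for name, (scores, _) in rows.items()}

for name, (_, reported) in rows.items():
    print(f"{name}: recomputed={recomputed[name]:.2f} reported={reported:.2f}")
```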
🎨 Aesthetic & OCR Metrics (Full Results)
| Model | P.S. | HPS | U.R. | OCR-1k | TMDB | OpenLib |
|---|---|---|---|---|---|---|
| SD1.5 | 20.92 | 23.71 | 2.00 | 0.05 | 0.13 | 0.08 |
| SDXL | 22.14 | 26.67 | 2.78 | 0.13 | 0.20 | 0.09 |
| SD3 Medium | 22.38 | 28.56 | 3.09 | — | 0.44 | 0.33 |
| FLUX.1-schnell | 22.64 | 29.39 | 3.25 | 0.54 | 0.66 | 0.50 |
| FLUX.2-klein | 22.79 | 29.03 | 3.29 | 0.55 | 0.22 | 0.46 |
| Z-Image | 20.14 | 28.22 | 3.51 | 0.70 | 0.71 | 0.83 |
| Qwen-Image | 23.05 | 30.40 | 3.53 | 0.65 | 0.79 | 0.94 |
| Qwen-Image-2512 | 23.16 | 30.79 | 3.40 | 0.72 | 0.81 | 0.87 |
| RL-based | | | | | | |
| FlowGRPO | 23.33 | 29.80 | 3.33 | 0.89 | 0.83 | 0.73 |
| DiffusionNFT | 23.63 | 31.79 | 3.39 | 0.89 | 0.91 | 0.86 |
| PromptRL w/o PE | 24.01 | 31.79 | 3.38 | 0.97 | 0.92 | 0.95 |
| PromptRL w/ PE | 24.05 | 32.03 | 3.44 | 0.98 | 0.91 | 0.95 |
✏️ Image Editing - EditReward (Full Results)
| Model | Swap | Style | Add. | Attr. | Env. | Removal | Avg.↑ |
|---|---|---|---|---|---|---|---|
| InstructPix2Pix | -0.24 | 0.91 | -0.45 | 0.45 | 0.48 | -0.80 | 0.02 |
| MagicBrush | -0.38 | 0.36 | -0.78 | -0.80 | 0.91 | -0.85 | -0.27 |
| LEDITS++ | -0.81 | -0.32 | -0.30 | -0.60 | -0.37 | -0.97 | -0.60 |
| Qwen-Image-Edit | 1.11 | 1.14 | 0.95 | 0.90 | 1.39 | 0.61 | 1.03 |
| FLUX.2-klein | 1.42 | 1.73 | 1.29 | 1.42 | 1.80 | 0.32 | 1.34 |
| Nano Banana | 1.58 | 1.20 | 1.28 | 1.18 | 1.61 | 1.13 | 1.37 |
| Step1X-Edit | 1.39 | 1.58 | 1.19 | 1.34 | 1.57 | 0.22 | 1.24 |
| ReasonEdit | 1.51 | 1.43 | 1.19 | 1.47 | 1.58 | 1.14 | 1.40 |
| ReasonEdit-Think | 1.52 | 1.47 | 1.19 | 1.44 | 1.69 | 1.27 | 1.44 |
| FLUX.1-Kontext | 1.35 | 1.36 | 1.16 | 1.15 | 1.44 | 0.55 | 1.19 |
| FLUX.1-Kontext w/ PE | 1.35 | 0.97 | 1.04 | 0.48 | 1.22 | 0.65 | 1.01 |
| PromptRL w/o PE | 1.45 | 1.46 | 1.28 | 1.35 | 1.56 | 0.98 | 1.36 |
| PromptRL w/ PE | 1.47 | 1.43 | 1.29 | 1.39 | 1.72 | 1.24 | 1.43 |
Citation
@article{wang2025promptrl,
title={PromptRL: Prompt Matters in RL for Flow-Based Image Generation},
author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
journal={arXiv preprint arXiv:2602.01382},
year={2026}
}
@article{wang2025unirl,
title={UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts},
author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
journal={arXiv preprint arXiv:2510.17937},
year={2025}
}