Batch wgpu submit and present across immediate viewports #7961
gcailly wants to merge 2 commits into emilk:main from
Split `paint_and_update_textures` into three phases (`paint_prepare`, `paint_submit`, `paint_present`) so that immediate viewports can accumulate their prepared frames and submit them in a single `queue.submit()` call instead of one per viewport. This reduces frame time with many immediate viewports by eliminating redundant GPU synchronization between each viewport's submit+present cycle. Closes emilk#7885
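To illustrate the idea, here is a minimal sketch of per-viewport submission versus a single batched submission. `MockQueue`, `prepare`, `submit_per_viewport` and `submit_batched` are hypothetical stand-ins invented for this example; they are not the egui-wgpu or wgpu API, and the sketch only counts calls rather than measuring GPU time.

```rust
/// Hypothetical stand-in for a GPU queue: it only counts `submit` calls.
#[derive(Default)]
struct MockQueue {
    submit_calls: usize,
    buffers_submitted: usize,
}

impl MockQueue {
    fn submit(&mut self, command_buffers: Vec<String>) {
        self.submit_calls += 1;
        self.buffers_submitted += command_buffers.len();
    }
}

/// Stand-in for the prepare phase: records the commands for one viewport.
fn prepare(viewport: usize) -> Vec<String> {
    vec![format!("render pass for viewport {viewport}")]
}

/// Old shape: every viewport performs its own submit.
fn submit_per_viewport(queue: &mut MockQueue, viewports: usize) {
    for viewport in 0..viewports {
        queue.submit(prepare(viewport));
    }
}

/// New shape: accumulate all prepared command buffers, then submit once.
fn submit_batched(queue: &mut MockQueue, viewports: usize) {
    let all_cmd_bufs: Vec<String> = (0..viewports).flat_map(prepare).collect();
    queue.submit(all_cmd_bufs);
}

fn main() {
    let mut sequential = MockQueue::default();
    submit_per_viewport(&mut sequential, 10);

    let mut batched = MockQueue::default();
    submit_batched(&mut batched, 10);

    println!(
        "per-viewport: {} submits, batched: {} submits",
        sequential.submit_calls, batched.submit_calls
    );
}
```

Both paths hand the same number of command buffers to the queue; only the number of submit calls differs, which is the overhead the split is meant to remove.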
|
Preview available at https://egui-pr-preview.github.io/pr/7961-fixviewport-perf View snapshot changes at kitdiff |
|
Thanks for this awesome optimization! The results are impressive — 3x FPS improvement with 10 viewports is a huge win. Hope this gets reviewed and merged soon. Keep up the great work! 🚀 |
|
Thanks @liusuchao! I have to be honest: I was a bit too optimistic with the initial numbers. After more thorough benchmarking (see updated PR description), the real-world gain is closer to 30-40% than the 3x improvement I initially reported. |
|
@gcailly Hey, I think there might still be two errors or warnings left. |
- Fix broken intra-doc links (`[\`Self::paint_prepare\`]`, etc.)
- Use type-state: `paint_submit(Vec<PreparedFrame>) -> Vec<SubmittedFrame>`, so the compiler enforces prepare → submit → present
- Flatten `SubmitData` / `PresentData` into `PreparedFrame`, remove duplicate `viewport_id`, merge encoded + user command buffers into a single `Vec`
- Make fields private, expose `viewport_id()` / `vsync_sec()` accessors, add `#[must_use]` on both frame types
- Preallocate `all_cmd_bufs` with total capacity in `paint_submit`
- Restore emilk#7928 flush: submit `[]` on the missing-surface early return
- Measure submit + present durations again in `vsync_sec` (regressed by the split); `paint_present` now returns total present time
- Drop narrating comments; keep the macOS vsync comment
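As a rough illustration of the type-state shape described in this list, the sketch below uses plain strings in place of command buffers and a `Vec` log in place of the GPU queue. The `painter` module, the `submit_log` parameter and the return type of `paint_present` are assumptions made for the example. Only the names `PreparedFrame`, `SubmittedFrame`, `paint_prepare`, `paint_submit`, `paint_present`, `viewport_id()` and `all_cmd_bufs` come from the PR, and the real signatures differ.

```rust
mod painter {
    /// Output of the prepare phase. Fields are private, so code outside this
    /// module cannot build one by hand.
    #[must_use = "a prepared frame should be passed to `paint_submit`"]
    pub struct PreparedFrame {
        viewport_id: u32,
        command_buffers: Vec<String>, // stand-in for encoded + user command buffers
    }

    /// Output of the submit phase; the only way to obtain one is `paint_submit`.
    #[must_use = "a submitted frame should be passed to `paint_present`"]
    pub struct SubmittedFrame {
        viewport_id: u32,
    }

    impl PreparedFrame {
        pub fn viewport_id(&self) -> u32 {
            self.viewport_id
        }
    }

    impl SubmittedFrame {
        pub fn viewport_id(&self) -> u32 {
            self.viewport_id
        }
    }

    pub fn paint_prepare(viewport_id: u32) -> PreparedFrame {
        PreparedFrame {
            viewport_id,
            command_buffers: vec![format!("commands for viewport {viewport_id}")],
        }
    }

    /// Consumes every prepared frame and performs one (mock) submit.
    /// `submit_log` is a hypothetical stand-in for the GPU queue.
    pub fn paint_submit(
        frames: Vec<PreparedFrame>,
        submit_log: &mut Vec<Vec<String>>,
    ) -> Vec<SubmittedFrame> {
        // Preallocate with the total number of command buffers across all frames.
        let total: usize = frames.iter().map(|f| f.command_buffers.len()).sum();
        let mut all_cmd_bufs = Vec::with_capacity(total);
        let mut submitted = Vec::with_capacity(frames.len());
        for frame in frames {
            all_cmd_bufs.extend(frame.command_buffers);
            submitted.push(SubmittedFrame { viewport_id: frame.viewport_id });
        }
        submit_log.push(all_cmd_bufs); // one entry per submit call
        submitted
    }

    /// Consumes the submitted frames; returns the ids in presentation order.
    pub fn paint_present(frames: Vec<SubmittedFrame>) -> Vec<u32> {
        frames.into_iter().map(|f| f.viewport_id()).collect()
    }
}

fn main() {
    let mut submit_log = Vec::new();
    let prepared = vec![painter::paint_prepare(1), painter::paint_prepare(2)];
    println!("prepared viewport {}", prepared[0].viewport_id());
    let submitted = painter::paint_submit(prepared, &mut submit_log);
    let presented = painter::paint_present(submitted);
    println!("{} submit(s), presented {:?}", submit_log.len(), presented);
}
```

Because `paint_present` only accepts `SubmittedFrame` values and those can only come from `paint_submit`, calling the phases out of order is a compile error rather than a runtime bug.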
|
Closing this PR. After rebasing onto current main (wgpu 29, with the
The vsync-serialization issue described in #5836 also doesn't reproduce on my Windows setup (60 FPS stays 60 FPS with 10 viewports, before or after), so I can't verify whether the batched-present path would still help on the platforms where #5836 was originally reported. Whoever picks this up next should probably benchmark on Linux/Mac first. Sorry for the noise. |
Summary
This PR addresses the FPS drop reported in #7885 (and related #5836) when multiple immediate viewports are open.
As pointed out in #5836, each viewport does its own `queue.submit()` + `present()` sequentially. With vsync on, every `present()` blocks until the next vblank, so with N viewports the frame rate drops to roughly 1/N of the refresh rate. With vsync off, the redundant GPU synchronization still adds noticeable overhead per viewport.
This PR splits `paint_and_update_textures` into three phases:
- `paint_prepare`: upload textures/buffers, acquire surface texture, record render pass, encode commands
- `paint_submit`: single `queue.submit()` for all viewports at once
- `paint_present`: present all viewports after GPU work is done
Immediate viewports now accumulate their `PreparedFrame`s, and the parent viewport batches everything into one submit+present cycle. `paint_and_update_textures` is kept as a convenience wrapper calling the three phases sequentially, so the public API remains backward-compatible. Deferred viewports are unaffected (they still go through the wrapper).
The phase API uses type-state (`PreparedFrame` → `SubmittedFrame`) so the compiler enforces the prepare → submit → present order.
Before and after
Tested on Windows 11, release mode, with a minimal benchmark spawning 0 to 10 immediate viewports (vsync off, high-performance GPU).
With vsync on, both before and after stay pinned at the monitor refresh (60 Hz) on this Windows/DXGI setup, so the vsync-serialization issue described in #5836 isn't reproducible here — but the batched-present path should also help on platforms where it is.
Benchmark code
Disclosure
I'm not a Rust developer — I used Claude Code to help me write this. I hope I'm not making a mess, I just wanted to help! Please don't hesitate to point out anything wrong.
Test plan