Batch wgpu submit and present across immediate viewports by gcailly · Pull Request #7961 · emilk/egui

gcailly · 2026-03-06T14:10:11Z

Summary

This PR addresses the FPS drop reported in #7885 (and related #5836) when multiple immediate viewports are open.

As pointed out in #5836, each viewport does its own queue.submit() + present() sequentially. With vsync on, every present() blocks until the next vblank, so with N viewports the frame rate drops to roughly 1/N of the refresh rate. With vsync off, the redundant GPU synchronization still adds noticeable overhead per viewport. This PR splits paint_and_update_textures into three phases:

paint_prepare — upload textures/buffers, acquire surface texture, record render pass, encode commands
paint_submit — single queue.submit() for all viewports at once
paint_present — present all viewports after GPU work is done

Immediate viewports now accumulate their PreparedFrames, and the parent viewport batches everything into one submit+present cycle.

paint_and_update_textures is kept as a convenience wrapper calling the three phases sequentially, so the public API remains backward-compatible. Deferred viewports are unaffected (they still go through the wrapper). The phase API uses type-state (PreparedFrame → SubmittedFrame) so the compiler enforces the prepare → submit → present order.
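
For readers who want to see the shape of the phase API without reading the diff, here is a minimal sketch of the type-state idea. It is not the code from this PR: `MockQueue` and `CommandBuffer` are hypothetical stand-ins for `wgpu::Queue` and `wgpu::CommandBuffer`, and the function bodies are reduced to the ordering logic only.

```rust
/// Hypothetical stand-in for `wgpu::CommandBuffer`.
struct CommandBuffer;

/// Hypothetical stand-in for `wgpu::Queue` that counts `submit` calls.
#[derive(Default)]
struct MockQueue {
    submit_calls: usize,
    submitted_buffers: usize,
}

impl MockQueue {
    fn submit(&mut self, buffers: impl IntoIterator<Item = CommandBuffer>) {
        self.submit_calls += 1;
        self.submitted_buffers += buffers.into_iter().count();
    }
}

/// Phase 1 output: commands are recorded but not yet submitted.
#[must_use]
struct PreparedFrame {
    viewport: u32,
    commands: Vec<CommandBuffer>,
}

/// Phase 2 output: commands are submitted, the surface is not yet presented.
#[must_use]
struct SubmittedFrame {
    viewport: u32,
}

fn paint_prepare(viewport: u32) -> PreparedFrame {
    PreparedFrame { viewport, commands: vec![CommandBuffer] }
}

/// Consumes every prepared frame and issues exactly one submit for all of them.
fn paint_submit(queue: &mut MockQueue, frames: Vec<PreparedFrame>) -> Vec<SubmittedFrame> {
    let mut all_commands = Vec::new();
    let mut submitted = Vec::with_capacity(frames.len());
    for frame in frames {
        all_commands.extend(frame.commands);
        submitted.push(SubmittedFrame { viewport: frame.viewport });
    }
    queue.submit(all_commands);
    submitted
}

/// Only a `SubmittedFrame` can be presented; returns the presented viewports in order.
fn paint_present(frames: Vec<SubmittedFrame>) -> Vec<u32> {
    frames.into_iter().map(|frame| frame.viewport).collect()
}

fn main() {
    let mut queue = MockQueue::default();
    let prepared: Vec<PreparedFrame> = (0..3).map(paint_prepare).collect();
    let submitted = paint_submit(&mut queue, prepared);
    let presented = paint_present(submitted);
    println!("{} submit call(s), presented {:?}", queue.submit_calls, presented);
}
```

Because `paint_present` only accepts `SubmittedFrame`, and the only way to obtain one is through `paint_submit`, calling the phases out of order fails to compile rather than misbehaving at runtime.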

Note: egui_glow has the same architectural issue (one swap_buffers per immediate viewport). This PR only addresses egui-wgpu; a follow-up PR would be needed for glow.

Before and after

Tested on Windows 11, release mode, with a minimal benchmark spawning 0 to 10 immediate viewports (vsync off, high-performance GPU).

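| Viewports | Before (FPS) | After (FPS) | Gain |
|:---------:|:------------:|:-----------:|:----:|
| 0 | 1330 | 2051 | +54% |
| 1 | 813 | 1163 | +43% |
| 2 | 583 | 776 | +33% |
| 3 | 448 | 709 | +58% |
| 4 | 347 | 574 | +65% |
| 5 | 313 | 510 | +63% |
| 6 | 282 | 428 | +52% |
| 7 | 217 | 376 | +73% |
| 8 | 198 | 335 | +69% |
| 9 | 187 | 304 | +63% |
| 10 | 167 | 317 | +90% |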
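
For anyone re-running the benchmark, the Gain column can be recomputed from the two FPS columns. The helpers below are a small sketch written for this purpose (`gain_percent` and `frame_time_ms` are names made up here, not part of the benchmark): gain is the relative FPS change rounded to the nearest percent, and the frame time is simply the inverse of the FPS.

```rust
/// Relative FPS change in percent, rounded to the nearest integer.
fn gain_percent(before_fps: f64, after_fps: f64) -> i64 {
    ((after_fps / before_fps - 1.0) * 100.0).round() as i64
}

/// Average time per frame in milliseconds for a given FPS.
fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

fn main() {
    // (viewports, before FPS, after FPS) taken from three rows of the table above.
    let rows = [(0, 1330.0, 2051.0), (4, 347.0, 574.0), (10, 167.0, 317.0)];
    for (viewports, before, after) in rows {
        println!(
            "{viewports} viewports: {:+}% ({:.2} ms -> {:.2} ms per frame)",
            gain_percent(before, after),
            frame_time_ms(before),
            frame_time_ms(after),
        );
    }
}
```

Looking at frame times rather than FPS makes the rows easier to compare, since a fixed per-viewport cost shows up as a roughly constant number of milliseconds.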

With vsync on, both before and after stay pinned at the monitor refresh (60 Hz) on this Windows/DXGI setup, so the vsync-serialization issue described in #5836 isn't reproducible here — but the batched-present path should also help on platforms where it is.
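
To make the serialization argument from #5836 concrete, here is a deliberately simplified model, not a measurement. It assumes that every `present()` blocks for one full refresh interval and that presents run strictly one after another; `serialized_fps` is a hypothetical helper used only for this illustration. As noted above, this behaviour did not show up on the Windows/DXGI setup used for the benchmark.

```rust
/// Upper bound on FPS if each of `presents_per_frame` sequential presents
/// waits for its own vblank (simplified model, see the assumptions above).
fn serialized_fps(refresh_hz: f64, presents_per_frame: u32) -> f64 {
    // Guard against a zero count so the model never divides by zero.
    refresh_hz / f64::from(presents_per_frame.max(1))
}

fn main() {
    for presents in [1, 2, 4] {
        println!(
            "{presents} present(s) per frame at 60 Hz: at most {} FPS",
            serialized_fps(60.0, presents)
        );
    }
}
```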

Benchmark code

#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]

use eframe::egui_wgpu::{WgpuConfiguration, WgpuSetup, WgpuSetupCreateNew};
use egui::{Id, ViewportId};
use std::time::Instant;
use wgpu::{PowerPreference, PresentMode};

const SECONDS_PER_STEP: f64 = 3.0;
const WARMUP_SECS: f64 = 1.0;
const MAX_VIEWPORTS: usize = 10;

fn main() -> eframe::Result {
    let mut wgpu_options = WgpuConfiguration::default();
    wgpu_options.present_mode = PresentMode::AutoNoVsync;
    wgpu_options.wgpu_setup = match wgpu_options.wgpu_setup {
        WgpuSetup::CreateNew(create_new) => WgpuSetup::CreateNew(WgpuSetupCreateNew {
            power_preference: PowerPreference::HighPerformance,
            ..create_new
        }),
        _ => unreachable!(),
    };

    let native_options = eframe::NativeOptions {
        viewport: egui::ViewportBuilder::default()
            .with_inner_size([400.0, 300.0])
            .with_min_inner_size([300.0, 220.0]),
        vsync: false,
        wgpu_options,
        ..Default::default()
    };

    println!("| Viewports | FPS |");
    println!("|:---------:|:---:|");

    eframe::run_native(
        "viewport_perf",
        native_options,
        Box::new(|_cc| Ok(Box::new(App::new()))),
    )
}

struct App {
    current_step: usize,
    frame_count: usize,
    step_start: Instant,
    warming_up: bool,
    done: bool,
    results: Vec<(usize, usize)>,
}

impl App {
    fn new() -> Self {
        Self {
            current_step: 0,
            frame_count: 0,
            step_start: Instant::now(),
            warming_up: true,
            done: false,
            results: Vec::new(),
        }
    }
}

impl eframe::App for App {
    fn ui(&mut self, ui: &mut egui::Ui, _frame: &mut eframe::Frame) {
        let elapsed = self.step_start.elapsed().as_secs_f64();

        if self.done {
            egui::CentralPanel::default().show_inside(ui, |ui| {
                ui.heading("Benchmark complete!");
                ui.separator();
                for (vp, fps) in &self.results {
                    ui.label(format!("{vp} viewports: {fps} FPS"));
                }
            });
            return;
        }

        if self.warming_up {
            if elapsed >= WARMUP_SECS {
                self.warming_up = false;
                self.frame_count = 0;
                self.step_start = Instant::now();
            }
        } else if elapsed >= SECONDS_PER_STEP {
            let fps = (self.frame_count as f64 / elapsed).round() as usize;
            println!("| {:<9} | {fps:>5} |", self.current_step);
            self.results.push((self.current_step, fps));

            self.current_step += 1;
            self.frame_count = 0;
            self.step_start = Instant::now();
            self.warming_up = true;

            if self.current_step > MAX_VIEWPORTS {
                self.done = true;
                println!("\nDone! You can close the window.");
                return;
            }
        }

        if !self.warming_up {
            self.frame_count += 1;
        }

        egui::CentralPanel::default().show_inside(ui, |ui| {
            ui.heading(format!("Benchmarking: {} viewport(s)...", self.current_step));
            if self.warming_up {
                ui.label("Warming up...");
            } else {
                ui.label(format!(
                    "Measuring ({:.1}s / {SECONDS_PER_STEP}s)",
                    self.step_start.elapsed().as_secs_f64()
                ));
            }
        });

        let viewport_ids: Vec<ViewportId> = (0..self.current_step)
            .map(|i| ViewportId(Id::new(format!("w{i}"))))
            .collect();

        for viewport_id in &viewport_ids {
            ui.ctx().show_viewport_immediate(
                *viewport_id,
                egui::ViewportBuilder::default()
                    .with_inner_size([400.0, 300.0])
                    .with_min_inner_size([300.0, 220.0]),
                |ui, _class| {
                    egui::CentralPanel::default().show_inside(ui, |ui| {
                        ui.heading("Extra Window");
                    });
                },
            );
        }

        ui.ctx().request_repaint();
    }
}

Disclosure

I'm not a Rust developer — I used Claude Code to help me write this. I hope I'm not making a mess, I just wanted to help! Please don't hesitate to point out anything wrong.

Test plan

`cargo test -p egui-wgpu -p eframe` — all tests pass
`cargo clippy -p egui-wgpu -p eframe --all-features --all-targets` — no warnings
`cargo doc -p egui-wgpu -p eframe --all-features` with `-D warnings` — no broken links
`multiple_viewports` example — works correctly
Benchmarked 0–10 immediate viewports (vsync off) — see table above

Split `paint_and_update_textures` into three phases (`paint_prepare`, `paint_submit`, `paint_present`) so that immediate viewports can accumulate their prepared frames and submit them in a single `queue.submit()` call instead of one per viewport. This reduces frame time with many immediate viewports by eliminating redundant GPU synchronization between each viewport's submit+present cycle. Closes emilk#7885

github-actions · 2026-03-06T14:10:21Z

Preview available at https://egui-pr-preview.github.io/pr/7961-fixviewport-perf
Note that it might take a couple seconds for the update to show up after the preview_build workflow has completed.

View snapshot changes at kitdiff

liusuchao · 2026-03-11T08:53:34Z

Thanks for this awesome optimization! The results are impressive — 3x FPS improvement with 10 viewports is a huge win. Hope this gets reviewed and merged soon. Keep up the great work! 🚀

gcailly · 2026-03-12T12:03:30Z

Thanks @liusuchao! I have to be honest: I was a bit too optimistic with the initial numbers. After more thorough benchmarking (see updated PR description), the real-world gain is closer to 30–40% rather than the 3x improvement I initially reported.

liusuchao · 2026-04-13T03:24:22Z

@gcailly Hey, I think there might still be two errors or warnings left.

- Fix broken intra-doc links (``[`Self::paint_prepare`]``, etc.)
- Use type-state: `paint_submit(Vec<PreparedFrame>) -> Vec<SubmittedFrame>`, so the compiler enforces prepare → submit → present
- Flatten `SubmitData` / `PresentData` into `PreparedFrame`, remove duplicate `viewport_id`, merge encoded + user command buffers into a single `Vec`
- Make fields private, expose `viewport_id()` / `vsync_sec()` accessors, add `#[must_use]` on both frame types
- Preallocate `all_cmd_bufs` with total capacity in `paint_submit`
- Restore emilk#7928 flush: submit `[]` on the missing-surface early return
- Measure submit + present durations again in `vsync_sec` (regressed by the split); `paint_present` now returns total present time
- Drop narrating comments; keep the macOS vsync comment
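
A few of the items above (private fields with accessors, `#[must_use]`, preallocating `all_cmd_bufs`) can be sketched in isolation. This is an illustration under assumptions, not the PR's code: `CommandBuffer` is a placeholder for `wgpu::CommandBuffer`, the field types are guesses, and `gather_command_buffers` is a hypothetical helper standing in for the relevant part of `paint_submit`.

```rust
mod frames {
    /// Placeholder for `wgpu::CommandBuffer`; the `u32` is only a label for this sketch.
    pub struct CommandBuffer(pub u32);

    /// Fields are private, so callers can only read them through the accessors.
    #[must_use = "a prepared frame has no effect until it is submitted"]
    pub struct PreparedFrame {
        viewport_id: u64,
        vsync_sec: f32,
        command_buffers: Vec<CommandBuffer>,
    }

    impl PreparedFrame {
        pub fn new(viewport_id: u64, vsync_sec: f32, command_buffers: Vec<CommandBuffer>) -> Self {
            Self { viewport_id, vsync_sec, command_buffers }
        }

        pub fn viewport_id(&self) -> u64 {
            self.viewport_id
        }

        pub fn vsync_sec(&self) -> f32 {
            self.vsync_sec
        }
    }

    /// Moves every frame's command buffers into one `Vec`, allocating once up front.
    pub fn gather_command_buffers(frames: Vec<PreparedFrame>) -> Vec<CommandBuffer> {
        let total: usize = frames.iter().map(|frame| frame.command_buffers.len()).sum();
        let mut all_cmd_bufs = Vec::with_capacity(total);
        for frame in frames {
            all_cmd_bufs.extend(frame.command_buffers);
        }
        all_cmd_bufs
    }
}

fn main() {
    use frames::{gather_command_buffers, CommandBuffer, PreparedFrame};

    let prepared = vec![
        PreparedFrame::new(1, 0.0, vec![CommandBuffer(10), CommandBuffer(11)]),
        PreparedFrame::new(2, 0.0, vec![CommandBuffer(20)]),
    ];
    for frame in &prepared {
        println!("viewport {} (vsync {} s)", frame.viewport_id(), frame.vsync_sec());
    }

    let all_cmd_bufs = gather_command_buffers(prepared);
    let labels: Vec<u32> = all_cmd_bufs.iter().map(|buffer| buffer.0).collect();
    println!("{} buffers in submit order: {:?}", all_cmd_bufs.len(), labels);
}
```

Summing the lengths first means the combined `Vec` never reallocates while the per-viewport buffers are appended, which is the point of the preallocation item.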

gcailly · 2026-04-13T14:32:22Z

Closing this PR.

After rebasing onto current main (wgpu 29, with the needs_reconfigure surface changes etc.) and re-running the benchmark, the refactor no longer provides a measurable speedup. It actually regresses by ~5–16% on Windows:

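| Viewports | main (FPS) | this PR (FPS) | Δ |
|:---------:|:----------:|:-------------:|:---:|
| 0 | ~1290 | ~1210 | -6% |
| 1 | ~630 | ~595 | -6% |
| 3 | ~330 | ~290 | -12% |
| 5 | ~220 | ~210 | -5% |
| 7 | ~175 | ~155 | -11% |
| 10 | ~125 | ~105 | -16% |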

The vsync-serialization issue described in #5836 also doesn't reproduce on my Windows setup (60 FPS stays 60 FPS with 10 viewports, before or after), so I can't verify whether the batched-present path would still help on the platforms where #5836 was originally reported. Whoever picks this up next should probably benchmark on Linux/Mac first.

Sorry for the noise.

gcailly requested a review from Wumpf as a code owner March 6, 2026 14:10

gcailly marked this pull request as draft April 13, 2026 14:29

gcailly closed this Apr 13, 2026

gcailly wants to merge 2 commits into emilk:main from gcailly:fix/viewport-perf
