Batch wgpu submit and present across immediate viewports #7961

Closed
gcailly wants to merge 2 commits into emilk:main from gcailly:fix/viewport-perf

Contributor

@gcailly gcailly commented Mar 6, 2026

Summary

This PR addresses the FPS drop reported in #7885 (and related #5836) when multiple immediate viewports are open.

As pointed out in #5836, each viewport does its own queue.submit() + present() sequentially. With vsync on, every present() blocks until the next vblank, so with N viewports the frame rate drops to roughly 1/N of the refresh rate. With vsync off, the redundant GPU synchronization still adds noticeable overhead per viewport. This PR splits paint_and_update_textures into three phases:

  • paint_prepare — upload textures/buffers, acquire surface texture, record render pass, encode commands
  • paint_submit — single queue.submit() for all viewports at once
  • paint_present — present all viewports after GPU work is done

Immediate viewports now accumulate their PreparedFrames, and the parent viewport batches everything into one submit+present cycle.

paint_and_update_textures is kept as a convenience wrapper calling the three phases sequentially, so the public API remains backward-compatible. Deferred viewports are unaffected (they still go through the wrapper). The phase API uses type-state (PreparedFrame → SubmittedFrame) so the compiler enforces the prepare → submit → present order.
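
To make the type-state idea concrete, here is a minimal sketch of the three phases. It is not the code from this PR: `MockQueue`, the string "command buffers", and the local `ViewportId` are hypothetical stand-ins for the real wgpu and egui types, and only the phase names and the `PreparedFrame` / `SubmittedFrame` type names come from the description above.

```rust
// Hypothetical stand-ins: none of these types are the real egui-wgpu or wgpu API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ViewportId(u32);

/// Mock of a GPU queue that only counts how often `submit` is called.
#[derive(Default)]
struct MockQueue {
    submit_calls: usize,
    submitted_buffers: usize,
}

impl MockQueue {
    fn submit(&mut self, command_buffers: Vec<String>) {
        self.submit_calls += 1;
        self.submitted_buffers += command_buffers.len();
    }
}

/// Phase 1 output: commands are recorded but not yet submitted.
#[must_use]
struct PreparedFrame {
    viewport_id: ViewportId,
    command_buffers: Vec<String>,
}

/// Phase 2 output: commands are submitted, the surface is not yet presented.
#[must_use]
struct SubmittedFrame {
    viewport_id: ViewportId,
}

fn paint_prepare(viewport_id: ViewportId) -> PreparedFrame {
    PreparedFrame {
        viewport_id,
        command_buffers: vec![format!("render pass for {viewport_id:?}")],
    }
}

/// Consumes every prepared frame and issues exactly one submit for all of them.
fn paint_submit(queue: &mut MockQueue, frames: Vec<PreparedFrame>) -> Vec<SubmittedFrame> {
    let mut all_cmd_bufs = Vec::new();
    let mut submitted = Vec::with_capacity(frames.len());
    for frame in frames {
        all_cmd_bufs.extend(frame.command_buffers);
        submitted.push(SubmittedFrame { viewport_id: frame.viewport_id });
    }
    queue.submit(all_cmd_bufs);
    submitted
}

/// Accepts only `SubmittedFrame`s, so presenting an unsubmitted frame is a type error.
/// Returns the viewports in the order they were presented.
fn paint_present(frames: Vec<SubmittedFrame>) -> Vec<ViewportId> {
    frames.into_iter().map(|frame| frame.viewport_id).collect()
}
```

Because `paint_submit` takes the prepared frames by value and `paint_present` accepts nothing else than its output, calling the phases out of order fails to compile rather than failing at runtime.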

Note: egui_glow has the same architectural issue (one swap_buffers per immediate viewport). This PR only addresses egui-wgpu; a follow-up PR would be needed for glow.

Before and after

Tested on Windows 11, release mode, with a minimal benchmark spawning 0 to 10 immediate viewports (vsync off, high-performance GPU).

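| Viewports | Before (FPS) | After (FPS) | Gain |
|:---------:|:------------:|:-----------:|:----:|
| 0         | 1330         | 2051        | +54% |
| 1         | 813          | 1163        | +43% |
| 2         | 583          | 776         | +33% |
| 3         | 448          | 709         | +58% |
| 4         | 347          | 574         | +65% |
| 5         | 313          | 510         | +63% |
| 6         | 282          | 428         | +52% |
| 7         | 217          | 376         | +73% |
| 8         | 198          | 335         | +69% |
| 9         | 187          | 304         | +63% |
| 10        | 167          | 317         | +90% |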
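
For readers checking the Gain column: it appears to be the relative FPS change rounded to the nearest percent. A small sketch of that arithmetic follows; the helper name `gain_percent` is made up for illustration and is not part of the benchmark.

```rust
/// Relative change from `before_fps` to `after_fps`, rounded to a whole percent.
/// Hypothetical helper, shown only to document how the Gain column can be derived.
fn gain_percent(before_fps: f64, after_fps: f64) -> i64 {
    ((after_fps - before_fps) / before_fps * 100.0).round() as i64
}
```

For example, the first row gives (2051 - 1330) / 1330 ≈ 0.542, which rounds to +54%.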

With vsync on, both before and after stay pinned at the monitor refresh (60 Hz) on this Windows/DXGI setup, so the vsync-serialization issue described in #5836 isn't reproducible here — but the batched-present path should also help on platforms where it is.
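
As a rough model of the serialization described in #5836, assume every window's present() blocks for one full refresh interval and that the presents run back to back. The function below is only that simplified model (its name and the assumption of one blocking present per window are mine), not a measurement and not how any particular driver is guaranteed to behave.

```rust
/// Expected frame rate if each of `windows` presents per frame blocks for one
/// full refresh interval. Simplified model only; real compositors may differ.
fn serialized_fps(refresh_hz: f64, windows: u32) -> f64 {
    // A frame with zero windows is treated like a single window to avoid dividing by zero.
    refresh_hz / f64::from(windows.max(1))
}
```

Under this model a 60 Hz display with four windows would run at 15 FPS, which is the behaviour that did not reproduce on the Windows/DXGI setup used here.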

Benchmark code
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]

use eframe::egui_wgpu::{WgpuConfiguration, WgpuSetup, WgpuSetupCreateNew};
use egui::{Id, ViewportId};
use std::time::Instant;
use wgpu::{PowerPreference, PresentMode};

const SECONDS_PER_STEP: f64 = 3.0;
const WARMUP_SECS: f64 = 1.0;
const MAX_VIEWPORTS: usize = 10;

fn main() -> eframe::Result {
    let mut wgpu_options = WgpuConfiguration::default();
    wgpu_options.present_mode = PresentMode::AutoNoVsync;
    wgpu_options.wgpu_setup = match wgpu_options.wgpu_setup {
        WgpuSetup::CreateNew(create_new) => WgpuSetup::CreateNew(WgpuSetupCreateNew {
            power_preference: PowerPreference::HighPerformance,
            ..create_new
        }),
        _ => unreachable!(),
    };

    let native_options = eframe::NativeOptions {
        viewport: egui::ViewportBuilder::default()
            .with_inner_size([400.0, 300.0])
            .with_min_inner_size([300.0, 220.0]),
        vsync: false,
        wgpu_options,
        ..Default::default()
    };

    println!("| Viewports | FPS |");
    println!("|:---------:|:---:|");

    eframe::run_native(
        "viewport_perf",
        native_options,
        Box::new(|_cc| Ok(Box::new(App::new()))),
    )
}

struct App {
    current_step: usize,
    frame_count: usize,
    step_start: Instant,
    warming_up: bool,
    done: bool,
    results: Vec<(usize, usize)>,
}

impl App {
    fn new() -> Self {
        Self {
            current_step: 0,
            frame_count: 0,
            step_start: Instant::now(),
            warming_up: true,
            done: false,
            results: Vec::new(),
        }
    }
}

impl eframe::App for App {
    fn ui(&mut self, ui: &mut egui::Ui, _frame: &mut eframe::Frame) {
        let elapsed = self.step_start.elapsed().as_secs_f64();

        if self.done {
            egui::CentralPanel::default().show_inside(ui, |ui| {
                ui.heading("Benchmark complete!");
                ui.separator();
                for (vp, fps) in &self.results {
                    ui.label(format!("{vp} viewports: {fps} FPS"));
                }
            });
            return;
        }

        if self.warming_up {
            if elapsed >= WARMUP_SECS {
                self.warming_up = false;
                self.frame_count = 0;
                self.step_start = Instant::now();
            }
        } else if elapsed >= SECONDS_PER_STEP {
            let fps = (self.frame_count as f64 / elapsed).round() as usize;
            println!("| {:<9} | {fps:>5} |", self.current_step);
            self.results.push((self.current_step, fps));

            self.current_step += 1;
            self.frame_count = 0;
            self.step_start = Instant::now();
            self.warming_up = true;

            if self.current_step > MAX_VIEWPORTS {
                self.done = true;
                println!("\nDone! You can close the window.");
                return;
            }
        }

        if !self.warming_up {
            self.frame_count += 1;
        }

        egui::CentralPanel::default().show_inside(ui, |ui| {
            ui.heading(format!("Benchmarking: {} viewport(s)...", self.current_step));
            if self.warming_up {
                ui.label("Warming up...");
            } else {
                ui.label(format!(
                    "Measuring ({:.1}s / {SECONDS_PER_STEP}s)",
                    self.step_start.elapsed().as_secs_f64()
                ));
            }
        });

        let viewport_ids: Vec<ViewportId> = (0..self.current_step)
            .map(|i| ViewportId(Id::new(format!("w{i}"))))
            .collect();

        for viewport_id in &viewport_ids {
            ui.ctx().show_viewport_immediate(
                *viewport_id,
                egui::ViewportBuilder::default()
                    .with_inner_size([400.0, 300.0])
                    .with_min_inner_size([300.0, 220.0]),
                |ui, _class| {
                    egui::CentralPanel::default().show_inside(ui, |ui| {
                        ui.heading("Extra Window");
                    });
                },
            );
        }

        ui.ctx().request_repaint();
    }
}

Disclosure

I'm not a Rust developer — I used Claude Code to help me write this. I hope I'm not making a mess, I just wanted to help! Please don't hesitate to point out anything wrong.

Test plan

  • `cargo test -p egui-wgpu -p eframe` — all tests pass
  • `cargo clippy -p egui-wgpu -p eframe --all-features --all-targets` — no warnings
  • `cargo doc -p egui-wgpu -p eframe --all-features` with `-D warnings` — no broken links
  • `multiple_viewports` example — works correctly
  • Benchmarked 0–10 immediate viewports (vsync off) — see table above

Split `paint_and_update_textures` into three phases (`paint_prepare`,
`paint_submit`, `paint_present`) so that immediate viewports can
accumulate their prepared frames and submit them in a single
`queue.submit()` call instead of one per viewport.

This reduces frame time with many immediate viewports by eliminating
redundant GPU synchronization between each viewport's submit+present
cycle.

Closes emilk#7885
@gcailly gcailly requested a review from Wumpf as a code owner March 6, 2026 14:10

github-actions bot commented Mar 6, 2026

Preview available at https://egui-pr-preview.github.io/pr/7961-fixviewport-perf
Note that it might take a couple seconds for the update to show up after the preview_build workflow has completed.

View snapshot changes at kitdiff

@liusuchao

Thanks for this awesome optimization! The results are impressive — 3x FPS improvement with 10 viewports is a huge win. Hope this gets reviewed and merged soon. Keep up the great work! 🚀

Contributor Author

gcailly commented Mar 12, 2026

Thanks @liusuchao! I have to be honest: I was a bit too optimistic with the initial numbers. After more thorough benchmarking (see updated PR description), the real-world gain is closer to 30–40% rather than the 3x improvement I initially reported.

@liusuchao

@gcailly Hey, I think there might still be two errors or warnings left.

- Fix broken intra-doc links (``[`Self::paint_prepare`]``, etc.)
- Use type-state: `paint_submit(Vec<PreparedFrame>) -> Vec<SubmittedFrame>`,
  so the compiler enforces prepare → submit → present
- Flatten `SubmitData` / `PresentData` into `PreparedFrame`, remove duplicate
  `viewport_id`, merge encoded + user command buffers into a single `Vec`
- Make fields private, expose `viewport_id()` / `vsync_sec()` accessors,
  add `#[must_use]` on both frame types
- Preallocate `all_cmd_bufs` with total capacity in `paint_submit`
- Restore emilk#7928 flush: submit `[]` on the missing-surface early return
- Measure submit + present durations again in `vsync_sec` (regressed by the
  split); `paint_present` now returns total present time
- Drop narrating comments; keep the macOS vsync comment
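
Two of the review items above (private fields behind accessors with `#[must_use]`, and preallocating `all_cmd_bufs`) can be sketched as follows. This is an illustration under assumptions, not the commit's code: strings stand in for `wgpu::CommandBuffer`, the `u64` id stands in for the real viewport id type, and `new` and `collect_command_buffers` are hypothetical names.

```rust
/// Frame whose commands are recorded but not yet submitted.
/// Fields are private; callers read them through accessors.
#[must_use]
pub struct PreparedFrame {
    viewport_id: u64,
    // Encoded and user-supplied command buffers merged into a single list.
    command_buffers: Vec<String>,
}

impl PreparedFrame {
    /// Hypothetical constructor, only here so the sketch can be exercised.
    pub fn new(viewport_id: u64, command_buffers: Vec<String>) -> Self {
        Self { viewport_id, command_buffers }
    }

    pub fn viewport_id(&self) -> u64 {
        self.viewport_id
    }
}

/// Gathers every frame's command buffers into one `Vec`, sized up front so it
/// does not need to grow while the frames are appended.
pub fn collect_command_buffers(frames: Vec<PreparedFrame>) -> Vec<String> {
    let total: usize = frames.iter().map(|frame| frame.command_buffers.len()).sum();
    let mut all_cmd_bufs = Vec::with_capacity(total);
    for frame in frames {
        all_cmd_bufs.extend(frame.command_buffers);
    }
    all_cmd_bufs
}
```

The resulting vector is what a single batched submit would receive, with buffers kept in viewport order.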
@gcailly gcailly marked this pull request as draft April 13, 2026 14:29
Contributor Author

gcailly commented Apr 13, 2026

Closing this PR.

After rebasing onto current main (wgpu 29, with the needs_reconfigure surface changes etc.) and re-running the benchmark, the refactor no longer provides a measurable speedup. It actually regresses by ~5–16% on Windows:

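| Viewports | main (FPS) | this PR (FPS) | Δ    |
|:---------:|:----------:|:-------------:|:----:|
| 0         | ~1290      | ~1210         | -6%  |
| 1         | ~630       | ~595          | -6%  |
| 3         | ~330       | ~290          | -12% |
| 5         | ~220       | ~210          | -5%  |
| 7         | ~175       | ~155          | -11% |
| 10        | ~125       | ~105          | -16% |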

The vsync-serialization issue described in #5836 also doesn't reproduce on my Windows setup (60 FPS stays 60 FPS with 10 viewports, before or after), so I can't verify whether the batched-present path would still help on the platforms where #5836 was originally reported. Whoever picks this up next should probably benchmark on Linux/Mac first.

Sorry for the noise.

@gcailly gcailly closed this Apr 13, 2026