This project is a user-friendly GUI version of the original M2SVID (Monocular-to-Stereo Video Conversion). It provides an end-to-end pipeline for depth-based warping, inpainting, and merging to create high-quality stereo (3D) videos from standard 2D monocular videos.
Note: This is a fork/standalone version focused on ease of use for Windows users, featuring a complete Gradio-based graphical interface. GUI developed by Archit. Original research and code by Nina Shvetsova et al. [3DV 2026].
- Automated Windows Installation: One-click setup using a portable Python 3.12 environment (no complex conda setup required).
- Gradio GUI: A complete graphical interface to manage all stages of the pipeline:
  - Section 1, Warping: Generate reprojected right-eye views from depth maps.
  - Section 2, Inpainting: Fill in disocclusions using a spatially and temporally aware model.
  - Section 3, Merging: Final SBS (Side-by-Side) encoding with custom shadow/edge mitigation and color transfer.
- Optimized for RTX GPUs: Pre-configured for CUDA 12.8 with support for RTX 20/30/40/50 series GPUs.
- Efficient Memory Management: Built-in support for tiling and chunking to handle high-resolution videos without running out of VRAM.
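To make the warping and disocclusion steps in the list above more concrete, here is a minimal sketch of depth-based forward warping on a single image row. It is an illustration under stated assumptions, not the code this project runs: the function `warp_scanline` is hypothetical, disparities are assumed to be non-negative whole-pixel shifts, and the right-eye view is assumed to be formed by shifting pixels left by their disparity. Positions that no source pixel lands on are left as `None`; these are the holes the inpainting stage has to fill.

```python
from typing import List, Optional


def warp_scanline(pixels: List[int], disparity: List[float]) -> List[Optional[int]]:
    """Forward-warp one row of a left view to a hypothetical right view.

    Hypothetical helper for illustration only. Larger disparity means the
    pixel is closer to the camera and therefore shifts further.
    """
    width = len(pixels)
    out: List[Optional[int]] = [None] * width  # None marks a disocclusion hole
    best = [-1.0] * width                      # largest disparity written per target
    for x, (value, d) in enumerate(zip(pixels, disparity)):
        target = x - int(round(d))             # shift left by the disparity
        # When two pixels land on the same target, the nearer one wins.
        if 0 <= target < width and d > best[target]:
            out[target] = value
            best[target] = d
    return out


row = [10, 20, 30, 40, 50, 60]
disp = [0, 0, 2, 2, 0, 0]       # the two middle pixels are "foreground"
print(warp_scanline(row, disp))  # [30, 40, None, None, 50, 60]
```

The foreground pixels move two positions left and cover the background there, while the positions they vacate have no source pixel at all. Those gaps are what Section 2 is meant to fill.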
- NVIDIA GPU: RTX 20 series or newer recommended.
- Drivers: Ensure you have the latest NVIDIA drivers installed.
- Git: Download and install Git for Windows.
- Clone this repository recursively: `git clone --recursive https://github.com/Archit01/M2SVID-gui.git`
  (Alternatively, download the ZIP and make sure the submodules are initialized manually.)
- Double-click `install_windows.bat`.
  - This will download a portable Python 3.12 environment.
  - It will install all necessary dependencies (PyTorch 2.9.1, CUDA 12.8, xformers, etc.).
  - This keeps your system's global Python installation untouched.
You must download the following weights and place them in a ckpts folder in the project root:
- CLIP & SGM Weights: Download `ckpts.zip` from the Hi3D repo and unzip it into `ckpts/` (follow step "2. Download checkpoints here and unzip." in the Hi3D instructions). Our model follows the Hi3D implementation and uses the same openclip model. Link: https://drive.google.com/file/d/1j_NEG2CPhFeRetYziWK6Qe62R5h7lG_V/view?usp=sharing
- M2SVid Weights: Download the M2SVid weights and extract them into `ckpts/`.
  - You should have `m2svid_weights.pt` and `m2svid_no_full_atten_weights.pt` in the `ckpts` folder.
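A quick way to confirm the M2SVid weights are in place before launching the app is a small check like the one below. This is a sketch, not part of the project: the helper `missing_weights` is hypothetical, it assumes it is run from the project root, and it only checks the two M2SVid file names listed above (the contents of the Hi3D `ckpts.zip` are not checked because their file names are not listed here).

```python
from pathlib import Path
from typing import List

# File names taken from the list above; the helper itself is hypothetical.
EXPECTED_WEIGHTS = ("m2svid_weights.pt", "m2svid_no_full_atten_weights.pt")


def missing_weights(ckpt_dir: Path) -> List[str]:
    """Return the expected weight files that are not present in ckpt_dir."""
    return [name for name in EXPECTED_WEIGHTS if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights(Path("ckpts"))
    if missing:
        print("Missing from ckpts/:", ", ".join(missing))
    else:
        print("Both M2SVid weight files found.")
```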
Double-click run_app.bat. This will start the Gradio server and open the interface in your web browser.
The interface is split into three tabs:
- Tab 1: Warping: Provide your input videos and corresponding depth maps (generated by tools like DepthCrafter).
- Tab 2: Inpainting and Refine: Choose the model variant (Full Attention or No Full Attention) and process the warped videos to fill gaps.
- Tab 3: Merging: Preview the final output, adjust SBS settings (Full/Half SBS, Anaglyph), and render the final 3D video.
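The output modes named in Tab 3 differ mainly in how the two eye views are packed into one frame. The sketch below shows the common conventions on a single row of RGB pixels. It is an assumption-laden illustration, not the GUI's implementation: the function names are hypothetical, the left eye is assumed to come first, the half-SBS squeeze simply drops every second pixel where a real encoder would resample, and the anaglyph is assumed to be red-cyan.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def full_sbs_row(left: List[Pixel], right: List[Pixel]) -> List[Pixel]:
    """Full SBS: both eyes at full width, so the row doubles in length."""
    return left + right


def half_sbs_row(left: List[Pixel], right: List[Pixel]) -> List[Pixel]:
    """Half SBS: each eye squeezed to half width, so the row length is unchanged."""
    if len(left) != len(right) or len(left) % 2:
        raise ValueError("rows must have equal, even length")
    # Naive squeeze: keep every second pixel of each eye.
    return left[::2] + right[::2]


def anaglyph_row(left: List[Pixel], right: List[Pixel]) -> List[Pixel]:
    """Red-cyan anaglyph: red from the left eye, green and blue from the right."""
    return [(l[0], r[1], r[2]) for l, r in zip(left, right)]


left = [(255, 0, 0)] * 4
right = [(0, 255, 255)] * 4
print(len(full_sbs_row(left, right)))  # 8
print(len(half_sbs_row(left, right)))  # 4
print(anaglyph_row(left, right)[0])    # (255, 255, 255)
```

Under these assumptions, Full SBS keeps all horizontal detail at the cost of a frame twice as wide, while Half SBS keeps the original frame size by halving each eye's horizontal resolution.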
If you use this work, please cite the original authors:
@article{shvetsova2026m2svid,
  title={M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion},
  author={Shvetsova, Nina and Bhat, Goutam and Truong, Prune and Kuehne, Hilde and Tombari, Federico},
  journal={3DV},
  year={2026}
}

Original Repository: google-research/m2svid