DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation
ICCV 2025 – AIM Workshop (Oral)
DMS leverages diffusion models to synthesize epipolar-aligned multi-baseline views (left-shifted, right-shifted, and intermediate) that explicitly complete occluded and out-of-frame regions, boosting self-supervised depth learning without extra labels.
- 🎯 Goal: Improve self-supervised stereo/mono depth in ill-posed regions.
- 🧠 Idea: Fine-tune a diffusion UNet to generate novel views guided by simple directional prompts (“to left / to right / middle”).
- 🚀 Win: Adds valid correspondences where photometric supervision was previously missing.
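To make the "added correspondences" point concrete, here is a minimal geometric sketch in plain Python. It relies only on standard rectified-stereo geometry (a point keeps its row, and its column shifts in proportion to the camera offset along the baseline); it does not reproduce the diffusion model. The view labels and the `alpha` values (for example, placing the left-left view exactly one baseline to the left) are assumptions made for illustration, as are the function names.

```python
def column_in_view(x_left, disparity, alpha):
    """Column where a left-image pixel lands in a camera at alpha * baseline.

    alpha = 0 is the left camera and alpha = 1 is the right camera.
    Convention: disparity = x_left - x_right >= 0 for a rectified pair.
    """
    return x_left - alpha * disparity


# Hypothetical view labels; the alpha values are illustrative assumptions.
VIEWS = {
    "left-left": -1.0,
    "left": 0.0,
    "middle": 0.5,
    "right": 1.0,
    "right-right": 2.0,
}


def visible_views(x_left, disparity, width):
    """Names of the views in which the point projects inside the image."""
    return [
        name
        for name, alpha in VIEWS.items()
        if 0 <= column_in_view(x_left, disparity, alpha) <= width - 1
    ]


if __name__ == "__main__":
    # A point near the left border falls outside the right image,
    # but still has a match in the left-shifted and middle views.
    print(visible_views(x_left=5, disparity=10, width=100))
```

Under these assumptions, a pixel that has no counterpart in the original right image (it projects to a negative column) can still be matched in a generated view, which is the kind of region where photometric supervision would otherwise be missing.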
- Installation
- Data Preparation
- Pretrained Models
- Train the DMB Diffusion Model
- Inference: Multi-Baseline Images
- Citation
- License
- Contact
Installation
pip install -r requirements.txt

Data Preparation
Please download the following datasets:
Set your dataset roots in the corresponding config or script files as needed.
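Once the roots are set, every training step in this README follows the same pattern: change into a per-dataset script directory and run one shell script. The sketch below collects those directory and script names (copied from this README) into a table and prints or runs them. The helper names `build_command` and `run_training`, and the `repo_root` and `dry_run` parameters, are hypothetical conveniences and not part of the repository.

```python
import subprocess
from pathlib import Path

# (script directory, script name) pairs copied from this README's training steps.
TRAIN_STEPS = {
    "SceneFlow": ("scripts/SF/train", "train_unet.sh"),
    "KITTI Raw": ("scripts/KITTI/train", "train_kitti_raw.sh"),
    "KITTI 2015": ("scripts/KITTI/train", "train_kitti15.sh"),
    "KITTI 2012": ("scripts/KITTI/train", "train_kitti12.sh"),
    "MPI-Sintel": ("scripts/MPI/train", "train_unet.sh"),
}


def build_command(dataset):
    """Return (script_directory, argv) for one dataset's training script."""
    script_dir, script = TRAIN_STEPS[dataset]
    return Path(script_dir), ["sh", script]


def run_training(dataset, repo_root=".", dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    script_dir, argv = build_command(dataset)
    cwd = Path(repo_root) / script_dir
    print(f"cd {cwd.as_posix()} && {' '.join(argv)}")
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(argv, cwd=cwd, check=True)
    return cwd, argv


if __name__ == "__main__":
    for name in TRAIN_STEPS:
        run_training(name)  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default means the script only lists what it would run, which is a cheap way to check paths before starting a long training job.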
Train the DMB Diffusion Model
SceneFlow
cd scripts/SF/train
sh train_unet.sh
KITTI Raw
cd scripts/KITTI/train
sh train_kitti_raw.sh
KITTI 2015
cd scripts/KITTI/train
sh train_kitti15.sh
KITTI 2012
cd scripts/KITTI/train
sh train_kitti12.sh
MPI-Sintel
cd scripts/MPI/train
sh train_unet.sh

Inference: Multi-Baseline Images
SceneFlow
# left->right and right->left
cd scripts/SF/evaluation
sh evaluation.sh
# left-left and right-right
cd scripts/SF/evaluation
sh get_additional_view.sh
# middle-state views
cd scripts/SF/evaluation
sh get_middle_view.sh
KITTI Raw
# left->right and right->left
cd scripts/KITTI/kitti_raw_evaluations
sh eval_unet.sh
# left-left and right-right
cd scripts/KITTI/kitti_raw_evaluations
sh unet_generated_new_view.sh
# middle-state views
cd scripts/KITTI/kitti_raw_evaluations
sh unet_generate_med_view.sh
KITTI 2015
# left->right and right->left
cd scripts/KITTI/kitti2015_evaluations
sh unet_eval.sh
# left-left and right-right
cd scripts/KITTI/kitti2015_evaluations
sh get_additional_view.sh
# middle-state views
cd scripts/KITTI/kitti2015_evaluations
sh get_middle_view.sh
KITTI 2012
# left->right and right->left
cd scripts/KITTI/kitti2012_evaluations
sh unet_eval.sh
# left-left and right-right
cd scripts/KITTI/kitti2012_evaluations
sh get_additional_view.sh
# middle-state views
cd scripts/KITTI/kitti2012_evaluations
sh get_middle_view.sh
MPI-Sintel
# left/right + left-left/right-right
cd scripts/MPI/evaluations
sh eval_unet.sh
# middle-state views
cd scripts/MPI/evaluations
sh unet_generate_med_view.sh

Citation
If you find this work helpful, please cite:
@inproceedings{liu2025dms,
title = {DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation},
author = {Liu, Zihua and {co-authors}},
booktitle = {ICCV Workshops (AIM)},
year = {2025},
}

License
This project is released under the MIT License. See LICENSE for details.

Contact
For questions, please open an issue or contact the authors via the project page.
