Magicboomliu/DMS

DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation

Project Page · Paper (ICCV 2025 Workshop)

ICCV 2025 – AIM Workshop (Oral)
DMS leverages diffusion models to synthesize epipolar-aligned multi-baseline views (left-shifted, right-shifted, and intermediate) that explicitly complete occluded and out-of-frame regions—boosting self-supervised depth learning without extra labels.

Extended View Teaser


TL;DR

  • 🎯 Goal: Improve self-supervised stereo/mono depth in ill-posed regions.
  • 🧠 Idea: Fine-tune a diffusion UNet to generate novel views guided by simple directional prompts (“to left / to right / middle”).
  • 🚀 Win: Adds valid correspondences where photometric supervision was previously missing.
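
The "missing photometric supervision" point can be made concrete with a minimal NumPy sketch of a masked photometric reconstruction loss: pixels without a valid correspondence (occluded or out-of-frame) contribute nothing, and each synthesized extra baseline shrinks exactly this invalid set. All names below are illustrative, not from the DMS codebase.

```python
import numpy as np

def masked_photometric_loss(target, warped, valid):
    """L1 reconstruction loss averaged only over pixels with a valid correspondence."""
    err = np.abs(target - warped) * valid
    return float(err.sum() / np.maximum(valid.sum(), 1))

# Toy 1-D "image" with 4 pixels; the original stereo pair supervises only the left half.
target      = np.array([0.2, 0.4, 0.6, 0.8])
warped      = np.array([0.3, 0.4, 0.6, 0.8])
mask_stereo = np.array([1.0, 1.0, 0.0, 0.0])  # valid pixels from the real second view
mask_extra  = np.array([0.0, 0.0, 1.0, 0.0])  # pixel recovered by a synthesized baseline

loss = masked_photometric_loss(target, warped, mask_stereo)

# Union of valid masks: the synthesized view supervises a pixel that had no match before.
union = np.maximum(mask_stereo, mask_extra)
print(mask_stereo.mean(), union.mean())  # supervision coverage rises from 0.5 to 0.75
```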

Table of Contents

  • Installation
  • Data Preparation
  • Pretrained Models
  • Train the DMS Diffusion Model
  • Inference: Multi-Baseline Images
  • Citation
  • License
  • Contact

Installation

pip install -r requirements.txt

Data Preparation

Please download the following datasets:

  • SceneFlow
  • KITTI Raw
  • KITTI 2015
  • KITTI 2012
  • MPI-Sintel

Set the dataset roots in the corresponding config or script files as needed.
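
As a sketch of that step, the dataset roots could be collected in one place like this; the variable names and paths below are placeholders, not the actual config keys used by the scripts:

```python
from pathlib import Path

# Placeholder dataset roots -- adapt these to where you extracted each dataset.
DATASET_ROOTS = {
    "sceneflow":  Path("/data/SceneFlow"),
    "kitti_raw":  Path("/data/KITTI/raw"),
    "kitti_2015": Path("/data/KITTI/2015"),
    "kitti_2012": Path("/data/KITTI/2012"),
    "sintel":     Path("/data/MPI-Sintel"),
}

for name, root in DATASET_ROOTS.items():
    # Warn early instead of failing mid-training.
    if not root.exists():
        print(f"[warn] dataset root for {name} not found: {root}")
```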


Pretrained Models (Google Drive)


Train the DMS Diffusion Model

SceneFlow

cd scripts/SF/train
sh train_unet.sh

KITTI Raw

cd scripts/KITTI/train
sh train_kitti_raw.sh

KITTI 2015

cd scripts/KITTI/train
sh train_kitti15.sh

KITTI 2012

cd scripts/KITTI/train
sh train_kitti12.sh

MPI-Sintel

cd scripts/MPI/train
sh train_unet.sh

Inference: Multi-Baseline Images

SceneFlow

# left->right and right->left
cd scripts/SF/evaluation
sh evaluation.sh

# left-left and right-right
cd scripts/SF/evaluation
sh get_additional_view.sh

# middle-state views
cd scripts/SF/evaluation
sh get_middle_view.sh

KITTI Raw

# left->right and right->left
cd scripts/KITTI/kitti_raw_evaluations
sh eval_unet.sh

# left-left and right-right
cd scripts/KITTI/kitti_raw_evaluations
sh unet_generated_new_view.sh

# middle-state views
cd scripts/KITTI/kitti_raw_evaluations
sh unet_generate_med_view.sh

KITTI 2015

# left->right and right->left
cd scripts/KITTI/kitti2015_evaluations
sh unet_eval.sh

# left-left and right-right
cd scripts/KITTI/kitti2015_evaluations
sh get_additional_view.sh

# middle-state views
cd scripts/KITTI/kitti2015_evaluations
sh get_middle_view.sh

KITTI 2012

# left->right and right->left
cd scripts/KITTI/kitti2012_evaluations
sh unet_eval.sh

# left-left and right-right
cd scripts/KITTI/kitti2012_evaluations
sh get_additional_view.sh

# middle-state views
cd scripts/KITTI/kitti2012_evaluations
sh get_middle_view.sh

MPI-Sintel

# left/right + left-left/right-right
cd scripts/MPI/evaluations
sh eval_unet.sh

# middle-state views
cd scripts/MPI/evaluations
sh unet_generate_med_view.sh
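
A note on what these extra views provide geometrically: for rectified stereo, disparity is proportional to the baseline (d = f·B/Z), so a "middle-state" view rendered at a fraction of the original baseline exhibits proportionally scaled disparities, while a left-left/right-right style view extends the baseline beyond the original pair. The sketch below states that standard relation; it is not code from this repository, and the 0.5/2.0 factors are illustrative assumptions.

```python
def disparity_at_baseline(disparity, alpha):
    """Disparity of the same scene point when the baseline is scaled by alpha.

    For rectified cameras, d = f * B / Z, so d scales linearly with baseline B.
    """
    return disparity * alpha

d_full = 32.0                                  # disparity under the original baseline
d_mid  = disparity_at_baseline(d_full, 0.5)    # e.g. a view midway between the pair
d_ext  = disparity_at_baseline(d_full, 2.0)    # e.g. a baseline-extending shifted view
print(d_mid, d_ext)
```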

Citation

If you find this work helpful, please cite:

@inproceedings{liu2025dms,
  title     = {DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation},
  author    = {Liu, Zihua and {co-authors}},
  booktitle = {ICCV Workshops (AIM)},
  year      = {2025},
}

License

This project is released under the MIT License. See LICENSE for details.


Contact

For questions, please open an issue or contact the authors via the project page.

About

Using Stable Diffusion Model for generating multi-baseline images for autonomous driving scenes like KITTI.
