Magicboomliu/DMS

DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation

Project Page · Paper (ICCV 2025 Workshop)

ICCV 2025 – AIM Workshop (Oral)
DMS leverages diffusion models to synthesize epipolar-aligned multi-baseline views (left-shifted, right-shifted, and intermediate) that explicitly complete occluded and out-of-frame regions—boosting self-supervised depth learning without extra labels.

Extended View Teaser


TL;DR

  • 🎯 Goal: Improve self-supervised stereo/mono depth in ill-posed regions.
  • 🧠 Idea: Fine-tune a diffusion UNet to generate novel views guided by simple directional prompts (“to left / to right / middle”).
  • 🚀 Win: Adds valid correspondences where photometric supervision was previously missing.
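
The "missing photometric supervision" point can be made concrete with a minimal NumPy sketch of a masked photometric reconstruction loss: pixels without a valid correspondence (occluded or out-of-frame) contribute nothing, and each synthesized extra baseline shrinks exactly this invalid set. All names below are illustrative, not from the DMS codebase.

```python
import numpy as np

def masked_photometric_loss(target, warped, valid):
    """L1 reconstruction loss averaged only over pixels with a valid correspondence."""
    err = np.abs(target - warped) * valid
    return float(err.sum() / np.maximum(valid.sum(), 1))

# Toy 1-D "image" with 4 pixels; the original stereo pair supervises only the left half.
target      = np.array([0.2, 0.4, 0.6, 0.8])
warped      = np.array([0.3, 0.4, 0.6, 0.8])
mask_stereo = np.array([1.0, 1.0, 0.0, 0.0])  # valid pixels from the real second view
mask_extra  = np.array([0.0, 0.0, 1.0, 0.0])  # pixel recovered by a synthesized baseline

loss = masked_photometric_loss(target, warped, mask_stereo)

# Union of valid masks: the synthesized view supervises a pixel that had no match before.
union = np.maximum(mask_stereo, mask_extra)
print(mask_stereo.mean(), union.mean())  # supervision coverage rises from 0.5 to 0.75
```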

Table of Contents

  • Installation
  • Data Preparation
  • Pretrained Models
  • Train the DMS Diffusion Model
  • Inference: Multi-Baseline Images
  • Citation
  • License
  • Contact

Installation

pip install -r requirements.txt

Data Preparation

Please download the following datasets:

  • SceneFlow
  • KITTI Raw
  • KITTI 2015
  • KITTI 2012
  • MPI-Sintel

Set the dataset roots in the corresponding config or script files as needed.
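
As a sketch of that step, the dataset roots could be collected in one place like this; the variable names and paths below are placeholders, not the actual config keys used by the scripts:

```python
from pathlib import Path

# Placeholder dataset roots -- adapt these to where you extracted each dataset.
DATASET_ROOTS = {
    "sceneflow":  Path("/data/SceneFlow"),
    "kitti_raw":  Path("/data/KITTI/raw"),
    "kitti_2015": Path("/data/KITTI/2015"),
    "kitti_2012": Path("/data/KITTI/2012"),
    "sintel":     Path("/data/MPI-Sintel"),
}

for name, root in DATASET_ROOTS.items():
    # Warn early instead of failing mid-training.
    if not root.exists():
        print(f"[warn] dataset root for {name} not found: {root}")
```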


Pretrained Models (Google Drive)


Train the DMS Diffusion Model

SceneFlow

cd scripts/SF/train
sh train_unet.sh

KITTI Raw

cd scripts/KITTI/train
sh train_kitti_raw.sh

KITTI 2015

cd scripts/KITTI/train
sh train_kitti15.sh

KITTI 2012

cd scripts/KITTI/train
sh train_kitti12.sh

MPI-Sintel

cd scripts/MPI/train
sh train_unet.sh

Inference: Multi-Baseline Images

SceneFlow

# left->right and right->left
cd scripts/SF/evaluation
sh evaluation.sh

# left-left and right-right
cd scripts/SF/evaluation
sh get_additional_view.sh

# middle-state views
cd scripts/SF/evaluation
sh get_middle_view.sh

KITTI Raw

# left->right and right->left
cd scripts/KITTI/kitti_raw_evaluations
sh eval_unet.sh

# left-left and right-right
cd scripts/KITTI/kitti_raw_evaluations
sh unet_generated_new_view.sh

# middle-state views
cd scripts/KITTI/kitti_raw_evaluations
sh unet_generate_med_view.sh

KITTI 2015

# left->right and right->left
cd scripts/KITTI/kitti2015_evaluations
sh unet_eval.sh

# left-left and right-right
cd scripts/KITTI/kitti2015_evaluations
sh get_additional_view.sh

# middle-state views
cd scripts/KITTI/kitti2015_evaluations
sh get_middle_view.sh

KITTI 2012

# left->right and right->left
cd scripts/KITTI/kitti2012_evaluations
sh unet_eval.sh

# left-left and right-right
cd scripts/KITTI/kitti2012_evaluations
sh get_additional_view.sh

# middle-state views
cd scripts/KITTI/kitti2012_evaluations
sh get_middle_view.sh

MPI-Sintel

# left/right + left-left/right-right
cd scripts/MPI/evaluations
sh eval_unet.sh

# middle-state views
cd scripts/MPI/evaluations
sh unet_generate_med_view.sh
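
A note on what these extra views provide geometrically: for rectified stereo, disparity is proportional to the baseline (d = f·B/Z), so a "middle-state" view rendered at a fraction of the original baseline exhibits proportionally scaled disparities, while a left-left/right-right style view extends the baseline beyond the original pair. The sketch below states that standard relation; it is not code from this repository, and the 0.5/2.0 factors are illustrative assumptions.

```python
def disparity_at_baseline(disparity, alpha):
    """Disparity of the same scene point when the baseline is scaled by alpha.

    For rectified cameras, d = f * B / Z, so d scales linearly with baseline B.
    """
    return disparity * alpha

d_full = 32.0                                  # disparity under the original baseline
d_mid  = disparity_at_baseline(d_full, 0.5)    # e.g. a view midway between the pair
d_ext  = disparity_at_baseline(d_full, 2.0)    # e.g. a baseline-extending shifted view
print(d_mid, d_ext)
```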

Citation

If you find this work helpful, please cite:

@inproceedings{liu2025dms,
  title     = {DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation},
  author    = {Liu, Zihua and {co-authors}},
  booktitle = {ICCV Workshops (AIM)},
  year      = {2025},
}

License

This project is released under the MIT License. See LICENSE for details.


Contact

For questions, please open an issue or contact the authors via the project page.

About

Using Stable Diffusion Model for generating multi-baseline images for autonomous driving scenes like KITTI.
