derm-bench

"Bridging Research and Practice: A Systematic Evaluation of Generalist and Dermatology-Specific Models in Clinical Skin Lesion Classification", early accept at MICCAI 2026.

Benchmark suite for binary benign vs. malignant dermatology lesion classification across public and merged datasets, comparing three complementary modeling paradigms in a single monorepo.

Overview

derm-bench standardizes data preparation, evaluation protocols, and reporting for skin-lesion malignancy prediction. All pipelines share the same task definition (two classes: benign and malignant) and a common on-disk dataset layout under datasets/. Each sub-project implements a different approach:

Project	Paradigm	Input format	Configuration
derm-lesion-cnn_vit_classification	Fine-tune end-to-end image classifiers (CNN, ViT, DINOv3)	CSV metadata + `images/`	`configuration/config.yaml`
derm-lesion-foundation_vlms-classification	Vision-language and dual-encoder models (prompt-based / similarity)	CSV metadata + `images/` (test split)	`configuration_yaml/setup_config.yaml`
derm-lesion-embeddings-classification	Frozen foundation backbones → embeddings → classical / MLP heads	H5 metadata + `images/`	`configuration/config.yaml`

Features

Unified task: Binary malignancy classification with consistent labels and metrics across pipelines.
Multiple public datasets: HAM10000, ISIC18/24, PAD, HC, DDI, SD-198, segmented variants, and merged corpora.
Reproducible workflows: YAML-driven configuration and Makefile targets per project.
Aggregated reporting: Summary CSVs and comparison plots from per-run metrics.
Dataset tooling: Jupyter notebooks for per-source ETL and a merger for combined benchmarks.

Repository layout

derm-bench/
├── README.md
├── LICENSE
├── datasets/                          # User-provided (not versioned)
├── notebooks/                         # Per-dataset preprocessing notebooks
├── dataset_merger/                    # Build merged_clinic, merged_dermatoscopic, merged
├── derm-lesion-cnn_vit_classification/
├── derm-lesion-foundation_vlms-classification/
└── derm-lesion-embeddings-classification/

Prerequisites

Python 3.10+ recommended for all sub-projects.
CUDA-capable GPU strongly recommended for CNN/ViT training, large VLMs, and DINOv3 / Derm Foundation embedding extraction.
Disk space: Raw dermatology image collections are large; plan for tens of GB per full benchmark grid.
Optional: Ollama for local vision-language models in the VLM project.

Datasets Used

Some datasets were already provided with predefined train/validation/test splits by the original authors. To ensure fair comparison, these official splits were preserved whenever available. For datasets without predefined splits, a standardized 70%/15%/15% train/validation/test division was adopted to maintain consistency across experiments.

In some datasets, the number of images used is lower than the total number of available images due to missing labels. For ISIC24, the sample size was reduced due to its substantial imbalance relative to the other datasets.
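The 70%/15%/15% division described above can be sketched as follows. This is a minimal illustration and not necessarily the procedure used to build the benchmark: stratifying by label, the fixed seed, and the function name `split_70_15_15` are assumptions of this sketch, and the rows are synthetic.

```python
import random
from collections import defaultdict


def split_70_15_15(rows, label_key="benign_malignant", seed=0):
    """Split rows into train/validation/test (70/15/15), stratified by label.

    Stratification and the fixed seed are assumptions of this sketch.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key].strip().lower()].append(row)

    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(0.70 * len(group))
        n_val = round(0.15 * len(group))
        splits["train"] += group[:n_train]
        splits["validation"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder goes to test
    return splits


# Synthetic example: 800 benign and 200 malignant rows (not real data).
rows = [
    {"img_id": f"img_{i}", "benign_malignant": "benign" if i < 800 else "malignant"}
    for i in range(1000)
]
splits = split_70_15_15(rows)
print({name: len(part) for name, part in splits.items()})
```

Splitting each class separately keeps the benign/malignant ratio roughly equal across the three partitions, which matters for the smaller and more imbalanced datasets.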

PAD-UFES-20

Source: PAD-UFES-20: a skin lesion dataset composed of patient data and clinical images collected from smartphones
Modality: Clinical images
Total images: 2,298
Split: Official dataset split
Train: 1,723 (74.98%)
Validation: 287 (12.49%)
Test: 288 (12.53%)

HAM10000

Source: HAM10000
Modality: Dermoscopic images
Total images: 11,720
Split: Official dataset split
Train: 8,790 (75.00%)
Validation: 1,465 (12.50%)
Test: 1,465 (12.50%)

ISIC18

Source: ISIC18
Modality: Dermoscopic images
Total images: 11,720
Split: Official dataset split
Train: 10,015 (85.45%)
Validation: 193 (1.65%)
Test: 1,512 (12.90%)

ISIC24

Source: ISIC24
Modality: Dermoscopic images
Total images: 401,059
Used images: 1,000
Split: Standardized 70%/15%/15% split
Train: 700 (70.00%)
Validation: 150 (15.00%)
Test: 150 (15.00%)

DDI (Diverse Dermatology Images)

Source: DDI - Diverse Dermatology Images
Modality: Clinical images
Total images: 656
Used images: 371
Split: Official dataset split
Train: 316 (85.18%)
Validation: 28 (7.55%)
Test: 27 (7.28%)

SD-198

Source: SD-198
Modality: Clinical images
Total images: 6,583
Used images: 552
Split: Standardized 70%/15%/15% split
Train: 386 (69.93%)
Validation: 83 (15.04%)
Test: 83 (15.04%)

HC (Hospital das Clínicas)

Source Paper: DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile
Institution: Universidade Federal de Pernambuco (UFPE)
Notes: Clinical smartphone images collected by Brazilian doctors under a research protocol.
Modality: Clinical images
Total images: 5,918
Used images: 2,507
Split: Standardized 70%/15%/15% split
Train: 1,754 (69.96%)
Validation: 376 (15.00%)
Test: 377 (15.04%)

Dataset layout

Place prepared datasets at the repository root under datasets/<dataset_name>/:

datasets/<dataset_name>/
├── images/                            # RGB lesion images (.jpg, .jpeg, .png)
├── train_metadata.csv
├── validation_metadata.csv
└── test_metadata.csv
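Before running a pipeline, it can help to confirm that a prepared dataset matches this layout. The check below is a minimal sketch; the helper name `check_dataset_layout` is hypothetical and not part of the repository.

```python
from pathlib import Path

EXPECTED_FILES = ("train_metadata.csv", "validation_metadata.csv", "test_metadata.csv")


def check_dataset_layout(dataset_dir):
    """Return a list of problems found in a datasets/<dataset_name>/ folder."""
    root = Path(dataset_dir)
    problems = []
    if not (root / "images").is_dir():
        problems.append("missing images/ directory")
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    return problems
```

An empty list means the folder has the expected structure. For example, `check_dataset_layout("datasets/<dataset_name>")` could be called once per dataset before launching a benchmark run.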

Required CSV columns

Column	Description
`img_id`	Image filename or stem (extension added automatically if missing)
`benign_malignant`	Ground-truth label: `benign` or `malignant` (case-insensitive)

Rows with missing labels or missing image files are dropped during loading.
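The loading rules above (automatic extension, case-insensitive labels, dropped rows) can be sketched as follows. This is an illustration, not the loader shipped in the sub-projects: the function names, the order in which extensions are tried, and the 0/1 label encoding are assumptions. The sketch also drops rows whose label is neither `benign` nor `malignant`.

```python
import csv
from pathlib import Path

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def resolve_image(images_dir, img_id):
    """Return the image path for img_id, trying known extensions if none is given."""
    images_dir = Path(images_dir)
    if Path(img_id).suffix.lower() in IMAGE_EXTENSIONS:
        candidate = images_dir / img_id
        return candidate if candidate.is_file() else None
    for ext in IMAGE_EXTENSIONS:
        candidate = images_dir / f"{img_id}{ext}"
        if candidate.is_file():
            return candidate
    return None


def load_metadata(csv_path, images_dir):
    """Load (image_path, label) pairs; label 0 = benign, 1 = malignant (assumed encoding)."""
    label_map = {"benign": 0, "malignant": 1}
    samples = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            label = (row.get("benign_malignant") or "").strip().lower()
            image = resolve_image(images_dir, (row.get("img_id") or "").strip())
            # Skip rows with a missing/unknown label or a missing image file.
            if label in label_map and image is not None:
                samples.append((image, label_map[label]))
    return samples
```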

Optional columns

Column	Used by
`partition`	VLM evaluation (filters to `test` when present)
Patient / clinical metadata	VLM `vlm_prompt_patient_info` config (ISIC, HC, PAD datasets)
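The `partition` behavior in the table above can be illustrated with a small filter. This is a sketch only: the function name `select_test_rows` is hypothetical, and matching the value case-insensitively is an assumption.

```python
def select_test_rows(rows):
    """Keep rows whose partition is 'test'; return all rows if the column is absent."""
    if not rows or "partition" not in rows[0]:
        return list(rows)
    return [row for row in rows if (row["partition"] or "").strip().lower() == "test"]
```

Here `rows` is a list of dictionaries, such as the output of `csv.DictReader`.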

Preparing datasets

Run the relevant Jupyter notebooks in notebooks/ to convert each public source into the layout above (e.g. ham.ipynb, isic18.ipynb, isic24.ipynb, pad.ipynb, hc.ipynb, ddi.ipynb, sd-198.ipynb).
Optionally build merged benchmarks from the repo root:

cd dataset_merger
python3 merge_datasets.py

This creates three merged folders under datasets/:

Output name	Source datasets
`merged_clinic`	HC, PAD, DDI, SD-198
`merged_dermatoscopic`	ISIC18, ISIC24
`merged`	All of the above

See dataset_merger/merge_datasets.py and dataset_merger/merger.py for implementation details.

Embeddings pipeline note

The embeddings project expects HDF5 partition files (train_metadata.h5, etc.) rather than CSV. Convert CSV partitions to H5 with your own tooling before running make embeddings.

Getting started

1. Prepare data: Run the notebooks to populate datasets/.
2. Merge (optional): cd dataset_merger && python3 merge_datasets.py.
3. Choose a pipeline: cd into one of the three derm-lesion-* directories.
4. Install dependencies: pip install -r requirements.txt (see sub-project README).
5. Edit configuration: Adjust models, datasets, and paths in the project YAML file.
6. Run: Use the make targets documented in each sub-project README.

Sub-project documentation

Project	README
CNN / ViT fine-tuning	derm-lesion-cnn_vit_classification/README.md
VLM / dual-encoder evaluation	derm-lesion-foundation_vlms-classification/README.md
Embedding + classifier heads	derm-lesion-embeddings-classification/README.md

Supporting components

`dataset_merger/`

DatasetMerger copies images from selected source datasets into a single images/ folder, resolves duplicate img_id values, and writes merged train_metadata.csv, validation_metadata.csv, and test_metadata.csv for each partition.
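One way duplicate `img_id` values could be resolved during a merge is sketched below. The renaming scheme (prefixing colliding ids with the source dataset name and a counter), the extra `source_dataset` and `original_img_id` fields, and the function name `merge_metadata` are assumptions of this sketch; see dataset_merger/merger.py for the strategy actually used.

```python
def merge_metadata(sources):
    """Merge per-dataset rows, renaming img_id values that collide across datasets.

    sources maps a dataset name to a list of row dicts with an "img_id" key.
    """
    seen = set()
    merged = []
    for dataset_name, rows in sources.items():
        for row in rows:
            new_id = row["img_id"]
            counter = 0
            # Prefix with the dataset name until the id is unique in the merged set.
            while new_id in seen:
                counter += 1
                new_id = f"{dataset_name}_{counter}_{row['img_id']}"
            seen.add(new_id)
            merged.append({
                **row,
                "img_id": new_id,
                "source_dataset": dataset_name,
                "original_img_id": row["img_id"],
            })
    return merged
```

Under this scheme, the image file would need to be copied into the merged images/ folder under the new `img_id` so that metadata and files stay aligned.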

`notebooks/`

Interactive preprocessing for individual dermatology benchmarks. Outputs should match the shared CSV layout under datasets/.

`datasets/`

Not tracked in git (see .gitignore). You must download or generate data locally before running any pipeline.

Known limitations

The VLM project Makefile target dataset-merge points to a missing scripts/merge_datasets.py. Use dataset_merger/merge_datasets.py at the repo root instead.
The embeddings pipeline requires H5 metadata; CSV→H5 conversion is not included in this repository.
Some import paths in the VLM project may need alignment with src/vlms/base_model.py before make eval runs successfully.

License

This project is licensed under the Apache License 2.0.

Acknowledgments

The project was supported by the Ministry of Science, Technology, and Innovation of Brazil, with resources from Law No. 8,248, dated October 23, 1991, under the scope of the PPI-SOFTEX, coordinated by Softex and published under RESIDÊNCIA EM TIC 63 – ROBÓTICA E IA – FASE II, DOU 23076.043130/2025-27 and partially supported by INES.IA (National Institute of Science and Technology for Software Engineering Based on and for Artificial Intelligence) www.ines.org.br, CNPq grant 408817/2024-0.

Contact

website: Criar
email: criar@softex.cin.ufpe.br
authors' emails: etcs@softex.cin.ufpe.br; kbc@softex.cin.ufpe.br; tir@cin.ufpe.br

Citation

@inproceedings{dosSantos2026,
  title={Bridging Research and Practice: A Systematic Evaluation of Generalist and Dermatology-Specific Models in Clinical Skin Lesion Classification},
  author={dos Santos, Emanoel dos Santos and Cunha, Kelvin and Mota, Rodrigo and Papais, Fabio and Bezerra, Thales and Lopes, Natalia and Medeiros, Erico and Cruz, Shirley and Araujo, Jessica and Borba, Paulo and Ing Ren, Tsang},
  booktitle={MICCAI, 2026},
  year={2026}
}
