"Bridging Research and Practice: A Systematic Evaluation of Generalist and Dermatology-Specific Models in Clinical Skin Lesion Classification", early accept at MICCAI 2026.
Benchmark suite for binary benign vs. malignant dermatology lesion classification across public and merged datasets, comparing three complementary modeling paradigms in a single monorepo.
derm-bench standardizes data preparation, evaluation protocols, and reporting for skin-lesion malignancy prediction. All pipelines share the same task definition (two classes: benign and malignant) and a common on-disk dataset layout under datasets/. Each sub-project implements a different approach:
| Project | Paradigm | Input format | Configuration |
|---|---|---|---|
| derm-lesion-cnn_vit_classification | Fine-tune end-to-end image classifiers (CNN, ViT, DINOv3) | CSV metadata + images/ | configuration/config.yaml |
| derm-lesion-foundation_vlms-classification | Vision-language and dual-encoder models (prompt-based / similarity) | CSV metadata + images/ (test split) | configuration_yaml/setup_config.yaml |
| derm-lesion-embeddings-classification | Frozen foundation backbones → embeddings → classical / MLP heads | H5 metadata + images/ | configuration/config.yaml |
- Unified task: Binary malignancy classification with consistent labels and metrics across pipelines.
- Multiple public datasets: HAM10000, ISIC18/24, PAD, HC, DDI, SD-198, segmented variants, and merged corpora.
- Reproducible workflows: YAML-driven configuration and Makefile targets per project.
- Aggregated reporting: Summary CSVs and comparison plots from per-run metrics.
- Dataset tooling: Jupyter notebooks for per-source ETL and a merger for combined benchmarks.
derm-bench/
├── README.md
├── LICENSE
├── datasets/ # User-provided (not versioned)
├── notebooks/ # Per-dataset preprocessing notebooks
├── dataset_merger/ # Build merged_clinic, merged_dermatoscopic, merged
├── derm-lesion-cnn_vit_classification/
├── derm-lesion-foundation_vlms-classification/
└── derm-lesion-embeddings-classification/
- Python 3.10+ recommended for all sub-projects.
- CUDA-capable GPU strongly recommended for CNN/ViT training, large VLMs, and DINOv3 / Derm Foundation embedding extraction.
- Disk space: Raw dermatology image collections are large; plan for tens of GB per full benchmark grid.
- Optional: Ollama for local vision-language models in the VLM project.
Some datasets were already provided with predefined train/validation/test splits by the original authors. To ensure fair comparison, these official splits were preserved whenever available. For datasets without predefined splits, a standardized 70%/15%/15% train/validation/test division was adopted to maintain consistency across experiments.
In some datasets, the number of images used is lower than the total number of available images due to missing labels. For ISIC24, the sample size was reduced due to its substantial imbalance relative to the other datasets.
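The 70%/15%/15% division described above can be sketched as follows. This is a minimal illustration, not the repository's splitting code: the function name stratified_split, the per-label stratification, and the fixed seed are all assumptions, since the text only states the target ratios.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="benign_malignant", ratios=(0.70, 0.15, 0.15), seed=42):
    """Split rows into train/validation/test, keeping label proportions similar.

    Stratifying by label is an assumption of this sketch; the source only
    specifies the 70/15/15 ratios.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key].strip().lower()].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        splits["train"] += group[:n_train]
        splits["validation"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder goes to test
    return splits


# Synthetic rows for illustration only (200 malignant, 800 benign)
rows = [
    {"img_id": f"img_{i:04d}", "benign_malignant": "malignant" if i % 5 == 0 else "benign"}
    for i in range(1000)
]
splits = stratified_split(rows)
print({name: len(part) for name, part in splits.items()})
```

Assigning the remainder to the test partition guarantees that every row lands in exactly one split even when the per-label counts do not divide evenly.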
- Source: PAD-UFES-20: a skin lesion dataset composed of patient data and clinical images collected from smartphones
- Modality: Clinical images
- Total images: 2,298
- Split: Official dataset split
- Train: 1,723 (74.98%)
- Validation: 287 (12.49%)
- Test: 288 (12.53%)
- Source: HAM10000
- Modality: Dermoscopic images
- Total images: 11,720
- Split: Official dataset split
- Train: 8,790 (75.00%)
- Validation: 1,465 (12.50%)
- Test: 1,465 (12.50%)
- Source: ISIC18
- Modality: Dermoscopic images
- Total images: 11,720
- Split: Official dataset split
- Train: 10,015 (85.45%)
- Validation: 193 (1.65%)
- Test: 1,512 (12.90%)
- Source: ISIC24
- Modality: Dermoscopic images
- Total images: 401,059
- Used images: 1,000
- Split: Predefined
- Train: 700 (70.00%)
- Validation: 150 (15.00%)
- Test: 150 (15.00%)
- Source: DDI - Diverse Dermatology Images
- Modality: Clinical images
- Total images: 656
- Used images: 371
- Split: Official dataset split
- Train: 316 (85.18%)
- Validation: 28 (7.55%)
- Test: 27 (7.28%)
- Source: SD-198
- Modality: Clinical images
- Total images: 6,583
- Used images: 552
- Split: Predefined
- Train: 386 (69.93%)
- Validation: 83 (15.04%)
- Test: 83 (15.04%)
- Source Paper: DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile
- Institution: Universidade Federal de Pernambuco (UFPE)
- Notes: Clinical smartphone images collected by Brazilian doctors under a research protocol.
- Modality: Clinical images
- Total images: 5,918
- Used images: 2,507
- Split: Predefined
- Train: 1,754 (69.96%)
- Validation: 376 (15.00%)
- Test: 377 (15.04%)
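The percentages listed for each dataset follow directly from the split counts. A small helper such as the one below (the name split_percentages is hypothetical) can be used to re-check them after changing a split; the counts in the example are copied from the listings above.

```python
def split_percentages(train, validation, test):
    """Return the total image count and each partition's share in percent."""
    total = train + validation + test
    shares = tuple(round(100 * n / total, 2) for n in (train, validation, test))
    return total, shares


# (train, validation, test) counts copied from the dataset listings above
counts = {
    "PAD-UFES-20": (1723, 287, 288),
    "DDI": (316, 28, 27),
    "SD-198": (386, 83, 83),
}
for name, parts in counts.items():
    total, shares = split_percentages(*parts)
    print(name, total, shares)
```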
Place prepared datasets at the repository root under datasets/<dataset_name>/:
datasets/<dataset_name>/
├── images/ # RGB lesion images (.jpg, .jpeg, .png)
├── train_metadata.csv
├── validation_metadata.csv
└── test_metadata.csv
| Column | Description |
|---|---|
| img_id | Image filename or stem (extension added automatically if missing) |
| benign_malignant | Ground-truth label: benign or malignant (case-insensitive) |
Rows with missing labels or missing image files are dropped during loading.
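The loading rules above (extension completion, case-insensitive labels, dropping incomplete rows) might look roughly like this. It is a sketch using only the standard library; load_partition and resolve_image are hypothetical names, and the actual loaders in the sub-projects may behave differently in edge cases.

```python
import csv
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")
VALID_LABELS = {"benign", "malignant"}


def resolve_image(images_dir, img_id):
    """Return the image path for img_id, trying known extensions if none is given."""
    if not img_id:
        return None
    candidate = images_dir / img_id
    if candidate.suffix.lower() in IMAGE_EXTENSIONS:
        return candidate if candidate.is_file() else None
    for ext in IMAGE_EXTENSIONS:
        with_ext = images_dir / f"{img_id}{ext}"
        if with_ext.is_file():
            return with_ext
    return None


def load_partition(dataset_dir, partition):
    """Load one partition CSV, dropping rows with a missing label or image file."""
    dataset_dir = Path(dataset_dir)
    samples = []
    csv_path = dataset_dir / f"{partition}_metadata.csv"
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            label = (row.get("benign_malignant") or "").strip().lower()
            image = resolve_image(dataset_dir / "images", (row.get("img_id") or "").strip())
            if label in VALID_LABELS and image is not None:
                samples.append((image, label))
    return samples


# Tiny throwaway dataset to exercise the loader (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "demo"
    (root / "images").mkdir(parents=True)
    (root / "images" / "a.jpg").write_bytes(b"")
    (root / "images" / "b.png").write_bytes(b"")
    (root / "test_metadata.csv").write_text(
        "img_id,benign_malignant\na,Benign\nb.png,MALIGNANT\nc,benign\na,\n",
        encoding="utf-8",
    )
    demo_samples = [(path.name, label) for path, label in load_partition(root, "test")]
print(demo_samples)
```

In the demo, the row for c is dropped because no image file exists, and the last row is dropped because its label is empty.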
| Column | Used by |
|---|---|
| partition | VLM evaluation (filters to test when present) |
| Patient / clinical metadata | VLM vlm_prompt_patient_info config (ISIC, HC, PAD datasets) |
- Run the relevant Jupyter notebooks in notebooks/ to convert each public source into the layout above (e.g. ham.ipynb, isic18.ipynb, isic24.ipynb, pad.ipynb, hc.ipynb, ddi.ipynb, sd-198.ipynb).
- Optionally build merged benchmarks from the repo root:
cd dataset_merger
python3 merge_datasets.py
This creates three merged folders under datasets/:
| Output name | Source datasets |
|---|---|
| merged_clinic | HC, PAD, ddi, sd-198 |
| merged_dermatoscopic | ISIC18, ISIC24 |
| merged | All of the above |
See dataset_merger/merge_datasets.py and dataset_merger/merger.py for implementation details.
The embeddings project expects HDF5 partition files (train_metadata.h5, etc.) rather than CSV. Convert CSV partitions to H5 with your own tooling before running make embeddings.
- Prepare data: Run notebooks → populate datasets/.
- Merge (optional): cd dataset_merger && python3 merge_datasets.py.
- Choose a pipeline: cd into one of the three derm-lesion-* directories.
- Install dependencies: pip install -r requirements.txt (see sub-project README).
- Edit configuration: Adjust models, datasets, and paths in the project YAML file.
- Run: Use make targets documented in each sub-project README.
| Project | README |
|---|---|
| CNN / ViT fine-tuning | derm-lesion-cnn_vit_classification/README.md |
| VLM / dual-encoder evaluation | derm-lesion-foundation_vlms-classification/README.md |
| Embedding + classifier heads | derm-lesion-embeddings-classification/README.md |
DatasetMerger copies images from selected source datasets into a single images/ folder, resolves duplicate img_id values, and writes merged train_metadata.csv, validation_metadata.csv, and test_metadata.csv for each partition.
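The merge behaviour described above can be illustrated with a short standard-library sketch. It is not the DatasetMerger implementation: the function merge_datasets below is hypothetical, it assumes img_id already includes the file extension, and its collision policy (prefixing a duplicate img_id with the source dataset name) is an assumption, since the text does not say how duplicates are resolved.

```python
import csv
import shutil
import tempfile
from pathlib import Path

PARTITIONS = ("train", "validation", "test")
FIELDS = ["img_id", "benign_malignant"]


def merge_datasets(source_dirs, output_dir):
    """Merge datasets that follow the shared CSV layout into output_dir."""
    output_dir = Path(output_dir)
    (output_dir / "images").mkdir(parents=True, exist_ok=True)
    used_names = set()
    for partition in PARTITIONS:
        merged_rows = []
        for source in map(Path, source_dirs):
            csv_path = source / f"{partition}_metadata.csv"
            if not csv_path.is_file():
                continue
            with open(csv_path, newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    src_image = source / "images" / row["img_id"]
                    if not src_image.is_file():
                        continue  # sketch assumes img_id carries its extension
                    new_name = src_image.name
                    # Assumed policy: prefix with the dataset name until unique
                    while new_name in used_names:
                        new_name = f"{source.name}_{new_name}"
                    used_names.add(new_name)
                    shutil.copy2(src_image, output_dir / "images" / new_name)
                    merged_rows.append(
                        {"img_id": new_name, "benign_malignant": row["benign_malignant"]}
                    )
        out_csv = output_dir / f"{partition}_metadata.csv"
        with open(out_csv, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(merged_rows)


def make_demo_dataset(root, name, img_id, label):
    """Create a one-image dataset in the shared layout (demo data only)."""
    dataset = root / name
    (dataset / "images").mkdir(parents=True)
    (dataset / "images" / img_id).write_bytes(name.encode())
    for partition in PARTITIONS:
        body = f"{img_id},{label}\n" if partition == "train" else ""
        (dataset / f"{partition}_metadata.csv").write_text(
            "img_id,benign_malignant\n" + body, encoding="utf-8"
        )
    return dataset


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sources = [
        make_demo_dataset(root, "HC", "x.jpg", "benign"),
        make_demo_dataset(root, "PAD", "x.jpg", "malignant"),
    ]
    merge_datasets(sources, root / "merged_demo")
    merged_images = sorted(p.name for p in (root / "merged_demo" / "images").iterdir())
    merged_train = (root / "merged_demo" / "train_metadata.csv").read_text(
        encoding="utf-8"
    ).splitlines()
print(merged_images)
print(merged_train)
```

Both demo datasets contain an image named x.jpg, so the second copy is renamed on the way in and the merged CSV refers to the renamed file.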
Interactive preprocessing for individual dermatology benchmarks. Outputs should match the shared CSV layout under datasets/.
Not tracked in git (see .gitignore). You must download or generate data locally before running any pipeline.
- The VLM project Makefile target dataset-merge points to a missing scripts/merge_datasets.py. Use dataset_merger/merge_datasets.py at the repo root instead.
- The embeddings pipeline requires H5 metadata; CSV→H5 conversion is not included in this repository.
- Some import paths in the VLM project may need alignment with src/vlms/base_model.py before make eval runs successfully.
This project is licensed under the Apache License 2.0.
The project was supported by the Ministry of Science, Technology, and Innovation of Brazil, with resources from Law No. 8,248, dated October 23, 1991, under the scope of the PPI-SOFTEX, coordinated by Softex and published under RESIDÊNCIA EM TIC 63 – ROBÓTICA E IA – FASE II, DOU 23076.043130/2025-27 and partially supported by INES.IA (National Institute of Science and Technology for Software Engineering Based on and for Artificial Intelligence) www.ines.org.br, CNPq grant 408817/2024-0.
website: Criar
email: criar@softex.cin.ufpe.br
authors' emails: etcs@softex.cin.ufpe.br; kbc@softex.cin.ufpe.br; tir@cin.ufpe.br
@inproceedings{dosSantos2026,
  title={Bridging Research and Practice: A Systematic Evaluation of Generalist and Dermatology-Specific Models in Clinical Skin Lesion Classification},
  author={dos Santos, Emanoel dos Santos and Cunha, Kelvin and Mota, Rodrigo and Papais, Fabio and Bezerra, Thales and Lopes, Natalia and Medeiros, Erico and Cruz, Shirley and Araujo, Jessica and Borba, Paulo and Ing Ren, Tsang},
  booktitle={MICCAI, 2026},
  year={2026}
}