This repository contains evaluation scripts and results for Boltz 2's performance on the Polaris blind challenge datasets, with a focus on antiviral potency.
Performance Summary:
- MAE ranking: 33rd of 33 methods across all datasets
- Correlation ranking: 20th-21st of 33 methods (39th-42nd percentile)
- Best performance: SARS-CoV-2 target (Pearson R: 0.824)
- Target preference: consistently better on SARS-CoV-2 than on MERS-CoV
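The percentile range above is consistent with scoring rank r among N methods as (N - r + 1) / N. That formula is an assumption here; the evaluation notebook may compute percentiles differently. A minimal sketch with a hypothetical helper name:

```python
def rank_to_percentile(rank: int, n_methods: int) -> float:
    """Percentile for a rank (1 = best) among n_methods.

    Assumption: percentile = (N - r + 1) / N * 100, i.e. the share of
    methods ranked at or below this one.
    """
    if not 1 <= rank <= n_methods:
        raise ValueError("rank must be between 1 and n_methods")
    return (n_methods - rank + 1) / n_methods * 100.0


for r in (20, 21, 33):
    print(f"rank {r}/33 -> {rank_to_percentile(r, 33):.1f}th percentile")
```

Under this assumption, ranks 20 and 21 of 33 give about 42.4 and 39.4, which matches the 39th-42nd percentile range reported above.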
Repository structure:
├── boltz_affinity_predictions.csv # Affinity prediction results
├── boltz_polaris_evaluation.ipynb # Main evaluation notebook
├── code_to_create_yamls.py # YAML configuration generator
├── env.yml # Conda environment specification
├── evaluation/ # Evaluation modules
│ ├── admet.py # ADMET evaluation functions
│ ├── bootstrapping.py # Statistical bootstrapping utilities
│ ├── cld.py # CLD (Chemical Library Design) evaluation
│ ├── ligand_poses.py # Ligand pose evaluation
│ ├── potency.py # Potency prediction evaluation
│ ├── utils.py # Common utilities
│ └── data/ # Evaluation datasets
└── leaderboards/ # Challenge leaderboard submissions
├── antiviral-admet-2025/ # ADMET challenge results
├── antiviral-ligand-poses-2025/ # Ligand poses challenge results
└── antiviral-potency-2025/ # Potency challenge results
Setup:
- Create the conda environment:
  conda env create -f env.yml
  conda activate boltz-polaris
- Generate YAML configurations for Boltz 2:
  python code_to_create_yamls.py
- Run Boltz 2 predictions (not included in this repo):
  boltz predict ./yaml_files/*.yaml --use_msa_server --use_potentials --no_kernel
- Run the evaluation notebook:
  jupyter notebook boltz_polaris_evaluation.ipynb
Usage:
- Generate YAML configurations for Boltz 2 input:
  python code_to_create_yamls.py
- Run evaluations:
  from evaluation import potency, ligand_poses, admet
  # Use evaluation functions as needed
Boltz 2 Model Performance:
- Detailed affinity predictions are available in boltz_affinity_predictions.csv
- Complete evaluation metrics and rankings can be generated from boltz_polaris_evaluation.ipynb
- All leaderboard comparisons are documented in all_the_comparisons.txt
- Visual performance comparisons are shown in the image files generated by the notebook
Performance Summary:
- Boltz 2 consistently ranks last on absolute-error metrics (MAE) within the evaluated subset of methods
- It ranks better on correlation metrics than on absolute-error metrics
- It performs better on the SARS-CoV-2 target than on MERS-CoV across all metrics
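One way a model can rank well on correlation but poorly on absolute error is when its predictions are correctly ordered yet systematically offset. The minimal sketch below uses made-up values (not challenge data) and hypothetical helper names to show that a constant shift leaves Pearson R unchanged while inflating MAE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative pIC50 values only; not taken from the Polaris datasets.
experimental = [5.1, 6.3, 4.8, 7.2, 5.9]
# Perfectly ordered predictions, shifted down by 1.5 log units.
predicted = [v - 1.5 for v in experimental]

print(f"Pearson R: {pearson_r(experimental, predicted):.3f}")
print(f"MAE:       {mae(experimental, predicted):.3f}")
```

This prints a Pearson R of 1.000 and an MAE of 1.500: the ranking of compounds is perfect, but every prediction is off by the same amount.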
The evaluation framework is adapted from the ASAP Polaris Blind Challenge Examples repository (accessed August 21st, 2025).
For the SARS-CoV-2 (PDB ID: 7CAM) and MERS-CoV (PDB ID: 8R5J) main proteases, the sequences of chains A and B were extracted from the respective PDB entries and used to co-fold the dimer structure with each ASAP molecule. Initially, one YAML file per virus (MERS and SARS) was run to obtain the multiple sequence alignments (MSAs). The resulting MSA CSV files were then reused in the remaining YAML files for subsequent predictions.
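A minimal sketch of one input file for this setup, assuming the Boltz 2 YAML schema (protein chains with an optional precomputed `msa` path, a ligand given by SMILES, and an `affinity` property naming the ligand as binder). The function name, file paths, and placeholder sequences are hypothetical; this is not the repository's code_to_create_yamls.py.

```python
def make_boltz_yaml(seq_a: str, seq_b: str, msa_a: str, msa_b: str, smiles: str) -> str:
    """Build a Boltz 2 YAML string: protease dimer (chains A, B) plus one ligand (C).

    Assumes the Boltz 2 input schema. msa_a / msa_b point to MSA CSV files
    from an earlier --use_msa_server run, so every compound for the same
    virus can reuse the same alignments.
    """
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        "      id: A\n"
        f"      sequence: {seq_a}\n"
        f"      msa: {msa_a}\n"
        "  - protein:\n"
        "      id: B\n"
        f"      sequence: {seq_b}\n"
        f"      msa: {msa_b}\n"
        "  - ligand:\n"
        "      id: C\n"
        f"      smiles: '{smiles}'\n"  # single quotes keep '#', '[', etc. literal
        "properties:\n"
        "  - affinity:\n"
        "      binder: C\n"
    )


# Hypothetical inputs: substitute the 7CAM or 8R5J chain sequences,
# the cached MSA CSV paths, and an ASAP compound SMILES.
print(make_boltz_yaml(
    seq_a="PLACEHOLDER_SEQUENCE_A",
    seq_b="PLACEHOLDER_SEQUENCE_B",
    msa_a="msa/sars_chain_A.csv",
    msa_b="msa/sars_chain_B.csv",
    smiles="CC(=O)Nc1ccc(O)cc1",
))
```

Building the text with a plain template keeps the sketch dependency-free; a YAML library would work equally well.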
To generate predictions using Boltz 2, the following command is executed:
boltz predict example_yaml_files/ASAP-0000175_MERS.yaml --use_msa_server --use_potentials --no_kernel
According to the Boltz documentation, the model's affinity prediction y, reported as log10(IC50) with IC50 in μM, can be converted to a pIC50 in kcal/mol using the formula:
pIC50 = (6 - y) * 1.364
The factor 1.364 kcal/mol is RT ln(10) at about 298 K, so this form places affinity on a binding free-energy scale. To obtain a unitless pIC50 instead, convert IC50 from μM to M: pIC50 = -log10(IC50 × 10^-6) = 6 - y, so the conversion simplifies to:
pIC50 = 6 - y
Dropping the 1.364 scaling factor therefore yields a dimensionless pIC50 value.
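The conversion can be written as two small helpers. This is a sketch with hypothetical function names; it assumes, as in the Boltz documentation, that y is log10(IC50) with IC50 in μM.

```python
import math

# RT * ln(10) at 298.15 K, in kcal/mol; evaluates to about 1.364.
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
RT_LN10 = R_KCAL * 298.15 * math.log(10)


def boltz_to_pic50(y: float) -> float:
    """Unitless pIC50 from a Boltz affinity value y = log10(IC50 / uM)."""
    return 6.0 - y


def boltz_to_kcal(y: float, factor: float = 1.364) -> float:
    """pIC50 scaled to kcal/mol, using the factor given in the Boltz documentation."""
    return boltz_to_pic50(y) * factor


for y in (-1.0, 0.0, 1.0):
    print(f"y={y:+.1f}  pIC50={boltz_to_pic50(y):.2f}  kcal/mol={boltz_to_kcal(y):.3f}")
print(f"RT ln(10) at 298.15 K = {RT_LN10:.4f} kcal/mol")
```

For example, y = 0 corresponds to an IC50 of 1 μM and maps to pIC50 = 6.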


