
Self-Supervised Representation Learning as Mutual Information Maximization

This repository contains code and resources for our paper "Self-Supervised Representation Learning as Mutual Information Maximization", available on arXiv (arXiv:2510.01345).

Conceptual Overview

Our work unifies SSRL methods under two distinct optimization paradigms:


Figure 1: Canonical forms of our proposed paradigms: (a) SDMI alternates updates between two encoders using stop-gradients, while (b) JMI updates the encoders of both views jointly with shared gradients.
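The contrast between the two update schemes can be sketched in a few lines. The following toy NumPy illustration (single weight matrices standing in for encoders, and a made-up alignment loss) is only a conceptual sketch, not the repository's training code:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=4), rng.normal(size=4)   # two augmented views
W_online = np.eye(4)        # stands in for f_theta (online encoder)
W_target = np.eye(4)        # stands in for g_xi (target encoder)
lr, tau = 0.1, 0.99         # learning rate and EMA coefficient (assumed values)

def sdmi_step(W_online, W_target):
    # SDMI: gradient flows only through the online branch (stop-gradient on
    # the target), then the target tracks the online weights via an EMA.
    z1 = W_online @ x1
    z2 = W_target @ x2                     # treated as a constant (stop-gradient)
    grad = -np.outer(z2, x1)               # grad of toy loss -z2.z1 w.r.t. W_online
    W_online = W_online - lr * grad
    W_target = tau * W_target + (1 - tau) * W_online   # EMA target update
    return W_online, W_target

def jmi_step(W):
    # JMI: one shared encoder; gradients flow through both views jointly.
    z1, z2 = W @ x1, W @ x2
    grad = -(np.outer(z2, x1) + np.outer(z1, x2))      # both branches contribute
    return W - lr * grad
```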

Key Contributions

  1. A Unified MI Maximization View: We formulate a general MI maximization perspective under the DV bound, showing that existing SSRL methods implicitly follow one of two optimization paradigms, namely Self-Distillation MI (SDMI) or Joint MI (JMI).

  2. Explaining Architectural Components: We show that design elements like stop-gradients, exponential moving average targets, predictor networks, and statistical regularizers are not heuristics, but theoretically necessary under MI-based objectives, providing a formal explanation for common design choices.

  3. Unifying Existing SSRL Methods: We show that many well-known SSRL approaches (e.g., SimCLR, BYOL, SimSiam) can be mapped directly to our two paradigms. This helps unify the field under a shared theoretical lens and offers guidance for future method design.
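For reference, the Donsker–Varadhan (DV) bound underlying the unified view in contribution 1 is the standard variational lower bound on mutual information:

```latex
I(X;Y) = D_{\mathrm{KL}}\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right)
\;\ge\; \sup_{T} \;\mathbb{E}_{P_{XY}}\!\left[T(x,y)\right]
\;-\; \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T(x,y)}\right]
```

where T ranges over critic functions; in the SSRL setting the first expectation is over positive (paired) views and the second over negative (mismatched) pairs.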

Quick Start Guide

Prerequisites

All dependencies are listed in `requirements.txt`.

Installation

```shell
git clone https://github.com/layer6ai-labs/ssrl-mi-maximization.git
cd ssrl-mi-maximization
pip install -r requirements.txt
```

Running Experiments

All experimental scripts are provided in the scripts/ directory. The codebase supports training on CIFAR-10/100, TinyImageNet, and ImageNet100 datasets with various SSRL methods.

Training Scripts

Each script is self-contained and includes all necessary hyperparameters. Run from the project root directory:

```shell
# Run specific experiments from project root
bash scripts/CIFAR10/SDMI.sh
bash scripts/CIFAR10/JMI.sh
bash scripts/CIFAR100/BarlowTwins.sh
bash scripts/ImageNet100/SimCLR.sh
```

Example Command Structure

```shell
python3 main.py \
    --model_name SDMI \
    --dataset ImageNet100 \
    --architecture ResNet50 \
    --num_classes 100 \
    --epochs 800 \
    --warmup_epochs 10 \
    --batch_size 64 \
    --initial_lr 0.05 \
    --weight_decay 0.0001 \
    --temperature 0.1 \
    --num_runs 3 \
    --augmentation \
    --feature_dim 512 \
    --projection_dim 256 \
    --projection_layer 3 \
    --model_save_interval 50 \
    --model_evaluation_interval 1000 \
    --num_workers 16
```
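A command line like the one above is typically consumed with `argparse`. The sketch below is hypothetical: the flag names are taken from the example command, but the choices, types, and defaults are assumptions, not necessarily those of the repository's `main.py`:

```python
import argparse

# Illustrative parser for a subset of the flags shown above (assumed defaults).
def build_parser():
    p = argparse.ArgumentParser(description="SSRL training (sketch)")
    p.add_argument("--model_name", required=True,
                   choices=["SDMI", "JMI", "SimCLR", "BYOL", "SimSiam"])
    p.add_argument("--dataset", default="CIFAR10")
    p.add_argument("--architecture", default="ResNet50")
    p.add_argument("--epochs", type=int, default=800)
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--initial_lr", type=float, default=0.05)
    p.add_argument("--temperature", type=float, default=0.1)
    p.add_argument("--augmentation", action="store_true")
    p.add_argument("--feature_dim", type=int, default=512)
    p.add_argument("--projection_dim", type=int, default=256)
    return p

args = build_parser().parse_args(
    ["--model_name", "SDMI", "--dataset", "ImageNet100", "--augmentation"]
)
```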

Available Models

  • SDMI Prototype: Our canonical Self-Distillation MI implementation
  • JMI Prototype: Our canonical Joint MI implementation
  • Baseline Methods: SimCLR, BYOL, SimSiam, MoCo-v3, Barlow Twins, VICReg

Supported Datasets

  • CIFAR-10/100
  • TinyImageNet
  • ImageNet100
  • Synthetic Gaussian Mixture (for controlled experiments)
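For the synthetic option, a minimal generator might look like the following (five clusters to match the controlled experiments in Figure 3; the cluster count, dimensionality, and scales here are illustrative assumptions, not the repository's exact configuration):

```python
import numpy as np

# Sketch of a synthetic Gaussian mixture dataset for controlled experiments.
def make_gaussian_mixture(n_per_cluster=200, n_clusters=5, dim=2,
                          center_scale=5.0, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=center_scale, size=(n_clusters, dim))
    # Unit-variance Gaussian samples around each cluster center.
    X = np.concatenate([c + rng.normal(size=(n_per_cluster, dim))
                        for c in centers])
    y = np.repeat(np.arange(n_clusters), n_per_cluster)
    return X, y, centers

X, y, centers = make_gaussian_mixture()
```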

Experimental Results

Mutual Information Dynamics


Figure 2: Estimated MI over CIFAR10 training for SDMI-based (top row) and JMI-based (bottom row) methods, using three estimators (cos–DV, InfoNCE and JSD; left to right). Both paradigms exhibit consistent MI growth: SDMI curves feature early fluctuations before trending upward, while JMI estimates rise more uniformly, and to much higher levels.
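As one concrete example of such an estimator, an InfoNCE-style lower bound can be computed directly from a batch of paired embeddings using the identity I(X;Y) ≥ log N − L_InfoNCE. The cosine critic and temperature below are assumptions for illustration, not the repository's exact estimator:

```python
import numpy as np

# InfoNCE-style MI lower-bound estimate on a batch of N paired embeddings.
def infonce_mi_estimate(z1, z2, temperature=0.1):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature          # N x N cosine similarities
    # Log-softmax over each row, evaluated at the positive (diagonal) pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nce_loss = -np.mean(np.diag(log_probs))
    return np.log(len(z1)) - nce_loss           # I(X;Y) >= log N - L_InfoNCE

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))
mi_paired = infonce_mi_estimate(z, z + 0.01 * rng.normal(size=z.shape))
mi_random = infonce_mi_estimate(z, rng.normal(size=z.shape))
```

Strongly correlated views yield an estimate near log N, while independent views yield an estimate near zero.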


Representation Space Quality


Figure 3: Embedding trajectories of the five Gaussian cluster centers. Opacity increases over training. SDMI separates centers more distinctly than analogous methods.


Linear Probe Accuracy

| Model | CIFAR10 | CIFAR100 | TinyImageNet | ImageNet100 |
|-------|---------|----------|--------------|-------------|
| SDMI prototype (fθ) | 88.61 ± 0.13 | 57.37 ± 0.38 | 33.30 ± 0.58 | 70.73 ± 0.57 |
| SDMI prototype (gξ) | 88.59 ± 0.35 | 57.85 ± 0.32 | 32.94 ± 0.71 | 70.83 ± 0.16 |
| SimSiam | 89.72 ± 0.18 | 60.45 ± 0.60 | 19.19 ± 0.69 | 78.23 ± 0.58 |
| BYOL | 91.28 ± 0.16 | 63.11 ± 0.21 | 32.77 ± 0.10 | 81.09 ± 0.61 |
| MoCo-v3 | 91.10 ± 0.16 | 58.90 ± 0.32 | 32.18 ± 0.55 | 76.86 ± 0.74 |
| JMI prototype | 88.01 ± 0.48 | 57.22 ± 0.56 | 32.23 ± 0.52 | 73.41 ± 0.36 |
| SimCLR | 87.24 ± 0.37 | 55.32 ± 0.46 | 33.79 ± 0.31 | 75.31 ± 0.76 |
| Barlow Twins | 85.56 ± 0.71 | 51.91 ± 0.49 | 30.26 ± 0.12 | 78.96 ± 0.30 |
| VICReg | 85.49 ± 1.03 | 54.00 ± 0.34 | 32.03 ± 0.32 | 78.86 ± 0.23 |

Table 1: Linear probing accuracy (%) on four datasets. Mean ± std over 3 runs.
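The linear-probing protocol itself is simple: freeze the encoder and fit only a linear head on its features. A minimal sketch using least squares on synthetic features (the repository's evaluation code likely uses a different solver and real encoder features):

```python
import numpy as np

# Linear probe: fit a linear classifier (with bias) on frozen features.
def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels):
    n_classes = train_labels.max() + 1
    add_bias = lambda F: np.hstack([F, np.ones((len(F), 1))])
    Y = np.eye(n_classes)[train_labels]                      # one-hot targets
    W, *_ = np.linalg.lstsq(add_bias(train_feats), Y, rcond=None)
    preds = (add_bias(test_feats) @ W).argmax(axis=1)
    return (preds == test_labels).mean()

# Synthetic "frozen features": 3 well-separated classes in 8 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 8))
labels = np.repeat(np.arange(3), 100)
feats = centers[labels] + rng.normal(size=(300, 8))
acc = linear_probe_accuracy(feats[::2], labels[::2], feats[1::2], labels[1::2])
```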


Summary of Results

  1. Monotonic MI Growth: Both paradigms demonstrate consistent mutual information increase during training
  2. Competitive Performance: Canonical forms achieve performance comparable to established methods
  3. Theoretical Alignment: Empirical behavior matches theoretical predictions

Reproducing Paper Results

To reproduce the main results from our paper, run each of the scripts in the scripts/ directory.

Project Structure

```
├── main.py                  # Entry point for training
├── README.md
├── requirements.txt
├── assets/                  # Figures, logos, and result plots for README/paper
├── controlled_experiments/  # Experiments using smaller networks for the Gaussian dataset
├── scripts/                 # Training scripts for all datasets and methods
│   ├── CIFAR10/
│   ├── CIFAR100/
│   ├── TinyImageNet/
│   └── ImageNet100/
├── src/
│   ├── trainer.py           # Core training loop
│   ├── evaluation.py        # Linear probing and metrics
│   ├── models/              # SDMI, JMI, and baseline implementations
│   └── utils/               # Losses, data loaders, checkpointing, logging
```

Citing

If you use any part of this repository in your research, please cite the associated paper with the following BibTeX entry:

```bibtex
@article{sabby2025ssl,
  title={Self-Supervised Representation Learning as Mutual Information Maximization},
  author={Sabby, Akhlaqur Rahman and Sui, Yi and Wu, Tongzi and Cresswell, Jesse C and Wu, Ga},
  journal={arXiv:2510.01345},
  year={2025}
}
```
