ant is a PyTorch-based Deep Learning framework. It is inspired by nanoGPT, although it does not borrow code from it. It aims to stay research-friendly while integrating (way) more features, following modern best practices, and remaining modular. Among other things, ant supports:
- The CIFAR-10, OpenWebText, FineWeb-Edu and ClimbMix (SoTA) datasets
- Any Hugging Face or TokenMonster (SoTA) tokenizer
- ResNet, GPT2, Llama 2, nGPT, and OLMo 2 models
- The PSGD, DistributedShampoo, AdEMAMix, SOAP, Muon and Scion optimizers
- μP for zero-shot hyperparameter transfer
- Downstream evaluations (HellaSwag, ARC etc.) through lm_eval during training
- Model summary (parameters and FLOPS) and computational graph visualization
- Offline logging of gradients and weights in simple .dat files
- Attention heatmaps
- Plotting through PGFPlots
- Pure PyTorch attention, FlashAttention, FlexAttention and cuDNN attention with RoPE, ALiBi and Sliding Window Attention (SWA)
- Distributed Data Parallel, torch.compile and Automatic Mixed Precision
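
As a taste of one of the supported attention variants: ALiBi replaces positional embeddings by adding a fixed, head-specific linear penalty to attention scores, with per-head slopes forming a geometric sequence. A minimal pure-Python sketch (function names are ours for illustration, not ant's API):

```python
# ALiBi adds a bias of -slope * (query_pos - key_pos) to each causal
# attention score, so the penalty grows with distance from the query.
def alibi_slopes(n_heads: int) -> list[float]:
    # Simple power-of-two-heads case from the ALiBi paper: 2^(-8h/n)
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(slope: float, q_pos: int, k_pos: int) -> float:
    # Zero on the diagonal, increasingly negative as the key recedes
    return -slope * (q_pos - k_pos)

print(alibi_slopes(4))  # [0.25, 0.0625, 0.015625, 0.00390625]
```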
> [!TIP]
> All scripts have a help menu available via `--help`. When this is not enough, it is recommended that you look at the (self-documenting) code directly. Alternatively, if you are a vibe coder 🤖, you can try feeding the whole codebase to an LLM (e.g. via https://uithub.com/gvlassis/ant), or a coding agent (e.g. Codex CLI).
1. Clone the repo:

   ```
   git clone https://github.com/gvlassis/ant.git
   ```

2. Install PyTorch and FlashAttention.

3. Install `requirements.txt`:

   ```
   pip install -r requirements.txt
   ```

4. Prepare the dataset via `./src/data/make.py`. The dataset is first downloaded from Hugging Face, processed, and then saved as tensors in `.pt` files. If you are lazy 🦥, you can also directly download the artifacts of the following command.

   ```
   python ./src/data/make.py --dataset climbmix10m
   ```

5. Train a neural network via `./src/train.py`. For training on `cuda:0`:

   ```
   # If you are not using μP, k_input is the learning rate
   python ./src/train.py --opt muon --micro_batch_size 32 --train_batches 2000 --k_input 3e-2 --momentum 0.95 --model_device_index 0 ./out/test
   ```

   A lot of settings (e.g. depth, number of heads) are configured in `./src/models/utils_models.py/get_model_opts()` and `./src/models/transformer.py`. For one node with 4 GPUs:

   ```
   OMP_NUM_THREADS=1 torchrun --standalone --nproc_per_node=4 ./src/train.py --opt shampoo --micro_batch_size 32 --train_batches 2000 --k_input 3e-3 --momentum 0.95 --beta2 0.95 --eps 1e-10 ./out/test
   ```
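
The `k_input` flag above reflects how μP decouples tuning from model width: hyperparameters are tuned once on a small proxy model and transferred zero-shot to a larger one. As an illustration only, here is a sketch of the standard μP rule for Adam-like optimizers, under which hidden-weight learning rates shrink like 1/width (the helper name is hypothetical, not ant's actual code):

```python
# Hypothetical helper: transfer a learning rate tuned at base_width to a
# wider model, following the 1/width scaling muP prescribes for hidden
# (matrix-like) parameters with Adam-style optimizers.
def mup_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    return base_lr * base_width / width

# A rate tuned on a width-256 proxy transfers to a width-1024 model as:
print(mup_hidden_lr(3e-2, 256, 1024))  # 0.0075
```

Vector-like parameters (embeddings, biases, norms) follow different rules in μP; the point is only that the tuned base rate, not the width-specific rate, is what gets carried over.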