ktk-07/Transformer_For_Fun

Original Transformers for Fun

A from-scratch implementation of the Transformer architecture as described in "Attention Is All You Need" (Vaswani et al., 2017). This project implements the encoder-decoder architecture with multi-head attention, positional encoding, and all core components from the original paper.

Overview

This repository contains a complete implementation of the Transformer model, including:

  • Encoder-Decoder Architecture: Full transformer with stacked encoder and decoder layers
  • Multi-Head Attention: Self-attention and cross-attention mechanisms
  • Positional Encoding: Sinusoidal positional encodings
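  • Layer Normalization: Pre-norm architecture with residual connections (note that the original paper applies layer normalization after each residual addition, i.e. post-norm)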
  • Feed-Forward Networks: Position-wise feed-forward neural networks
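
To make the sinusoidal positional encoding concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It follows the formula from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function name is hypothetical and is not taken from transformer_utils.py; the repository's own version presumably builds a tensor instead of nested lists.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table (nested lists) of position encodings.

    Hypothetical helper for illustration; not the repository's function.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Dimensions i and i + 1 share one frequency: 1 / 10000^(i / d_model)
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)      # even dimension
            table[pos][i + 1] = math.cos(angle)  # odd dimension
    return table


pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
```

Each row of the table is added to the token embedding at that position, which is how the model receives order information without recurrence.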

Project Structure

.
├── custom_transformers.py      # Main Transformer model implementation
├── transformer_layers.py        # Encoder and Decoder layer implementations
├── transformer_sublayers.py     # Core sublayer components (attention, FFN, etc.)
├── transformer_utils.py         # Utility functions (positional encoding, attention, etc.)
├── notebook/                    # Jupyter notebooks for training and evaluation
│   ├── train_with_custom_BPE.ipynb
│   ├── train_with_prebuilt_tokenizers.ipynb
│   ├── test_sublayers.ipynb
│   └── testing_evals.ipynb
├── data/                        # Training datasets
│   ├── bible/                   # Bible translation dataset (en-zh)
│   ├── php_docs/                # PHP documentation dataset (en-zh)
│   └── wmt/                     # WMT translation datasets
└── requirements.txt             # Python dependencies

Features

  • Pure PyTorch Implementation: No reliance on high-level transformer libraries
  • Modular Design: Clean separation of concerns with sublayers, layers, and full model
  • Training Notebooks: Ready-to-use notebooks for training with different tokenization strategies
  • Evaluation Metrics: Comprehensive evaluation with BLEU and ROUGE scores
  • Multiple Datasets: Support for various machine translation datasets
  • Custom BPE Tokenizer: Implementation of Byte-Pair Encoding from scratch
  • Prebuilt Tokenizers: Support for Hugging Face tokenizers (BERT, Marian)
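
For readers unfamiliar with Byte-Pair Encoding, the sketch below shows the core merge-learning loop in plain Python. It is only an illustration of the general technique, not the code from train_with_custom_BPE.ipynb: the function names, the `</w>` end-of-word marker, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    words = Counter(tuple(w) + ("</w>",) for w in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself so the
        # result is deterministic (an arbitrary choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        words = merge_pair(words, best)
        merges.append(best)
    return merges


corpus = ["hug"] * 10 + ["pug"] * 5 + ["pun"] * 12 + ["bun"] * 4 + ["hugs"] * 5
merges = learn_bpe(corpus, num_merges=3)
```

The learned merge list is applied in order at tokenization time, so frequent character sequences end up as single tokens while rare words fall back to smaller pieces.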

Installation

See SETUP.md for detailed installation instructions.

Quick Setup

  1. Clone the repository and navigate to the project directory:
git clone <repository-url>
cd original_transformers_for_fun
  2. Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Download SpaCy language models:
python -m spacy download en_core_web_sm

Quick Start

  1. Start Jupyter Notebook:
jupyter notebook
  2. Open one of the training notebooks:

    • notebook/train_with_custom_BPE.ipynb - Training with custom Byte-Pair Encoding
    • notebook/train_with_prebuilt_tokenizers.ipynb - Training with prebuilt tokenizers (recommended for beginners)
  3. Run the notebook cells to start training your transformer model.

  4. Evaluate your trained model:

    • notebook/testing_evals.ipynb - Evaluate model performance with BLEU and ROUGE metrics

Note: In each notebook, run the first cell before anything else. It adds the parent directory to the Python path so that the project modules can be imported.
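
A path-setup cell of this kind typically looks something like the sketch below. This is an assumption about what such a cell contains, not a copy of the notebooks' actual first cell; it presumes the notebook's working directory is notebook/, one level below the project modules.

```python
import sys
from pathlib import Path

# Assumption: the current working directory is notebook/, so the modules
# (custom_transformers.py and friends) live in its parent directory.
project_root = Path.cwd().parent

# Put the project root at the front of the import search path, once.
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))
```

If Jupyter was started from a different directory, the parent of the working directory will not be the project root and the imports will still fail.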

Model Architecture

The implementation follows the original Transformer architecture:

  • Encoder: Stack of N identical layers, each with:

    • Multi-head self-attention
    • Position-wise feed-forward network
    • Residual connections and layer normalization
  • Decoder: Stack of N identical layers, each with:

    • Masked multi-head self-attention
    • Multi-head cross-attention (encoder-decoder attention)
    • Position-wise feed-forward network
    • Residual connections and layer normalization
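
The attention step at the heart of each layer can be sketched for a single head as follows. To stay dependency-free the example uses plain Python lists rather than PyTorch tensors; the function names are hypothetical and do not come from transformer_utils.py. The `causal` flag illustrates the masking used in decoder self-attention, where position i may only attend to positions 0..i.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, causal=False):
    """Single-head attention on nested lists.

    q: [Tq][d], k: [Tk][d], v: [Tk][dv]
    Returns (output [Tq][dv], weights [Tq][Tk]).
    """
    d_k = len(q[0])
    outputs, all_weights = [], []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            if causal and j > i:
                # Future positions are masked out before the softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q_i, k_j))
                scores.append(dot / math.sqrt(d_k))
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output column at a time.
        row = [sum(w * v_j[c] for w, v_j in zip(weights, v))
               for c in range(len(v[0]))]
        outputs.append(row)
        all_weights.append(weights)
    return outputs, all_weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x, causal=True)
```

Multi-head attention runs several such heads in parallel on learned linear projections of the inputs and concatenates their outputs.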

Usage Example

from custom_transformers import Transformer

# Initialize model
model = Transformer(
    source_vocab_size=10000,
    target_vocab_size=10000,
    embedding_dim=512,
    num_of_heads=8,
    dropout_prob=0.1,
    n=6,  # number of layers
    global_max_seq_len=512,
    src_padding_idx=0,
    tgt_padding_idx=0
)

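# Forward pass (src_tokens, tgt_tokens, the lengths, and the three masks
# are placeholders that must be prepared beforehand)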
output = model(
    source_input=src_tokens,
    target_input=tgt_tokens,
    src_max_len=src_length,
    tgt_max_len=tgt_length,
    encoder_mask=enc_mask,
    decoder_mask=dec_mask,
    encoder_decoder_mask=enc_dec_mask
)
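
The mask arguments above are not defined in this example. As a rough idea of what such masks encode, the sketch below builds a padding mask and a causal mask with plain Python lists. The helper names and the convention (True means "may attend") are assumptions for illustration only; the masks this model expects may use a different shape, dtype, or polarity, so check transformer_utils.py before relying on it.

```python
def padding_mask(token_ids, pad_idx=0):
    """True where the token is real, False where it is padding."""
    return [tok != pad_idx for tok in token_ids]


def causal_mask(length):
    """Lower-triangular matrix: row i may look at columns 0..i only."""
    return [[j <= i for j in range(length)] for i in range(length)]


def decoder_self_attention_mask(token_ids, pad_idx=0):
    """Combine the causal mask with a key-side padding mask."""
    keep = padding_mask(token_ids, pad_idx)
    return [[allowed and keep[j] for j, allowed in enumerate(row)]
            for row in causal_mask(len(token_ids))]


# Hypothetical target sequence whose last position is padding (pad_idx=0).
example_tgt = [5, 7, 9, 0]
example_mask = decoder_self_attention_mask(example_tgt)
```

An encoder mask would use only the padding part, and an encoder-decoder mask would hide padded source positions from every target position.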

Datasets

The project includes several machine translation datasets:

  • Bible Dataset: English-Chinese translation pairs
  • PHP Documentation: Technical documentation translations
  • WMT: Various language pairs from WMT datasets

Model Checkpoints

Pre-trained model checkpoints are stored in:

  • notebook/model_weights_with_custom_BPE/ - Models trained with custom BPE tokenizer
  • notebook/model_weights_with_prebuilt_tokenizer/ - Models trained with Hugging Face tokenizers

Note: These directories are excluded from git (see .gitignore) due to file size.

License

This is an educational project. Please refer to individual dataset licenses in the data/ directory.

References

  • Vaswani, A., et al. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems.

Development

Project Structure Details

  • custom_transformers.py: Main Transformer class with Encoder, Decoder, and full model
  • transformer_layers.py: EncoderLayer and DecoderLayer implementations
  • transformer_sublayers.py: Core components (MultiHeadAttention, FeedForward, LayerNorm)
  • transformer_utils.py: Helper functions for positional encoding, masking, and attention

Testing and Evaluation

The project includes several notebooks for testing and evaluation:

  • notebook/test_sublayers.ipynb - Unit tests for individual components
  • notebook/testing_evals.ipynb - Comprehensive evaluation with BLEU and ROUGE metrics

Evaluation Metrics

The testing_evals.ipynb notebook evaluates trained models with the following metrics:

  • BLEU Scores: Computes BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores

    • Measures n-gram precision between predicted and reference translations
    • Handles both English (word-level) and Chinese (character-level) tokenization
  • ROUGE Scores: Computes ROUGE-1, ROUGE-2, and ROUGE-L scores

    • Measures recall-oriented overlap: n-grams for ROUGE-1 and ROUGE-2, longest common subsequence for ROUGE-L
    • Includes precision, recall, and F-measure for each metric
    • Uses character-level tokenization for Chinese text
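
To show what "n-gram precision" means in BLEU, here is a small plain-Python sketch of the clipped (modified) n-gram precision, with word-level tokenization for English and character-level tokenization for Chinese. It is not the notebook's code, which relies on NLTK according to the dependency list; the function names are hypothetical, and a full BLEU score additionally combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter


def tokenize(text, char_level=False):
    """Split on whitespace, or into non-space characters for Chinese."""
    if char_level:
        return [ch for ch in text if not ch.isspace()]
    return text.split()


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference, which stops repeated words from inflating the score.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


en_p1 = modified_precision(tokenize("the the the cat"),
                           tokenize("the cat sat"), n=1)
zh_p2 = modified_precision(tokenize("我 爱 你", char_level=True),
                           tokenize("我爱你们", char_level=True), n=2)
```

In the English example only one of the three occurrences of "the" is credited, which is the clipping behavior that separates BLEU's precision from a naive overlap count.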

To run evaluation:

  1. Open notebook/testing_evals.ipynb
  2. Configure the model checkpoint path and tokenizer type
  3. Run all cells to evaluate on the test set
  4. View average scores and example translations

Dependencies

Key dependencies include:

  • PyTorch: Deep learning framework
  • Transformers: Hugging Face library for tokenizers
  • SpaCy: Natural language processing
  • Jieba: Chinese text segmentation
  • NumPy/Pandas: Data manipulation
  • NLTK: Natural language toolkit (for BLEU scores)
  • rouge-score: ROUGE metric computation

See requirements.txt for the complete list.

Evaluation Results

The evaluation notebook (testing_evals.ipynb) provides detailed metrics for model performance:

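  • BLEU Scores: Standard metric for machine translation evaluation. As a rough guide on a 0-1 scale (the thresholds vary with language pair, domain, and tokenization):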

    • BLEU-4 > 0.3: Good translation quality
    • BLEU-4 > 0.5: Very good translation quality
    • BLEU-4 > 0.7: Excellent translation quality
  • ROUGE Scores: Recall-oriented evaluation metrics

    • Useful for evaluating content coverage and, through ROUGE-L, how well word order is preserved
    • Character-level tokenization for Chinese text

The notebook automatically handles:

  • Space removal for Chinese text (Chinese doesn't use spaces)
  • Character-level vs word-level tokenization
  • Error handling and progress tracking
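
As an illustration of the space removal and character-level handling described above, the following plain-Python sketch computes ROUGE-L precision, recall, and F1 from the longest common subsequence (LCS) of two character sequences. It is a simplified stand-in written for this README, not the notebook's code (which uses the rouge-score package according to the dependency list), and the function names are hypothetical.

```python
def char_tokens(text):
    """Drop whitespace and split into characters (for Chinese text)."""
    return [ch for ch in text if not ch.isspace()]


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L precision, recall, and F1 for two token lists."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# The candidate contains spaces (e.g. from a word-level decoder); they are
# removed before scoring so that both sides are compared per character.
scores = rouge_l(char_tokens("今天 天气 很 好"), char_tokens("今天天气不错"))
```

Here the two sentences share the four-character subsequence 今天天气 out of six characters each, so precision and recall are both 4/6.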

Notes

This implementation is for educational purposes and follows the original paper's architecture. For production use, consider optimizations and improvements beyond the original design.

Contributing

This is a learning project. Feel free to fork and experiment with different architectures, optimizations, or training strategies.
