Independent C++ implementation package for Approximate Matrix Multiplication (AMM) algorithms, extracted from the SAGE project.
Status: 🚀 Active Development
PyPI Package: isage-amms
Parent Project: SAGE - Unified ML System
SAGE-AMMS provides high-performance C++ implementations of various approximate matrix multiplication algorithms with Python bindings. This package was extracted from the main SAGE repository to:
- ✅ Enable independent versioning and releases
- ✅ Reduce main SAGE repository size
- ✅ Allow optional installation
- ✅ Simplify CI/CD for C++ builds
- ✅ Make AMM algorithms reusable in other projects
```bash
pip install isage-amms
```

Prerequisites:
- CMake >= 3.14
- C++14 compatible compiler
- Python >= 3.8
- PyTorch >= 2.0.0
```bash
# Clone the repository
git clone https://github.com/intellistream/sage-amms.git
cd sage-amms

# Install in development mode
pip install -e .

# Or build wheel
python -m build --wheel
pip install dist/*.whl
```

This package provides the algorithm implementations. The unified interface is provided by the main SAGE package:
```python
# Install both packages first:
#   pip install sage isage-amms

# Import from SAGE (provides the interface)
from sage.libs.amms import create, registered

# Check available algorithms
print(registered())
# Output: ['countsketch', 'fastjlt', 'crs', 'bcrs', ...]

# Create an algorithm instance
amm = create("countsketch", sketch_size=1000)

# Perform approximate matrix multiplication
import numpy as np

A = np.random.randn(1000, 500)
B = np.random.randn(500, 800)
result = amm.multiply(A, B)  # Approximate A @ B
```

See ARCHITECTURE.md for detailed architecture documentation.
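To judge whether an approximate product is good enough for your workload, a common check (not part of this package's API) is the relative Frobenius-norm error against the exact product. A minimal NumPy sketch, using a small perturbed product as a stand-in for an AMM result:

```python
import numpy as np

def relative_error(exact: np.ndarray, approx: np.ndarray) -> float:
    """Relative Frobenius-norm error of an approximate product."""
    return float(np.linalg.norm(exact - approx) / np.linalg.norm(exact))

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
B = rng.standard_normal((100, 150))
exact = A @ B

# Stand-in for an AMM result: the exact product plus small noise.
approx = exact + 0.01 * rng.standard_normal(exact.shape)

err = relative_error(exact, approx)
print(f"relative error: {err:.4f}")
```

Algorithms trade this error against compute and memory, so measuring it on representative matrices is the quickest way to choose a sketch size.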
AMMS provides a unified interface for various approximate matrix multiplication algorithms, similar to how ANNS provides a unified interface for approximate nearest neighbor search algorithms.
```
sage-amms/
├── sage/libs/amms/
│   ├── __init__.py            # Package initialization
│   ├── implementations/       # C++ source code
│   │   ├── include/           # C++ headers
│   │   │   ├── CPPAlgos/      # Core AMM algorithm implementations
│   │   │   ├── MatrixLoader/  # Matrix loading utilities
│   │   │   ├── Utils/         # Utility functions
│   │   │   └── ...
│   │   ├── src/               # C++ implementation files
│   │   │   ├── CPPAlgos/      # Algorithm implementations
│   │   │   ├── PyAMM.cpp      # Python bindings
│   │   │   └── ...
│   │   └── CMakeLists.txt     # Build configuration
│   └── wrappers/              # Python wrappers
│       └── pyamm.py           # PyAMM wrapper
├── tests/                     # Unit tests
├── pyproject.toml             # Package metadata
└── setup.py                   # Build configuration
```
This package provides implementations of various AMM algorithms:
- CountSketch: Count-Min Sketch based AMM
- FastJLT: Fast Johnson-Lindenstrauss Transform
- RIP: Random Index Projection
- TugOfWar: Tug-of-war sketch
- CRS: Coordinate-wise Random Sampling
- CRSV2: Improved CRS
- BCRS: Block-wise CRS
- EWS: Entry-wise Sampling
- ProductQuantization: Product quantization for AMM
- VectorQuantization: Vector quantization
- INT8: 8-bit integer quantization
- CoOccurringFD: Co-occurring Frequent Directions
- BetaCoOFD: Beta Co-occurring Frequent Directions
- BlockLRA: Block Low-Rank Approximation
- CLMM: Clustered Low-rank Matrix Multiplication
- SMPCA: Symmetric Matrix PCA
- WeightedCR: Weighted Cross-Ranking
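To give a feel for how sketch-based AMM works, here is a NumPy illustration of the CountSketch idea: compress the shared inner dimension with a random sparse sign matrix `S`, then multiply the sketched factors. This is an illustrative sketch only, not this package's C++ implementation.

```python
import numpy as np

def countsketch_matrix(n: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Dense CountSketch matrix S (k x n): one random +/-1 entry per column."""
    S = np.zeros((k, n))
    rows = rng.integers(0, k, size=n)        # hash bucket for each coordinate
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign for each coordinate
    S[rows, np.arange(n)] = signs
    return S

def countsketch_amm(A: np.ndarray, B: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Approximate A @ B by sketching the shared inner dimension to size k."""
    rng = np.random.default_rng(seed)
    S = countsketch_matrix(A.shape[1], k, rng)
    # E[S.T @ S] = I, so (A S^T)(S B) is an unbiased estimate of A B.
    return (A @ S.T) @ (S @ B)

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 400))
B = rng.standard_normal((400, 200))
exact = A @ B

# Dense random matrices are close to a worst case for sketching;
# the error shrinks roughly like sqrt(1/k) as the sketch size grows.
errs = {}
for k in (500, 8000):
    approx = countsketch_amm(A, B, k)
    errs[k] = float(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
    print(f"k={k}: relative error {errs[k]:.3f}")
```

The other algorithms in the list make different trade-offs (sampling rows/columns, quantizing entries, or exploiting low-rank structure), but all follow the same pattern of replacing the exact inner-dimension contraction with a cheaper surrogate.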
```bash
# Install CPU-only version
pip install isage-amms
```

This installs the CPU-only version with all core AMM algorithms.
Prerequisites:
- CMake >= 3.14
- C++14 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
- Python 3.8-3.12
- PyTorch >= 2.0.0
- 64GB+ RAM recommended for building
```bash
# Clone repository
git clone https://github.com/intellistream/sage-amms.git
cd sage-amms

# CPU-only build
pip install -e .

# CUDA-enabled build
AMMS_ENABLE_CUDA=1 pip install -e .

# Explicitly disable CUDA
AMMS_ENABLE_CUDA=0 pip install -e .
```

`AMMS_ENABLE_CUDA` is an explicit switch: set it to `1` to enable CUDA and to `0` to force a CPU-only build.
```bash
# Enable CUDA
AMMS_ENABLE_CUDA=1 pip install isage-amms --no-binary :all:

# Force CPU-only build
AMMS_ENABLE_CUDA=0 pip install isage-amms --no-binary :all:

# Specify CUDA path
CUDA_HOME=/usr/local/cuda AMMS_ENABLE_CUDA=1 pip install isage-amms --no-binary :all:
```

Build mode is now selected automatically by a memory probe:

- If available memory >= `AMMS_FAST_BUILD_MEMORY_GB` (default 48), the fast build is enabled.
- Otherwise, low-memory mode is enabled to reduce OOM risk.
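For intuition, such a probe can be sketched in a few lines of stdlib Python on Linux. This is an illustration of the decision rule, not the package's actual build code; only the `AMMS_FAST_BUILD_MEMORY_GB` variable name comes from the documentation above.

```python
import os

def available_memory_gb() -> float:
    """Available physical memory in GiB (Linux/Unix, via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    avail_pages = os.sysconf("SC_AVPHYS_PAGES")
    return page_size * avail_pages / (1024 ** 3)

def use_fast_build() -> bool:
    """Enable the fast build only when available memory meets the threshold."""
    threshold = float(os.environ.get("AMMS_FAST_BUILD_MEMORY_GB", "48"))
    return available_memory_gb() >= threshold

print("fast build:", use_fast_build())
```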
```bash
# Default behavior (auto memory probe)
pip install -e .
```

You can still set the build mode explicitly:

```bash
AMMS_LOW_MEMORY_BUILD=1 pip install isage-amms --no-binary :all:
```

If you have enough RAM and want faster compilation:

```bash
AMMS_FAST_BUILD=1 AMMS_MAX_JOBS=4 pip install -e .
```

Adjust the auto fast-build threshold:

```bash
AMMS_FAST_BUILD_MEMORY_GB=64 pip install -e .
```
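How a build script might resolve the parallel-job count from these variables can be sketched as follows. The precedence shown (explicit `AMMS_MAX_JOBS` wins, low-memory mode defaults to 1 job) mirrors the behavior described here, but the function itself is illustrative, not the package's actual logic.

```python
import os

def resolve_build_jobs(env: dict) -> int:
    """Pick a parallel-job count from AMMS_* variables (illustrative logic)."""
    if "AMMS_MAX_JOBS" in env:                 # explicit override always wins
        return max(1, int(env["AMMS_MAX_JOBS"]))
    if env.get("AMMS_LOW_MEMORY_BUILD") == "1":
        return 1                               # low-memory mode: 1 job by default
    return os.cpu_count() or 1                 # otherwise use all available cores

print(resolve_build_jobs({"AMMS_LOW_MEMORY_BUILD": "1"}))  # 1
print(resolve_build_jobs({"AMMS_MAX_JOBS": "4"}))          # 4
```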
```bash
# Navigate to the amms directory
cd packages/sage-libs/src/sage/libs/amms

# Quick build
./quick_build.sh

# Or use the full build script with options
./publish_to_pypi.sh --build-only --low-memory

# Install locally
pip install dist/isage_amms-*.whl
```

The build system supports various options:
```bash
# Low-memory build (default)
export AMMS_LOW_MEMORY_BUILD=1

# Default parallelism is 1 job in low-memory mode
export AMMS_MAX_JOBS=1

# Enable CUDA support
export AMMS_ENABLE_CUDA=1
export CUDA_HOME=/usr/local/cuda

# Override parallel jobs when memory is sufficient
export AMMS_MAX_JOBS=4
```

For maintainers who want to build and publish to PyPI:
```bash
# Build only (dry-run, no upload)
./publish_to_pypi.sh

# Build and upload to TestPyPI
./publish_to_pypi.sh --test-pypi --no-dry-run

# Build and upload to PyPI (production)
./publish_to_pypi.sh --no-dry-run

# With CUDA and low-memory options
./publish_to_pypi.sh --cuda --low-memory --no-dry-run
```

See BUILD_PUBLISH.md for comprehensive build and publish documentation.
```python
from sage.libs.amms import create_amm_index

# Create an AMM index using the factory
amm = create_amm_index("countsketch", config={
    "sketch_size": 1000,
    "hash_functions": 5,
})

# Perform approximate matrix multiplication
result = amm.multiply(matrix_a, matrix_b)
```

```python
from sage.libs.amms.wrappers.pyamm import PyAMM

# Create a specific AMM algorithm instance
amm = PyAMM.CountSketch(sketch_size=1000)

# Use the algorithm
result = amm.multiply(matrix_a, matrix_b)
```

For benchmarking AMM algorithms, see the sage-benchmark package:
```bash
# Run AMM benchmarks
sage-dev benchmark amm --algorithms countsketch,fastjlt --datasets dataset1
```

See packages/sage-benchmark/src/sage/benchmark/benchmark_libamm/README.md for details.
```bash
# Build-path matrix + perf baseline regression
pytest -q tests/test_issue6_build_matrix_and_perf_baseline.py

# CUDA/CPU switch cleanup regression
pytest -q tests/test_issue5_cuda_cpu_switch_cleanup.py
```

This module is refactored from the original libamm submodule:
- Algorithm implementations: Moved from `libamm/include/CPPAlgos` and `libamm/src/CPPAlgos` to `amms/implementations/`
- Benchmarking code: Moved from `libamm/benchmark/` to `sage-benchmark/benchmark_libamm/`
- Python bindings: Refactored into `amms/wrappers/pyamm/`
- Interface layer: New unified interface similar to ANNS
AMMS follows SAGE's architecture principles:
- Layer 3 (L3-libs): Algorithm implementations and interfaces
- Separation of concerns: Core algorithms (amms/) vs benchmarking (benchmark_libamm/)
- Unified interfaces: Factory pattern for algorithm creation
- Modular design: Independent wrappers for different algorithm families
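The factory pattern mentioned above can be sketched as a decorator-based registry. This is a minimal illustration of the idea; the actual registry lives in the package's interface layer, and the names `register`, `registered`, and `create` here are assumptions, not guaranteed API.

```python
# Minimal sketch of a decorator-based algorithm registry (illustrative only).
_REGISTRY = {}

def register(name):
    """Class decorator that records an algorithm class under a string key."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

def registered():
    """List the names of all registered algorithms."""
    return sorted(_REGISTRY)

def create(name, **kwargs):
    """Factory: instantiate a registered algorithm by name."""
    return _REGISTRY[name](**kwargs)

@register("countsketch")
class CountSketchAMM:
    def __init__(self, sketch_size=1000):
        self.sketch_size = sketch_size

print(registered())                        # ['countsketch']
amm = create("countsketch", sketch_size=500)
print(amm.sketch_size)                     # 500
```

Keeping registration in one place is what lets `create(...)` stay stable while new algorithm families are added independently.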
- Original LibAMM paper and documentation
- PyTorch integration guide
- AMM algorithm theory and applications
When adding new AMM algorithms:

- Add the C++ implementation to `implementations/include/CPPAlgos/` and `implementations/src/CPPAlgos/`
- Create a Python wrapper in `wrappers/`
- Register the algorithm in `interface/registry.py`
- Add tests in `sage-libs/tests/`
- Add a benchmark configuration in `sage-benchmark/benchmark_libamm/`
See CONTRIBUTING.md at project root for detailed guidelines.