feat(ops): add RmsNorm with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support #6

Open
zhangyue207 wants to merge 5 commits into feat/dev-infra from feat/dev-rmsnorm-cuda

@zhangyue207

  • Add 'RmsNorm' operator with 'CPU', 'NVIDIA', and 'Iluvatar' implementations
  • Support fp32/fp16/bf16 on NVIDIA and Iluvatar; fp32 only on CPU
  • Add shared CUDA kernel (kernel.cuh) and backend-specific wrappers
  • Extend generate_wrappers.py and CMake for RmsNorm
  • Add tests covering backends and dtypes
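
For reviewers who want a reference point, here is a minimal pure-Python sketch of the RMS normalization that an operator named `RmsNorm` conventionally computes: each element is divided by the root mean square of its row and then scaled by a per-element weight. The function name `rms_norm`, its signature, and the default `eps` are assumptions made for illustration; they are not taken from this PR's kernel or its Python bindings.

```python
import math
from typing import Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Reference RMS normalization over a single row (hypothetical helper).

    Computes y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
    The default eps is an assumed value, not the one used in the PR.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    if len(x) == 0:
        raise ValueError("x must not be empty")

    # Mean of squares over the normalized dimension.
    mean_sq = sum(v * v for v in x) / len(x)

    # Reciprocal of the root mean square; eps guards against division by zero.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)

    # Normalize, then apply the per-element scale.
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

A reference like this runs in double precision, so comparing it against fp16 or bf16 kernel output would need looser tolerances than an fp32 comparison.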

@zhangyue207 zhangyue207 changed the base branch from master to feat/dev-infra March 2, 2026 02:52
@zhangyue207 zhangyue207 force-pushed the feat/dev-rmsnorm-cuda branch from ea03f0f to 10187f4 Compare March 2, 2026 03:22
@zhangyue207 zhangyue207 changed the title feat(ops): add RmsNorm with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support feat(ops): add 'RmsNorm' with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support Mar 2, 2026
@zhangyue207 zhangyue207 changed the title feat(ops): add 'RmsNorm' with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support feat(ops): add RmsNorm with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support Mar 2, 2026

zhangyue207 commented Mar 2, 2026

Iluvatar

root@iluvatar:/workspace/InfiniOps# pytest
==================================== test session starts ====================================
platform linux -- Python 3.10.18, pytest-9.0.2, pluggy-1.6.0
rootdir: /workspace/InfiniOps
configfile: pyproject.toml
plugins: anyio-4.9.0, cov-7.0.0, xdist-3.8.0, typeguard-4.4.4
collected 572 items                                                                         

tests/test_add.py ....................................                                [  6%]
tests/test_gemm.py .................................................................. [ 17%]
..................................................................................... [ 32%]
..................................................................................... [ 47%]
..................................................................................... [ 62%]
..................................................................................... [ 77%]
..................................................................................... [ 92%]
.........                                                                             [ 93%]
tests/test_rms_norm.py ....................................                           [100%]

==================================== 572 passed in 1.61s ====================================

@zhangyue207

NVIDIA

(python3.10) zhangyue@server:~/InfiniOps$ pytest
========================================== test session starts ==========================================
platform linux -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/zhangyue/InfiniOps
configfile: pyproject.toml
plugins: xdist-3.8.0, cov-7.0.0
collected 572 items                                                                                     

tests/test_add.py ....................................                                            [  6%]
tests/test_gemm.py .............................................................................. [ 19%]
................................................................................................. [ 36%]
................................................................................................. [ 53%]
................................................................................................. [ 70%]
................................................................................................. [ 87%]
..................................                                                                [ 93%]
tests/test_rms_norm.py ....................................                                       [100%]

========================================== 572 passed in 2.20s ==========================================
