[MLIR][NVVM] Update mbarrier.init/inval Ops to use AnyTypeOf[]#165558
Merged
durga4github merged 1 commit intoOct 31, 2025
Conversation
Member
|
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-mlir-gpu Author: Durgadoss R (durga4github) ChangesThis patch updates the mbarrier.init/inval Ops
Full diff: https://github.com/llvm/llvm-project/pull/165558.diff 6 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 4f483859ac18d..b572ef9c1d07b 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -579,7 +579,8 @@ def NVVM_PMEventOp : NVVM_PTXBuilder_Op<"pmevent">,
/// mbarrier.init instruction with generic pointer type
def NVVM_MBarrierInitOp : NVVM_PTXBuilder_Op<"mbarrier.init">,
- Arguments<(ins LLVM_AnyPointer:$addr, I32:$count, PtxPredicate:$predicate)> {
+ Arguments<(ins AnyTypeOf<[LLVM_PointerGeneric, LLVM_PointerShared]>:$addr,
+ I32:$count, PtxPredicate:$predicate)> {
let summary = "MBarrier Initialization Op";
let description = [{
The `nvvm.mbarrier.init` operation initializes an *mbarrier object* at the specified
@@ -592,48 +593,35 @@ def NVVM_MBarrierInitOp : NVVM_PTXBuilder_Op<"mbarrier.init">,
- Transaction count (tx-count): 0
The operation takes the following operands:
- - `addr`: A pointer to the memory location of the *mbarrier object*. Uses generic
- addressing, but the address must still be in the shared memory space.
+ - `addr`: A pointer to the memory location of the *mbarrier object*. The `addr`
+ must be a pointer to generic or shared::cta memory. When it is generic, the
+ underlying address must be within the shared::cta memory space; otherwise
+ the behavior is undefined.
- `count`: Integer specifying the number of threads that will participate in barrier
synchronization. Must be in the range [1, 2²⁰ - 1].
- `predicate`: Optional predicate for conditional execution.
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-init)
}];
- string llvmBuilder = [{
- createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_init, {$addr, $count});
- }];
let assemblyFormat = "$addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)";
+
let extraClassDeclaration = [{
bool hasIntrinsic() { if(getPredicate()) return false; return true; }
- }];
- let extraClassDefinition = [{
- std::string $cppClass::getPtx() { return std::string("mbarrier.init.b64 [%0], %1;"); }
- }];
-}
-/// mbarrier.init instruction with shared pointer type
-def NVVM_MBarrierInitSharedOp : NVVM_PTXBuilder_Op<"mbarrier.init.shared", [NVVMRequiresSM<80>, DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]>,
- Arguments<(ins LLVM_PointerShared:$addr, I32:$count, PtxPredicate:$predicate)> {
- let summary = "Shared MBarrier Initialization Op";
- let description = [{
- This Op is the same as `nvvm.mbarrier.init` except that the *mbarrier object*
- should be accessed using a shared-memory pointer instead of a generic-memory pointer.
-
- [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-init)
+ static mlir::NVVM::IDArgPair
+ getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+ llvm::IRBuilderBase& builder);
}];
+
string llvmBuilder = [{
- createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_init_shared, {$addr, $count});
- }];
- let assemblyFormat = "$addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)";
- let extraClassDeclaration = "bool hasIntrinsic() { return !getPredicate(); }";
- let extraClassDefinition = [{
- std::string $cppClass::getPtx() { return std::string("mbarrier.init.shared.b64 [%0], %1;"); }
+ auto [id, args] = NVVM::MBarrierInitOp::getIntrinsicIDAndArgs(
+ *op, moduleTranslation, builder);
+ createIntrinsicCall(builder, id, args);
}];
}
def NVVM_MBarrierInvalOp : NVVM_Op<"mbarrier.inval">,
- Arguments<(ins LLVM_AnyPointer:$addr)> {
+ Arguments<(ins AnyTypeOf<[LLVM_PointerGeneric, LLVM_PointerShared]>:$addr)> {
let summary = "MBarrier Invalidation Operation";
let description = [{
The `nvvm.mbarrier.inval` operation invalidates an *mbarrier object* at the
@@ -644,30 +632,27 @@ def NVVM_MBarrierInvalOp : NVVM_Op<"mbarrier.inval">,
It is undefined behavior if the *mbarrier object* is already invalid.
The operation takes the following operand:
- - `addr`: A pointer to the memory location of the *mbarrier object*. Uses generic
- addressing, but the address must still be in the shared memory space.
+ - `addr`: A pointer to the memory location of the *mbarrier object*. The `addr`
+ must be a pointer to generic or shared::cta memory. When it is generic, the
+ underlying address must be within the shared::cta memory space; otherwise
+ the behavior is undefined.
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-inval)
}];
- string llvmBuilder = [{
- createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_inval, {$addr});
- }];
- let assemblyFormat = "$addr attr-dict `:` type(operands)";
-}
-def NVVM_MBarrierInvalSharedOp : NVVM_Op<"mbarrier.inval.shared">,
- Arguments<(ins LLVM_PointerShared:$addr)> {
- let summary = "Shared MBarrier Invalidation Operation";
- let description = [{
- This Op is the same as `nvvm.mbarrier.inval` except that the *mbarrier object*
- should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+ let assemblyFormat = "$addr attr-dict `:` type(operands)";
- [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-inval)
+ let extraClassDeclaration = [{
+ static mlir::NVVM::IDArgPair
+ getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+ llvm::IRBuilderBase& builder);
}];
+
string llvmBuilder = [{
- createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_inval_shared, {$addr});
+ auto [id, args] = NVVM::MBarrierInvalOp::getIntrinsicIDAndArgs(
+ *op, moduleTranslation, builder);
+ createIntrinsicCall(builder, id, args);
}];
- let assemblyFormat = "$addr attr-dict `:` type(operands)";
}
def NVVM_MBarrierArriveOp : NVVM_Op<"mbarrier.arrive">,
diff --git a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
index a9efada28a320..ec182f1db48ac 100644
--- a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
+++ b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
@@ -846,13 +846,8 @@ struct NVGPUMBarrierInitLowering
Value barrier = getMbarrierPtr(b, mbarrierType, adaptor.getBarriers(),
adaptor.getMbarId(), rewriter);
Value count = truncToI32(b, adaptor.getCount());
- if (isMbarrierShared(mbarrierType)) {
- rewriter.replaceOpWithNewOp<NVVM::MBarrierInitSharedOp>(
- op, barrier, count, adaptor.getPredicate());
- } else {
- rewriter.replaceOpWithNewOp<NVVM::MBarrierInitOp>(op, barrier, count,
- adaptor.getPredicate());
- }
+ rewriter.replaceOpWithNewOp<NVVM::MBarrierInitOp>(op, barrier, count,
+ adaptor.getPredicate());
return success();
}
};
diff --git a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
index f0de4dbcc1d4b..53a6f43c0bbcf 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
@@ -1607,10 +1607,53 @@ void Tcgen05MmaSmemDescOp::createSmemDescriptor(Operation &op,
mt.mapValue(thisOp.getRes()) = smemDesc;
}
+//===----------------------------------------------------------------------===//
+// getPtx methods
+//===----------------------------------------------------------------------===//
+
+std::string NVVM::MBarrierInitOp::getPtx() {
+ unsigned addressSpace =
+ llvm::cast<LLVM::LLVMPointerType>(getAddr().getType()).getAddressSpace();
+ return (addressSpace == NVVMMemorySpace::Shared)
+ ? std::string("mbarrier.init.shared.b64 [%0], %1;")
+ : std::string("mbarrier.init.b64 [%0], %1;");
+}
+
//===----------------------------------------------------------------------===//
// getIntrinsicID/getIntrinsicIDAndArgs methods
//===----------------------------------------------------------------------===//
+mlir::NVVM::IDArgPair MBarrierInitOp::getIntrinsicIDAndArgs(
+ Operation &op, LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder) {
+ auto thisOp = cast<NVVM::MBarrierInitOp>(op);
+ unsigned addressSpace =
+ llvm::cast<LLVM::LLVMPointerType>(thisOp.getAddr().getType())
+ .getAddressSpace();
+ llvm::Intrinsic::ID id = (addressSpace == NVVMMemorySpace::Shared)
+ ? llvm::Intrinsic::nvvm_mbarrier_init_shared
+ : llvm::Intrinsic::nvvm_mbarrier_init;
+
+ // Fill the Intrinsic Args
+ llvm::SmallVector<llvm::Value *> args;
+ args.push_back(mt.lookupValue(thisOp.getAddr()));
+ args.push_back(mt.lookupValue(thisOp.getCount()));
+
+ return {id, std::move(args)};
+}
+
+mlir::NVVM::IDArgPair MBarrierInvalOp::getIntrinsicIDAndArgs(
+ Operation &op, LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder) {
+ auto thisOp = cast<NVVM::MBarrierInvalOp>(op);
+ unsigned addressSpace =
+ llvm::cast<LLVM::LLVMPointerType>(thisOp.getAddr().getType())
+ .getAddressSpace();
+ llvm::Intrinsic::ID id = (addressSpace == NVVMMemorySpace::Shared)
+ ? llvm::Intrinsic::nvvm_mbarrier_inval_shared
+ : llvm::Intrinsic::nvvm_mbarrier_inval;
+
+ return {id, {mt.lookupValue(thisOp.getAddr())}};
+}
+
#define CP_ASYNC_ID_IMPL(mod, size, suffix) \
llvm::Intrinsic::nvvm_cp_async_##mod##_shared_global_##size##suffix
diff --git a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
index 5755ca9258283..8cce6308018e2 100644
--- a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+++ b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
@@ -486,7 +486,7 @@ func.func @mbarrier() {
// CHECK: %[[barStr:.+]] = builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
- // CHECK: nvvm.mbarrier.init.shared %[[barPtr]]
+ // CHECK: nvvm.mbarrier.init %[[barPtr]]
nvgpu.mbarrier.init %barrier[%c0], %num_threads : !barrierType
// CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
@@ -516,7 +516,7 @@ func.func @mbarrier_nocomplete() {
// CHECK: %[[barStr:.+]] = builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
- // CHECK: nvvm.mbarrier.init.shared %[[barPtr]]
+ // CHECK: nvvm.mbarrier.init %[[barPtr]]
nvgpu.mbarrier.init %barrier[%c0], %num_threads : !barrierType
// CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
@@ -592,7 +592,7 @@ func.func @mbarrier_txcount() {
// CHECK: %[[barStr:.+]] = builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
- // CHECK: nvvm.mbarrier.init.shared %[[barPtr]]
+ // CHECK: nvvm.mbarrier.init %[[barPtr]]
nvgpu.mbarrier.init %barrier[%c0], %num_threads : !barrierType
%tidxreg = nvvm.read.ptx.sreg.tid.x : i32
@@ -643,7 +643,7 @@ func.func @mbarrier_txcount_pred() {
// CHECK: %[[barStr:.+]] = builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
- // CHECK: nvvm.mbarrier.init.shared %[[barPtr]], {{.*}}, predicate = %[[P]]
+ // CHECK: nvvm.mbarrier.init %[[barPtr]], {{.*}}, predicate = %[[P]]
nvgpu.mbarrier.init %barrier[%c0], %mine, predicate = %pred : !barrierType
%txcount = arith.constant 256 : index
diff --git a/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir b/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
index 6960e83be3573..fbc4c0af60360 100644
--- a/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
+++ b/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
@@ -8,7 +8,7 @@
// CHECK-LABEL: @init_mbarrier
llvm.func @init_mbarrier(%barrier_gen : !llvm.ptr, %barrier : !llvm.ptr<3>, %count : i32, %pred : i1) {
//CHECK: llvm.inline_asm has_side_effects asm_dialect = att "@$2 mbarrier.init.shared.b64 [$0], $1;", "r,r,b"
- nvvm.mbarrier.init.shared %barrier, %count, predicate = %pred : !llvm.ptr<3>, i32, i1
+ nvvm.mbarrier.init %barrier, %count, predicate = %pred : !llvm.ptr<3>, i32, i1
//CHECK: llvm.inline_asm has_side_effects asm_dialect = att "@$2 mbarrier.init.b64 [$0], $1;", "l,r,b"
nvvm.mbarrier.init %barrier_gen, %count, predicate = %pred : !llvm.ptr, i32, i1
llvm.return
diff --git a/mlir/test/Dialect/LLVMIR/nvvm.mlir b/mlir/test/Dialect/LLVMIR/nvvm.mlir
index 0243f5eb8c862..2505e56407c2b 100644
--- a/mlir/test/Dialect/LLVMIR/nvvm.mlir
+++ b/mlir/test/Dialect/LLVMIR/nvvm.mlir
@@ -419,8 +419,8 @@ llvm.func private @mbarrier_init_generic(%barrier: !llvm.ptr) {
llvm.func private @mbarrier_init_shared(%barrier: !llvm.ptr<3>) {
%count = nvvm.read.ptx.sreg.ntid.x : i32
- // CHECK: nvvm.mbarrier.init.shared %{{.*}}, %{{.*}} : !llvm.ptr<3>, i32
- nvvm.mbarrier.init.shared %barrier, %count : !llvm.ptr<3>, i32
+ // CHECK: nvvm.mbarrier.init %{{.*}}, %{{.*}} : !llvm.ptr<3>, i32
+ nvvm.mbarrier.init %barrier, %count : !llvm.ptr<3>, i32
llvm.return
}
@@ -433,8 +433,8 @@ llvm.func private @mbarrier_inval_generic(%barrier: !llvm.ptr) {
llvm.func private @mbarrier_inval_shared(%barrier: !llvm.ptr<3>) {
- // CHECK: nvvm.mbarrier.inval.shared %{{.*}} : !llvm.ptr<3>
- nvvm.mbarrier.inval.shared %barrier : !llvm.ptr<3>
+ // CHECK: nvvm.mbarrier.inval %{{.*}} : !llvm.ptr<3>
+ nvvm.mbarrier.inval %barrier : !llvm.ptr<3>
llvm.return
}
|
2b65afc to
974d5e7
Compare
This patch updates the mbarrier.init/inval Ops to use the AnyTypeOf[] construct for their `addr` argument. This enables us to have a single Op that can take a pointer in either generic or shared memory space and generate the right intrinsics during the lowering. * Existing tests are updated. * Locally verified that there are no new regressions in the Integration tests. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
974d5e7 to
b2601d7
Compare
grypp
approved these changes
Oct 30, 2025
DEBADRIBASAK
pushed a commit
to DEBADRIBASAK/llvm-project
that referenced
this pull request
Nov 3, 2025
…165558) This patch updates the mbarrier.init/inval Ops to use the AnyTypeOf[] construct for their `addr` argument. This enables us to have a single Op that can take a pointer in either generic or shared memory space and generate the right intrinsics during the lowering. * Updated existing tests accordingly. * Verified locally that there are no new regressions in `integration` tests. * TODO: Additional updates for the remaining mbarrier Ops are in progress. These will be refactored in subsequent patches. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
durga4github
added a commit
to durga4github/llvm-project
that referenced
this pull request
Nov 5, 2025
This is a follow up of PR llvm#165558. (1/n) This patch updates the below mbarrier Ops to use AnyTypeOf[] construct: * mbarrier.arrive * mbarrier.arrive.noComplete * mbarrier.test.wait * cp.async.mbarrier.arrive * Updated existing tests accordingly. * Verified locally that there are no new regressions in the `integration` tests. * TODO: A few more Ops are remaining and will be migrated in a subsequent PR. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
durga4github
added a commit
that referenced
this pull request
Nov 5, 2025
This is a follow up of PR #165558. (1/n) This patch updates the below mbarrier Ops to use AnyTypeOf[] construct: ``` * mbarrier.arrive * mbarrier.arrive.noComplete * mbarrier.test.wait * cp.async.mbarrier.arrive ``` * Updated existing tests accordingly. * Verified locally that there are no new regressions in the `integration` tests. * TODO: Two more Ops remain and will be migrated in a subsequent PR. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
git-crd
pushed a commit
to git-crd/crd-llvm-project
that referenced
this pull request
Nov 13, 2025
This is a follow-up of PR llvm#165558 and llvm#165993. This patch updates the remaining two Ops to use the AnyTypeOf[] construct, completing the migration for the mbarrier family of Ops. ``` mbarrier.arrive.expect_tx mbarrier.try_wait.parity ``` Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch updates the mbarrier.init/inval Ops
to use the AnyTypeOf[] construct for their
addrargument. This enables us to have asingle Op that can take a pointer in either
generic or shared memory space and generate the
right intrinsics during the lowering.
integrationtests.These will be refactored in subsequent patches.