Skip to content

[OpenACC][CIR] Implement 'alloca copying' for private lowering#161382

Merged
erichkeane merged 2 commits into
llvm:mainfrom
erichkeane:PrivatePointerLowering_Copies
Oct 1, 2025
Merged

[OpenACC][CIR] Implement 'alloca copying' for private lowering#161382
erichkeane merged 2 commits into
llvm:mainfrom
erichkeane:PrivatePointerLowering_Copies

Conversation

@erichkeane

Copy link
Copy Markdown
Contributor

The previous patch ensured that we correctly got the allocas put in place. This patch takes the address of each element of each alloca, and copies it to the previous one. This allows us to re-form the pointer-structure for a recipe.

The previous patch ensured that we correctly got the allocas put in
place.  This patch takes the address of each element of each alloca, and
copies it to the previous one.  This allows us to re-form the
pointer-structure for a recipe.
@llvmbot llvmbot added clang Clang issues not falling into any other category ClangIR Anything related to the ClangIR project labels Sep 30, 2025
@llvmbot

llvmbot commented Sep 30, 2025

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-clangir

Author: Erich Keane (erichkeane)

Changes

The previous patch ensured that we correctly got the allocas put in place. This patch takes the address of each element of each alloca, and copies it to the previous one. This allows us to re-form the pointer-structure for a recipe.


Patch is 200.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161382.diff

8 Files Affected:

  • (modified) clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp (+90-8)
  • (modified) clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h (+4)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp (+489)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-NoOps.cpp (+485)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-int.cpp (+487-1)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-recipes-CtorDtor.cpp (+266)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-recipes-NoOps.cpp (+268)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-recipes-int.cpp (+267-1)
diff --git a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp
index e41c2d85fbd5d..c9316635ab5ba 100644
--- a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp
@@ -36,6 +36,77 @@ mlir::Block *OpenACCRecipeBuilderBase::createRecipeBlock(mlir::Region &region,
   llvm::SmallVector<mlir::Location> locs{types.size(), loc};
   return builder.createBlock(&region, region.end(), types, locs);
 }
+void OpenACCRecipeBuilderBase::makeAllocaCopy(mlir::Location loc,
+                                              mlir::Type copyType,
+                                              mlir::Value numEltsToCopy,
+                                              mlir::Value offsetPerSubarray,
+                                              mlir::Value destAlloca,
+                                              mlir::Value srcAlloca) {
+  mlir::OpBuilder::InsertionGuard guardCase(builder);
+
+  mlir::Type itrTy = cgf.cgm.convertType(cgf.getContext().UnsignedLongLongTy);
+  auto itrPtrTy = cir::PointerType::get(itrTy);
+  mlir::IntegerAttr itrAlign =
+      cgf.cgm.getSize(cgf.getContext().getTypeAlignInChars(
+          cgf.getContext().UnsignedLongLongTy));
+
+  auto loopBuilder = [&]() {
+    auto itr =
+        cir::AllocaOp::create(builder, loc, itrPtrTy, itrTy, "itr", itrAlign);
+    cir::ConstantOp constZero = builder.getConstInt(loc, itrTy, 0);
+    builder.CIRBaseBuilderTy::createStore(loc, constZero.getResult(), itr);
+    builder.createFor(
+        loc,
+        /*condBuilder=*/
+        [&](mlir::OpBuilder &b, mlir::Location loc) {
+          // itr < numEltsToCopy
+          // Enforce a trip count of 1 if there wasn't any element count, this
+          // way we can just use this loop with a constant bounds instead of a
+          // separate code path.
+          if (!numEltsToCopy)
+            numEltsToCopy = builder.getConstInt(loc, itrTy, 1).getResult();
+
+          auto loadCur = cir::LoadOp::create(builder, loc, {itr});
+          auto cmp = builder.createCompare(loc, cir::CmpOpKind::lt,
+                                           loadCur.getResult(), numEltsToCopy);
+          builder.createCondition(cmp);
+        },
+        /*bodyBuilder=*/
+        [&](mlir::OpBuilder &b, mlir::Location loc) {
+          // destAlloca[itr] = srcAlloca[offsetPerSubArray * itr];
+          auto loadCur = cir::LoadOp::create(builder, loc, {itr});
+          auto srcOffset =
+              builder.createMul(loc, offsetPerSubarray, loadCur.getResult());
+
+          auto ptrToOffsetIntoSrc = cir::PtrStrideOp::create(
+              builder, loc, copyType, srcAlloca, srcOffset);
+
+          auto offsetIntoDecayDest = cir::PtrStrideOp::create(
+              builder, loc, builder.getPointerTo(copyType), destAlloca,
+              loadCur.getResult());
+
+          builder.CIRBaseBuilderTy::createStore(loc, ptrToOffsetIntoSrc,
+                                                offsetIntoDecayDest);
+          builder.createYield(loc);
+        },
+        /*stepBuilder=*/
+        [&](mlir::OpBuilder &b, mlir::Location loc) {
+          // Simple increment of the iterator.
+          auto load = cir::LoadOp::create(builder, loc, {itr});
+          auto inc =
+              cir::UnaryOp::create(builder, loc, load.getType(),
+                                   cir::UnaryOpKind::Inc, load.getResult());
+          builder.CIRBaseBuilderTy::createStore(loc, inc.getResult(), itr);
+          builder.createYield(loc);
+        });
+  };
+
+  cir::ScopeOp::create(builder, loc,
+                       [&](mlir::OpBuilder &b, mlir::Location loc) {
+                         loopBuilder();
+                         builder.createYield(loc);
+                       });
+}
 
 mlir::Value OpenACCRecipeBuilderBase::makeBoundsAlloca(
     mlir::Block *block, SourceRange exprRange, mlir::Location loc,
@@ -78,6 +149,10 @@ mlir::Value OpenACCRecipeBuilderBase::makeBoundsAlloca(
 
   bool lastBoundWasArray = isArrayTy(boundTypes.back());
 
+  // Make sure we track a moving version of this so we can get our
+  // 'copying' back to correct.
+  mlir::Value lastAlloca = initialAlloca;
+
   // Since we're iterating the types in reverse, this sets up for each index
   // corresponding to the boundsRange to be the 'after application of the
   // bounds.
@@ -125,14 +200,21 @@ mlir::Value OpenACCRecipeBuilderBase::makeBoundsAlloca(
 
       mlir::Type eltTy = cgf.convertType(resultType);
       cir::PointerType ptrTy = builder.getPointerTo(eltTy);
-      builder.createAlloca(loc, ptrTy, eltTy, "openacc.init.bounds",
-                           cgf.getContext().getTypeAlignInChars(resultType),
-                           curSize);
-
-      // TODO: OpenACC : At this point we should be copying the addresses of
-      // each element of this to the last allocation.  At the moment, that is
-      // not yet implemented.
-      cgf.cgm.errorNYI(exprRange, "OpenACC recipe alloca copying");
+      mlir::Value curAlloca = builder.createAlloca(
+          loc, ptrTy, eltTy, "openacc.init.bounds",
+          cgf.getContext().getTypeAlignInChars(resultType), curSize);
+
+      makeAllocaCopy(loc, ptrTy, cumulativeElts, eltsPerSubArray, lastAlloca,
+                     curAlloca);
+      lastAlloca = curAlloca;
+    } else {
+      // In the case of an array, we just need to decay the pointer, so just do
+      // a zero-offset stride on the last alloca to decay it down an array
+      // level.
+      cir::ConstantOp constZero = builder.getConstInt(loc, itrTy, 0);
+      lastAlloca = builder.getArrayElement(
+          loc, loc, lastAlloca, cgf.convertType(resultType),
+          constZero.getResult(), /*shouldDecay=*/true);
     }
 
     cumulativeElts = eltsToAlloca;
diff --git a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h
index acd187bbaee0d..d802ccb151b64 100644
--- a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h
+++ b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h
@@ -24,6 +24,10 @@
 
 namespace clang::CIRGen {
 class OpenACCRecipeBuilderBase {
+  // makes the copy of the addresses of an alloca to the previous allocation.
+  void makeAllocaCopy(mlir::Location loc, mlir::Type copyType,
+                      mlir::Value numEltsToCopy, mlir::Value offsetPerSubarray,
+                      mlir::Value destAlloca, mlir::Value srcAlloca);
   // This function generates the required alloca, similar to
   // 'emitAutoVarAlloca', except for the OpenACC array/pointer types.
   mlir::Value makeBoundsAlloca(mlir::Block *block, SourceRange exprRange,
diff --git a/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp b/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
index 52bcd7cd539f2..b6e5859532188 100644
--- a/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
+++ b/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
@@ -20,6 +20,30 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ARR_SIZE:.*]] = cir.const #cir.int<4> : !u64i
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !rec_CtorDtor, !cir.ptr<!rec_CtorDtor>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 4 : i64}
+//
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!rec_CtorDtor>, %[[SRC_IDX]] : !u64i), !cir.ptr<!rec_CtorDtor>
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[TL_ALLOCA]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
 // TODO: Add Init here.
 // CHECK-NEXT: acc.yield
 // CHECK-NEXT: } destroy {
@@ -77,12 +101,59 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>> 
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[TL_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>> 
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
 // CHECK-NEXT: %[[UB1:.*]] = acc.get_upperbound %[[BOUND1]] : (!acc.data_bounds_ty) -> index
 // CHECK-NEXT: %[[UB1_CAST:.*]] = builtin.unrealized_conversion_cast %[[UB1]] : index to !u64i 
 // CHECK-NEXT: %[[NUM_ELTS:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[UB2_CAST]]) : !u64i
 // CHECK-NEXT: %[[ARR_SIZE:.*]] = cir.const #cir.int<4> : !u64i
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[NUM_ELTS]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA2:.*]] = cir.alloca !rec_CtorDtor, !cir.ptr<!rec_CtorDtor>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 4 : i64}
+//
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UB2_CAST]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA2]] : !cir.ptr<!rec_CtorDtor>, %[[SRC_IDX]] : !u64i), !cir.ptr<!rec_CtorDtor>
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
 // TODO: Add Init here.
 // CHECK-NEXT: acc.yield
 // CHECK-NEXT: } destroy {
@@ -165,6 +236,30 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB3_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB3_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>> 
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[TL_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>> 
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, !cir.ptr<!cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>> 
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
 // CHECK-NEXT: %[[UB2:.*]] = acc.get_upperbound %[[BOUND2]] : (!acc.data_bounds_ty) -> index
 // CHECK-NEXT: %[[UB2_CAST:.*]] = builtin.unrealized_conversion_cast %[[UB2]] : index to !u64i 
 // CHECK-NEXT: %[[NUM_ELTS:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[UB3_CAST]]) : !u64i
@@ -172,12 +267,60 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[NUM_ELTS]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA2:.*]] = cir.alloca !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UB3_CAST]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA2]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
 // CHECK-NEXT: %[[UB1:.*]] = acc.get_upperbound %[[BOUND1]] : (!acc.data_bounds_ty) -> index
 // CHECK-NEXT: %[[UB1_CAST:.*]] = builtin.unrealized_conversion_cast %[[UB1]] : index to !u64i 
 // CHECK-NEXT: %[[NUM_ELTS2:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[NUM_ELTS]]) : !u64i
 // CHECK-NEXT: %[[ARR_SIZE:.*]] = cir.const #cir.int<4> : !u64i
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[NUM_ELTS2]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA3:.*]] = cir.alloca !rec_CtorDtor, !cir.ptr<!rec_CtorDtor>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 4 : i64}
+//
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[NUM_ELTS]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA3]] : !cir.ptr<!rec_CtorDtor>, %[[SRC_IDX]] : !u64i), !cir.ptr<!rec_CtorDtor> 
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[ARR_ALLOCA2]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>> 
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>> 
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
+//
 // TODO: Add Init here.
 // CHECK-NEXT: acc.yield
 // CHECK-NEXT: } destroy {
@@ -279,12 +422,59 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>> ...
[truncated]

@llvmbot

llvmbot commented Sep 30, 2025

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-clang

Author: Erich Keane (erichkeane)

Changes

The previous patch ensured that we correctly got the allocas put in place. This patch takes the address of each element of each alloca, and copies it to the previous one. This allows us to re-form the pointer-structure for a recipe.


Patch is 200.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161382.diff

8 Files Affected:

  • (modified) clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp (+90-8)
  • (modified) clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h (+4)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp (+489)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-NoOps.cpp (+485)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-int.cpp (+487-1)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-recipes-CtorDtor.cpp (+266)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-recipes-NoOps.cpp (+268)
  • (modified) clang/test/CIR/CodeGenOpenACC/private-clause-pointer-recipes-int.cpp (+267-1)
diff --git a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp
index e41c2d85fbd5d..c9316635ab5ba 100644
--- a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp
@@ -36,6 +36,77 @@ mlir::Block *OpenACCRecipeBuilderBase::createRecipeBlock(mlir::Region &region,
   llvm::SmallVector<mlir::Location> locs{types.size(), loc};
   return builder.createBlock(&region, region.end(), types, locs);
 }
+void OpenACCRecipeBuilderBase::makeAllocaCopy(mlir::Location loc,
+                                              mlir::Type copyType,
+                                              mlir::Value numEltsToCopy,
+                                              mlir::Value offsetPerSubarray,
+                                              mlir::Value destAlloca,
+                                              mlir::Value srcAlloca) {
+  mlir::OpBuilder::InsertionGuard guardCase(builder);
+
+  mlir::Type itrTy = cgf.cgm.convertType(cgf.getContext().UnsignedLongLongTy);
+  auto itrPtrTy = cir::PointerType::get(itrTy);
+  mlir::IntegerAttr itrAlign =
+      cgf.cgm.getSize(cgf.getContext().getTypeAlignInChars(
+          cgf.getContext().UnsignedLongLongTy));
+
+  auto loopBuilder = [&]() {
+    auto itr =
+        cir::AllocaOp::create(builder, loc, itrPtrTy, itrTy, "itr", itrAlign);
+    cir::ConstantOp constZero = builder.getConstInt(loc, itrTy, 0);
+    builder.CIRBaseBuilderTy::createStore(loc, constZero.getResult(), itr);
+    builder.createFor(
+        loc,
+        /*condBuilder=*/
+        [&](mlir::OpBuilder &b, mlir::Location loc) {
+          // itr < numEltsToCopy
+          // Enforce a trip count of 1 if there wasn't any element count, this
+          // way we can just use this loop with a constant bounds instead of a
+          // separate code path.
+          if (!numEltsToCopy)
+            numEltsToCopy = builder.getConstInt(loc, itrTy, 1).getResult();
+
+          auto loadCur = cir::LoadOp::create(builder, loc, {itr});
+          auto cmp = builder.createCompare(loc, cir::CmpOpKind::lt,
+                                           loadCur.getResult(), numEltsToCopy);
+          builder.createCondition(cmp);
+        },
+        /*bodyBuilder=*/
+        [&](mlir::OpBuilder &b, mlir::Location loc) {
+          // destAlloca[itr] = srcAlloca[offsetPerSubArray * itr];
+          auto loadCur = cir::LoadOp::create(builder, loc, {itr});
+          auto srcOffset =
+              builder.createMul(loc, offsetPerSubarray, loadCur.getResult());
+
+          auto ptrToOffsetIntoSrc = cir::PtrStrideOp::create(
+              builder, loc, copyType, srcAlloca, srcOffset);
+
+          auto offsetIntoDecayDest = cir::PtrStrideOp::create(
+              builder, loc, builder.getPointerTo(copyType), destAlloca,
+              loadCur.getResult());
+
+          builder.CIRBaseBuilderTy::createStore(loc, ptrToOffsetIntoSrc,
+                                                offsetIntoDecayDest);
+          builder.createYield(loc);
+        },
+        /*stepBuilder=*/
+        [&](mlir::OpBuilder &b, mlir::Location loc) {
+          // Simple increment of the iterator.
+          auto load = cir::LoadOp::create(builder, loc, {itr});
+          auto inc =
+              cir::UnaryOp::create(builder, loc, load.getType(),
+                                   cir::UnaryOpKind::Inc, load.getResult());
+          builder.CIRBaseBuilderTy::createStore(loc, inc.getResult(), itr);
+          builder.createYield(loc);
+        });
+  };
+
+  cir::ScopeOp::create(builder, loc,
+                       [&](mlir::OpBuilder &b, mlir::Location loc) {
+                         loopBuilder();
+                         builder.createYield(loc);
+                       });
+}
 
 mlir::Value OpenACCRecipeBuilderBase::makeBoundsAlloca(
     mlir::Block *block, SourceRange exprRange, mlir::Location loc,
@@ -78,6 +149,10 @@ mlir::Value OpenACCRecipeBuilderBase::makeBoundsAlloca(
 
   bool lastBoundWasArray = isArrayTy(boundTypes.back());
 
+  // Make sure we track a moving version of this so we can get our
+  // 'copying' back to correct.
+  mlir::Value lastAlloca = initialAlloca;
+
   // Since we're iterating the types in reverse, this sets up for each index
   // corresponding to the boundsRange to be the 'after application of the
   // bounds.
@@ -125,14 +200,21 @@ mlir::Value OpenACCRecipeBuilderBase::makeBoundsAlloca(
 
       mlir::Type eltTy = cgf.convertType(resultType);
       cir::PointerType ptrTy = builder.getPointerTo(eltTy);
-      builder.createAlloca(loc, ptrTy, eltTy, "openacc.init.bounds",
-                           cgf.getContext().getTypeAlignInChars(resultType),
-                           curSize);
-
-      // TODO: OpenACC : At this point we should be copying the addresses of
-      // each element of this to the last allocation.  At the moment, that is
-      // not yet implemented.
-      cgf.cgm.errorNYI(exprRange, "OpenACC recipe alloca copying");
+      mlir::Value curAlloca = builder.createAlloca(
+          loc, ptrTy, eltTy, "openacc.init.bounds",
+          cgf.getContext().getTypeAlignInChars(resultType), curSize);
+
+      makeAllocaCopy(loc, ptrTy, cumulativeElts, eltsPerSubArray, lastAlloca,
+                     curAlloca);
+      lastAlloca = curAlloca;
+    } else {
+      // In the case of an array, we just need to decay the pointer, so just do
+      // a zero-offset stride on the last alloca to decay it down an array
+      // level.
+      cir::ConstantOp constZero = builder.getConstInt(loc, itrTy, 0);
+      lastAlloca = builder.getArrayElement(
+          loc, loc, lastAlloca, cgf.convertType(resultType),
+          constZero.getResult(), /*shouldDecay=*/true);
     }
 
     cumulativeElts = eltsToAlloca;
diff --git a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h
index acd187bbaee0d..d802ccb151b64 100644
--- a/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h
+++ b/clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.h
@@ -24,6 +24,10 @@
 
 namespace clang::CIRGen {
 class OpenACCRecipeBuilderBase {
+  // makes the copy of the addresses of an alloca to the previous allocation.
+  void makeAllocaCopy(mlir::Location loc, mlir::Type copyType,
+                      mlir::Value numEltsToCopy, mlir::Value offsetPerSubarray,
+                      mlir::Value destAlloca, mlir::Value srcAlloca);
   // This function generates the required alloca, similar to
   // 'emitAutoVarAlloca', except for the OpenACC array/pointer types.
   mlir::Value makeBoundsAlloca(mlir::Block *block, SourceRange exprRange,
diff --git a/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp b/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
index 52bcd7cd539f2..b6e5859532188 100644
--- a/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
+++ b/clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
@@ -20,6 +20,30 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ARR_SIZE:.*]] = cir.const #cir.int<4> : !u64i
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !rec_CtorDtor, !cir.ptr<!rec_CtorDtor>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 4 : i64}
+//
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!rec_CtorDtor>, %[[SRC_IDX]] : !u64i), !cir.ptr<!rec_CtorDtor>
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[TL_ALLOCA]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
 // TODO: Add Init here.
 // CHECK-NEXT: acc.yield
 // CHECK-NEXT: } destroy {
@@ -77,12 +101,59 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>> 
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[TL_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>> 
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
 // CHECK-NEXT: %[[UB1:.*]] = acc.get_upperbound %[[BOUND1]] : (!acc.data_bounds_ty) -> index
 // CHECK-NEXT: %[[UB1_CAST:.*]] = builtin.unrealized_conversion_cast %[[UB1]] : index to !u64i 
 // CHECK-NEXT: %[[NUM_ELTS:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[UB2_CAST]]) : !u64i
 // CHECK-NEXT: %[[ARR_SIZE:.*]] = cir.const #cir.int<4> : !u64i
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[NUM_ELTS]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA2:.*]] = cir.alloca !rec_CtorDtor, !cir.ptr<!rec_CtorDtor>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 4 : i64}
+//
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UB2_CAST]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA2]] : !cir.ptr<!rec_CtorDtor>, %[[SRC_IDX]] : !u64i), !cir.ptr<!rec_CtorDtor>
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
 // TODO: Add Init here.
 // CHECK-NEXT: acc.yield
 // CHECK-NEXT: } destroy {
@@ -165,6 +236,30 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB3_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB3_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>> 
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[TL_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>> 
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, !cir.ptr<!cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>> 
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
 // CHECK-NEXT: %[[UB2:.*]] = acc.get_upperbound %[[BOUND2]] : (!acc.data_bounds_ty) -> index
 // CHECK-NEXT: %[[UB2_CAST:.*]] = builtin.unrealized_conversion_cast %[[UB2]] : index to !u64i 
 // CHECK-NEXT: %[[NUM_ELTS:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[UB3_CAST]]) : !u64i
@@ -172,12 +267,60 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[NUM_ELTS]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA2:.*]] = cir.alloca !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UB3_CAST]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA2]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>>
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
 // CHECK-NEXT: %[[UB1:.*]] = acc.get_upperbound %[[BOUND1]] : (!acc.data_bounds_ty) -> index
 // CHECK-NEXT: %[[UB1_CAST:.*]] = builtin.unrealized_conversion_cast %[[UB1]] : index to !u64i 
 // CHECK-NEXT: %[[NUM_ELTS2:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[NUM_ELTS]]) : !u64i
 // CHECK-NEXT: %[[ARR_SIZE:.*]] = cir.const #cir.int<4> : !u64i
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[NUM_ELTS2]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA3:.*]] = cir.alloca !rec_CtorDtor, !cir.ptr<!rec_CtorDtor>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 4 : i64}
+//
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[NUM_ELTS]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB1_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA3]] : !cir.ptr<!rec_CtorDtor>, %[[SRC_IDX]] : !u64i), !cir.ptr<!rec_CtorDtor> 
+// CHECK-NEXT: %[[DEST:.*]] = cir.ptr_stride(%[[ARR_ALLOCA2]] : !cir.ptr<!cir.ptr<!rec_CtorDtor>>, %[[ITR_LOAD]] : !u64i), !cir.ptr<!cir.ptr<!rec_CtorDtor>> 
+// CHECK-NEXT: cir.store %[[SRC]], %[[DEST]] : !cir.ptr<!rec_CtorDtor>, !cir.ptr<!cir.ptr<!rec_CtorDtor>> 
+// CHECK-NEXT: cir.yield 
+// CHECK-NEXT: } step {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[INC:.*]] = cir.unary(inc, %[[ITR_LOAD]]) : !u64i, !u64i 
+// CHECK-NEXT: cir.store %[[INC]], %[[ITR]] : !u64i, !cir.ptr<!u64i> 
+// CHECK-NEXT: cir.yield
+// CHECK-NEXT: } 
+// CHECK-NEXT: } 
+//
+//
 // TODO: Add Init here.
 // CHECK-NEXT: acc.yield
 // CHECK-NEXT: } destroy {
@@ -279,12 +422,59 @@ void do_things(unsigned A, unsigned B) {
 // CHECK-NEXT: %[[ALLOCA_SIZE:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ARR_SIZE]]) : !u64i
 // CHECK-NEXT: %[[ARR_ALLOCA:.*]] = cir.alloca !cir.ptr<!cir.ptr<!rec_CtorDtor>>, !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[ALLOCA_SIZE]] : !u64i, ["openacc.init.bounds"] {alignment = 8 : i64}
 //
+// CHECK-NEXT: cir.scope {
+// CHECK-NEXT: %[[ITR:.*]] = cir.alloca !u64i, !cir.ptr<!u64i>, ["itr"] {alignment = 8 : i64}
+// CHECK-NEXT: %[[ZERO:.*]] = cir.const #cir.int<0> : !u64i
+// CHECK-NEXT: cir.store %[[ZERO]], %[[ITR]] : !u64i, !cir.ptr<!u64i>
+// CHECK-NEXT: cir.for : cond {
+// CHECK-NEXT: %[[UPP_BOUND:.*]] = cir.const #cir.int<1> : !u64i
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[CMP:.*]] = cir.cmp(lt, %[[ITR_LOAD]], %[[UPP_BOUND]]) : !u64i, !cir.bool
+// CHECK-NEXT: cir.condition(%[[CMP]]) 
+// CHECK-NEXT: } body {
+// CHECK-NEXT: %[[ITR_LOAD:.*]] = cir.load %[[ITR]] : !cir.ptr<!u64i>, !u64i
+// CHECK-NEXT: %[[SRC_IDX:.*]] = cir.binop(mul, %[[UB2_CAST]], %[[ITR_LOAD]]) : !u64i
+// CHECK-NEXT: %[[SRC:.*]] = cir.ptr_stride(%[[ARR_ALLOCA]] : !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>>, %[[SRC_IDX]] : !u64i), !cir.ptr<!cir.ptr<!cir.ptr<!rec_CtorDtor>>> ...
[truncated]

@andykaylor andykaylor left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with a repeated nit about getResult()

Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
Comment thread clang/lib/CIR/CodeGen/CIRGenOpenACCRecipe.cpp Outdated
@erichkeane erichkeane merged commit a33544b into llvm:main Oct 1, 2025
9 checks passed
@erichkeane

Copy link
Copy Markdown
Contributor Author

This crossed in the air with #161431, so you'll see failures. I'm working on a patch to fix something else that came out of this review, so i'll fix those tests too.

erichkeane added a commit to erichkeane/llvm-project that referenced this pull request Oct 1, 2025
A previous review comment pointed out that operations with only a single
result implicitly convert to `mlir::Value`.  This patch removes the
explicit use of `getResult` where it is unnecessary in OpenACC lowering.

However, there ARE a few cases where it is necessary where the
`mlir::ValueRange` implicit constructor from a single value is being
used, so those are untouched.

Additionally, while the previous patch was being committed (llvm#161382), a
second patch (llvm#161431) changed the format of cir.casts, so this patch
fixes the additional test lines for that as well.
erichkeane added a commit that referenced this pull request Oct 1, 2025
…1526)

A previous review comment pointed out that operations with only a single
result implicitly convert to `mlir::Value`. This patch removes the
explicit use of `getResult` where it is unnecessary in OpenACC lowering.

However, there ARE a few cases where it is necessary where the
`mlir::ValueRange` implicit constructor from a single value is being
used, so those are untouched.

Additionally, while the previous patch was being committed (#161382), a
second patch (#161431) changed the format of cir.casts, so this patch
fixes the additional test lines for that as well.
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…161382)

The previous patch ensured that we correctly got the allocas put in
place. This patch takes the address of each element of each alloca, and
copies it to the previous one. This allows us to re-form the
pointer-structure for a recipe.
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…m#161526)

A previous review comment pointed out that operations with only a single
result implicitly convert to `mlir::Value`. This patch removes the
explicit use of `getResult` where it is unnecessary in OpenACC lowering.

However, there ARE a few cases where it is necessary where the
`mlir::ValueRange` implicit constructor from a single value is being
used, so those are untouched.

Additionally, while the previous patch was being committed (llvm#161382), a
second patch (llvm#161431) changed the format of cir.casts, so this patch
fixes the additional test lines for that as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang Clang issues not falling into any other category ClangIR Anything related to the ClangIR project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants