Skip to content

[AArch64] Optimize vector slide shuffles with zeros to use shift instructions#185170

Merged
davemgreen merged 2 commits into
llvm:mainfrom
dibrinsofor:shifts_ops_183398
Jun 12, 2026
Merged

[AArch64] Optimize vector slide shuffles with zeros to use shift instructions#185170
davemgreen merged 2 commits into
llvm:mainfrom
dibrinsofor:shifts_ops_183398

Conversation

@dibrinsofor

Copy link
Copy Markdown
Contributor

We currently emit movi+ext instructions when generating code for shuffle slides of a 64-bit vector left/right and fill it with zeros. This patch optimizes these patterns to use a single ushr/shl instruction instead.

Example:

  define <8 x i8> @slide_left(<8 x i8> %v) {
    %r = shufflevector <8 x i8> %v, <8 x i8> zeroinitializer,
         <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
    ret <8 x i8> %r
  }

Before, we generate:

  movi    v1.2d, #0
  ext     v0.8b, v0.8b, v1.8b, #1

Now:

  ushr    d0, d0, #8

Fixes: #183398
Alive2 proof: https://alive2.llvm.org/ce/z/QaW5CQ

@github-actions

github-actions Bot commented Mar 7, 2026

Copy link
Copy Markdown

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot

llvmbot commented Mar 7, 2026

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-backend-aarch64

Author: . (dibrinsofor)

Changes

We currently emit movi+ext instructions when generating code for shuffle slides of a 64-bit vector left/right and fill it with zeros. This patch optimizes these patterns to use a single ushr/shl instruction instead.

Example:

  define &lt;8 x i8&gt; @<!-- -->slide_left(&lt;8 x i8&gt; %v) {
    %r = shufflevector &lt;8 x i8&gt; %v, &lt;8 x i8&gt; zeroinitializer,
         &lt;8 x i32&gt; &lt;i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8&gt;
    ret &lt;8 x i8&gt; %r
  }

Before, we generate:

  movi    v1.2d, #<!-- -->0
  ext     v0.8b, v0.8b, v1.8b, #<!-- -->1

Now:

  ushr    d0, d0, #<!-- -->8

Fixes: #183398
Alive2 proof: https://alive2.llvm.org/ce/z/QaW5CQ


Patch is 97.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/185170.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+540-406)
  • (added) llvm/test/CodeGen/AArch64/shuffle-slide-to-shift.ll (+106)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index dc5a3736ecaa1..e625602771efe 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -120,20 +120,20 @@ cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(
     cl::init(false));
 
 static cl::opt<bool>
-EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
-                         cl::desc("Enable AArch64 logical imm instruction "
-                                  "optimization"),
-                         cl::init(true));
+    EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
+                             cl::desc("Enable AArch64 logical imm instruction "
+                                      "optimization"),
+                             cl::init(true));
 
 // Temporary option added for the purpose of testing functionality added
 // to DAGCombiner.cpp in D92230. It is expected that this can be removed
 // in future when both implementations will be based off MGATHER rather
 // than the GLD1 nodes added for the SVE gather load intrinsics.
 static cl::opt<bool>
-EnableCombineMGatherIntrinsics("aarch64-enable-mgather-combine", cl::Hidden,
-                                cl::desc("Combine extends of AArch64 masked "
-                                         "gather intrinsics"),
-                                cl::init(true));
+    EnableCombineMGatherIntrinsics("aarch64-enable-mgather-combine", cl::Hidden,
+                                   cl::desc("Combine extends of AArch64 masked "
+                                            "gather intrinsics"),
+                                   cl::init(true));
 
 static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
                                     cl::desc("Combine ext and trunc to TBL"),
@@ -863,53 +863,53 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
 
     // promote v4f16 to v4f32 when that is known to be safe.
     auto V4Narrow = MVT::getVectorVT(ScalarVT, 4);
-    setOperationPromotedToType(ISD::FADD,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FSUB,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FMUL,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FDIV,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FCEIL,      V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FFLOOR,     V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FROUND,     V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FTRUNC,     V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FADD, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FSUB, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FMUL, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FDIV, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FCEIL, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FFLOOR, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FROUND, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FTRUNC, V4Narrow, MVT::v4f32);
     setOperationPromotedToType(ISD::FROUNDEVEN, V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FRINT,      V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FRINT, V4Narrow, MVT::v4f32);
     setOperationPromotedToType(ISD::FNEARBYINT, V4Narrow, MVT::v4f32);
     setOperationPromotedToType(ISD::FCANONICALIZE, V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::SETCC,         V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::SETCC, V4Narrow, MVT::v4f32);
 
-    setOperationAction(ISD::FABS,        V4Narrow, Legal);
-    setOperationAction(ISD::FNEG,        V4Narrow, Legal);
-    setOperationAction(ISD::FMA,         V4Narrow, Expand);
-    setOperationAction(ISD::BR_CC,       V4Narrow, Expand);
-    setOperationAction(ISD::SELECT,      V4Narrow, Expand);
-    setOperationAction(ISD::SELECT_CC,   V4Narrow, Expand);
-    setOperationAction(ISD::FCOPYSIGN,   V4Narrow, Custom);
-    setOperationAction(ISD::FSQRT,       V4Narrow, Expand);
+    setOperationAction(ISD::FABS, V4Narrow, Legal);
+    setOperationAction(ISD::FNEG, V4Narrow, Legal);
+    setOperationAction(ISD::FMA, V4Narrow, Expand);
+    setOperationAction(ISD::BR_CC, V4Narrow, Expand);
+    setOperationAction(ISD::SELECT, V4Narrow, Expand);
+    setOperationAction(ISD::SELECT_CC, V4Narrow, Expand);
+    setOperationAction(ISD::FCOPYSIGN, V4Narrow, Custom);
+    setOperationAction(ISD::FSQRT, V4Narrow, Expand);
 
     auto V8Narrow = MVT::getVectorVT(ScalarVT, 8);
     setOperationPromotedToType(ISD::FCANONICALIZE, V8Narrow, MVT::v8f32);
-    setOperationPromotedToType(ISD::SETCC,         V8Narrow, MVT::v8f32);
-
-    setOperationAction(ISD::FABS,        V8Narrow, Legal);
-    setOperationAction(ISD::FADD,        V8Narrow, Legal);
-    setOperationAction(ISD::FCEIL,       V8Narrow, Legal);
-    setOperationAction(ISD::FCOPYSIGN,   V8Narrow, Custom);
-    setOperationAction(ISD::FDIV,        V8Narrow, Legal);
-    setOperationAction(ISD::FFLOOR,      V8Narrow, Legal);
-    setOperationAction(ISD::FMA,         V8Narrow, Expand);
-    setOperationAction(ISD::FMUL,        V8Narrow, Legal);
-    setOperationAction(ISD::FNEARBYINT,  V8Narrow, Legal);
-    setOperationAction(ISD::FNEG,        V8Narrow, Legal);
-    setOperationAction(ISD::FROUND,      V8Narrow, Legal);
-    setOperationAction(ISD::FROUNDEVEN,  V8Narrow, Legal);
-    setOperationAction(ISD::FRINT,       V8Narrow, Legal);
-    setOperationAction(ISD::FSQRT,       V8Narrow, Expand);
-    setOperationAction(ISD::FSUB,        V8Narrow, Legal);
-    setOperationAction(ISD::FTRUNC,      V8Narrow, Legal);
-    setOperationAction(ISD::BR_CC,       V8Narrow, Expand);
-    setOperationAction(ISD::SELECT,      V8Narrow, Expand);
-    setOperationAction(ISD::SELECT_CC,   V8Narrow, Expand);
-    setOperationAction(ISD::FP_EXTEND,   V8Narrow, Expand);
+    setOperationPromotedToType(ISD::SETCC, V8Narrow, MVT::v8f32);
+
+    setOperationAction(ISD::FABS, V8Narrow, Legal);
+    setOperationAction(ISD::FADD, V8Narrow, Legal);
+    setOperationAction(ISD::FCEIL, V8Narrow, Legal);
+    setOperationAction(ISD::FCOPYSIGN, V8Narrow, Custom);
+    setOperationAction(ISD::FDIV, V8Narrow, Legal);
+    setOperationAction(ISD::FFLOOR, V8Narrow, Legal);
+    setOperationAction(ISD::FMA, V8Narrow, Expand);
+    setOperationAction(ISD::FMUL, V8Narrow, Legal);
+    setOperationAction(ISD::FNEARBYINT, V8Narrow, Legal);
+    setOperationAction(ISD::FNEG, V8Narrow, Legal);
+    setOperationAction(ISD::FROUND, V8Narrow, Legal);
+    setOperationAction(ISD::FROUNDEVEN, V8Narrow, Legal);
+    setOperationAction(ISD::FRINT, V8Narrow, Legal);
+    setOperationAction(ISD::FSQRT, V8Narrow, Expand);
+    setOperationAction(ISD::FSUB, V8Narrow, Legal);
+    setOperationAction(ISD::FTRUNC, V8Narrow, Legal);
+    setOperationAction(ISD::BR_CC, V8Narrow, Expand);
+    setOperationAction(ISD::SELECT, V8Narrow, Expand);
+    setOperationAction(ISD::SELECT_CC, V8Narrow, Expand);
+    setOperationAction(ISD::FP_EXTEND, V8Narrow, Expand);
   };
 
   if (!Subtarget->hasFullFP16()) {
@@ -1320,8 +1320,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationPromotedToType(ISD::UINT_TO_FP, MVT::v8i16, MVT::v8i32);
     }
 
-    setOperationAction(ISD::CTLZ,       MVT::v1i64, Expand);
-    setOperationAction(ISD::CTLZ,       MVT::v2i64, Expand);
+    setOperationAction(ISD::CTLZ, MVT::v1i64, Expand);
+    setOperationAction(ISD::CTLZ, MVT::v2i64, Expand);
     // CTLS (Count Leading Sign bits) - Legal for BHS types (8/16/32-bit
     // elements) No hardware support for 64-bit element vectors
     for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
@@ -1349,8 +1349,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::MUL, MVT::v1i64, Custom);
 
     // Saturates
-    for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v1i64,
-                    MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64 }) {
+    for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v1i64, MVT::v16i8,
+                   MVT::v8i16, MVT::v4i32, MVT::v2i64}) {
       setOperationAction(ISD::SADDSAT, VT, Legal);
       setOperationAction(ISD::UADDSAT, VT, Legal);
       setOperationAction(ISD::SSUBSAT, VT, Legal);
@@ -1368,8 +1368,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     }
 
     // Vector reductions
-    for (MVT VT : { MVT::v4f16, MVT::v2f32,
-                    MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {
+    for (MVT VT :
+         {MVT::v4f16, MVT::v2f32, MVT::v8f16, MVT::v4f32, MVT::v2f64}) {
       if (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16()) {
         setOperationAction(ISD::VECREDUCE_FMAX, VT, Legal);
         setOperationAction(ISD::VECREDUCE_FMIN, VT, Legal);
@@ -1382,8 +1382,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     if (Subtarget->hasFullFP16())
       setOperationAction(ISD::VECREDUCE_FADD, MVT::v2f16, Custom);
 
-    for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32,
-                    MVT::v16i8, MVT::v8i16, MVT::v4i32 }) {
+    for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v16i8, MVT::v8i16,
+                   MVT::v4i32}) {
       setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
@@ -1483,10 +1483,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setLoadExtAction(ISD::ZEXTLOAD, MVT::v2i64, MVT::v2i16, Custom);
 
     // ADDP custom lowering
-    for (MVT VT : { MVT::v32i8, MVT::v16i16, MVT::v8i32, MVT::v4i64 })
+    for (MVT VT : {MVT::v32i8, MVT::v16i16, MVT::v8i32, MVT::v4i64})
       setOperationAction(ISD::ADD, VT, Custom);
     // FADDP custom lowering
-    for (MVT VT : { MVT::v16f16, MVT::v8f32, MVT::v4f64 })
+    for (MVT VT : {MVT::v16f16, MVT::v8f32, MVT::v4f64})
       setOperationAction(ISD::FADD, VT, Custom);
 
     if (Subtarget->hasDotProd()) {
@@ -1666,8 +1666,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::BITCAST, VT, Custom);
 
     for (auto VT :
-         { MVT::nxv2i8, MVT::nxv2i16, MVT::nxv2i32, MVT::nxv2i64, MVT::nxv4i8,
-           MVT::nxv4i16, MVT::nxv4i32, MVT::nxv8i8, MVT::nxv8i16 })
+         {MVT::nxv2i8, MVT::nxv2i16, MVT::nxv2i32, MVT::nxv2i64, MVT::nxv4i8,
+          MVT::nxv4i16, MVT::nxv4i32, MVT::nxv8i8, MVT::nxv8i16})
       setOperationAction(ISD::SIGN_EXTEND_INREG, VT, Legal);
 
     // Promote predicate as counter load/stores to standard predicates.
@@ -1702,10 +1702,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     }
 
     // NEON doesn't support masked loads/stores, but SME and SVE do.
-    for (auto VT :
-         {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
-          MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
-          MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
+    for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
+                    MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
+                    MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
       setOperationAction(ISD::MLOAD, VT, Custom);
       setOperationAction(ISD::MSTORE, VT, Custom);
     }
@@ -1960,8 +1959,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::VECREDUCE_UMIN, MVT::v2i64, Custom);
 
       // Int operations with no NEON support.
-      for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
-                      MVT::v2i32, MVT::v4i32, MVT::v2i64}) {
+      for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
+                      MVT::v4i32, MVT::v2i64}) {
         setOperationAction(ISD::BITREVERSE, VT, Custom);
         setOperationAction(ISD::CTTZ, VT, Custom);
         setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
@@ -2241,8 +2240,7 @@ void AArch64TargetLowering::addTypeForNEON(MVT VT) {
 
   // F[MIN|MAX][NUM|NAN] and simple strict operations are available for all FP
   // NEON types.
-  if (VT.isFloatingPoint() &&
-      VT.getVectorElementType() != MVT::bf16 &&
+  if (VT.isFloatingPoint() && VT.getVectorElementType() != MVT::bf16 &&
       (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16()))
     for (unsigned Opcode :
          {ISD::FMINIMUM, ISD::FMAXIMUM, ISD::FMINNUM, ISD::FMAXNUM,
@@ -2657,8 +2655,8 @@ static bool optimizeLogicalImm(SDValue Op, unsigned Size, uint64_t Imm,
   if (NewImm == 0 || NewImm == OrigMask) {
     New = TLO.DAG.getNode(Op.getOpcode(), DL, VT, Op.getOperand(0),
                           TLO.DAG.getConstant(NewImm, DL, VT));
-  // Otherwise, create a machine node so that target independent DAG combine
-  // doesn't undo this optimization.
+    // Otherwise, create a machine node so that target independent DAG combine
+    // doesn't undo this optimization.
   } else {
     Enc = AArch64_AM::encodeLogicalImmediate(NewImm, Size);
     SDValue EncConst = TLO.DAG.getTargetConstant(Enc, DL, VT);
@@ -2846,7 +2844,8 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
     Intrinsic::ID IntID =
         static_cast<Intrinsic::ID>(Op->getConstantOperandVal(1));
     switch (IntID) {
-    default: return;
+    default:
+      return;
     case Intrinsic::aarch64_ldaxr:
     case Intrinsic::aarch64_ldxr: {
       unsigned BitWidth = Known.getBitWidth();
@@ -2868,7 +2867,7 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
       MVT VT = Op.getOperand(1).getValueType().getSimpleVT();
       unsigned BitWidth = Known.getBitWidth();
       if (VT == MVT::v8i8 || VT == MVT::v16i8) {
-        unsigned Bound = (VT == MVT::v8i8) ?  11 : 12;
+        unsigned Bound = (VT == MVT::v8i8) ? 11 : 12;
         assert(BitWidth >= Bound && "Unexpected width!");
         APInt Mask = APInt::getHighBitsSet(BitWidth, BitWidth - Bound);
         Known.Zero |= Mask;
@@ -3055,8 +3054,9 @@ AArch64TargetLowering::EmitF128CSEL(MachineInstr &MI,
   return EndBB;
 }
 
-MachineBasicBlock *AArch64TargetLowering::EmitLoweredCatchRet(
-       MachineInstr &MI, MachineBasicBlock *BB) const {
+MachineBasicBlock *
+AArch64TargetLowering::EmitLoweredCatchRet(MachineInstr &MI,
+                                           MachineBasicBlock *BB) const {
   assert(!isAsynchronousEHPersonality(classifyEHPersonality(
              BB->getParent()->getFunction().getPersonalityFn())) &&
          "SEH does not use catchret!");
@@ -4095,8 +4095,8 @@ static bool canEmitConjunction(SelectionDAG &DAG, const SDValue Val,
 /// \p Negate is true if we want this sub-tree being negated just by changing
 /// SETCC conditions.
 static SDValue emitConjunctionRec(SelectionDAG &DAG, SDValue Val,
-    AArch64CC::CondCode &OutCC, bool Negate, SDValue CCOp,
-    AArch64CC::CondCode Predicate) {
+                                  AArch64CC::CondCode &OutCC, bool Negate,
+                                  SDValue CCOp, AArch64CC::CondCode Predicate) {
   // We're at a tree leaf, produce a conditional comparison operation.
   unsigned Opcode = Val->getOpcode();
   if (Opcode == ISD::SETCC) {
@@ -4498,10 +4498,9 @@ getAArch64XALUOOp(AArch64CC::CondCode &CC, SDValue Op, SelectionDAG &DAG) {
     } else {
       SDValue UpperBits = DAG.getNode(ISD::MULHU, DL, MVT::i64, LHS, RHS);
       SDVTList VTs = DAG.getVTList(MVT::i64, FlagsVT);
-      Overflow =
-          DAG.getNode(AArch64ISD::SUBS, DL, VTs,
-                      DAG.getConstant(0, DL, MVT::i64),
-                      UpperBits).getValue(1);
+      Overflow = DAG.getNode(AArch64ISD::SUBS, DL, VTs,
+                             DAG.getConstant(0, DL, MVT::i64), UpperBits)
+                     .getValue(1);
     }
     break;
   }
@@ -4740,10 +4739,10 @@ static SDValue LowerPREFETCH(SDValue Op, SelectionDAG &DAG) {
   }
 
   // built the mask value encoding the expected behavior.
-  unsigned PrfOp = (IsWrite << 4) |     // Load/Store bit
-                   (!IsData << 3) |     // IsDataCache bit
-                   (Locality << 1) |    // Cache level bits
-                   (unsigned)IsStream;  // Stream bit
+  unsigned PrfOp = (IsWrite << 4) |    // Load/Store bit
+                   (!IsData << 3) |    // IsDataCache bit
+                   (Locality << 1) |   // Cache level bits
+                   (unsigned)IsStream; // Stream bit
   return DAG.getNode(AArch64ISD::PREFETCH, DL, MVT::Other, Op.getOperand(0),
                      DAG.getTargetConstant(PrfOp, DL, MVT::i32),
                      Op.getOperand(1));
@@ -5177,7 +5176,8 @@ AArch64TargetLowering::LowerVectorFP_TO_INT_SAT(SDValue Op,
     SDValue MinC = DAG.getConstant(
         APInt::getSignedMaxValue(SatWidth).sext(SrcElementWidth), DL, IntVT);
     SDValue Min = DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt, MinC);
-    SDValue Min2 = SrcVal2 ? DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt2, MinC) : SDValue();
+    SDValue Min2 = SrcVal2 ? DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt2, MinC)
+                           : SDValue();
     SDValue MaxC = DAG.getConstant(
         APInt::getSignedMinValue(SatWidth).sext(SrcElementWidth), DL, IntVT);
     Sat = DAG.getNode(ISD::SMAX, DL, IntVT, Min, MaxC);
@@ -5186,7 +5186,8 @@ AArch64TargetLowering::LowerVectorFP_TO_INT_SAT(SDValue Op,
     SDValue MinC = DAG.getConstant(
         APInt::getAllOnes(SatWidth).zext(SrcElementWidth), DL, IntVT);
     Sat = DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt, MinC);
-    Sat2 = SrcVal2 ? DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt2, MinC) : SDValue();
+    Sat2 = SrcVal2 ? DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt2, MinC)
+                   : SDValue();
   }
 
   if (SrcVal2)
@@ -5258,8 +5259,8 @@ SDValue AArch64TargetLowering::LowerFP_TO_INT_SAT(SDValue Op,
         APInt::getSignedMinValue(SatWidth).sext(DstWidth), DL, DstVT);
     Sat = DAG.getNode(ISD::SMAX, DL, DstVT, Min, MaxC);
   } else {
-    SDValue MinC = DAG.getConstant(
-        APInt::getAllOnes(SatWidth).zext(DstWidth), DL, DstVT);
+    SDValue MinC =
+        DAG.getConstant(APInt::getAllOnes(SatWidth).zext(DstWidth), DL, DstVT);
     Sat = DAG.getNode(ISD::UMIN, DL, DstVT, NativeCvt, MinC);
   }
 
@@ -5406,7 +5407,7 @@ SDValue AArch64TargetLowering::LowerVectorINT_TO_FP(SDValue Op,
 }
 
 SDValue AArch64TargetLowering::LowerINT_TO_FP(SDValue Op,
-                                            SelectionDAG &DAG) const {
+                                              SelectionDAG &DAG) const {
   if (Op.getValueType().isVector())
     return LowerVectorINT_TO_FP(Op, DAG);
 
@@ -5723,8 +5724,8 @@ static bool isAddSubSExt(SDValue N, SelectionDAG &DAG) {
   if (Opcode == ISD::ADD || Opcode == ISD::SUB) {
     SDValue N0 = N.getOperand(0);
     SDValue N1 = N.getOperand(1);
-    return N0->hasOneUse() && N1->hasOneUse() &&
-      isSignExtended(N0, DAG) && isSignExtended(N1, DAG);
+    return N0->hasOneUse() && N1->hasOneUse() && isSignExtended(N0, DAG) &&
+           isSignExtended(N1, DAG);
   }
   return false;
 }
@@ -5734,8 +5735,8 @@ static bool isAddSubZExt(SDValue N, SelectionDAG &DAG) {
   if (Opcode == ISD::ADD || Opcode == ISD::SUB) {
     SDValue N0 = N.getOperand(0);
     SDValue N1 = N.getOperand(1);
-    return N0->hasOneUse() && N1->hasOneUse() &&
-      isZeroExtended(N0, DAG) && isZeroExtended(N1, DAG);
+    return N0->hasOneUse() && N1->hasOneUse() && isZeroExtended(N0, DAG) &&
+           isZeroExtended(N1, DAG);
   }
   return false;
 }
@@ -6419,12 +6420,14 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_W_CHAIN(SDValue Op,
   }
 }
 
-SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
-                                                     SelectionDAG &DAG) const {
+SDValue
+AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
+                                               SelectionDAG &DAG) const {
   unsigned IntNo = Op.getConstantOperandVal(0);
   SDLoc DL(Op);
   switch (IntNo) {
-  default: return SDValue();    // Don't custom lower most intrinsics.
+  default:
+    return SDValue(); // Don't custom lower most intrinsics.
   case Intrinsic::thread_pointer: {
     E...
[truncated]

@davemgreen davemgreen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove all the formatting diffs - they make it impossible to review.

Comment thread llvm/test/CodeGen/AArch64/shuffle-slide-to-shift.ll
@github-actions

github-actions Bot commented Mar 8, 2026

Copy link
Copy Markdown

🐧 Linux x64 Test Results

  • 196501 tests passed
  • 5348 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions Bot commented Mar 8, 2026

Copy link
Copy Markdown

🪟 Windows x64 Test Results

  • 135592 tests passed
  • 3415 tests skipped

✅ The build succeeded and all tests passed.

@dibrinsofor dibrinsofor requested a review from davemgreen March 11, 2026 00:32
@github-actions

github-actions Bot commented Mar 11, 2026

Copy link
Copy Markdown

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen davemgreen requested a review from nasherm March 11, 2026 13:10

@davemgreen davemgreen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this is applying at the moment?

Comment thread llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated
Comment thread llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated
Comment thread llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated
Comment thread llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Comment thread llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated
if (i < LaneElts - SlideAmt) {
// Data element: must be consecutive within lane
if (M != (int)(LaneStart + SlideAmt + i))
Valid = false;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Valid = false could do return 0;, as the two If's will be independat.

Comment thread llvm/test/CodeGen/AArch64/shuffle-slide-to-shift.ll
; RUN: llc -mtriple=aarch64 < %s | FileCheck %s


; repro of gh issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can probably be removed as it doesn't add much. It isnt clean what the gh issue is from just this test.

Comment thread llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated
; CHECK-LABEL: slide_right_v16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v1.2d, #0000000000000000
; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #15

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is going wrong now due to the swap.

Comment on lines +14649 to +14650
static bool isSlideWithZerosMask(ArrayRef<int> M, EVT VT, SDValue &V1,
SDValue &V2, unsigned &ShiftAmount,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass V1 and V2 by value not reference, and return the operand that is shifted?

Comment on lines +14695 to +14696


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be a single new line.

@dibrinsofor dibrinsofor force-pushed the shifts_ops_183398 branch 3 times, most recently from edd2125 to 61c8e0a Compare March 30, 2026 16:57
@dibrinsofor dibrinsofor requested a review from davemgreen April 1, 2026 22:58
@davemgreen

Copy link
Copy Markdown
Contributor

Can you update the existing tests?

@dibrinsofor

dibrinsofor commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Can you update the existing tests?

@davemgreen Please what do you mean? are you referring to the commuted tests? I have already implemented those.

@davemgreen

Copy link
Copy Markdown
Contributor

Hello. I presume I meant because these are failing in the precommit CI and need updating:
LLVM.CodeGen/AArch64/ext-narrow-index.ll
LLVM.CodeGen/AArch64/fp-conversion-to-tbl.ll
If that is incorrect (or even if it is not), can you give it a rebase too?

@dibrinsofor

dibrinsofor commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

the failing ci case seems to be unrelated.

...
sccache: error: Timed out waiting for server startup. Maybe the remote service is unreachable?
...

not sure how to proceed

all good now. @davemgreen

@davemgreen davemgreen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - It's been a while so I had to check again but this LGTM.

@davemgreen

Copy link
Copy Markdown
Contributor

Are you happy for this to be merged in?

@dibrinsofor

Copy link
Copy Markdown
Contributor Author

yes, i think it is good to go @davemgreen

@dibrinsofor

Copy link
Copy Markdown
Contributor Author

@nasherm could you take a look?

@davemgreen

Copy link
Copy Markdown
Contributor

I'll push this if there are no other comment.

@davemgreen davemgreen merged commit 9bbed74 into llvm:main Jun 12, 2026
10 checks passed
@github-actions

Copy link
Copy Markdown

@dibrinsofor Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

KseniyaTikhomirova pushed a commit to KseniyaTikhomirova/llvm-project that referenced this pull request Jun 16, 2026
…ructions (llvm#185170)

We currently emit `movi`+`ext` instructions when generating code for
shuffle slides of a 64-bit vector left/right and fill it with zeros.
This patch optimizes these patterns to use a single `ushr`/`shl`
instruction instead.

Example:
```llvm
  define <8 x i8> @slide_left(<8 x i8> %v) {
    %r = shufflevector <8 x i8> %v, <8 x i8> zeroinitializer,
         <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
    ret <8 x i8> %r
  }
```

Before, we generate:
```
  movi    v1.2d, #0
  ext     v0.8b, v0.8b, v1.8b, #1
```

Now:
```
  ushr    d0, d0, llvm#8
```

Fixes: llvm#183398
Alive2 proof: https://alive2.llvm.org/ce/z/QaW5CQ

---------

Signed-off-by: Dibri Nsofor <dibrinsofor@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AArch64] Opportunity to use shifts instead of ext for some shuffles

3 participants