[AArch64] Optimize vector slide shuffles with zeros to use shift instructions by dibrinsofor · Pull Request #185170 · llvm/llvm-project

dibrinsofor · 2026-03-07T09:45:22Z

We currently emit movi+ext instructions when generating code for shuffles that slide a 64-bit vector left/right and fill the vacated lanes with zeros. This patch optimizes these patterns to use a single ushr/shl instruction instead.

Example:

  define <8 x i8> @slide_left(<8 x i8> %v) {
    %r = shufflevector <8 x i8> %v, <8 x i8> zeroinitializer,
         <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
    ret <8 x i8> %r
  }

Before, we generate:

  movi    v1.2d, #0
  ext     v0.8b, v0.8b, v1.8b, #1

Now:

  ushr    d0, d0, #8

Fixes: #183398
Alive2 proof: https://alive2.llvm.org/ce/z/QaW5CQ
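
As a rough illustration of why the rewrite is sound, here is a standalone C++ sketch (not the code in this patch). The helper names `pack`, `unpack`, `slideWithZeros` and `slideViaShift` are made up for this example, and little-endian lane order is assumed, i.e. lane 0 sits in the least significant byte of the 64-bit register:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<uint8_t, 8>;

// Pack eight byte lanes into one 64-bit value, lane 0 in the lowest byte
// (assumption: little-endian lane order).
uint64_t pack(const V8 &v) {
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i)
    r |= static_cast<uint64_t>(v[i]) << (8 * i);
  return r;
}

// Inverse of pack: split a 64-bit value back into eight byte lanes.
V8 unpack(uint64_t x) {
  V8 v{};
  for (int i = 0; i < 8; ++i)
    v[i] = static_cast<uint8_t>(x >> (8 * i));
  return v;
}

// Reference semantics of shufflevector(v, zeroinitializer, <n, n+1, ..., n+7>):
// indices 0..7 read from v, indices 8..15 read from the zero vector.
V8 slideWithZeros(const V8 &v, unsigned n) {
  assert(n <= 8 && "slide amount out of range");
  V8 r{};
  for (unsigned i = 0; i < 8; ++i) {
    unsigned idx = i + n;
    r[i] = idx < 8 ? v[idx] : 0;
  }
  return r;
}

// The same slide expressed as one logical shift right of the whole register.
V8 slideViaShift(const V8 &v, unsigned n) {
  assert(n < 8 && "shifting a 64-bit value by 64 or more is undefined");
  return unpack(pack(v) >> (8 * n));
}
```

Sliding by one lane corresponds to a shift of 8 bits, which matches the `#8` immediate in the example above.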

github-actions · 2026-03-07T09:56:50Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2026-03-07T10:06:39Z

@llvm/pr-subscribers-backend-aarch64

Author: . (dibrinsofor)

Patch is 97.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/185170.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+540-406)
(added) llvm/test/CodeGen/AArch64/shuffle-slide-to-shift.ll (+106)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index dc5a3736ecaa1..e625602771efe 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -120,20 +120,20 @@ cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(
     cl::init(false));
 
 static cl::opt<bool>
-EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
-                         cl::desc("Enable AArch64 logical imm instruction "
-                                  "optimization"),
-                         cl::init(true));
+    EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
+                             cl::desc("Enable AArch64 logical imm instruction "
+                                      "optimization"),
+                             cl::init(true));
 
 // Temporary option added for the purpose of testing functionality added
 // to DAGCombiner.cpp in D92230. It is expected that this can be removed
 // in future when both implementations will be based off MGATHER rather
 // than the GLD1 nodes added for the SVE gather load intrinsics.
 static cl::opt<bool>
-EnableCombineMGatherIntrinsics("aarch64-enable-mgather-combine", cl::Hidden,
-                                cl::desc("Combine extends of AArch64 masked "
-                                         "gather intrinsics"),
-                                cl::init(true));
+    EnableCombineMGatherIntrinsics("aarch64-enable-mgather-combine", cl::Hidden,
+                                   cl::desc("Combine extends of AArch64 masked "
+                                            "gather intrinsics"),
+                                   cl::init(true));
 
 static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
                                     cl::desc("Combine ext and trunc to TBL"),
@@ -863,53 +863,53 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
 
     // promote v4f16 to v4f32 when that is known to be safe.
     auto V4Narrow = MVT::getVectorVT(ScalarVT, 4);
-    setOperationPromotedToType(ISD::FADD,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FSUB,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FMUL,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FDIV,       V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FCEIL,      V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FFLOOR,     V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FROUND,     V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FTRUNC,     V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FADD, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FSUB, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FMUL, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FDIV, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FCEIL, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FFLOOR, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FROUND, V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FTRUNC, V4Narrow, MVT::v4f32);
     setOperationPromotedToType(ISD::FROUNDEVEN, V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::FRINT,      V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::FRINT, V4Narrow, MVT::v4f32);
     setOperationPromotedToType(ISD::FNEARBYINT, V4Narrow, MVT::v4f32);
     setOperationPromotedToType(ISD::FCANONICALIZE, V4Narrow, MVT::v4f32);
-    setOperationPromotedToType(ISD::SETCC,         V4Narrow, MVT::v4f32);
+    setOperationPromotedToType(ISD::SETCC, V4Narrow, MVT::v4f32);
 
-    setOperationAction(ISD::FABS,        V4Narrow, Legal);
-    setOperationAction(ISD::FNEG,        V4Narrow, Legal);
-    setOperationAction(ISD::FMA,         V4Narrow, Expand);
-    setOperationAction(ISD::BR_CC,       V4Narrow, Expand);
-    setOperationAction(ISD::SELECT,      V4Narrow, Expand);
-    setOperationAction(ISD::SELECT_CC,   V4Narrow, Expand);
-    setOperationAction(ISD::FCOPYSIGN,   V4Narrow, Custom);
-    setOperationAction(ISD::FSQRT,       V4Narrow, Expand);
+    setOperationAction(ISD::FABS, V4Narrow, Legal);
+    setOperationAction(ISD::FNEG, V4Narrow, Legal);
+    setOperationAction(ISD::FMA, V4Narrow, Expand);
+    setOperationAction(ISD::BR_CC, V4Narrow, Expand);
+    setOperationAction(ISD::SELECT, V4Narrow, Expand);
+    setOperationAction(ISD::SELECT_CC, V4Narrow, Expand);
+    setOperationAction(ISD::FCOPYSIGN, V4Narrow, Custom);
+    setOperationAction(ISD::FSQRT, V4Narrow, Expand);
 
     auto V8Narrow = MVT::getVectorVT(ScalarVT, 8);
     setOperationPromotedToType(ISD::FCANONICALIZE, V8Narrow, MVT::v8f32);
-    setOperationPromotedToType(ISD::SETCC,         V8Narrow, MVT::v8f32);
-
-    setOperationAction(ISD::FABS,        V8Narrow, Legal);
-    setOperationAction(ISD::FADD,        V8Narrow, Legal);
-    setOperationAction(ISD::FCEIL,       V8Narrow, Legal);
-    setOperationAction(ISD::FCOPYSIGN,   V8Narrow, Custom);
-    setOperationAction(ISD::FDIV,        V8Narrow, Legal);
-    setOperationAction(ISD::FFLOOR,      V8Narrow, Legal);
-    setOperationAction(ISD::FMA,         V8Narrow, Expand);
-    setOperationAction(ISD::FMUL,        V8Narrow, Legal);
-    setOperationAction(ISD::FNEARBYINT,  V8Narrow, Legal);
-    setOperationAction(ISD::FNEG,        V8Narrow, Legal);
-    setOperationAction(ISD::FROUND,      V8Narrow, Legal);
-    setOperationAction(ISD::FROUNDEVEN,  V8Narrow, Legal);
-    setOperationAction(ISD::FRINT,       V8Narrow, Legal);
-    setOperationAction(ISD::FSQRT,       V8Narrow, Expand);
-    setOperationAction(ISD::FSUB,        V8Narrow, Legal);
-    setOperationAction(ISD::FTRUNC,      V8Narrow, Legal);
-    setOperationAction(ISD::BR_CC,       V8Narrow, Expand);
-    setOperationAction(ISD::SELECT,      V8Narrow, Expand);
-    setOperationAction(ISD::SELECT_CC,   V8Narrow, Expand);
-    setOperationAction(ISD::FP_EXTEND,   V8Narrow, Expand);
+    setOperationPromotedToType(ISD::SETCC, V8Narrow, MVT::v8f32);
+
+    setOperationAction(ISD::FABS, V8Narrow, Legal);
+    setOperationAction(ISD::FADD, V8Narrow, Legal);
+    setOperationAction(ISD::FCEIL, V8Narrow, Legal);
+    setOperationAction(ISD::FCOPYSIGN, V8Narrow, Custom);
+    setOperationAction(ISD::FDIV, V8Narrow, Legal);
+    setOperationAction(ISD::FFLOOR, V8Narrow, Legal);
+    setOperationAction(ISD::FMA, V8Narrow, Expand);
+    setOperationAction(ISD::FMUL, V8Narrow, Legal);
+    setOperationAction(ISD::FNEARBYINT, V8Narrow, Legal);
+    setOperationAction(ISD::FNEG, V8Narrow, Legal);
+    setOperationAction(ISD::FROUND, V8Narrow, Legal);
+    setOperationAction(ISD::FROUNDEVEN, V8Narrow, Legal);
+    setOperationAction(ISD::FRINT, V8Narrow, Legal);
+    setOperationAction(ISD::FSQRT, V8Narrow, Expand);
+    setOperationAction(ISD::FSUB, V8Narrow, Legal);
+    setOperationAction(ISD::FTRUNC, V8Narrow, Legal);
+    setOperationAction(ISD::BR_CC, V8Narrow, Expand);
+    setOperationAction(ISD::SELECT, V8Narrow, Expand);
+    setOperationAction(ISD::SELECT_CC, V8Narrow, Expand);
+    setOperationAction(ISD::FP_EXTEND, V8Narrow, Expand);
   };
 
   if (!Subtarget->hasFullFP16()) {
@@ -1320,8 +1320,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationPromotedToType(ISD::UINT_TO_FP, MVT::v8i16, MVT::v8i32);
     }
 
-    setOperationAction(ISD::CTLZ,       MVT::v1i64, Expand);
-    setOperationAction(ISD::CTLZ,       MVT::v2i64, Expand);
+    setOperationAction(ISD::CTLZ, MVT::v1i64, Expand);
+    setOperationAction(ISD::CTLZ, MVT::v2i64, Expand);
     // CTLS (Count Leading Sign bits) - Legal for BHS types (8/16/32-bit
     // elements) No hardware support for 64-bit element vectors
     for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
@@ -1349,8 +1349,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::MUL, MVT::v1i64, Custom);
 
     // Saturates
-    for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v1i64,
-                    MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64 }) {
+    for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v1i64, MVT::v16i8,
+                   MVT::v8i16, MVT::v4i32, MVT::v2i64}) {
       setOperationAction(ISD::SADDSAT, VT, Legal);
       setOperationAction(ISD::UADDSAT, VT, Legal);
       setOperationAction(ISD::SSUBSAT, VT, Legal);
@@ -1368,8 +1368,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     }
 
     // Vector reductions
-    for (MVT VT : { MVT::v4f16, MVT::v2f32,
-                    MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {
+    for (MVT VT :
+         {MVT::v4f16, MVT::v2f32, MVT::v8f16, MVT::v4f32, MVT::v2f64}) {
       if (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16()) {
         setOperationAction(ISD::VECREDUCE_FMAX, VT, Legal);
         setOperationAction(ISD::VECREDUCE_FMIN, VT, Legal);
@@ -1382,8 +1382,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     if (Subtarget->hasFullFP16())
       setOperationAction(ISD::VECREDUCE_FADD, MVT::v2f16, Custom);
 
-    for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32,
-                    MVT::v16i8, MVT::v8i16, MVT::v4i32 }) {
+    for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v16i8, MVT::v8i16,
+                   MVT::v4i32}) {
       setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
@@ -1483,10 +1483,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setLoadExtAction(ISD::ZEXTLOAD, MVT::v2i64, MVT::v2i16, Custom);
 
     // ADDP custom lowering
-    for (MVT VT : { MVT::v32i8, MVT::v16i16, MVT::v8i32, MVT::v4i64 })
+    for (MVT VT : {MVT::v32i8, MVT::v16i16, MVT::v8i32, MVT::v4i64})
       setOperationAction(ISD::ADD, VT, Custom);
     // FADDP custom lowering
-    for (MVT VT : { MVT::v16f16, MVT::v8f32, MVT::v4f64 })
+    for (MVT VT : {MVT::v16f16, MVT::v8f32, MVT::v4f64})
       setOperationAction(ISD::FADD, VT, Custom);
 
     if (Subtarget->hasDotProd()) {
@@ -1666,8 +1666,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::BITCAST, VT, Custom);
 
     for (auto VT :
-         { MVT::nxv2i8, MVT::nxv2i16, MVT::nxv2i32, MVT::nxv2i64, MVT::nxv4i8,
-           MVT::nxv4i16, MVT::nxv4i32, MVT::nxv8i8, MVT::nxv8i16 })
+         {MVT::nxv2i8, MVT::nxv2i16, MVT::nxv2i32, MVT::nxv2i64, MVT::nxv4i8,
+          MVT::nxv4i16, MVT::nxv4i32, MVT::nxv8i8, MVT::nxv8i16})
       setOperationAction(ISD::SIGN_EXTEND_INREG, VT, Legal);
 
     // Promote predicate as counter load/stores to standard predicates.
@@ -1702,10 +1702,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     }
 
     // NEON doesn't support masked loads/stores, but SME and SVE do.
-    for (auto VT :
-         {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
-          MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
-          MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
+    for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
+                    MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
+                    MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
       setOperationAction(ISD::MLOAD, VT, Custom);
       setOperationAction(ISD::MSTORE, VT, Custom);
     }
@@ -1960,8 +1959,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::VECREDUCE_UMIN, MVT::v2i64, Custom);
 
       // Int operations with no NEON support.
-      for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
-                      MVT::v2i32, MVT::v4i32, MVT::v2i64}) {
+      for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
+                      MVT::v4i32, MVT::v2i64}) {
         setOperationAction(ISD::BITREVERSE, VT, Custom);
         setOperationAction(ISD::CTTZ, VT, Custom);
         setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
@@ -2241,8 +2240,7 @@ void AArch64TargetLowering::addTypeForNEON(MVT VT) {
 
   // F[MIN|MAX][NUM|NAN] and simple strict operations are available for all FP
   // NEON types.
-  if (VT.isFloatingPoint() &&
-      VT.getVectorElementType() != MVT::bf16 &&
+  if (VT.isFloatingPoint() && VT.getVectorElementType() != MVT::bf16 &&
       (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16()))
     for (unsigned Opcode :
          {ISD::FMINIMUM, ISD::FMAXIMUM, ISD::FMINNUM, ISD::FMAXNUM,
@@ -2657,8 +2655,8 @@ static bool optimizeLogicalImm(SDValue Op, unsigned Size, uint64_t Imm,
   if (NewImm == 0 || NewImm == OrigMask) {
     New = TLO.DAG.getNode(Op.getOpcode(), DL, VT, Op.getOperand(0),
                           TLO.DAG.getConstant(NewImm, DL, VT));
-  // Otherwise, create a machine node so that target independent DAG combine
-  // doesn't undo this optimization.
+    // Otherwise, create a machine node so that target independent DAG combine
+    // doesn't undo this optimization.
   } else {
     Enc = AArch64_AM::encodeLogicalImmediate(NewImm, Size);
     SDValue EncConst = TLO.DAG.getTargetConstant(Enc, DL, VT);
@@ -2846,7 +2844,8 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
     Intrinsic::ID IntID =
         static_cast<Intrinsic::ID>(Op->getConstantOperandVal(1));
     switch (IntID) {
-    default: return;
+    default:
+      return;
     case Intrinsic::aarch64_ldaxr:
     case Intrinsic::aarch64_ldxr: {
       unsigned BitWidth = Known.getBitWidth();
@@ -2868,7 +2867,7 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
       MVT VT = Op.getOperand(1).getValueType().getSimpleVT();
       unsigned BitWidth = Known.getBitWidth();
       if (VT == MVT::v8i8 || VT == MVT::v16i8) {
-        unsigned Bound = (VT == MVT::v8i8) ?  11 : 12;
+        unsigned Bound = (VT == MVT::v8i8) ? 11 : 12;
         assert(BitWidth >= Bound && "Unexpected width!");
         APInt Mask = APInt::getHighBitsSet(BitWidth, BitWidth - Bound);
         Known.Zero |= Mask;
@@ -3055,8 +3054,9 @@ AArch64TargetLowering::EmitF128CSEL(MachineInstr &MI,
   return EndBB;
 }
 
-MachineBasicBlock *AArch64TargetLowering::EmitLoweredCatchRet(
-       MachineInstr &MI, MachineBasicBlock *BB) const {
+MachineBasicBlock *
+AArch64TargetLowering::EmitLoweredCatchRet(MachineInstr &MI,
+                                           MachineBasicBlock *BB) const {
   assert(!isAsynchronousEHPersonality(classifyEHPersonality(
              BB->getParent()->getFunction().getPersonalityFn())) &&
          "SEH does not use catchret!");
@@ -4095,8 +4095,8 @@ static bool canEmitConjunction(SelectionDAG &DAG, const SDValue Val,
 /// \p Negate is true if we want this sub-tree being negated just by changing
 /// SETCC conditions.
 static SDValue emitConjunctionRec(SelectionDAG &DAG, SDValue Val,
-    AArch64CC::CondCode &OutCC, bool Negate, SDValue CCOp,
-    AArch64CC::CondCode Predicate) {
+                                  AArch64CC::CondCode &OutCC, bool Negate,
+                                  SDValue CCOp, AArch64CC::CondCode Predicate) {
   // We're at a tree leaf, produce a conditional comparison operation.
   unsigned Opcode = Val->getOpcode();
   if (Opcode == ISD::SETCC) {
@@ -4498,10 +4498,9 @@ getAArch64XALUOOp(AArch64CC::CondCode &CC, SDValue Op, SelectionDAG &DAG) {
     } else {
       SDValue UpperBits = DAG.getNode(ISD::MULHU, DL, MVT::i64, LHS, RHS);
       SDVTList VTs = DAG.getVTList(MVT::i64, FlagsVT);
-      Overflow =
-          DAG.getNode(AArch64ISD::SUBS, DL, VTs,
-                      DAG.getConstant(0, DL, MVT::i64),
-                      UpperBits).getValue(1);
+      Overflow = DAG.getNode(AArch64ISD::SUBS, DL, VTs,
+                             DAG.getConstant(0, DL, MVT::i64), UpperBits)
+                     .getValue(1);
     }
     break;
   }
@@ -4740,10 +4739,10 @@ static SDValue LowerPREFETCH(SDValue Op, SelectionDAG &DAG) {
   }
 
   // built the mask value encoding the expected behavior.
-  unsigned PrfOp = (IsWrite << 4) |     // Load/Store bit
-                   (!IsData << 3) |     // IsDataCache bit
-                   (Locality << 1) |    // Cache level bits
-                   (unsigned)IsStream;  // Stream bit
+  unsigned PrfOp = (IsWrite << 4) |    // Load/Store bit
+                   (!IsData << 3) |    // IsDataCache bit
+                   (Locality << 1) |   // Cache level bits
+                   (unsigned)IsStream; // Stream bit
   return DAG.getNode(AArch64ISD::PREFETCH, DL, MVT::Other, Op.getOperand(0),
                      DAG.getTargetConstant(PrfOp, DL, MVT::i32),
                      Op.getOperand(1));
@@ -5177,7 +5176,8 @@ AArch64TargetLowering::LowerVectorFP_TO_INT_SAT(SDValue Op,
     SDValue MinC = DAG.getConstant(
         APInt::getSignedMaxValue(SatWidth).sext(SrcElementWidth), DL, IntVT);
     SDValue Min = DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt, MinC);
-    SDValue Min2 = SrcVal2 ? DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt2, MinC) : SDValue();
+    SDValue Min2 = SrcVal2 ? DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt2, MinC)
+                           : SDValue();
     SDValue MaxC = DAG.getConstant(
         APInt::getSignedMinValue(SatWidth).sext(SrcElementWidth), DL, IntVT);
     Sat = DAG.getNode(ISD::SMAX, DL, IntVT, Min, MaxC);
@@ -5186,7 +5186,8 @@ AArch64TargetLowering::LowerVectorFP_TO_INT_SAT(SDValue Op,
     SDValue MinC = DAG.getConstant(
         APInt::getAllOnes(SatWidth).zext(SrcElementWidth), DL, IntVT);
     Sat = DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt, MinC);
-    Sat2 = SrcVal2 ? DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt2, MinC) : SDValue();
+    Sat2 = SrcVal2 ? DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt2, MinC)
+                   : SDValue();
   }
 
   if (SrcVal2)
@@ -5258,8 +5259,8 @@ SDValue AArch64TargetLowering::LowerFP_TO_INT_SAT(SDValue Op,
         APInt::getSignedMinValue(SatWidth).sext(DstWidth), DL, DstVT);
     Sat = DAG.getNode(ISD::SMAX, DL, DstVT, Min, MaxC);
   } else {
-    SDValue MinC = DAG.getConstant(
-        APInt::getAllOnes(SatWidth).zext(DstWidth), DL, DstVT);
+    SDValue MinC =
+        DAG.getConstant(APInt::getAllOnes(SatWidth).zext(DstWidth), DL, DstVT);
     Sat = DAG.getNode(ISD::UMIN, DL, DstVT, NativeCvt, MinC);
   }
 
@@ -5406,7 +5407,7 @@ SDValue AArch64TargetLowering::LowerVectorINT_TO_FP(SDValue Op,
 }
 
 SDValue AArch64TargetLowering::LowerINT_TO_FP(SDValue Op,
-                                            SelectionDAG &DAG) const {
+                                              SelectionDAG &DAG) const {
   if (Op.getValueType().isVector())
     return LowerVectorINT_TO_FP(Op, DAG);
 
@@ -5723,8 +5724,8 @@ static bool isAddSubSExt(SDValue N, SelectionDAG &DAG) {
   if (Opcode == ISD::ADD || Opcode == ISD::SUB) {
     SDValue N0 = N.getOperand(0);
     SDValue N1 = N.getOperand(1);
-    return N0->hasOneUse() && N1->hasOneUse() &&
-      isSignExtended(N0, DAG) && isSignExtended(N1, DAG);
+    return N0->hasOneUse() && N1->hasOneUse() && isSignExtended(N0, DAG) &&
+           isSignExtended(N1, DAG);
   }
   return false;
 }
@@ -5734,8 +5735,8 @@ static bool isAddSubZExt(SDValue N, SelectionDAG &DAG) {
   if (Opcode == ISD::ADD || Opcode == ISD::SUB) {
     SDValue N0 = N.getOperand(0);
     SDValue N1 = N.getOperand(1);
-    return N0->hasOneUse() && N1->hasOneUse() &&
-      isZeroExtended(N0, DAG) && isZeroExtended(N1, DAG);
+    return N0->hasOneUse() && N1->hasOneUse() && isZeroExtended(N0, DAG) &&
+           isZeroExtended(N1, DAG);
   }
   return false;
 }
@@ -6419,12 +6420,14 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_W_CHAIN(SDValue Op,
   }
 }
 
-SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
-                                                     SelectionDAG &DAG) const {
+SDValue
+AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
+                                               SelectionDAG &DAG) const {
   unsigned IntNo = Op.getConstantOperandVal(0);
   SDLoc DL(Op);
   switch (IntNo) {
-  default: return SDValue();    // Don't custom lower most intrinsics.
+  default:
+    return SDValue(); // Don't custom lower most intrinsics.
   case Intrinsic::thread_pointer: {
     E...
[truncated]

davemgreen

Can you remove all the formatting diffs - they make it impossible to review.

github-actions · 2026-03-08T08:30:47Z

🐧 Linux x64 Test Results

196501 tests passed
5348 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-03-08T08:30:47Z

🪟 Windows x64 Test Results

135592 tests passed
3415 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-03-11T12:55:12Z

✅ With the latest revision this PR passed the C/C++ code formatter.

davemgreen

It doesn't look like this is applying at the moment?

davemgreen · 2026-03-16T13:49:32Z

+      if (i < LaneElts - SlideAmt) {
+        // Data element: must be consecutive within lane
+        if (M != (int)(LaneStart + SlideAmt + i))
+          Valid = false;


I think Valid = false could be return 0; instead, as the two ifs will be independent.
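
A minimal standalone sketch of the early-return shape suggested here. `isSlideDownBy` is a hypothetical helper written in plain C++ rather than with the PR's LLVM types, it only handles a single lane, and undef (-1) mask entries are simply treated as mismatches:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not the PR's code: returns true if `mask` takes
// elements slideAmt, slideAmt+1, ... from the first operand and fills the
// remaining positions from the second operand (assumed to be all zeros).
bool isSlideDownBy(const std::vector<int> &mask, unsigned slideAmt) {
  assert(!mask.empty() && "expected a non-empty shuffle mask");
  const unsigned numElts = static_cast<unsigned>(mask.size());
  if (slideAmt == 0 || slideAmt >= numElts)
    return false;

  for (unsigned i = 0; i < numElts; ++i) {
    const int m = mask[i];
    if (i < numElts - slideAmt) {
      // Data element: must be consecutive, starting at slideAmt.
      if (m != static_cast<int>(slideAmt + i))
        return false; // bail out immediately, no Valid flag needed
    } else {
      // Fill element: any index into the second (zero) operand is fine.
      if (m < static_cast<int>(numElts) || m >= static_cast<int>(2 * numElts))
        return false;
    }
  }
  return true;
}
```

Returning at the first mismatch keeps the two checks independent of each other and avoids carrying a flag through the rest of the loop.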

davemgreen · 2026-03-16T13:58:12Z

+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+
+
+; repro  of gh issue


This comment can probably be removed as it doesn't add much. It isn't clear what the GitHub issue is from just this test.

davemgreen · 2026-03-16T14:02:55Z

+; CHECK-LABEL: slide_right_v16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi v1.2d, #0000000000000000
+; CHECK-NEXT:    ext v0.16b, v0.16b, v1.16b, #15


Looks like this is going wrong now due to the swap.

davemgreen · 2026-03-16T14:03:53Z

+static bool isSlideWithZerosMask(ArrayRef<int> M, EVT VT, SDValue &V1,
+                                 SDValue &V2, unsigned &ShiftAmount,


Pass V1 and V2 by value not reference, and return the operand that is shifted?
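
One possible shape for that, as a standalone sketch only. `Operand`, `SlideSource`, `commuteMask` and `normalizeSlideOperands` are hypothetical stand-ins in plain C++, not LLVM types or the PR's final signature. The operands are taken by value, and the function returns the operand to shift together with a mask rewritten so that this operand comes first:

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shuffle operand.
struct Operand {
  int id;      // arbitrary identifier for the example
  bool isZero; // true if the operand is known to be all zeros
};

// Result: the non-zero operand to shift, plus a mask in which that operand
// is always the first shuffle input.
struct SlideSource {
  Operand shifted;
  std::vector<int> mask;
};

// Swap the roles of the two shuffle inputs in the mask: indices into the
// first input now point at the second and vice versa; undef (-1) is kept.
std::vector<int> commuteMask(std::vector<int> mask) {
  const int n = static_cast<int>(mask.size());
  for (int &m : mask) {
    if (m < 0)
      continue;
    assert(m < 2 * n && "mask index out of range");
    m = (m < n) ? m + n : m - n;
  }
  return mask;
}

// Operands are passed by value; callers keep their own V1/V2 untouched.
std::optional<SlideSource> normalizeSlideOperands(Operand v1, Operand v2,
                                                  std::vector<int> mask) {
  if (v2.isZero && !v1.isZero)
    return SlideSource{v1, std::move(mask)};
  if (v1.isZero && !v2.isZero)
    return SlideSource{v2, commuteMask(std::move(mask))};
  return std::nullopt; // neither or both operands are zero: not a slide
}
```

With this shape a commuted shuffle is normalized in one place, and the caller never sees swapped V1/V2 values.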

davemgreen · 2026-03-16T14:06:13Z

+
+

Can be a single new line.

davemgreen · 2026-04-07T10:45:38Z

Can you update the existing tests?

dibrinsofor · 2026-06-05T18:54:16Z

Can you update the existing tests?

@davemgreen What do you mean, please? Are you referring to the commuted tests? I have already implemented those.

davemgreen · 2026-06-06T06:25:52Z

Hello. I presume I meant because these are failing in the precommit CI and need updating:
LLVM.CodeGen/AArch64/ext-narrow-index.ll
LLVM.CodeGen/AArch64/fp-conversion-to-tbl.ll
If that is incorrect (or even if it is not), can you give it a rebase too?

dibrinsofor · 2026-06-08T20:42:41Z

~~the failing ci case seems to be unrelated.~~

...
sccache: error: Timed out waiting for server startup. Maybe the remote service is unreachable?
...

~~not sure how to proceed~~

all good now. @davemgreen

davemgreen

Thanks - It's been a while so I had to check again but this LGTM.

davemgreen · 2026-06-10T09:35:58Z

Are you happy for this to be merged in?

dibrinsofor · 2026-06-10T18:53:22Z

yes, i think it is good to go @davemgreen

dibrinsofor · 2026-06-10T19:22:46Z

@nasherm could you take a look?

davemgreen · 2026-06-12T15:54:52Z

I'll push this if there are no other comments.

github-actions · 2026-06-12T15:55:32Z

@dibrinsofor Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

llvmbot added the backend:AArch64 label Mar 7, 2026

davemgreen requested changes Mar 8, 2026

dibrinsofor requested a review from davemgreen March 11, 2026 00:32

davemgreen requested a review from nasherm March 11, 2026 13:10

davemgreen reviewed Mar 11, 2026

davemgreen reviewed Mar 16, 2026

dibrinsofor force-pushed the shifts_ops_183398 branch 3 times, most recently from edd2125 to 61c8e0a Compare March 30, 2026 16:57

dibrinsofor requested a review from davemgreen April 1, 2026 22:58

dibrinsofor force-pushed the shifts_ops_183398 branch from 61c8e0a to 0986774 Compare June 8, 2026 00:15

davemgreen approved these changes Jun 10, 2026

dibrinsofor added 2 commits June 10, 2026 13:21

[AArch64] Optimize vector slide shuffles with zeros to use shift inst…

af46b22

…ructions

updating failing checks

81efc4a

Signed-off-by: Dibri Nsofor <dibrinsofor@gmail.com>

dibrinsofor force-pushed the shifts_ops_183398 branch from f74bc85 to 81efc4a Compare June 10, 2026 19:21

davemgreen merged commit 9bbed74 into llvm:main Jun 12, 2026
10 checks passed
