[AArch64] Optimize vector slide shuffles with zeros to use shift instructions #185170
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
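@llvm/pr-subscribers-backend-aarch64

Author: . (dibrinsofor)

Changes

We currently emit `movi`+`ext` instructions when generating code for shuffles that slide a 64-bit vector left/right and fill it with zeros. This patch optimizes these patterns to use a single `ushr`/`shl` instruction instead.

Example:

```llvm
define <8 x i8> @slide_left(<8 x i8> %v) {
  %r = shufflevector <8 x i8> %v, <8 x i8> zeroinitializer,
                     <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
  ret <8 x i8> %r
}
```

Before, we generate `movi v1.2d, #0` followed by `ext v0.8b, v0.8b, v1.8b, #1`. Now we generate a single `ushr d0, d0, #8`.

Fixes: #183398

Patch is 97.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/185170.diff

2 Files Affected: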
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index dc5a3736ecaa1..e625602771efe 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -120,20 +120,20 @@ cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(
cl::init(false));
static cl::opt<bool>
-EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
- cl::desc("Enable AArch64 logical imm instruction "
- "optimization"),
- cl::init(true));
+ EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
+ cl::desc("Enable AArch64 logical imm instruction "
+ "optimization"),
+ cl::init(true));
// Temporary option added for the purpose of testing functionality added
// to DAGCombiner.cpp in D92230. It is expected that this can be removed
// in future when both implementations will be based off MGATHER rather
// than the GLD1 nodes added for the SVE gather load intrinsics.
static cl::opt<bool>
-EnableCombineMGatherIntrinsics("aarch64-enable-mgather-combine", cl::Hidden,
- cl::desc("Combine extends of AArch64 masked "
- "gather intrinsics"),
- cl::init(true));
+ EnableCombineMGatherIntrinsics("aarch64-enable-mgather-combine", cl::Hidden,
+ cl::desc("Combine extends of AArch64 masked "
+ "gather intrinsics"),
+ cl::init(true));
static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
cl::desc("Combine ext and trunc to TBL"),
@@ -863,53 +863,53 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
// promote v4f16 to v4f32 when that is known to be safe.
auto V4Narrow = MVT::getVectorVT(ScalarVT, 4);
- setOperationPromotedToType(ISD::FADD, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FSUB, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FMUL, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FDIV, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FCEIL, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FFLOOR, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FROUND, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FTRUNC, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FADD, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FSUB, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FMUL, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FDIV, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FCEIL, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FFLOOR, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FROUND, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FTRUNC, V4Narrow, MVT::v4f32);
setOperationPromotedToType(ISD::FROUNDEVEN, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::FRINT, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::FRINT, V4Narrow, MVT::v4f32);
setOperationPromotedToType(ISD::FNEARBYINT, V4Narrow, MVT::v4f32);
setOperationPromotedToType(ISD::FCANONICALIZE, V4Narrow, MVT::v4f32);
- setOperationPromotedToType(ISD::SETCC, V4Narrow, MVT::v4f32);
+ setOperationPromotedToType(ISD::SETCC, V4Narrow, MVT::v4f32);
- setOperationAction(ISD::FABS, V4Narrow, Legal);
- setOperationAction(ISD::FNEG, V4Narrow, Legal);
- setOperationAction(ISD::FMA, V4Narrow, Expand);
- setOperationAction(ISD::BR_CC, V4Narrow, Expand);
- setOperationAction(ISD::SELECT, V4Narrow, Expand);
- setOperationAction(ISD::SELECT_CC, V4Narrow, Expand);
- setOperationAction(ISD::FCOPYSIGN, V4Narrow, Custom);
- setOperationAction(ISD::FSQRT, V4Narrow, Expand);
+ setOperationAction(ISD::FABS, V4Narrow, Legal);
+ setOperationAction(ISD::FNEG, V4Narrow, Legal);
+ setOperationAction(ISD::FMA, V4Narrow, Expand);
+ setOperationAction(ISD::BR_CC, V4Narrow, Expand);
+ setOperationAction(ISD::SELECT, V4Narrow, Expand);
+ setOperationAction(ISD::SELECT_CC, V4Narrow, Expand);
+ setOperationAction(ISD::FCOPYSIGN, V4Narrow, Custom);
+ setOperationAction(ISD::FSQRT, V4Narrow, Expand);
auto V8Narrow = MVT::getVectorVT(ScalarVT, 8);
setOperationPromotedToType(ISD::FCANONICALIZE, V8Narrow, MVT::v8f32);
- setOperationPromotedToType(ISD::SETCC, V8Narrow, MVT::v8f32);
-
- setOperationAction(ISD::FABS, V8Narrow, Legal);
- setOperationAction(ISD::FADD, V8Narrow, Legal);
- setOperationAction(ISD::FCEIL, V8Narrow, Legal);
- setOperationAction(ISD::FCOPYSIGN, V8Narrow, Custom);
- setOperationAction(ISD::FDIV, V8Narrow, Legal);
- setOperationAction(ISD::FFLOOR, V8Narrow, Legal);
- setOperationAction(ISD::FMA, V8Narrow, Expand);
- setOperationAction(ISD::FMUL, V8Narrow, Legal);
- setOperationAction(ISD::FNEARBYINT, V8Narrow, Legal);
- setOperationAction(ISD::FNEG, V8Narrow, Legal);
- setOperationAction(ISD::FROUND, V8Narrow, Legal);
- setOperationAction(ISD::FROUNDEVEN, V8Narrow, Legal);
- setOperationAction(ISD::FRINT, V8Narrow, Legal);
- setOperationAction(ISD::FSQRT, V8Narrow, Expand);
- setOperationAction(ISD::FSUB, V8Narrow, Legal);
- setOperationAction(ISD::FTRUNC, V8Narrow, Legal);
- setOperationAction(ISD::BR_CC, V8Narrow, Expand);
- setOperationAction(ISD::SELECT, V8Narrow, Expand);
- setOperationAction(ISD::SELECT_CC, V8Narrow, Expand);
- setOperationAction(ISD::FP_EXTEND, V8Narrow, Expand);
+ setOperationPromotedToType(ISD::SETCC, V8Narrow, MVT::v8f32);
+
+ setOperationAction(ISD::FABS, V8Narrow, Legal);
+ setOperationAction(ISD::FADD, V8Narrow, Legal);
+ setOperationAction(ISD::FCEIL, V8Narrow, Legal);
+ setOperationAction(ISD::FCOPYSIGN, V8Narrow, Custom);
+ setOperationAction(ISD::FDIV, V8Narrow, Legal);
+ setOperationAction(ISD::FFLOOR, V8Narrow, Legal);
+ setOperationAction(ISD::FMA, V8Narrow, Expand);
+ setOperationAction(ISD::FMUL, V8Narrow, Legal);
+ setOperationAction(ISD::FNEARBYINT, V8Narrow, Legal);
+ setOperationAction(ISD::FNEG, V8Narrow, Legal);
+ setOperationAction(ISD::FROUND, V8Narrow, Legal);
+ setOperationAction(ISD::FROUNDEVEN, V8Narrow, Legal);
+ setOperationAction(ISD::FRINT, V8Narrow, Legal);
+ setOperationAction(ISD::FSQRT, V8Narrow, Expand);
+ setOperationAction(ISD::FSUB, V8Narrow, Legal);
+ setOperationAction(ISD::FTRUNC, V8Narrow, Legal);
+ setOperationAction(ISD::BR_CC, V8Narrow, Expand);
+ setOperationAction(ISD::SELECT, V8Narrow, Expand);
+ setOperationAction(ISD::SELECT_CC, V8Narrow, Expand);
+ setOperationAction(ISD::FP_EXTEND, V8Narrow, Expand);
};
if (!Subtarget->hasFullFP16()) {
@@ -1320,8 +1320,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationPromotedToType(ISD::UINT_TO_FP, MVT::v8i16, MVT::v8i32);
}
- setOperationAction(ISD::CTLZ, MVT::v1i64, Expand);
- setOperationAction(ISD::CTLZ, MVT::v2i64, Expand);
+ setOperationAction(ISD::CTLZ, MVT::v1i64, Expand);
+ setOperationAction(ISD::CTLZ, MVT::v2i64, Expand);
// CTLS (Count Leading Sign bits) - Legal for BHS types (8/16/32-bit
// elements) No hardware support for 64-bit element vectors
for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
@@ -1349,8 +1349,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::MUL, MVT::v1i64, Custom);
// Saturates
- for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v1i64,
- MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64 }) {
+ for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v1i64, MVT::v16i8,
+ MVT::v8i16, MVT::v4i32, MVT::v2i64}) {
setOperationAction(ISD::SADDSAT, VT, Legal);
setOperationAction(ISD::UADDSAT, VT, Legal);
setOperationAction(ISD::SSUBSAT, VT, Legal);
@@ -1368,8 +1368,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
}
// Vector reductions
- for (MVT VT : { MVT::v4f16, MVT::v2f32,
- MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {
+ for (MVT VT :
+ {MVT::v4f16, MVT::v2f32, MVT::v8f16, MVT::v4f32, MVT::v2f64}) {
if (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16()) {
setOperationAction(ISD::VECREDUCE_FMAX, VT, Legal);
setOperationAction(ISD::VECREDUCE_FMIN, VT, Legal);
@@ -1382,8 +1382,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
if (Subtarget->hasFullFP16())
setOperationAction(ISD::VECREDUCE_FADD, MVT::v2f16, Custom);
- for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32,
- MVT::v16i8, MVT::v8i16, MVT::v4i32 }) {
+ for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v16i8, MVT::v8i16,
+ MVT::v4i32}) {
setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
@@ -1483,10 +1483,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setLoadExtAction(ISD::ZEXTLOAD, MVT::v2i64, MVT::v2i16, Custom);
// ADDP custom lowering
- for (MVT VT : { MVT::v32i8, MVT::v16i16, MVT::v8i32, MVT::v4i64 })
+ for (MVT VT : {MVT::v32i8, MVT::v16i16, MVT::v8i32, MVT::v4i64})
setOperationAction(ISD::ADD, VT, Custom);
// FADDP custom lowering
- for (MVT VT : { MVT::v16f16, MVT::v8f32, MVT::v4f64 })
+ for (MVT VT : {MVT::v16f16, MVT::v8f32, MVT::v4f64})
setOperationAction(ISD::FADD, VT, Custom);
if (Subtarget->hasDotProd()) {
@@ -1666,8 +1666,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BITCAST, VT, Custom);
for (auto VT :
- { MVT::nxv2i8, MVT::nxv2i16, MVT::nxv2i32, MVT::nxv2i64, MVT::nxv4i8,
- MVT::nxv4i16, MVT::nxv4i32, MVT::nxv8i8, MVT::nxv8i16 })
+ {MVT::nxv2i8, MVT::nxv2i16, MVT::nxv2i32, MVT::nxv2i64, MVT::nxv4i8,
+ MVT::nxv4i16, MVT::nxv4i32, MVT::nxv8i8, MVT::nxv8i16})
setOperationAction(ISD::SIGN_EXTEND_INREG, VT, Legal);
// Promote predicate as counter load/stores to standard predicates.
@@ -1702,10 +1702,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
}
// NEON doesn't support masked loads/stores, but SME and SVE do.
- for (auto VT :
- {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
- MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
- MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
+ for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
+ MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
+ MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
setOperationAction(ISD::MLOAD, VT, Custom);
setOperationAction(ISD::MSTORE, VT, Custom);
}
@@ -1960,8 +1959,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECREDUCE_UMIN, MVT::v2i64, Custom);
// Int operations with no NEON support.
- for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
- MVT::v2i32, MVT::v4i32, MVT::v2i64}) {
+ for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
+ MVT::v4i32, MVT::v2i64}) {
setOperationAction(ISD::BITREVERSE, VT, Custom);
setOperationAction(ISD::CTTZ, VT, Custom);
setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
@@ -2241,8 +2240,7 @@ void AArch64TargetLowering::addTypeForNEON(MVT VT) {
// F[MIN|MAX][NUM|NAN] and simple strict operations are available for all FP
// NEON types.
- if (VT.isFloatingPoint() &&
- VT.getVectorElementType() != MVT::bf16 &&
+ if (VT.isFloatingPoint() && VT.getVectorElementType() != MVT::bf16 &&
(VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16()))
for (unsigned Opcode :
{ISD::FMINIMUM, ISD::FMAXIMUM, ISD::FMINNUM, ISD::FMAXNUM,
@@ -2657,8 +2655,8 @@ static bool optimizeLogicalImm(SDValue Op, unsigned Size, uint64_t Imm,
if (NewImm == 0 || NewImm == OrigMask) {
New = TLO.DAG.getNode(Op.getOpcode(), DL, VT, Op.getOperand(0),
TLO.DAG.getConstant(NewImm, DL, VT));
- // Otherwise, create a machine node so that target independent DAG combine
- // doesn't undo this optimization.
+ // Otherwise, create a machine node so that target independent DAG combine
+ // doesn't undo this optimization.
} else {
Enc = AArch64_AM::encodeLogicalImmediate(NewImm, Size);
SDValue EncConst = TLO.DAG.getTargetConstant(Enc, DL, VT);
@@ -2846,7 +2844,8 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
Intrinsic::ID IntID =
static_cast<Intrinsic::ID>(Op->getConstantOperandVal(1));
switch (IntID) {
- default: return;
+ default:
+ return;
case Intrinsic::aarch64_ldaxr:
case Intrinsic::aarch64_ldxr: {
unsigned BitWidth = Known.getBitWidth();
@@ -2868,7 +2867,7 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
MVT VT = Op.getOperand(1).getValueType().getSimpleVT();
unsigned BitWidth = Known.getBitWidth();
if (VT == MVT::v8i8 || VT == MVT::v16i8) {
- unsigned Bound = (VT == MVT::v8i8) ? 11 : 12;
+ unsigned Bound = (VT == MVT::v8i8) ? 11 : 12;
assert(BitWidth >= Bound && "Unexpected width!");
APInt Mask = APInt::getHighBitsSet(BitWidth, BitWidth - Bound);
Known.Zero |= Mask;
@@ -3055,8 +3054,9 @@ AArch64TargetLowering::EmitF128CSEL(MachineInstr &MI,
return EndBB;
}
-MachineBasicBlock *AArch64TargetLowering::EmitLoweredCatchRet(
- MachineInstr &MI, MachineBasicBlock *BB) const {
+MachineBasicBlock *
+AArch64TargetLowering::EmitLoweredCatchRet(MachineInstr &MI,
+ MachineBasicBlock *BB) const {
assert(!isAsynchronousEHPersonality(classifyEHPersonality(
BB->getParent()->getFunction().getPersonalityFn())) &&
"SEH does not use catchret!");
@@ -4095,8 +4095,8 @@ static bool canEmitConjunction(SelectionDAG &DAG, const SDValue Val,
/// \p Negate is true if we want this sub-tree being negated just by changing
/// SETCC conditions.
static SDValue emitConjunctionRec(SelectionDAG &DAG, SDValue Val,
- AArch64CC::CondCode &OutCC, bool Negate, SDValue CCOp,
- AArch64CC::CondCode Predicate) {
+ AArch64CC::CondCode &OutCC, bool Negate,
+ SDValue CCOp, AArch64CC::CondCode Predicate) {
// We're at a tree leaf, produce a conditional comparison operation.
unsigned Opcode = Val->getOpcode();
if (Opcode == ISD::SETCC) {
@@ -4498,10 +4498,9 @@ getAArch64XALUOOp(AArch64CC::CondCode &CC, SDValue Op, SelectionDAG &DAG) {
} else {
SDValue UpperBits = DAG.getNode(ISD::MULHU, DL, MVT::i64, LHS, RHS);
SDVTList VTs = DAG.getVTList(MVT::i64, FlagsVT);
- Overflow =
- DAG.getNode(AArch64ISD::SUBS, DL, VTs,
- DAG.getConstant(0, DL, MVT::i64),
- UpperBits).getValue(1);
+ Overflow = DAG.getNode(AArch64ISD::SUBS, DL, VTs,
+ DAG.getConstant(0, DL, MVT::i64), UpperBits)
+ .getValue(1);
}
break;
}
@@ -4740,10 +4739,10 @@ static SDValue LowerPREFETCH(SDValue Op, SelectionDAG &DAG) {
}
// built the mask value encoding the expected behavior.
- unsigned PrfOp = (IsWrite << 4) | // Load/Store bit
- (!IsData << 3) | // IsDataCache bit
- (Locality << 1) | // Cache level bits
- (unsigned)IsStream; // Stream bit
+ unsigned PrfOp = (IsWrite << 4) | // Load/Store bit
+ (!IsData << 3) | // IsDataCache bit
+ (Locality << 1) | // Cache level bits
+ (unsigned)IsStream; // Stream bit
return DAG.getNode(AArch64ISD::PREFETCH, DL, MVT::Other, Op.getOperand(0),
DAG.getTargetConstant(PrfOp, DL, MVT::i32),
Op.getOperand(1));
@@ -5177,7 +5176,8 @@ AArch64TargetLowering::LowerVectorFP_TO_INT_SAT(SDValue Op,
SDValue MinC = DAG.getConstant(
APInt::getSignedMaxValue(SatWidth).sext(SrcElementWidth), DL, IntVT);
SDValue Min = DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt, MinC);
- SDValue Min2 = SrcVal2 ? DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt2, MinC) : SDValue();
+ SDValue Min2 = SrcVal2 ? DAG.getNode(ISD::SMIN, DL, IntVT, NativeCvt2, MinC)
+ : SDValue();
SDValue MaxC = DAG.getConstant(
APInt::getSignedMinValue(SatWidth).sext(SrcElementWidth), DL, IntVT);
Sat = DAG.getNode(ISD::SMAX, DL, IntVT, Min, MaxC);
@@ -5186,7 +5186,8 @@ AArch64TargetLowering::LowerVectorFP_TO_INT_SAT(SDValue Op,
SDValue MinC = DAG.getConstant(
APInt::getAllOnes(SatWidth).zext(SrcElementWidth), DL, IntVT);
Sat = DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt, MinC);
- Sat2 = SrcVal2 ? DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt2, MinC) : SDValue();
+ Sat2 = SrcVal2 ? DAG.getNode(ISD::UMIN, DL, IntVT, NativeCvt2, MinC)
+ : SDValue();
}
if (SrcVal2)
@@ -5258,8 +5259,8 @@ SDValue AArch64TargetLowering::LowerFP_TO_INT_SAT(SDValue Op,
APInt::getSignedMinValue(SatWidth).sext(DstWidth), DL, DstVT);
Sat = DAG.getNode(ISD::SMAX, DL, DstVT, Min, MaxC);
} else {
- SDValue MinC = DAG.getConstant(
- APInt::getAllOnes(SatWidth).zext(DstWidth), DL, DstVT);
+ SDValue MinC =
+ DAG.getConstant(APInt::getAllOnes(SatWidth).zext(DstWidth), DL, DstVT);
Sat = DAG.getNode(ISD::UMIN, DL, DstVT, NativeCvt, MinC);
}
@@ -5406,7 +5407,7 @@ SDValue AArch64TargetLowering::LowerVectorINT_TO_FP(SDValue Op,
}
SDValue AArch64TargetLowering::LowerINT_TO_FP(SDValue Op,
- SelectionDAG &DAG) const {
+ SelectionDAG &DAG) const {
if (Op.getValueType().isVector())
return LowerVectorINT_TO_FP(Op, DAG);
@@ -5723,8 +5724,8 @@ static bool isAddSubSExt(SDValue N, SelectionDAG &DAG) {
if (Opcode == ISD::ADD || Opcode == ISD::SUB) {
SDValue N0 = N.getOperand(0);
SDValue N1 = N.getOperand(1);
- return N0->hasOneUse() && N1->hasOneUse() &&
- isSignExtended(N0, DAG) && isSignExtended(N1, DAG);
+ return N0->hasOneUse() && N1->hasOneUse() && isSignExtended(N0, DAG) &&
+ isSignExtended(N1, DAG);
}
return false;
}
@@ -5734,8 +5735,8 @@ static bool isAddSubZExt(SDValue N, SelectionDAG &DAG) {
if (Opcode == ISD::ADD || Opcode == ISD::SUB) {
SDValue N0 = N.getOperand(0);
SDValue N1 = N.getOperand(1);
- return N0->hasOneUse() && N1->hasOneUse() &&
- isZeroExtended(N0, DAG) && isZeroExtended(N1, DAG);
+ return N0->hasOneUse() && N1->hasOneUse() && isZeroExtended(N0, DAG) &&
+ isZeroExtended(N1, DAG);
}
return false;
}
@@ -6419,12 +6420,14 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_W_CHAIN(SDValue Op,
}
}
-SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
- SelectionDAG &DAG) const {
+SDValue
+AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
+ SelectionDAG &DAG) const {
unsigned IntNo = Op.getConstantOperandVal(0);
SDLoc DL(Op);
switch (IntNo) {
- default: return SDValue(); // Don't custom lower most intrinsics.
+ default:
+ return SDValue(); // Don't custom lower most intrinsics.
case Intrinsic::thread_pointer: {
E...
[truncated]
davemgreen left a comment:
Can you remove all the formatting-only diffs? They make this impossible to review.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.

✅ With the latest revision this PR passed the C/C++ code formatter.
davemgreen left a comment:
It doesn't look like this is applying at the moment?
```cpp
if (i < LaneElts - SlideAmt) {
  // Data element: must be consecutive within lane
  if (M != (int)(LaneStart + SlideAmt + i))
    Valid = false;
```
I think `Valid = false` could just be `return 0;`, as the two ifs will be independent.
```llvm
; RUN: llc -mtriple=aarch64 < %s | FileCheck %s

; repro of gh issue
```
This comment can probably be removed, as it doesn't add much. It isn't clear what the GH issue is from this test alone.
```llvm
; CHECK-LABEL: slide_right_v16i8:
; CHECK:       // %bb.0:
; CHECK-NEXT:    movi v1.2d, #0000000000000000
; CHECK-NEXT:    ext v0.16b, v0.16b, v1.16b, #15
```
Looks like this is going wrong now due to the swap.
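A byte-level model of `EXT` helps show what the swap does. Assuming `slide_right_v16i8` means "result lane 0 is zero and lane i takes v[i-1]", the checked `ext v0.16b, v0.16b, v1.16b, #15` (source first, zero vector second) puts v[15] in lane 0 and zeros everywhere else, whereas the zero vector as the first source gives the intended slide. The helpers `ext16` and `slideRightByOne` are hypothetical names for this sketch, which assumes little-endian byte order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Model of EXT Vd.16B, Vn.16B, Vm.16B, #Imm: concatenate Vn (low bytes)
// and Vm (high bytes), then take 16 consecutive bytes starting at Imm.
Bytes16 ext16(const Bytes16 &Vn, const Bytes16 &Vm, unsigned Imm) {
  assert(Imm < 16 && "EXT .16B immediate must be in [0, 15]");
  Bytes16 Vd{};
  for (unsigned I = 0; I < 16; ++I) {
    unsigned Src = I + Imm;
    Vd[I] = Src < 16 ? Vn[Src] : Vm[Src - 16];
  }
  return Vd;
}

// Reference slide right by one lane with zero fill (assumed test intent):
// lane 0 becomes zero and lane I receives V[I - 1].
Bytes16 slideRightByOne(const Bytes16 &V) {
  Bytes16 R{};
  for (unsigned I = 1; I < 16; ++I)
    R[I] = V[I - 1];
  return R;
}
```

With `V = {1, 2, ..., 16}` and `Z` all zeros, `ext16(Z, V, 15)` equals `slideRightByOne(V)`, while `ext16(V, Z, 15)` yields `{16, 0, ..., 0}`.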
```cpp
static bool isSlideWithZerosMask(ArrayRef<int> M, EVT VT, SDValue &V1,
                                 SDValue &V2, unsigned &ShiftAmount,
```
Pass `V1` and `V2` by value, not by reference, and return the operand that is shifted?
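One way to read this suggestion: an `SDValue` is a cheap handle, so the matcher can take both operands by value and return the operand that feeds the shift instead of writing back through references. The standalone sketch below uses a hypothetical `Operand` struct as a stand-in for `SDValue`; the commuting step (zero vector passed first) is an assumption about how commuted shuffles might be normalized, not the patch's actual code. It also assumes little-endian lane order and `-1` for undef mask lanes.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::SDValue: a small handle, cheap to copy.
struct Operand {
  int Id;
  bool IsAllZeros;
};

struct SlideMatch {
  Operand Shifted;      // the operand the USHR/SHL would read
  unsigned ShiftLanes;  // slide amount in lanes
  bool TowardLaneZero;  // true -> USHR, false -> SHL (little-endian lanes)
};

// Hypothetical helper: operands by value, shifted operand returned.
// Mask is taken by value so it can be remapped when commuting.
std::optional<SlideMatch> matchSlideWithZeros(std::vector<int> Mask, Operand V1,
                                              Operand V2) {
  const int N = static_cast<int>(Mask.size());
  if (V1.IsAllZeros == V2.IsAllZeros)
    return std::nullopt; // need exactly one all-zeros operand
  if (V1.IsAllZeros) {
    // Commute: swap operands and remap indices so zeros are always second.
    for (int &M : Mask)
      if (M >= 0)
        M = M < N ? M + N : M - N;
    std::swap(V1, V2);
  }
  for (int Amt = 1; Amt < N; ++Amt)
    for (bool TowardLaneZero : {true, false}) {
      bool Ok = true;
      for (int I = 0; I < N && Ok; ++I) {
        // Lane of V1 expected in result lane I; out of range means zero.
        int Src = TowardLaneZero ? I + Amt : I - Amt;
        bool FromV1 = Src >= 0 && Src < N;
        Ok = Mask[I] < 0 || (FromV1 ? Mask[I] == Src : Mask[I] >= N);
      }
      if (Ok)
        return SlideMatch{V1, static_cast<unsigned>(Amt), TowardLaneZero};
    }
  return std::nullopt;
}
```

Returning `std::optional<SlideMatch>` keeps the caller's operands untouched, so a failed match leaves nothing to undo.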
This can be a single newline.
edd2125 to 61c8e0a

Can you update the existing tests?
@davemgreen What do you mean? Are you referring to the commuted tests? I have already implemented those.
Hello. I presume I meant these, because they are failing in the precommit CI and need updating:
61c8e0a to 0986774
All good now, @davemgreen.
davemgreen left a comment:
Thanks. It's been a while, so I had to check again, but this LGTM.
Are you happy for this to be merged in?
Yes, I think it is good to go, @davemgreen.
Signed-off-by: Dibri Nsofor <dibrinsofor@gmail.com>
f74bc85 to 81efc4a
@nasherm, could you take a look?
I'll push this if there are no other comments.
@dibrinsofor Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
…ructions (llvm#185170)

We currently emit `movi`+`ext` instructions when generating code for shuffles that slide a 64-bit vector left/right and fill it with zeros. This patch optimizes these patterns to use a single `ushr`/`shl` instruction instead.

Example:

```llvm
define <8 x i8> @slide_left(<8 x i8> %v) {
  %r = shufflevector <8 x i8> %v, <8 x i8> zeroinitializer,
                     <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
  ret <8 x i8> %r
}
```

Before, we generate:

```
movi v1.2d, #0
ext v0.8b, v0.8b, v1.8b, #1
```

Now:

```
ushr d0, d0, #8
```

Fixes: llvm#183398

Alive2 proof: https://alive2.llvm.org/ce/z/QaW5CQ

---------

Signed-off-by: Dibri Nsofor <dibrinsofor@gmail.com>
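The description shows the input and output but not how such masks are recognized. Below is a minimal standalone sketch of one way to classify them; it is not the patch's `isSlideWithZerosMask`, and the names `SlideShift` and `classifySlideMask` are hypothetical. It assumes little-endian lane order, the zero vector as the second shuffle operand (mask indices `>= NumElts` select zeros), and `-1` for undef lanes.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <vector>

// Hypothetical result type for this sketch (not an LLVM API).
struct SlideShift {
  bool IsUShr;        // true: lanes move toward lane 0 -> USHR; false -> SHL
  unsigned ShiftBits; // shift amount applied to the 64-bit D register
};

// Classify a shufflevector mask over (V, zeroinitializer) as a
// zero-filling slide of a 64-bit vector, or return nullopt.
std::optional<SlideShift> classifySlideMask(const std::vector<int> &Mask,
                                            unsigned EltBits) {
  const int NumElts = static_cast<int>(Mask.size());
  if (static_cast<unsigned>(NumElts) * EltBits != 64)
    return std::nullopt; // only 64-bit vectors fit a single D-register shift
  for (int Amt = 1; Amt < NumElts; ++Amt) {
    for (bool TowardLaneZero : {true, false}) {
      bool Matches = true;
      for (int I = 0; I < NumElts && Matches; ++I) {
        // Lane of V expected in result lane I; out of range means zero.
        int Src = TowardLaneZero ? I + Amt : I - Amt;
        bool FromV = Src >= 0 && Src < NumElts;
        int M = Mask[I];
        Matches = M < 0 || (FromV ? M == Src : M >= NumElts);
      }
      if (Matches)
        return SlideShift{TowardLaneZero, static_cast<unsigned>(Amt) * EltBits};
    }
  }
  return std::nullopt;
}
```

For the `slide_left` mask `<1, 2, ..., 8>` on `<8 x i8>`, this yields `IsUShr == true` with `ShiftBits == 8`, consistent with `ushr d0, d0, #8`. A mask such as `<8, 0, 1, ..., 6>` classifies as an 8-bit `shl` instead.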
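We currently emit `movi`+`ext` instructions when generating code for shuffles that slide a 64-bit vector left/right and fill it with zeros. This patch optimizes these patterns to use a single `ushr`/`shl` instruction instead.

For example, for an `<8 x i8>` shuffle of `%v` and `zeroinitializer` with mask `<1, 2, 3, 4, 5, 6, 7, 8>`, we previously generated `movi v1.2d, #0` followed by `ext v0.8b, v0.8b, v1.8b, #1`; now we generate a single `ushr d0, d0, #8`.

Fixes: #183398

Alive2 proof: https://alive2.llvm.org/ce/z/QaW5CQ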
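As an executable spot check of the equivalence the Alive2 proof covers (not a substitute for it), the sketch below packs eight `i8` lanes into a 64-bit value with lane 0 in the least significant byte, which assumes little-endian lane order, and compares 64-bit shifts against zero-filling slides. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i8 = std::array<uint8_t, 8>;

// Pack lanes into a 64-bit register value, lane 0 in the low byte.
uint64_t pack(const V8i8 &V) {
  uint64_t R = 0;
  for (int I = 7; I >= 0; --I)
    R = (R << 8) | V[I];
  return R;
}

// Inverse of pack: byte I of X becomes lane I.
V8i8 unpack(uint64_t X) {
  V8i8 V{};
  for (int I = 0; I < 8; ++I)
    V[I] = static_cast<uint8_t>(X >> (8 * I));
  return V;
}

// Mask <Amt, Amt+1, ...> against zeroinitializer: lane I = V[I + Amt],
// or zero once the index runs past lane 7.
V8i8 slideTowardLaneZero(const V8i8 &V, int Amt) {
  V8i8 R{};
  for (int I = 0; I + Amt < 8; ++I)
    R[I] = V[I + Amt];
  return R;
}

// Opposite direction: the low Amt lanes are zero, lane I = V[I - Amt].
V8i8 slideAwayFromLaneZero(const V8i8 &V, int Amt) {
  V8i8 R{};
  for (int I = Amt; I < 8; ++I)
    R[I] = V[I - Amt];
  return R;
}
```

For every `Amt` in 1..7, `unpack(pack(V) >> (8 * Amt))` matches `slideTowardLaneZero(V, Amt)` (the `ushr` case) and `unpack(pack(V) << (8 * Amt))` matches `slideAwayFromLaneZero(V, Amt)` (the `shl` case).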