[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only #197763
Created using spr 1.3.7
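@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: Alexey Bataev (alexey-bataev)

Changes

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns Invalid when trimming node Idx==1 under an InsertElement root would leave only a buildvector, to avoid infinite vectorization attempts. This is too aggressive when the original untrimmed tree is already profitable (Cost < -SLPCostThreshold). In that case, undo any partial trims and return the original cost instead of rejecting the tree.

Patch is 35.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/197763.diff

10 Files Affected: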
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 9592771917995..0f88605045abb 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -18951,11 +18951,21 @@ InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
IsEqualCostAltShuffleToTrim()) {
PreferTrimmedTree |= TotalSubtreeCost == GatherCost;
// If the remaining tree is just a buildvector - exit, it will cause
- // endless attempts to vectorize.
+ // endless attempts to vectorize. When the tree is already profitable,
+ // skip trimming this node and let the post-loop logic (including
+ // gathered loads processing) decide.
if (VectorizableTree.front()->hasState() &&
VectorizableTree.front()->getOpcode() == Instruction::InsertElement &&
- TE->Idx == 1)
+ TE->Idx == 1) {
+ if (Cost < -SLPCostThreshold) {
+ LLVM_DEBUG(dbgs() << "SLP: Skipping trim of node " << TE->Idx
+ << " - tree already profitable with cost " << Cost
+ << ".\n");
+ Worklist.pop();
+ continue;
+ }
return InstructionCost::getInvalid();
+ }
LLVM_DEBUG(dbgs() << "SLP: Trimming unprofitable subtree at node "
<< TE->Idx << " with cost "
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
index 9b45fe6a2804b..174be55bd7b0a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT: [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
index d812cc813c20f..8a5e278e2b03a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT: [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
index 57deca1d62516..7f7e77eadc987 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
; SSE-NEXT: ret <8 x double> [[TMP1]]
;
; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT: [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT: [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT: [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT: [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT: [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT: [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT: [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT: [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT: [[R0:%.*]] = insertelement <8 x double> poison, double [[C0]], i32 0
-; SLM-NEXT: [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT: [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT: [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT: [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT: [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT: [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT: [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x double> [[R73]]
;
; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
index d1a5c3bb032e0..8b8bc71c2ceda 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
; SSE-NEXT: ret <8 x double> [[TMP1]]
;
; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT: [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT: [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT: [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT: [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT: [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT: [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT: [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT: [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT: [[R0:%.*]] = insertelement <8 x double> undef, double [[C0]], i32 0
-; SLM-NEXT: [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT: [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT: [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT: [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT: [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT: [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT: [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x double> [[R73]]
;
; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
index 1bb24c524bb3e..042f5cb3f512b 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
@@ -5,8 +5,7 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
; CHECK-LABEL: define void @test(
; CHECK-SAME: i32 [[ARG:%.*]], i32 [[ARG1:%.*]], i64 [[ARG2:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[BB:.*]]:
-; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 poison>, i32 [[ARG]], i32 2
-; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG1]], i32 3
+; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 0, i32 poison>, i32 [[ARG]], i32 3
; CHECK-NEXT: br label %[[BB3:.*]]
; CHECK: [[BB3]]:
; CHECK-NEXT: [[TMP3:%.*]] = phi i64 [ 0, %[[BB3]] ], [ 0, %[[BB]] ]
@@ -21,7 +20,6 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP5]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP4]] to i64
; CHECK-NEXT: [[TRUNC10:%.*]] = trunc i64 [[TMP10]] to i32
-; CHECK-NEXT: [[SHL:%.*]] = shl i32 0, 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP5]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
; CHECK-NEXT: [[TRUNC27:%.*]] = trunc i64 [[TMP7]] to i32
@@ -30,14 +28,15 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[XOR38:%.*]] = xor i32 [[ARG]], [[TRUNC28]]
; CHECK-NEXT: [[SHL35:%.*]] = shl i32 [[ARG1]], 0
-; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> poison, i32 [[TRUNC10]], i32 0
-; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[SHL]], i32 1
-; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TRUNC27]], i32 2
-; CHECK-NEXT: [[TMP25:%.*]] = insertelement <4 x i32> [[TMP24]], i32 [[TRUNC19]], i32 3
+; CHECK-NEXT: [[XOR31:%.*]] = xor i32 [[ARG1]], [[TRUNC19]]
+; CHECK-NEXT: [[SHL:%.*]] = shl i32 0, 1
+; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x i32> poison, i32 [[SHL]], i32 0
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP23]], i32 [[TRUNC10]], i32 1
+; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TRUNC27]], i32 3
+; CHECK-NEXT: [[TMP25:%.*]] = shuffle...
[truncated]
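The updated CHECK lines replace chains of insertelement with two-source shufflevector blends such as the mask <0, 1, 2, 3, 8, 9, 6, 7>. As a reading aid, here is a minimal C++ sketch of how such a mask selects lanes. The names `shuffle` and `exampleBlend` and the lane values are hypothetical and are not LLVM code; `std::nullopt` stands in for a poison lane and a negative mask index for a poison mask element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Model of a two-source shuffle: mask index I < N picks A[I], index I >= N
// picks B[I - N]. A negative index (assumed encoding for a poison mask
// element) leaves the result lane as poison (std::nullopt).
// The mask is assumed to be valid, i.e. every index is below 2 * N.
template <std::size_t N, std::size_t M>
std::array<std::optional<int>, M>
shuffle(const std::array<std::optional<int>, N> &A,
        const std::array<std::optional<int>, N> &B,
        const std::array<int, M> &Mask) {
  std::array<std::optional<int>, M> Result{};
  for (std::size_t I = 0; I < M; ++I) {
    const int Idx = Mask[I];
    if (Idx < 0)
      continue; // poison mask element: lane stays poison
    const std::size_t Pos = static_cast<std::size_t>(Idx);
    Result[I] = Pos < N ? A[Pos] : B[Pos - N];
  }
  return Result;
}

// Mirrors the shape of the blend in the updated checks: the first operand
// has lanes 0..3 defined, the second operand carries a widened <2 x i32>
// result in its lanes 0..1. The lane values are made up for illustration.
std::array<std::optional<int>, 8> exampleBlend() {
  using Lane = std::optional<int>;
  const Lane P = std::nullopt;
  const std::array<Lane, 8> First{Lane(0), Lane(1), Lane(2), Lane(3), P, P, P, P};
  const std::array<Lane, 8> Second{Lane(40), Lane(50), P, P, P, P, P, P};
  const std::array<int, 8> Mask{0, 1, 2, 3, 8, 9, 6, 7};
  return shuffle(First, Second, Mask);
}
```

In this model, lanes 4 and 5 of the result come from lanes 0 and 1 of the second operand, while lanes 6 and 7 remain poison until the trailing insertelement instructions fill them.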
You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef([^a-zA-Z0-9_-]|$)|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll llvm/test/Transforms/SLPVectorizer/X86/scalarize-ctlz.ll llvm/test/Transforms/SLPVectorizer/X86/select-copyable-cmp-poison.ll llvm/test/Transforms/SLPVectorizer/X86/trunced-buildvector-scalar-extended.ll llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

The following files introduce new uses of undef:

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. In tests, avoid using undef. For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.
…d reduce to buildvector-only

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns Invalid when trimming node Idx==1 under an InsertElement root would leave only a buildvector, to avoid infinite vectorization attempts. This is too aggressive when the original untrimmed tree is already profitable (Cost < -SLPCostThreshold). In that case, undo any partial trims and return the original cost instead of rejecting the tree.

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: llvm/llvm-project#197763
… buildvector-only

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns Invalid when trimming node Idx==1 under an InsertElement root would leave only a buildvector, to avoid infinite vectorization attempts. This is too aggressive when the original untrimmed tree is already profitable (Cost < -SLPCostThreshold). In that case, undo any partial trims and return the original cost instead of rejecting the tree.

Original Pull Request: #197763

Recommit after unrelated revert in #198265

Reviewers:

Pull Request: #198336
In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.
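The guard added in the SLPVectorizer.cpp hunk can be read as a three-way decision for a subtree that the trim loop has already judged unprofitable. The sketch below is a simplified standalone model, not LLVM code: `TrimAction`, `TrimQuery`, and `decideTrim` are hypothetical names, and plain integers stand in for InstructionCost and SLPCostThreshold. It follows the control flow shown in the hunk, where the profitable case skips the node and continues rather than returning Invalid.

```cpp
#include <cassert>

// Hypothetical outcome of visiting one trim candidate in the worklist.
enum class TrimAction {
  Trim,       // replace the subtree with a gather, as before the patch
  SkipNode,   // new: leave the node alone and keep processing the worklist
  RejectTree  // return an invalid cost so the tree is not vectorized
};

// Simplified stand-ins for the state the real check inspects.
struct TrimQuery {
  bool RootIsInsertElement; // root node has state and opcode InsertElement
  unsigned NodeIdx;         // index of the candidate node (TE->Idx)
  long Cost;                // current tree cost; negative means a gain
  long Threshold;           // stand-in for SLPCostThreshold
};

// Decide what to do with a subtree that was found unprofitable.
TrimAction decideTrim(const TrimQuery &Q) {
  // Trimming node 1 under an InsertElement root would leave only a
  // buildvector, which is the case that risks endless vectorization attempts.
  const bool WouldLeaveOnlyBuildVector =
      Q.RootIsInsertElement && Q.NodeIdx == 1;
  if (!WouldLeaveOnlyBuildVector)
    return TrimAction::Trim;
  // The tree as a whole is already profitable: do not trim this node.
  if (Q.Cost < -Q.Threshold)
    return TrimAction::SkipNode;
  // Otherwise keep the pre-patch behavior and reject the tree.
  return TrimAction::RejectTree;
}
```

With a threshold of 0, a query with cost -5 on node 1 under an InsertElement root yields SkipNode, while the same query with cost 0 still yields RejectTree, so the protection against repeated vectorization attempts stays in place for trees that are not profitable.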