[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only #198336
Merged
Created using spr 1.3.7
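@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers
Author: Alexey Bataev (alexey-bataev)
Changes
In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns Invalid when trimming node Idx==1 under an InsertElement root would leave only a buildvector, to avoid infinite vectorization attempts. This is too aggressive when the original untrimmed tree is already profitable (Cost < -SLPCostThreshold). In that case, undo any partial trims and return the original cost instead of rejecting the tree.
Original Pull Request: #197763
Recommit after unrelated revert in #198265
Patch is 35.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/198336.diff
10 Files Affected: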
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index cab8e8f7987ec..3fb8a160625eb 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -18982,11 +18982,21 @@ InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
IsEqualCostAltShuffleToTrim()) {
PreferTrimmedTree |= TotalSubtreeCost == GatherCost;
// If the remaining tree is just a buildvector - exit, it will cause
- // endless attempts to vectorize.
+ // endless attempts to vectorize. When the tree is already profitable,
+ // skip trimming this node and let the post-loop logic (including
+ // gathered loads processing) decide.
if (VectorizableTree.front()->hasState() &&
VectorizableTree.front()->getOpcode() == Instruction::InsertElement &&
- TE->Idx == 1)
+ TE->Idx == 1) {
+ if (Cost < -SLPCostThreshold) {
+ LLVM_DEBUG(dbgs() << "SLP: Skipping trim of node " << TE->Idx
+ << " - tree already profitable with cost " << Cost
+ << ".\n");
+ Worklist.pop();
+ continue;
+ }
return InstructionCost::getInvalid();
+ }
LLVM_DEBUG(dbgs() << "SLP: Trimming unprofitable subtree at node "
<< TE->Idx << " with cost "
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
index 9b45fe6a2804b..174be55bd7b0a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT: [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
index d812cc813c20f..8a5e278e2b03a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT: [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
index 57deca1d62516..7f7e77eadc987 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
; SSE-NEXT: ret <8 x double> [[TMP1]]
;
; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT: [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT: [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT: [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT: [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT: [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT: [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT: [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT: [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT: [[R0:%.*]] = insertelement <8 x double> poison, double [[C0]], i32 0
-; SLM-NEXT: [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT: [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT: [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT: [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT: [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT: [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT: [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x double> [[R73]]
;
; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
index d1a5c3bb032e0..8b8bc71c2ceda 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
; SSE-NEXT: ret <8 x double> [[TMP1]]
;
; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT: [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT: [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT: [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT: [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT: [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT: [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT: [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT: [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT: [[R0:%.*]] = insertelement <8 x double> undef, double [[C0]], i32 0
-; SLM-NEXT: [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT: [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT: [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT: [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT: [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT: [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT: [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x double> [[R73]]
;
; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
index 1bb24c524bb3e..042f5cb3f512b 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
@@ -5,8 +5,7 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
; CHECK-LABEL: define void @test(
; CHECK-SAME: i32 [[ARG:%.*]], i32 [[ARG1:%.*]], i64 [[ARG2:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[BB:.*]]:
-; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 poison>, i32 [[ARG]], i32 2
-; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG1]], i32 3
+; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 0, i32 poison>, i32 [[ARG]], i32 3
; CHECK-NEXT: br label %[[BB3:.*]]
; CHECK: [[BB3]]:
; CHECK-NEXT: [[TMP3:%.*]] = phi i64 [ 0, %[[BB3]] ], [ 0, %[[BB]] ]
@@ -21,7 +20,6 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP5]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP4]] to i64
; CHECK-NEXT: [[TRUNC10:%.*]] = trunc i64 [[TMP10]] to i32
-; CHECK-NEXT: [[SHL:%.*]] = shl i32 0, 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP5]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
; CHECK-NEXT: [[TRUNC27:%.*]] = trunc i64 [[TMP7]] to i32
@@ -30,14 +28,15 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[XOR38:%.*]] = xor i32 [[ARG]], [[TRUNC28]]
; CHECK-NEXT: [[SHL35:%.*]] = shl i32 [[ARG1]], 0
-; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> poison, i32 [[TRUNC10]], i32 0
-; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[SHL]], i32 1
-; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TRUNC27]], i32 2
-; CHECK-NEXT: [[TMP25:%.*]] = insertelement <4 x i32> [[TMP24]], i32 [[TRUNC19]], i32 3
+; CHECK-NEXT: [[XOR31:%.*]] = xor i32 [[ARG1]], [[TRUNC19]]
+; CHECK-NEXT: [[SHL:%.*]] = shl i32 0, 1
+; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x i32> poison, i32 [[SHL]], i32 0
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP23]], i32 [[TRUNC10]], i32 1
+; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TRUNC27]], i32 3
+; CHECK-NEXT: [[TMP25:%.*]] = shuffle...
[truncated]
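The updated SSE CHECK lines for `@ashr_lshr_shl_v8i32` now compute lanes 4 and 5 as one `<2 x i32>` `lshr` and merge them into the result with two `shufflevector`s instead of two `insertelement`s. As a reading aid, here is a minimal C++ model of `shufflevector` lane selection: with N-lane operands, a mask value below N picks from the first operand, a value in N..2N-1 picks from the second, and a poison mask element yields a poison lane. `Lane` and `shuffle` are hypothetical names and `std::nullopt` stands in for poison; this is a sketch of the IR semantics, not LLVM's implementation.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// A lane value; std::nullopt stands in for a poison lane (hypothetical model).
using Lane = std::optional<int>;

// Models LLVM IR `shufflevector V1, V2, Mask`: both operands have N lanes,
// a mask element M < N selects V1[M], N <= M < 2N selects V2[M - N], and a
// poison mask element produces a poison lane. The result has Mask.size() lanes.
std::vector<Lane> shuffle(const std::vector<Lane> &V1,
                          const std::vector<Lane> &V2,
                          const std::vector<Lane> &Mask) {
  assert(V1.size() == V2.size() && "shufflevector operands must match");
  const int N = static_cast<int>(V1.size());
  std::vector<Lane> Result;
  Result.reserve(Mask.size());
  for (const Lane &M : Mask) {
    if (!M) {
      Result.push_back(std::nullopt);  // poison mask element -> poison lane
    } else {
      assert(*M >= 0 && *M < 2 * N && "mask index out of range");
      Result.push_back(*M < N ? V1[*M] : V2[*M - N]);
    }
  }
  return Result;
}
```

Applied to the new CHECK lines, widening the two `lshr` results with mask `<0, 1, poison, ...>` and then shuffling with `<0, 1, 2, 3, 8, 9, 6, 7>` places them in lanes 4 and 5 of `[[R51]]`. Lanes 6 and 7 stay poison until the following `insertelement`s fill them with `[[AB6]]` and `[[AB7]]`.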
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request May 18, 2026
…d reduce to buildvector-only In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns Invalid when trimming node Idx==1 under an InsertElement root would leave only a buildvector, to avoid infinite vectorization attempts. This is too aggressive when the original untrimmed tree is already profitable (Cost < -SLPCostThreshold). In that case, undo any partial trims and return the original cost instead of rejecting the tree. Original Pull Request: llvm/llvm-project#197763 Recommit after unrelated revert in llvm/llvm-project#198265 Reviewers: Pull Request: llvm/llvm-project#198336
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request May 18, 2026
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request May 18, 2026
pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026
In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.
Original Pull Request: #197763
Recommit after unrelated revert in #198265
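The control flow added in the SLPVectorizer.cpp hunk can be modeled in isolation. The sketch below is hypothetical: `TrimAction`, `decideTrim`, and `simulateTrimLoop` are illustrative names, `long` stands in for `InstructionCost`, a `std::queue` stands in for the real worklist, every queued node is assumed to already be an unprofitable trim candidate, and `Cost` is held constant instead of being updated as subtrees are trimmed. It follows the visible hunk: for node 1 under an `insertelement` root, a tree with `Cost < -SLPCostThreshold` skips that node instead of returning an invalid cost.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical outcome of visiting one trim candidate.
enum class TrimAction { TrimSubtree, SkipNode, RejectTree };

// Mirrors the branch added to calculateTreeCostAndTrimNonProfitable for a
// node that is assumed to already be an unprofitable trim candidate.
TrimAction decideTrim(bool RootIsInsertElement, unsigned NodeIdx, long Cost,
                      long SLPCostThreshold) {
  // Trimming node 1 under an insertelement root would leave only a buildvector.
  if (RootIsInsertElement && NodeIdx == 1) {
    // New: an already-profitable tree keeps this node and moves on.
    if (Cost < -SLPCostThreshold)
      return TrimAction::SkipNode;
    // Unchanged: otherwise give up, to avoid endless vectorization attempts.
    return TrimAction::RejectTree;
  }
  return TrimAction::TrimSubtree;
}

// Drives decideTrim over a worklist; std::nullopt models returning
// InstructionCost::getInvalid(). Cost is held constant for simplicity.
std::optional<std::vector<unsigned>>
simulateTrimLoop(std::queue<unsigned> Worklist, bool RootIsInsertElement,
                 long Cost, long SLPCostThreshold) {
  std::vector<unsigned> Trimmed;
  while (!Worklist.empty()) {
    const unsigned Idx = Worklist.front();
    switch (decideTrim(RootIsInsertElement, Idx, Cost, SLPCostThreshold)) {
    case TrimAction::RejectTree:
      return std::nullopt;
    case TrimAction::SkipNode:
      Worklist.pop();  // like `Worklist.pop(); continue;` in the patch
      continue;
    case TrimAction::TrimSubtree:
      Trimmed.push_back(Idx);
      Worklist.pop();
      break;
    }
  }
  return Trimmed;
}
```

With `SLPCostThreshold` set to 0 and a worklist of nodes 3, 1, 2, a profitable `Cost = -5` trims nodes 3 and 2 while keeping node 1, whereas an unprofitable `Cost = 3` still rejects the whole tree when node 1 is reached.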