[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only #198336

Merged
alexey-bataev merged 1 commit into main from users/alexey-bataev/spr/slp-preserve-profitable-trees-when-subtree-trimming-would-reduce-to-buildvector-only-2
May 18, 2026
Conversation

@alexey-bataev

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.
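
The three possible outcomes at this check can be sketched as a small standalone function. This is only an illustration under stated assumptions: `TrimDecision`, `TrimQuery`, and `decideTrim` are hypothetical names that do not exist in SLPVectorizer.cpp, costs are modeled as plain integers rather than `InstructionCost`, and the function assumes the subtree has already been judged unprofitable by the surrounding loop. It follows the hunk shown in the diff, where the profitable case skips trimming this node and continues.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not part of SLPVectorizer.cpp.
enum class TrimDecision { Trim, SkipTrim, RejectTree };

struct TrimQuery {
  bool RootIsInsertElement; // the tree root is an insertelement (buildvector) node
  int NodeIdx;              // index of the subtree root considered for trimming
  int TreeCost;             // cost of the whole tree; negative means profitable
  int CostThreshold;        // stand-in for SLPCostThreshold
};

// Decide what to do with a subtree that was already found unprofitable.
TrimDecision decideTrim(const TrimQuery &Q) {
  // Trimming node 1 under an insertelement root would leave only a buildvector.
  const bool WouldLeaveOnlyBuildVector =
      Q.RootIsInsertElement && Q.NodeIdx == 1;
  if (!WouldLeaveOnlyBuildVector)
    return TrimDecision::Trim;
  // New behavior: keep the node when the tree as a whole is already profitable.
  if (Q.TreeCost < -Q.CostThreshold)
    return TrimDecision::SkipTrim;
  // Old behavior, still used otherwise: reject to avoid endless re-vectorization.
  return TrimDecision::RejectTree;
}
```

With a threshold of 0, a tree cost of -5 yields `SkipTrim`, while a cost of 0 still yields `RejectTree`, because the comparison is strict.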

Original Pull Request: #197763

Recommit after unrelated revert in #198265

Created using spr 1.3.7
@alexey-bataev alexey-bataev merged commit cf80e0e into main May 18, 2026
7 of 11 checks passed
@alexey-bataev alexey-bataev deleted the users/alexey-bataev/spr/slp-preserve-profitable-trees-when-subtree-trimming-would-reduce-to-buildvector-only-2 branch May 18, 2026 16:21
@llvmorg-github-actions

llvmorg-github-actions Bot commented May 18, 2026

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Alexey Bataev (alexey-bataev)

Changes

Patch is 35.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/198336.diff

10 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+12-2)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll (+7-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll (+7-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll (+19-32)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll (+19-32)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll (+9-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/scalarize-ctlz.ll (+48-29)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/select-copyable-cmp-poison.ll (+9-6)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/trunced-buildvector-scalar-extended.ll (+1-5)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll (+7-4)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index cab8e8f7987ec..3fb8a160625eb 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -18982,11 +18982,21 @@ InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
         IsEqualCostAltShuffleToTrim()) {
       PreferTrimmedTree |= TotalSubtreeCost == GatherCost;
       // If the remaining tree is just a buildvector - exit, it will cause
-      // endless attempts to vectorize.
+      // endless attempts to vectorize. When the tree is already profitable,
+      // skip trimming this node and let the post-loop logic (including
+      // gathered loads processing) decide.
       if (VectorizableTree.front()->hasState() &&
           VectorizableTree.front()->getOpcode() == Instruction::InsertElement &&
-          TE->Idx == 1)
+          TE->Idx == 1) {
+        if (Cost < -SLPCostThreshold) {
+          LLVM_DEBUG(dbgs() << "SLP: Skipping trim of node " << TE->Idx
+                            << " - tree already profitable with cost " << Cost
+                            << ".\n");
+          Worklist.pop();
+          continue;
+        }
         return InstructionCost::getInvalid();
+      }
 
       LLVM_DEBUG(dbgs() << "SLP: Trimming unprofitable subtree at node "
                         << TE->Idx << " with cost "
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
index 9b45fe6a2804b..174be55bd7b0a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
 
 define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
 ; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT:    [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
 ; SSE-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT:    [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT:    [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
 ; SSE-NEXT:    [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
 ; SSE-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT:    [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT:    [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT:    [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
 ; SSE-NEXT:    [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
 ; SSE-NEXT:    [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
 ; SSE-NEXT:    [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT:    [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT:    [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT:    [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT:    [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; SSE-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
 ; SSE-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
 ; SSE-NEXT:    ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
index d812cc813c20f..8a5e278e2b03a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
 
 define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
 ; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT:    [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
 ; SSE-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT:    [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT:    [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
 ; SSE-NEXT:    [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
 ; SSE-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT:    [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT:    [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT:    [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
 ; SSE-NEXT:    [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
 ; SSE-NEXT:    [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
 ; SSE-NEXT:    [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT:    [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT:    [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT:    [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT:    [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; SSE-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
 ; SSE-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
 ; SSE-NEXT:    ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
index 57deca1d62516..7f7e77eadc987 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
 ; SSE-NEXT:    ret <8 x double> [[TMP1]]
 ;
 ; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT:    [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT:    [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT:    [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT:    [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT:    [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT:    [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT:    [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT:    [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT:    [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT:    [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT:    [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT:    [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT:    [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT:    [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT:    [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT:    [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT:    [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT:    [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT:    [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT:    [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT:    [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT:    [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT:    [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT:    [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT:    [[R0:%.*]] = insertelement <8 x double> poison, double [[C0]], i32 0
-; SLM-NEXT:    [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT:    [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT:    [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT:    [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT:    [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT:    [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT:    [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT:    [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT:    [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT:    [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT:    [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT:    [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT:    [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT:    [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
 ; SLM-NEXT:    ret <8 x double> [[R73]]
 ;
 ; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
index d1a5c3bb032e0..8b8bc71c2ceda 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
 ; SSE-NEXT:    ret <8 x double> [[TMP1]]
 ;
 ; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT:    [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT:    [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT:    [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT:    [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT:    [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT:    [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT:    [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT:    [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT:    [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT:    [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT:    [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT:    [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT:    [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT:    [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT:    [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT:    [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT:    [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT:    [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT:    [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT:    [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT:    [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT:    [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT:    [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT:    [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT:    [[R0:%.*]] = insertelement <8 x double> undef, double [[C0]], i32 0
-; SLM-NEXT:    [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT:    [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT:    [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT:    [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT:    [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT:    [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT:    [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT:    [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT:    [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT:    [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT:    [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT:    [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT:    [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT:    [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
 ; SLM-NEXT:    ret <8 x double> [[R73]]
 ;
 ; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
index 1bb24c524bb3e..042f5cb3f512b 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
@@ -5,8 +5,7 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
 ; CHECK-LABEL: define void @test(
 ; CHECK-SAME: i32 [[ARG:%.*]], i32 [[ARG1:%.*]], i64 [[ARG2:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  [[BB:.*]]:
-; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 poison>, i32 [[ARG]], i32 2
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG1]], i32 3
+; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 0, i32 poison>, i32 [[ARG]], i32 3
 ; CHECK-NEXT:    br label %[[BB3:.*]]
 ; CHECK:       [[BB3]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = phi i64 [ 0, %[[BB3]] ], [ 0, %[[BB]] ]
@@ -21,7 +20,6 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
 ; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <4 x i32> [[TMP5]], i32 0
 ; CHECK-NEXT:    [[TMP10:%.*]] = zext i32 [[TMP4]] to i64
 ; CHECK-NEXT:    [[TRUNC10:%.*]] = trunc i64 [[TMP10]] to i32
-; CHECK-NEXT:    [[SHL:%.*]] = shl i32 0, 1
 ; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i32> [[TMP5]], i32 2
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
 ; CHECK-NEXT:    [[TRUNC27:%.*]] = trunc i64 [[TMP7]] to i32
@@ -30,14 +28,15 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
 ; CHECK-NEXT:    [[TMP9:%.*]] = mul <4 x i32> [[TMP5]], [[TMP8]]
 ; CHECK-NEXT:    [[XOR38:%.*]] = xor i32 [[ARG]], [[TRUNC28]]
 ; CHECK-NEXT:    [[SHL35:%.*]] = shl i32 [[ARG1]], 0
-; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <4 x i32> poison, i32 [[TRUNC10]], i32 0
-; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[SHL]], i32 1
-; CHECK-NEXT:    [[TMP24:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TRUNC27]], i32 2
-; CHECK-NEXT:    [[TMP25:%.*]] = insertelement <4 x i32> [[TMP24]], i32 [[TRUNC19]], i32 3
+; CHECK-NEXT:    [[XOR31:%.*]] = xor i32 [[ARG1]], [[TRUNC19]]
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 0, 1
+; CHECK-NEXT:    [[TMP23:%.*]] = insertelement <4 x i32> poison, i32 [[SHL]], i32 0
+; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <4 x i32> [[TMP23]], i32 [[TRUNC10]], i32 1
+; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TRUNC27]], i32 3
+; CHECK-NEXT:    [[TMP25:%.*]] = shuffle...
[truncated]
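
Several updated CHECK lines replace chains of scalar `insertelement` with two-operand `shufflevector` masks such as `<i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>`. As a reading aid, here is a minimal sketch of how such a mask selects lanes: for operands of N lanes each, mask indices 0..N-1 pick from the first operand and N..2N-1 from the second. The helper `shuffleVector` and the `Lane` alias are hypothetical, `std::nullopt` models a poison lane, and a negative mask index is used here to model a poison mask element.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// One vector lane; std::nullopt models a poison lane (illustration only).
using Lane = std::optional<int>;

// Model of a two-operand shufflevector over equally sized operands A and B.
std::vector<Lane> shuffleVector(const std::vector<Lane> &A,
                                const std::vector<Lane> &B,
                                const std::vector<int> &Mask) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  const int N = static_cast<int>(A.size());
  std::vector<Lane> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask) {
    if (Idx < 0)
      Result.push_back(std::nullopt); // poison mask element
    else if (Idx < N)
      Result.push_back(A[Idx]);       // lane from the first operand
    else
      Result.push_back(B[Idx - N]);   // lane from the second operand
  }
  return Result;
}
```

Applied to the `ashr_lshr_shl_v8i32` case, the mask above keeps lanes 0-3 of the first operand, takes lanes 4-5 from lanes 0-1 of the widened `<2 x i32>` result, and leaves lanes 6-7 as poison until the trailing `insertelement` instructions fill them.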

cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request May 18, 2026
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request May 18, 2026
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request May 18, 2026
pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026