[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only#197763

Merged
alexey-bataev merged 2 commits into main from users/alexey-bataev/spr/slp-preserve-profitable-trees-when-subtree-trimming-would-reduce-to-buildvector-only-1
May 15, 2026

Conversation

@alexey-bataev

Member

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, skip trimming this
node and let the post-loop logic decide instead of rejecting the tree.
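
The guard touched by this patch can be modeled in isolation. The following is a minimal sketch, not the code from SLPVectorizer.cpp: `TrimDecision`, `TrimQuery`, `decideTrim`, and `demoTrimDecisions` are hypothetical names, costs are plain integers rather than `InstructionCost`, and `Threshold` only stands in for `SLPCostThreshold`. It mirrors the three outcomes visible in the diff: trim the subtree, skip the node when the tree is already profitable, or reject the tree.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not BoUpSLP types.
enum class TrimDecision { Trim, SkipNode, RejectTree };

struct TrimQuery {
  bool RootIsInsertElement; // models: root node has state and is an InsertElement
  unsigned NodeIdx;         // models: TE->Idx
  long Cost;                // models: current tree cost (negative means profitable)
  long Threshold;           // models: SLPCostThreshold
};

// Decide what to do with a subtree that the cost model wants to trim.
TrimDecision decideTrim(const TrimQuery &Q) {
  // Trimming node 1 under an InsertElement root would leave only a buildvector.
  const bool WouldLeaveBuildVectorOnly = Q.RootIsInsertElement && Q.NodeIdx == 1;
  if (!WouldLeaveBuildVectorOnly)
    return TrimDecision::Trim;
  // New behavior in the diff: keep the node if the tree is already profitable.
  if (Q.Cost < -Q.Threshold)
    return TrimDecision::SkipNode;
  // Old behavior, still used otherwise: reject to avoid endless re-vectorization.
  return TrimDecision::RejectTree;
}

// Small usage example with an assumed threshold of 0.
void demoTrimDecisions() {
  assert(decideTrim(TrimQuery{true, 1, -4, 0}) == TrimDecision::SkipNode);
  assert(decideTrim(TrimQuery{true, 1, 2, 0}) == TrimDecision::RejectTree);
  assert(decideTrim(TrimQuery{true, 3, 2, 0}) == TrimDecision::Trim);
}
```

In the real loop the "skip" outcome corresponds to `Worklist.pop(); continue;`, so the remaining worklist entries are still examined.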

Created using spr 1.3.7
@llvmorg-github-actions

llvmorg-github-actions Bot commented May 14, 2026

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Alexey Bataev (alexey-bataev)

Changes

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, skip trimming this
node and let the post-loop logic decide instead of rejecting the tree.


Patch is 35.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/197763.diff

10 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+12-2)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll (+7-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll (+7-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll (+19-32)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll (+19-32)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll (+9-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/scalarize-ctlz.ll (+48-29)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/select-copyable-cmp-poison.ll (+9-6)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/trunced-buildvector-scalar-extended.ll (+1-5)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll (+7-4)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 9592771917995..0f88605045abb 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -18951,11 +18951,21 @@ InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
         IsEqualCostAltShuffleToTrim()) {
       PreferTrimmedTree |= TotalSubtreeCost == GatherCost;
       // If the remaining tree is just a buildvector - exit, it will cause
-      // endless attempts to vectorize.
+      // endless attempts to vectorize. When the tree is already profitable,
+      // skip trimming this node and let the post-loop logic (including
+      // gathered loads processing) decide.
       if (VectorizableTree.front()->hasState() &&
           VectorizableTree.front()->getOpcode() == Instruction::InsertElement &&
-          TE->Idx == 1)
+          TE->Idx == 1) {
+        if (Cost < -SLPCostThreshold) {
+          LLVM_DEBUG(dbgs() << "SLP: Skipping trim of node " << TE->Idx
+                            << " - tree already profitable with cost " << Cost
+                            << ".\n");
+          Worklist.pop();
+          continue;
+        }
         return InstructionCost::getInvalid();
+      }
 
       LLVM_DEBUG(dbgs() << "SLP: Trimming unprofitable subtree at node "
                         << TE->Idx << " with cost "
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
index 9b45fe6a2804b..174be55bd7b0a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
 
 define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
 ; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT:    [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
 ; SSE-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT:    [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT:    [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
 ; SSE-NEXT:    [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
 ; SSE-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT:    [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT:    [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT:    [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
 ; SSE-NEXT:    [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
 ; SSE-NEXT:    [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
 ; SSE-NEXT:    [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT:    [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT:    [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT:    [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT:    [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; SSE-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
 ; SSE-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
 ; SSE-NEXT:    ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
index d812cc813c20f..8a5e278e2b03a 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
@@ -282,26 +282,23 @@ define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
 
 define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
 ; SSE-LABEL: @ashr_lshr_shl_v8i32(
-; SSE-NEXT:    [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 4
-; SSE-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
-; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
+; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
 ; SSE-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
-; SSE-NEXT:    [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 4
-; SSE-NEXT:    [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
-; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
+; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
 ; SSE-NEXT:    [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
 ; SSE-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:    [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
 ; SSE-NEXT:    [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
-; SSE-NEXT:    [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
-; SSE-NEXT:    [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
+; SSE-NEXT:    [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
+; SSE-NEXT:    [[TMP9:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP7]]
 ; SSE-NEXT:    [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
 ; SSE-NEXT:    [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
 ; SSE-NEXT:    [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT:    [[R4:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[AB4]], i32 4
-; SSE-NEXT:    [[R51:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
+; SSE-NEXT:    [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT:    [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
 ; SSE-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
 ; SSE-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
 ; SSE-NEXT:    ret <8 x i32> [[R7]]
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
index 57deca1d62516..7f7e77eadc987 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
 ; SSE-NEXT:    ret <8 x double> [[TMP1]]
 ;
 ; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT:    [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT:    [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT:    [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT:    [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT:    [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT:    [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT:    [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT:    [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT:    [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT:    [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT:    [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT:    [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT:    [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT:    [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT:    [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT:    [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT:    [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT:    [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT:    [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT:    [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT:    [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT:    [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT:    [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT:    [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT:    [[R0:%.*]] = insertelement <8 x double> poison, double [[C0]], i32 0
-; SLM-NEXT:    [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT:    [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT:    [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT:    [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT:    [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT:    [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT:    [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT:    [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT:    [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT:    [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT:    [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT:    [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT:    [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT:    [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
 ; SLM-NEXT:    ret <8 x double> [[R73]]
 ;
 ; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
index d1a5c3bb032e0..8b8bc71c2ceda 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
@@ -607,38 +607,25 @@ define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
 ; SSE-NEXT:    ret <8 x double> [[TMP1]]
 ;
 ; SLM-LABEL: @buildvector_div_8f64(
-; SLM-NEXT:    [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0
-; SLM-NEXT:    [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
-; SLM-NEXT:    [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
-; SLM-NEXT:    [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
-; SLM-NEXT:    [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
-; SLM-NEXT:    [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
-; SLM-NEXT:    [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
-; SLM-NEXT:    [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
-; SLM-NEXT:    [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0
-; SLM-NEXT:    [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
-; SLM-NEXT:    [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
-; SLM-NEXT:    [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
-; SLM-NEXT:    [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
-; SLM-NEXT:    [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
-; SLM-NEXT:    [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
-; SLM-NEXT:    [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
-; SLM-NEXT:    [[C0:%.*]] = fdiv double [[A0]], [[B0]]
-; SLM-NEXT:    [[C1:%.*]] = fdiv double [[A1]], [[B1]]
-; SLM-NEXT:    [[C2:%.*]] = fdiv double [[A2]], [[B2]]
-; SLM-NEXT:    [[C3:%.*]] = fdiv double [[A3]], [[B3]]
-; SLM-NEXT:    [[C4:%.*]] = fdiv double [[A4]], [[B4]]
-; SLM-NEXT:    [[C5:%.*]] = fdiv double [[A5]], [[B5]]
-; SLM-NEXT:    [[C6:%.*]] = fdiv double [[A6]], [[B6]]
-; SLM-NEXT:    [[C7:%.*]] = fdiv double [[A7]], [[B7]]
-; SLM-NEXT:    [[R0:%.*]] = insertelement <8 x double> undef, double [[C0]], i32 0
-; SLM-NEXT:    [[R1:%.*]] = insertelement <8 x double> [[R0]], double [[C1]], i32 1
-; SLM-NEXT:    [[R2:%.*]] = insertelement <8 x double> [[R1]], double [[C2]], i32 2
-; SLM-NEXT:    [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
-; SLM-NEXT:    [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
-; SLM-NEXT:    [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
-; SLM-NEXT:    [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
-; SLM-NEXT:    [[R73:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
+; SLM-NEXT:    [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
+; SLM-NEXT:    [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
+; SLM-NEXT:    [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
+; SLM-NEXT:    [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
+; SLM-NEXT:    [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
+; SLM-NEXT:    [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
+; SLM-NEXT:    [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
+; SLM-NEXT:    [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
+; SLM-NEXT:    [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; SLM-NEXT:    [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
+; SLM-NEXT:    [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; SLM-NEXT:    [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
 ; SLM-NEXT:    ret <8 x double> [[R73]]
 ;
 ; AVX-LABEL: @buildvector_div_8f64(
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
index 1bb24c524bb3e..042f5cb3f512b 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
@@ -5,8 +5,7 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
 ; CHECK-LABEL: define void @test(
 ; CHECK-SAME: i32 [[ARG:%.*]], i32 [[ARG1:%.*]], i64 [[ARG2:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  [[BB:.*]]:
-; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 poison>, i32 [[ARG]], i32 2
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG1]], i32 3
+; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 0, i32 poison>, i32 [[ARG]], i32 3
 ; CHECK-NEXT:    br label %[[BB3:.*]]
 ; CHECK:       [[BB3]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = phi i64 [ 0, %[[BB3]] ], [ 0, %[[BB]] ]
@@ -21,7 +20,6 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
 ; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <4 x i32> [[TMP5]], i32 0
 ; CHECK-NEXT:    [[TMP10:%.*]] = zext i32 [[TMP4]] to i64
 ; CHECK-NEXT:    [[TRUNC10:%.*]] = trunc i64 [[TMP10]] to i32
-; CHECK-NEXT:    [[SHL:%.*]] = shl i32 0, 1
 ; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i32> [[TMP5]], i32 2
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
 ; CHECK-NEXT:    [[TRUNC27:%.*]] = trunc i64 [[TMP7]] to i32
@@ -30,14 +28,15 @@ define void @test(i32 %arg, i32 %arg1, i64 %arg2) {
 ; CHECK-NEXT:    [[TMP9:%.*]] = mul <4 x i32> [[TMP5]], [[TMP8]]
 ; CHECK-NEXT:    [[XOR38:%.*]] = xor i32 [[ARG]], [[TRUNC28]]
 ; CHECK-NEXT:    [[SHL35:%.*]] = shl i32 [[ARG1]], 0
-; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <4 x i32> poison, i32 [[TRUNC10]], i32 0
-; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[SHL]], i32 1
-; CHECK-NEXT:    [[TMP24:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TRUNC27]], i32 2
-; CHECK-NEXT:    [[TMP25:%.*]] = insertelement <4 x i32> [[TMP24]], i32 [[TRUNC19]], i32 3
+; CHECK-NEXT:    [[XOR31:%.*]] = xor i32 [[ARG1]], [[TRUNC19]]
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 0, 1
+; CHECK-NEXT:    [[TMP23:%.*]] = insertelement <4 x i32> poison, i32 [[SHL]], i32 0
+; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <4 x i32> [[TMP23]], i32 [[TRUNC10]], i32 1
+; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TRUNC27]], i32 3
+; CHECK-NEXT:    [[TMP25:%.*]] = shuffle...
[truncated]
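
As an aid for reading the updated CHECK lines in the diff, the two-source `shufflevector` masks (for example `<i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>`) can be modeled with a small standalone function. This is only a sketch of the IR semantics, not LLVM code: `Lane` and `shuffleVector` are hypothetical names, and `std::nullopt` stands in for a poison lane or a poison mask element. Mask indices below N select from the first operand and indices from N up to 2N-1 select from the second.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// std::nullopt models a poison lane (or a poison mask element).
using Lane = std::optional<int>;

// Model of: shufflevector <N x T> V1, <N x T> V2, <M x i32> Mask
template <std::size_t N, std::size_t M>
std::array<Lane, M> shuffleVector(const std::array<Lane, N> &V1,
                                  const std::array<Lane, N> &V2,
                                  const std::array<Lane, M> &Mask) {
  std::array<Lane, M> Result{}; // every lane starts out as poison
  for (std::size_t I = 0; I < M; ++I) {
    if (!Mask[I])
      continue; // a poison mask element yields a poison lane
    const std::size_t Idx = static_cast<std::size_t>(*Mask[I]);
    assert(Idx < 2 * N && "mask index out of range");
    // Indices [0, N) read the first operand, [N, 2N) read the second.
    Result[I] = Idx < N ? V1[Idx] : V2[Idx - N];
  }
  return Result;
}
```

With that mask, lanes 4 and 5 of the result come from lanes 0 and 1 of the second operand, which is how the widened `<2 x i32>` result is merged into the `<8 x i32>` value before the remaining `insertelement` instructions fill lanes 6 and 7.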

@github-actions

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef([^a-zA-Z0-9_-]|$)|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll llvm/test/Transforms/SLPVectorizer/X86/scalarize-ctlz.ll llvm/test/Transforms/SLPVectorizer/X86/select-copyable-cmp-poison.ll llvm/test/Transforms/SLPVectorizer/X86/trunced-buildvector-scalar-extended.ll llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

The following files introduce new uses of undef:

  • llvm/test/Transforms/SLPVectorizer/X86/trunced-buildvector-scalar-extended.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

@bababuck bababuck left a comment

Contributor

LGTM

Created using spr 1.3.7
@alexey-bataev alexey-bataev merged commit f0adfab into main May 15, 2026
6 of 10 checks passed
@alexey-bataev alexey-bataev deleted the users/alexey-bataev/spr/slp-preserve-profitable-trees-when-subtree-trimming-would-reduce-to-buildvector-only-1 branch May 15, 2026 20:01
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request May 15, 2026
…d reduce to buildvector-only

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: llvm/llvm-project#197763
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request May 15, 2026
…d reduce to buildvector-only

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: llvm/llvm-project#197763
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request May 15, 2026
…d reduce to buildvector-only

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: llvm/llvm-project#197763
alexey-bataev added a commit that referenced this pull request May 18, 2026
… buildvector-only

Original Pull Request: #197763

Recommit after unrelated revert in #198265

Reviewers: 

Pull Request: #198336
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request May 18, 2026
…d reduce to buildvector-only

Original Pull Request: llvm/llvm-project#197763

Recommit after unrelated revert in llvm/llvm-project#198265

Reviewers:

Pull Request: llvm/llvm-project#198336
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request May 18, 2026
…d reduce to buildvector-only

Original Pull Request: llvm/llvm-project#197763

Recommit after unrelated revert in llvm/llvm-project#198265

Reviewers:

Pull Request: llvm/llvm-project#198336
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request May 18, 2026
…d reduce to buildvector-only

Original Pull Request: llvm/llvm-project#197763

Recommit after unrelated revert in llvm/llvm-project#198265

Reviewers:

Pull Request: llvm/llvm-project#198336
YuriPlyakhin pushed a commit to YuriPlyakhin/llvm-project that referenced this pull request May 18, 2026
… buildvector-only

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: llvm#197763
pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026
… buildvector-only

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: llvm#197763
pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026
… buildvector-only

Original Pull Request: llvm#197763

Recommit after unrelated revert in llvm#198265

Reviewers: 

Pull Request: llvm#198336