Skip to content

Move ExpandMemCmp and MergeIcmp to the middle end #77370

Merged
nikic merged 2 commits into
llvm:mainfrom
gbaraldi:gb/middle-end-memcmp
Apr 2, 2026
Merged

Move ExpandMemCmp and MergeIcmp to the middle end #77370
nikic merged 2 commits into
llvm:mainfrom
gbaraldi:gb/middle-end-memcmp

Conversation

@gbaraldi

@gbaraldi gbaraldi commented Jan 8, 2024

Copy link
Copy Markdown
Contributor

Moving these into the middle-end pipeline will allow for additional optimization of the expansion result, such as CSE of redundant loads (c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place the passes at the end of the middle-end pipeline, so we mostly don't benefit from additional optimizations yet. The pipeline position will be moved in a future change.

This builds on work done by legrosbuffle in https://reviews.llvm.org/D60318.

@llvmbot

llvmbot commented Jan 8, 2024

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-backend-spir-v
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-transforms

Author: Gabriel Baraldi (gbaraldi)

Changes

This should allow for optimizations like saving repeated loads between memcmp calls. c.f https://godbolt.org/z/bEna4Md9r, the window on the right shows the generated output of this branch.

One question is where to put this in the pipeline. The inline expansions probably benefit the most from the passes we run early, but we might want to wait a bit so that we can try and prove a constant argument to the memcmp (though I imagine if they are going to be constant, it's directly from the source).

One other question is that some of the x86 tests are massive, I just ported over from the llc test, but they grew quite a bit more.


Patch is 4.02 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77370.diff

83 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/CodeGenPassBuilder.h (-10)
  • (modified) llvm/include/llvm/CodeGen/MachinePassRegistry.def (-2)
  • (modified) llvm/include/llvm/CodeGen/Passes.h (-2)
  • (modified) llvm/include/llvm/InitializePasses.h (-1)
  • (modified) llvm/include/llvm/LinkAllPasses.h (-1)
  • (renamed) llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h (+3-3)
  • (modified) llvm/lib/CodeGen/CMakeLists.txt (-1)
  • (modified) llvm/lib/CodeGen/CodeGen.cpp (-1)
  • (modified) llvm/lib/CodeGen/TargetPassConfig.cpp (-11)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1-1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+6)
  • (modified) llvm/lib/Passes/PassRegistry.def (+2-1)
  • (modified) llvm/lib/Transforms/Scalar/CMakeLists.txt (+1)
  • (renamed) llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp (+37-96)
  • (modified) llvm/test/CodeGen/AArch64/O3-pipeline.ll (-7)
  • (removed) llvm/test/CodeGen/AArch64/bcmp-inline-small.ll (-98)
  • (removed) llvm/test/CodeGen/AArch64/bcmp.ll (-537)
  • (modified) llvm/test/CodeGen/AArch64/dag-combine-setcc.ll (+25-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll (+84-44)
  • (removed) llvm/test/CodeGen/AArch64/memcmp.ll (-3029)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (-28)
  • (modified) llvm/test/CodeGen/ARM/O3-pipeline.ll (-7)
  • (removed) llvm/test/CodeGen/BPF/memcmp.ll (-77)
  • (modified) llvm/test/CodeGen/Generic/llc-start-stop.ll (+3-3)
  • (modified) llvm/test/CodeGen/LoongArch/opt-pipeline.ll (+1-8)
  • (modified) llvm/test/CodeGen/M68k/pipeline.ll (-7)
  • (modified) llvm/test/CodeGen/PowerPC/O3-pipeline.ll (+1-8)
  • (removed) llvm/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll (-168)
  • (removed) llvm/test/CodeGen/PowerPC/memcmp-mergeexpand.ll (-39)
  • (removed) llvm/test/CodeGen/PowerPC/memcmp.ll (-62)
  • (removed) llvm/test/CodeGen/PowerPC/memcmpIR.ll (-178)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1-8)
  • (removed) llvm/test/CodeGen/X86/memcmp-mergeexpand.ll (-49)
  • (removed) llvm/test/CodeGen/X86/memcmp-minsize-x32.ll (-445)
  • (removed) llvm/test/CodeGen/X86/memcmp-minsize.ll (-433)
  • (removed) llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll (-2911)
  • (removed) llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll (-4006)
  • (removed) llvm/test/CodeGen/X86/memcmp-optsize-x32.ll (-583)
  • (removed) llvm/test/CodeGen/X86/memcmp-optsize.ll (-596)
  • (removed) llvm/test/CodeGen/X86/memcmp-pgso-x32.ll (-600)
  • (removed) llvm/test/CodeGen/X86/memcmp-pgso.ll (-613)
  • (removed) llvm/test/CodeGen/X86/memcmp-x32.ll (-2429)
  • (removed) llvm/test/CodeGen/X86/memcmp.ll (-3065)
  • (modified) llvm/test/CodeGen/X86/opt-pipeline.ll (+1-8)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+14-12)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+3-1)
  • (added) llvm/test/Transforms/ExpandMemCmp/AArch64/bcmp.ll (+751)
  • (added) llvm/test/Transforms/ExpandMemCmp/AArch64/memcmp-extra.ll (+3434)
  • (modified) llvm/test/Transforms/ExpandMemCmp/AArch64/memcmp.ll (-1)
  • (added) llvm/test/Transforms/ExpandMemCmp/BPF/lit.local.cfg (+4)
  • (added) llvm/test/Transforms/ExpandMemCmp/BPF/memcmp.ll (+119)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memCmpUsedInZeroEqualityComparison.ll (+218)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmp-mergeexpand.ll (+48)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmp.ll (+70)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmpIR.ll (+216)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/bcmp.ll (+8-8)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-2.ll (+20249)
  • (renamed) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-constant.ll (+47-42)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-minsize-x32.ll (+493)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-minsize.ll (+707)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-more-load-pairs-x32.ll (+6203)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-more-load-pairs.ll (+18833)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-nobuiltin.ll (+248)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-optsize-x32.ll (+870)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-optsize.ll (+1414)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-pgso-x32.ll (+887)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-pgso.ll (+1347)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-x32-2.ll (+4813)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-x32.ll (+277-246)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/memcmp.ll (+618-576)
  • (added) llvm/test/Transforms/PhaseOrdering/PowerPC/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp-early.ll (+86)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp-mergeexpand.ll (+62)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp.ll (+856)
  • (modified) llvm/tools/opt/opt.cpp (-1)
  • (modified) llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn (-1)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn (+1)
diff --git a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
index a7cbb0910baabf..556304231b397b 100644
--- a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
@@ -25,7 +25,6 @@
 #include "llvm/Analysis/TypeBasedAliasAnalysis.h"
 #include "llvm/CodeGen/CallBrPrepare.h"
 #include "llvm/CodeGen/DwarfEHPrepare.h"
-#include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/ExpandReductions.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/IndirectBrExpand.h"
@@ -629,15 +628,6 @@ void CodeGenPassBuilder<Derived>::addIRPasses(AddIRPass &addPass) const {
       addPass(PrintFunctionPass(dbgs(), "\n\n*** Code after LSR ***\n"));
   }
 
-  if (getOptLevel() != CodeGenOptLevel::None) {
-    // The MergeICmpsPass tries to create memcmp calls by grouping sequences of
-    // loads and compares. ExpandMemCmpPass then tries to expand those calls
-    // into optimally-sized loads and compares. The transforms are enabled by a
-    // target lowering hook.
-    if (!Opt.DisableMergeICmps)
-      addPass(MergeICmpsPass());
-    addPass(ExpandMemCmpPass(&TM));
-  }
 
   // Run GC lowering passes for builtin collectors
   // TODO: add a pass insertion point here
diff --git a/llvm/include/llvm/CodeGen/MachinePassRegistry.def b/llvm/include/llvm/CodeGen/MachinePassRegistry.def
index f950dfae7e338b..3c00668aae3897 100644
--- a/llvm/include/llvm/CodeGen/MachinePassRegistry.def
+++ b/llvm/include/llvm/CodeGen/MachinePassRegistry.def
@@ -47,7 +47,6 @@ FUNCTION_PASS("dwarf-eh-prepare", DwarfEHPreparePass, (TM))
 FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false))
 FUNCTION_PASS("expand-large-div-rem", ExpandLargeDivRemPass, (TM))
 FUNCTION_PASS("expand-large-fp-convert", ExpandLargeFpConvertPass, (TM))
-FUNCTION_PASS("expand-memcmp", ExpandMemCmpPass, (TM))
 FUNCTION_PASS("expand-reductions", ExpandReductionsPass, ())
 FUNCTION_PASS("expandvp", ExpandVectorPredicationPass, ())
 FUNCTION_PASS("indirectbr-expand", IndirectBrExpandPass, (TM))
@@ -55,7 +54,6 @@ FUNCTION_PASS("interleaved-access", InterleavedAccessPass, (TM))
 FUNCTION_PASS("interleaved-load-combine", InterleavedLoadCombinePass, (TM))
 FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass, ())
 FUNCTION_PASS("lowerinvoke", LowerInvokePass, ())
-FUNCTION_PASS("mergeicmps", MergeICmpsPass, ())
 FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ())
 FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass, (true))
 FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ())
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index ca9fbb1def7624..e5ed5f15f62ed7 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -519,8 +519,6 @@ namespace llvm {
   // Expands large div/rem instructions.
   FunctionPass *createExpandLargeFpConvertPass();
 
-  // This pass expands memcmp() to load/stores.
-  FunctionPass *createExpandMemCmpLegacyPass();
 
   /// Creates Break False Dependencies pass. \see BreakFalseDeps.cpp
   FunctionPass *createBreakFalseDeps();
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 46b1e95c3c15f3..b0ca9fa942cda3 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -103,7 +103,6 @@ void initializeEdgeBundlesPass(PassRegistry&);
 void initializeEHContGuardCatchretPass(PassRegistry &);
 void initializeExpandLargeFpConvertLegacyPassPass(PassRegistry&);
 void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
-void initializeExpandMemCmpLegacyPassPass(PassRegistry &);
 void initializeExpandPostRAPass(PassRegistry&);
 void initializeExpandReductionsPass(PassRegistry&);
 void initializeExpandVectorPredicationPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 7a21876e565a7c..9aff428fbe938b 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -119,7 +119,6 @@ namespace {
       (void) llvm::createPostDomTree();
       (void) llvm::createMergeICmpsLegacyPass();
       (void) llvm::createExpandLargeDivRemPass();
-      (void)llvm::createExpandMemCmpLegacyPass();
       (void) llvm::createExpandVectorPredicationPass();
       std::string buf;
       llvm::raw_string_ostream os(buf);
diff --git a/llvm/include/llvm/CodeGen/ExpandMemCmp.h b/llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
similarity index 83%
rename from llvm/include/llvm/CodeGen/ExpandMemCmp.h
rename to llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
index 94a877854f327a..94ba0cf9305040 100644
--- a/llvm/include/llvm/CodeGen/ExpandMemCmp.h
+++ b/llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
@@ -6,8 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
-#ifndef LLVM_CODEGEN_EXPANDMEMCMP_H
-#define LLVM_CODEGEN_EXPANDMEMCMP_H
+#ifndef LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
+#define LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
 
 #include "llvm/IR/PassManager.h"
 
@@ -26,4 +26,4 @@ class ExpandMemCmpPass : public PassInfoMixin<ExpandMemCmpPass> {
 
 } // namespace llvm
 
-#endif // LLVM_CODEGEN_EXPANDMEMCMP_H
+#endif // LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index df2d1831ee5fdb..518432e9a7b32f 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -71,7 +71,6 @@ add_llvm_component_library(LLVMCodeGen
   ExecutionDomainFix.cpp
   ExpandLargeDivRem.cpp
   ExpandLargeFpConvert.cpp
-  ExpandMemCmp.cpp
   ExpandPostRAPseudos.cpp
   ExpandReductions.cpp
   ExpandVectorPredication.cpp
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 7b73a7b11ddf1c..043fa4e6eabe8f 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -41,7 +41,6 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializeEarlyTailDuplicatePass(Registry);
   initializeExpandLargeDivRemLegacyPassPass(Registry);
   initializeExpandLargeFpConvertLegacyPassPass(Registry);
-  initializeExpandMemCmpLegacyPassPass(Registry);
   initializeExpandPostRAPass(Registry);
   initializeFEntryInserterPass(Registry);
   initializeFinalizeISelPass(Registry);
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index 4003a08a5422dd..33562e90e94426 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -108,9 +108,6 @@ static cl::opt<bool> EnableImplicitNullChecks(
     "enable-implicit-null-checks",
     cl::desc("Fold null checks into faulting memory operations"),
     cl::init(false), cl::Hidden);
-static cl::opt<bool> DisableMergeICmps("disable-mergeicmps",
-    cl::desc("Disable MergeICmps Pass"),
-    cl::init(false), cl::Hidden);
 static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,
     cl::desc("Print LLVM IR produced by the loop-reduce pass"));
 static cl::opt<bool>
@@ -487,7 +484,6 @@ CGPassBuilderOption llvm::getCGPassBuilderOption() {
   SET_BOOLEAN_OPTION(EnableImplicitNullChecks)
   SET_BOOLEAN_OPTION(EnableMachineOutliner)
   SET_BOOLEAN_OPTION(MISchedPostRA)
-  SET_BOOLEAN_OPTION(DisableMergeICmps)
   SET_BOOLEAN_OPTION(DisableLSR)
   SET_BOOLEAN_OPTION(DisableConstantHoisting)
   SET_BOOLEAN_OPTION(DisableCGP)
@@ -872,13 +868,6 @@ void TargetPassConfig::addIRPasses() {
                                         "\n\n*** Code after LSR ***\n"));
     }
 
-    // The MergeICmpsPass tries to create memcmp calls by grouping sequences of
-    // loads and compares. ExpandMemCmpPass then tries to expand those calls
-    // into optimally-sized loads and compares. The transforms are enabled by a
-    // target lowering hook.
-    if (!DisableMergeICmps)
-      addPass(createMergeICmpsLegacyPass());
-    addPass(createExpandMemCmpLegacyPass());
   }
 
   // Run GC lowering passes for builtin collectors
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 439f749bda8bb7..20448554756aca 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -76,7 +76,6 @@
 #include "llvm/CodeGen/DwarfEHPrepare.h"
 #include "llvm/CodeGen/ExpandLargeDivRem.h"
 #include "llvm/CodeGen/ExpandLargeFpConvert.h"
-#include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/HardwareLoops.h"
 #include "llvm/CodeGen/IndirectBrExpand.h"
@@ -181,6 +180,7 @@
 #include "llvm/Transforms/Scalar/DeadStoreElimination.h"
 #include "llvm/Transforms/Scalar/DivRemPairs.h"
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/Transforms/Scalar/FlattenCFG.h"
 #include "llvm/Transforms/Scalar/Float2Int.h"
 #include "llvm/Transforms/Scalar/GVN.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 5c6c391049a7b2..e2dd413f12d696 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -86,6 +86,7 @@
 #include "llvm/Transforms/Scalar/DeadStoreElimination.h"
 #include "llvm/Transforms/Scalar/DivRemPairs.h"
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/Transforms/Scalar/Float2Int.h"
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/IndVarSimplify.h"
@@ -111,6 +112,7 @@
 #include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
 #include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"
 #include "llvm/Transforms/Scalar/MemCpyOptimizer.h"
+#include "llvm/Transforms/Scalar/MergeICmps.h"
 #include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"
 #include "llvm/Transforms/Scalar/NewGVN.h"
 #include "llvm/Transforms/Scalar/Reassociate.h"
@@ -386,6 +388,8 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   if (AreStatisticsEnabled())
     FPM.addPass(CountVisitsPass());
 
+  FPM.addPass(MergeICmpsPass());
+  FPM.addPass(ExpandMemCmpPass(TM));
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -532,6 +536,8 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   if (AreStatisticsEnabled())
     FPM.addPass(CountVisitsPass());
 
+  FPM.addPass(MergeICmpsPass());
+  FPM.addPass(ExpandMemCmpPass(TM));
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 82ce040c649626..31adbf1942b410 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -353,6 +353,7 @@ FUNCTION_PASS("mem2reg", PromotePass())
 FUNCTION_PASS("memcpyopt", MemCpyOptPass())
 FUNCTION_PASS("memprof", MemProfilerPass())
 FUNCTION_PASS("mergeicmps", MergeICmpsPass())
+FUNCTION_PASS("expand-memcmp", ExpandMemCmpPass(TM))
 FUNCTION_PASS("mergereturn", UnifyFunctionExitNodesPass())
 FUNCTION_PASS("move-auto-init", MoveAutoInitPass())
 FUNCTION_PASS("nary-reassociate", NaryReassociatePass())
@@ -415,7 +416,7 @@ FUNCTION_PASS("structurizecfg", StructurizeCFGPass())
 FUNCTION_PASS("tailcallelim", TailCallElimPass())
 FUNCTION_PASS("tlshoist", TLSVariableHoistPass())
 FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
-FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())  
+FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
 FUNCTION_PASS("tsan", ThreadSanitizerPass())
 FUNCTION_PASS("typepromotion", TypePromotionPass(TM))
 FUNCTION_PASS("unify-loop-exits", UnifyLoopExitsPass())
diff --git a/llvm/lib/Transforms/Scalar/CMakeLists.txt b/llvm/lib/Transforms/Scalar/CMakeLists.txt
index 2dd27037a17de7..f6e666dd071256 100644
--- a/llvm/lib/Transforms/Scalar/CMakeLists.txt
+++ b/llvm/lib/Transforms/Scalar/CMakeLists.txt
@@ -11,6 +11,7 @@ add_llvm_component_library(LLVMScalarOpts
   DeadStoreElimination.cpp
   DFAJumpThreading.cpp
   DivRemPairs.cpp
+  ExpandMemCmp.cpp
   EarlyCSE.cpp
   FlattenCFGPass.cpp
   Float2Int.cpp
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
similarity index 90%
rename from llvm/lib/CodeGen/ExpandMemCmp.cpp
rename to llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
index bb84813569f4d5..973875ee142978 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
@@ -11,21 +11,22 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "llvm/CodeGen/ExpandMemCmp.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/LazyBlockFrequencyInfo.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
-#include "llvm/CodeGen/TargetPassConfig.h"
-#include "llvm/CodeGen/TargetSubtargetInfo.h"
+#include "llvm/IR/Constants.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/InitializePasses.h"
+#include "llvm/Support/Debug.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
@@ -35,9 +36,6 @@
 using namespace llvm;
 using namespace llvm::PatternMatch;
 
-namespace llvm {
-class TargetLowering;
-}
 
 #define DEBUG_TYPE "expand-memcmp"
 
@@ -305,6 +303,7 @@ unsigned MemCmpExpansion::getNumBlocks() {
 }
 
 void MemCmpExpansion::createLoadCmpBlocks() {
+  assert(ResBlock.BB && "ResBlock must be created before LoadCmpBlocks");
   for (unsigned i = 0; i < getNumBlocks(); i++) {
     BasicBlock *BB = BasicBlock::Create(CI->getContext(), "loadbb",
                                         EndBlock->getParent(), EndBlock);
@@ -313,6 +312,7 @@ void MemCmpExpansion::createLoadCmpBlocks() {
 }
 
 void MemCmpExpansion::createResultBlock() {
+  assert(EndBlock && "EndBlock must be created before ResultBlock");
   ResBlock.BB = BasicBlock::Create(CI->getContext(), "res_block",
                                    EndBlock->getParent(), EndBlock);
 }
@@ -828,9 +828,9 @@ Value *MemCmpExpansion::getMemCmpExpansion() {
 ///  %phi.res = phi i32 [ %48, %loadbb3 ], [ %11, %res_block ]
 ///  ret i32 %phi.res
 static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
-                         const TargetLowering *TLI, const DataLayout *DL,
-                         ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI,
-                         DomTreeUpdater *DTU, const bool IsBCmp) {
+                         const DataLayout *DL, ProfileSummaryInfo *PSI,
+                         BlockFrequencyInfo *BFI, DomTreeUpdater *DTU,
+                         const bool IsBCmp) {
   NumMemCmpCalls++;
 
   // Early exit from expansion if -Oz.
@@ -845,9 +845,7 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   }
   const uint64_t SizeVal = SizeCast->getZExtValue();
 
-  if (SizeVal == 0) {
-    return false;
-  }
+
   // TTI call to check if target would like to expand memcmp. Also, get the
   // available load sizes.
   const bool IsUsedForZeroCmp =
@@ -857,28 +855,33 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   auto Options = TTI->enableMemCmpExpansion(OptForSize,
                                             IsUsedForZeroCmp);
   if (!Options) return false;
+  Value *Res = nullptr;
 
-  if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
-    Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
-
-  if (OptForSize &&
-      MaxLoadsPerMemcmpOptSize.getNumOccurrences())
-    Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
+  if (SizeVal == 0) {
+    Res = ConstantInt::get(CI->getFunctionType()->getReturnType(), 0);
+  } else {
+    if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
+      Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
 
-  if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
-    Options.MaxNumLoads = MaxLoadsPerMemcmp;
+    if (OptForSize &&
+        MaxLoadsPerMemcmpOptSize.getNumOccurrences())
+      Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
 
-  MemCmpExpansion Expansion(CI, SizeVal, Options, IsUsedForZeroCmp, *DL, DTU);
+    if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
+      Options.MaxNumLoads = MaxLoadsPerMemcmp;
 
-  // Don't expand if this will require more loads than desired by the target.
-  if (Expansion.getNumLoads() == 0) {
-    NumMemCmpGreaterThanMax++;
-    return false;
-  }
+    MemCmpExpansion Expansion(CI, SizeVal, Options, IsUsedForZeroCmp, *DL, DTU);
 
-  NumMemCmpInlined++;
+    // Don't expand if this will require more loads than desired by the target.
+    if (Expansion.getNumLoads() == 0) {
+      NumMemCmpGreaterThanMax++;
+      return false;
+    }
 
-  if (Value *Res = Expansion.getMemCmpExpansion()) {
+    NumMemCmpInlined++;
+    Res = Expansion.getMemCmpExpansion();
+  }
+  if (Res) {
     // Replace call with result of expansion and erase call.
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
@@ -889,62 +892,18 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
 
 // Returns true if a change was made.
 static bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
-                       const TargetTransformInfo *TTI, const TargetLowering *TL,
+                       const TargetTransformInfo *TTI,
                        const DataLayout &DL, ProfileSummaryInfo *PSI,
                        BlockFrequencyInfo *BFI, DomTreeUpdater *DTU);
 
 static PreservedAnalyses runImpl(Function &F, const TargetLibraryInfo *TLI,
                                  const TargetTransformInfo *TTI,
-                                 const TargetLowering *TL,
                                  ProfileSummaryInfo *PSI,
                                  BlockFrequencyInfo *BFI, DominatorTree *DT);
 
-class ExpandMemCmpLegacyPass : public FunctionPass {
-public:
-  static char ID;
-
-  ExpandMemCmpLegacyPass() : FunctionPass(ID) {
-    initializeExpandMemCmpLegacyPassPass(*PassRegistry::getPassRegistry());
-  }
-
-  bool runOnFunction(Function &F) override {
-    if (skipFunction(F)) return false;
-
-    auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
-    if (!TPC) {
-      return false;
-    }
-    const TargetLowering* TL =
-        TPC->getTM<TargetMachine>().getSubtargetImpl(F)->getTargetLowering();
-
-    const TargetLibraryInfo *TLI =
-        &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
-    const TargetTransformInfo *TTI =
-        &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
-    auto *PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
-    auto *BFI = (PSI && PSI->hasProfileSummary()) ?
-           &getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :
-           nullptr;
-    DominatorTree *DT = nullptr;
-    if (auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>())
-      DT = &DTWP->getDomTree();
-    auto PA = runImpl(F, TLI, TTI, TL, PSI, BFI, DT);
-    return !PA.areAllPreserved();
-  }
-
-private:
-  void getAnalysisUsage(AnalysisUsage &AU) const override {
-    AU.addRequired<TargetLibraryInfoWrapperPass>();
-    AU.addRequired<TargetTransformInfoWrapperPass>();
-    AU.addRequired<ProfileSummaryInfoWrapperPass>();
-    AU.addPreserved<DominatorTreeWrapperPass>();
-    LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);
-    FunctionPass::getAnalysisUsage(AU);
-  }
-};
 
 bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
-                const TargetTransformInfo *TTI, const TargetLowering *TL,
+                const TargetTransformInfo *TTI,
                 const DataLayout &DL, ProfileSummaryInfo *PSI,
                 BlockFrequencyInfo *BFI, DomTreeUpdater *DTU) {
   for (Instruction &I : BB) {
@@ -955,7 +914,7 @@ bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
     LibFunc Func;
     if (TLI->getLibFunc(*CI, Func) &&
         (Func == LibFunc_memcmp || Func == LibFunc_bcmp) &&
-        expandMemCmp(CI, TTI, TL, &DL, PS...
[truncated]

@llvmbot

llvmbot commented Jan 8, 2024

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-backend-x86

Author: Gabriel Baraldi (gbaraldi)

Changes

This should allow for optimizations like saving repeated loads between memcmp calls. c.f https://godbolt.org/z/bEna4Md9r, the window on the right shows the generated output of this branch.

One question is where to put this in the pipeline. The inline expansions probably benefit the most from the passes we run early, but we might want to wait a bit so that we can try and prove a constant argument to the memcmp (though I imagine if they are going to be constant, it's directly from the source).

One other question is that some of the x86 tests are massive, I just ported over from the llc test, but they grew quite a bit more.


Patch is 4.02 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77370.diff

83 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/CodeGenPassBuilder.h (-10)
  • (modified) llvm/include/llvm/CodeGen/MachinePassRegistry.def (-2)
  • (modified) llvm/include/llvm/CodeGen/Passes.h (-2)
  • (modified) llvm/include/llvm/InitializePasses.h (-1)
  • (modified) llvm/include/llvm/LinkAllPasses.h (-1)
  • (renamed) llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h (+3-3)
  • (modified) llvm/lib/CodeGen/CMakeLists.txt (-1)
  • (modified) llvm/lib/CodeGen/CodeGen.cpp (-1)
  • (modified) llvm/lib/CodeGen/TargetPassConfig.cpp (-11)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1-1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+6)
  • (modified) llvm/lib/Passes/PassRegistry.def (+2-1)
  • (modified) llvm/lib/Transforms/Scalar/CMakeLists.txt (+1)
  • (renamed) llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp (+37-96)
  • (modified) llvm/test/CodeGen/AArch64/O3-pipeline.ll (-7)
  • (removed) llvm/test/CodeGen/AArch64/bcmp-inline-small.ll (-98)
  • (removed) llvm/test/CodeGen/AArch64/bcmp.ll (-537)
  • (modified) llvm/test/CodeGen/AArch64/dag-combine-setcc.ll (+25-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll (+84-44)
  • (removed) llvm/test/CodeGen/AArch64/memcmp.ll (-3029)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (-28)
  • (modified) llvm/test/CodeGen/ARM/O3-pipeline.ll (-7)
  • (removed) llvm/test/CodeGen/BPF/memcmp.ll (-77)
  • (modified) llvm/test/CodeGen/Generic/llc-start-stop.ll (+3-3)
  • (modified) llvm/test/CodeGen/LoongArch/opt-pipeline.ll (+1-8)
  • (modified) llvm/test/CodeGen/M68k/pipeline.ll (-7)
  • (modified) llvm/test/CodeGen/PowerPC/O3-pipeline.ll (+1-8)
  • (removed) llvm/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll (-168)
  • (removed) llvm/test/CodeGen/PowerPC/memcmp-mergeexpand.ll (-39)
  • (removed) llvm/test/CodeGen/PowerPC/memcmp.ll (-62)
  • (removed) llvm/test/CodeGen/PowerPC/memcmpIR.ll (-178)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1-8)
  • (removed) llvm/test/CodeGen/X86/memcmp-mergeexpand.ll (-49)
  • (removed) llvm/test/CodeGen/X86/memcmp-minsize-x32.ll (-445)
  • (removed) llvm/test/CodeGen/X86/memcmp-minsize.ll (-433)
  • (removed) llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll (-2911)
  • (removed) llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll (-4006)
  • (removed) llvm/test/CodeGen/X86/memcmp-optsize-x32.ll (-583)
  • (removed) llvm/test/CodeGen/X86/memcmp-optsize.ll (-596)
  • (removed) llvm/test/CodeGen/X86/memcmp-pgso-x32.ll (-600)
  • (removed) llvm/test/CodeGen/X86/memcmp-pgso.ll (-613)
  • (removed) llvm/test/CodeGen/X86/memcmp-x32.ll (-2429)
  • (removed) llvm/test/CodeGen/X86/memcmp.ll (-3065)
  • (modified) llvm/test/CodeGen/X86/opt-pipeline.ll (+1-8)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+14-12)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+3-1)
  • (added) llvm/test/Transforms/ExpandMemCmp/AArch64/bcmp.ll (+751)
  • (added) llvm/test/Transforms/ExpandMemCmp/AArch64/memcmp-extra.ll (+3434)
  • (modified) llvm/test/Transforms/ExpandMemCmp/AArch64/memcmp.ll (-1)
  • (added) llvm/test/Transforms/ExpandMemCmp/BPF/lit.local.cfg (+4)
  • (added) llvm/test/Transforms/ExpandMemCmp/BPF/memcmp.ll (+119)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memCmpUsedInZeroEqualityComparison.ll (+218)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmp-mergeexpand.ll (+48)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmp.ll (+70)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmpIR.ll (+216)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/bcmp.ll (+8-8)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-2.ll (+20249)
  • (renamed) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-constant.ll (+47-42)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-minsize-x32.ll (+493)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-minsize.ll (+707)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-more-load-pairs-x32.ll (+6203)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-more-load-pairs.ll (+18833)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-nobuiltin.ll (+248)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-optsize-x32.ll (+870)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-optsize.ll (+1414)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-pgso-x32.ll (+887)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-pgso.ll (+1347)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-x32-2.ll (+4813)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-x32.ll (+277-246)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/memcmp.ll (+618-576)
  • (added) llvm/test/Transforms/PhaseOrdering/PowerPC/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp-early.ll (+86)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp-mergeexpand.ll (+62)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp.ll (+856)
  • (modified) llvm/tools/opt/opt.cpp (-1)
  • (modified) llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn (-1)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn (+1)
diff --git a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
index a7cbb0910baabf..556304231b397b 100644
--- a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
@@ -25,7 +25,6 @@
 #include "llvm/Analysis/TypeBasedAliasAnalysis.h"
 #include "llvm/CodeGen/CallBrPrepare.h"
 #include "llvm/CodeGen/DwarfEHPrepare.h"
-#include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/ExpandReductions.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/IndirectBrExpand.h"
@@ -629,15 +628,6 @@ void CodeGenPassBuilder<Derived>::addIRPasses(AddIRPass &addPass) const {
       addPass(PrintFunctionPass(dbgs(), "\n\n*** Code after LSR ***\n"));
   }
 
-  if (getOptLevel() != CodeGenOptLevel::None) {
-    // The MergeICmpsPass tries to create memcmp calls by grouping sequences of
-    // loads and compares. ExpandMemCmpPass then tries to expand those calls
-    // into optimally-sized loads and compares. The transforms are enabled by a
-    // target lowering hook.
-    if (!Opt.DisableMergeICmps)
-      addPass(MergeICmpsPass());
-    addPass(ExpandMemCmpPass(&TM));
-  }
 
   // Run GC lowering passes for builtin collectors
   // TODO: add a pass insertion point here
diff --git a/llvm/include/llvm/CodeGen/MachinePassRegistry.def b/llvm/include/llvm/CodeGen/MachinePassRegistry.def
index f950dfae7e338b..3c00668aae3897 100644
--- a/llvm/include/llvm/CodeGen/MachinePassRegistry.def
+++ b/llvm/include/llvm/CodeGen/MachinePassRegistry.def
@@ -47,7 +47,6 @@ FUNCTION_PASS("dwarf-eh-prepare", DwarfEHPreparePass, (TM))
 FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false))
 FUNCTION_PASS("expand-large-div-rem", ExpandLargeDivRemPass, (TM))
 FUNCTION_PASS("expand-large-fp-convert", ExpandLargeFpConvertPass, (TM))
-FUNCTION_PASS("expand-memcmp", ExpandMemCmpPass, (TM))
 FUNCTION_PASS("expand-reductions", ExpandReductionsPass, ())
 FUNCTION_PASS("expandvp", ExpandVectorPredicationPass, ())
 FUNCTION_PASS("indirectbr-expand", IndirectBrExpandPass, (TM))
@@ -55,7 +54,6 @@ FUNCTION_PASS("interleaved-access", InterleavedAccessPass, (TM))
 FUNCTION_PASS("interleaved-load-combine", InterleavedLoadCombinePass, (TM))
 FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass, ())
 FUNCTION_PASS("lowerinvoke", LowerInvokePass, ())
-FUNCTION_PASS("mergeicmps", MergeICmpsPass, ())
 FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ())
 FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass, (true))
 FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ())
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index ca9fbb1def7624..e5ed5f15f62ed7 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -519,8 +519,6 @@ namespace llvm {
   // Expands large div/rem instructions.
   FunctionPass *createExpandLargeFpConvertPass();
 
-  // This pass expands memcmp() to load/stores.
-  FunctionPass *createExpandMemCmpLegacyPass();
 
   /// Creates Break False Dependencies pass. \see BreakFalseDeps.cpp
   FunctionPass *createBreakFalseDeps();
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 46b1e95c3c15f3..b0ca9fa942cda3 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -103,7 +103,6 @@ void initializeEdgeBundlesPass(PassRegistry&);
 void initializeEHContGuardCatchretPass(PassRegistry &);
 void initializeExpandLargeFpConvertLegacyPassPass(PassRegistry&);
 void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
-void initializeExpandMemCmpLegacyPassPass(PassRegistry &);
 void initializeExpandPostRAPass(PassRegistry&);
 void initializeExpandReductionsPass(PassRegistry&);
 void initializeExpandVectorPredicationPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 7a21876e565a7c..9aff428fbe938b 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -119,7 +119,6 @@ namespace {
       (void) llvm::createPostDomTree();
       (void) llvm::createMergeICmpsLegacyPass();
       (void) llvm::createExpandLargeDivRemPass();
-      (void)llvm::createExpandMemCmpLegacyPass();
       (void) llvm::createExpandVectorPredicationPass();
       std::string buf;
       llvm::raw_string_ostream os(buf);
diff --git a/llvm/include/llvm/CodeGen/ExpandMemCmp.h b/llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
similarity index 83%
rename from llvm/include/llvm/CodeGen/ExpandMemCmp.h
rename to llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
index 94a877854f327a..94ba0cf9305040 100644
--- a/llvm/include/llvm/CodeGen/ExpandMemCmp.h
+++ b/llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
@@ -6,8 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
-#ifndef LLVM_CODEGEN_EXPANDMEMCMP_H
-#define LLVM_CODEGEN_EXPANDMEMCMP_H
+#ifndef LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
+#define LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
 
 #include "llvm/IR/PassManager.h"
 
@@ -26,4 +26,4 @@ class ExpandMemCmpPass : public PassInfoMixin<ExpandMemCmpPass> {
 
 } // namespace llvm
 
-#endif // LLVM_CODEGEN_EXPANDMEMCMP_H
+#endif // LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index df2d1831ee5fdb..518432e9a7b32f 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -71,7 +71,6 @@ add_llvm_component_library(LLVMCodeGen
   ExecutionDomainFix.cpp
   ExpandLargeDivRem.cpp
   ExpandLargeFpConvert.cpp
-  ExpandMemCmp.cpp
   ExpandPostRAPseudos.cpp
   ExpandReductions.cpp
   ExpandVectorPredication.cpp
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 7b73a7b11ddf1c..043fa4e6eabe8f 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -41,7 +41,6 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializeEarlyTailDuplicatePass(Registry);
   initializeExpandLargeDivRemLegacyPassPass(Registry);
   initializeExpandLargeFpConvertLegacyPassPass(Registry);
-  initializeExpandMemCmpLegacyPassPass(Registry);
   initializeExpandPostRAPass(Registry);
   initializeFEntryInserterPass(Registry);
   initializeFinalizeISelPass(Registry);
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index 4003a08a5422dd..33562e90e94426 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -108,9 +108,6 @@ static cl::opt<bool> EnableImplicitNullChecks(
     "enable-implicit-null-checks",
     cl::desc("Fold null checks into faulting memory operations"),
     cl::init(false), cl::Hidden);
-static cl::opt<bool> DisableMergeICmps("disable-mergeicmps",
-    cl::desc("Disable MergeICmps Pass"),
-    cl::init(false), cl::Hidden);
 static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,
     cl::desc("Print LLVM IR produced by the loop-reduce pass"));
 static cl::opt<bool>
@@ -487,7 +484,6 @@ CGPassBuilderOption llvm::getCGPassBuilderOption() {
   SET_BOOLEAN_OPTION(EnableImplicitNullChecks)
   SET_BOOLEAN_OPTION(EnableMachineOutliner)
   SET_BOOLEAN_OPTION(MISchedPostRA)
-  SET_BOOLEAN_OPTION(DisableMergeICmps)
   SET_BOOLEAN_OPTION(DisableLSR)
   SET_BOOLEAN_OPTION(DisableConstantHoisting)
   SET_BOOLEAN_OPTION(DisableCGP)
@@ -872,13 +868,6 @@ void TargetPassConfig::addIRPasses() {
                                         "\n\n*** Code after LSR ***\n"));
     }
 
-    // The MergeICmpsPass tries to create memcmp calls by grouping sequences of
-    // loads and compares. ExpandMemCmpPass then tries to expand those calls
-    // into optimally-sized loads and compares. The transforms are enabled by a
-    // target lowering hook.
-    if (!DisableMergeICmps)
-      addPass(createMergeICmpsLegacyPass());
-    addPass(createExpandMemCmpLegacyPass());
   }
 
   // Run GC lowering passes for builtin collectors
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 439f749bda8bb7..20448554756aca 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -76,7 +76,6 @@
 #include "llvm/CodeGen/DwarfEHPrepare.h"
 #include "llvm/CodeGen/ExpandLargeDivRem.h"
 #include "llvm/CodeGen/ExpandLargeFpConvert.h"
-#include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/HardwareLoops.h"
 #include "llvm/CodeGen/IndirectBrExpand.h"
@@ -181,6 +180,7 @@
 #include "llvm/Transforms/Scalar/DeadStoreElimination.h"
 #include "llvm/Transforms/Scalar/DivRemPairs.h"
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/Transforms/Scalar/FlattenCFG.h"
 #include "llvm/Transforms/Scalar/Float2Int.h"
 #include "llvm/Transforms/Scalar/GVN.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 5c6c391049a7b2..e2dd413f12d696 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -86,6 +86,7 @@
 #include "llvm/Transforms/Scalar/DeadStoreElimination.h"
 #include "llvm/Transforms/Scalar/DivRemPairs.h"
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/Transforms/Scalar/Float2Int.h"
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/IndVarSimplify.h"
@@ -111,6 +112,7 @@
 #include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
 #include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"
 #include "llvm/Transforms/Scalar/MemCpyOptimizer.h"
+#include "llvm/Transforms/Scalar/MergeICmps.h"
 #include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"
 #include "llvm/Transforms/Scalar/NewGVN.h"
 #include "llvm/Transforms/Scalar/Reassociate.h"
@@ -386,6 +388,8 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   if (AreStatisticsEnabled())
     FPM.addPass(CountVisitsPass());
 
+  FPM.addPass(MergeICmpsPass());
+  FPM.addPass(ExpandMemCmpPass(TM));
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -532,6 +536,8 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   if (AreStatisticsEnabled())
     FPM.addPass(CountVisitsPass());
 
+  FPM.addPass(MergeICmpsPass());
+  FPM.addPass(ExpandMemCmpPass(TM));
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 82ce040c649626..31adbf1942b410 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -353,6 +353,7 @@ FUNCTION_PASS("mem2reg", PromotePass())
 FUNCTION_PASS("memcpyopt", MemCpyOptPass())
 FUNCTION_PASS("memprof", MemProfilerPass())
 FUNCTION_PASS("mergeicmps", MergeICmpsPass())
+FUNCTION_PASS("expand-memcmp", ExpandMemCmpPass(TM))
 FUNCTION_PASS("mergereturn", UnifyFunctionExitNodesPass())
 FUNCTION_PASS("move-auto-init", MoveAutoInitPass())
 FUNCTION_PASS("nary-reassociate", NaryReassociatePass())
@@ -415,7 +416,7 @@ FUNCTION_PASS("structurizecfg", StructurizeCFGPass())
 FUNCTION_PASS("tailcallelim", TailCallElimPass())
 FUNCTION_PASS("tlshoist", TLSVariableHoistPass())
 FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
-FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())  
+FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
 FUNCTION_PASS("tsan", ThreadSanitizerPass())
 FUNCTION_PASS("typepromotion", TypePromotionPass(TM))
 FUNCTION_PASS("unify-loop-exits", UnifyLoopExitsPass())
diff --git a/llvm/lib/Transforms/Scalar/CMakeLists.txt b/llvm/lib/Transforms/Scalar/CMakeLists.txt
index 2dd27037a17de7..f6e666dd071256 100644
--- a/llvm/lib/Transforms/Scalar/CMakeLists.txt
+++ b/llvm/lib/Transforms/Scalar/CMakeLists.txt
@@ -11,6 +11,7 @@ add_llvm_component_library(LLVMScalarOpts
   DeadStoreElimination.cpp
   DFAJumpThreading.cpp
   DivRemPairs.cpp
+  ExpandMemCmp.cpp
   EarlyCSE.cpp
   FlattenCFGPass.cpp
   Float2Int.cpp
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
similarity index 90%
rename from llvm/lib/CodeGen/ExpandMemCmp.cpp
rename to llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
index bb84813569f4d5..973875ee142978 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
@@ -11,21 +11,22 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "llvm/CodeGen/ExpandMemCmp.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/LazyBlockFrequencyInfo.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
-#include "llvm/CodeGen/TargetPassConfig.h"
-#include "llvm/CodeGen/TargetSubtargetInfo.h"
+#include "llvm/IR/Constants.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/InitializePasses.h"
+#include "llvm/Support/Debug.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
@@ -35,9 +36,6 @@
 using namespace llvm;
 using namespace llvm::PatternMatch;
 
-namespace llvm {
-class TargetLowering;
-}
 
 #define DEBUG_TYPE "expand-memcmp"
 
@@ -305,6 +303,7 @@ unsigned MemCmpExpansion::getNumBlocks() {
 }
 
 void MemCmpExpansion::createLoadCmpBlocks() {
+  assert(ResBlock.BB && "ResBlock must be created before LoadCmpBlocks");
   for (unsigned i = 0; i < getNumBlocks(); i++) {
     BasicBlock *BB = BasicBlock::Create(CI->getContext(), "loadbb",
                                         EndBlock->getParent(), EndBlock);
@@ -313,6 +312,7 @@ void MemCmpExpansion::createLoadCmpBlocks() {
 }
 
 void MemCmpExpansion::createResultBlock() {
+  assert(EndBlock && "EndBlock must be created before ResultBlock");
   ResBlock.BB = BasicBlock::Create(CI->getContext(), "res_block",
                                    EndBlock->getParent(), EndBlock);
 }
@@ -828,9 +828,9 @@ Value *MemCmpExpansion::getMemCmpExpansion() {
 ///  %phi.res = phi i32 [ %48, %loadbb3 ], [ %11, %res_block ]
 ///  ret i32 %phi.res
 static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
-                         const TargetLowering *TLI, const DataLayout *DL,
-                         ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI,
-                         DomTreeUpdater *DTU, const bool IsBCmp) {
+                         const DataLayout *DL, ProfileSummaryInfo *PSI,
+                         BlockFrequencyInfo *BFI, DomTreeUpdater *DTU,
+                         const bool IsBCmp) {
   NumMemCmpCalls++;
 
   // Early exit from expansion if -Oz.
@@ -845,9 +845,7 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   }
   const uint64_t SizeVal = SizeCast->getZExtValue();
 
-  if (SizeVal == 0) {
-    return false;
-  }
+
   // TTI call to check if target would like to expand memcmp. Also, get the
   // available load sizes.
   const bool IsUsedForZeroCmp =
@@ -857,28 +855,33 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   auto Options = TTI->enableMemCmpExpansion(OptForSize,
                                             IsUsedForZeroCmp);
   if (!Options) return false;
+  Value *Res = nullptr;
 
-  if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
-    Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
-
-  if (OptForSize &&
-      MaxLoadsPerMemcmpOptSize.getNumOccurrences())
-    Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
+  if (SizeVal == 0) {
+    Res = ConstantInt::get(CI->getFunctionType()->getReturnType(), 0);
+  } else {
+    if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
+      Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
 
-  if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
-    Options.MaxNumLoads = MaxLoadsPerMemcmp;
+    if (OptForSize &&
+        MaxLoadsPerMemcmpOptSize.getNumOccurrences())
+      Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
 
-  MemCmpExpansion Expansion(CI, SizeVal, Options, IsUsedForZeroCmp, *DL, DTU);
+    if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
+      Options.MaxNumLoads = MaxLoadsPerMemcmp;
 
-  // Don't expand if this will require more loads than desired by the target.
-  if (Expansion.getNumLoads() == 0) {
-    NumMemCmpGreaterThanMax++;
-    return false;
-  }
+    MemCmpExpansion Expansion(CI, SizeVal, Options, IsUsedForZeroCmp, *DL, DTU);
 
-  NumMemCmpInlined++;
+    // Don't expand if this will require more loads than desired by the target.
+    if (Expansion.getNumLoads() == 0) {
+      NumMemCmpGreaterThanMax++;
+      return false;
+    }
 
-  if (Value *Res = Expansion.getMemCmpExpansion()) {
+    NumMemCmpInlined++;
+    Res = Expansion.getMemCmpExpansion();
+  }
+  if (Res) {
     // Replace call with result of expansion and erase call.
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
@@ -889,62 +892,18 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
 
 // Returns true if a change was made.
 static bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
-                       const TargetTransformInfo *TTI, const TargetLowering *TL,
+                       const TargetTransformInfo *TTI,
                        const DataLayout &DL, ProfileSummaryInfo *PSI,
                        BlockFrequencyInfo *BFI, DomTreeUpdater *DTU);
 
 static PreservedAnalyses runImpl(Function &F, const TargetLibraryInfo *TLI,
                                  const TargetTransformInfo *TTI,
-                                 const TargetLowering *TL,
                                  ProfileSummaryInfo *PSI,
                                  BlockFrequencyInfo *BFI, DominatorTree *DT);
 
-class ExpandMemCmpLegacyPass : public FunctionPass {
-public:
-  static char ID;
-
-  ExpandMemCmpLegacyPass() : FunctionPass(ID) {
-    initializeExpandMemCmpLegacyPassPass(*PassRegistry::getPassRegistry());
-  }
-
-  bool runOnFunction(Function &F) override {
-    if (skipFunction(F)) return false;
-
-    auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
-    if (!TPC) {
-      return false;
-    }
-    const TargetLowering* TL =
-        TPC->getTM<TargetMachine>().getSubtargetImpl(F)->getTargetLowering();
-
-    const TargetLibraryInfo *TLI =
-        &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
-    const TargetTransformInfo *TTI =
-        &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
-    auto *PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
-    auto *BFI = (PSI && PSI->hasProfileSummary()) ?
-           &getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :
-           nullptr;
-    DominatorTree *DT = nullptr;
-    if (auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>())
-      DT = &DTWP->getDomTree();
-    auto PA = runImpl(F, TLI, TTI, TL, PSI, BFI, DT);
-    return !PA.areAllPreserved();
-  }
-
-private:
-  void getAnalysisUsage(AnalysisUsage &AU) const override {
-    AU.addRequired<TargetLibraryInfoWrapperPass>();
-    AU.addRequired<TargetTransformInfoWrapperPass>();
-    AU.addRequired<ProfileSummaryInfoWrapperPass>();
-    AU.addPreserved<DominatorTreeWrapperPass>();
-    LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);
-    FunctionPass::getAnalysisUsage(AU);
-  }
-};
 
 bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
-                const TargetTransformInfo *TTI, const TargetLowering *TL,
+                const TargetTransformInfo *TTI,
                 const DataLayout &DL, ProfileSummaryInfo *PSI,
                 BlockFrequencyInfo *BFI, DomTreeUpdater *DTU) {
   for (Instruction &I : BB) {
@@ -955,7 +914,7 @@ bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
     LibFunc Func;
     if (TLI->getLibFunc(*CI, Func) &&
         (Func == LibFunc_memcmp || Func == LibFunc_bcmp) &&
-        expandMemCmp(CI, TTI, TL, &DL, PS...
[truncated]

@llvmbot

llvmbot commented Jan 8, 2024

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-backend-m68k

Author: Gabriel Baraldi (gbaraldi)

Changes

This should allow for optimizations like saving repeated loads between memcmp calls. c.f https://godbolt.org/z/bEna4Md9r, the window on the right shows the generated output of this branch.

One question is where to put this in the pipeline. The inline expansions probably benefit the most from the passes we run early, but we might want to wait a bit so that we can try and prove a constant argument to the memcmp (though I imagine if they are going to be constant, it's directly from the source).

One other question is that some of the x86 tests are massive, I just ported over from the llc test, but they grew quite a bit more.


Patch is 4.02 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77370.diff

83 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/CodeGenPassBuilder.h (-10)
  • (modified) llvm/include/llvm/CodeGen/MachinePassRegistry.def (-2)
  • (modified) llvm/include/llvm/CodeGen/Passes.h (-2)
  • (modified) llvm/include/llvm/InitializePasses.h (-1)
  • (modified) llvm/include/llvm/LinkAllPasses.h (-1)
  • (renamed) llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h (+3-3)
  • (modified) llvm/lib/CodeGen/CMakeLists.txt (-1)
  • (modified) llvm/lib/CodeGen/CodeGen.cpp (-1)
  • (modified) llvm/lib/CodeGen/TargetPassConfig.cpp (-11)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1-1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+6)
  • (modified) llvm/lib/Passes/PassRegistry.def (+2-1)
  • (modified) llvm/lib/Transforms/Scalar/CMakeLists.txt (+1)
  • (renamed) llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp (+37-96)
  • (modified) llvm/test/CodeGen/AArch64/O3-pipeline.ll (-7)
  • (removed) llvm/test/CodeGen/AArch64/bcmp-inline-small.ll (-98)
  • (removed) llvm/test/CodeGen/AArch64/bcmp.ll (-537)
  • (modified) llvm/test/CodeGen/AArch64/dag-combine-setcc.ll (+25-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll (+84-44)
  • (removed) llvm/test/CodeGen/AArch64/memcmp.ll (-3029)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (-28)
  • (modified) llvm/test/CodeGen/ARM/O3-pipeline.ll (-7)
  • (removed) llvm/test/CodeGen/BPF/memcmp.ll (-77)
  • (modified) llvm/test/CodeGen/Generic/llc-start-stop.ll (+3-3)
  • (modified) llvm/test/CodeGen/LoongArch/opt-pipeline.ll (+1-8)
  • (modified) llvm/test/CodeGen/M68k/pipeline.ll (-7)
  • (modified) llvm/test/CodeGen/PowerPC/O3-pipeline.ll (+1-8)
  • (removed) llvm/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll (-168)
  • (removed) llvm/test/CodeGen/PowerPC/memcmp-mergeexpand.ll (-39)
  • (removed) llvm/test/CodeGen/PowerPC/memcmp.ll (-62)
  • (removed) llvm/test/CodeGen/PowerPC/memcmpIR.ll (-178)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1-8)
  • (removed) llvm/test/CodeGen/X86/memcmp-mergeexpand.ll (-49)
  • (removed) llvm/test/CodeGen/X86/memcmp-minsize-x32.ll (-445)
  • (removed) llvm/test/CodeGen/X86/memcmp-minsize.ll (-433)
  • (removed) llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll (-2911)
  • (removed) llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll (-4006)
  • (removed) llvm/test/CodeGen/X86/memcmp-optsize-x32.ll (-583)
  • (removed) llvm/test/CodeGen/X86/memcmp-optsize.ll (-596)
  • (removed) llvm/test/CodeGen/X86/memcmp-pgso-x32.ll (-600)
  • (removed) llvm/test/CodeGen/X86/memcmp-pgso.ll (-613)
  • (removed) llvm/test/CodeGen/X86/memcmp-x32.ll (-2429)
  • (removed) llvm/test/CodeGen/X86/memcmp.ll (-3065)
  • (modified) llvm/test/CodeGen/X86/opt-pipeline.ll (+1-8)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+3-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+14-12)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+3-1)
  • (added) llvm/test/Transforms/ExpandMemCmp/AArch64/bcmp.ll (+751)
  • (added) llvm/test/Transforms/ExpandMemCmp/AArch64/memcmp-extra.ll (+3434)
  • (modified) llvm/test/Transforms/ExpandMemCmp/AArch64/memcmp.ll (-1)
  • (added) llvm/test/Transforms/ExpandMemCmp/BPF/lit.local.cfg (+4)
  • (added) llvm/test/Transforms/ExpandMemCmp/BPF/memcmp.ll (+119)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memCmpUsedInZeroEqualityComparison.ll (+218)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmp-mergeexpand.ll (+48)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmp.ll (+70)
  • (added) llvm/test/Transforms/ExpandMemCmp/PowerPC/memcmpIR.ll (+216)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/bcmp.ll (+8-8)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-2.ll (+20249)
  • (renamed) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-constant.ll (+47-42)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-minsize-x32.ll (+493)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-minsize.ll (+707)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-more-load-pairs-x32.ll (+6203)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-more-load-pairs.ll (+18833)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-nobuiltin.ll (+248)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-optsize-x32.ll (+870)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-optsize.ll (+1414)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-pgso-x32.ll (+887)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-pgso.ll (+1347)
  • (added) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-x32-2.ll (+4813)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/memcmp-x32.ll (+277-246)
  • (modified) llvm/test/Transforms/ExpandMemCmp/X86/memcmp.ll (+618-576)
  • (added) llvm/test/Transforms/PhaseOrdering/PowerPC/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp-early.ll (+86)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp-mergeexpand.ll (+62)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/memcmp.ll (+856)
  • (modified) llvm/tools/opt/opt.cpp (-1)
  • (modified) llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn (-1)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn (+1)
diff --git a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
index a7cbb0910baabf..556304231b397b 100644
--- a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h
@@ -25,7 +25,6 @@
 #include "llvm/Analysis/TypeBasedAliasAnalysis.h"
 #include "llvm/CodeGen/CallBrPrepare.h"
 #include "llvm/CodeGen/DwarfEHPrepare.h"
-#include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/ExpandReductions.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/IndirectBrExpand.h"
@@ -629,15 +628,6 @@ void CodeGenPassBuilder<Derived>::addIRPasses(AddIRPass &addPass) const {
       addPass(PrintFunctionPass(dbgs(), "\n\n*** Code after LSR ***\n"));
   }
 
-  if (getOptLevel() != CodeGenOptLevel::None) {
-    // The MergeICmpsPass tries to create memcmp calls by grouping sequences of
-    // loads and compares. ExpandMemCmpPass then tries to expand those calls
-    // into optimally-sized loads and compares. The transforms are enabled by a
-    // target lowering hook.
-    if (!Opt.DisableMergeICmps)
-      addPass(MergeICmpsPass());
-    addPass(ExpandMemCmpPass(&TM));
-  }
 
   // Run GC lowering passes for builtin collectors
   // TODO: add a pass insertion point here
diff --git a/llvm/include/llvm/CodeGen/MachinePassRegistry.def b/llvm/include/llvm/CodeGen/MachinePassRegistry.def
index f950dfae7e338b..3c00668aae3897 100644
--- a/llvm/include/llvm/CodeGen/MachinePassRegistry.def
+++ b/llvm/include/llvm/CodeGen/MachinePassRegistry.def
@@ -47,7 +47,6 @@ FUNCTION_PASS("dwarf-eh-prepare", DwarfEHPreparePass, (TM))
 FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false))
 FUNCTION_PASS("expand-large-div-rem", ExpandLargeDivRemPass, (TM))
 FUNCTION_PASS("expand-large-fp-convert", ExpandLargeFpConvertPass, (TM))
-FUNCTION_PASS("expand-memcmp", ExpandMemCmpPass, (TM))
 FUNCTION_PASS("expand-reductions", ExpandReductionsPass, ())
 FUNCTION_PASS("expandvp", ExpandVectorPredicationPass, ())
 FUNCTION_PASS("indirectbr-expand", IndirectBrExpandPass, (TM))
@@ -55,7 +54,6 @@ FUNCTION_PASS("interleaved-access", InterleavedAccessPass, (TM))
 FUNCTION_PASS("interleaved-load-combine", InterleavedLoadCombinePass, (TM))
 FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass, ())
 FUNCTION_PASS("lowerinvoke", LowerInvokePass, ())
-FUNCTION_PASS("mergeicmps", MergeICmpsPass, ())
 FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ())
 FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass, (true))
 FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ())
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index ca9fbb1def7624..e5ed5f15f62ed7 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -519,8 +519,6 @@ namespace llvm {
   // Expands large div/rem instructions.
   FunctionPass *createExpandLargeFpConvertPass();
 
-  // This pass expands memcmp() to load/stores.
-  FunctionPass *createExpandMemCmpLegacyPass();
 
   /// Creates Break False Dependencies pass. \see BreakFalseDeps.cpp
   FunctionPass *createBreakFalseDeps();
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 46b1e95c3c15f3..b0ca9fa942cda3 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -103,7 +103,6 @@ void initializeEdgeBundlesPass(PassRegistry&);
 void initializeEHContGuardCatchretPass(PassRegistry &);
 void initializeExpandLargeFpConvertLegacyPassPass(PassRegistry&);
 void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
-void initializeExpandMemCmpLegacyPassPass(PassRegistry &);
 void initializeExpandPostRAPass(PassRegistry&);
 void initializeExpandReductionsPass(PassRegistry&);
 void initializeExpandVectorPredicationPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 7a21876e565a7c..9aff428fbe938b 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -119,7 +119,6 @@ namespace {
       (void) llvm::createPostDomTree();
       (void) llvm::createMergeICmpsLegacyPass();
       (void) llvm::createExpandLargeDivRemPass();
-      (void)llvm::createExpandMemCmpLegacyPass();
       (void) llvm::createExpandVectorPredicationPass();
       std::string buf;
       llvm::raw_string_ostream os(buf);
diff --git a/llvm/include/llvm/CodeGen/ExpandMemCmp.h b/llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
similarity index 83%
rename from llvm/include/llvm/CodeGen/ExpandMemCmp.h
rename to llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
index 94a877854f327a..94ba0cf9305040 100644
--- a/llvm/include/llvm/CodeGen/ExpandMemCmp.h
+++ b/llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h
@@ -6,8 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
-#ifndef LLVM_CODEGEN_EXPANDMEMCMP_H
-#define LLVM_CODEGEN_EXPANDMEMCMP_H
+#ifndef LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
+#define LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
 
 #include "llvm/IR/PassManager.h"
 
@@ -26,4 +26,4 @@ class ExpandMemCmpPass : public PassInfoMixin<ExpandMemCmpPass> {
 
 } // namespace llvm
 
-#endif // LLVM_CODEGEN_EXPANDMEMCMP_H
+#endif // LLVM_TRANSFORMS_SCALAR_EXPANDMEMCMP_H
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index df2d1831ee5fdb..518432e9a7b32f 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -71,7 +71,6 @@ add_llvm_component_library(LLVMCodeGen
   ExecutionDomainFix.cpp
   ExpandLargeDivRem.cpp
   ExpandLargeFpConvert.cpp
-  ExpandMemCmp.cpp
   ExpandPostRAPseudos.cpp
   ExpandReductions.cpp
   ExpandVectorPredication.cpp
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 7b73a7b11ddf1c..043fa4e6eabe8f 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -41,7 +41,6 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializeEarlyTailDuplicatePass(Registry);
   initializeExpandLargeDivRemLegacyPassPass(Registry);
   initializeExpandLargeFpConvertLegacyPassPass(Registry);
-  initializeExpandMemCmpLegacyPassPass(Registry);
   initializeExpandPostRAPass(Registry);
   initializeFEntryInserterPass(Registry);
   initializeFinalizeISelPass(Registry);
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index 4003a08a5422dd..33562e90e94426 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -108,9 +108,6 @@ static cl::opt<bool> EnableImplicitNullChecks(
     "enable-implicit-null-checks",
     cl::desc("Fold null checks into faulting memory operations"),
     cl::init(false), cl::Hidden);
-static cl::opt<bool> DisableMergeICmps("disable-mergeicmps",
-    cl::desc("Disable MergeICmps Pass"),
-    cl::init(false), cl::Hidden);
 static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,
     cl::desc("Print LLVM IR produced by the loop-reduce pass"));
 static cl::opt<bool>
@@ -487,7 +484,6 @@ CGPassBuilderOption llvm::getCGPassBuilderOption() {
   SET_BOOLEAN_OPTION(EnableImplicitNullChecks)
   SET_BOOLEAN_OPTION(EnableMachineOutliner)
   SET_BOOLEAN_OPTION(MISchedPostRA)
-  SET_BOOLEAN_OPTION(DisableMergeICmps)
   SET_BOOLEAN_OPTION(DisableLSR)
   SET_BOOLEAN_OPTION(DisableConstantHoisting)
   SET_BOOLEAN_OPTION(DisableCGP)
@@ -872,13 +868,6 @@ void TargetPassConfig::addIRPasses() {
                                         "\n\n*** Code after LSR ***\n"));
     }
 
-    // The MergeICmpsPass tries to create memcmp calls by grouping sequences of
-    // loads and compares. ExpandMemCmpPass then tries to expand those calls
-    // into optimally-sized loads and compares. The transforms are enabled by a
-    // target lowering hook.
-    if (!DisableMergeICmps)
-      addPass(createMergeICmpsLegacyPass());
-    addPass(createExpandMemCmpLegacyPass());
   }
 
   // Run GC lowering passes for builtin collectors
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 439f749bda8bb7..20448554756aca 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -76,7 +76,6 @@
 #include "llvm/CodeGen/DwarfEHPrepare.h"
 #include "llvm/CodeGen/ExpandLargeDivRem.h"
 #include "llvm/CodeGen/ExpandLargeFpConvert.h"
-#include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/HardwareLoops.h"
 #include "llvm/CodeGen/IndirectBrExpand.h"
@@ -181,6 +180,7 @@
 #include "llvm/Transforms/Scalar/DeadStoreElimination.h"
 #include "llvm/Transforms/Scalar/DivRemPairs.h"
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/Transforms/Scalar/FlattenCFG.h"
 #include "llvm/Transforms/Scalar/Float2Int.h"
 #include "llvm/Transforms/Scalar/GVN.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 5c6c391049a7b2..e2dd413f12d696 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -86,6 +86,7 @@
 #include "llvm/Transforms/Scalar/DeadStoreElimination.h"
 #include "llvm/Transforms/Scalar/DivRemPairs.h"
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/Transforms/Scalar/Float2Int.h"
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/IndVarSimplify.h"
@@ -111,6 +112,7 @@
 #include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
 #include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"
 #include "llvm/Transforms/Scalar/MemCpyOptimizer.h"
+#include "llvm/Transforms/Scalar/MergeICmps.h"
 #include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"
 #include "llvm/Transforms/Scalar/NewGVN.h"
 #include "llvm/Transforms/Scalar/Reassociate.h"
@@ -386,6 +388,8 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   if (AreStatisticsEnabled())
     FPM.addPass(CountVisitsPass());
 
+  FPM.addPass(MergeICmpsPass());
+  FPM.addPass(ExpandMemCmpPass(TM));
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -532,6 +536,8 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   if (AreStatisticsEnabled())
     FPM.addPass(CountVisitsPass());
 
+  FPM.addPass(MergeICmpsPass());
+  FPM.addPass(ExpandMemCmpPass(TM));
   // Form SSA out of local memory accesses after breaking apart aggregates into
   // scalars.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 82ce040c649626..31adbf1942b410 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -353,6 +353,7 @@ FUNCTION_PASS("mem2reg", PromotePass())
 FUNCTION_PASS("memcpyopt", MemCpyOptPass())
 FUNCTION_PASS("memprof", MemProfilerPass())
 FUNCTION_PASS("mergeicmps", MergeICmpsPass())
+FUNCTION_PASS("expand-memcmp", ExpandMemCmpPass(TM))
 FUNCTION_PASS("mergereturn", UnifyFunctionExitNodesPass())
 FUNCTION_PASS("move-auto-init", MoveAutoInitPass())
 FUNCTION_PASS("nary-reassociate", NaryReassociatePass())
@@ -415,7 +416,7 @@ FUNCTION_PASS("structurizecfg", StructurizeCFGPass())
 FUNCTION_PASS("tailcallelim", TailCallElimPass())
 FUNCTION_PASS("tlshoist", TLSVariableHoistPass())
 FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
-FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())  
+FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
 FUNCTION_PASS("tsan", ThreadSanitizerPass())
 FUNCTION_PASS("typepromotion", TypePromotionPass(TM))
 FUNCTION_PASS("unify-loop-exits", UnifyLoopExitsPass())
diff --git a/llvm/lib/Transforms/Scalar/CMakeLists.txt b/llvm/lib/Transforms/Scalar/CMakeLists.txt
index 2dd27037a17de7..f6e666dd071256 100644
--- a/llvm/lib/Transforms/Scalar/CMakeLists.txt
+++ b/llvm/lib/Transforms/Scalar/CMakeLists.txt
@@ -11,6 +11,7 @@ add_llvm_component_library(LLVMScalarOpts
   DeadStoreElimination.cpp
   DFAJumpThreading.cpp
   DivRemPairs.cpp
+  ExpandMemCmp.cpp
   EarlyCSE.cpp
   FlattenCFGPass.cpp
   Float2Int.cpp
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
similarity index 90%
rename from llvm/lib/CodeGen/ExpandMemCmp.cpp
rename to llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
index bb84813569f4d5..973875ee142978 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
@@ -11,21 +11,22 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "llvm/CodeGen/ExpandMemCmp.h"
+#include "llvm/Transforms/Scalar/ExpandMemCmp.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/LazyBlockFrequencyInfo.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
-#include "llvm/CodeGen/TargetPassConfig.h"
-#include "llvm/CodeGen/TargetSubtargetInfo.h"
+#include "llvm/IR/Constants.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/InitializePasses.h"
+#include "llvm/Support/Debug.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
@@ -35,9 +36,6 @@
 using namespace llvm;
 using namespace llvm::PatternMatch;
 
-namespace llvm {
-class TargetLowering;
-}
 
 #define DEBUG_TYPE "expand-memcmp"
 
@@ -305,6 +303,7 @@ unsigned MemCmpExpansion::getNumBlocks() {
 }
 
 void MemCmpExpansion::createLoadCmpBlocks() {
+  assert(ResBlock.BB && "ResBlock must be created before LoadCmpBlocks");
   for (unsigned i = 0; i < getNumBlocks(); i++) {
     BasicBlock *BB = BasicBlock::Create(CI->getContext(), "loadbb",
                                         EndBlock->getParent(), EndBlock);
@@ -313,6 +312,7 @@ void MemCmpExpansion::createLoadCmpBlocks() {
 }
 
 void MemCmpExpansion::createResultBlock() {
+  assert(EndBlock && "EndBlock must be created before ResultBlock");
   ResBlock.BB = BasicBlock::Create(CI->getContext(), "res_block",
                                    EndBlock->getParent(), EndBlock);
 }
@@ -828,9 +828,9 @@ Value *MemCmpExpansion::getMemCmpExpansion() {
 ///  %phi.res = phi i32 [ %48, %loadbb3 ], [ %11, %res_block ]
 ///  ret i32 %phi.res
 static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
-                         const TargetLowering *TLI, const DataLayout *DL,
-                         ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI,
-                         DomTreeUpdater *DTU, const bool IsBCmp) {
+                         const DataLayout *DL, ProfileSummaryInfo *PSI,
+                         BlockFrequencyInfo *BFI, DomTreeUpdater *DTU,
+                         const bool IsBCmp) {
   NumMemCmpCalls++;
 
   // Early exit from expansion if -Oz.
@@ -845,9 +845,7 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   }
   const uint64_t SizeVal = SizeCast->getZExtValue();
 
-  if (SizeVal == 0) {
-    return false;
-  }
+
   // TTI call to check if target would like to expand memcmp. Also, get the
   // available load sizes.
   const bool IsUsedForZeroCmp =
@@ -857,28 +855,33 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   auto Options = TTI->enableMemCmpExpansion(OptForSize,
                                             IsUsedForZeroCmp);
   if (!Options) return false;
+  Value *Res = nullptr;
 
-  if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
-    Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
-
-  if (OptForSize &&
-      MaxLoadsPerMemcmpOptSize.getNumOccurrences())
-    Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
+  if (SizeVal == 0) {
+    Res = ConstantInt::get(CI->getFunctionType()->getReturnType(), 0);
+  } else {
+    if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
+      Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
 
-  if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
-    Options.MaxNumLoads = MaxLoadsPerMemcmp;
+    if (OptForSize &&
+        MaxLoadsPerMemcmpOptSize.getNumOccurrences())
+      Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
 
-  MemCmpExpansion Expansion(CI, SizeVal, Options, IsUsedForZeroCmp, *DL, DTU);
+    if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
+      Options.MaxNumLoads = MaxLoadsPerMemcmp;
 
-  // Don't expand if this will require more loads than desired by the target.
-  if (Expansion.getNumLoads() == 0) {
-    NumMemCmpGreaterThanMax++;
-    return false;
-  }
+    MemCmpExpansion Expansion(CI, SizeVal, Options, IsUsedForZeroCmp, *DL, DTU);
 
-  NumMemCmpInlined++;
+    // Don't expand if this will require more loads than desired by the target.
+    if (Expansion.getNumLoads() == 0) {
+      NumMemCmpGreaterThanMax++;
+      return false;
+    }
 
-  if (Value *Res = Expansion.getMemCmpExpansion()) {
+    NumMemCmpInlined++;
+    Res = Expansion.getMemCmpExpansion();
+  }
+  if (Res) {
     // Replace call with result of expansion and erase call.
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
@@ -889,62 +892,18 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
 
 // Returns true if a change was made.
 static bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
-                       const TargetTransformInfo *TTI, const TargetLowering *TL,
+                       const TargetTransformInfo *TTI,
                        const DataLayout &DL, ProfileSummaryInfo *PSI,
                        BlockFrequencyInfo *BFI, DomTreeUpdater *DTU);
 
 static PreservedAnalyses runImpl(Function &F, const TargetLibraryInfo *TLI,
                                  const TargetTransformInfo *TTI,
-                                 const TargetLowering *TL,
                                  ProfileSummaryInfo *PSI,
                                  BlockFrequencyInfo *BFI, DominatorTree *DT);
 
-class ExpandMemCmpLegacyPass : public FunctionPass {
-public:
-  static char ID;
-
-  ExpandMemCmpLegacyPass() : FunctionPass(ID) {
-    initializeExpandMemCmpLegacyPassPass(*PassRegistry::getPassRegistry());
-  }
-
-  bool runOnFunction(Function &F) override {
-    if (skipFunction(F)) return false;
-
-    auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
-    if (!TPC) {
-      return false;
-    }
-    const TargetLowering* TL =
-        TPC->getTM<TargetMachine>().getSubtargetImpl(F)->getTargetLowering();
-
-    const TargetLibraryInfo *TLI =
-        &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
-    const TargetTransformInfo *TTI =
-        &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
-    auto *PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
-    auto *BFI = (PSI && PSI->hasProfileSummary()) ?
-           &getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :
-           nullptr;
-    DominatorTree *DT = nullptr;
-    if (auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>())
-      DT = &DTWP->getDomTree();
-    auto PA = runImpl(F, TLI, TTI, TL, PSI, BFI, DT);
-    return !PA.areAllPreserved();
-  }
-
-private:
-  void getAnalysisUsage(AnalysisUsage &AU) const override {
-    AU.addRequired<TargetLibraryInfoWrapperPass>();
-    AU.addRequired<TargetTransformInfoWrapperPass>();
-    AU.addRequired<ProfileSummaryInfoWrapperPass>();
-    AU.addPreserved<DominatorTreeWrapperPass>();
-    LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);
-    FunctionPass::getAnalysisUsage(AU);
-  }
-};
 
 bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
-                const TargetTransformInfo *TTI, const TargetLowering *TL,
+                const TargetTransformInfo *TTI,
                 const DataLayout &DL, ProfileSummaryInfo *PSI,
                 BlockFrequencyInfo *BFI, DomTreeUpdater *DTU) {
   for (Instruction &I : BB) {
@@ -955,7 +914,7 @@ bool runOnBlock(BasicBlock &BB, const TargetLibraryInfo *TLI,
     LibFunc Func;
     if (TLI->getLibFunc(*CI, Func) &&
         (Func == LibFunc_memcmp || Func == LibFunc_bcmp) &&
-        expandMemCmp(CI, TTI, TL, &DL, PS...
[truncated]

@github-actions

github-actions Bot commented Jan 8, 2024

Copy link
Copy Markdown

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/include/llvm/CodeGen/Passes.h llvm/include/llvm/InitializePasses.h llvm/include/llvm/LinkAllPasses.h llvm/include/llvm/Passes/CodeGenPassBuilder.h llvm/include/llvm/Target/CGPassBuilderOption.h llvm/include/llvm/Transforms/Scalar.h llvm/lib/CodeGen/CodeGen.cpp llvm/lib/CodeGen/TargetPassConfig.cpp llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassBuilderPipelines.cpp llvm/lib/Transforms/Scalar/MergeICmps.cpp llvm/lib/Transforms/Scalar/Scalar.cpp llvm/tools/opt/optdriver.cpp llvm/include/llvm/Transforms/Scalar/ExpandMemCmp.h llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
index 1e1976c7e..94dc1225f 100644
--- a/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
+++ b/llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
@@ -55,7 +55,6 @@ static cl::opt<unsigned> MaxLoadsPerMemcmpOptSize(
 
 namespace {
 
-
 // This class provides helper functions to expand a memcmp library call into an
 // inline expansion.
 class MemCmpExpansion {
@@ -85,8 +84,7 @@ class MemCmpExpansion {
   // 1x1-byte load, which would be represented as [{16, 0}, {16, 16}, {1, 32}.
   struct LoadEntry {
     LoadEntry(unsigned LoadSize, uint64_t Offset)
-        : LoadSize(LoadSize), Offset(Offset) {
-    }
+        : LoadSize(LoadSize), Offset(Offset) {}
 
     // The size of the load for this block, in bytes.
     unsigned LoadSize;
@@ -716,7 +714,8 @@ Value *MemCmpExpansion::getMemCmpExpansion() {
     // calculate which source was larger. The calculation requires the
     // two loaded source values of each load compare block.
     // These will be saved in the phi nodes created by setupResultBlockPHINodes.
-    if (!IsUsedForZeroCmp) setupResultBlockPHINodes();
+    if (!IsUsedForZeroCmp)
+      setupResultBlockPHINodes();
 
     // Create the number of required load compare basic blocks.
     createLoadCmpBlocks();
@@ -845,15 +844,14 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   const bool IsUsedForZeroCmp =
       IsBCmp || isOnlyUsedInZeroEqualityComparison(CI);
   bool OptForSize = llvm::shouldOptimizeForSize(CI->getParent(), PSI, BFI);
-  auto Options = TTI->enableMemCmpExpansion(OptForSize,
-                                            IsUsedForZeroCmp);
-  if (!Options) return false;
+  auto Options = TTI->enableMemCmpExpansion(OptForSize, IsUsedForZeroCmp);
+  if (!Options)
+    return false;
 
   if (MemCmpEqZeroNumLoadsPerBlock.getNumOccurrences())
     Options.NumLoadsPerBlock = MemCmpEqZeroNumLoadsPerBlock;
 
-  if (OptForSize &&
-      MaxLoadsPerMemcmpOptSize.getNumOccurrences())
+  if (OptForSize && MaxLoadsPerMemcmpOptSize.getNumOccurrences())
     Options.MaxNumLoads = MaxLoadsPerMemcmpOptSize;
 
   if (!OptForSize && MaxLoadsPerMemcmp.getNumOccurrences())
@@ -916,7 +914,7 @@ PreservedAnalyses runImpl(Function &F, const TargetLibraryInfo *TLI,
   if (DT)
     DTU.emplace(DT, DomTreeUpdater::UpdateStrategy::Lazy);
 
-  const DataLayout& DL = F.getDataLayout();
+  const DataLayout &DL = F.getDataLayout();
   bool MadeChanges = false;
   for (auto BBIt = F.begin(); BBIt != F.end();) {
     if (runOnBlock(*BBIt, TLI, TTI, DL, PSI, BFI, DTU ? &*DTU : nullptr)) {

@efriedma-quic

Copy link
Copy Markdown
Contributor

From the previous review:

I'm going to abandon it for now. This interferes with the sanitizers in hard to fix ways, and I don't have the bandwith to come back to it. sorry.

Have you addressed this somehow?

@gbaraldi

gbaraldi commented Jan 9, 2024

Copy link
Copy Markdown
Contributor Author

So what confused me about that concern was that we don't care about this for memcpy for example, where if instcombine decides to do the inline expansion, it just happens, asan/msan notwidthstanding.
I might be missing something about how the sanitizers work, and maybe we want to keep the call not expanded to make codegen easier, but I also don't see how the sanitizers couldn't reason over some loads/stores/icmps

But if not legal we can probably use the sanitize_* attribute as a sentinel to skip the pass/es

@efriedma-quic

Copy link
Copy Markdown
Contributor

I don't really know what the issue was, exactly. The review doesn't say, and I don't have any further information. We may want to disable memcmp expansion when msan is enabled, to allow the specialized diagnostics to trigger.

I am a little concerned about the semantics of uninitialized memory... currently, we assume that memcmp stops comparing after the first non-equal byte. And the current expansion doesn't try to freeze the loaded values. (See LibCallSimplifier::optimizeStrCmp .)

@arsenm arsenm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised by the volume of test changes

Comment thread llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll Outdated
@legrosbuffle

Copy link
Copy Markdown
Contributor

I don't really know what the issue was, exactly. The review doesn't say, and I don't have any further information.

I think I remember that one of these issues was that sanitizers would insert instrumentation around calls to memcmp and around loads and stores. When building in sanitizer mode, the last pass of the pipeline would insert nobuiltin on memcmp calls, so that they are not expanded during codegen and the sanitizer runtime can intercept the calls.

When ExpandMemCmp is moved higher up in the pipeline, it turns memcmp into loads and stores, which prevents the sanitizers from understanding that the semantic is memcmp. So the sanitizers sometimes insert a ton of instrumentation for expanded memcmp calls, which is both inefficient and prevents the runtime from doing as good a job as when a call is present (arguably this is an issue with the sanitizer itself, but unfortunately this is how things are).

There was also the issue mentionned by +emerson:

We also detected some compile time regressions in the CTMark subset of the test suite, with lencod regressing by around 4-5%. I haven't fully bisected but the commit list was short and this seemed to be the only suspicious change.

@gbaraldi

gbaraldi commented Jan 9, 2024

Copy link
Copy Markdown
Contributor Author

I'm surprised by the volume of test changes

I just copied/ported all the tests we had that tested memcmp expansion, which apparently was quite a bit. Though I guess there’s now quite a bit of repetition.

@gbaraldi

gbaraldi commented Jan 9, 2024

Copy link
Copy Markdown
Contributor Author

What is the "recommended" way of running the benchmarks, I ran check-all but this is my first time doing a "large" contribution

@nikic

nikic commented Jan 9, 2024

Copy link
Copy Markdown
Contributor

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=a776740d6296520b8bde156aa3f8d9ecb32cddd9&to=d36e561ef86107c9bd0413b1d3d5fb0285ed33f7&stat=instructions:u

Notably there are regressions in stage2-O3 (but not stage1-O3), which usually indicates an optimization quality regression that affects clang itself.

@nikic

nikic commented Jan 9, 2024

Copy link
Copy Markdown
Contributor

One question is where to put this in the pipeline. The inline expansions probably benefit the most from the passes we run early, but we might want to wait a bit so that we can try and prove a constant argument to the memcmp (though I imagine if they are going to be constant, it's directly from the source).

I'd probably recommend to start conservatively and add the passes at the end of the module optimization pipeline, before the last SimplifyCFG invocation (in the vicinity of ExpandDivRem). This won't address your motivating case, but it will allow us to iron out issues from moving these passes to the middle-end, while still mostly keeping their current pipeline position.

Once they are in the middle-end pipeline, it will be easier to evaluate different positions.

@gbaraldi

Copy link
Copy Markdown
Contributor Author

@nikic could you rerun the compile time benchmarks?

@nikic

nikic commented Jan 18, 2024

Copy link
Copy Markdown
Contributor

@gbaraldi New compile-time numbers: https://llvm-compile-time-tracker.com/compare.php?from=ecd47811b755d13357085bcd7519a66d6c4d8e5c&to=f582b874c35e2b90046da8e8c30a7fa30ba08167&stat=instructions:u (They are mildly positive now, and there are some text size reductions.)

@gbaraldi

Copy link
Copy Markdown
Contributor Author

That's encouraging at least.

@gbaraldi

Copy link
Copy Markdown
Contributor Author

What is needed for merging this? (Do the compiler-rt/sanitizer tests run in CI)? Apparently they used to be an issue with this change.

@gbaraldi gbaraldi force-pushed the gb/middle-end-memcmp branch from 150e6b0 to 58d3c12 Compare January 26, 2024 13:48
@gbaraldi

Copy link
Copy Markdown
Contributor Author

Ok, it seems CI is finally a bit happier, not sure what was going on. @nikic what do you think is needed to get this in?

@gbaraldi

Copy link
Copy Markdown
Contributor Author

Gentle bump

@gbaraldi

Copy link
Copy Markdown
Contributor Author

@nikic I asked the robot to rebase this and to not move the tests for now, not sure how we want to keep this. Is having tests checking assembly in the Transforms directory fine?

@github-actions

github-actions Bot commented Mar 19, 2026

Copy link
Copy Markdown

🪟 Windows x64 Test Results

  • 133002 tests passed
  • 3048 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions Bot commented Mar 19, 2026

Copy link
Copy Markdown

🐧 Linux x64 Test Results

  • 193025 tests passed
  • 4980 tests skipped

✅ The build succeeded and all tests passed.

@nikic nikic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are some test failures.

@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s
; RUN: opt -passes=expand-memcmp -mtriple=wasm32-unknown-unknown -mattr=+simd128 -S < %s | llc -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; RUN: opt -passes=expand-memcmp -mtriple=wasm32-unknown-unknown -mattr=+simd128 -S < %s | llc -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s
; RUN: opt -passes=expand-memcmp -mtriple=wasm32-unknown-unknown -mattr=+simd128 -S < %s | llc -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s

And similar for other tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still open.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs -O2 so should be PhaseOrdering test.

Though this also looks redundant with llvm/test/Transforms/PhaseOrdering/X86/expand-memcmp-middle-end.ll.

Comment thread llvm/test/Transforms/PhaseOrdering/X86/expand-memcmp-middle-end.ll
@nikic

nikic commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

@nikic I asked the robot to rebase this and to not move the tests for now, not sure how we want to keep this. Is having tests checking assembly in the Transforms directory fine?

It's unusual but possible (e.g. we do this for LoopStrengthReduce sometimes).

Though I think keeping the current place of those tests is fine.

@@ -95,6 +95,7 @@
#include "llvm/Transforms/Scalar/DFAJumpThreading.h"
#include "llvm/Transforms/Scalar/DeadStoreElimination.h"
#include "llvm/Transforms/Scalar/DivRemPairs.h"
#include "llvm/CodeGen/ExpandMemCmp.h"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't think we should be including a CodeGen pass here. We should probably move ExpandMemCmp to Transforms first.

Comment thread llvm/lib/Passes/PassBuilderPipelines.cpp Outdated

@nikic nikic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks basically fine to me, but there are test failures.

@@ -5,21 +5,14 @@
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This header is deprecated. Use llvm/Transforms/Scalar/ExpandMemCmp.h instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this, you can delete the header.

addFunctionPass(MergeICmpsPass(), PMW);
addFunctionPass(ExpandMemCmpPass(), PMW);
// MergeICmps and ExpandMemCmp now run in the middle-end optimization
// pipeline (see PassBuilderPipelines.cpp). They are no longer needed here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this comment.

Comment thread llvm/lib/CodeGen/TargetPassConfig.cpp Outdated
addPass(createMergeICmpsLegacyPass());
addPass(createExpandMemCmpLegacyPass());
// MergeICmps and ExpandMemCmp now run in the middle-end optimization
// pipeline (see PassBuilderPipelines.cpp). They are no longer needed here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I don't think we need this comment.

Comment thread llvm/lib/Passes/PassBuilder.cpp Outdated
@@ -91,7 +91,7 @@
#include "llvm/CodeGen/EarlyIfConversion.h"
#include "llvm/CodeGen/EdgeBundles.h"
#include "llvm/CodeGen/ExpandIRInsts.h"
#include "llvm/CodeGen/ExpandMemCmp.h"
#include "llvm/Transforms/Scalar/ExpandMemCmp.h"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this to the right place.

@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s
; RUN: opt -passes=expand-memcmp -mtriple=wasm32-unknown-unknown -mattr=+simd128 -S < %s | llc -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still open.

Comment thread llvm/test/Transforms/ExpandMemCmp/X86/sanitizer-skip.ll
declare i32 @memcmp(ptr nocapture, ptr nocapture, i64)

; Two memcmp calls with a shared first argument. After expansion in the
; middle-end, further IR optimizations can optimize the expanded code.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a FIXME here, because this doesn't actually happen right now. (This will require moving the pipeline position on a followup.)

…imization pipeline

This should allow for optimizations like saving repeated loads between
memcmp calls. The passes are added at the end of the module optimization
pipeline, before the last SimplifyCFG invocation.

Also removes the MergeICmpsLegacyPass (now dead code since only the new
PM path is used), the deprecated CodeGen/ExpandMemCmp.h forwarding
header, and the DisableMergeICmps codegen option.

When sanitizer attributes (sanitize_address, sanitize_memory,
sanitize_thread, sanitize_hwaddress) are present, ExpandMemCmp skips
expansion so that sanitizer interceptors can catch memory errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gbaraldi gbaraldi force-pushed the gb/middle-end-memcmp branch from f2772a6 to 20f814a Compare April 1, 2026 18:49

@nikic nikic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nikic nikic merged commit 5e0a06b into llvm:main Apr 2, 2026
10 of 11 checks passed
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Apr 4, 2026
llvm: Fix array ABI test to not check equality implementation

LLVM has moved memcmp expansion in the pipeline, resulting in the bcmp call being expanded into loads and register comparisons, which breaks the test.

See llvm/llvm-project#77370

Based on history, I believe the test actually intended validate that these arrays were being passed as pointer arguments, which can be done more directly.

r? @durin42
cc @scottmcm (author of the test being modified)

@rustbot label llvm-main
rust-timer added a commit to rust-lang/rust that referenced this pull request Apr 4, 2026
Rollup merge of #154731 - maurer:array-abi-test, r=scottmcm

llvm: Fix array ABI test to not check equality implementation

LLVM has moved memcmp expansion in the pipeline, resulting in the bcmp call being expanded into loads and register comparisons, which breaks the test.

See llvm/llvm-project#77370

Based on history, I believe the test actually intended validate that these arrays were being passed as pointer arguments, which can be done more directly.

r? @durin42
cc @scottmcm (author of the test being modified)

@rustbot label llvm-main
zwu-2025 pushed a commit to zwu-2025/llvm-project that referenced this pull request May 17, 2026
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.

This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
trdthg added a commit to buddy-compiler/buddy-mlir that referenced this pull request May 29, 2026
## Version and Build Flow

- fetch branch / apply local patch -> fetch exact submodule commit with
  `git fetch origin "${LLVM_COMMIT}"`
- `LLVM_COMMIT + riscv-jitlink.patch hash` -> `LLVM_COMMIT` cache key
- apply `riscv-jitlink.patch` -> no patch; current LLVM already has the fix
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: `nanobind` /
  `nanobind==2.4.*` -> `nanobind>=2.9,<3.0` / `nanobind==2.9.*`

## Python Bindings

- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PybindAdaptors.h` -> `NanobindAdaptors.h`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PYBIND11_MODULE` -> `NB_MODULE`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: manual
  pybind/nanobind CMake setup -> `mlir_configure_python_dev_packages()`

## MLIR API

- [#162167](llvm/llvm-project#162167) / ea291d0e8c93:
  `vector::SplatOp` -> `vector::BroadcastOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::InsertElementOp` -> `vector::InsertOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::ExtractElementOp` -> `vector::ExtractOp`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `ValueRange{std::nullopt}` -> `ValueRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `TypeRange(std::nullopt)` -> `TypeRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c: empty
  `scf.yield` with null operands -> empty `scf.yield`
- [#196082](llvm/llvm-project#196082) / dd57b0c9728a:
  `computeConstantBound(..., /*closedUB=*/true)` ->
  `computeConstantBound(..., ValueBoundsOptions{/*closedUB=*/true})`
- [#182715](llvm/llvm-project#182715) / 72f5050ae765:
  `applyPatternsAndFoldGreedily(...)` -> `applyPatternsGreedily(...)`

## GPU and AMX

- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `launchOp.getWorkgroupAttributions()` ->
  `launchOp.getWorkgroupAttributionBBArgs()`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea: manual
  `gpu.kernel` attr -> `outlinedFunc.setKernel(true)`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `setAttr(getKnown*AttrName(), ...)` ->
  `setKnownBlockSizeAttr(...)` / `setKnownGridSizeAttr(...)`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir/Dialect/AMX/AMXDialect.h` -> `mlir/Dialect/X86/X86Dialect.h`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir::amx::AMXDialect` -> `mlir::x86::X86Dialect`

## LLVM Source Lists

- [#167271](llvm/llvm-project#167271) / 2345b7d98f75:
  `InlineSizeEstimatorAnalysis.cpp` -> removed upstream; no Buddy source-list
  entry needed
- [#172680](llvm/llvm-project#172680) / 71760f324ff9:
  `ExpandLargeDivRem.cpp` -> merged into `ExpandFp.cpp`
- [#172681](llvm/llvm-project#172681) / 5c05824d2bd3:
  `ExpandFp.cpp` / merged `ExpandLargeDivRem.cpp` -> `ExpandIRInsts.cpp`
- [#77370](llvm/llvm-project#77370) / 5e0a06b34d09:
  `ExpandMemCmp.cpp` in CodeGen -> moved to `Transforms/Scalar`
- [#181547](llvm/llvm-project#181547) / 9a0d65cdfd0d:
  `CallBrPrepare.cpp` -> `InlineAsmPrepare.cpp`
- [#160454](llvm/llvm-project#160454) / aa6a33ae6556:
  `EVLIndVarSimplify.cpp` -> removed upstream; no Buddy source-list entry needed
- [#192635](llvm/llvm-project#192635) / 680a9908194e:
  `VPlanSLP.cpp` -> removed upstream; no Buddy source-list entry needed

## RISC-V Backend

- local Buddy `M0`-`M7` definitions -> upstream RISC-V `M0`-`M7` definitions
- Buddy `MatrixReg` using local mask regs -> `MatrixReg` using upstream mask regs
- IME `*N` instructions in disassembler tables -> mark affected instructions
  `isCodeGenOnly = 1`

## Tool API and Link Libraries

- [#156715](llvm/llvm-project#156715) / dfbd76bda01e:
  `Expected<std::unique_ptr<ToolOutputFile>>` from
  `setupLLVMOptimizationRemarks` -> `Expected<LLVMRemarkFileHandle>`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, F)` ->
  `codegen::setFunctionAttributes(F, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, *M)` ->
  `codegen::setFunctionAttributes(*M, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#87226](llvm/llvm-project#87226) / d0e72ccc54df:
  `TargetMachine::buildCodeGenPipeline(MPM, OS, DwoOut, FileType, Opt, PIC)` ->
  pass `MAM` and `MMI.getContext()`
- [#170846](llvm/llvm-project#170846) / cd806d7e7689: `buddy-llc`
  without `Plugins` -> link LLVM `Plugins`
- [#150805](llvm/llvm-project#150805) / ace42cf063a5,
  [#151150](llvm/llvm-project#151150) / e68a20e0b762: implicit MLIR
  register-all symbols -> link `MLIRRegisterAllDialects`,
  `MLIRRegisterAllExtensions`, `MLIRRegisterAllPasses`
trdthg added a commit to buddy-compiler/buddy-mlir that referenced this pull request May 29, 2026
- fetch branch / apply local patch -> fetch exact submodule commit with
  `git fetch origin "${LLVM_COMMIT}"`
- `LLVM_COMMIT + riscv-jitlink.patch hash` -> `LLVM_COMMIT` cache key
- apply `riscv-jitlink.patch` -> no patch; current LLVM already has the fix
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: `nanobind` /
  `nanobind==2.4.*` -> `nanobind>=2.9,<3.0` / `nanobind==2.9.*`

- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PybindAdaptors.h` -> `NanobindAdaptors.h`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PYBIND11_MODULE` -> `NB_MODULE`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: manual
  pybind/nanobind CMake setup -> `mlir_configure_python_dev_packages()`

- [#162167](llvm/llvm-project#162167) / ea291d0e8c93:
  `vector::SplatOp` -> `vector::BroadcastOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::InsertElementOp` -> `vector::InsertOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::ExtractElementOp` -> `vector::ExtractOp`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `ValueRange{std::nullopt}` -> `ValueRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `TypeRange(std::nullopt)` -> `TypeRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c: empty
  `scf.yield` with null operands -> empty `scf.yield`
- [#196082](llvm/llvm-project#196082) / dd57b0c9728a:
  `computeConstantBound(..., /*closedUB=*/true)` ->
  `computeConstantBound(..., ValueBoundsOptions{/*closedUB=*/true})`
- [#182715](llvm/llvm-project#182715) / 72f5050ae765:
  `applyPatternsAndFoldGreedily(...)` -> `applyPatternsGreedily(...)`

- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `launchOp.getWorkgroupAttributions()` ->
  `launchOp.getWorkgroupAttributionBBArgs()`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea: manual
  `gpu.kernel` attr -> `outlinedFunc.setKernel(true)`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `setAttr(getKnown*AttrName(), ...)` ->
  `setKnownBlockSizeAttr(...)` / `setKnownGridSizeAttr(...)`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir/Dialect/AMX/AMXDialect.h` -> `mlir/Dialect/X86/X86Dialect.h`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir::amx::AMXDialect` -> `mlir::x86::X86Dialect`

- [#167271](llvm/llvm-project#167271) / 2345b7d98f75:
  `InlineSizeEstimatorAnalysis.cpp` -> removed upstream; no Buddy source-list
  entry needed
- [#172680](llvm/llvm-project#172680) / 71760f324ff9:
  `ExpandLargeDivRem.cpp` -> merged into `ExpandFp.cpp`
- [#172681](llvm/llvm-project#172681) / 5c05824d2bd3:
  `ExpandFp.cpp` / merged `ExpandLargeDivRem.cpp` -> `ExpandIRInsts.cpp`
- [#77370](llvm/llvm-project#77370) / 5e0a06b34d09:
  `ExpandMemCmp.cpp` in CodeGen -> moved to `Transforms/Scalar`
- [#181547](llvm/llvm-project#181547) / 9a0d65cdfd0d:
  `CallBrPrepare.cpp` -> `InlineAsmPrepare.cpp`
- [#160454](llvm/llvm-project#160454) / aa6a33ae6556:
  `EVLIndVarSimplify.cpp` -> removed upstream; no Buddy source-list entry needed
- [#192635](llvm/llvm-project#192635) / 680a9908194e:
  `VPlanSLP.cpp` -> removed upstream; no Buddy source-list entry needed

- local Buddy `M0`-`M7` definitions -> upstream RISC-V `M0`-`M7` definitions
- Buddy `MatrixReg` using local mask regs -> `MatrixReg` using upstream mask regs
- IME `*N` instructions in disassembler tables -> mark affected instructions
  `isCodeGenOnly = 1`

- [#156715](llvm/llvm-project#156715) / dfbd76bda01e:
  `Expected<std::unique_ptr<ToolOutputFile>>` from
  `setupLLVMOptimizationRemarks` -> `Expected<LLVMRemarkFileHandle>`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, F)` ->
  `codegen::setFunctionAttributes(F, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, *M)` ->
  `codegen::setFunctionAttributes(*M, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#87226](llvm/llvm-project#87226) / d0e72ccc54df:
  `TargetMachine::buildCodeGenPipeline(MPM, OS, DwoOut, FileType, Opt, PIC)` ->
  pass `MAM` and `MMI.getContext()`
- [#170846](llvm/llvm-project#170846) / cd806d7e7689: `buddy-llc`
  without `Plugins` -> link LLVM `Plugins`
- [#150805](llvm/llvm-project#150805) / ace42cf063a5,
  [#151150](llvm/llvm-project#151150) / e68a20e0b762: implicit MLIR
  register-all symbols -> link `MLIRRegisterAllDialects`,
  `MLIRRegisterAllExtensions`, `MLIRRegisterAllPasses`
trdthg added a commit to buddy-compiler/buddy-mlir that referenced this pull request May 29, 2026
- fetch branch / apply local patch -> fetch exact submodule commit with
  `git fetch origin "${LLVM_COMMIT}"`
- `LLVM_COMMIT + riscv-jitlink.patch hash` -> `LLVM_COMMIT` cache key
- apply `riscv-jitlink.patch` -> no patch; current LLVM already has the fix
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: `nanobind` /
  `nanobind==2.4.*` -> `nanobind>=2.9,<3.0` / `nanobind==2.9.*`

- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PybindAdaptors.h` -> `NanobindAdaptors.h`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PYBIND11_MODULE` -> `NB_MODULE`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: manual
  pybind/nanobind CMake setup -> `mlir_configure_python_dev_packages()`

- [#162167](llvm/llvm-project#162167) / ea291d0e8c93:
  `vector::SplatOp` -> `vector::BroadcastOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::InsertElementOp` -> `vector::InsertOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::ExtractElementOp` -> `vector::ExtractOp`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `ValueRange{std::nullopt}` -> `ValueRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `TypeRange(std::nullopt)` -> `TypeRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c: empty
  `scf.yield` with null operands -> empty `scf.yield`
- [#196082](llvm/llvm-project#196082) / dd57b0c9728a:
  `computeConstantBound(..., /*closedUB=*/true)` ->
  `computeConstantBound(..., ValueBoundsOptions{/*closedUB=*/true})`
- [#182715](llvm/llvm-project#182715) / 72f5050ae765:
  `applyPatternsAndFoldGreedily(...)` -> `applyPatternsGreedily(...)`

- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `launchOp.getWorkgroupAttributions()` ->
  `launchOp.getWorkgroupAttributionBBArgs()`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea: manual
  `gpu.kernel` attr -> `outlinedFunc.setKernel(true)`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `setAttr(getKnown*AttrName(), ...)` ->
  `setKnownBlockSizeAttr(...)` / `setKnownGridSizeAttr(...)`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir/Dialect/AMX/AMXDialect.h` -> `mlir/Dialect/X86/X86Dialect.h`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir::amx::AMXDialect` -> `mlir::x86::X86Dialect`

- [#167271](llvm/llvm-project#167271) / 2345b7d98f75:
  `InlineSizeEstimatorAnalysis.cpp` -> removed upstream; no Buddy source-list
  entry needed
- [#172680](llvm/llvm-project#172680) / 71760f324ff9:
  `ExpandLargeDivRem.cpp` -> merged into `ExpandFp.cpp`
- [#172681](llvm/llvm-project#172681) / 5c05824d2bd3:
  `ExpandFp.cpp` / merged `ExpandLargeDivRem.cpp` -> `ExpandIRInsts.cpp`
- [#77370](llvm/llvm-project#77370) / 5e0a06b34d09:
  `ExpandMemCmp.cpp` in CodeGen -> moved to `Transforms/Scalar`
- [#181547](llvm/llvm-project#181547) / 9a0d65cdfd0d:
  `CallBrPrepare.cpp` -> `InlineAsmPrepare.cpp`
- [#160454](llvm/llvm-project#160454) / aa6a33ae6556:
  `EVLIndVarSimplify.cpp` -> removed upstream; no Buddy source-list entry needed
- [#192635](llvm/llvm-project#192635) / 680a9908194e:
  `VPlanSLP.cpp` -> removed upstream; no Buddy source-list entry needed

- local Buddy `M0`-`M7` definitions -> upstream RISC-V `M0`-`M7` definitions
- Buddy `MatrixReg` using local mask regs -> `MatrixReg` using upstream mask regs
- IME `*N` instructions in disassembler tables -> mark affected instructions
  `isCodeGenOnly = 1`

- [#156715](llvm/llvm-project#156715) / dfbd76bda01e:
  `Expected<std::unique_ptr<ToolOutputFile>>` from
  `setupLLVMOptimizationRemarks` -> `Expected<LLVMRemarkFileHandle>`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, F)` ->
  `codegen::setFunctionAttributes(F, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, *M)` ->
  `codegen::setFunctionAttributes(*M, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#87226](llvm/llvm-project#87226) / d0e72ccc54df:
  `TargetMachine::buildCodeGenPipeline(MPM, OS, DwoOut, FileType, Opt, PIC)` ->
  pass `MAM` and `MMI.getContext()`
- [#170846](llvm/llvm-project#170846) / cd806d7e7689: `buddy-llc`
  without `Plugins` -> link LLVM `Plugins`
- [#150805](llvm/llvm-project#150805) / ace42cf063a5,
  [#151150](llvm/llvm-project#151150) / e68a20e0b762: implicit MLIR
  register-all symbols -> link `MLIRRegisterAllDialects`,
  `MLIRRegisterAllExtensions`, `MLIRRegisterAllPasses`
trdthg added a commit to buddy-compiler/buddy-mlir that referenced this pull request Jun 1, 2026
- fetch branch / apply local patch -> fetch exact submodule commit with
  `git fetch origin "${LLVM_COMMIT}"`
- `LLVM_COMMIT + riscv-jitlink.patch hash` -> `LLVM_COMMIT` cache key
- apply `riscv-jitlink.patch` -> no patch; current LLVM already has the fix
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: `nanobind` /
  `nanobind==2.4.*` -> `nanobind>=2.9,<3.0` / `nanobind==2.9.*`

- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PybindAdaptors.h` -> `NanobindAdaptors.h`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97:
  `PYBIND11_MODULE` -> `NB_MODULE`
- [#172581](llvm/llvm-project#172581) / 3d7018c70b97: manual
  pybind/nanobind CMake setup -> `mlir_configure_python_dev_packages()`

- [#162167](llvm/llvm-project#162167) / ea291d0e8c93:
  `vector::SplatOp` -> `vector::BroadcastOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::InsertElementOp` -> `vector::InsertOp`
- [#149603](llvm/llvm-project#149603) / 33465bb2bb75:
  `vector::ExtractElementOp` -> `vector::ExtractOp`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `ValueRange{std::nullopt}` -> `ValueRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c:
  `TypeRange(std::nullopt)` -> `TypeRange{}`
- [#145445](llvm/llvm-project#145445) / 63f30d7d820c: empty
  `scf.yield` with null operands -> empty `scf.yield`
- [#196082](llvm/llvm-project#196082) / dd57b0c9728a:
  `computeConstantBound(..., /*closedUB=*/true)` ->
  `computeConstantBound(..., ValueBoundsOptions{/*closedUB=*/true})`
- [#182715](llvm/llvm-project#182715) / 72f5050ae765:
  `applyPatternsAndFoldGreedily(...)` -> `applyPatternsGreedily(...)`

- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `launchOp.getWorkgroupAttributions()` ->
  `launchOp.getWorkgroupAttributionBBArgs()`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea: manual
  `gpu.kernel` attr -> `outlinedFunc.setKernel(true)`
- [#188905](llvm/llvm-project#188905) / c1cff89bdcea:
  `setAttr(getKnown*AttrName(), ...)` ->
  `setKnownBlockSizeAttr(...)` / `setKnownGridSizeAttr(...)`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir/Dialect/AMX/AMXDialect.h` -> `mlir/Dialect/X86/X86Dialect.h`
- [#111197](llvm/llvm-project#111197) / 2f743ac52e94:
  `mlir::amx::AMXDialect` -> `mlir::x86::X86Dialect`

- [#167271](llvm/llvm-project#167271) / 2345b7d98f75:
  `InlineSizeEstimatorAnalysis.cpp` -> removed upstream; no Buddy source-list
  entry needed
- [#172680](llvm/llvm-project#172680) / 71760f324ff9:
  `ExpandLargeDivRem.cpp` -> merged into `ExpandFp.cpp`
- [#172681](llvm/llvm-project#172681) / 5c05824d2bd3:
  `ExpandFp.cpp` / merged `ExpandLargeDivRem.cpp` -> `ExpandIRInsts.cpp`
- [#77370](llvm/llvm-project#77370) / 5e0a06b34d09:
  `ExpandMemCmp.cpp` in CodeGen -> moved to `Transforms/Scalar`
- [#181547](llvm/llvm-project#181547) / 9a0d65cdfd0d:
  `CallBrPrepare.cpp` -> `InlineAsmPrepare.cpp`
- [#160454](llvm/llvm-project#160454) / aa6a33ae6556:
  `EVLIndVarSimplify.cpp` -> removed upstream; no Buddy source-list entry needed
- [#192635](llvm/llvm-project#192635) / 680a9908194e:
  `VPlanSLP.cpp` -> removed upstream; no Buddy source-list entry needed

- local Buddy `M0`-`M7` definitions -> upstream RISC-V `M0`-`M7` definitions
- Buddy `MatrixReg` using local mask regs -> `MatrixReg` using upstream mask regs
- IME `*N` instructions in disassembler tables -> mark affected instructions
  `isCodeGenOnly = 1`

- [#156715](llvm/llvm-project#156715) / dfbd76bda01e:
  `Expected<std::unique_ptr<ToolOutputFile>>` from
  `setupLLVMOptimizationRemarks` -> `Expected<LLVMRemarkFileHandle>`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, F)` ->
  `codegen::setFunctionAttributes(F, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#186998](llvm/llvm-project#186998) / 69cd746bd2f1:
  `codegen::setFunctionAttributes(CPUStr, FeaturesStr, *M)` ->
  `codegen::setFunctionAttributes(*M, CPUStr, FeaturesStr[, TuneCPUStr])`
- [#87226](llvm/llvm-project#87226) / d0e72ccc54df:
  `TargetMachine::buildCodeGenPipeline(MPM, OS, DwoOut, FileType, Opt, PIC)` ->
  pass `MAM` and `MMI.getContext()`
- [#170846](llvm/llvm-project#170846) / cd806d7e7689: `buddy-llc`
  without `Plugins` -> link LLVM `Plugins`
- [#150805](llvm/llvm-project#150805) / ace42cf063a5,
  [#151150](llvm/llvm-project#151150) / e68a20e0b762: implicit MLIR
  register-all symbols -> link `MLIRRegisterAllDialects`,
  `MLIRRegisterAllExtensions`, `MLIRRegisterAllPasses`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants