[PowerPC] Fix i128 vcmpequb optimization for loads with range metadata and small constants by amy-kwan · Pull Request #196801 · llvm/llvm-project

amy-kwan · 2026-05-10T13:27:37Z

The combine introduced in 55aff64 lowers scalar i128 compares into vector compares by reissuing the original loads as v16i8 loads. However, the combine was reusing the original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that range using i128 values. Reusing this MMO for a v16i8 load is incorrect as range metadata is only valid for integer scalar types and its bitwidth must match the memory VT.

This patch fixes the problem by creating a new MachineMemOperand, without the range metadata, for the vector load. Additionally, it restricts the combine for constant operands to avoid cases that are better handled by scalar lowering: small constants (those that fit within 16 bits) are excluded to prevent generating suboptimal vector compares.
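The constant restriction can be pictured with a small standalone model. This is only a sketch: it does not use LLVM's `APInt`, and the `U128` struct and both helper functions are hypothetical names introduced for illustration. `fitsIn16Bits` mirrors the `Val.ult(1ULL << 16)` check in the diff shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit unsigned constant (not LLVM's APInt).
struct U128 {
  std::uint64_t hi; // upper 64 bits
  std::uint64_t lo; // lower 64 bits
};

// Mirrors Val.ult(1ULL << 16): true when the value is below 2^16.
inline bool fitsIn16Bits(const U128 &v) {
  return v.hi == 0 && v.lo < (1ULL << 16);
}

// Hypothetical helper: a constant operand stays eligible for the vector
// compare only when it does not fit in 16 bits.
inline bool constantAllowsVectorCompare(const U128 &v) {
  return !fitsIn16Bits(v);
}
```

Under this model, 65535 is left to scalar lowering while 65536 remains eligible for the vector compare, which matches the boundary exercised by `test3` and `test4` in the new test file.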


llvmorg-github-actions · 2026-05-10T13:28:12Z

@llvm/pr-subscribers-backend-powerpc

Author: Amy Kwan (amy-kwan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/196801.diff

2 Files Affected:

(modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+21-6)
(added) llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll (+210)

diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index e959100d713dd..a5fc479292717 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -15797,8 +15797,14 @@ static bool canConvertToVcmpequb(SDValue &LHS, SDValue &RHS) {
     if (Operand.getValueType() != MVT::i128)
       return false;
 
-    if (Operand.getOpcode() == ISD::Constant)
-      return true;
+    if (Operand.getOpcode() == ISD::Constant) {
+      auto *C = cast<ConstantSDNode>(Operand);
+      const APInt &Val = C->getAPIntValue();
+      if (Val.ult(1ULL << 16))
+        return false;
+      else
+        return true;
+    }
 
     auto *LoadNode = dyn_cast<LoadSDNode>(Operand);
     if (!LoadNode)
@@ -15849,10 +15855,19 @@ SDValue convertTwoLoadsAndCmpToVCMPEQUB(SelectionDAG &DAG, SDNode *N,
     assert(Operand.getOpcode() == ISD::LOAD && "Must be LoadSDNode here.");
 
     auto *LoadNode = cast<LoadSDNode>(Operand);
-    SDValue NewLoad =
-        DAG.getLoad(MVT::v16i8, DL, LoadNode->getChain(),
-                    LoadNode->getBasePtr(), LoadNode->getMemOperand());
-    DAG.ReplaceAllUsesOfValueWith(Operand.getValue(1), NewLoad.getValue(1));
+    // Create a new MachineMemOperand without range metadata.
+    // Range metadata is only valid for integer scalar types, not vectors.
+    // The original i128 load may have range metadata, but when we convert
+    // to v16i8, that metadata is no longer semantically valid.
+    MachineMemOperand *MMO = LoadNode->getMemOperand();
+    MachineFunction &MF = DAG.getMachineFunction();
+    MachineMemOperand *NewMMO = MF.getMachineMemOperand(
+        MMO->getPointerInfo(), MMO->getFlags(), MMO->getSize(),
+        MMO->getAlign(), MMO->getAAInfo(), nullptr, MMO->getSyncScopeID(),
+        MMO->getSuccessOrdering(), MMO->getFailureOrdering());
+    SDValue NewLoad = DAG.getLoad(MVT::v16i8, DL, LoadNode->getChain(),
+                                   LoadNode->getBasePtr(), NewMMO);
+    DAG.ReplaceAllUsesOfValueWith(SDValue(LoadNode, 1), NewLoad.getValue(1));
     return NewLoad;
   };
 
diff --git a/llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll b/llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll
new file mode 100644
index 0000000000000..29b4c076bcadf
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll
@@ -0,0 +1,210 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names -mtriple=powerpc64-ibm-aix < %s | \
+; RUN:   FileCheck %s --check-prefix=CHECK-AIX
+; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names -mtriple=powerpc64le-unknown-linux-gnu < %s | \
+; RUN:   FileCheck %s --check-prefix=CHECK-LINUX
+
+define i1 @test1() {
+; CHECK-AIX-LABEL: test1:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test1:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 0
+  ret i1 %icmp
+}
+
+define i1 @test2() {
+; CHECK-AIX-LABEL: test2:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    xori r4, r4, 10
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test2:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    xori r4, r4, 10
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 10
+  ret i1 %icmp
+}
+
+define i1 @test3() {
+; CHECK-AIX-LABEL: test3:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    xori r4, r4, 65535
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test3:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    xori r4, r4, 65535
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 65535
+  ret i1 %icmp
+}
+
+define i1 @test4() {
+; CHECK-AIX-LABEL: test4:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    li r3, 0
+; CHECK-AIX-NEXT:    lxvw4x vs34, 0, r3
+; CHECK-AIX-NEXT:    ld r3, L..C0(r2) # %const.0
+; CHECK-AIX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-AIX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-AIX-NEXT:    mfocrf r3, 2
+; CHECK-AIX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test4:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    li r3, 0
+; CHECK-LINUX-NEXT:    lxvd2x vs34, 0, r3
+; CHECK-LINUX-NEXT:    addis r3, r2, .LCPI3_0@toc@ha
+; CHECK-LINUX-NEXT:    addi r3, r3, .LCPI3_0@toc@l
+; CHECK-LINUX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-LINUX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-LINUX-NEXT:    mfocrf r3, 2
+; CHECK-LINUX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 65536
+  ret i1 %icmp
+}
+
+; Test using the !range metadata
+define i1 @test5() {
+; CHECK-AIX-LABEL: test5:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test5:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !0
+  %icmp = icmp eq i128 %load, 0
+  ret i1 %icmp
+}
+
+define i1 @test6() {
+; CHECK-AIX-LABEL: test6:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    xori r4, r4, 65535
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test6:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    xori r4, r4, 65535
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !1
+  %icmp = icmp eq i128 %load, 65535
+  ret i1 %icmp
+}
+
+define i1 @test7() {
+; CHECK-AIX-LABEL: test7:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    li r3, 0
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test7:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    li r3, 0
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !1
+  %icmp = icmp eq i128 %load, 65536
+  ret i1 %icmp
+}
+
+define i1 @test8() {
+; CHECK-AIX-LABEL: test8:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    li r3, 0
+; CHECK-AIX-NEXT:    lxvw4x vs34, 0, r3
+; CHECK-AIX-NEXT:    ld r3, L..C1(r2) # %const.0
+; CHECK-AIX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-AIX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-AIX-NEXT:    mfocrf r3, 2
+; CHECK-AIX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test8:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    li r3, 0
+; CHECK-LINUX-NEXT:    lxvd2x vs34, 0, r3
+; CHECK-LINUX-NEXT:    addis r3, r2, .LCPI7_0@toc@ha
+; CHECK-LINUX-NEXT:    addi r3, r3, .LCPI7_0@toc@l
+; CHECK-LINUX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-LINUX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-LINUX-NEXT:    mfocrf r3, 2
+; CHECK-LINUX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !2
+  %icmp = icmp eq i128 %load, 65536
+  ret i1 %icmp
+}
+
+!0 = !{i128 0, i128 2}
+!1 = !{i128 0, i128 65536}
+!2 = !{i128 0, i128 65537}
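
The `test7` output (`li r3, 0`) follows from how `!range` is read: LLVM treats `!{i128 0, i128 65536}` as the half-open interval [0, 65536), so the loaded value can never equal 65536 and the compare folds to false. A minimal sketch of that reasoning, assuming a non-wrapping range and using 64-bit values for brevity; `HalfOpenRange` and `equalityCanBeTrue` are hypothetical names, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a non-wrapping !range pair {Lo, Hi}: values in [Lo, Hi).
struct HalfOpenRange {
  std::uint64_t lo; // inclusive lower bound
  std::uint64_t hi; // exclusive upper bound
  bool contains(std::uint64_t v) const { return lo <= v && v < hi; }
};

// An equality compare against c can only be true if c lies inside the range.
inline bool equalityCanBeTrue(const HalfOpenRange &r, std::uint64_t c) {
  return r.contains(c);
}
```

With `!2 = !{i128 0, i128 65537}` the value 65536 is inside the range, so `test8` cannot be folded and still goes through the `vcmpequb.` path.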

github-actions · 2026-05-10T13:29:02Z

✅ With the latest revision this PR passed the C/C++ code formatter.

diggerlin · 2026-05-11T17:57:48Z

+    if (Operand.getOpcode() == ISD::Constant) {
+      auto *C = cast<ConstantSDNode>(Operand);
+      const APInt &Val = C->getAPIntValue();
+      if (Val.ult(1ULL << 16))


I suggest adding a comment here: "When comparing an i128 value loaded from memory against a constant, the comparison can be lowered to an xori/or sequence if the constant is less than 2¹⁶, since xori's immediate field is 16 bits wide."

Also, I guess that when the target is 32-bit, comparing with constant zero through convertTwoLoadsAndCmpToVCMPEQUB will take fewer instructions.

if (Val.ult(1ULL << 16))

-->

if (Val.ult(1ULL << 16) && DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout()).getSizeInBits() == 64 )
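
For reference, the scalar sequence that the 64-bit tests check (`xori`, `or`, `cntlzd`, `rldicl ... 58, 63`) can be modelled in plain C++. This is a rough sketch, not compiler code: both function names are hypothetical, and the model assumes the i128 value is already split into two 64-bit halves.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of a 64-bit value; returns 64 for zero, like cntlzd.
inline unsigned countLeadingZeros64(std::uint64_t x) {
  unsigned n = 0;
  for (std::uint64_t bit = 1ULL << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Hypothetical model of:
//   xori lo, lo, imm ; or t, lo, hi ; cntlzd t, t ; rldicl t, t, 58, 63
// imm is a 16-bit immediate, which is why the constant must be below 2^16.
inline bool scalarEqSmallConst(std::uint64_t hi, std::uint64_t lo,
                               std::uint16_t imm) {
  std::uint64_t merged = (lo ^ imm) | hi;    // zero only if the value == imm
  unsigned lz = countLeadingZeros64(merged); // 64 only if merged == 0
  return ((lz >> 6) & 1u) != 0;              // bit 6 of the count
}
```

The final step works because the leading-zero count is at most 64, so bit 6 is set only when all 64 bits of the merged value are zero.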

diggerlin

LGTM, but please address my comment first.

diggerlin · 2026-05-14T17:30:43Z

@@ -3,6 +3,8 @@
 ; RUN:   FileCheck %s --check-prefix=CHECK-AIX


change CHECK-AIX --> CHECK-AIX64

amy-kwan · 2026-05-17T13:09:27Z

Cherry-pick this into the release branch to resolve an assertion involving range metadata.

/cherry-pick 1907b58

llvmbot · 2026-05-17T13:16:48Z

/pull-request #198177

amy-kwan requested review from RolandF77, diggerlin, lei137, mandlebug and maryammo May 10, 2026 13:27

amy-kwan self-assigned this May 10, 2026

llvmorg-github-actions Bot added the backend:PowerPC label May 10, 2026

Apply clang-format.

5071ab3

diggerlin reviewed May 11, 2026

Comment thread llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll

Address review comments for 32-bit

dd1fd94

diggerlin approved these changes May 14, 2026

Update check prefix

095f753

amy-kwan merged commit 1907b58 into llvm:main May 17, 2026
10 checks passed

amy-kwan added this to the LLVM 22.x Release milestone May 17, 2026

github-project-automation Bot added this to LLVM Release Status May 17, 2026

github-project-automation Bot moved this to Needs Triage in LLVM Release Status May 17, 2026

github-project-automation Bot moved this from Needs Triage to Done in LLVM Release Status May 17, 2026

daltenty mentioned this pull request Jun 12, 2026

Update LLVM to 22.1.7 rust-lang/llvm-project#196

Merged

amy-kwan merged 4 commits into llvm:main from amy-kwan:amyk/i128-load-cmp
