Skip to content

[PowerPC] Fix i128 vcmpequb optimization for loads with range metadata and small constants#196801

Merged
amy-kwan merged 4 commits into
llvm:mainfrom
amy-kwan:amyk/i128-load-cmp
May 17, 2026
Merged

[PowerPC] Fix i128 vcmpequb optimization for loads with range metadata and small constants#196801
amy-kwan merged 4 commits into
llvm:mainfrom
amy-kwan:amyk/i128-load-cmp

Conversation

@amy-kwan

Copy link
Copy Markdown
Member

The combine introduced in 55aff64 lowers scalar i128 compares into vector compares by reissuing the original loads as v16i8 loads. However, the combine was reusing the original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that range using i128 values. Reusing this MMO for a v16i8 load is incorrect as range metadata is only valid for integer scalar types and its bitwidth must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the vector vector load. Additionally, we restrict the combine for constant operands to avoid cases that are better handled by scalar lowering. Small constants (fit within 16 bits) are excluded to prevent generating suboptimal vector compares.

…a and small constants

The combine introduced in 55aff64 lowers scalar
i128 compares into vector compares by reissuing the original
loads as v16i8 loads. However, the combine was reusing the original
MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that
range using i128 values. Reusing this MMO for a v16i8 load is incorrect as
range metadata is only valid for integer scalar types and its bitwidth
must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the vector
vector load. Additionally, we restrict the combine for constant operands
to avoid cases that are better handled by scalar lowering. Small constants
(fit within 16 bits) are excluded to prevent generating suboptimal vector compares.
@llvmorg-github-actions

Copy link
Copy Markdown

@llvm/pr-subscribers-backend-powerpc

Author: Amy Kwan (amy-kwan)

Changes

The combine introduced in 55aff64 lowers scalar i128 compares into vector compares by reissuing the original loads as v16i8 loads. However, the combine was reusing the original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that range using i128 values. Reusing this MMO for a v16i8 load is incorrect as range metadata is only valid for integer scalar types and its bitwidth must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the vector vector load. Additionally, we restrict the combine for constant operands to avoid cases that are better handled by scalar lowering. Small constants (fit within 16 bits) are excluded to prevent generating suboptimal vector compares.


Full diff: https://github.com/llvm/llvm-project/pull/196801.diff

2 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+21-6)
  • (added) llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll (+210)
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index e959100d713dd..a5fc479292717 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -15797,8 +15797,14 @@ static bool canConvertToVcmpequb(SDValue &LHS, SDValue &RHS) {
     if (Operand.getValueType() != MVT::i128)
       return false;
 
-    if (Operand.getOpcode() == ISD::Constant)
-      return true;
+    if (Operand.getOpcode() == ISD::Constant) {
+      auto *C = cast<ConstantSDNode>(Operand);
+      const APInt &Val = C->getAPIntValue();
+      if (Val.ult(1ULL << 16))
+        return false;
+      else
+        return true;
+    }
 
     auto *LoadNode = dyn_cast<LoadSDNode>(Operand);
     if (!LoadNode)
@@ -15849,10 +15855,19 @@ SDValue convertTwoLoadsAndCmpToVCMPEQUB(SelectionDAG &DAG, SDNode *N,
     assert(Operand.getOpcode() == ISD::LOAD && "Must be LoadSDNode here.");
 
     auto *LoadNode = cast<LoadSDNode>(Operand);
-    SDValue NewLoad =
-        DAG.getLoad(MVT::v16i8, DL, LoadNode->getChain(),
-                    LoadNode->getBasePtr(), LoadNode->getMemOperand());
-    DAG.ReplaceAllUsesOfValueWith(Operand.getValue(1), NewLoad.getValue(1));
+    // Create a new MachineMemOperand without range metadata.
+    // Range metadata is only valid for integer scalar types, not vectors.
+    // The original i128 load may have range metadata, but when we convert
+    // to v16i8, that metadata is no longer semantically valid.
+    MachineMemOperand *MMO = LoadNode->getMemOperand();
+    MachineFunction &MF = DAG.getMachineFunction();
+    MachineMemOperand *NewMMO = MF.getMachineMemOperand(
+        MMO->getPointerInfo(), MMO->getFlags(), MMO->getSize(),
+        MMO->getAlign(), MMO->getAAInfo(), nullptr, MMO->getSyncScopeID(),
+        MMO->getSuccessOrdering(), MMO->getFailureOrdering());
+    SDValue NewLoad = DAG.getLoad(MVT::v16i8, DL, LoadNode->getChain(),
+                                   LoadNode->getBasePtr(), NewMMO);
+    DAG.ReplaceAllUsesOfValueWith(SDValue(LoadNode, 1), NewLoad.getValue(1));
     return NewLoad;
   };
 
diff --git a/llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll b/llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll
new file mode 100644
index 0000000000000..29b4c076bcadf
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll
@@ -0,0 +1,210 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names -mtriple=powerpc64-ibm-aix < %s | \
+; RUN:   FileCheck %s --check-prefix=CHECK-AIX
+; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names -mtriple=powerpc64le-unknown-linux-gnu < %s | \
+; RUN:   FileCheck %s --check-prefix=CHECK-LINUX
+
+define i1 @test1() {
+; CHECK-AIX-LABEL: test1:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test1:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 0
+  ret i1 %icmp
+}
+
+define i1 @test2() {
+; CHECK-AIX-LABEL: test2:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    xori r4, r4, 10
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test2:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    xori r4, r4, 10
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 10
+  ret i1 %icmp
+}
+
+define i1 @test3() {
+; CHECK-AIX-LABEL: test3:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    xori r4, r4, 65535
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test3:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    xori r4, r4, 65535
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 65535
+  ret i1 %icmp
+}
+
+define i1 @test4() {
+; CHECK-AIX-LABEL: test4:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    li r3, 0
+; CHECK-AIX-NEXT:    lxvw4x vs34, 0, r3
+; CHECK-AIX-NEXT:    ld r3, L..C0(r2) # %const.0
+; CHECK-AIX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-AIX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-AIX-NEXT:    mfocrf r3, 2
+; CHECK-AIX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test4:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    li r3, 0
+; CHECK-LINUX-NEXT:    lxvd2x vs34, 0, r3
+; CHECK-LINUX-NEXT:    addis r3, r2, .LCPI3_0@toc@ha
+; CHECK-LINUX-NEXT:    addi r3, r3, .LCPI3_0@toc@l
+; CHECK-LINUX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-LINUX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-LINUX-NEXT:    mfocrf r3, 2
+; CHECK-LINUX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16
+  %icmp = icmp eq i128 %load, 65536
+  ret i1 %icmp
+}
+
+; Test using the !range metadata
+define i1 @test5() {
+; CHECK-AIX-LABEL: test5:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test5:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !0
+  %icmp = icmp eq i128 %load, 0
+  ret i1 %icmp
+}
+
+define i1 @test6() {
+; CHECK-AIX-LABEL: test6:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    ld r4, 8(0)
+; CHECK-AIX-NEXT:    ld r3, 0(0)
+; CHECK-AIX-NEXT:    xori r4, r4, 65535
+; CHECK-AIX-NEXT:    or r3, r4, r3
+; CHECK-AIX-NEXT:    cntlzd r3, r3
+; CHECK-AIX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test6:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    ld r4, 0(0)
+; CHECK-LINUX-NEXT:    ld r3, 8(0)
+; CHECK-LINUX-NEXT:    xori r4, r4, 65535
+; CHECK-LINUX-NEXT:    or r3, r4, r3
+; CHECK-LINUX-NEXT:    cntlzd r3, r3
+; CHECK-LINUX-NEXT:    rldicl r3, r3, 58, 63
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !1
+  %icmp = icmp eq i128 %load, 65535
+  ret i1 %icmp
+}
+
+define i1 @test7() {
+; CHECK-AIX-LABEL: test7:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    li r3, 0
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test7:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    li r3, 0
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !1
+  %icmp = icmp eq i128 %load, 65536
+  ret i1 %icmp
+}
+
+define i1 @test8() {
+; CHECK-AIX-LABEL: test8:
+; CHECK-AIX:       # %bb.0: # %bb
+; CHECK-AIX-NEXT:    li r3, 0
+; CHECK-AIX-NEXT:    lxvw4x vs34, 0, r3
+; CHECK-AIX-NEXT:    ld r3, L..C1(r2) # %const.0
+; CHECK-AIX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-AIX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-AIX-NEXT:    mfocrf r3, 2
+; CHECK-AIX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-AIX-NEXT:    blr
+;
+; CHECK-LINUX-LABEL: test8:
+; CHECK-LINUX:       # %bb.0: # %bb
+; CHECK-LINUX-NEXT:    li r3, 0
+; CHECK-LINUX-NEXT:    lxvd2x vs34, 0, r3
+; CHECK-LINUX-NEXT:    addis r3, r2, .LCPI7_0@toc@ha
+; CHECK-LINUX-NEXT:    addi r3, r3, .LCPI7_0@toc@l
+; CHECK-LINUX-NEXT:    lxvd2x vs35, 0, r3
+; CHECK-LINUX-NEXT:    vcmpequb. v2, v2, v3
+; CHECK-LINUX-NEXT:    mfocrf r3, 2
+; CHECK-LINUX-NEXT:    rlwinm r3, r3, 25, 31, 31
+; CHECK-LINUX-NEXT:    blr
+bb:
+  %load = load i128, ptr null, align 16, !range !2
+  %icmp = icmp eq i128 %load, 65536
+  ret i1 %icmp
+}
+
+!0 = !{i128 0, i128 2}
+!1 = !{i128 0, i128 65536}
+!2 = !{i128 0, i128 65537}

@github-actions

github-actions Bot commented May 10, 2026

Copy link
Copy Markdown

✅ With the latest revision this PR passed the C/C++ code formatter.

if (Operand.getOpcode() == ISD::Constant) {
auto *C = cast<ConstantSDNode>(Operand);
const APInt &Val = C->getAPIntValue();
if (Val.ult(1ULL << 16))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that add a comment here " When comparing an i128 value loaded from memory against a constant, the comparison can be lowered to xori or or if the constant is less than 2¹⁶, since xori's immediate field is 16 bits wide."

and I guess comparing with constant zero when target is 32bit. convertTwoLoadsAndCmpToVCMPEQUB will has less instructions.

if (Val.ult(1ULL << 16))

-->

if (Val.ult(1ULL << 16) && DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout()).getSizeInBits() == 64 )

Comment thread llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll

@diggerlin diggerlin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please address my comment first.

@@ -3,6 +3,8 @@
; RUN: FileCheck %s --check-prefix=CHECK-AIX

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change CHECK-AIX --> CHECK-AIX64

@amy-kwan amy-kwan merged commit 1907b58 into llvm:main May 17, 2026
10 checks passed
@amy-kwan amy-kwan added this to the LLVM 22.x Release milestone May 17, 2026
@github-project-automation github-project-automation Bot moved this to Needs Triage in LLVM Release Status May 17, 2026
@github-project-automation github-project-automation Bot moved this from Needs Triage to Done in LLVM Release Status May 17, 2026
@amy-kwan

Copy link
Copy Markdown
Member Author

Cherry-pick this into the release branch to resolve an assertion involving range metadata.

/cherry-pick 1907b58

@llvmbot

llvmbot commented May 17, 2026

Copy link
Copy Markdown
Member

/pull-request #198177

pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026
…a and small constants (llvm#196801)

The combine introduced in 55aff64
lowers scalar i128 compares into vector compares by reissuing the
original loads as v16i8 loads. However, the combine was reusing the
original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that
range using i128 values. Reusing this MMO for a v16i8 load is incorrect
as range metadata is only valid for integer scalar types and its
bitwidth must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the vector
vector load. Additionally, we restrict the combine for constant operands
to avoid cases that are better handled by scalar lowering. Small
constants (fit within 16 bits) are excluded to prevent generating
suboptimal vector compares.
c-rhodes pushed a commit to llvmbot/llvm-project that referenced this pull request Jun 1, 2026
…a and small constants (llvm#196801)

The combine introduced in 55aff64
lowers scalar i128 compares into vector compares by reissuing the
original loads as v16i8 loads. However, the combine was reusing the
original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that
range using i128 values. Reusing this MMO for a v16i8 load is incorrect
as range metadata is only valid for integer scalar types and its
bitwidth must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the vector
vector load. Additionally, we restrict the combine for constant operands
to avoid cases that are better handled by scalar lowering. Small
constants (fit within 16 bits) are excluded to prevent generating
suboptimal vector compares.

(cherry picked from commit 1907b58)
daunabomba pushed a commit to daunabomba/llvm-project that referenced this pull request Jun 2, 2026
…a and small constants (llvm#196801)

The combine introduced in 55aff64
lowers scalar i128 compares into vector compares by reissuing the
original loads as v16i8 loads. However, the combine was reusing the
original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that
range using i128 values. Reusing this MMO for a v16i8 load is incorrect
as range metadata is only valid for integer scalar types and its
bitwidth must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the vector
vector load. Additionally, we restrict the combine for constant operands
to avoid cases that are better handled by scalar lowering. Small
constants (fit within 16 bits) are excluded to prevent generating
suboptimal vector compares.

(cherry picked from commit 1907b58)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

3 participants