ARM64 SVE: mask conversion not always optimised away #108241

Description

@a74nh

With DOTNET_TieredCompilation=0, the following method is compiled to the assembly listed after it:

      [MethodImpl(MethodImplOptions.NoInlining)]
      public static unsafe int foo(ref int* src, int length)
      {
          Vector<int> total = new Vector<int>(0);
          Vector<int> pred = (Vector<int>)Sve.CreateWhileLessThanMask32Bit(0, length);
          Vector<int> vec = Sve.LoadVector(pred, src);
          total = Sve.ConditionalSelect(pred, Sve.Add(total, vec), total);
          return (int)Sve.AddAcross(total).ToScalar();
      }

G_M25815_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x20]!
            mov     fp, sp
            stp     xzr, xzr, [fp, #0x10]	// [V02 loc0], [V02 loc0+0x08]
						;; size=12 bbWeight=1 PerfScore 2.50
G_M25815_IG02:  ;; offset=0x000C
            str     xzr, [fp, #0x10]
            str     xzr, [fp, #0x18]
            mov     w2, wzr
            whilelt p0.s, w2, w1
            mov     z16.s, p0/z, #1
            ptrue   p0.s
            cmpne   p0.s, p0/z, z16.s, #0
            ldr     q16, [fp, #0x10]	// [V02 loc0]
            ldr     x0, [x0]
            ld1w    { z17.s }, p0/z, [x0]
            ldr     q18, [fp, #0x10]	// [V02 loc0]
            add     z16.s, z16.s, z17.s
            sel     z16.s, p0, z16.s, z18.s
            str     q16, [fp, #0x10]	// [V02 loc0]
            ldr     q16, [fp, #0x10]	// [V02 loc0]
            ptrue   p0.s
            saddv   d16, p0, z16.s
            umov    x0, v16.d[0]
						;; size=72 bbWeight=1 PerfScore 39.50
G_M25815_IG03:  ;; offset=0x0054
            ldp     fp, lr, [sp], #0x20
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00

These three instructions are not required. They convert mask -> vector -> mask, which leaves the predicate unchanged:

            mov     z16.s, p0/z, #1
            ptrue   p0.s
            cmpne   p0.s, p0/z, z16.s, #0
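
To illustrate why the round trip is redundant, here is a minimal scalar sketch that models the three instructions on plain arrays. It does not use the Sve API or the JIT. The class name MaskRoundTrip, the helper names, and the lane count of 4 (a 128-bit vector of 32-bit elements) are assumptions made for this illustration only.

```csharp
using System;
using System.Linq;

public static class MaskRoundTrip
{
    // Scalar model of `whilelt p.s, w2, w1` with w2 = 0:
    // lane i is active while i < length.
    public static bool[] WhileLessThan(int lanes, int length) =>
        Enumerable.Range(0, lanes).Select(i => i < length).ToArray();

    // Scalar model of `mov z.s, p/z, #1`:
    // active lanes become 1, inactive lanes are zeroed.
    public static int[] MaskToVector(bool[] mask) =>
        mask.Select(active => active ? 1 : 0).ToArray();

    // Scalar model of `ptrue p.s` followed by `cmpne p.s, p/z, z.s, #0`:
    // with an all-true governing predicate, a lane is active iff its element is non-zero.
    public static bool[] VectorToMask(int[] vector) =>
        vector.Select(element => element != 0).ToArray();

    public static void Main()
    {
        const int lanes = 4; // assumed lane count for this sketch

        for (int length = 0; length <= lanes + 1; length++)
        {
            bool[] mask = WhileLessThan(lanes, length);
            bool[] roundTripped = VectorToMask(MaskToVector(mask));

            // The converted-and-restored mask matches the original for every length.
            Console.WriteLine($"length={length}: unchanged={mask.SequenceEqual(roundTripped)}");
        }
    }
}
```

In this model the mask that comes out of the conversion is always identical to the one that went in, so the result of whilelt could in principle be used directly as the predicate for ld1w and sel.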

I suspect this is because pred has two uses: one in Sve.ConditionalSelect and one in Sve.LoadVector.

Labels

Priority:1 (Work that is critical for the release, but we could probably ship without)
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
arm-sve (Work related to arm64 SVE/SVE2 support)
in-pr (There is an active PR which will close this issue when it is merged)