Add more optimizations for (https://github.com/dotnet/runtime/issues/61412) #74806
EgorBo merged 7 commits into dotnet:main
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
|
using System.Runtime.CompilerServices;

public class Issue61412
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool Equal0(int x) => (x & 1) == 0;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool Equal1(int x) => (x & 1) == 1;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool NotEqual1(int x) => (x & 1) != 1;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool NotEqual0(int x) => (x & 1) != 0;
}
; Assembly listing for method Issue61412:Equal0(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M14579_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M14579_IG02: ;; offset=0000H
8BC1 mov eax, ecx
F7D0 not eax
83E001 and eax, 1
;; size=7 bbWeight=1 PerfScore 0.75
G_M14579_IG03: ;; offset=0007H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 8, prolog size 0, PerfScore 2.55, instruction count 4, allocated bytes for code 8 (MethodHash=70dec70c) for method Issue61412:Equal0(int):bool
; ============================================================
; Assembly listing for method Issue61412:Equal1(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M54258_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M54258_IG02: ;; offset=0000H
8BC1 mov eax, ecx
83E001 and eax, 1
;; size=5 bbWeight=1 PerfScore 0.50
G_M54258_IG03: ;; offset=0005H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 6, prolog size 0, PerfScore 2.10, instruction count 3, allocated bytes for code 6 (MethodHash=b52d2c0d) for method Issue61412:Equal1(int):bool
; ============================================================
; Assembly listing for method Issue61412:NotEqual1(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M3143_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M3143_IG02: ;; offset=0000H
8BC1 mov eax, ecx
F7D0 not eax
83E001 and eax, 1
;; size=7 bbWeight=1 PerfScore 0.75
G_M3143_IG03: ;; offset=0007H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 8, prolog size 0, PerfScore 2.55, instruction count 4, allocated bytes for code 8 (MethodHash=b798f3b8) for method Issue61412:NotEqual1(int):bool
; ============================================================
; Assembly listing for method Issue61412:NotEqual0(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M35142_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M35142_IG02: ;; offset=0000H
8BC1 mov eax, ecx
83E001 and eax, 1
;; size=5 bbWeight=1 PerfScore 0.50
G_M35142_IG03: ;; offset=0005H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 6, prolog size 0, PerfScore 2.10, instruction count 3, allocated bytes for code 6 (MethodHash=94c276b9) for method Issue61412:NotEqual0(int):bool
; ============================================================
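For anyone who wants to sanity-check the listings above, here is a minimal C++ sketch (not part of the PR) that mirrors the four test methods and the two instruction sequences the JIT emitted for them. The helper names `AndOne`, `NotAndOne`, `Agrees` and `AgreesOnSamples` are made up for illustration, and the check covers only a handful of sample values. It assumes C++14 or later for the `constexpr` loop.

```cpp
#include <climits>

// Source-level forms of the four C# test methods, mirrored in C++.
constexpr bool Equal0(int x)    { return (x & 1) == 0; }
constexpr bool Equal1(int x)    { return (x & 1) == 1; }
constexpr bool NotEqual1(int x) { return (x & 1) != 1; }
constexpr bool NotEqual0(int x) { return (x & 1) != 0; }

// Hypothetical helpers modelling the two instruction sequences in the listings.
constexpr int AndOne(int x)    { return x & 1; }    // mov eax, ecx / and eax, 1
constexpr int NotAndOne(int x) { return (~x) & 1; } // mov eax, ecx / not eax / and eax, 1

// True when, for this x, each sequence yields the same 0/1 value
// as the comparison it replaced.
constexpr bool Agrees(int x)
{
    return static_cast<int>(Equal0(x))    == NotAndOne(x)
        && static_cast<int>(Equal1(x))    == AndOne(x)
        && static_cast<int>(NotEqual1(x)) == NotAndOne(x)
        && static_cast<int>(NotEqual0(x)) == AndOne(x);
}

// Checks a few sample inputs, including negative values and the int extremes.
constexpr bool AgreesOnSamples()
{
    const int samples[] = {INT_MIN, -3, -2, -1, 0, 1, 2, 3, INT_MAX};
    for (int x : samples)
    {
        if (!Agrees(x))
        {
            return false;
        }
    }
    return true;
}

// Evaluated at compile time: the build fails if any sample disagrees.
static_assert(AgreesOnSamples(), "emitted sequences should match the source comparisons");
```

The `static_assert` makes the check part of compilation, so no runtime harness is needed to see that the `not`/`and` pair matches the `== 0` and `!= 1` cases on these inputs.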
|
Can someone review this please?
EgorBo left a comment:
LGTM. The optimization seems to be quite conservative about the surrounding code, but it was like that before your changes.
|
Any idea why we don't see arm64 diffs? Is this already handled there?
|
Thanks! I guess because of this check? Can't say for sure.
GenTree* Lowering::OptimizeConstCompare(GenTree* cmp)
{
    assert(cmp->gtGetOp2()->IsIntegralConst());
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
    GenTree*       op1      = cmp->gtGetOp1();
    GenTreeIntCon* op2      = cmp->gtGetOp2()->AsIntCon();
    ssize_t        op2Value = op2->IconValue();
#ifdef TARGET_ARM64 // <---
    // Do not optimise further if op1 has a contained chain.
    if (op1->OperIs(GT_AND) &&
        (op1->gtGetOp1()->isContainedAndNotIntOrIImmed() || op1->gtGetOp2()->isContainedAndNotIntOrIImmed()))
    {
        return cmp;
    }
#endif
    ///...
}
|
@En3Tho oh, interesting. If you want, you can remove that ifdef so we can see SPMI diffs on CI as part of this PR.
|
@EgorBo Sure. Let's see what will break :D |
|
One of the failures is #76041. I'm not sure what those "Push work item to Helix" failures mean. Is that purely a CI problem? Also, should SPMI for arm be triggered manually? Am I just missing the arm results, or are there none? UPD: arm has regressed, so I'm reverting that check.
|
@En3Tho thanks! |
Closes #61412
Enhances #73120: (X & 1) == 0 is now transformed to ((NOT X) & 1), in addition to the existing transformation of (X & 1) != 0 to (X & 1).
The == 1 and != 1 cases are supported too, since #73120 transforms them into comparisons with 0.
Please correct me as I'm a newbie.
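To make the two-step rule in the description concrete, here is a rough C++ sketch of it as a standalone function. This is not the JIT's code: `CmpKind`, `LowBitTest`, `RewriteLowBitCompare` and `Evaluate` are hypothetical names, and the real lowering works on `GenTree` nodes with extra checks that are not modelled here. It only assumes what the description states: comparisons with 1 are first turned into comparisons with 0, then `!= 0` becomes `(X & 1)` and `== 0` becomes `((NOT X) & 1)`.

```cpp
// Hypothetical model of the compare operator applied to (X & 1).
enum class CmpKind { Equal, NotEqual };

// Hypothetical description of the rewritten expression:
// the result is (invert ? ~x : x) & 1. `valid` is false when no rewrite applies.
struct LowBitTest
{
    bool valid;
    bool invert;
};

constexpr LowBitTest RewriteLowBitCompare(CmpKind kind, int constant)
{
    // Only comparisons of (X & 1) with 0 or 1 are covered by this sketch.
    if (constant != 0 && constant != 1)
    {
        return {false, false};
    }

    // Step 1: fold a comparison with 1 into the opposite comparison with 0
    // (the description attributes this step to #73120).
    if (constant == 1)
    {
        kind = (kind == CmpKind::Equal) ? CmpKind::NotEqual : CmpKind::Equal;
    }

    // Step 2: "!= 0" is just (X & 1); "== 0" needs a NOT first.
    return {true, kind == CmpKind::Equal};
}

// Evaluates the rewritten form for a concrete x.
constexpr int Evaluate(LowBitTest t, int x)
{
    return (t.invert ? ~x : x) & 1;
}

// The four cases from the test class map onto two shapes.
static_assert(RewriteLowBitCompare(CmpKind::Equal, 0).invert, "== 0 needs NOT");
static_assert(!RewriteLowBitCompare(CmpKind::Equal, 1).invert, "== 1 is a plain AND");
static_assert(RewriteLowBitCompare(CmpKind::NotEqual, 1).invert, "!= 1 needs NOT");
static_assert(!RewriteLowBitCompare(CmpKind::NotEqual, 0).invert, "!= 0 is a plain AND");
```

For example, `Evaluate(RewriteLowBitCompare(CmpKind::Equal, 0), 4)` computes `(~4) & 1`, which is the 0/1 value of `(4 & 1) == 0`.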