XARCH: Eliminate redundant mov to/from mem.#59780
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsOptimize redundant mov instructions like: IN002f: 000141 mov qword ptr [V08 rsp+78H], rax This commit will essentially mirror what we do for Register->Register movs but with slight modifications to customize it for mem<->reg mov.
An example code snippet to reproduce this
|
4ec86cc to
9914c52
Compare
|
@tannergooding Can you please add the right person to review this change? |
|
CC. @JulieLeeMSFT, @dotnet/jit-contrib |
| // vmovapd xmmword ptr [V01 rbp-20H], xmm0 # <-- last instruction | ||
| // vmovapd xmmword ptr [V01 rbp-20H], xmm0 # <-- current instruction can be omitted. | ||
| // | ||
| // 2. Opposite Move as that of last instruction emitted. |
There was a problem hiding this comment.
Isn't this case *_R_S rather than *_S_R?
There was a problem hiding this comment.
It covers both R_S and S_R. I'm open to naming it something else for clarity
There was a problem hiding this comment.
Maybe we could call this IsRedundantStackMov or similar?
Someone on the JIT team may have a better suggestion or convention we could use here.
There was a problem hiding this comment.
This handles the following cases
- mov reg mem
mov reg mem // identical instructions mem ->reg - mov mem reg
mov mem reg // identical instructions reg ->mem - mov mem reg
mov reg mem // reg->mem followed by same mem ->reg - mov reg mem
mov mem reg // mem->reg followed by same reg-> mem
There was a problem hiding this comment.
I wanted to note that the back to back moves can have the side effect of changing the GC-ness of a location/register.
For example, in this case:
// vmovupd ymmword ptr[V01 rbp-50H], ymm0 # <-- last instruction
// vmovupd ymm0, ymmword ptr[V01 rbp-50H] # <-- current instruction can be omitted.
(Let's pretend vectors can store GC refs, this obviously only applies to GPRs)
If V01 was a non-GC variable, and ymm0 in the first instruction held a non-GC value, but then the load was to a GC variable (ymm0 would become GC-live), we cannot eliminate it.
Does the logic here handle this case already?
There was a problem hiding this comment.
Superpmi
What do the diffs look like?
Think this is good to go?
The change looks good to me, but someone from the team will have the final say.
There was a problem hiding this comment.
Diff looks like this-
Found 53 files with textual diffs.
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 111144
Total bytes of diff: 110600
Total bytes of delta: -544 (-0.489% of base)
Total relative delta: -0.51
diff is an improvement.
relative diff is an improvement.
Top file improvements (bytes):
-105 : 22240.dasm (-1.113% of base)
-55 : 20577.dasm (-1.168% of base)
-28 : 20558.dasm (-0.897% of base)
-24 : 17628.dasm (-0.610% of base)
-16 : 4252.dasm (-0.789% of base)
-16 : 19812.dasm (-0.381% of base)
-14 : 20322.dasm (-0.271% of base)
-14 : 3120.dasm (-0.307% of base)
-14 : 6361.dasm (-0.627% of base)
-12 : 15710.dasm (-1.309% of base)
-12 : 4886.dasm (-0.722% of base)
-12 : 21805.dasm (-0.721% of base)
-8 : 1205.dasm (-0.666% of base)
-8 : 5003.dasm (-1.117% of base)
-8 : 24037.dasm (-0.324% of base)
-8 : 6376.dasm (-0.748% of base)
-8 : 23196.dasm (-0.331% of base)
-8 : 1289.dasm (-1.538% of base)
-8 : 8810.dasm (-1.000% of base)
-8 : 18225.dasm (-0.460% of base)
53 total files with Code Size differences (53 improved, 0 regressed), 0 unchanged.
There was a problem hiding this comment.
For the motivating example given in PR description
With Change-
G_M36146_IG01: ;; offset=0000H
55 push rbp
4881EC90000000 sub rsp, 144
C5F877 vzeroupper
488DAC2490000000 lea rbp, [rsp+90H]
C4413857C0 vxorps xmm8, xmm8
C5797F45E0 vmovdqa xmmword ptr [rbp-20H], xmm8
C5797F45F0 vmovdqa xmmword ptr [rbp-10H], xmm8
;; bbWeight=1 PerfScore 7.08
G_M36146_IG02: ;; offset=0022H
C5FD100556000000 vmovupd ymm0, ymmword ptr[reloc @rwd00]
C5FD1145B0 vmovupd ymmword ptr[rbp-50H], ymm0
C5FD100D69000000 vmovupd ymm1, ymmword ptr[reloc @RWD32]
C5FD114D90 vmovupd ymmword ptr[rbp-70H], ymm1
C5FD118570FFFFFF vmovupd ymmword ptr[rbp-90H], ymm0
C5FD1045B0 vmovupd ymm0, ymmword ptr[rbp-50H]
C5FD104DE0 vmovupd ymm1, ymmword ptr[rbp-20H]
C4E2F5A84590 vfmadd213pd ymm0, ymm1, ymmword ptr[rbp-70H]
C5FD114590 vmovupd ymmword ptr[rbp-70H], ymm0
C5FD588570FFFFFF vaddpd ymm0, ymm0, ymmword ptr[rbp-90H]
Without change
G_M36146_IG01: ;; offset=0000H
55 push rbp
4881EC90000000 sub rsp, 144
C5F877 vzeroupper
488DAC2490000000 lea rbp, [rsp+90H]
C4413857C0 vxorps xmm8, xmm8
C5797F45E0 vmovdqa xmmword ptr [rbp-20H], xmm8
C5797F45F0 vmovdqa xmmword ptr [rbp-10H], xmm8
;; bbWeight=1 PerfScore 7.08
G_M36146_IG02: ;; offset=0022H
C5FD100556000000 vmovupd ymm0, ymmword ptr[reloc @rwd00]
C5FD1145B0 vmovupd ymmword ptr[rbp-50H], ymm0
C5FD100D69000000 vmovupd ymm1, ymmword ptr[reloc @RWD32]
C5FD114D90 vmovupd ymmword ptr[rbp-70H], ymm1
C5FD118570FFFFFF vmovupd ymmword ptr[rbp-90H], ymm0
C5FD1045B0 vmovupd ymm0, ymmword ptr[rbp-50H]
C5FD104DE0 vmovupd ymm1, ymmword ptr[rbp-20H]
C4E2F5A84590 vfmadd213pd ymm0, ymm1, ymmword ptr[rbp-70H]
C5FD114590 vmovupd ymmword ptr[rbp-70H], ymm0
C5FD104590 vmovupd ymm0, ymmword ptr[rbp-70H]
C5FD588570FFFFFF vaddpd ymm0, ymm0, ymmword ptr[rbp-90H]
There was a problem hiding this comment.
I ran the tests again with latest changes and I seem to have an additional failure-
artifacts/tests/coreclr/Linux.x64.Release/tracing/eventpipe/gcdump/gcdump/gcdump.sh
Looking into this
Edit - false alarm. Not a failure
There was a problem hiding this comment.
Note - I'm currently just waiting for further comments/ or get this resolved as I do not have anything new to add.
1. Changing name to IsRedundantStackMov. 2. Add assert for format. 3. Move common code to helper.
45ec160 to
a44703d
Compare
|
@kunalspathak this is a new peephole -- you should take a look. Also I'd be interested in seeing what this does to our un-optimized codegen. Seems like it could do a lot of good there. We could for instance enable this in Tier0. I'd even entertain the idea of enabling it for debug codegen, if we can be confident it won't adversely impact the debugging experience. |
|
I will take a look. |
|
I ran this PR through superpmi and will double check the assembly differences for some of them. BenchmarksDetail diffsLibrariesDetail diffsCoreclr.testsDetail diffs |
|
@kunalspathak Let me know if you see anything that needs reworking or is failing. |
kunalspathak
left a comment
There was a problem hiding this comment.
I verified some of the asmdiffs and they looks good. Thank you for your contribution!
Optimize redundant mov instructions like:
IN002f: 000141 mov qword ptr [V08 rsp+78H], rax
IN0030: 000146 mov rax, qword ptr [V08 rsp+78H]
see comment (link)
This commit will essentially mirror what we do for Register->Register movs but with slight modifications to customize it for mem<->reg mov.
There are a couple of further optimizations to be revisited for both this and the existing IsRedundantMov() implementation(These are already mentioned as comments in existing impl)
An example code snippet to reproduce this
`
[MethodImpl(MethodImplOptions.AggressiveOptimization)]