We will continue to improve code quality for Arm64 targets in .NET 10 to benefit customers who run, or want to run, their workloads on Arm64 hardware.
General optimizations
Compact encoding
Improve code quality by using instructions that perform more than one operation, which makes the generated Arm64 code more compact. As part of this work, we will also revisit addressing modes that are currently ignored or rarely used (e.g. post-index addressing) but can give much better code quality.
Reference: Review the multi-op instruction usage for Arm64 #68028
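As a rough illustration of where post-index addressing applies, the loop below loads through a pointer and then advances it. On Arm64 such a load-and-increment pair can be encoded as a single post-indexed load (for example `ldr x2, [x0], #8`) instead of a load followed by a separate `add`. The function name is hypothetical, and whether a given compiler or JIT actually selects the post-indexed form here is an assumption, not something this sketch guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, used only to show the source-level pattern.
std::int64_t SumWithPointerWalk(const std::int64_t* p, std::size_t count)
{
    assert(p != nullptr || count == 0);

    std::int64_t total = 0;
    const std::int64_t* end = p + count;
    while (p != end)
    {
        // Load the element, then advance the pointer by 8 bytes.
        // This is the shape a post-indexed load can cover in one instruction.
        total += *p++;
    }
    return total;
}
```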
Improvements in GC
Modernize write barriers for Arm64: In various benchmarks, we have seen that the write barrier on Arm64 is more time-consuming than its x86 counterpart. This is despite the fact that Arm64 has a conservative write barrier (which does less work) instead of the precise write barrier present on x86 (which does more work). The first step is to analyze the results of the experiments done in Significant Performance Disparity Between Arm64 and x64 Write Barriers #106051. The next step is to investigate and enable a precise write barrier for Arm64; on x64, it showed significant wins in GC pause time and hence in overall throughput. We also want to explore whether having multiple versions of the write barrier, similar to x86, would give us any benefit.
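The difference between the two barrier styles can be sketched with a toy model. This is not the .NET runtime's implementation: the heap layout, the shift constants, and every name below are hypothetical. The conservative version dirties a card whenever the stored reference falls in the young address range; the precise version does an extra generation lookup for both addresses and dirties the card only for an old-to-young store, which leaves fewer cards for the GC to scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy heap: addresses are plain offsets starting at 0 (hypothetical model).
struct ToyHeap
{
    static constexpr std::size_t kRegionShift = 12; // 4 KiB toy regions
    static constexpr std::size_t kCardShift = 8;    // 256-byte toy cards

    std::vector<std::uint8_t> regionGeneration; // generation per region, 0 = youngest
    std::vector<std::uint8_t> cardTable;        // 1 = card is dirty
    std::size_t ephemeralLow;                   // young range is [low, high)
    std::size_t ephemeralHigh;

    ToyHeap(std::vector<std::uint8_t> generations, std::size_t low, std::size_t high)
        : regionGeneration(std::move(generations)),
          cardTable((regionGeneration.size() << kRegionShift) >> kCardShift, 0),
          ephemeralLow(low),
          ephemeralHigh(high)
    {
    }

    std::size_t SizeBytes() const { return regionGeneration.size() << kRegionShift; }
};

// Conservative: one range check on the stored reference, then dirty the card.
// Returns true if the card covering dst was marked.
bool ConservativeBarrier(ToyHeap& heap, std::size_t dst, std::size_t ref)
{
    assert(dst < heap.SizeBytes() && ref < heap.SizeBytes());
    if (ref < heap.ephemeralLow || ref >= heap.ephemeralHigh)
    {
        return false; // reference points at old memory, nothing to record
    }
    heap.cardTable[dst >> ToyHeap::kCardShift] = 1;
    return true;
}

// Precise: look up both generations and mark only old-to-young stores.
bool PreciseBarrier(ToyHeap& heap, std::size_t dst, std::size_t ref)
{
    assert(dst < heap.SizeBytes() && ref < heap.SizeBytes());
    const std::uint8_t dstGen = heap.regionGeneration[dst >> ToyHeap::kRegionShift];
    const std::uint8_t refGen = heap.regionGeneration[ref >> ToyHeap::kRegionShift];
    if (refGen >= dstGen)
    {
        return false; // not an old-to-young reference
    }
    heap.cardTable[dst >> ToyHeap::kCardShift] = 1;
    return true;
}
```

In this model a young-to-young store dirties a card under the conservative barrier but not under the precise one, which is the kind of saved GC work the precise variant trades its extra lookups for.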
The primary requirement before starting the design of streaming-mode SVE and SME is to add support in the JIT/.NET runtime for vector length (VL) agnostic code. This includes the following:
(WIP) Introduce TYP_SIMD and teach the various JIT code paths about the new type. See whether some portion of this can be achieved based on how we handle stackalloc.
(WIP) Make sure getVectorTByteLength() returns the VL that is available on the hardware and fix all the affected JIT code paths.
Sort locals such that TYP_SIMD / TYP_MASK come last. They will be placed at the bottom of the stack frame layout.
(WIP) Access the stack offsets of TYP_SIMD / TYP_MASK using SVE instructions.
Enable VL-agnostic non-streaming SVE for NativeAOT / crossgen.
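The local-sorting step in the list above can be sketched as follows. This is a toy model rather than JIT code: `ToyType`, `ToyLocal`, `SortScalableLast`, and `SizeInBytes` are hypothetical names, and the sizes assigned to the fixed-size types are assumptions. The idea it shows is that locals whose size depends on the hardware vector length are grouped after all fixed-size locals, so fixed-size locals keep compile-time offsets and only the trailing group needs VL-relative addressing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for JIT types; Simd and Mask model TYP_SIMD / TYP_MASK.
enum class ToyType { Int, Long, Simd, Mask };

struct ToyLocal
{
    std::string name;
    ToyType type;
};

// True for types whose size is only known once the vector length is known.
bool IsScalable(ToyType type)
{
    return type == ToyType::Simd || type == ToyType::Mask;
}

// Keep fixed-size locals first (in their original order), scalable locals last.
std::vector<ToyLocal> SortScalableLast(std::vector<ToyLocal> locals)
{
    std::stable_partition(locals.begin(), locals.end(),
                          [](const ToyLocal& local) { return !IsScalable(local.type); });
    return locals;
}

// Size of a local for a given vector length in bytes (supplied at run time).
std::size_t SizeInBytes(ToyType type, std::size_t vlBytes)
{
    // SVE vector lengths are multiples of 128 bits (16 bytes).
    assert(vlBytes >= 16 && vlBytes % 16 == 0);
    switch (type)
    {
        case ToyType::Int:  return 4;
        case ToyType::Long: return 8;
        case ToyType::Simd: return vlBytes;
        case ToyType::Mask: return vlBytes / 8; // one predicate bit per vector byte
    }
    return 0;
}
```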
Design streaming mode SVE and SME
Come up with an API design for streaming-mode SVE and SME, including its interaction with non-streaming APIs as well as NEON APIs.
Implications of streaming-mode switches for the overall .NET runtime execution process
Handling of diagnostics and debugging during streaming mode
NativeAOT and crossgen support in presence of streaming mode flag toggles
How faults and exceptions will be handled, and how the state restore will happen.
Improvements in GC
PR: Arm64: Implement region write barriers #111636
Scalable Vector Extension
Wrap the non-streaming SVE work
Reference: #101477
Sve2 APIs
Pushed out to Future
PAC/RET feature enablement
Debugger support
Scalable Vector Extension
Add support for vector length agnostic
References: