Arm64/Sve: Revisit scalable register save/restore in prolog/epilogue #103320

Description

@kunalspathak

This is a requirement for SVE to work in its truly scalable form, and I want to make sure we have an issue tracking it. Currently, during prolog/epilog generation, we do not know whether the vector registers are scalable, so we end up generating NEON str/ldr instructions, which store and load only 128 bits of each register. Today, .NET supports only a 128-bit vector length, so it doesn't matter. But to support the truly scalable nature of these registers in the future, the prolog and epilog need to save and restore the full scalable width of each register.

We need to double-check whether the prolog/epilog should save/restore scalable registers whenever the code is running on SVE hardware, or only when we are generating code that uses scalable or predicate registers, or under some other condition.

Basically we want to make sure that this doesn't happen:

; prolog
str v4, [stack] ; store just 128 bits
str v5, [stack]

...
...
add z4, ...     ; anything above 128 bits is trashed
add z5, ...

...

; epilog
ldr v5, [stack]
ldr v4, [stack] ; restore just 128 bits
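
To make the failure mode concrete, here is a minimal Python model of the sequence above. It is only an illustrative sketch, not JIT code: the 256-bit vector length, the helper names, and the choice to model a NEON restore as leaving the upper bits zero are all assumptions made for the example.

```python
# Toy model of a callee that saves/restores z4 around a body that clobbers it.
# All names and the 256-bit vector length are hypothetical, for illustration only.
VL_BITS = 256                      # assumed SVE vector length
NEON_BITS = 128                    # width handled by NEON str/ldr
NEON_MASK = (1 << NEON_BITS) - 1
VL_MASK = (1 << VL_BITS) - 1


def neon_save(z_value):
    """Model a NEON store: only the low 128 bits reach the stack slot."""
    return z_value & NEON_MASK


def neon_restore(stack_slot):
    """Model a NEON load: low 128 bits come back, upper bits modeled as zero."""
    return stack_slot & NEON_MASK


def sve_save(z_value):
    """Model a full-width scalable store of the whole register."""
    return z_value & VL_MASK


def sve_restore(stack_slot):
    """Model a full-width scalable load of the whole register."""
    return stack_slot & VL_MASK


def run_callee(caller_z4, save, restore):
    """Prolog saves z4, the body overwrites all of z4, the epilog restores it."""
    slot = save(caller_z4)         # prolog
    z4 = VL_MASK                   # body: an SVE op writes the full register
    z4 = restore(slot)             # epilog
    return z4


# Caller keeps live data in both the low and the high 128 bits of z4.
caller_value = (0xABCD << NEON_BITS) | 0x1234
after_neon = run_callee(caller_value, neon_save, neon_restore)
after_sve = run_callee(caller_value, sve_save, sve_restore)
```

In this model, `after_sve` equals the caller's original value, while `after_neon` keeps only the low 128 bits, which is the situation the prolog/epilog needs to avoid.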

Reference:

From https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs64/aapcs64.rst#613scalable-vector-registers :

6.1.3 Scalable vector registers
The Arm 64-bit architecture also defines an optional set of thirty-two scalable vector registers, z0-z31. Each register extends the corresponding SIMD and Floating-Point register so that it can hold the contents of a single Scalable Vector Type (see Scalable vectors). That is, scalable vector register z0 is an extension of SIMD and Floating-Point register v0.

z0-z7 are used to pass scalable vector arguments to a subroutine, and to return scalable vector results from a function. If a subroutine takes at least one argument in scalable vector registers or scalable predicate registers, or if it is a function that returns results in such registers, it must ensure that the entire contents of z8-z23 are preserved across the call. In other cases it need only preserve the low 64 bits of z8-z15, as described in SIMD and Floating-Point registers.

6.1.4 Scalable Predicate Registers
The Arm 64-bit architecture defines an optional set of sixteen scalable predicate registers p0-p15. These registers are available if and only if the scalable vector registers are available (see Scalable vector registers). Each register can store the contents of a Scalable Predicate Type (see Scalable Predicates).

p0-p3 are used to pass scalable predicate arguments to a subroutine and to return scalable predicate results from a function. If a subroutine takes at least one argument in scalable vector registers or scalable predicate registers, or if it is a function that returns results in such registers, it must ensure that p4-p15 are preserved across the call. In other cases it need not preserve any scalable predicate register contents.
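
The callee-saved rule quoted in 6.1.3 and 6.1.4 can be summarized as a small lookup. The sketch below only restates the quoted text; the function name and the dictionary layout are hypothetical, and it does not describe how the JIT currently decides what to save.

```python
def callee_saved_scalable_registers(has_scalable_signature):
    """Return the registers a callee must preserve, per the quoted AAPCS64 text.

    has_scalable_signature: True if the subroutine takes at least one argument
    in scalable vector or scalable predicate registers, or returns a result in
    such registers. (Hypothetical parameter name.)
    """
    if has_scalable_signature:
        return {
            "z_full": [f"z{i}" for i in range(8, 24)],   # entire contents of z8-z23
            "z_low64": [],
            "p_full": [f"p{i}" for i in range(4, 16)],   # p4-p15
        }
    return {
        "z_full": [],
        "z_low64": [f"z{i}" for i in range(8, 16)],      # low 64 bits of z8-z15 only
        "p_full": [],                                    # no predicate registers
    }


scalable = callee_saved_scalable_registers(True)
plain = callee_saved_scalable_registers(False)
```

Here `scalable["z_full"]` runs from `z8` to `z23` and `scalable["p_full"]` from `p4` to `p15`, while `plain` lists only the low 64 bits of `z8`-`z15`.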

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)