relaxed i32x4.trunc_sat_f32x4_{s,u} i32x4.trunc_sat_f64x2_{s,u}_zero

1. What are the instructions being proposed?

Relaxed versions of:

- `i32x4.trunc_sat_f32x4_s`
- `i32x4.trunc_sat_f32x4_u`
- `i32x4.trunc_sat_f64x2_s_zero`
- `i32x4.trunc_sat_f64x2_u_zero`

from Simd128 (names undecided).

2. What are the semantics of these instructions?

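Convert f32x4 (or f64x2) lanes to i32x4 with truncation toward zero, producing signed or unsigned integers. For the f64x2 `_zero` variants, the two results occupy the low two lanes and the high two lanes are zero. If an input lane is NaN or out of range, the corresponding result lane is implementation-defined.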
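To make "implementation-defined" concrete, here is a minimal Python sketch of a per-lane acceptance check. It assumes, as described above, that an in-range, non-NaN lane must equal the truncated value while any 32-bit result is allowed otherwise. The function name `lane_result_allowed` is hypothetical and not part of the proposal; `x` is assumed to already be a value representable in the lane type.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MAX = 2**32 - 1


def lane_result_allowed(x: float, result: int, signed: bool) -> bool:
    """Hypothetical check: may `result` (a 32-bit lane bit pattern) be
    produced by the relaxed conversion of input lane `x`?"""
    if not 0 <= result <= UINT32_MAX:
        return False  # not a 32-bit lane value at all
    if math.isnan(x) or math.isinf(x):
        return True  # implementation-defined: any 32-bit value is allowed
    t = math.trunc(x)  # truncation toward zero
    lo, hi = (INT32_MIN, INT32_MAX) if signed else (0, UINT32_MAX)
    if not lo <= t <= hi:
        return True  # out of range: implementation-defined
    # In range: the result must be the exact truncated value (as a bit pattern).
    return result == (t & 0xFFFFFFFF)
```

For example, a NaN lane may yield either `0x80000000` or `0`, but `2.9` converted as signed must yield exactly `2`.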

3. How will these instructions be implemented? Give examples for at least
   x86-64 and ARM64. Also provide a reference implementation in terms of 128-bit
   Wasm SIMD.

## x86/64

relaxed `i32x4.trunc_sat_f32x4_s` = CVTTPS2DQ
relaxed `i32x4.trunc_sat_f32x4_u` = VCVTTPS2UDQ (AVX512), otherwise the Simd128 `i32x4.trunc_sat_f32x4_u` lowering (which can be slightly optimized, since NaN lanes need no special handling)
relaxed `i32x4.trunc_sat_f64x2_s_zero` = CVTTPD2DQ
relaxed `i32x4.trunc_sat_f64x2_u_zero` = VCVTTPD2UDQ (AVX512), otherwise the Simd128 `i32x4.trunc_sat_f64x2_u_zero` lowering
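The x86 lowerings above differ from Simd128 only in invalid lanes. The following Python sketch models the behavior Intel documents for these instructions: a lane that is NaN, infinite, or whose truncated value does not fit gets the "integer indefinite" value, `0x80000000` for the signed forms and `2^32 - 1` for the AVX512 unsigned forms. Lanes are returned as unsigned 32-bit bit patterns. The function names mirror the mnemonics but are hypothetical helpers, and the non-AVX512 fallback sequences are not modelled.

```python
import math
import struct

MASK32 = 0xFFFFFFFF


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32, as an f32x4 lane would hold it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def _x86_convert(lanes, signed):
    """Truncate each lane toward zero; invalid lanes get the indefinite value."""
    lo, hi = (-(2**31), 2**31 - 1) if signed else (0, 2**32 - 1)
    indefinite = 0x80000000 if signed else 0xFFFFFFFF
    out = []
    for x in lanes:
        if math.isnan(x) or math.isinf(x):
            out.append(indefinite)
            continue
        t = math.trunc(x)
        out.append(t & MASK32 if lo <= t <= hi else indefinite)
    return out


def cvttps2dq(v):    # 4 x f32 -> 4 x i32
    return _x86_convert([to_f32(x) for x in v], signed=True)


def vcvttps2udq(v):  # 4 x f32 -> 4 x u32 (AVX512)
    return _x86_convert([to_f32(x) for x in v], signed=False)


def cvttpd2dq(v):    # 2 x f64 -> low 2 x i32, upper 64 bits zeroed
    return _x86_convert(v, signed=True) + [0, 0]


def vcvttpd2udq(v):  # 2 x f64 -> low 2 x u32 (AVX512), upper 64 bits zeroed
    return _x86_convert(v, signed=False) + [0, 0]
```

Note that an unsigned lane such as `-0.5` truncates to `0`, which is representable, so only inputs at or below `-1.0` produce `0xFFFFFFFF`.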


## ARM64

relaxed `i32x4.trunc_sat_f32x4_s` = FCVTZS
relaxed `i32x4.trunc_sat_f32x4_u` = FCVTZU
relaxed `i32x4.trunc_sat_f64x2_s_zero` = FCVTZS + SQXTN
relaxed `i32x4.trunc_sat_f64x2_u_zero` = FCVTZU + UQXTN
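The two-instruction f64x2 lowering works because both steps saturate: FCVTZS/FCVTZU saturate into 64-bit lanes (NaN becomes 0), and SQXTN/UQXTN then saturate each 64-bit lane into 32 bits, writing the low half of the destination and zeroing the upper half. The Python sketch below models that pipeline; the `arm64_*` names are hypothetical helpers, and lanes are returned as Python ints (signed for the `_s` variants).

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fcvtz(x: float, bits: int, signed: bool) -> int:
    """Model of FCVTZS/FCVTZU on one lane: truncate toward zero,
    NaN -> 0, saturate to the destination integer range."""
    lo, hi = (-(1 << (bits - 1)), (1 << (bits - 1)) - 1) if signed else (0, (1 << bits) - 1)
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else lo
    return max(lo, min(hi, math.trunc(x)))


def saturating_narrow(lanes, signed):
    """Model of SQXTN/UQXTN: saturate 64-bit lanes to 32 bits into the
    low half of the register; the upper two 32-bit lanes become zero."""
    lo, hi = (-(2**31), 2**31 - 1) if signed else (0, 2**32 - 1)
    return [max(lo, min(hi, v)) for v in lanes] + [0, 0]


def arm64_f32x4_s(v):       # FCVTZS Vd.4S, Vn.4S
    return [fcvtz(to_f32(x), 32, True) for x in v]


def arm64_f32x4_u(v):       # FCVTZU Vd.4S, Vn.4S
    return [fcvtz(to_f32(x), 32, False) for x in v]


def arm64_f64x2_s_zero(v):  # FCVTZS Vd.2D, then SQXTN Vd.2S
    return saturating_narrow([fcvtz(x, 64, True) for x in v], signed=True)


def arm64_f64x2_u_zero(v):  # FCVTZU Vd.2D, then UQXTN Vd.2S
    return saturating_narrow([fcvtz(x, 64, False) for x in v], signed=False)
```

Saturating to 64 bits first and then narrowing gives the same lane value as saturating directly to 32 bits, which is why the pair matches the Simd128 result.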

## ARM NEON

relaxed `i32x4.trunc_sat_f32x4_s` = vcvt.S32.F32
relaxed `i32x4.trunc_sat_f32x4_u` = vcvt.U32.F32
relaxed `i32x4.trunc_sat_f64x2_s_zero` = vcvt.S32.F64 + vcvt.S32.F64 + vmov
relaxed `i32x4.trunc_sat_f64x2_u_zero` = vcvt.U32.F64 + vcvt.U32.F64 + vmov

Note: On ARM MVE, double-precision conversions require the Armv8-M Floating-point Extension (FPv5); MVE can be implemented with or without this extension.


## simd128

The respective non-relaxed versions: `i32x4.trunc_sat_f32x4_s`, `i32x4.trunc_sat_f32x4_u`, `i32x4.trunc_sat_f64x2_s_zero`, `i32x4.trunc_sat_f64x2_u_zero`.
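As a sketch of this reference implementation, the Python below models the non-relaxed Simd128 lane semantics (NaN becomes 0, everything else is truncated toward zero and saturated to the int32 or uint32 range) and defines each relaxed instruction as simply returning that result. The `relaxed_*` names are hypothetical placeholders, since the instruction names are undecided.

```python
import math
import struct

INT32_MIN, INT32_MAX, UINT32_MAX = -(2**31), 2**31 - 1, 2**32 - 1


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def trunc_sat(x: float, signed: bool) -> int:
    """Simd128 trunc_sat lane semantics: NaN -> 0, otherwise truncate
    toward zero and saturate to the int32 (signed) or uint32 range."""
    lo, hi = (INT32_MIN, INT32_MAX) if signed else (0, UINT32_MAX)
    if math.isnan(x):
        return 0
    if x <= lo:
        return lo  # also covers -inf
    if x >= hi:
        return hi  # also covers +inf
    return math.trunc(x)


# Reference implementations: each relaxed instruction returns the
# non-relaxed Simd128 result (hypothetical names).
def relaxed_i32x4_trunc_f32x4_s(v):
    return [trunc_sat(to_f32(x), True) for x in v]


def relaxed_i32x4_trunc_f32x4_u(v):
    return [trunc_sat(to_f32(x), False) for x in v]


def relaxed_i32x4_trunc_f64x2_s_zero(v):
    return [trunc_sat(x, True) for x in v] + [0, 0]


def relaxed_i32x4_trunc_f64x2_u_zero(v):
    return [trunc_sat(x, False) for x in v] + [0, 0]
```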

4. How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

For `i32x4.trunc_sat_f32x4_s`:

- x86/64 will return `0x80000000` in lanes with out-of-range inputs or NaNs
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs

For `i32x4.trunc_sat_f32x4_u`:

- x86/64 will return `0xFFFFFFFF` in lanes with out-of-range inputs or NaNs if AVX512 is available, and `0` otherwise (at the cost of more instructions)
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs

For `i32x4.trunc_sat_f64x2_s_zero`:

- x86/64 will return `0x80000000` for out-of-range inputs or NaNs
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs

For `i32x4.trunc_sat_f64x2_u_zero`:

- x86/64 will return `0xFFFFFFFF` for out-of-range inputs or NaNs if AVX512 is available, and `0` otherwise
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs
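To illustrate the fingerprinting surface, the Python sketch below models only the signed f32 behavior listed above (x86/64 returns `0x80000000` for NaN or out-of-range lanes; ARM/ARM64 returns 0 for NaN and saturates otherwise) and checks which probe inputs would let a page tell the two apart. The helper names are hypothetical, and probe values are assumed to be exactly representable in f32.

```python
import math

MASK32 = 0xFFFFFFFF


def x86_lane_s(x: float) -> int:
    """Signed lane result on x86/64 (CVTTPS2DQ), as a 32-bit pattern."""
    if math.isnan(x) or math.isinf(x):
        return 0x80000000
    t = math.trunc(x)
    return t & MASK32 if -(2**31) <= t <= 2**31 - 1 else 0x80000000


def arm_lane_s(x: float) -> int:
    """Signed lane result on ARM/ARM64 (FCVTZS), as a 32-bit pattern."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return 0x7FFFFFFF if x > 0 else 0x80000000
    return max(-(2**31), min(2**31 - 1, math.trunc(x))) & MASK32


def probe_reveals_platform(x: float) -> bool:
    """True if converting x distinguishes x86/64 from ARM/ARM64."""
    return x86_lane_s(x) != arm_lane_s(x)


for x in [1.5, -1.5, float("nan"), 3e9, -3e9, float("inf")]:
    print(x, hex(x86_lane_s(x)), hex(arm_lane_s(x)), probe_reveals_platform(x))
```

Under these models, NaN and large positive inputs reveal the platform, while in-range inputs and large negative inputs do not, because ARM's saturated minimum happens to equal x86's `0x80000000`.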

5. What use cases are there?

Conversion instructions are common. If the application can guarantee that inputs are in range, the relaxed versions give good performance on all architectures.
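One way an application might establish that guarantee is to replace NaNs and clamp inputs before converting. The Python sketch below assumes a hypothetical scalar pre-pass, `clamp_for_trunc_s`, that maps NaN to `0.0` (an arbitrary choice) and clamps into `[-2^31, 2^31 - 128]`, where `2^31 - 128` is the largest binary32 value below `2^31`. It then checks that the x86/64 and ARM/ARM64 signed behaviors described in this issue agree on every clamped lane. How to express the pre-pass with Wasm SIMD instructions is left open here.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
# Largest binary32 value below 2**31 (binary32 spacing there is 128).
F32_BELOW_2_31 = 2147483520.0


def clamp_for_trunc_s(x: float) -> float:
    """Hypothetical pre-pass: NaN -> 0.0, then clamp into the range where
    signed truncation of an f32 lane fits in int32."""
    if math.isnan(x):
        return 0.0
    return max(float(INT32_MIN), min(F32_BELOW_2_31, x))


def x86_lane_s(x: float) -> int:
    """CVTTPS2DQ: INT32_MIN (0x80000000) for NaN or out-of-range lanes."""
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN
    t = math.trunc(x)
    return t if INT32_MIN <= t <= INT32_MAX else INT32_MIN


def arm_lane_s(x: float) -> int:
    """FCVTZS: NaN -> 0, otherwise truncate and saturate."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))


inputs = [float("nan"), float("inf"), float("-inf"), 3e9, -3e9, 2.75, -2.75]
clamped = [clamp_for_trunc_s(x) for x in inputs]
# After clamping, both platforms produce identical lanes.
assert [x86_lane_s(x) for x in clamped] == [arm_lane_s(x) for x in clamped]
```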
