- What are the instructions being proposed?
Relaxed versions of:
i32x4.trunc_sat_f32x4_s
i32x4.trunc_sat_f32x4_u
i32x4.trunc_sat_f64x2_s_zero
i32x4.trunc_sat_f64x2_u_zero
from Simd128. (Names undecided)
- What are the semantics of these instructions?
Convert f32x4/f64x2 to i32x4 with truncation (signed/unsigned). If the inputs are out of range or NaNs, the result is implementation-defined.
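As a rough illustration of the semantics above, here is a minimal scalar sketch in Python for the signed f64x2 case. The function names and the `out_of_range` callback are hypothetical; the callback stands in for whatever the implementation chooses to return. Inputs are ordinary Python floats (f64).

```python
import math

I32_MIN, I32_MAX = -(2**31), 2**31 - 1

def relaxed_trunc_lane_s(x, out_of_range):
    """One lane of a signed relaxed conversion (hypothetical model).

    In-range inputs are truncated toward zero. For NaN, infinities and
    out-of-range values, the result is delegated to `out_of_range`,
    which models the implementation-defined behavior.
    """
    if math.isnan(x) or math.isinf(x):
        return out_of_range(x)
    t = math.trunc(x)  # round toward zero
    if I32_MIN <= t <= I32_MAX:
        return t
    return out_of_range(x)

def relaxed_trunc_f64x2_s_zero(lanes, out_of_range):
    """f64x2 -> i32x4: two converted lanes, upper two lanes zeroed."""
    assert len(lanes) == 2
    return [relaxed_trunc_lane_s(x, out_of_range) for x in lanes] + [0, 0]
```

For example, `relaxed_trunc_f64x2_s_zero([1.9, -1.9], lambda x: 0)` only exercises the well-defined path; the callback is consulted only for NaN or out-of-range lanes.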
- How will these instructions be implemented? Give examples for at least
x86-64 and ARM64. Also provide reference implementation in terms of 128-bit
Wasm SIMD.
x86/64
relaxed i32x4.trunc_sat_f32x4_s = CVTTPS2DQ
relaxed i32x4.trunc_sat_f32x4_u = VCVTTPS2UDQ (AVX512), Simd128 i32x4.trunc_sat_f32x4_u otherwise (can be slightly optimized to ignore NaNs)
relaxed i32x4.trunc_sat_f64x2_s_zero = CVTTPD2DQ
relaxed i32x4.trunc_sat_f64x2_u_zero = VCVTTPD2UDQ (AVX512), Simd128 i32x4.trunc_sat_f64x2_u_zero
ARM64
relaxed i32x4.trunc_sat_f32x4_s = FCVTZS
relaxed i32x4.trunc_sat_f32x4_u = FCVTZU
relaxed i32x4.trunc_sat_f64x2_s_zero = FCVTZS + SQXTN
relaxed i32x4.trunc_sat_f64x2_u_zero = FCVTZU + UQXTN
ARM NEON
relaxed i32x4.trunc_sat_f32x4_s = vcvt.S32.F32
relaxed i32x4.trunc_sat_f32x4_u = vcvt.U32.F32
relaxed i32x4.trunc_sat_f64x2_s_zero = vcvt.S32.F64 + vcvt.S32.F64 + vmov
relaxed i32x4.trunc_sat_f64x2_u_zero = vcvt.U32.F64 + vcvt.U32.F64 + vmov
Note: On ARM MVE, double precision conversions require the Armv8-M Floating-point Extension (FPv5); MVE can be implemented with or without this extension.
simd128
respective non-relaxed versions i32x4.trunc_sat_f32x4_s, i32x4.trunc_sat_f32x4_u, i32x4.trunc_sat_f64x2_s_zero, i32x4.trunc_sat_f64x2_u_zero.
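For reference, the non-relaxed Simd128 behavior that serves as the fallback can be sketched per lane as follows: NaN becomes 0 and out-of-range values saturate. This is a scalar Python model with function names chosen to mirror the instruction names; it takes Python floats and does not model f32 rounding of the inputs.

```python
import math

def trunc_sat_lane(x, lo, hi):
    """Saturating truncation of one lane into the integer range [lo, hi]."""
    if math.isnan(x):
        return 0
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return math.trunc(x)  # round toward zero

def i32x4_trunc_sat_f32x4_s(lanes):
    """Scalar model of the signed saturating conversion."""
    return [trunc_sat_lane(x, -(2**31), 2**31 - 1) for x in lanes]

def i32x4_trunc_sat_f32x4_u(lanes):
    """Scalar model of the unsigned saturating conversion."""
    return [trunc_sat_lane(x, 0, 2**32 - 1) for x in lanes]
```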
- How does behavior differ across processors? What new fingerprinting surfaces will be exposed?
For i32x4.trunc_sat_f32x4_s:
- x86/64 will return 0x80000000 in lanes for out of range or NaNs
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs
For i32x4.trunc_sat_f32x4_u:
- x86/64 will return 0xFFFFFFFF in lanes for out of range or NaNs if AVX512 is available, 0 otherwise (at the cost of more instructions)
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs
For i32x4.trunc_sat_f64x2_s_zero:
- x86/64 will return 0x80000000 for out of range or NaNs
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs
For i32x4.trunc_sat_f64x2_u_zero:
- x86/64 will return 0xFFFFFFFF for out of range or NaNs if AVX512 is available, 0 otherwise
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs
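To make the fingerprinting surface concrete, here is a small Python sketch that models the signed f32x4 case on both architecture families, following the behavior listed above. The function names are hypothetical, results are shown as raw 32-bit lane patterns, and inputs are Python floats rather than true f32 values.

```python
import math

I32_MIN, I32_MAX = -(2**31), 2**31 - 1

def x86_lane_s(x):
    """CVTTPS2DQ-style lane: 0x80000000 for NaN or out-of-range input."""
    if math.isnan(x) or math.isinf(x):
        return 0x80000000
    t = math.trunc(x)
    if I32_MIN <= t <= I32_MAX:
        return t & 0xFFFFFFFF  # two's-complement bit pattern
    return 0x80000000

def arm_lane_s(x):
    """FCVTZS-style lane: 0 for NaN, saturation for out-of-range input."""
    if math.isnan(x):
        return 0
    if x <= I32_MIN:
        return I32_MIN & 0xFFFFFFFF
    if x >= I32_MAX:
        return I32_MAX
    return math.trunc(x) & 0xFFFFFFFF

# Probe lanes: NaN, too large, too small, in range.
probe = [float("nan"), 3e9, -3e9, 42.9]
x86_result = [x86_lane_s(x) for x in probe]
arm_result = [arm_lane_s(x) for x in probe]
```

In this model the NaN lane and the positive out-of-range lane differ between the two families, while the negative out-of-range lane and the in-range lane agree, so a probe containing NaN or a large positive value is enough to tell them apart.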
- What use cases are there?
Conversion instructions are common. If the application can guarantee that the inputs are in range, we can get good performance on all architectures.