[Codegen] Use CUDA's half2 and nv_bfloat162 intrinsics for vector fp16/bf16 data types #15190
Closed
yzh119 wants to merge 38 commits into