[AMDGPU][True16][CodeGen] select vgpr16 for asm inline 16bit vreg #140946
@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

Select vgpr16 for 16-bit inline asm registers.

Full diff: https://github.com/llvm/llvm-project/pull/140946.diff

3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index ba7e11a853347..a4b62454e782d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16062,7 +16062,10 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
case 'v':
switch (BitWidth) {
case 16:
- RC = &AMDGPU::VGPR_32RegClass;
+ if (Subtarget->useRealTrue16Insts())
+ RC = &AMDGPU::VGPR_16RegClass;
+ else
+ RC = &AMDGPU::VGPR_32RegClass;
break;
default:
RC = TRI->getVGPRClassForBitWidth(BitWidth);
diff --git a/llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll b/llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll
new file mode 100644
index 0000000000000..0f268c796c695
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll
@@ -0,0 +1,48 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 -verify-machineinstrs < %s 2>&1 | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
+
+; GFX11-LABEL: {{^}}s_input_output_i16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_i16() #0 {
+ %v = tail call i16 asm sideeffect "s_mov_b32 $0, -1", "=s"()
+ tail call void asm sideeffect "; use $0", "s"(i16 %v) #0
+ ret void
+}
+
+; GFX11-LABEL: {{^}}s_input_output_f16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_f16() #0 {
+ %v = tail call half asm sideeffect "s_mov_b32 $0, -1", "=s"() #0
+ tail call void asm sideeffect "; use $0", "s"(half %v)
+ ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_f16:
+; GFX11: v_mov_b32 v[[REG:[0-9]+]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_f16() #0 {
+ %v = tail call half asm sideeffect "v_mov_b32 $0, -1", "=v"() #0
+ tail call void asm sideeffect "; use $0", "v"(half %v)
+ ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_i16:
+; GFX11: v_mov_b32 v[[REG:[0-9]+]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_i16() #0 {
+ %v = tail call i16 asm sideeffect "v_mov_b32 $0, -1", "=v"() #0
+ tail call void asm sideeffect "; use $0", "v"(i16 %v)
+ ret void
+}
+
+; GFX11-LABEL: {{^}}i16_imm_input_phys_vgpr:
+; GFX11: v_mov_b32_e32 v0, 0xffff
+; GFX11: ; use v0
+define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
+entry:
+ call void asm sideeffect "; use $0 ", "{v0}"(i16 65535)
+ ret void
+}
+
+attributes #0 = { nounwind }
diff --git a/llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll b/llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll
new file mode 100644
index 0000000000000..908fb840e8d2c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll
@@ -0,0 +1,48 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s 2>&1 | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
+
+; GFX11-LABEL: {{^}}s_input_output_i16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_i16() #0 {
+ %v = tail call i16 asm sideeffect "s_mov_b32 $0, -1", "=s"()
+ tail call void asm sideeffect "; use $0", "s"(i16 %v) #0
+ ret void
+}
+
+; GFX11-LABEL: {{^}}s_input_output_f16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_f16() #0 {
+ %v = tail call half asm sideeffect "s_mov_b32 $0, -1", "=s"() #0
+ tail call void asm sideeffect "; use $0", "s"(half %v)
+ ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_f16:
+; GFX11: v_mov_b16 v[[REG:[0-9]+.(l|h)]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_f16() #0 {
+ %v = tail call half asm sideeffect "v_mov_b16 $0, -1", "=v"() #0
+ tail call void asm sideeffect "; use $0", "v"(half %v)
+ ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_i16:
+; GFX11: v_mov_b16 v[[REG:[0-9]+.(l|h)]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_i16() #0 {
+ %v = tail call i16 asm sideeffect "v_mov_b16 $0, -1", "=v"() #0
+ tail call void asm sideeffect "; use $0", "v"(i16 %v)
+ ret void
+}
+
+; GFX11-LABEL: {{^}}i16_imm_input_phys_vgpr:
+; GFX11: v_mov_b16_e32 v0.l, -1
+; GFX11: ; use v0
+define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
+entry:
+ call void asm sideeffect "; use $0 ", "{v0.l}"(i16 65535)
+ ret void
+}
+
+attributes #0 = { nounwind }
Force-pushed 47427c7 to d77dc1f
@@ -0,0 +1,48 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s 2>&1 | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
Why do these need to be separate files? Also, you don't need -verify-machineinstrs or the stderr redirect (2>&1).
Because the inline asm strings differ: the true16 test uses v_mov_b16 and the fake16 test uses v_mov_b32.
; GFX11: ; use v0.l
define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
entry:
  call void asm sideeffect "; use $0 ", "{v0.l}"(i16 65535)
Can you please add one more test with v0.h?
And do we need a constraint to specify an l or h half for a virtual register?
Added a .h case.
> And do we need a constraint to specify an l or h half for a virtual register?
I do not think it is practically needed. At least it is not needed for correctness at this point.
Force-pushed d77dc1f to f3a655a
Force-pushed f3a655a to c1b6d35
@@ -0,0 +1,48 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 < %s | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
Can you autogenerate the check lines?
@@ -16062,7 +16062,8 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
  case 'v':
    switch (BitWidth) {
The switch seems overkill here. It could just be handled with:

    if (BitWidth == 16 && !Subtarget->useRealTrue16Insts())
      BitWidth = 32;
Or even move the handling for the BitWidth == 16 case inside getVGPRClassForBitWidth?
This reminds me: we need constraints for the aligned and unaligned versions of register classes.
select vgpr16 for asm inline 16bit vreg in true16 mode