[S-TIR][Tests] Fix transform test failures after TIRx bringup #19735

Merged: tlopex merged 3 commits into apache:main from tlopex:fix-s-tir-transform-tests on Jun 11, 2026

tlopex (Member) commented on Jun 11, 2026

This PR fixes 11 test failures in tests/python/s_tir/transform/ introduced as side effects of the TIRx bringup (#19581 / 859498d), in three independent commits.

1. LowerOpaqueBlock: update expected IR for buffer metadata annotations

LowerOpaqueBlock now emits buffer_allocated_addr and buffer_data_alignment annotations on lowered allocations (intentional in #19581: the annotations are consumed downstream by codegen_cuda.cc / codegen_trn.cc; the alignment value 64 comes from kAllocAlignment). The tests' expected IR predates this, so assert_structural_equal failed on the missing annotations.

Fix: update the expected IR in test_s_tir_transform_lower_opaque_block.py to carry the annotations (T.decl_buffer(...) → T.alloc_buffer(..., annotations={...})). Fixes 6 tests.
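To illustrate why a structural comparison fails on the missing annotations, here is a minimal sketch in plain Python. It does not use TVM: `Alloc` and `structural_equal` are hypothetical stand-ins for a lowered allocation node and for `assert_structural_equal`. Only the annotation key `buffer_data_alignment` and the value 64 come from the description above; `buffer_allocated_addr` is left out because its value is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class Alloc:
    """Hypothetical stand-in for a lowered allocation node (not a TVM class)."""
    name: str
    shape: tuple
    dtype: str
    annotations: dict = field(default_factory=dict)


def structural_equal(lhs: Alloc, rhs: Alloc) -> bool:
    """Toy structural comparison: every field, annotations included, must match."""
    return lhs == rhs


# What the pass is described as producing now: an allocation with annotations.
actual = Alloc("B", (16,), "float32", {"buffer_data_alignment": 64})

# Expected IR written before the change: same buffer, no annotations.
stale_expected = Alloc("B", (16,), "float32")

# Expected IR after the fix: the annotations are spelled out.
updated_expected = Alloc("B", (16,), "float32", {"buffer_data_alignment": 64})

print(structural_equal(actual, stale_expected))    # False
print(structural_equal(actual, updated_expected))  # True
```

In this toy model the pass output is unchanged between the two comparisons; only the expectation moves, which mirrors the test-only nature of this commit.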

2. DefaultGPUSchedule: parse scalar-block test in s_tir mode

#19581 added a well-formedness rule rejecting SBlockRealize in tirx=True mode, which is correct — sblocks are s_tir-mode constructs. The hand-written Before/Expected modules in test_scalar_block_no_loops were the only ones in the file still using plain T.prim_func, so they failed at parse time before the pass under test even ran.

Fix: parse both modules with T.prim_func(s_tir=True), consistent with every other test in the file. Fixes 1 test.
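The ordering issue described above (the parse-time check fires before the pass under test runs) can be sketched in plain Python. This is a toy model, not TVM code: `parse`, `run_test`, and `WellFormedError` are hypothetical names, and the only detail taken from the description is that `SBlockRealize` is rejected unless the function is parsed with `s_tir=True`.

```python
class WellFormedError(Exception):
    """Hypothetical error standing in for a parse-time verifier failure."""


def parse(nodes, s_tir=False):
    """Toy parser: outside s_tir mode, reject any SBlockRealize node."""
    if not s_tir and "SBlockRealize" in nodes:
        raise WellFormedError("SBlockRealize is only allowed in s_tir mode")
    return list(nodes)


def run_test(nodes, s_tir, pass_log):
    """Parse first, then run the (toy) pass and record that it ran."""
    module = parse(nodes, s_tir=s_tir)
    pass_log.append("pass ran")
    return module


log = []

# Plain mode: the verifier rejects the input, so the pass never runs.
try:
    run_test(["SBlockRealize"], s_tir=False, pass_log=log)
    failed_at_parse = False
except WellFormedError:
    failed_at_parse = True

# s_tir mode: parsing succeeds and the pass is reached.
run_test(["SBlockRealize"], s_tir=True, pass_log=log)

print(failed_at_parse)  # True
print(log)              # ['pass ran']
```

The log holds a single entry because only the `s_tir=True` run gets past parsing, which is why the fix changes how the test inputs are parsed rather than the pass itself.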

3. InjectPermutedLayout: match legacy PTX intrinsics by canonical name

#19581 registers device intrinsics under two Op identities: a flat builtin name (returned by builtin::xxx() in C++) and a canonical dotted name (e.g. tirx.ptx.ldmatrix_legacy, produced when TVMScript / tensor intrinsics are parsed). InjectPermutedLayout only compared with same_as(builtin::...), so it silently skipped rewriting the swizzled shared-memory offsets of parsed legacy-form calls, leaving the expected swizzle index expressions unmatched.

Fix: match ptx_ldmatrix_legacy / mma_store_legacy by both the builtin Op and the canonical name via an IsOp helper, following the existing pattern in lower_warp_memory.cc and codegen_cuda.cc. Only the legacy intrinsic forms fold shared-memory access into tvm_access_ptr + offset; non-legacy forms address shared memory through BufferLoad and are already handled by the BufferLoad visitor, so the unreachable InternalError throw is replaced by a pass-through. (mma_store_legacy has no dotted alias, hence the asymmetric name strings.) Fixes 4 tests.
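The difference between identity-only matching and matching by either identity or canonical name can be sketched in plain Python. This is a toy model under stated assumptions: the `Op` class, `same_as`, and `is_op` are hypothetical stand-ins, the real `IsOp` helper's signature is not shown in this description, and the flat name string below is a placeholder. Only the dotted name `tirx.ptx.ldmatrix_legacy` is taken from the text above.

```python
class Op:
    """Hypothetical operator object; two Ops are 'the same' only by identity."""

    def __init__(self, name):
        self.name = name


# One intrinsic, two distinct Op objects.
BUILTIN_LDMATRIX = Op("ptx_ldmatrix_legacy")      # flat identity (placeholder name)
PARSED_LDMATRIX = Op("tirx.ptx.ldmatrix_legacy")  # canonical dotted identity


def same_as(op, builtin):
    """Identity-only check, modelling the old comparison."""
    return op is builtin


def is_op(op, builtin, canonical_name):
    """Match by identity with the builtin Op or by canonical name."""
    return op is builtin or op.name == canonical_name


# The old check misses the parsed form, so the rewrite would be skipped.
print(same_as(PARSED_LDMATRIX, BUILTIN_LDMATRIX))  # False

# The combined check accepts both forms.
print(is_op(PARSED_LDMATRIX, BUILTIN_LDMATRIX, "tirx.ptx.ldmatrix_legacy"))   # True
print(is_op(BUILTIN_LDMATRIX, BUILTIN_LDMATRIX, "tirx.ptx.ldmatrix_legacy"))  # True
```

An unrelated operator still fails both branches of `is_op`, so in this model the broader match does not widen the set of rewritten calls beyond the two identities of the same intrinsic.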

tlopex added 3 commits June 11, 2026 02:04
…a annotations

Since the TIRx infrastructure bringup (apache#19581), LowerOpaqueBlock preserves
buffer metadata as allocation annotations ("buffer_data_alignment",
"buffer_allocated_addr"), which codegen consumes (codegen_cuda.cc,
codegen_trn.cc). Update the tests' expected IR to carry these annotations.

Since apache#19581 the TIRx well-formedness verifier rejects SBlock/SBlockRealize
in tirx=True mode, so the bare T.prim_func input of
test_scalar_block_no_loops failed at parse time before the pass ever ran.
SBlocks are s_tir-mode constructs; parse the Before/Expected modules with
s_tir=True like every other test in this file, keeping the no-itervar
block shape the regression test exists to cover.
…edLayout

Device intrinsics are registered under both a flat name (the builtin Op)
and a canonical dotted name (emitted by TVMScript and the tensor
intrinsics), so same_as() against the builtin accessor never matched the
ldmatrix/mma_store calls in parsed IR and the pass silently skipped the
swizzle rewrite. Compare against both names, and match only the legacy
intrinsic forms: they are the ones that fold the shared-memory access
into tvm_access_ptr + offset, while the non-legacy forms address shared
memory through BufferLoad, which the BufferLoad visitor already handles.

Identical to the inject_permuted_layout.cc change on
fix-mma-tensorize-cuda-capture so the two branches merge cleanly.
tlopex changed the title from "[S-TIR] Fix transform test failures after TIRx bringup (#19581)" to "[S-TIR] Fix transform test failures after TIRx bringup" on Jun 11, 2026

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces a helper method IsOp in inject_permuted_layout.cc to match device intrinsics by both their flat builtin name and canonical dotted name, updating the visitor to handle legacy forms of ptx_ldmatrix and mma_store. Additionally, test files are updated to use T.prim_func(s_tir=True) and replace T.decl_buffer with T.alloc_buffer containing explicit annotations. The reviewer suggests optimizing the IsOp helper by caching the canonical Ops as static local variables using Op::Get to avoid costly dynamic casts and string comparisons on every visited node.

Comment thread src/s_tir/transform/inject_permuted_layout.cc
tlopex changed the title from "[S-TIR] Fix transform test failures after TIRx bringup" to "[S-TIR][Tests] Fix transform test failures after TIRx bringup" on Jun 11, 2026
tlopex (Member, Author) commented on Jun 11, 2026

@tvm-bot rerun

tlopex merged commit c9a77d6 into apache:main on Jun 11, 2026
12 of 14 checks passed
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request Jun 15, 2026
(cherry picked from commit c9a77d6)