[S-TIR][Tests] Fix transform test failures after TIRx bringup #19735
…a annotations

Since the TIRx infrastructure bringup (apache#19581), LowerOpaqueBlock preserves buffer metadata as allocation annotations ("buffer_data_alignment", "buffer_allocated_addr"), which codegen consumes (codegen_cuda.cc, codegen_trn.cc). Update the tests' expected IR to carry these annotations.
Since apache#19581 the TIRx well-formedness verifier rejects SBlock/SBlockRealize in tirx=True mode, so the bare T.prim_func input of test_scalar_block_no_loops failed at parse time before the pass ever ran. SBlocks are s_tir-mode constructs; parse the Before/Expected modules with s_tir=True like every other test in this file, keeping the no-itervar block shape the regression test exists to cover.
…edLayout

Device intrinsics are registered under both a flat name (the builtin Op) and a canonical dotted name (emitted by TVMScript and the tensor intrinsics), so same_as() against the builtin accessor never matched the ldmatrix/mma_store calls in parsed IR and the pass silently skipped the swizzle rewrite. Compare against both names, and match only the legacy intrinsic forms: they are the ones that fold the shared-memory access into tvm_access_ptr + offset, while the non-legacy forms address shared memory through BufferLoad, which the BufferLoad visitor already handles. Identical to the inject_permuted_layout.cc change on fix-mma-tensorize-cuda-capture so the two branches merge cleanly.
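The mismatch described in the last commit message can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions: `Op`, `Call`, `same_as`, and `is_op` are hypothetical stand-ins rather than TVM's classes or the PR's C++ helper, and the flat name string is a placeholder. Only the dotted name `tirx.ptx.ldmatrix_legacy` is taken from the PR text.

```python
class Op:
    """Hypothetical stand-in for a registered operator (not TVM's Op class)."""

    def __init__(self, name):
        self.name = name


class Call:
    """Hypothetical stand-in for a call node that refers to an operator."""

    def __init__(self, op):
        self.op = op


# Assumption: the same intrinsic exists as two distinct operator objects.
FLAT_OP = Op("ptx_ldmatrix_legacy")  # placeholder flat name
DOTTED_NAME = "tirx.ptx.ldmatrix_legacy"  # dotted name quoted in the PR
DOTTED_OP = Op(DOTTED_NAME)


def same_as(call, op):
    """Identity-only comparison, like comparing against one builtin accessor."""
    return call.op is op


def is_op(call, builtin_op, canonical_name):
    """Match either the builtin operator object or the canonical dotted name."""
    return call.op is builtin_op or call.op.name == canonical_name


parsed_call = Call(DOTTED_OP)  # what parsed IR would carry in this model
built_call = Call(FLAT_OP)  # what code built through the builtin would carry
```

In this model `same_as(parsed_call, FLAT_OP)` is false even though both operators denote the same intrinsic, which mirrors the silently skipped rewrite; `is_op` accepts both `parsed_call` and `built_call`.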
Code Review
This pull request introduces a helper method IsOp in inject_permuted_layout.cc to match device intrinsics by both their flat builtin name and canonical dotted name, updating the visitor to handle legacy forms of ptx_ldmatrix and mma_store. Additionally, test files are updated to use T.prim_func(s_tir=True) and replace T.decl_buffer with T.alloc_buffer containing explicit annotations. The reviewer suggests optimizing the IsOp helper by caching the canonical Ops as static local variables using Op::Get to avoid costly dynamic casts and string comparisons on every visited node.
@tvm-bot rerun
…#19735)

This PR fixes 11 test failures in `tests/python/s_tir/transform/` introduced as side effects of the TIRx bringup (apache#19581 / 859498d), in three independent commits.

### 1. LowerOpaqueBlock: update expected IR for buffer metadata annotations

`LowerOpaqueBlock` now emits `buffer_allocated_addr` and `buffer_data_alignment` annotations on lowered allocations (intentional in apache#19581: the annotations are consumed downstream by `codegen_cuda.cc` / `codegen_trn.cc`; the alignment value 64 comes from `kAllocAlignment`). The tests' expected IR predates this, so `assert_structural_equal` failed on the missing annotations.

Fix: update the expected IR in `test_s_tir_transform_lower_opaque_block.py` to carry the annotations (`T.decl_buffer(...)` → `T.alloc_buffer(..., annotations={...})`). Fixes 6 tests.

### 2. DefaultGPUSchedule: parse scalar-block test in s_tir mode

apache#19581 added a well-formedness rule rejecting `SBlockRealize` in `tirx=True` mode, which is correct: sblocks are s_tir-mode constructs. The hand-written `Before`/`Expected` modules in `test_scalar_block_no_loops` were the only ones in the file still using plain `T.prim_func`, so they failed at parse time before the pass under test even ran.

Fix: parse both modules with `T.prim_func(s_tir=True)`, consistent with every other test in the file. Fixes 1 test.

### 3. InjectPermutedLayout: match legacy PTX intrinsics by canonical name

apache#19581 registers device intrinsics under two Op identities: a flat builtin name (returned by `builtin::xxx()` in C++) and a canonical dotted name (e.g. `tirx.ptx.ldmatrix_legacy`, produced when TVMScript / tensor intrinsics are parsed). `InjectPermutedLayout` only compared with `same_as(builtin::...)`, so it silently skipped rewriting the swizzled shared-memory offsets of parsed legacy-form calls, leaving the expected swizzle index expressions unmatched.

Fix: match `ptx_ldmatrix_legacy` / `mma_store_legacy` by both the builtin Op and the canonical name via an `IsOp` helper, following the existing pattern in `lower_warp_memory.cc` and `codegen_cuda.cc`. Only the legacy intrinsic forms fold shared-memory access into `tvm_access_ptr` + offset; non-legacy forms address shared memory through `BufferLoad` and are already handled by the BufferLoad visitor, so the unreachable `InternalError` throw is replaced by a pass-through. (`mma_store_legacy` has no dotted alias, hence the asymmetric name strings.) Fixes 4 tests.

(cherry picked from commit c9a77d6)
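For the first fix (LowerOpaqueBlock), a toy model may help show why a stale expected IR fails a structural comparison. This is a sketch only: `Alloc` and `structural_equal` are hypothetical stand-ins, not TVM's node classes or `assert_structural_equal`, and the value used for `buffer_allocated_addr` is a placeholder because the PR text does not state it. The alignment value 64 is the one quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Alloc:
    """Hypothetical stand-in for a lowered allocation (not a TVM class)."""

    name: str
    shape: tuple
    annotations: dict = field(default_factory=dict)


def structural_equal(lhs, rhs):
    """Toy comparison: every field, including annotations, must match."""
    return lhs == rhs


ADDR_PLACEHOLDER = "<unspecified>"  # assumption: the real value is not given in the PR

# What the pass is described as producing after the bringup.
produced = Alloc(
    "B",
    (16, 16),
    {"buffer_data_alignment": 64, "buffer_allocated_addr": ADDR_PLACEHOLDER},
)

# Expected IR written before the annotations existed.
stale_expected = Alloc("B", (16, 16))

# Expected IR updated to carry the same annotations.
updated_expected = Alloc(
    "B",
    (16, 16),
    {"buffer_data_alignment": 64, "buffer_allocated_addr": ADDR_PLACEHOLDER},
)
```

Here `structural_equal(produced, stale_expected)` is false while `structural_equal(produced, updated_expected)` is true, which is the shape of the six test fixes: the pass output was already as intended, and only the expectation needed updating.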