[Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let #19729
Fix the s_tir MetaSchedule sketch tests that no longer matched the design spaces generated by current TVM:

* test_meta_schedule_schedule_rule_add_rfactor.py::test_cpu_argmax
  The argmax workload and its expected sketches used the legacy `v: T.int32 = ...` annotated-assignment syntax. The TIRx parser now lowers that form to a mutable local-scalar buffer plus a store, which the rfactor/cross-thread-reduction reducer matching correctly rejects (reduction combiner temporaries must be immutable binds). Switch the temporaries to `v: T.let[T.int32] = ...`, producing Bind nodes - the same canonical form te.create_prim_func emits for comm_reducer based reductions - so AddRFactor generates the three expected sketches again.

* test_meta_schedule_space_cuda.py (cap, dil, gmm, t2d, nrm, sfm, cbr, tbg) and test_meta_schedule_space_cuda_async.py (c2d)
  Commit b465646 (apache#18927) expanded DefaultCUDA unroll_max_steps from {0, 16, 64, 512, 1024} to {0, 16, 32, 64, 128, 256, 512, 1024} without updating the expected SampleCategorical decisions, so the recorded indices selected different unroll values than the expected modules encode. Remap the decision indices (2->3, 3->6, 4->7) so each test keeps sampling the same unroll value. The expected modules and all other decisions are unchanged; every sketch was re-verified by replaying the trace and structurally comparing against the expected module.
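The index remap can be derived mechanically from the two candidate lists. The following is a minimal sketch in plain Python, not code from the PR: the helper names `remap_decision` and `build_remap_table` are hypothetical, and only the two candidate lists are taken from the description above.

```python
# Candidate lists quoted from the PR description.
OLD_UNROLL_STEPS = [0, 16, 64, 512, 1024]
NEW_UNROLL_STEPS = [0, 16, 32, 64, 128, 256, 512, 1024]


def remap_decision(old_index, old_candidates, new_candidates):
    """Return the index in new_candidates that selects the same value.

    Hypothetical helper for illustration; raises ValueError if the
    value no longer exists in the new candidate list.
    """
    value = old_candidates[old_index]
    return new_candidates.index(value)


def build_remap_table(old_candidates, new_candidates):
    """Map old indices to new indices, keeping only the ones that move."""
    table = {}
    for old_index in range(len(old_candidates)):
        new_index = remap_decision(old_index, old_candidates, new_candidates)
        if new_index != old_index:
            table[old_index] = new_index
    return table


REMAP = build_remap_table(OLD_UNROLL_STEPS, NEW_UNROLL_STEPS)
print(REMAP)  # {2: 3, 3: 6, 4: 7}
```

Indices 0 and 1 are absent from the table because the values 0 and 16 keep their positions in the expanded list, which is why only decisions 2, 3, and 4 needed editing in the tests.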
Code Review
This pull request updates TVM Script tests in test_meta_schedule_schedule_rule_add_rfactor.py to use T.let for variable bindings in T.Select expressions. Additionally, it updates the expected SampleCategorical decision values in test_meta_schedule_space_cuda.py and test_meta_schedule_space_cuda_async.py to align with the updated space generation. No review comments were provided, so there is no feedback to address.
The TVMScript TIRx parser now treats `v: T.int32 = expr` as a mutable local scalar buffer (AllocBuffer + BufferStore) rather than an immutable Bind. The tuple-style argmax/argmin/layer-norm reduction tests in test_s_tir_transform_lower_cross_thread_reduction.py still used the old spelling, so their reduction blocks no longer matched the reduction-block pattern required by LowerCrossThreadReduction (condition #3: the number of consecutive Binds in the block body must equal the number of BufferStores in the block init), and the pass rejected them. Switch the reduction update bindings to `v: T.let[dtype] = expr`, which produces the Bind nodes the pass expects, matching the spelling already used by the s_tir rfactor schedule tests. No pass behavior changes.
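To illustrate why the old spelling fails the Bind-count condition, here is a toy model in plain Python. It is not the pass implementation: statement kinds are plain strings rather than TIR nodes, the statement sequences are illustrative rather than dumped from the tests, and reading "consecutive Binds" as the leading run of Binds in the body is an assumption.

```python
def leading_bind_count(body):
    """Count the run of "Bind" statements at the start of a block body."""
    count = 0
    for kind in body:
        if kind != "Bind":
            break
        count += 1
    return count


def matches_bind_condition(init, body):
    """Toy check: leading Binds in the body must equal BufferStores in init."""
    num_init_stores = sum(1 for kind in init if kind == "BufferStore")
    return leading_bind_count(body) == num_init_stores


# A two-output (argmax-style) reduction: init stores both outputs.
INIT = ["BufferStore", "BufferStore"]

# `v: T.let[dtype] = expr` spelling: two Binds, then the two update stores.
LET_BODY = ["Bind", "Bind", "BufferStore", "BufferStore"]

# `v: T.int32 = expr` spelling: each temporary becomes AllocBuffer + BufferStore.
SCALAR_BODY = [
    "AllocBuffer", "BufferStore",
    "AllocBuffer", "BufferStore",
    "BufferStore", "BufferStore",
]

print(matches_bind_condition(INIT, LET_BODY))     # True
print(matches_bind_condition(INIT, SCALAR_BODY))  # False
```

In this model the mutable-scalar body starts with an allocation, so the leading Bind count is zero and cannot match the two init stores.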
Sweep the s_tir test tree for other tests broken by the same TIRx parser semantics change: plain `x = expr` and `x: T.int32 = expr` now create mutable local-scalar buffers instead of immutable binds, so tests whose intent is a Bind (LetStmt) must spell it `x: T.let[dtype] = expr`.

* test_s_tir_transform_compact_buffer_region.py
  TestLetBinding: index vars rii/rjj are meant to be binds the pass can analyze through; the scalar-buffer form made the compaction result diverge from expected.
  TestNonIndexLetBinding: plain assignments of call_extern results (incl. handle and void dtypes) crashed CompactBufferAllocation when parsed as scalar buffers.

* test_s_tir_transform_hoist_expression.py
  test_hoist_with_let / test_hoist_disable_let / test_hoist_let_expr: the hoisted condition and Let-expr bindings must be Bind nodes for HoistExpression to hoist (or deliberately not hoist) them.

* test_s_tir_transform_remove_undef.py
  test_remove_let_undef / test_raise_error_for_undef_as_store_indices: binding T.undef() through a mutable scalar hid the undef from RemoveStoreUndef, leaving a stray allocation in one test and swallowing the expected error in the other.

Verified by running the full tests/python/s_tir and tests/python/tirx trees: the only remaining failures are unrelated (nvcc too old for compute_120a on the local RTX 5090, buffer_data_alignment annotation mismatches in lower_opaque_block, SBlockRealize well-formedness in default_gpu_schedule, and one cross-file test-isolation flake in test_parser_printer), none caused by bind spelling.
Sweep follow-up: convert reduction-combiner temporaries (v_*_red_temp_*, v_argmax_*) from the legacy `v: T.dtype = expr` spelling to `v: T.let[dtype] = expr` in tests that still pass but feed schedule rules / passes a non-canonical mutable-scalar form. Real lowered workloads (te.create_prim_func of comm_reducer reductions) produce Bind nodes, so these hand-written mimics should too; with the mutable-scalar spelling the reducer pattern matching in rfactor / cross-thread reduction would reject these blocks at lowering time even though the tests themselves stayed green. Deliberately left unchanged: tvmscript_printer_annotation (tests the scalar-assignment sugar itself), non-reduction scalar temporaries in schedule-error / plan-update / trace-apply tests (value semantics are equivalent and no pattern matching depends on them), and hardware-gated hexagon / nvshmem files that cannot be verified locally.
… let binds to T.let (apache#19729)

Fix the s_tir tests broken or left stale by two upstream changes.

* test_meta_schedule_space_cuda.py (cap, dil, gmm, t2d, nrm, sfm, cbr, tbg) and test_meta_schedule_space_cuda_async.py (c2d): apache#18927 expanded DefaultCUDA unroll_max_steps from {0, 16, 64, 512, 1024} to {0, 16, 32, 64, 128, 256, 512, 1024} without updating the recorded SampleCategorical decisions. Remap the indices (2->3, 3->6, 4->7) so each test keeps sampling the same unroll value; every sketch was re-verified by replaying the trace and structurally comparing against the expected module.

* T.let migration: since apache#19581 the TIRx parser treats `v: T.int32 = expr` as a mutable local-scalar buffer instead of an immutable bind, which is now spelled `v: T.let[T.int32] = expr` (a Bind node, the same form te.create_prim_func emits). Tests whose intent is a bind are migrated to the new spelling: reduction combiner temporaries (add_rfactor, lower_cross_thread_reduction) and let-dependent passes (compact_buffer_region, hoist_expression, remove_undef).

* Also convert reduction temporaries in still-green tests (cross_thread_reduction rule, compute_inline, schedule utilities, parallel_vectorize_unroll postproc, dlight general reduction, relax cuda_graph) so the hand-written workloads match the canonical Bind form instead of feeding rules a mutable-scalar body.

(cherry picked from commit 67b0c6c)
Fix the s_tir tests broken or left stale by two upstream changes.
* test_meta_schedule_space_cuda.py (cap, dil, gmm, t2d, nrm, sfm, cbr, tbg) and test_meta_schedule_space_cuda_async.py (c2d): #18927 ("feat(meta_schedule): expand CUDA unroll steps for SM70 optimization") expanded DefaultCUDA unroll_max_steps from {0, 16, 64, 512, 1024} to {0, 16, 32, 64, 128, 256, 512, 1024} without updating the recorded SampleCategorical decisions. Remap the indices (2->3, 3->6, 4->7) so each test keeps sampling the same unroll value; every sketch was re-verified by replaying the trace and structurally comparing against the expected module.

* T.let migration: since #19581 ("[TIRx] Bringup TIRx Infrastructure") the TIRx parser treats `v: T.int32 = expr` as a mutable local-scalar buffer instead of an immutable bind, which is now spelled `v: T.let[T.int32] = expr` (a Bind node, the same form te.create_prim_func emits). Tests whose intent is a bind are migrated to the new spelling: reduction combiner temporaries (add_rfactor, lower_cross_thread_reduction) and let-dependent passes (compact_buffer_region, hoist_expression, remove_undef).

* Also convert reduction temporaries in still-green tests (cross_thread_reduction rule, compute_inline, schedule utilities, parallel_vectorize_unroll postproc, dlight general reduction, relax cuda_graph) so the hand-written workloads match the canonical Bind form instead of feeding rules a mutable-scalar body.