[Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let#19729

Merged
tqchen merged 4 commits into apache:main from tlopex:fix-s-tir-meta-schedule-sketches
Jun 11, 2026

@tlopex tlopex commented Jun 11, 2026

Fix the s_tir tests broken or left stale by two upstream changes.

  • test_meta_schedule_space_cuda.py (cap, dil, gmm, t2d, nrm, sfm, cbr, tbg) and test_meta_schedule_space_cuda_async.py (c2d): #18927 (feat(meta_schedule): expand CUDA unroll steps for SM70 optimization) expanded DefaultCUDA unroll_max_steps from {0, 16, 64, 512, 1024} to {0, 16, 32, 64, 128, 256, 512, 1024} without updating the recorded SampleCategorical decisions. Remap the indices (2->3, 3->6, 4->7) so each test keeps sampling the same unroll value; every sketch was re-verified by replaying the trace and structurally comparing against the expected module.

  • T.let migration: since #19581 ([TIRx] Bringup TIRx Infrastructure), the TIRx parser treats `v: T.int32 = expr` as a mutable local-scalar buffer instead of an immutable bind; a bind is now spelled `v: T.let[T.int32] = expr` (a Bind node, the same form te.create_prim_func emits). Tests whose intent is a bind are migrated to the new spelling: reduction combiner temporaries (add_rfactor, lower_cross_thread_reduction) and let-dependent passes (compact_buffer_region, hoist_expression, remove_undef).

  • Also convert reduction temporaries in still-green tests (cross_thread_reduction rule, compute_inline, schedule utilities, parallel_vectorize_unroll postproc, dlight general reduction, relax cuda_graph) so the hand-written workloads match the canonical Bind form instead of feeding rules a mutable-scalar body.
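
The index remap in the first bullet follows directly from the two candidate lists quoted there. The following is a minimal sketch in plain Python, not code from this PR or from TVM; the helper name `remap_decisions` and the constants are hypothetical and only illustrate the arithmetic.

```python
# Candidate lists quoted in the PR description for DefaultCUDA unroll_max_steps.
OLD_STEPS = [0, 16, 64, 512, 1024]
NEW_STEPS = [0, 16, 32, 64, 128, 256, 512, 1024]


def remap_decisions(old_candidates, new_candidates):
    """Map each old decision index to the new index that holds the same value.

    Hypothetical helper for illustration; it assumes every old candidate
    still appears in the new list.
    """
    new_index = {value: i for i, value in enumerate(new_candidates)}
    return {i: new_index[value] for i, value in enumerate(old_candidates)}


remap = remap_decisions(OLD_STEPS, NEW_STEPS)

# Only indices whose position shifted need to be edited in the recorded traces.
changed = {old: new for old, new in remap.items() if old != new}
print(changed)  # {2: 3, 3: 6, 4: 7}

# A stale index silently selects a different unroll value:
# index 2 used to mean 64, but now means 32.
print(OLD_STEPS[2], NEW_STEPS[2])  # 64 32
```

Indices 0 and 1 are unaffected because the new values were inserted after 16, which is why only three mappings appear in the remap.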

Fix the s_tir MetaSchedule sketch tests that no longer matched the
design spaces generated by current TVM:

* test_meta_schedule_schedule_rule_add_rfactor.py::test_cpu_argmax
  The argmax workload and its expected sketches used the legacy
  `v: T.int32 = ...` annotated-assignment syntax. The TIRx parser now
  lowers that form to a mutable local-scalar buffer plus a store, which
  the rfactor/cross-thread-reduction reducer matching correctly rejects
  (reduction combiner temporaries must be immutable binds). Switch the
  temporaries to `v: T.let[T.int32] = ...`, producing Bind nodes - the
  same canonical form te.create_prim_func emits for comm_reducer based
  reductions - so AddRFactor generates the three expected sketches
  again.

* test_meta_schedule_space_cuda.py (cap, dil, gmm, t2d, nrm, sfm, cbr,
  tbg) and test_meta_schedule_space_cuda_async.py (c2d)
  Commit b465646 (apache#18927) expanded DefaultCUDA unroll_max_steps from
  {0, 16, 64, 512, 1024} to {0, 16, 32, 64, 128, 256, 512, 1024}
  without updating the expected SampleCategorical decisions, so the
  recorded indices selected different unroll values than the expected
  modules encode. Remap the decision indices (2->3, 3->6, 4->7) so each
  test keeps sampling the same unroll value. The expected modules and
  all other decisions are unchanged; every sketch was re-verified by
  replaying the trace and structurally comparing against the expected
  module.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates TVM Script tests in test_meta_schedule_schedule_rule_add_rfactor.py to use T.let for variable bindings in T.Select expressions. Additionally, it updates the expected SampleCategorical decision values in test_meta_schedule_space_cuda.py and test_meta_schedule_space_cuda_async.py to align with the updated space generation. No review comments were provided, so there is no feedback to address.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

tlopex added 3 commits June 11, 2026 00:59
The TVMScript TIRX parser now treats `v: T.int32 = expr` as a mutable
local scalar buffer (AllocBuffer + BufferStore) rather than an immutable
Bind. The tuple-style argmax/argmin/layer-norm reduction tests in
test_s_tir_transform_lower_cross_thread_reduction.py still used the old
spelling, so their reduction blocks no longer matched the reduction-block
pattern required by LowerCrossThreadReduction (condition #3: the number
of consecutive Binds in the block body must equal the number of
BufferStores in the block init), and the pass rejected them.
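
The condition cited above can be pictured with a toy model. This is only a sketch in plain Python, not TVM's implementation: the dataclasses borrow the node names used in this commit message (Bind, AllocBuffer, BufferStore), while the function names and the argmax statement lists are hypothetical.

```python
from dataclasses import dataclass


# Toy stand-ins for IR statements; these are NOT TVM classes.
@dataclass
class Bind:
    name: str


@dataclass
class AllocBuffer:
    name: str


@dataclass
class BufferStore:
    buffer: str


def count_leading_binds(body):
    """Count the consecutive Bind statements at the start of a block body."""
    count = 0
    for stmt in body:
        if not isinstance(stmt, Bind):
            break
        count += 1
    return count


def matches_bind_count_condition(init, body):
    """Toy version of the check: leading Binds must equal init BufferStores."""
    num_init_stores = sum(isinstance(stmt, BufferStore) for stmt in init)
    return count_leading_binds(body) == num_init_stores


# Tuple-style argmax: the init writes two buffers (index and value).
init = [BufferStore("argmax_idx"), BufferStore("argmax_val")]

# Bind spelling: two immutable temporaries, then the two updates.
bind_body = [Bind("v_idx"), Bind("v_val"),
             BufferStore("argmax_idx"), BufferStore("argmax_val")]

# Mutable-scalar spelling: each temporary becomes an allocation plus a store.
scalar_body = [AllocBuffer("v_idx"), BufferStore("v_idx"),
               AllocBuffer("v_val"), BufferStore("v_val"),
               BufferStore("argmax_idx"), BufferStore("argmax_val")]

print(matches_bind_count_condition(init, bind_body))    # True
print(matches_bind_count_condition(init, scalar_body))  # False
```

In this model the mutable-scalar body starts with an allocation rather than a Bind, so the leading-Bind count is zero and the block no longer satisfies the pattern.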

Switch the reduction update bindings to `v: T.let[dtype] = expr`, which
produces the Bind nodes the pass expects, matching the spelling already
used by the s_tir rfactor schedule tests. No pass behavior changes.

Sweep the s_tir test tree for other tests broken by the same TIRx
parser semantics change: plain `x = expr` and `x: T.int32 = expr` now
create mutable local-scalar buffers instead of immutable binds, so
tests whose intent is a Bind (LetStmt) must spell it `x: T.let[dtype]
= expr`.

* test_s_tir_transform_compact_buffer_region.py
  TestLetBinding: index vars rii/rjj are meant to be binds the pass
  can analyze through; the scalar-buffer form made the compaction
  result diverge from expected. TestNonIndexLetBinding: plain
  assignments of call_extern results (incl. handle and void dtypes)
  crashed CompactBufferAllocation when parsed as scalar buffers.

* test_s_tir_transform_hoist_expression.py
  test_hoist_with_let / test_hoist_disable_let / test_hoist_let_expr:
  the hoisted condition and Let-expr bindings must be Bind nodes for
  HoistExpression to hoist (or deliberately not hoist) them.

* test_s_tir_transform_remove_undef.py
  test_remove_let_undef / test_raise_error_for_undef_as_store_indices:
  binding T.undef() through a mutable scalar hid the undef from
  RemoveStoreUndef, leaving a stray allocation in one test and
  swallowing the expected error in the other.

Verified by running the full tests/python/s_tir and tests/python/tirx
trees: the only remaining failures are unrelated (nvcc too old for
compute_120a on the local RTX 5090, buffer_data_alignment annotation
mismatches in lower_opaque_block, SBlockRealize well-formedness in
default_gpu_schedule, and one cross-file test-isolation flake in
test_parser_printer), none caused by bind spelling.

Sweep follow-up: convert reduction-combiner temporaries (v_*_red_temp_*,
v_argmax_*) from the legacy `v: T.dtype = expr` spelling to
`v: T.let[dtype] = expr` in tests that still pass but feed schedule
rules / passes a non-canonical mutable-scalar form. Real lowered
workloads (te.create_prim_func of comm_reducer reductions) produce Bind
nodes, so these hand-written mimics should too; with the mutable-scalar
spelling the reducer pattern matching in rfactor / cross-thread
reduction would reject these blocks at lowering time even though the
tests themselves stayed green.

Deliberately left unchanged: tvmscript_printer_annotation (tests the
scalar-assignment sugar itself), non-reduction scalar temporaries in
schedule-error / plan-update / trace-apply tests (value semantics are
equivalent and no pattern matching depends on them), and
hardware-gated hexagon / nvshmem files that cannot be verified locally.
tlopex changed the title from "[Tests][MetaSchedule] Update s_tir sketch tests for current defaults" to "[Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let" Jun 11, 2026
@tqchen tqchen merged commit 67b0c6c into apache:main Jun 11, 2026
10 of 11 checks passed
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request Jun 15, 2026
… let binds to T.let (apache#19729)

Fix the s_tir tests broken or left stale by two upstream changes.

* test_meta_schedule_space_cuda.py (cap, dil, gmm, t2d, nrm, sfm, cbr,
tbg) and test_meta_schedule_space_cuda_async.py (c2d): apache#18927 expanded
DefaultCUDA unroll_max_steps from {0, 16, 64, 512, 1024} to {0, 16, 32,
64, 128, 256, 512, 1024} without updating the recorded SampleCategorical
decisions. Remap the indices (2->3, 3->6, 4->7) so each test keeps
sampling the same unroll value; every sketch was re-verified by
replaying the trace and structurally comparing against the expected
module.

* T.let migration: since apache#19581 the TIRx parser treats `v: T.int32 =
expr` as a mutable local-scalar buffer instead of an immutable bind,
which is now spelled `v: T.let[T.int32] = expr` (a Bind node, the same
form te.create_prim_func emits). Tests whose intent is a bind are
migrated to the new spelling: reduction combiner temporaries
(add_rfactor, lower_cross_thread_reduction) and let-dependent passes
(compact_buffer_region, hoist_expression, remove_undef).

* Also convert reduction temporaries in still-green tests
(cross_thread_reduction rule, compute_inline, schedule utilities,
parallel_vectorize_unroll postproc, dlight general reduction, relax
cuda_graph) so the hand-written workloads match the canonical Bind form
instead of feeding rules a mutable-scalar body.

(cherry picked from commit 67b0c6c)