[TIR][IR] Update to use tirx by LeiWang1999 · Pull Request #2216 · tile-ai/tilelang

LeiWang1999 · 2026-05-18T03:59:04Z

Summary

This PR migrates TileLang to the updated TVM tirx API and the newer TVM-FFI baseline. It replaces the old tir parser/builder/type surface with tirx, updates the C++ compiler and backend code to compile against the new namespaces, and refreshes the Python language, JIT, examples, and tests accordingly.

Major changes

- Rename TVM TIR integration points from tir to tirx across C++ and Python:
  - tvm::tir / tvm.tir -> tvm::tirx / tvm.tirx
  - tvm.script.parser.tir -> tvm.tirx.script.parser
  - tvm.script.ir_builder.tir -> tvm.tirx.script.builder
  - tvm.tir.* type imports such as PrimExpr, Buffer, BufferRegion, IndexMap, and Var -> tvm.tirx.*
- Adapt TileLang script/builder semantics to the tirx IR:
  - Replace scoped LetStmt builder/frame usage with tirx.bind / tirx::Bind.
  - Rebuild C++ LetStmt wrappers as SeqStmt({tirx::Bind(...), body}) where tirx represents the binding as a flat statement.
  - Add Python-side bind-value tracking (register_let_value, clear_let_values) for cases that still need to recover aliases from bound values, such as buffer-region pointer handling.
  - Update block and allocation helpers to use the tirx structured block APIs, including SBlockFrame, SBlockAllocBuffer, alloc_buffer, and sblock_attr.
- Update the lowering pipeline and pass configuration keys for the renamed API surface:
  - Route supported passes through tirx.transform.*.
  - Rename pass config keys from tir.* to tirx.*, including vectorization, storage rewrite, async copy, noalias, lower-pass injection, and debug options.
  - Keep using the remaining upstream s_tir transforms for passes that have not moved to tirx yet.
- Refresh TileLang compiler, backend, and runtime integration:
  - Update CUDA, ROCm, CPU, Metal, and WebGPU codegen/lowering code to use tirx nodes, visitors, intrinsic calls, and analysis helpers.
  - Remove the old src/support/ffi_aliases.h compatibility shim and add src/support/check.h wrappers around the new TVM-FFI check macros.
  - Update the TVM submodule, CMake linkage, wheel repair rules, and the apache-tvm-ffi dependency requirement.
- Update Python APIs, examples, docs, and tests for the renamed tirx surface, including parser overrides, eager builder behavior, JIT adapters, TileOp templates, and transform tests.
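
The LetStmt-to-Bind change can be pictured with a toy model. The sketch below is not TVM or TileLang code: `Let`, `Bind`, `Seq`, and `Evaluate` are hypothetical stand-in dataclasses, assumed purely for illustration. It shows how a scoped binding with a nested body could be rewritten into a flat statement sequence of the shape SeqStmt({Bind(...), body}).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Evaluate:
    """Leaf statement (hypothetical stand-in, not a TVM class)."""
    expr: str


@dataclass
class Let:
    """Scoped binding: `var` is visible only inside `body`."""
    var: str
    value: str
    body: "Stmt"


@dataclass
class Bind:
    """Flat binding: `var` is visible to the statements that follow it."""
    var: str
    value: str


@dataclass
class Seq:
    """Ordered list of statements."""
    stmts: List["Stmt"]


Stmt = Union[Evaluate, Let, Bind, Seq]


def flatten_let(stmt: Stmt) -> Stmt:
    """Rewrite Let(var, value, body) as Seq([Bind(var, value), *body]).

    Nested sequences produced by inner Lets are spliced into the outer
    one, so a chain of Lets becomes a single flat Seq.
    """
    if isinstance(stmt, Let):
        body = flatten_let(stmt.body)
        tail = body.stmts if isinstance(body, Seq) else [body]
        return Seq([Bind(stmt.var, stmt.value), *tail])
    if isinstance(stmt, Seq):
        return Seq([flatten_let(s) for s in stmt.stmts])
    return stmt


# Two nested scoped bindings around one use site.
nested = Let("x", "A[0]", Let("y", "x + 1", Evaluate("use(y)")))
flat = flatten_let(nested)
```

In the flat form the extent of a binding is no longer delimited by nesting, which is one way to see why code that depends on binding scope or alias recovery deserves a closer look after such a rewrite.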

Validation

./format.sh

Review notes

This is a broad mechanical migration with some semantic updates around binding construction. Reviewers should pay particular attention to:

- Places where LetStmt was converted to Bind, especially code that depends on binding scope or alias recovery.
- Backend-specific lowering behavior after the tirx namespace and visitor/type updates.
- Pass config compatibility for downstream users that may still pass old tir.* keys.
- Packaging and runtime linkage after the TVM / TVM-FFI dependency update.
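
For the pass-config compatibility concern, a downstream project could normalize its configuration dictionary before handing it to the compiler. The following is a minimal sketch under stated assumptions: `migrate_pass_config` is a hypothetical helper that is not part of TileLang or TVM, and the key `tir.disable_vectorize` (with its assumed `tirx.` counterpart) is used only as an illustrative example of the prefix rename.

```python
import warnings
from typing import Any, Dict

OLD_PREFIX = "tir."
NEW_PREFIX = "tirx."


def migrate_pass_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with legacy `tir.*` keys renamed to `tirx.*`.

    Hypothetical helper for illustration; not part of TileLang or TVM.
    """
    migrated: Dict[str, Any] = {}
    for key, value in config.items():
        if key.startswith(OLD_PREFIX):
            new_key = NEW_PREFIX + key[len(OLD_PREFIX):]
            warnings.warn(
                f"pass config key {key!r} is deprecated; use {new_key!r}",
                DeprecationWarning,
                stacklevel=2,
            )
        else:
            new_key = key
        if new_key in migrated:
            # Both the old and the new spelling were supplied.
            raise ValueError(f"duplicate pass config key after migration: {new_key!r}")
        migrated[new_key] = value
    return migrated


# Example keys are illustrative; only the prefix rename is taken from the PR.
legacy = {"tir.disable_vectorize": True, "tl.disable_loop_unswitching": False}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    current = migrate_pass_config(legacy)
```

Keys outside the `tir.` prefix, such as `tl.*` options, pass through unchanged, and supplying both spellings of the same option is treated as an error rather than silently picking one.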

- Added a new pass to hoist loop-invariant if statements out of loops, improving optimization opportunities.
- Introduced classes for collecting and checking conditions, as well as for rewriting statements.
- Integrated the new pass into the optimization pipeline and provided a corresponding API for usage.
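
Hoisting a loop-invariant if out of a loop (loop unswitching) can be shown on ordinary Python code. This is only an illustration of the transformation's effect, not the pass's implementation; the function names are made up for the example.

```python
from typing import List


def scale_before(data: List[float], use_double: bool) -> List[float]:
    """Original form: the loop-invariant condition is tested on every iteration."""
    out = []
    for x in data:
        if use_double:  # does not depend on the loop variable x
            out.append(x * 2.0)
        else:
            out.append(x + 1.0)
    return out


def scale_after(data: List[float], use_double: bool) -> List[float]:
    """Unswitched form: the condition is tested once, each branch has its own loop."""
    out = []
    if use_double:
        for x in data:
            out.append(x * 2.0)
    else:
        for x in data:
            out.append(x + 1.0)
    return out
```

Both functions compute the same result. The rewrite is only valid when the condition does not change across iterations and has no side effects.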

…ache disabling in benchmark script
- Introduced CallNodeChecker class to identify CallNode expressions in loop conditions, enhancing loop-invariant checks.
- Updated IsLoopInvariant function to reject conditions containing CallNodes, preventing potential side effects.
- Added tilelang.disable_cache() in benchmark_mha_sink_fwd.py to optimize performance during benchmarking.

- Added support for Let-bound variables in the WrittenBufferReadChecker to improve buffer read checks.
- Introduced UsesLoopVarThroughLetBindings function to check if conditions depend on loop variables through Let bindings.
- Updated IsLoopInvariant function to account for Let bindings when determining loop invariance.
- Enhanced HoistableIfFinder to track Let bindings for variables bound to BufferLoad expressions.
- Added debug print statements in the OptimizeForTarget function to visualize the module state before and after loop unswitching.

- Added CallCheckerExcludingIf class to ensure function calls outside of hoisted if statements are identified, preventing potential synchronization issues during loop unswitching.
- Updated loop unswitching logic to incorporate the new call checker, enhancing safety and correctness.
- Integrated debug print statements in OptimizeForTarget to visualize module state before and after loop unswitching.
- Disabled tilelang cache in the benchmark script for improved performance.


- Introduced a new configuration option `tl.disable_loop_unswitching` to allow users to disable the loop unswitching optimization.
- Updated the Loop Unswitching pass to check this configuration and return the original function if the option is enabled.
- Added relevant documentation in the PassConfigKey enumeration for clarity.

# Conflicts: # src/op/builtin.h # src/transform/loop_unswitching.cc # testing/python/transform/test_tilelang_transform_loop_unswitching.py # tilelang/transform/__init__.py # tilelang/transform/pass_config.py

- Removed references to `tvm.tir` and replaced them with `tvm.tirx` across various files, including examples and backend operations.
- Updated target configuration in documentation to reflect the new usage of target config dictionaries instead of CLI-style strings.
- Cleaned up `pyproject.toml` and `load_tvm.cmake` by removing obsolete paths.
- Enhanced examples for dequantization and GEMM operations to utilize the new `tirx` constructs.
- Adjusted various backend operations to ensure compatibility with the new `tirx` namespace.
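
A mechanical rename of this kind can be approximated with a small text rewriter. The sketch below is an assumption about how one might script it, not the tooling used for this PR; `rewrite_tir_references` is a hypothetical name. The three mappings are the ones listed in the PR description, applied from most to least specific.

```python
import re

# Ordered from most to least specific so the generic rule runs last.
_RULES = [
    (re.compile(r"\btvm\.script\.parser\.tir\b"), "tvm.tirx.script.parser"),
    (re.compile(r"\btvm\.script\.ir_builder\.tir\b"), "tvm.tirx.script.builder"),
    (re.compile(r"\btvm\.tir\b"), "tvm.tirx"),
]


def rewrite_tir_references(source: str) -> str:
    """Rewrite dotted `tvm.tir` references in `source` to their tirx spelling.

    The trailing word boundary keeps `tvm.tirx` from matching again, so
    running the function twice gives the same result as running it once.
    """
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


migrated = rewrite_tir_references("from tvm.tir import PrimExpr, Buffer")
```

This only handles fully dotted paths. Forms such as `from tvm import tir` or `from tvm.script.parser import tir as T` would need separate handling, and any semantic change (for example LetStmt to Bind) is outside what a textual rename can do.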

coderabbitai · 2026-05-18T03:59:11Z

Important

Review skipped

Too many files!

This PR contains 300 files, which is 150 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b052c381-e944-4f9f-aad4-3836a6a64237

📥 Commits

Reviewing files that changed from the base of the PR and between 7c4718f and 7b2bae0.

📒 Files selected for processing (300)

.agents/skills/tilelang-tvm-ir/SKILL.md
3rdparty/tvm
CMakeLists.txt
cmake/load_tvm.cmake
cmake/pypi-z3/FindZ3.cmake
docs/get_started/targets.md
examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py
examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py
examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py
examples/dequantize_gemm/example_dequant_gemm_w4a8.py
examples/eager_jit/eagerjit.en.ipynb
examples/eager_jit/eagerjit.zh.ipynb
examples/plot_layout/README.md
examples/plot_layout/fragment_mfma_load_a.py
examples/plot_layout/fragment_mma_load_a.py
pyproject.toml
requirements-dev.txt
requirements.txt
src/backend/common/op/atomic_reduce.h
src/backend/common/op/cumsum.h
src/backend/common/op/fill.h
src/backend/common/op/finalize_reducer.h
src/backend/common/op/reduce.h
src/backend/common/op/transpose.h
src/backend/cpu/op/copy.cc
src/backend/cpu/op/fill.cc
src/backend/cpu/op/gemm.cc
src/backend/cpu/op/transpose.cc
src/backend/cuda/codegen/codegen_cuda.cc
src/backend/cuda/codegen/codegen_cuda.h
src/backend/cuda/codegen/codegen_cutedsl.cc
src/backend/cuda/codegen/codegen_cutedsl.h
src/backend/cuda/codegen/codegen_py.cc
src/backend/cuda/codegen/codegen_py.h
src/backend/cuda/codegen/intrin_rule_cuda.cc
src/backend/cuda/codegen/ptx.cc
src/backend/cuda/codegen/rt_mod_cuda.cc
src/backend/cuda/codegen/rt_mod_cutedsl.cc
src/backend/cuda/op/atomic_add.cc
src/backend/cuda/op/copy.cc
src/backend/cuda/op/copy.h
src/backend/cuda/op/copy_analysis.cc
src/backend/cuda/op/finalize_reducer.cc
src/backend/cuda/op/gemm.cc
src/backend/cuda/op/gemm_sp.cc
src/backend/cuda/op/reduce.cc
src/backend/cuda/runtime.cc
src/backend/metal/codegen/rt_mod_metal.cc
src/backend/metal/op/copy.cc
src/backend/metal/op/fill.cc
src/backend/metal/op/transpose.cc
src/backend/rocm/codegen/codegen_hip.cc
src/backend/rocm/codegen/codegen_hip.h
src/backend/rocm/codegen/intrin_rule_hip.cc
src/backend/rocm/codegen/rt_mod_hip.cc
src/backend/rocm/op/atomic_add.cc
src/backend/rocm/op/copy.cc
src/backend/rocm/op/finalize_reducer.cc
src/backend/rocm/op/gemm.cc
src/backend/rocm/op/reduce.cc
src/backend/webgpu/op/copy.cc
src/backend/webgpu/op/fill.cc
src/backend/webgpu/op/transpose.cc
src/config.h
src/ir.cc
src/layout/gemm_layouts.cc
src/layout/layout.cc
src/layout/layout.h
src/layout/tcgen05_layout.cc
src/layout/utils.cc
src/layout/utils.h
src/op/atomic_add.cc
src/op/atomic_add.h
src/op/atomic_reduce.cc
src/op/atomic_reduce.h
src/op/builtin.cc
src/op/builtin.h
src/op/copy.cc
src/op/copy.h
src/op/fill.cc
src/op/fill.h
src/op/finalize_reducer.cc
src/op/finalize_reducer.h
src/op/gemm.cc
src/op/gemm.h
src/op/gemm_sp.cc
src/op/gemm_sp.h
src/op/logical.cc
src/op/math.cc
src/op/operator.cc
src/op/operator.h
src/op/parallel.cc
src/op/parallel.h
src/op/reduce.cc
src/op/reduce.h
src/op/region.cc
src/op/region.h
src/op/tcgen5_meta.h
src/op/transpose.cc
src/op/transpose.h
src/op/utils.cc
src/op/utils.h
src/runtime/error_helpers.cc
src/runtime/logging.cc
src/support/check.h
src/support/ffi_aliases.h
src/target/codegen_c.cc
src/target/codegen_c.h
src/target/codegen_c_host.cc
src/target/codegen_c_host.h
src/target/rt_mod_c.cc
src/target/utils.cc
src/transform/annotate_device_regions.cc
src/transform/annotate_read_only_params.cc
src/transform/annotate_warp_group_reg_alloc.cc
src/transform/arg_binder.cc
src/transform/arg_binder.h
src/transform/cluster_planning.cc
src/transform/common/assume.cc
src/transform/common/assume.h
src/transform/common/attr.h
src/transform/common/collector.h
src/transform/common/constr_visitor.h
src/transform/common/loop_fusion_utils.h
src/transform/common/loop_vectorization_utils.h
src/transform/common/mbarrier.h
src/transform/common/pipeline_utils.h
src/transform/config_index_bitwidth.cc
src/transform/flatten_buffer.cc
src/transform/frontend_legalize.cc
src/transform/fuse_mbarrier_arrive_expect_tx.cc
src/transform/hoist_global_buffer_allocations.cc
src/transform/hoist_nonrestrict_params.cc
src/transform/if_stmt_binding.cc
src/transform/inject_assumes.cc
src/transform/inject_fence_proxy.cc
src/transform/inject_pipeline.cc
src/transform/inject_tcgen05_fence.cc
src/transform/instruction_annotation.cc
src/transform/layout_inference.cc
src/transform/layout_reducer.cc
src/transform/layout_reducer.h
src/transform/legalize_negative_index.cc
src/transform/legalize_safe_memory_access.cc
src/transform/legalize_vectorized_loop.cc
src/transform/loop_partition.cc
src/transform/loop_partition.h
src/transform/loop_unswitching.cc
src/transform/loop_vectorize.cc
src/transform/loop_vectorize.h
src/transform/lower_access_ptr.cc
src/transform/lower_blackwell_2sm.cc
src/transform/lower_device_kernel_launch.cc
src/transform/lower_device_storage_access_info.cc
src/transform/lower_hopper_intrin.cc
src/transform/lower_intrin.cc
src/transform/lower_l2_persistent_annotation.cc
src/transform/lower_ldg_stg.cc
src/transform/lower_opaque_block.cc
src/transform/lower_pdl.cc
src/transform/lower_ptx_async_copy.cc
src/transform/lower_shared_barrier.cc
src/transform/lower_shared_tmem.cc
src/transform/lower_thread_allreduce.cc
src/transform/lower_tile_op.cc
src/transform/make_packed_api.cc
src/transform/merge_if_stmt.cc
src/transform/merge_if_stmt.h
src/transform/merge_shared_memory_allocations.cc
src/transform/multi_version_buffer_rewriter.cc
src/transform/multi_version_buffer_rewriter.h
src/transform/parallel_loop_layout_validator.h
src/transform/persist_threadblock.cc
src/transform/pipeline_planning.cc
src/transform/plan_update_buffer_allocation_location.cc
src/transform/producer_consumer_ws.cc
src/transform/ptx_async_copy_injector.h
src/transform/simplify.cc
src/transform/split_host_device.cc
src/transform/storage_rewrite.cc
src/transform/thread_storage_sync.cc
src/transform/unroll_loop.cc
src/transform/vectorize_loop.cc
src/transform/verify_parallel_loop.cc
testing/python/arith/test_arith_hard.py
testing/python/arith/test_arith_intset.py
testing/python/arith/test_arith_iter_affine_map.py
testing/python/arith/test_arith_simplify.py
testing/python/cuda/test_cuda_f32x2_intrinsics.py
testing/python/cuda/test_cuda_mma_sm75_dispatch.py
testing/python/fastmath/test_mathops_fastmath.py
testing/python/issue/test_tilelang_issue_sm120_tma_smem_alignment.py
testing/python/jit/test_tilelang_jit_cutedsl.py
testing/python/kernel/test_tilelang_kernel_bf16_gemm_mma.py
testing/python/kernel/test_tilelang_kernel_fp8_gemm.py
testing/python/kernel/test_tilelang_kernel_fp8_gemv_simt.py
testing/python/language/test_tilelang_language_access_ptr.py
testing/python/language/test_tilelang_language_frontend_v2.py
testing/python/language/test_tilelang_language_pdl.py
testing/python/language/test_tilelang_language_view.py
testing/python/language/test_tilelang_language_warp_sync.py
testing/python/layout/test_tilelang_bank_swizzle_expand.py
testing/python/layout/test_tilelang_layout_equal.py
testing/python/math/test_math_fast_math.py
testing/python/runtime/test_tilelang_runtime_tma_validation.py
testing/python/target/test_tilelang_codegen_cutedsl_cp_async.py
testing/python/target/test_tilelang_rocm_target.py
testing/python/transform/test_tilelang_transform_Inject_software_pipeline.py
testing/python/transform/test_tilelang_transform_cluster_planning.py
testing/python/transform/test_tilelang_transform_decouple_type_cast.py
testing/python/transform/test_tilelang_transform_flatten_buffer.py
testing/python/transform/test_tilelang_transform_fuse_mbarrier_arrive_expect_tx.py
testing/python/transform/test_tilelang_transform_hoist_broadcast_values.py
testing/python/transform/test_tilelang_transform_inject_fence_proxy.py
testing/python/transform/test_tilelang_transform_inject_set_max_nreg.py
testing/python/transform/test_tilelang_transform_inject_tcgen05_fence.py
testing/python/transform/test_tilelang_transform_layout_inference.py
testing/python/transform/test_tilelang_transform_legalize_safe_memory_access.py
testing/python/transform/test_tilelang_transform_let_inline.py
testing/python/transform/test_tilelang_transform_lexical_alloc_scope.py
testing/python/transform/test_tilelang_transform_lower_hopper_intrin.py
testing/python/transform/test_tilelang_transform_lower_ldgstg.py
testing/python/transform/test_tilelang_transform_lower_ptx_async_copy.py
testing/python/transform/test_tilelang_transform_lower_shared_barrier.py
testing/python/transform/test_tilelang_transform_lower_shared_tmem.py
testing/python/transform/test_tilelang_transform_lower_tile_op.py
testing/python/transform/test_tilelang_transform_make_packed_api.py
testing/python/transform/test_tilelang_transform_pipeline_planning.py
testing/python/transform/test_tilelang_transform_plan_update_buffer_allocation_location.py
testing/python/transform/test_tilelang_transform_producer_consumer_ws.py
testing/python/transform/test_tilelang_transform_simplify.py
testing/python/transform/test_tilelang_transform_split_host_device.py
testing/python/transform/test_tilelang_transform_thread_sync.py
tilelang/__init__.py
tilelang/_typing.py
tilelang/analysis/ast_printer.py
tilelang/analysis/fragment_loop_checker.py
tilelang/analysis/layout_visual.py
tilelang/analysis/nested_loop_checker.py
tilelang/autotuner/grouped_compile.py
tilelang/autotuner/param.py
tilelang/autotuner/tuner.py
tilelang/cache/__init__.py
tilelang/cache/kernel_cache.py
tilelang/carver/analysis.py
tilelang/carver/arch/__init__.py
tilelang/carver/arch/cdna.py
tilelang/carver/arch/cuda.py
tilelang/carver/arch/metal.py
tilelang/carver/arch/rdna.py
tilelang/carver/common_schedules.py
tilelang/carver/matmul_analysis.py
tilelang/carver/roller/hint.py
tilelang/carver/roller/node.py
tilelang/carver/roller/policy/default.py
tilelang/carver/roller/policy/tensorcore.py
tilelang/carver/roller/shape_inference/tir.py
tilelang/carver/template/base.py
tilelang/carver/template/conv.py
tilelang/carver/utils.py
tilelang/contrib/nvcc.py
tilelang/cpu/op/gemm/gemm_scalar.py
tilelang/cuda/intrinsics/macro/mma_macro_generator.py
tilelang/cuda/intrinsics/macro/mma_sm70_macro_generator.py
tilelang/cuda/intrinsics/macro/mma_sp_macro_generator.py
tilelang/cuda/intrinsics/macro/tcgen05_macro_generator.py
tilelang/cuda/intrinsics/macro/wgmma_macro_generator.py
tilelang/cuda/intrinsics/macro/wgmma_sp_macro_generator.py
tilelang/cuda/op/gemm/gemm_mma.py
tilelang/cuda/op/gemm/gemm_mma_sm70.py
tilelang/cuda/op/gemm/gemm_tcgen05.py
tilelang/cuda/op/gemm/gemm_wgmma.py
tilelang/cuda/op/gemm_sp/gemm_sp_mma.py
tilelang/cuda/op/gemm_sp/gemm_sp_wgmma.py
tilelang/engine/lower.py
tilelang/engine/param.py
tilelang/engine/phase.py
tilelang/jit/__init__.py
tilelang/jit/adapter/cutedsl/adapter.py
tilelang/jit/adapter/cutedsl/wrapper.py
tilelang/jit/adapter/cython/adapter.py
tilelang/jit/adapter/cython/cython_wrapper.pyx
tilelang/jit/adapter/nvrtc/adapter.py
tilelang/jit/adapter/nvrtc/libgen.py
tilelang/jit/adapter/nvrtc/wrapper.py
tilelang/jit/adapter/torch/metal.py
tilelang/jit/adapter/tvm_ffi.py
tilelang/jit/adapter/utils.py
tilelang/jit/adapter/wrapper.py
tilelang/jit/kernel.py
tilelang/language/__init__.py
tilelang/language/allocate.py
tilelang/language/annotations.py
tilelang/language/ast/__init__.py
tilelang/language/ast/_ffi_api.py
tilelang/language/ast/ir.py
tilelang/language/atomic.py
tilelang/language/builtin.py
tilelang/language/cluster.py
tilelang/language/copy_op.py

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions · 2026-05-18T03:59:13Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

…tor/tirx-tvm-update # Conflicts: # testing/python/transform/test_tilelang_transform_producer_consumer_ws.py # tilelang/cuda/intrinsics/macro/tcgen05_macro_generator.py # tilelang/engine/lower.py # tilelang/utils/target.py


LeiWang1999 · 2026-05-18T17:50:23Z

@regression-perf

github-actions · 2026-05-18T18:18:13Z

Performance Regression Test Report

Triggered by: @LeiWang1999
Workflow run: https://github.com/tile-ai/tilelang/actions/runs/26050580122

Results

File	Original Latency	Current Latency	Speedup
example_gqa_bwd_tma_reduce_varlen	0.0327871	0.0342266	0.957943
example_gqa_bwd	0.0328678	0.0333474	0.985617
example_tilelang_nsa_fwd	0.00513174	0.00518024	0.990638
example_mhc_pre	0.11904	0.120122	0.990993
example_convolution_autotune	0.720457	0.726915	0.991116
example_mha_inference	0.0623633	0.0629221	0.991121
example_warp_specialize_gemm_copy_0_gemm_1	0.027213	0.0273687	0.994312
example_gemv	0.199967	0.201066	0.994536
example_fusedmoe_tilelang	0.0953415	0.0957216	0.996029
example_mha_bwd_bhsd	0.0296933	0.0298055	0.996238
example_group_per_split_token_cast_to_fp8	0.00754321	0.00757068	0.996372
example_dynamic	0.50665	0.508062	0.997222
example_mha_fwd_bshd	0.0190251	0.0190677	0.997762
sparse_mla_bwd	0.21373	0.21416	0.997991
example_gemm_autotune	0.0161791	0.016199	0.998774
example_gemm	0.0170982	0.0171162	0.998949
example_mhc_post	0.106353	0.106441	0.999174
block_sparse_attn_tilelang	0.00672042	0.00672594	0.999179
example_per_token_cast_to_fp8	0.00650466	0.0065083	0.999442
example_topk	0.00787251	0.00787605	0.99955
example_convolution	0.924473	0.924765	0.999684
topk_selector	0.0397301	0.0397407	0.999733
example_gqa_sink_bwd_bhsd	0.0301026	0.0301097	0.999765
example_gqa_sink_bwd_bhsd_sliding_window	0.018109	0.0181121	0.999826
example_tilelang_block_sparse_attn	0.00725174	0.00725196	0.999969
example_warp_specialize_gemm_softpipe_stage2	0.0195677	0.0195682	0.999973
example_elementwise_add	0.112937	0.112932	1.00004
example_tilelang_sparse_gqa_decode_varlen_indice	0.0117925	0.0117912	1.00011
sparse_mla_fwd_pipelined	0.0697167	0.0697019	1.00021
example_gemm_intrinsics	0.0253606	0.0253487	1.00047
example_warp_specialize_gemm_copy_1_gemm_0	0.019615	0.0196054	1.00049
example_vertical_slash_sparse_attn	0.167886	0.167788	1.00058
example_tilelang_sparse_gqa_decode_varlen_mask	0.0127939	0.0127831	1.00084
example_mha_fwd_bhsd	0.00906961	0.00905746	1.00134
example_mha_bwd_bshd	0.0293754	0.0293295	1.00156
example_mha_sink_bwd_bhsd	0.0519375	0.0518538	1.00162
example_mha_fwd_varlen	0.0326906	0.0326374	1.00163
example_mla_decode	0.315035	0.314497	1.00171
example_tilelang_gemm_splitk_vectorize_atomicadd	0.798986	0.797409	1.00198
example_tilelang_gemm_fp8	0.23841	0.237892	1.00218
example_tilelang_gemm_splitk	0.776777	0.774808	1.00254
example_blocksparse_gemm	0.0143873	0.0143494	1.00265
example_gqa_decode	0.0411818	0.0410682	1.00277
example_gqa_fwd_bshd	0.0517446	0.0515738	1.00331
example_mha_sink_fwd_bhsd_sliding_window	0.01268	0.0126358	1.0035
example_tilelang_nsa_decode	0.00502706	0.00500341	1.00473
example_linear_attn_bwd	0.118129	0.117483	1.0055
example_tilelang_gemm_fp8_2xAcc	0.0918185	0.0911856	1.00694
example_mha_sink_bwd_bhsd_sliding_window	0.0382854	0.0380113	1.00721
example_linear_attn_fwd	0.028708	0.0284922	1.00758
fp8_lighting_indexer	0.0234433	0.0232621	1.00779
example_warp_specialize_gemm_barrierpipe_stage2	0.0297104	0.0293945	1.01075
sparse_mla_fwd	0.085772	0.0838691	1.02269
example_mha_sink_fwd_bhsd	0.0130641	0.012654	1.03241
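
The Speedup column appears to be Original Latency divided by Current Latency, so values below 1 indicate a slowdown. This reading is inferred from the numbers, not stated by the report. The sketch below recomputes it for three rows copied from the table. The geometric mean over those rows is an added illustration, not a figure from the report.

```python
import math
from typing import Dict, Tuple


def speedup(original: float, current: float) -> float:
    """Ratio of original to current latency; > 1 means the new build is faster."""
    return original / current


# (original latency, current latency) for three rows copied from the report.
rows: Dict[str, Tuple[float, float]] = {
    "example_gqa_bwd_tma_reduce_varlen": (0.0327871, 0.0342266),
    "example_gemm": (0.0170982, 0.0171162),
    "example_mha_sink_fwd_bhsd": (0.0130641, 0.012654),
}

speedups = {name: speedup(orig, cur) for name, (orig, cur) in rows.items()}

# Geometric mean is the usual way to average ratios.
geomean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))
```

Because the published latencies are rounded, recomputed ratios can differ from the table in the last printed digit.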

Artifacts

regression_result.png (speedup plot) is attached as a workflow artifact. Download it from the workflow run page above.

…tor/tirx-tvm-update # Conflicts: # src/backend/cuda/op/gemm_sp.cc # src/op/gemm.cc # src/op/gemm_sp.cc # src/op/gemm_sp.h # src/op/gemm_sp_py.cc # src/op/gemm_sp_py.h # tilelang/cuda/op/gemm_sp/gemm_sp_mma.py # tilelang/ir.py # tilelang/language/experimental/gemm_sp.py # tilelang/tileop/gemm_sp/__init__.py

…ctor/tirx-tvm-update

…torization. Introduce new functions for preferred vectorized size and update existing reduction logic to handle packed operations for bfloat16 and float16 types. Add nan-aware min and max operations in CUDA and ROCm backends, and update related tests to validate functionality.


LeiWang1999 · 2026-05-20T06:30:09Z

@regression-perf

LeiWang1999 · 2026-05-20T06:39:55Z

Local tests pass; looking forward to the regression test results.


## Summary

This PR adds a Z3 SMT solver backend to `tvm::arith::Analyzer` for stronger integer arithmetic proving. The integration is guarded by `USE_Z3`, which defaults to `AUTO`. In the default mode, TVM enables Z3 when the static Z3 development artifacts are available and otherwise builds the conservative stub implementation.

When Z3 is enabled, `Analyzer::CanProve` runs the existing TVM arithmetic analysis path first, then falls back to Z3 only when the existing analyzers cannot prove the predicate and the requested strength is `kSymbolicBound`.

Z3 is linked statically from the PyPI `z3-static` package, so `libtvm` does not need a runtime `libz3` dependency.

## Features

- Z3 build support through `USE_Z3`, defaulting to `AUTO`.
- A new `arith::Z3Prover` sub-analyzer owned by `arith::Analyzer`.
- SMT-LIB2 export for debugging and external solver reproduction.
- Python debug/config APIs: `Analyzer.get_smtlib2`, `Analyzer.set_z3_timeout_ms`, `Analyzer.set_z3_rlimit`, and `Analyzer.get_z3_stats`.
- C++ APIs for proving, binding, constraints, stats, model inspection, and satisfying-value counting.
- Scalar integer, unsigned integer, and boolean expression translation to Z3.
- Support for arithmetic, comparisons, boolean operators, `min`, `max`, `select`, `if_then_else`, `let`, casts, truncated division/modulo, floor division/modulo, and selected bitwise/shift operations.
- Deterministic solver control using Z3 `rlimit`, with `random_seed` fixed to `42`.
- Thread-local Z3 context sharing to reduce initialization overhead while keeping thread safety.
- A disabled-mode stub implementation that returns conservative results when Z3 is not built.

## Implementation Notes

- The real and stub implementations live in `src/arith/z3_prover.cc`, selected by the `TVM_USE_Z3` macro from `cmake/modules/contrib/Z3.cmake`.
- `cmake/modules/contrib/Z3.cmake` first resolves the PIC static `libz3` layout provided by `z3-static` using its `z3_static.get_cmake_dir()` helper, then falls back to a custom `Z3_DIR` or `CMAKE_PREFIX_PATH` installation.
- `USE_Z3=ON` requires Z3 to be found, while `USE_Z3=AUTO` allows source builds and CI jobs without Z3 artifacts to continue with the stub.
- The Z3 fallback is exception-safe and gated behind `kSymbolicBound`, so the common `kDefault` path does not pay solver cost.
- TVM `Div` and `Mod` are translated with truncating helpers rather than Z3's Euclidean operators to stay sound for negative dividends.
- Shift handling relies on Z3's native bit-vector semantics and does not add hard assertions to the shared solver.

## References

The implementation is based on the Z3 analyzer integration used in TileLang's TVM fork, with the upstream port kept scoped to TVM's arithmetic analyzer.

- [tile-ai/tilelang#1367](tile-ai/tilelang#1367)
- [tile-ai/tilelang#1458](tile-ai/tilelang#1458)
- [tile-ai/tilelang#2216](tile-ai/tilelang#2216)
- [tile-ai#22](tile-ai#22)
- [tile-ai#24](tile-ai#24)
- [Original TileLang TVM commit](tile-ai@e633295)

---------

Signed-off-by: Ubospica <ubospica@gmail.com>
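
The remark about truncating versus Euclidean division can be checked with plain integers. The sketch below is not taken from the Z3 integration; `trunc_divmod` and `euclid_divmod` are hypothetical helper names written for this illustration. It shows that the two conventions agree for non-negative operands and differ once the dividend is negative, which is why a direct mapping onto a Euclidean operator would be unsound.

```python
from typing import Tuple


def trunc_divmod(a: int, b: int) -> Tuple[int, int]:
    """Quotient rounded toward zero; the remainder takes the sign of `a`."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - b * q


def euclid_divmod(a: int, b: int) -> Tuple[int, int]:
    """Euclidean convention: the remainder is always in [0, |b|)."""
    q, r = divmod(a, b)  # Python's divmod floors the quotient
    if r < 0:  # only possible when b is negative
        q += 1
        r -= b
    return q, r


# Same result for non-negative operands, different for a negative dividend.
same = trunc_divmod(7, 2) == euclid_divmod(7, 2)
differs = trunc_divmod(-7, 2) != euclid_divmod(-7, 2)
```

In both conventions the identity `a == b * q + r` holds; only the choice of `q` and the sign of `r` differ.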

chengyupku and others added 18 commits January 28, 2026 07:57

Add tilelang semantics guide to programming guides (tile-ai#1740)

3e5f6fe

lint fix

f64be56

lint fix

3cff414

fix

c542090

lint fix

77d2bc8

Merge branch 'main' of https://github.com/tile-ai/tilelang into cheng…

6ca70ed

…yupku/main

Update TVM subproject to latest commit and remove debug print stateme…

56f3201

…nts in OptimizeForTarget function

Merge branch 'main' of https://github.com/tile-ai/tilelang

5e0ca63

# Conflicts: # src/op/builtin.h # src/transform/loop_unswitching.cc # testing/python/transform/test_tilelang_transform_loop_unswitching.py # tilelang/transform/__init__.py # tilelang/transform/pass_config.py

update tvm

512c995

pass fix

3cb4d45

Update submodule 'tvm' to latest commit f24c7b19

8d962c1

Apply formatting fixes

8ee736f

LeiWang1999 changed the title ~~[tilelang] Update TileLang to use tirx~~ [TIR][IR] Update to use tirx May 18, 2026

LeiWang1999 added 9 commits May 18, 2026 12:11

Update CMakeLists.txt to set HIDE_PRIVATE_SYMBOLS for TileLang and mo…

f13add9

…dify target_host initialization in lower.py for compatibility with TVM's target handling.

Refactor CMakeLists.txt and kernel_cache.py to update target names an…

2be9ca2

…d library dependencies for TVM integration. Adjusted output targets and library names for better compatibility across platforms.

Update repair-wheel-command in pyproject.toml to exclude libz3.so wit…

008d88c

…h wildcard for better compatibility with dependency packages.

Fix macOS Z3 dylib lookup

3869d1d

Move macOS Z3 install-name fix to install step

f4274ab

Update TVM submodule for runtime import fix

0c11d0b

Update ROCm backend for TVM module factory changes

0b0674d

Merge branch 'main' of https://github.com/tile-ai/tilelang into refac…

32e5ef2

…tor/tirx-tvm-update

LeiWang1999 added 2 commits May 18, 2026 23:49

Fix PyPI Z3 packaging links

a49e62b

Update TVM for Windows DLL lookup

acc01a3

Register Windows dependency DLL directories

58cf4ca

oraluben and others added 9 commits May 19, 2026 07:56

Bump tvm-ffi to match tirx

e894033

tvm update

6ab406b

Merge remote-tracking branch 'lei/refactor/tirx-tvm-update' into refa…

9baf1dc

…ctor/tirx-tvm-update

Apply lint fixes

828abdb

Merge branch 'main' of https://github.com/tile-ai/tilelang into refac…

913726e

…tor/tirx-tvm-update

Refactor ReduceLowerer: Remove unnecessary buffer allocation logic fo…

1620cb6

…r packed operations in reduce.h

Merge branch 'main' of https://github.com/tile-ai/tilelang into refac…

98d391a

…tor/tirx-tvm-update # Conflicts: # src/transform/legalize_negative_index.cc # tilelang/jit/adapter/cutedsl/wrapper.py

This was referenced May 20, 2026

Add TL_DISABLE_SHARED_MEMORY_REUSE pass config #2228

Merged

[Metal] Add Metal GEMM support with simdgroup_matrix MMA #1869

Merged

LeiWang1999 added 3 commits May 20, 2026 13:36

Normalize PyPI Z3 dylib before linking

3805846

Drop legacy target option string support

02b259a

Merge branch 'main' of https://github.com/tile-ai/tilelang into refac…

4c804bc

…tor/tirx-tvm-update

LeiWang1999 mentioned this pull request May 20, 2026

[Feature] Emit named aliases for dynamic shared memory in CUDA codegen for better readability #2061

Closed

LeiWang1999 added 2 commits May 20, 2026 15:12

Merge branch 'main' of https://github.com/tile-ai/tilelang into refac…

377aa10

…tor/tirx-tvm-update

Fix type reference for barrier annotations in CUDA copy operations. U…

7b2bae0

…pdated `BufferLoadNode` to `tirx::BufferLoadNode` in `GetBarrier` and `LowerCluster` methods to ensure compatibility with recent changes in the TIR API.

LeiWang1999 changed the title ~~[TIR][IR] Update to use tirx~~ [TIR][IR] Update to use tirx May 20, 2026

LeiWang1999 merged commit b939fa0 into tile-ai:main May 20, 2026
9 of 11 checks passed

ShigureNyako mentioned this pull request May 30, 2026

[BugFix][Examples] Use tirx in CDNA4 MXFP4 example #2310

Merged

This was referenced Jun 3, 2026

[ARITH] Add optional Z3-backed proving to Analyzer apache/tvm#19666

Closed

[ARITH] Add optional Z3-backed proving to Analyzer apache/tvm#19667

Merged

LeiWang1999 merged 51 commits into tile-ai:main from LeiWang1999:refactor/tirx-tvm-update


3 participants
