[TIR][IR] Update to use tirx #2216

Merged: LeiWang1999 merged 51 commits into tile-ai:main from LeiWang1999:refactor/tirx-tvm-update on May 20, 2026

@LeiWang1999 (Member) commented May 18, 2026

Closes apache/tvm-ffi#464

Summary

This PR migrates TileLang to the updated TVM tirx API and the newer TVM-FFI baseline. It replaces the old tir parser/builder/type surface with tirx, updates the C++ compiler and backend code to compile against the new namespaces, and refreshes the Python language, JIT, examples, and tests accordingly.

Major changes

  • Rename TVM TIR integration points from tir to tirx across C++ and Python:
    • tvm::tir / tvm.tir -> tvm::tirx / tvm.tirx
    • tvm.script.parser.tir -> tvm.tirx.script.parser
    • tvm.script.ir_builder.tir -> tvm.tirx.script.builder
    • tvm.tir.* type imports such as PrimExpr, Buffer, BufferRegion, IndexMap, and Var -> tvm.tirx.*
  • Adapt TileLang script/builder semantics to the tirx IR:
    • Replace scoped LetStmt builder/frame usage with tirx.bind / tirx::Bind.
    • Rebuild C++ LetStmt wrappers as SeqStmt({tirx::Bind(...), body}) where tirx represents the binding as a flat statement.
    • Add Python-side bind-value tracking (register_let_value, clear_let_values) for cases that still need to recover aliases from bound values, such as buffer-region pointer handling.
    • Update block and allocation helpers to use the tirx structured block APIs, including SBlockFrame, SBlockAllocBuffer, alloc_buffer, and sblock_attr.
  • Update the lowering pipeline and pass configuration keys for the renamed API surface:
    • Route supported passes through tirx.transform.*.
    • Rename pass config keys from tir.* to tirx.*, including vectorization, storage rewrite, async copy, noalias, lower-pass injection, and debug options.
    • Keep using the remaining upstream s_tir transforms for passes that have not moved to tirx yet.
  • Refresh TileLang compiler, backend, and runtime integration:
    • Update CUDA, ROCm, CPU, Metal, and WebGPU codegen/lowering code to use tirx nodes, visitors, intrinsic calls, and analysis helpers.
    • Remove the old src/support/ffi_aliases.h compatibility shim and add src/support/check.h wrappers around the new TVM-FFI check macros.
    • Update the TVM submodule, CMake linkage, wheel repair rules, and the apache-tvm-ffi dependency requirement.
  • Update Python APIs, examples, docs, and tests for the renamed tirx surface, including parser overrides, eager builder behavior, JIT adapters, TileOp templates, and transform tests.
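
To make the binding change above concrete, here is a minimal sketch of the scoped-to-flat rewrite on a toy IR in plain Python. `Let`, `Bind`, `Seq`, `Evaluate`, and `flatten_let` are hypothetical stand-ins invented for illustration, not the TVM classes; the real `tirx::Bind` and `SeqStmt` constructors may take different arguments.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Evaluate:
    """Toy leaf statement (hypothetical, not a TVM node)."""
    expr: str


@dataclass(frozen=True)
class Let:
    """Old style: the binding owns its body, so scope is explicit."""
    var: str
    value: str
    body: "Stmt"


@dataclass(frozen=True)
class Bind:
    """New style: a flat statement with no body of its own."""
    var: str
    value: str


@dataclass(frozen=True)
class Seq:
    """Ordered sequence of statements."""
    stmts: Tuple["Stmt", ...]


Stmt = Union[Evaluate, Let, Bind, Seq]


def flatten_let(stmt: Stmt) -> Stmt:
    """Rewrite Let(var, value, body) into Seq((Bind(var, value), body))."""
    if isinstance(stmt, Let):
        return Seq((Bind(stmt.var, stmt.value), flatten_let(stmt.body)))
    if isinstance(stmt, Seq):
        return Seq(tuple(flatten_let(s) for s in stmt.stmts))
    return stmt


nested = Let("x", "a + 1", Let("y", "x * 2", Evaluate("use(y)")))
flat = flatten_let(nested)
print(flat)
```

The sketch keeps the inner `Seq` nested, mirroring the `SeqStmt({Bind, body})` shape. Whether a binding stays visible to later sibling statements depends on how tirx scopes `Bind`, which this toy model does not attempt to capture.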

Validation

  • ./format.sh

Review notes

This is a broad mechanical migration with some semantic updates around binding construction. Reviewers should pay particular attention to:

  • Places where LetStmt was converted to Bind, especially code that depends on binding scope or alias recovery.
  • Backend-specific lowering behavior after the tirx namespace and visitor/type updates.
  • Pass config compatibility for downstream users that may still pass old tir.* keys.
  • Packaging and runtime linkage after the TVM / TVM-FFI dependency update.
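
For the pass config compatibility point, a downstream project could translate old keys before handing them to the compiler. The following is a rough sketch under stated assumptions: `migrate_pass_config` is a hypothetical helper that is not part of TileLang or TVM, it assumes every `tir.*` key maps to the same name under `tirx.*`, and `tir.disable_vectorize` is used only as an illustrative key.

```python
import warnings

OLD_PREFIX = "tir."
NEW_PREFIX = "tirx."


def migrate_pass_config(config):
    """Return a copy of `config` with old `tir.*` keys renamed to `tirx.*`.

    Hypothetical helper: assumes a pure prefix rename. Keys that already
    use the new prefix win over a renamed old key with the same suffix.
    """
    # Copy keys that do not need renaming first so they take precedence.
    migrated = {k: v for k, v in config.items() if not k.startswith(OLD_PREFIX)}
    for key, value in config.items():
        if not key.startswith(OLD_PREFIX):
            continue
        new_key = NEW_PREFIX + key[len(OLD_PREFIX):]
        warnings.warn(
            f"pass config key {key!r} was renamed to {new_key!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        migrated.setdefault(new_key, value)
    return migrated


old_style = {"tir.disable_vectorize": True, "tl.disable_loop_unswitching": False}
print(migrate_pass_config(old_style))
```

Keys outside the `tir.` prefix, such as the `tl.*` options, pass through unchanged.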

chengyupku and others added 18 commits January 28, 2026 07:57
- Added a new pass to hoist loop-invariant if statements out of loops, improving optimization opportunities.
- Introduced classes for collecting and checking conditions, as well as for rewriting statements.
- Integrated the new pass into the optimization pipeline and provided a corresponding API for usage.
…ache disabling in benchmark script

- Introduced CallNodeChecker class to identify CallNode expressions in loop conditions, enhancing loop-invariant checks.
- Updated IsLoopInvariant function to reject conditions containing CallNodes, preventing potential side effects.
- Added tilelang.disable_cache() in benchmark_mha_sink_fwd.py to optimize performance during benchmarking.
- Added support for Let-bound variables in the WrittenBufferReadChecker to improve buffer read checks.
- Introduced UsesLoopVarThroughLetBindings function to check if conditions depend on loop variables through Let bindings.
- Updated IsLoopInvariant function to account for Let bindings when determining loop invariance.
- Enhanced HoistableIfFinder to track Let bindings for variables bound to BufferLoad expressions.
- Added debug print statements in the OptimizeForTarget function to visualize the module state before and after loop unswitching.
- Added CallCheckerExcludingIf class to ensure function calls outside of hoisted if statements are identified, preventing potential synchronization issues during loop unswitching.
- Updated loop unswitching logic to incorporate the new call checker, enhancing safety and correctness.
- Integrated debug print statements in OptimizeForTarget to visualize module state before and after loop unswitching.
- Disabled tilelang cache in the benchmark script for improved performance.
- Introduced a new configuration option `tl.disable_loop_unswitching` to allow users to disable the loop unswitching optimization.
- Updated the Loop Unswitching pass to check this configuration and return the original function if the option is enabled.
- Added relevant documentation in the PassConfigKey enumeration for clarity.
# Conflicts:
#	src/op/builtin.h
#	src/transform/loop_unswitching.cc
#	testing/python/transform/test_tilelang_transform_loop_unswitching.py
#	tilelang/transform/__init__.py
#	tilelang/transform/pass_config.py
- Removed references to `tvm.tir` and replaced them with `tvm.tirx` across various files, including examples and backend operations.
- Updated target configuration in documentation to reflect the new usage of target config dictionaries instead of CLI-style strings.
- Cleaned up `pyproject.toml` and `load_tvm.cmake` by removing obsolete paths.
- Enhanced examples for dequantization and GEMM operations to utilize the new `tirx` constructs.
- Adjusted various backend operations to ensure compatibility with the new `tirx` namespace.
coderabbitai Bot (Contributor) commented May 18, 2026

Important

Review skipped

Too many files!

This PR contains 300 files, which is 150 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b052c381-e944-4f9f-aad4-3836a6a64237

📥 Commits

Reviewing files that changed from the base of the PR and between 7c4718f and 7b2bae0.

📒 Files selected for processing (300)
  • .agents/skills/tilelang-tvm-ir/SKILL.md
  • 3rdparty/tvm
  • CMakeLists.txt
  • cmake/load_tvm.cmake
  • cmake/pypi-z3/FindZ3.cmake
  • docs/get_started/targets.md
  • examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py
  • examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py
  • examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py
  • examples/dequantize_gemm/example_dequant_gemm_w4a8.py
  • examples/eager_jit/eagerjit.en.ipynb
  • examples/eager_jit/eagerjit.zh.ipynb
  • examples/plot_layout/README.md
  • examples/plot_layout/fragment_mfma_load_a.py
  • examples/plot_layout/fragment_mma_load_a.py
  • pyproject.toml
  • requirements-dev.txt
  • requirements.txt
  • src/backend/common/op/atomic_reduce.h
  • src/backend/common/op/cumsum.h
  • src/backend/common/op/fill.h
  • src/backend/common/op/finalize_reducer.h
  • src/backend/common/op/reduce.h
  • src/backend/common/op/transpose.h
  • src/backend/cpu/op/copy.cc
  • src/backend/cpu/op/fill.cc
  • src/backend/cpu/op/gemm.cc
  • src/backend/cpu/op/transpose.cc
  • src/backend/cuda/codegen/codegen_cuda.cc
  • src/backend/cuda/codegen/codegen_cuda.h
  • src/backend/cuda/codegen/codegen_cutedsl.cc
  • src/backend/cuda/codegen/codegen_cutedsl.h
  • src/backend/cuda/codegen/codegen_py.cc
  • src/backend/cuda/codegen/codegen_py.h
  • src/backend/cuda/codegen/intrin_rule_cuda.cc
  • src/backend/cuda/codegen/ptx.cc
  • src/backend/cuda/codegen/rt_mod_cuda.cc
  • src/backend/cuda/codegen/rt_mod_cutedsl.cc
  • src/backend/cuda/op/atomic_add.cc
  • src/backend/cuda/op/copy.cc
  • src/backend/cuda/op/copy.h
  • src/backend/cuda/op/copy_analysis.cc
  • src/backend/cuda/op/finalize_reducer.cc
  • src/backend/cuda/op/gemm.cc
  • src/backend/cuda/op/gemm_sp.cc
  • src/backend/cuda/op/reduce.cc
  • src/backend/cuda/runtime.cc
  • src/backend/metal/codegen/rt_mod_metal.cc
  • src/backend/metal/op/copy.cc
  • src/backend/metal/op/fill.cc
  • src/backend/metal/op/transpose.cc
  • src/backend/rocm/codegen/codegen_hip.cc
  • src/backend/rocm/codegen/codegen_hip.h
  • src/backend/rocm/codegen/intrin_rule_hip.cc
  • src/backend/rocm/codegen/rt_mod_hip.cc
  • src/backend/rocm/op/atomic_add.cc
  • src/backend/rocm/op/copy.cc
  • src/backend/rocm/op/finalize_reducer.cc
  • src/backend/rocm/op/gemm.cc
  • src/backend/rocm/op/reduce.cc
  • src/backend/webgpu/op/copy.cc
  • src/backend/webgpu/op/fill.cc
  • src/backend/webgpu/op/transpose.cc
  • src/config.h
  • src/ir.cc
  • src/layout/gemm_layouts.cc
  • src/layout/layout.cc
  • src/layout/layout.h
  • src/layout/tcgen05_layout.cc
  • src/layout/utils.cc
  • src/layout/utils.h
  • src/op/atomic_add.cc
  • src/op/atomic_add.h
  • src/op/atomic_reduce.cc
  • src/op/atomic_reduce.h
  • src/op/builtin.cc
  • src/op/builtin.h
  • src/op/copy.cc
  • src/op/copy.h
  • src/op/fill.cc
  • src/op/fill.h
  • src/op/finalize_reducer.cc
  • src/op/finalize_reducer.h
  • src/op/gemm.cc
  • src/op/gemm.h
  • src/op/gemm_sp.cc
  • src/op/gemm_sp.h
  • src/op/logical.cc
  • src/op/math.cc
  • src/op/operator.cc
  • src/op/operator.h
  • src/op/parallel.cc
  • src/op/parallel.h
  • src/op/reduce.cc
  • src/op/reduce.h
  • src/op/region.cc
  • src/op/region.h
  • src/op/tcgen5_meta.h
  • src/op/transpose.cc
  • src/op/transpose.h
  • src/op/utils.cc
  • src/op/utils.h
  • src/runtime/error_helpers.cc
  • src/runtime/logging.cc
  • src/support/check.h
  • src/support/ffi_aliases.h
  • src/target/codegen_c.cc
  • src/target/codegen_c.h
  • src/target/codegen_c_host.cc
  • src/target/codegen_c_host.h
  • src/target/rt_mod_c.cc
  • src/target/utils.cc
  • src/transform/annotate_device_regions.cc
  • src/transform/annotate_read_only_params.cc
  • src/transform/annotate_warp_group_reg_alloc.cc
  • src/transform/arg_binder.cc
  • src/transform/arg_binder.h
  • src/transform/cluster_planning.cc
  • src/transform/common/assume.cc
  • src/transform/common/assume.h
  • src/transform/common/attr.h
  • src/transform/common/collector.h
  • src/transform/common/constr_visitor.h
  • src/transform/common/loop_fusion_utils.h
  • src/transform/common/loop_vectorization_utils.h
  • src/transform/common/mbarrier.h
  • src/transform/common/pipeline_utils.h
  • src/transform/config_index_bitwidth.cc
  • src/transform/flatten_buffer.cc
  • src/transform/frontend_legalize.cc
  • src/transform/fuse_mbarrier_arrive_expect_tx.cc
  • src/transform/hoist_global_buffer_allocations.cc
  • src/transform/hoist_nonrestrict_params.cc
  • src/transform/if_stmt_binding.cc
  • src/transform/inject_assumes.cc
  • src/transform/inject_fence_proxy.cc
  • src/transform/inject_pipeline.cc
  • src/transform/inject_tcgen05_fence.cc
  • src/transform/instruction_annotation.cc
  • src/transform/layout_inference.cc
  • src/transform/layout_reducer.cc
  • src/transform/layout_reducer.h
  • src/transform/legalize_negative_index.cc
  • src/transform/legalize_safe_memory_access.cc
  • src/transform/legalize_vectorized_loop.cc
  • src/transform/loop_partition.cc
  • src/transform/loop_partition.h
  • src/transform/loop_unswitching.cc
  • src/transform/loop_vectorize.cc
  • src/transform/loop_vectorize.h
  • src/transform/lower_access_ptr.cc
  • src/transform/lower_blackwell_2sm.cc
  • src/transform/lower_device_kernel_launch.cc
  • src/transform/lower_device_storage_access_info.cc
  • src/transform/lower_hopper_intrin.cc
  • src/transform/lower_intrin.cc
  • src/transform/lower_l2_persistent_annotation.cc
  • src/transform/lower_ldg_stg.cc
  • src/transform/lower_opaque_block.cc
  • src/transform/lower_pdl.cc
  • src/transform/lower_ptx_async_copy.cc
  • src/transform/lower_shared_barrier.cc
  • src/transform/lower_shared_tmem.cc
  • src/transform/lower_thread_allreduce.cc
  • src/transform/lower_tile_op.cc
  • src/transform/make_packed_api.cc
  • src/transform/merge_if_stmt.cc
  • src/transform/merge_if_stmt.h
  • src/transform/merge_shared_memory_allocations.cc
  • src/transform/multi_version_buffer_rewriter.cc
  • src/transform/multi_version_buffer_rewriter.h
  • src/transform/parallel_loop_layout_validator.h
  • src/transform/persist_threadblock.cc
  • src/transform/pipeline_planning.cc
  • src/transform/plan_update_buffer_allocation_location.cc
  • src/transform/producer_consumer_ws.cc
  • src/transform/ptx_async_copy_injector.h
  • src/transform/simplify.cc
  • src/transform/split_host_device.cc
  • src/transform/storage_rewrite.cc
  • src/transform/thread_storage_sync.cc
  • src/transform/unroll_loop.cc
  • src/transform/vectorize_loop.cc
  • src/transform/verify_parallel_loop.cc
  • testing/python/arith/test_arith_hard.py
  • testing/python/arith/test_arith_intset.py
  • testing/python/arith/test_arith_iter_affine_map.py
  • testing/python/arith/test_arith_simplify.py
  • testing/python/cuda/test_cuda_f32x2_intrinsics.py
  • testing/python/cuda/test_cuda_mma_sm75_dispatch.py
  • testing/python/fastmath/test_mathops_fastmath.py
  • testing/python/issue/test_tilelang_issue_sm120_tma_smem_alignment.py
  • testing/python/jit/test_tilelang_jit_cutedsl.py
  • testing/python/kernel/test_tilelang_kernel_bf16_gemm_mma.py
  • testing/python/kernel/test_tilelang_kernel_fp8_gemm.py
  • testing/python/kernel/test_tilelang_kernel_fp8_gemv_simt.py
  • testing/python/language/test_tilelang_language_access_ptr.py
  • testing/python/language/test_tilelang_language_frontend_v2.py
  • testing/python/language/test_tilelang_language_pdl.py
  • testing/python/language/test_tilelang_language_view.py
  • testing/python/language/test_tilelang_language_warp_sync.py
  • testing/python/layout/test_tilelang_bank_swizzle_expand.py
  • testing/python/layout/test_tilelang_layout_equal.py
  • testing/python/math/test_math_fast_math.py
  • testing/python/runtime/test_tilelang_runtime_tma_validation.py
  • testing/python/target/test_tilelang_codegen_cutedsl_cp_async.py
  • testing/python/target/test_tilelang_rocm_target.py
  • testing/python/transform/test_tilelang_transform_Inject_software_pipeline.py
  • testing/python/transform/test_tilelang_transform_cluster_planning.py
  • testing/python/transform/test_tilelang_transform_decouple_type_cast.py
  • testing/python/transform/test_tilelang_transform_flatten_buffer.py
  • testing/python/transform/test_tilelang_transform_fuse_mbarrier_arrive_expect_tx.py
  • testing/python/transform/test_tilelang_transform_hoist_broadcast_values.py
  • testing/python/transform/test_tilelang_transform_inject_fence_proxy.py
  • testing/python/transform/test_tilelang_transform_inject_set_max_nreg.py
  • testing/python/transform/test_tilelang_transform_inject_tcgen05_fence.py
  • testing/python/transform/test_tilelang_transform_layout_inference.py
  • testing/python/transform/test_tilelang_transform_legalize_safe_memory_access.py
  • testing/python/transform/test_tilelang_transform_let_inline.py
  • testing/python/transform/test_tilelang_transform_lexical_alloc_scope.py
  • testing/python/transform/test_tilelang_transform_lower_hopper_intrin.py
  • testing/python/transform/test_tilelang_transform_lower_ldgstg.py
  • testing/python/transform/test_tilelang_transform_lower_ptx_async_copy.py
  • testing/python/transform/test_tilelang_transform_lower_shared_barrier.py
  • testing/python/transform/test_tilelang_transform_lower_shared_tmem.py
  • testing/python/transform/test_tilelang_transform_lower_tile_op.py
  • testing/python/transform/test_tilelang_transform_make_packed_api.py
  • testing/python/transform/test_tilelang_transform_pipeline_planning.py
  • testing/python/transform/test_tilelang_transform_plan_update_buffer_allocation_location.py
  • testing/python/transform/test_tilelang_transform_producer_consumer_ws.py
  • testing/python/transform/test_tilelang_transform_simplify.py
  • testing/python/transform/test_tilelang_transform_split_host_device.py
  • testing/python/transform/test_tilelang_transform_thread_sync.py
  • tilelang/__init__.py
  • tilelang/_typing.py
  • tilelang/analysis/ast_printer.py
  • tilelang/analysis/fragment_loop_checker.py
  • tilelang/analysis/layout_visual.py
  • tilelang/analysis/nested_loop_checker.py
  • tilelang/autotuner/grouped_compile.py
  • tilelang/autotuner/param.py
  • tilelang/autotuner/tuner.py
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
  • tilelang/carver/analysis.py
  • tilelang/carver/arch/__init__.py
  • tilelang/carver/arch/cdna.py
  • tilelang/carver/arch/cuda.py
  • tilelang/carver/arch/metal.py
  • tilelang/carver/arch/rdna.py
  • tilelang/carver/common_schedules.py
  • tilelang/carver/matmul_analysis.py
  • tilelang/carver/roller/hint.py
  • tilelang/carver/roller/node.py
  • tilelang/carver/roller/policy/default.py
  • tilelang/carver/roller/policy/tensorcore.py
  • tilelang/carver/roller/shape_inference/tir.py
  • tilelang/carver/template/base.py
  • tilelang/carver/template/conv.py
  • tilelang/carver/utils.py
  • tilelang/contrib/nvcc.py
  • tilelang/cpu/op/gemm/gemm_scalar.py
  • tilelang/cuda/intrinsics/macro/mma_macro_generator.py
  • tilelang/cuda/intrinsics/macro/mma_sm70_macro_generator.py
  • tilelang/cuda/intrinsics/macro/mma_sp_macro_generator.py
  • tilelang/cuda/intrinsics/macro/tcgen05_macro_generator.py
  • tilelang/cuda/intrinsics/macro/wgmma_macro_generator.py
  • tilelang/cuda/intrinsics/macro/wgmma_sp_macro_generator.py
  • tilelang/cuda/op/gemm/gemm_mma.py
  • tilelang/cuda/op/gemm/gemm_mma_sm70.py
  • tilelang/cuda/op/gemm/gemm_tcgen05.py
  • tilelang/cuda/op/gemm/gemm_wgmma.py
  • tilelang/cuda/op/gemm_sp/gemm_sp_mma.py
  • tilelang/cuda/op/gemm_sp/gemm_sp_wgmma.py
  • tilelang/engine/lower.py
  • tilelang/engine/param.py
  • tilelang/engine/phase.py
  • tilelang/jit/__init__.py
  • tilelang/jit/adapter/cutedsl/adapter.py
  • tilelang/jit/adapter/cutedsl/wrapper.py
  • tilelang/jit/adapter/cython/adapter.py
  • tilelang/jit/adapter/cython/cython_wrapper.pyx
  • tilelang/jit/adapter/nvrtc/adapter.py
  • tilelang/jit/adapter/nvrtc/libgen.py
  • tilelang/jit/adapter/nvrtc/wrapper.py
  • tilelang/jit/adapter/torch/metal.py
  • tilelang/jit/adapter/tvm_ffi.py
  • tilelang/jit/adapter/utils.py
  • tilelang/jit/adapter/wrapper.py
  • tilelang/jit/kernel.py
  • tilelang/language/__init__.py
  • tilelang/language/allocate.py
  • tilelang/language/annotations.py
  • tilelang/language/ast/__init__.py
  • tilelang/language/ast/_ffi_api.py
  • tilelang/language/ast/ir.py
  • tilelang/language/atomic.py
  • tilelang/language/builtin.py
  • tilelang/language/cluster.py
  • tilelang/language/copy_op.py

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@github-actions


👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@LeiWang1999 changed the title from "[tilelang] Update TileLang to use tirx" to "[TIR][IR] Update to use tirx" on May 18, 2026
…tor/tirx-tvm-update

# Conflicts:
#	testing/python/transform/test_tilelang_transform_producer_consumer_ws.py
#	tilelang/cuda/intrinsics/macro/tcgen05_macro_generator.py
#	tilelang/engine/lower.py
#	tilelang/utils/target.py
…dify target_host initialization in lower.py for compatibility with TVM's target handling.
…d library dependencies for TVM integration. Adjusted output targets and library names for better compatibility across platforms.
…h wildcard for better compatibility with dependency packages.
@LeiWang1999 (Member, Author) commented:

@regression-perf

@github-actions


Performance Regression Test Report

Triggered by: @LeiWang1999
Workflow run: https://github.com/tile-ai/tilelang/actions/runs/26050580122

Results

| File | Original Latency | Current Latency | Speedup |
|---|---|---|---|
| example_gqa_bwd_tma_reduce_varlen | 0.0327871 | 0.0342266 | 0.957943 |
| example_gqa_bwd | 0.0328678 | 0.0333474 | 0.985617 |
| example_tilelang_nsa_fwd | 0.00513174 | 0.00518024 | 0.990638 |
| example_mhc_pre | 0.11904 | 0.120122 | 0.990993 |
| example_convolution_autotune | 0.720457 | 0.726915 | 0.991116 |
| example_mha_inference | 0.0623633 | 0.0629221 | 0.991121 |
| example_warp_specialize_gemm_copy_0_gemm_1 | 0.027213 | 0.0273687 | 0.994312 |
| example_gemv | 0.199967 | 0.201066 | 0.994536 |
| example_fusedmoe_tilelang | 0.0953415 | 0.0957216 | 0.996029 |
| example_mha_bwd_bhsd | 0.0296933 | 0.0298055 | 0.996238 |
| example_group_per_split_token_cast_to_fp8 | 0.00754321 | 0.00757068 | 0.996372 |
| example_dynamic | 0.50665 | 0.508062 | 0.997222 |
| example_mha_fwd_bshd | 0.0190251 | 0.0190677 | 0.997762 |
| sparse_mla_bwd | 0.21373 | 0.21416 | 0.997991 |
| example_gemm_autotune | 0.0161791 | 0.016199 | 0.998774 |
| example_gemm | 0.0170982 | 0.0171162 | 0.998949 |
| example_mhc_post | 0.106353 | 0.106441 | 0.999174 |
| block_sparse_attn_tilelang | 0.00672042 | 0.00672594 | 0.999179 |
| example_per_token_cast_to_fp8 | 0.00650466 | 0.0065083 | 0.999442 |
| example_topk | 0.00787251 | 0.00787605 | 0.99955 |
| example_convolution | 0.924473 | 0.924765 | 0.999684 |
| topk_selector | 0.0397301 | 0.0397407 | 0.999733 |
| example_gqa_sink_bwd_bhsd | 0.0301026 | 0.0301097 | 0.999765 |
| example_gqa_sink_bwd_bhsd_sliding_window | 0.018109 | 0.0181121 | 0.999826 |
| example_tilelang_block_sparse_attn | 0.00725174 | 0.00725196 | 0.999969 |
| example_warp_specialize_gemm_softpipe_stage2 | 0.0195677 | 0.0195682 | 0.999973 |
| example_elementwise_add | 0.112937 | 0.112932 | 1.00004 |
| example_tilelang_sparse_gqa_decode_varlen_indice | 0.0117925 | 0.0117912 | 1.00011 |
| sparse_mla_fwd_pipelined | 0.0697167 | 0.0697019 | 1.00021 |
| example_gemm_intrinsics | 0.0253606 | 0.0253487 | 1.00047 |
| example_warp_specialize_gemm_copy_1_gemm_0 | 0.019615 | 0.0196054 | 1.00049 |
| example_vertical_slash_sparse_attn | 0.167886 | 0.167788 | 1.00058 |
| example_tilelang_sparse_gqa_decode_varlen_mask | 0.0127939 | 0.0127831 | 1.00084 |
| example_mha_fwd_bhsd | 0.00906961 | 0.00905746 | 1.00134 |
| example_mha_bwd_bshd | 0.0293754 | 0.0293295 | 1.00156 |
| example_mha_sink_bwd_bhsd | 0.0519375 | 0.0518538 | 1.00162 |
| example_mha_fwd_varlen | 0.0326906 | 0.0326374 | 1.00163 |
| example_mla_decode | 0.315035 | 0.314497 | 1.00171 |
| example_tilelang_gemm_splitk_vectorize_atomicadd | 0.798986 | 0.797409 | 1.00198 |
| example_tilelang_gemm_fp8 | 0.23841 | 0.237892 | 1.00218 |
| example_tilelang_gemm_splitk | 0.776777 | 0.774808 | 1.00254 |
| example_blocksparse_gemm | 0.0143873 | 0.0143494 | 1.00265 |
| example_gqa_decode | 0.0411818 | 0.0410682 | 1.00277 |
| example_gqa_fwd_bshd | 0.0517446 | 0.0515738 | 1.00331 |
| example_mha_sink_fwd_bhsd_sliding_window | 0.01268 | 0.0126358 | 1.0035 |
| example_tilelang_nsa_decode | 0.00502706 | 0.00500341 | 1.00473 |
| example_linear_attn_bwd | 0.118129 | 0.117483 | 1.0055 |
| example_tilelang_gemm_fp8_2xAcc | 0.0918185 | 0.0911856 | 1.00694 |
| example_mha_sink_bwd_bhsd_sliding_window | 0.0382854 | 0.0380113 | 1.00721 |
| example_linear_attn_fwd | 0.028708 | 0.0284922 | 1.00758 |
| fp8_lighting_indexer | 0.0234433 | 0.0232621 | 1.00779 |
| example_warp_specialize_gemm_barrierpipe_stage2 | 0.0297104 | 0.0293945 | 1.01075 |
| sparse_mla_fwd | 0.085772 | 0.0838691 | 1.02269 |
| example_mha_sink_fwd_bhsd | 0.0130641 | 0.012654 | 1.03241 |
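
The Speedup column appears to be the original latency divided by the current latency (for the first row, 0.0327871 / 0.0342266 ≈ 0.9579), so values below 1 indicate a slowdown. A small sketch of that arithmetic follows; the helper names and the 0.98 threshold are assumptions chosen for illustration and are not taken from the regression workflow.

```python
def speedup(original, current):
    """Ratio of original to current latency; below 1.0 means slower now."""
    return original / current


def regressions(rows, threshold=0.98):
    """Return (name, speedup) pairs whose speedup falls below `threshold`.

    The 0.98 default is an arbitrary illustrative cutoff.
    """
    flagged = []
    for name, original, current in rows:
        ratio = speedup(original, current)
        if ratio < threshold:
            flagged.append((name, ratio))
    return flagged


# Three rows copied from the report above.
rows = [
    ("example_gqa_bwd_tma_reduce_varlen", 0.0327871, 0.0342266),
    ("example_gemm", 0.0170982, 0.0171162),
    ("example_mha_sink_fwd_bhsd", 0.0130641, 0.012654),
]

for name, ratio in regressions(rows):
    print(f"{name}: {ratio:.4f}")
```

With this cutoff only `example_gqa_bwd_tma_reduce_varlen` is flagged among the three sample rows.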

Artifacts

  • regression_result.png (speedup plot) is attached as a workflow artifact. Download it from the workflow run page above.

oraluben and others added 9 commits May 19, 2026 07:56
…tor/tirx-tvm-update

# Conflicts:
#	src/backend/cuda/op/gemm_sp.cc
#	src/op/gemm.cc
#	src/op/gemm_sp.cc
#	src/op/gemm_sp.h
#	src/op/gemm_sp_py.cc
#	src/op/gemm_sp_py.h
#	tilelang/cuda/op/gemm_sp/gemm_sp_mma.py
#	tilelang/ir.py
#	tilelang/language/experimental/gemm_sp.py
#	tilelang/tileop/gemm_sp/__init__.py
…torization. Introduce new functions for preferred vectorized size and update existing reduction logic to handle packed operations for bfloat16 and float16 types. Add nan-aware min and max operations in CUDA and ROCm backends, and update related tests to validate functionality.
…tor/tirx-tvm-update

# Conflicts:
#	src/transform/legalize_negative_index.cc
#	tilelang/jit/adapter/cutedsl/wrapper.py
@LeiWang1999 (Member, Author) commented:

@regression-perf

@LeiWang1999 (Member, Author) commented:

Local tests pass; looking forward to the regression test results.

…pdated `BufferLoadNode` to `tirx::BufferLoadNode` in `GetBarrier` and `LowerCluster` methods to ensure compatibility with recent changes in the TIR API.
@LeiWang1999 LeiWang1999 changed the title [TIR][IR] Update to use tirx [TIR][IR] Update to use tirx May 20, 2026
@LeiWang1999 LeiWang1999 merged commit b939fa0 into tile-ai:main May 20, 2026
9 of 11 checks passed
tlopex pushed a commit to apache/tvm that referenced this pull request Jun 18, 2026
## Summary

This PR adds a Z3 SMT solver backend to `tvm::arith::Analyzer` for
stronger integer arithmetic proving.

The integration is guarded by `USE_Z3`, which defaults to `AUTO`. In the
default mode, TVM enables Z3 when the static Z3 development artifacts
are available and otherwise builds the conservative stub implementation.
When Z3 is enabled, `Analyzer::CanProve` runs the existing TVM
arithmetic analysis path first, then falls back to Z3 only when the
existing analyzers cannot prove the predicate and the requested strength
is `kSymbolicBound`. Z3 is linked statically from the PyPI `z3-static`
package, so `libtvm` does not need a runtime `libz3` dependency.
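
As a rough illustration of the ordering described above, the following sketch models the fallback in plain Python. Everything in it is hypothetical: `ProofStrength`, `can_prove`, and the two prover callables are stand-ins for illustration, not the `arith::Analyzer` API.

```python
from enum import Enum


class ProofStrength(Enum):
    DEFAULT = 0
    SYMBOLIC_BOUND = 1


def can_prove(predicate, strength, builtin_prover, smt_prover):
    """Try the cheap analysis first; consult the SMT solver only on request."""
    if builtin_prover(predicate):
        return True
    # The solver is gated: the default strength never pays for it.
    if strength is not ProofStrength.SYMBOLIC_BOUND:
        return False
    try:
        return bool(smt_prover(predicate))
    except Exception:
        # A solver failure must not escape; answer conservatively.
        return False


smt_calls = []


def toy_builtin(predicate):
    return predicate == "x < x + 1"


def toy_smt(predicate):
    smt_calls.append(predicate)
    return predicate == "x * x >= 0"


def failing_smt(predicate):
    raise RuntimeError("solver error")


print(can_prove("x * x >= 0", ProofStrength.DEFAULT, toy_builtin, toy_smt))
print(can_prove("x * x >= 0", ProofStrength.SYMBOLIC_BOUND, toy_builtin, toy_smt))
```

Returning `False` here means "not proven", not "disproven", which is what makes the conservative answer safe.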

## Features

- Z3 build support through `USE_Z3`, defaulting to `AUTO`.
- A new `arith::Z3Prover` sub-analyzer owned by `arith::Analyzer`.
- SMT-LIB2 export for debugging and external solver reproduction.
- Python debug/config APIs: `Analyzer.get_smtlib2`,
`Analyzer.set_z3_timeout_ms`, `Analyzer.set_z3_rlimit`, and
`Analyzer.get_z3_stats`.
- C++ APIs for proving, binding, constraints, stats, model inspection,
and satisfying-value counting.
- Scalar integer, unsigned integer, and boolean expression translation
to Z3.
- Support for arithmetic, comparisons, boolean operators, `min`, `max`,
`select`, `if_then_else`, `let`, casts, truncated division/modulo, floor
division/modulo, and selected bitwise/shift operations.
- Deterministic solver control using Z3 `rlimit`, with `random_seed`
fixed to `42`.
- Thread-local Z3 context sharing to reduce initialization overhead
while keeping thread safety.
- A disabled-mode stub implementation that returns conservative results
when Z3 is not built.

## Implementation Notes

- The real and stub implementations live in `src/arith/z3_prover.cc`,
selected by the `TVM_USE_Z3` macro from
`cmake/modules/contrib/Z3.cmake`.
- `cmake/modules/contrib/Z3.cmake` first resolves the PIC static `libz3`
layout provided by `z3-static` using its `z3_static.get_cmake_dir()`
helper, then falls back to a custom `Z3_DIR` or `CMAKE_PREFIX_PATH`
installation.
- `USE_Z3=ON` requires Z3 to be found, while `USE_Z3=AUTO` allows source
builds and CI jobs without Z3 artifacts to continue with the stub.
- The Z3 fallback is exception-safe and gated behind `kSymbolicBound`,
so the common `kDefault` path does not pay solver cost.
- TVM `Div` and `Mod` are translated with truncating helpers rather than
Z3's Euclidean operators to stay sound for negative dividends.
- Shift handling relies on Z3's native bit-vector semantics and does not
add hard assertions to the shared solver.

## References

The implementation is based on the Z3 analyzer integration used in
TileLang's TVM fork, with the upstream port kept scoped to TVM's
arithmetic analyzer.

- [tile-ai/tilelang#1367](tile-ai/tilelang#1367)
- [tile-ai/tilelang#1458](tile-ai/tilelang#1458)
- [tile-ai/tilelang#2216](tile-ai/tilelang#2216)
- [tile-ai#22](tile-ai#22)
- [tile-ai#24](tile-ai#24)
- [Original TileLang TVM
commit](tile-ai@e633295)

---------

Signed-off-by: Ubospica <ubospica@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Bug] NPE since 0.1.8

3 participants