[REFACTOR][CODEGEN] Phase out tvm_global_barrier_state and tvm_prepare_global_barrier by tqchen · Pull Request #19454 · apache/tvm

tqchen · 2026-04-27T14:18:58Z

Phase out the legacy spin-on-global-memory CUDA barrier machinery
(tvm_global_barrier_state / __tvm_prepare_global_barrier / the
tvm_global_barrier_kinit() builtin and the tirx.detect_global_barrier
pass-config option). CUDA's native cooperative groups / grid sync
primitives cover the use case better; the bespoke implementation has
been dead in the active codegen pipelines.

This is a deletion-only refactor across 10 files (~−264 lines net):

Public symbol constants in include/tvm/runtime/device_api.h
TIR builtin op tvm_global_barrier_kinit() (include/tvm/tirx/builtin.h,
src/tirx/op/builtin.cc)
Pass config tirx.detect_global_barrier (src/tirx/ir/transform.cc)
Entire kGlobal branch of s_tir::ThreadSync including
InitGlobalBarrier, MakeGlobalBarrier, and supporting state in
ThreadSyncInserter
CUDAPrepGlobalBarrier runtime class + CUDAModuleNode::GetGlobal()
CodeGenCUDA::PrintStorageSync "global" branch and the
VisitStmt_(EvaluateNode*) override + 3 member fields
Two Python pipeline opt-in blocks (s_tir/pipeline.py + adreno mirror)

No Python or test references to these symbols. Build clean
(260 targets), CUDA codegen 50/50 passed, tirx-base + tirx-transform
621 passed, IRF cpptests 8/8 passed.

…e_global_barrier This PR removes the legacy spin-on-global-memory barrier machinery from TVM. The implementation was a software synchronization primitive that used a global device counter with busy-wait polling — CUDA provides native cooperative groups / grid-sync for this use case, making this dead code. Main changes: - Remove `tvm_global_barrier_kinit` builtin op and `tirx.detect_global_barrier` pass config option - Strip kGlobal barrier path from `ThreadSyncInserter` (deletes ~130 lines of init/make logic) - Delete `CUDAModuleNode::GetGlobal`, `CUDAPrepGlobalBarrier`, and the `GetFunction` early-return for `tvm_prepare_global_barrier` - Drop the "global" branch in `CodeGenCUDA::PrintStorageSync` and the `VisitStmt_(EvaluateNode*)` override that emitted the kinit shared memory setup - Remove the `tirx.detect_global_barrier` opt-in from both default and Adreno GPU pipelines

gemini-code-assist

Code Review

This pull request removes the legacy global barrier implementation across the TVM runtime, TIRX, and CUDA codegen. This involves deleting global barrier symbols, built-in operators, configuration options, and the corresponding logic in the thread synchronization transformation and CUDA code generation. Review feedback suggests that the CUDA codegen should explicitly throw an error if a global sync is encountered to avoid silent failures, and identifies a redundant header inclusion in the thread storage synchronization pass.

Per code review: the removal of the global barrier path left PrintStorageSync silently doing nothing when sync == "global". Add an explicit TVM_FFI_THROW(InternalError) so any stale IR reaching this path fails loudly instead of generating incorrect code.

tqchen · 2026-04-27T15:02:40Z

Thanks for the review, addressing both comments:

Comment 1 (PrintStorageSync silent no-op) — ACCEPTED

You are correct. After removing the global barrier branch, PrintStorageSync silently did nothing for sync == "global", which would produce incorrect code without any diagnostic. Added an explicit TVM_FFI_THROW(InternalError) for the "global" case (matching the existing error style in the file, e.g. PrintStorageScope already has TVM_FFI_ICHECK_NE(scope, "global")). Pushed as a follow-up commit (36a69d0).

Comment 2 (redundant <tvm/tirx/op.h> include) — REJECTED

The include is intentional and required. make_zero (used at line 300 in ThreadSyncAfterWaitQueueInserter::VisitStmt_) is declared in <tvm/tirx/op.h>. It was previously provided transitively through ../../tirx/transform/ir_utils.h, which we removed as part of the cleanup. Neither <tvm/tirx/builtin.h> nor <tvm/ir/op.h> (which builtin.h includes) declare make_zero — that symbol lives only in <tvm/tirx/op.h>. Removing it causes a build error.

tqchen · 2026-04-27T16:53:18Z

The `#include <tvm/tirx/op.h>` include is required and not redundant.

`builtin.h` includes `<tvm/ir/op.h>` but not `<tvm/tirx/op.h>`. The `make_zero` function is defined in `<tvm/tirx/op.h>` (line 300 of this file) and used by `ThreadSyncAfterWaitQueueInserter::VisitStmt_` at line 300.

Empirical verification: removing `#include <tvm/tirx/op.h>` and attempting a focused build fails with:
```
error: 'make_zero' was not declared in this scope
```

The include is necessary.

gemini-code-assist Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread src/target/source/codegen_cuda.cc

Comment thread src/s_tir/transform/thread_storage_sync.cc

tlopex approved these changes Apr 27, 2026

View reviewed changes

tlopex merged commit 216be63 into apache:main Apr 27, 2026
9 checks passed

ysh329 mentioned this pull request May 6, 2026

[Release] v0.24.0 Release Candidate Notes #19513

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[REFACTOR][CODEGEN] Phase out tvm_global_barrier_state and tvm_prepare_global_barrier#19454

[REFACTOR][CODEGEN] Phase out tvm_global_barrier_state and tvm_prepare_global_barrier#19454
tlopex merged 2 commits into
apache:mainfrom
tqchen:phase-out-global-barrier-state

tqchen commented Apr 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

tqchen commented Apr 27, 2026

Uh oh!

tqchen commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

tqchen commented Apr 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

tqchen commented Apr 27, 2026

Uh oh!

tqchen commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants