[Pipeline] Refactor software pipeline transforms#2245
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSplits software-pipeline planning/rewriting into analyzers (access, body, stage), barrier processing, helpers, and a buffer-versioning rewriter; refactors PipelinePlanner to delegate to these components; updates pass ordering, pass-config, tests, and build sources. ChangesSoftware Pipeline Transform Refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/transform/pipeline/access_analysis.cc`:
- Around line 174-179: The if-then-else handling in access_analysis.cc
incorrectly hard-resets within_condition_expr_ and leaves the else branch
visited with condition-context set; modify the block handling
op->op.same_as(builtin::if_then_else()) to save the previous
within_condition_expr_ value (e.g., auto prev = within_condition_expr_), set
within_condition_expr_ = true only while visiting the condition via
this->VisitExpr(op->args[0]), then restore within_condition_expr_ = prev before
visiting the then- and else- branches (visit op->args[1] and subsequent else
cases) so the else case is not treated as a condition and nested condition
contexts are preserved; apply the same save/restore pattern to the analogous
code range for 186-195.
In `@src/transform/pipeline/barrier.cc`:
- Around line 429-438: The loops over original_order access tma_copies[i]
without validating sizes, which can cause OOB if tma_copies and original_order
differ; before the loops in barrier.cc, add a check/assert that
tma_copies.size() == original_order.size() (or handle the mismatch explicitly)
so accesses to tma_copies[i] in the first loop that sets last_tma_idx and the
second loop that tests is_zero(tma_copies[i]) are safe; update any error paths
to return or abort if the sizes differ to prevent undefined behavior.
In `@testing/python/language/test_tilelang_language_tma_copy.py`:
- Around line 55-56: Update the module docstring to reflect the new TMA wait API
used in the test by replacing or augmenting the mention of
T.mbarrier_wait_parity() with the current T.barrier_wait(...) call (or mention
both forms), and briefly describe the parameter usage (e.g., parity argument k %
2) so readers understand how the test synchronizes TMA loads; look for the
docstring at the top of the test module and update the text to reference
T.barrier_wait and its parity usage alongside or instead of
T.mbarrier_wait_parity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d5978427-5287-426b-aea2-00a13e9fc634
📒 Files selected for processing (17)
CMakeLists.txtsrc/transform/inject_pipeline.ccsrc/transform/pipeline/access_analysis.ccsrc/transform/pipeline/access_analysis.hsrc/transform/pipeline/barrier.ccsrc/transform/pipeline/barrier.hsrc/transform/pipeline/body_analysis.hsrc/transform/pipeline/helpers.ccsrc/transform/pipeline/helpers.hsrc/transform/pipeline/rewriter.ccsrc/transform/pipeline/rewriter.hsrc/transform/pipeline/stage_analysis.hsrc/transform/pipeline_planning.cctesting/python/language/test_tilelang_language_tma_copy.pytesting/python/transform/test_tilelang_transform_Inject_software_pipeline.pytesting/python/transform/test_tilelang_transform_pipeline_planning.pytilelang/engine/phase.py
| } else if (op->op.same_as(builtin::if_then_else())) { | ||
| within_condition_expr_ = true; | ||
| this->VisitExpr(op->args[0]); | ||
| within_condition_expr_ = false; | ||
| for (auto i = 1; i < op->args.size(); i++) { | ||
| this->VisitExpr(op->args[i]); |
There was a problem hiding this comment.
Preserve conditional-context state and don’t mark else as a condition expression.
within_condition_expr_ is hard-reset instead of restored (nested contexts), and else_case is incorrectly visited with condition-context enabled (Line 192). This can hide real global reads and misclassify copy stages.
💡 Suggested fix
void BufferRegionCollector::VisitExpr_(const CallNode *op) {
@@
} else if (op->op.same_as(builtin::if_then_else())) {
- within_condition_expr_ = true;
+ bool prev_within_condition_expr = within_condition_expr_;
+ within_condition_expr_ = true;
this->VisitExpr(op->args[0]);
- within_condition_expr_ = false;
+ within_condition_expr_ = prev_within_condition_expr;
for (auto i = 1; i < op->args.size(); i++) {
this->VisitExpr(op->args[i]);
}
@@
void BufferRegionCollector::VisitStmt_(const IfThenElseNode *op) {
- within_condition_expr_ = true;
+ bool prev_within_condition_expr = within_condition_expr_;
+ within_condition_expr_ = true;
this->VisitExpr(op->condition);
- within_condition_expr_ = false;
+ within_condition_expr_ = prev_within_condition_expr;
this->VisitStmt(op->then_case);
if (op->else_case.defined()) {
- within_condition_expr_ = true;
this->VisitStmt(op->else_case.value());
- within_condition_expr_ = false;
}
}Also applies to: 186-195
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/transform/pipeline/access_analysis.cc` around lines 174 - 179, The
if-then-else handling in access_analysis.cc incorrectly hard-resets
within_condition_expr_ and leaves the else branch visited with condition-context
set; modify the block handling op->op.same_as(builtin::if_then_else()) to save
the previous within_condition_expr_ value (e.g., auto prev =
within_condition_expr_), set within_condition_expr_ = true only while visiting
the condition via this->VisitExpr(op->args[0]), then restore
within_condition_expr_ = prev before visiting the then- and else- branches
(visit op->args[1] and subsequent else cases) so the else case is not treated as
a condition and nested condition contexts are preserved; apply the same
save/restore pattern to the analogous code range for 186-195.
| for (size_t i = 0; i < original_order.size(); i++) { | ||
| if (!is_zero(tma_copies[i])) | ||
| last_tma_idx = static_cast<int>(i); | ||
| } | ||
|
|
||
| // Phase 1: Rewrite TMA copy blocks - all share barrier slot 0. | ||
| // ExpandPipelineBarriers (called later) will rewrite indices to be | ||
| // stage-dependent. Only the last TMA copy emits arrive. | ||
| for (size_t i = 0; i < original_order.size(); i++) { | ||
| if (is_zero(tma_copies[i])) |
There was a problem hiding this comment.
Potential out-of-bounds access if tma_copies size mismatches original_order size.
The loop iterates over original_order.size() but accesses tma_copies[i] without verifying that tma_copies has the same length. If a caller provides a shorter tma_copies array, this causes undefined behavior.
🛡️ Proposed fix: Add assertion for size match
Buffer RewritePipelineTmaBarriers(
Array<SBlock> &original_order, PipelineInfo &pipeline_info,
const Array<Integer> &tma_copies, Map<Var, Buffer> &buffer_data_to_buffer,
BufferSet &allocated_buffers, Array<Buffer> &block_local_allocs,
Var loop_var, PrimExpr loop_min, int num_stages) {
+ ICHECK_EQ(tma_copies.size(), original_order.size())
+ << "tma_copies must have the same size as original_order";
if (!std::any_of(tma_copies.begin(), tma_copies.end(),
[](const Integer &tc) { return !is_zero(tc); })) {
return Buffer();
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/transform/pipeline/barrier.cc` around lines 429 - 438, The loops over
original_order access tma_copies[i] without validating sizes, which can cause
OOB if tma_copies and original_order differ; before the loops in barrier.cc, add
a check/assert that tma_copies.size() == original_order.size() (or handle the
mismatch explicitly) so accesses to tma_copies[i] in the first loop that sets
last_tma_idx and the second loop that tests is_zero(tma_copies[i]) are safe;
update any error paths to return or abort if the sizes differ to prevent
undefined behavior.
| T.barrier_wait(mbar_A, k % 2) | ||
| T.barrier_wait(mbar_B, k % 2) |
There was a problem hiding this comment.
Sync test documentation with the new wait API usage.
This test now uses T.barrier_wait(...), but the module docstring still states T.mbarrier_wait_parity() for TMA load synchronization. Please update the docstring (or mention both forms) to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@testing/python/language/test_tilelang_language_tma_copy.py` around lines 55 -
56, Update the module docstring to reflect the new TMA wait API used in the test
by replacing or augmenting the mention of T.mbarrier_wait_parity() with the
current T.barrier_wait(...) call (or mention both forms), and briefly describe
the parameter usage (e.g., parity argument k % 2) so readers understand how the
test synchronizes TMA loads; look for the docstring at the top of the test
module and update the text to reference T.barrier_wait and its parity usage
alongside or instead of T.mbarrier_wait_parity.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/transform/pipeline/stage_analysis.h (1)
114-115: 💤 Low valueUnused variable
if_then_elsemay trigger compiler warning.The variable is declared but never used. Consider using the cast result directly in the condition or suppressing the warning.
♻️ Suggested fix
- if (const auto *if_then_else = node.as<IfThenElseNode>()) { + if (node.as<IfThenElseNode>()) { conditional = true; return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/transform/pipeline/stage_analysis.h` around lines 114 - 115, The local pointer if_then_else is assigned from node.as<IfThenElseNode>() but never used, which can trigger a compiler warning; replace the declaration with a direct check by changing the condition to use node.as<IfThenElseNode>() directly (i.e., if (node.as<IfThenElseNode>()) { conditional = true; }) or, if you actually need the casted pointer later, use the variable where needed or mark it [[maybe_unused]]; update the code around the if (const auto *if_then_else = node.as<IfThenElseNode>()) and the conditional assignment to remove the unused variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/transform/pipeline/stage_analysis.h`:
- Around line 114-115: The local pointer if_then_else is assigned from
node.as<IfThenElseNode>() but never used, which can trigger a compiler warning;
replace the declaration with a direct check by changing the condition to use
node.as<IfThenElseNode>() directly (i.e., if (node.as<IfThenElseNode>()) {
conditional = true; }) or, if you actually need the casted pointer later, use
the variable where needed or mark it [[maybe_unused]]; update the code around
the if (const auto *if_then_else = node.as<IfThenElseNode>()) and the
conditional assignment to remove the unused variable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: faacdcf2-4a31-40be-9467-bac5f84f7b79
📒 Files selected for processing (6)
CMakeLists.txtsrc/transform/pipeline/access_analysis.ccsrc/transform/pipeline/barrier.ccsrc/transform/pipeline/helpers.ccsrc/transform/pipeline/stage_analysis.htilelang/engine/phase.py
|
@regression-perf |
Performance Regression Test ReportTriggered by: @LeiWang1999 Results
Artifacts
|
|
@regression-perf |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/transform/pipeline_planning.cc (1)
135-174:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep the explicit-annotation body aligned with the flattened schedule.
This branch filters pipeline metadata against
pipeline_stmts, but it still writes backpipeline_body_stmts. IfNormalizePipelineBodyleaves nestedSeqStmts, the loop body andsoftware_pipeline_*annotations can describe different statement lists, which is exactly what thenum_stagespath avoids a few lines below. Rebuild the explicit path from the flattened list too.Suggested fix
- n->body = MakePipelineBody(pipeline_body_stmts); + n->body = MakePipelineBody(pipeline_stmts);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/transform/pipeline_planning.cc` around lines 135 - 174, The explicit-annotation branch currently writes back the original pipeline_body_stmts while metadata (filtered_order_array/filtered_stage_array and replayable masks) are computed from the flattened pipeline_stmts, causing mismatch when NormalizePipelineBody left nested SeqStmt nodes; update the code so the loop body is rebuilt from the flattened list (pipeline_stmts) instead of pipeline_body_stmts (i.e. replace the final n->body = MakePipelineBody(pipeline_body_stmts) with n->body = MakePipelineBody(pipeline_stmts)) so annotations (s_tir::attr::software_pipeline_order/stage and kPipelineReplayableScalarBinds) match the actual flattened statement sequence used to compute them.
🧹 Nitpick comments (2)
src/transform/if_stmt_binding.cc (1)
51-73: ⚡ Quick winReuse the shared replayability analyzer here.
This pass now reimplements the same access/write/replayability flow that
src/transform/pipeline/body_analysis.halready centralizes, and the two paths are already drifting:PipelinePlanningBodyAnalyzer::CollectStmtAccessRegions()wraps each stmt in anSBlock, while this copy does not. Please route both through one helper soIfStmtBindingand pipeline planning cannot disagree on which binds are replayable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/transform/if_stmt_binding.cc` around lines 51 - 73, The code duplicates access/write/replayability logic; replace the local CollectStmtAccessRegions/CollectWriteBuffers/IsReplayableBindStmt implementations in IfStmtBinding with calls into the shared analyzer in PipelinePlanningBodyAnalyzer (or its exported helper) so both paths use the same logic; specifically, stop constructing BufferRegionCollector here and instead call PipelinePlanningBodyAnalyzer::CollectStmtAccessRegions (or the shared helper) which wraps statements in an SBlock correctly, then use its returned reads/writes to build write_buffers and to call IsReplayableScalarBind — ensuring we preserve the SBlock wrapping behavior used by the pipeline analyzer.testing/python/transform/test_tilelang_transform_pipeline_planning.py (1)
195-203: ⚡ Quick winAssert the injected IR, not just that injection succeeds.
This regression currently passes as long as
InjectSoftwarePipelinedoes not throw. A no-op or wrong rewrite would still slip through. Please add one structural postcondition on the injected module as well, so the guarded-bind case stays covered end-to-end. Based on learnings, focus assertions on structural patterns in the generated kernel source rather than specific numeric literals.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@testing/python/transform/test_tilelang_transform_pipeline_planning.py` around lines 195 - 203, After calling tl.transform.InjectSoftwarePipeline()(mod) re-collect pipeline annotations (e.g., call _collect_pipeline_loop_annotations(mod["main"]) again) and assert a structural postcondition on the injected IR: ensure the annotation(s) now include the guarded-bind information (check that "software_pipeline_replayable_scalar_binds" is present and non-empty on the annotation for the pipeline) or, alternatively, inspect the generated kernel body for a Bind-like pattern (e.g., a non-empty list/string containing "bind" or equivalent), so the test verifies the rewrite produced the expected structural change rather than merely not throwing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/transform/pipeline_planning.cc`:
- Around line 135-174: The explicit-annotation branch currently writes back the
original pipeline_body_stmts while metadata
(filtered_order_array/filtered_stage_array and replayable masks) are computed
from the flattened pipeline_stmts, causing mismatch when NormalizePipelineBody
left nested SeqStmt nodes; update the code so the loop body is rebuilt from the
flattened list (pipeline_stmts) instead of pipeline_body_stmts (i.e. replace the
final n->body = MakePipelineBody(pipeline_body_stmts) with n->body =
MakePipelineBody(pipeline_stmts)) so annotations
(s_tir::attr::software_pipeline_order/stage and kPipelineReplayableScalarBinds)
match the actual flattened statement sequence used to compute them.
---
Nitpick comments:
In `@src/transform/if_stmt_binding.cc`:
- Around line 51-73: The code duplicates access/write/replayability logic;
replace the local
CollectStmtAccessRegions/CollectWriteBuffers/IsReplayableBindStmt
implementations in IfStmtBinding with calls into the shared analyzer in
PipelinePlanningBodyAnalyzer (or its exported helper) so both paths use the same
logic; specifically, stop constructing BufferRegionCollector here and instead
call PipelinePlanningBodyAnalyzer::CollectStmtAccessRegions (or the shared
helper) which wraps statements in an SBlock correctly, then use its returned
reads/writes to build write_buffers and to call IsReplayableScalarBind —
ensuring we preserve the SBlock wrapping behavior used by the pipeline analyzer.
In `@testing/python/transform/test_tilelang_transform_pipeline_planning.py`:
- Around line 195-203: After calling tl.transform.InjectSoftwarePipeline()(mod)
re-collect pipeline annotations (e.g., call
_collect_pipeline_loop_annotations(mod["main"]) again) and assert a structural
postcondition on the injected IR: ensure the annotation(s) now include the
guarded-bind information (check that "software_pipeline_replayable_scalar_binds"
is present and non-empty on the annotation for the pipeline) or, alternatively,
inspect the generated kernel body for a Bind-like pattern (e.g., a non-empty
list/string containing "bind" or equivalent), so the test verifies the rewrite
produced the expected structural change rather than merely not throwing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b864f799-6851-4e37-b8c8-89c196b5df88
📒 Files selected for processing (13)
src/op/builtin.ccsrc/op/builtin.hsrc/transform/common/pipeline_utils.hsrc/transform/if_stmt_binding.ccsrc/transform/inject_pipeline.ccsrc/transform/pipeline/access_analysis.ccsrc/transform/pipeline/access_analysis.hsrc/transform/pipeline/body_analysis.hsrc/transform/pipeline/helpers.ccsrc/transform/pipeline_planning.cctesting/python/transform/test_tilelang_transform_if_stmt_binding.pytesting/python/transform/test_tilelang_transform_pipeline_planning.pytilelang/transform/pass_config.py
✅ Files skipped from review due to trivial changes (2)
- src/op/builtin.cc
- src/op/builtin.h
Performance Regression Test ReportTriggered by: @LeiWang1999 Results
Artifacts
|
…form/pipeline-pass-cleanup
…s analysis - Removed unused Target parameter from IfStmtBindingRewriter. - Introduced IfStmtAccessCollector to handle buffer access collection. - Updated access analysis methods to utilize the new collector. - Cleaned up includes and removed obsolete functions in access_analysis. - Enhanced pipeline utility headers with necessary includes for consistency.
Summary
src/transform/pipeline/modules.pipeline_planning.ccandinject_pipeline.cc, while removing legacy raw PTX async planning paths.IfStmtBindingbefore pipeline planning so the pipeline passes operate on canonical high-level TileOP bodies.Changes
ptx_cp_async/ commit / wait scheduling support from pipeline planning and access analysis; pipeline planning now reasons over TileOP copies and generated async annotations.InjectSoftwarePipelineimplementation back intoinject_pipeline.ccand deleted the temporary injector shim.SeqStmtbody produced by pipeline planning.Validation
pre-commit run --all-filesgit diff --checkcmake --build build -j$(nproc)PYTHONPATH=$(pwd):$PYTHONPATH python -m pytest testing/python/transform/test_tilelang_transform_Inject_software_pipeline.py testing/python/transform/test_tilelang_transform_pipeline_planning.py testing/python/language/test_tilelang_language_tma_copy.py -qNotes
examples/quickstart.pyanddocs/compiler_internals/async_mbarrier_dependency_analysis.mdwere intentionally left out of this PR.Summary by CodeRabbit
New Features
Improvements
Tests