[Refactor] Enhance deterministic ordering in shared memory allocation merge.#1570

Merged
LeiWang1999 merged 2 commits into
tile-ai:mainfrom
LeiWang1999:smem_1230
Dec 30, 2025


@LeiWang1999 LeiWang1999 commented Dec 29, 2025

Member
  • Updated comparison logic in merge_shared_memory_allocations.cc to use name hints for deterministic ordering of variables instead of pointer comparisons.
  • Introduced a sorted vector of keys for shmem_allocs_ to ensure consistent iteration order when processing allocations.

This refactor aims to make shared memory allocation handling in the transformation process more predictable. Without it, the same input program can produce different shared memory (smem) allocation results from run to run, which in turn can lead to nondeterministic kernel performance.

Before:

First Run:

 S_shared -> offset=221184
 K_tail_shared_1 -> offset=212992
 K_tail_shared_0 -> offset=204800
 KV_shared_1_r -> offset=139264
 KV_shared_1_l -> offset=73728
 Q_tail_shared -> offset=65536
 sum_exp_shared -> offset=229376
 KV_shared_0_r -> offset=172032
 KV_shared_0_l -> offset=106496
 O_shared_l -> offset=32768
 O_shared_r -> offset=0

Second Run:

 sum_exp_shared -> offset=229376
 K_tail_shared_1 -> offset=212992
 K_tail_shared_0 -> offset=221184
 S_shared -> offset=204800
 KV_shared_1_l -> offset=172032
 KV_shared_1_r -> offset=73728
 Q_tail_shared -> offset=65536
 KV_shared_0_l -> offset=139264
 O_shared_l -> offset=32768
 KV_shared_0_r -> offset=106496
 O_shared_r -> offset=0

After this change, both runs produce the same offsets.
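For readers unfamiliar with the pattern, here is a minimal sketch of iterating an unordered map through a name-sorted key vector. `Var`, `SortedKeys`, `ProcessingOrder`, and the sizes are hypothetical stand-ins for illustration, not the code in merge_shared_memory_allocations.cc, and names are assumed to be unique.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a TIR variable node; only the name matters here.
struct Var {
  std::string name_hint;
};

// Return the map's keys ordered by name instead of by hash/pointer order.
// Assumption: names are unique within one map.
template <typename T>
std::vector<const Var*> SortedKeys(
    const std::unordered_map<const Var*, T>& allocs) {
  std::vector<const Var*> keys;
  keys.reserve(allocs.size());
  for (const auto& kv : allocs) keys.push_back(kv.first);
  std::sort(keys.begin(), keys.end(), [](const Var* a, const Var* b) {
    return a->name_hint < b->name_hint;
  });
  return keys;
}

// Build a small map (placeholder sizes) and list the names in the order
// in which they would be processed.
std::vector<std::string> ProcessingOrder() {
  static const Var s{"S_shared"}, q{"Q_tail_shared"}, o{"O_shared_l"};
  std::unordered_map<const Var*, int> allocs{{&s, 1}, {&q, 2}, {&o, 3}};
  std::vector<std::string> names;
  for (const Var* v : SortedKeys(allocs)) names.push_back(v->name_hint);
  return names;
}
```

Because the keys are pointers, iterating the map directly can visit entries in an order that depends on where the objects happen to be allocated. Sorting by name removes that dependency.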

Summary by CodeRabbit

  • Bug Fixes
    • Fixed non-deterministic behavior in shared memory allocation scheduling: allocation ordering and offsets are now computed consistently by using a stable, name-based ordering of variables. This produces reproducible memory layout and scheduling across runs, reducing variability in builds and executions.


@github-actions


👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai

coderabbitai Bot commented Dec 29, 2025

Contributor
Walkthrough

Replace pointer-based ordering with name-based ordering for shared memory allocations: introduce a sorted variable vector (by name_hint), iterate over that for liveness planning and BufInfo construction, and retrieve AllocateNode via shmem_allocs_.at(var) to ensure deterministic iteration and scheduling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Deterministic Sorting & Iteration (src/transform/merge_shared_memory_allocations.cc) | Replace pointer-based comparator with name_hint-based comparator; create sorted_vars (keys of shmem_allocs_) sorted by name_hint; iterate sorted_vars for liveness and BufInfo building; use shmem_allocs_.at(var) to fetch AllocateNode. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop through names in tidy rows,
No pointer tangles, calm repose.
Sorted whispers guide my way,
Shared memory neat by name today.
Hooray for order — hop and stay! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: replacing pointer-based ordering with name-based ordering to achieve deterministic behavior in shared memory allocation merging. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd97699 and dee3982.

📒 Files selected for processing (1)
  • src/transform/merge_shared_memory_allocations.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (3)
src/transform/merge_shared_memory_allocations.cc (3)

754-757: Correct approach for deterministic ordering.

Using name_hint comparison instead of pointer comparison ensures deterministic sorting across runs. The tertiary key guarantees stable behavior when start and size_bytes match.

Consider whether name_hint values are guaranteed to be unique within the scope of merged allocations. If duplicate names could theoretically occur, a secondary tie-breaker (e.g., object ID or concatenated type info) would fully guarantee determinism. In practice, TIR buffer names are typically unique, so this should be fine.
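As an illustration of a three-key comparison of this kind, here is a minimal sketch using `std::tie`. The `BufEntry` struct, its field names, and the ascending sort direction are assumptions taken from the wording of this comment, not from the actual source file.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record; field names mirror the review comment above,
// not the real struct in merge_shared_memory_allocations.cc.
struct BufEntry {
  std::int64_t start;
  std::int64_t size_bytes;
  std::string name_hint;
};

// Lexicographic comparison: start first, then size_bytes, then name_hint.
// The name is only consulted when the first two keys are equal.
bool EntryLess(const BufEntry& a, const BufEntry& b) {
  return std::tie(a.start, a.size_bytes, a.name_hint) <
         std::tie(b.start, b.size_bytes, b.name_hint);
}

// Sort a copy of the entries and return the names in sorted order.
std::vector<std::string> SortedNames(std::vector<BufEntry> entries) {
  std::sort(entries.begin(), entries.end(), EntryLess);
  std::vector<std::string> names;
  names.reserve(entries.size());
  for (const BufEntry& e : entries) names.push_back(e.name_hint);
  return names;
}
```

If two entries also shared the same name, this comparator would treat them as equivalent and `std::sort` could place them in either order, which is the collision case raised above.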


1105-1115: LGTM! Deterministic iteration via sorted keys.

Creating a sorted vector from the unordered map keys ensures consistent iteration order. Reserving capacity upfront and using a simple lambda comparator is efficient and idiomatic.


1117-1137: LGTM! Clean adaptation to sorted iteration.

The iteration over sorted_vars combined with shmem_allocs_.at(var) retrieval is correct. Since sorted_vars is populated directly from shmem_allocs_ keys, the at() call is guaranteed to succeed—no risk of throwing std::out_of_range.
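To show the difference the comment relies on, here is a small standalone sketch of `std::unordered_map::at` versus `operator[]`. The map contents and function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// at() throws std::out_of_range for a missing key and never inserts.
bool AtThrowsOnMissingKey() {
  std::unordered_map<std::string, int> allocs{{"S_shared", 1}};
  try {
    (void)allocs.at("missing");
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}

// operator[] silently inserts a value-initialized entry for a missing key,
// so the map grows from one element to two.
std::size_t SizeAfterBracketLookup() {
  std::unordered_map<std::string, int> allocs{{"S_shared", 1}};
  (void)allocs["missing"];
  return allocs.size();
}
```

When every lookup key comes from the map itself, as in the loop over sorted_vars, `at()` cannot throw, and unlike `operator[]` it cannot add entries by accident.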



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/transform/merge_shared_memory_allocations.cc (1)

1104-1113: Good approach to ensure deterministic iteration.

Creating a pre-sorted vector of variables sorted by name_hint ensures consistent processing order when building buffer info, which is essential for deterministic allocation offsets.

Optional: Consider adding a tiebreaker for rare edge cases

If two variables could ever have identical name_hints in practice, consider adding a secondary comparison criterion to guarantee total ordering:

 std::sort(sorted_vars.begin(), sorted_vars.end(),
           [](const VarNode *a, const VarNode *b) {
-            return a->name_hint < b->name_hint;
+            if (a->name_hint != b->name_hint) {
+              return a->name_hint < b->name_hint;
+            }
+            // Fallback: pointer order gives a strict total order within one run,
+            // but it is not reproducible across runs.
+            return std::less<const VarNode*>()(a, b);
           });

However, this is only necessary if name collisions are possible in your IR.
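If name collisions did turn out to be possible, one alternative that stays reproducible across runs would be to break ties by traversal order rather than by address. The sketch below is an assumption-laden illustration, not part of this PR. It presumes the variables were recorded in a vector in a deterministic visit order (which is not how sorted_vars is populated here), and `Var` and `SortByNameStable` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a TIR variable node.
struct Var {
  std::string name_hint;
};

// Sort by name while keeping the incoming (visit) order among equal names.
// Assumption: visit_order was filled during a deterministic IR traversal.
std::vector<const Var*> SortByNameStable(std::vector<const Var*> visit_order) {
  std::stable_sort(visit_order.begin(), visit_order.end(),
                   [](const Var* a, const Var* b) {
                     return a->name_hint < b->name_hint;
                   });
  return visit_order;
}
```

`std::stable_sort` preserves the relative order of elements that compare equal, so two variables with the same name keep the order in which they were visited, independent of their addresses.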

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c9101e and fd97699.

📒 Files selected for processing (1)
  • src/transform/merge_shared_memory_allocations.cc
🔇 Additional comments (3)
src/transform/merge_shared_memory_allocations.cc (3)

754-755: LGTM: Deterministic ordering achieved.

The change from pointer-based to name-based comparison ensures consistent sorting across runs, addressing the PR's core objective.


1118-1118: Correct iteration over sorted collection.

Iterating over sorted_vars instead of directly over shmem_allocs_ ensures buffer info is built in a deterministic, name-based order.


1135-1135: Correct retrieval pattern for sorted iteration.

Since the loop now iterates over sorted_vars (just the keys), using shmem_allocs_.at(var) correctly retrieves the corresponding AllocateNode.

@LeiWang1999

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment thread src/transform/merge_shared_memory_allocations.cc