Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This reverts commit 84065bd.
Signed-off-by: Gene Der Su <e870252314@gmail.com>
Bring the latest mainline Primus updates into the GPT-OSS branch while keeping the sink sliding-window GPT-OSS configs enabled.
Made-with: Cursor
Pull request overview
This PR updates the Megatron MI355X GPT-OSS pretrain example configs to enable sink-attention sliding window using the GPT-OSS default window size.
Changes:
- Set `sink_sliding_window` from `0` (disabled) to `128` (enabled) in MI355X GPT-OSS pretrain configs.
- Remove the inline note indicating sliding window was not yet supported (by implication via enabling the feature).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/megatron/configs/MI355X/gpt_oss_20B-FP8-pretrain.yaml | Enable sink sliding window (128) for 20B FP8 MI355X pretrain config. |
| examples/megatron/configs/MI355X/gpt_oss_20B-BF16-pretrain.yaml | Enable sink sliding window (128) for 20B BF16 MI355X pretrain config. |
| examples/megatron/configs/MI355X/gpt_oss_120B-FP8-pretrain.yaml | Enable sink sliding window (128) for 120B FP8 MI355X pretrain config. |
| examples/megatron/configs/MI355X/gpt_oss_120B-BF16-pretrain.yaml | Enable sink sliding window (128) for 120B BF16 MI355X pretrain config. |

    - # Note: sliding window not yet supported by aiter Triton backend
    - # Set to 0 to disable, or wait for backend support
    - sink_sliding_window: 0 # gpt-oss default is 128, but disabled for now
    + sink_sliding_window: 128 # gpt-oss default
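
Because the change is a single key repeated across four config files, a small check can confirm each file carries the intended value. The following is only a sketch: the helper names `read_sink_sliding_window` and `is_enabled` are hypothetical, and it scans the YAML text with a regular expression instead of a real YAML parser, so it assumes the key sits on one line as it does in the diff.

```python
import re

# Matches an uncommented `sink_sliding_window: <int>` line, with an optional trailing comment.
# Assumption: the key and its integer value are on a single line, as in the diff.
_PATTERN = re.compile(r"^\s*sink_sliding_window:\s*(\d+)\s*(?:#.*)?$", re.MULTILINE)


def read_sink_sliding_window(yaml_text: str):
    """Return the integer value of sink_sliding_window, or None if the key is absent."""
    match = _PATTERN.search(yaml_text)
    return int(match.group(1)) if match else None


def is_enabled(yaml_text: str) -> bool:
    """A window size of 0 means disabled; any positive value means enabled."""
    value = read_sink_sliding_window(yaml_text)
    return value is not None and value > 0
```

Running `is_enabled` over the text of each of the four MI355X configs would flag any file that was missed.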
The PR description says this change should wait until ROCm/aiter PR #2505 merges into main. With `sink_sliding_window` now set to 128, runs that use the AITer commit currently pinned in CI may fail if that commit lacks the backend support. Consider keeping the value at 0 (or gating it) until the pinned AITer commit includes the required upstream change, and/or update the pinned AITer commit alongside this config change.
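
The gating suggested in this comment could look roughly like the sketch below. It is not Primus or AITer code: `resolve_sink_sliding_window` and the `backend_supports_window` flag are hypothetical names, and how a caller would actually detect backend support (for example, from the pinned AITer commit) is left as an assumption.

```python
def resolve_sink_sliding_window(requested: int, backend_supports_window: bool) -> int:
    """Return the window size to use, falling back to 0 (disabled) when unsupported.

    `backend_supports_window` is a hypothetical flag supplied by the caller;
    this sketch does not query AITer itself.
    """
    if requested < 0:
        raise ValueError("sink_sliding_window must be non-negative")
    if requested > 0 and not backend_supports_window:
        # Backend lacks sliding-window support: disable instead of failing at runtime.
        return 0
    return requested
```

With a gate like this, the configs could keep `128` as the requested value while runs on an older backend would still fall back to `0`.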
This should wait for the aiter PR ROCm/aiter#2505 to merge into main first.