
feat: Deepseek Support #591

Merged
terrykong merged 39 commits into main from yifu/ds
Jul 10, 2025

Conversation

@yfw
Contributor

@yfw yfw commented Jul 1, 2025

This reverts commit 427548f.

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • Sample run (140 steps):
    [Screenshot 2025-07-09 at 2:20:46 PM]
  • To reproduce this run, first follow these instructions to create a BF16 Hugging Face checkpoint. Then, use the following script (replace /path/to/hf_checkpoint with the newly created checkpoint):
uv run python examples/run_grpo_math.py --config=examples/configs/grpo_math_1B_megatron.yaml \
    grpo.val_batch_size=2 \
    policy.model_name=/path/to/hf_checkpoint \
    cluster.num_nodes=64 \
    cluster.gpus_per_node=8 \
    policy.megatron_cfg.pipeline_model_parallel_size=8 \
    policy.megatron_cfg.tensor_model_parallel_size=1 \
    policy.megatron_cfg.expert_tensor_parallel_size=1 \
    policy.megatron_cfg.sequence_parallel=False \
    policy.megatron_cfg.expert_model_parallel_size=64 \
    policy.megatron_cfg.num_layers_in_first_pipeline_stage=7 \
    policy.megatron_cfg.num_layers_in_last_pipeline_stage=6 \
    policy.megatron_cfg.activation_checkpointing=True \
    policy.megatron_cfg.apply_rope_fusion=False \
    policy.max_total_sequence_length=512 \
    checkpointing.enabled=False \
    checkpointing.save_period=20 \
    grpo.val_period=20 \
    grpo.max_val_samples=16 \
    grpo.val_batch_size=4 \
    checkpointing.keep_top_k=100 \
    checkpointing.checkpoint_dir=results/dsv3 \
    grpo.val_at_start=False \
    policy.generation.vllm_cfg.async_engine=False \
    policy.generation.vllm_cfg.tensor_parallel_size=64 \
    grpo.max_num_steps=1000000 \
    grpo.num_prompts_per_step=32 \
    grpo.num_generations_per_prompt=16 \
    policy.train_global_batch_size=512 \
    policy.train_micro_batch_size=1 \
    policy.sequence_packing.enabled=False
  • Refit is still very slow for the large DeepSeek-V3 (dsv3) model, currently taking just under 500 seconds. We will address this in a follow-up PR.
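
The parallelism and batch settings in the reproduction command above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the 61-layer count for DeepSeek-V3 is an assumption not stated in this PR, and the way the sizes are expected to divide the GPU count (no context parallelism, generation using every GPU) is an assumption about the layout rather than something this PR confirms.

```python
# Values copied from the reproduction command.
num_nodes = 64
gpus_per_node = 8
pipeline_parallel = 8          # pipeline_model_parallel_size
tensor_parallel = 1            # tensor_model_parallel_size
expert_parallel = 64           # expert_model_parallel_size
expert_tensor_parallel = 1     # expert_tensor_parallel_size
first_stage_layers = 7         # num_layers_in_first_pipeline_stage
last_stage_layers = 6          # num_layers_in_last_pipeline_stage
num_prompts_per_step = 32
num_generations_per_prompt = 16
train_global_batch_size = 512
vllm_tensor_parallel = 64      # policy.generation.vllm_cfg.tensor_parallel_size

# Assumption (not stated in the PR): DeepSeek-V3 has 61 transformer layers.
ASSUMED_TOTAL_LAYERS = 61


def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """GPUs left for data parallelism after TP and PP (assumes no context parallelism)."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tp * pp")
    return world_size // model_parallel


def layers_per_middle_stage(total: int, pp: int, first: int, last: int) -> int:
    """Layers on each middle pipeline stage once the first and last stages are fixed."""
    middle_stages = pp - 2
    remaining = total - first - last
    if middle_stages <= 0 or remaining % middle_stages != 0:
        raise ValueError("remaining layers do not split evenly across middle stages")
    return remaining // middle_stages


world_size = num_nodes * gpus_per_node                                   # 512 GPUs
dp_size = data_parallel_size(world_size, tensor_parallel, pipeline_parallel)  # 64
middle_layers = layers_per_middle_stage(
    ASSUMED_TOTAL_LAYERS, pipeline_parallel, first_stage_layers, last_stage_layers
)                                                                        # 8 per stage
samples_per_step = num_prompts_per_step * num_generations_per_prompt     # 512
# Assumed: expert layout must tile the GPUs as expert_tp * expert_parallel * pp.
expert_layout_fits = (
    world_size % (expert_tensor_parallel * expert_parallel * pipeline_parallel) == 0
)
# Assumed: generation uses all GPUs, so this is the number of vLLM replicas.
vllm_replicas = world_size // vllm_tensor_parallel                       # 8

print(f"world size: {world_size}, data-parallel size: {dp_size}")
print(f"pipeline layers: {first_stage_layers} + 6 x {middle_layers} + {last_stage_layers}")
print(f"samples per step: {samples_per_step} (global batch {train_global_batch_size})")
print(f"expert layout fits: {expert_layout_fits}, vLLM replicas: {vllm_replicas}")
```

Under these assumptions the uneven pipeline split works out to 7 + 6 × 8 + 6 = 61 layers, and the 32 × 16 = 512 generated samples per step match policy.train_global_batch_size exactly.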

yfw and others added 15 commits June 25, 2025 12:44
Revert of e01017a

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Run get_weights_for_ipc and get key map once for all

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Tag version that seems to be working

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Fix: use new weight param info every step

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Cleanup

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

bugfix

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Add more time logs

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyue/wip

fix aggregated all gather objects

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Delete self._held_gather_buffer

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@github-actions github-actions bot added the documentation label Jul 1, 2025
@yfw yfw mentioned this pull request Jul 1, 2025
4 tasks
@yfw yfw marked this pull request as ready for review July 1, 2025 23:56
yfw added 4 commits July 1, 2025 17:19
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
yfw added 3 commits July 1, 2025 17:36
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
yfw added 2 commits July 2, 2025 09:21
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
yfw added 8 commits July 9, 2025 10:48
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
terrykong previously approved these changes Jul 9, 2025
parthchadha previously approved these changes Jul 9, 2025
@terrykong terrykong enabled auto-merge July 9, 2025 21:50
@terrykong terrykong added this pull request to the merge queue Jul 9, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jul 9, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 9, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw dismissed stale reviews from parthchadha and terrykong via b3a5e85 July 9, 2025 22:51
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@terrykong terrykong added this pull request to the merge queue Jul 9, 2025
Merged via the queue into main with commit 13e5c34 Jul 10, 2025
13 of 14 checks passed
@terrykong terrykong deleted the yifu/ds branch July 10, 2025 01:11
RayenTian pushed a commit that referenced this pull request Jul 10, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
RayenTian pushed a commit that referenced this pull request Jul 10, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
RayenTian pushed a commit that referenced this pull request Jul 10, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>

Labels

documentation Improvements or additions to documentation

5 participants