
[Runtime] DeepSeek v4 pro multi node runtimes #631

Open
YouNeedCryDear wants to merge 1 commit into main from feat/deepseek-v4-pro-multi-node


@YouNeedCryDear

Collaborator

What this PR does

Adds DeepSeek v4 Pro multi-node runtime configuration:

  • Adds the vllm-deepseek-v4-pro-multi ClusterServingRuntime with an SMG router and vLLM leader/worker engine configuration.
  • Registers the runtime in config/runtimes/kustomization.yaml.
  • Adds a matching InferenceService sample for deepseek-v4-pro-multi.
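The leader/worker engine args in this runtime pass the leader address as `--master-addr=$(LWS_LEADER_ADDRESS)`. Assuming these entries are plain container `args` (not a shell script), Kubernetes itself expands `$(VAR)` references from the container's environment, and `LWS_LEADER_ADDRESS` is assumed here to be injected into each pod by the LeaderWorkerSet controller. The minimal sketch below mirrors the documented expansion rules so it is easy to reason about what each pod actually receives; the helper name `expand_k8s_ref` and the address value are hypothetical.

```python
import re

# Matches either an escaped "$$" or a "$(NAME)" reference, scanning left to right.
_REF = re.compile(r"\$\$|\$\(([^)]*)\)")


def expand_k8s_ref(arg: str, env: dict) -> str:
    """Expand $(VAR) references following the Kubernetes rules for container args.

    - "$(NAME)" becomes env[NAME] when NAME is defined.
    - An undefined "$(NAME)" is left unchanged (no error is raised).
    - "$$" collapses to a single "$", so "$$(NAME)" yields the literal "$(NAME)".
    """
    def replace(match: re.Match) -> str:
        if match.group(0) == "$$":
            return "$"
        # Fall back to the original "$(NAME)" text when NAME is not defined.
        return env.get(match.group(1), match.group(0))

    return _REF.sub(replace, arg)


if __name__ == "__main__":
    # Hypothetical leader address, for illustration only.
    env = {"LWS_LEADER_ADDRESS": "10.0.0.1"}
    print(expand_k8s_ref("--master-addr=$(LWS_LEADER_ADDRESS)", env))  # prints --master-addr=10.0.0.1
```

One practical consequence of these rules: if the variable is missing from a pod's environment, vLLM receives the literal string `$(LWS_LEADER_ADDRESS)` as the master address instead of failing at pod creation.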

Why we need it

Enables OME users to deploy DeepSeek v4 Pro on a two-node H100 topology with vLLM.

Fixes #

How to test

Not run locally; this PR only adds configuration.

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

github-actions bot added the runtime (Runtime configuration changes) and config (Configuration changes) labels on Jun 17, 2026
YouNeedCryDear force-pushed the feat/deepseek-v4-pro-multi-node branch from 5862827 to 530087c on June 17, 2026 16:38
YouNeedCryDear changed the title from "DeepSeek v4 pro multi node runtimes" to "[Runtime] DeepSeek v4 pro multi node runtimes" on Jun 17, 2026
- --master-addr=$(LWS_LEADER_ADDRESS)
- --gpu-memory-utilization=0.95
- --max-num-seqs=256
- --max-num-batched-tokens=512

Contributor

😢

Collaborator Author

Yeah, sadly.

- -cc.pass_config.fuse_allreduce_rms=False
- --master-addr=$(LWS_LEADER_ADDRESS)
- --gpu-memory-utilization=0.95
- --max-num-seqs=256

Contributor

Does decreasing this increase (improve) the batched_tokens?

Collaborator Author

Decreasing this helps with the memory pressure.
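The interaction between `--max-num-seqs=256` and `--max-num-batched-tokens=512` discussed in these threads can be illustrated with a deliberately simplified model of a chunked-prefill scheduler. This is an assumption for illustration, not vLLM's actual scheduling code: each engine step has a fixed token budget, every running decode consumes one token of it first, and whatever remains goes to one chunk of a new prompt. The function name `prefill_steps` is hypothetical.

```python
import math


def prefill_steps(prompt_tokens: int, running_decodes: int,
                  max_num_batched_tokens: int = 512,
                  max_num_seqs: int = 256) -> int:
    """Engine steps needed to prefill one new prompt (simplified model).

    Assumptions: each step gets max_num_batched_tokens tokens; each running
    decode consumes exactly one of them first; the remainder is spent on a
    single chunk of the new prompt. This is not vLLM's real scheduler.
    """
    if max_num_batched_tokens < max_num_seqs:
        raise ValueError("budget must cover one decode token per sequence slot")
    # The new prompt needs a free sequence slot of its own.
    if not 0 <= running_decodes < max_num_seqs:
        raise ValueError("running_decodes must leave a free sequence slot")
    prefill_budget = max_num_batched_tokens - running_decodes
    return math.ceil(prompt_tokens / prefill_budget)


if __name__ == "__main__":
    # Values mirror the args in this PR: 512-token budget, 256 sequence slots.
    print(prefill_steps(8192, running_decodes=0))    # prints 16
    print(prefill_steps(8192, running_decodes=255))  # prints 32
```

Under this model, an 8,192-token prompt needs 16 steps to prefill on an idle engine but 32 steps when 255 decodes are already running, so a lower `--max-num-seqs` leaves more of the small budget for prefill. A larger `--max-num-batched-tokens` would shorten prefill too, but presumably at the cost of the memory headroom the thread is trying to protect.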



2 participants