Changelog

NVIDIA NeMo Run 0.6.0

Detailed Changelogs:

Executors

Added Pre-Launch Commands Support to LeptonExecutor #312
Remove breaking torchrun config for single-node runs #292
Upgrade skypilot to v0.10.0, introduce network_tier #297
Fixes for multi-node execution with torchrun + LocalExecutor #251
Add option to specify --container-env for srun #293
Fix skypilot archive mount bug #288
Example: fine-tune on DGX Cloud with nemo-run and deploy on Bedrock #286

Ray Integration

Add nsys patch in ray sub template #318
Add logs dir to container mount for ray slurm #287
Allow customizing folder for SlurmRayRequest #281

CLI & Configuration

Experiment & Job Management

Use thread pool for status, run methods inside experiment + other fixes #295

Packaging & Deployment

Correctly append tar files for packaging #317

Documentation

Create CHANGELOG.md #314
docs: Fixing doc build issue #290
fix docs tutorial links and add intro to guides/index.md #285
README #277

CI/CD

changelog workflow #315
Update release.yml #306
ci(fix): Use GITHUB_TOKEN for community bot #302
ci: Add community-bot #300

Bug Fixes

[Bugfix] Adding a check for name length #273
misc fixes #280
Add fix for lowercase and name-length k8s requirements #274

Others

Specify nodes for gpu metrics collection and split data to each rank #320
Apply '_enable_goodbye_message' check to both goodbye messages. #319
Update refs #278
chore: Bump to version 0.6.0rc0.dev0 #272

NVIDIA NeMo Run 0.5.0

Fix docs warnings #271
Fix docs build #269
Support overlapped srun commands in Slurm Ray #263
Refactor DGXC Lepton data mover: switch to BatchJob with auto cleanup and sleep after every run #265
ci: Fix nemo fw template ref after migrating to new org #256
Enable Nsys gpu device metrics #257
Sync job code in local tunnel for Slurm Ray job #254
Change the create dist job function to support creating a single node #240
Making job names match Run:ai requirements and making errors more descriptive #255
Support for %j in slurm log retrieval #252
Add KubeRay tests for Ray APIs #249
Upgrade skypilot executor with 0.9.2 #246
Add user scoping for k8s backend and log level support for Ray APIs #247
Update to latest Lepton SDK #248
Add storage mount options to LeptonExecutor #237
Import guard k8s import in Ray Cluster and Job #245
Add RayJob and Slurm support for Ray APIs + integration with run.Experiment #236
ci: Enforce coverage #238
Fix bug with a CLI overwrite #235
Add LeptonExecutor support #224
Add cancel to docker executor #233
Change default log wait timeout to 10s #232
Add RayCluster API with Kuberay support #222
Add sbatch network arg #230
chore: Update package info #227
Add support for job groups for local executor #220
Roll back get_underlying_types change + introduce extract_constituent #223
Fix some bugs for --lazy in CLI #179
Adding support for modern type-hints #221
Fix bug in CLI with calling a factory-fn inside a list #214
Handle more edge cases in --help #219
Add autogenerated API reference content to the documentation #190
Handle Callable in --help to fix nemo llm export --help error #217
Ensure job directory creation for various schedulers #216
Adding support for ForwardRef in CLI #176
Add additional debug to DGXC data mover #215
Handle ctx in entrypoint for experiment #213
zozhang/dgxc executor data mover #206
Add support for YAML, TOML & JSON #182
Add clean mode for experiment to avoid printing any NeMo-Run specific logs #208
Fix seed for torchrun #209
Support torchrun multi node on local executor #143
Add nsys filename param #205
Add DGXCloudExecutor docs and update execution guide #192
Add --cuda-event-trace=false to nsys command #180
