Vllm inference by Ssofja · Pull Request #1975 · NVIDIA-NeMo/Curator

Ssofja · 2026-05-13T12:30:18Z

Description

Add a vLLM-based inference stage (vLLMInference) to the NeMo Curator audio pipeline for generating multi-speaker conversations from topic prompts.

New stages:

vLLMInference — Generates multi-turn conversations via batched vLLM inference with automatic retry logic for failed outputs. Supports three prompt modes: static string, per-entry field, or YAML template file with $topic substitution.
DocumentToAudioStage — Converts DocumentBatch inputs into individual AudioTask objects for downstream processing.
TopicExpander — Fans out a set of topics into N conversation-generation tasks with reproducible random assignment.
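
For reviewers, the three prompt modes listed for vLLMInference can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `resolve_prompt`, its parameter names, and the `"topic"` entry key are hypothetical, and the template text is assumed to have already been read from the YAML file. Only the `$topic` placeholder comes from the description; Python's `string.Template` is one standard way to fill it.

```python
from string import Template


def resolve_prompt(entry, static_prompt=None, prompt_field=None, template_text=None):
    """Build the prompt for one entry using exactly one of three modes.

    All names here are illustrative assumptions, not the stage's real API.
    """
    chosen = [m for m in (static_prompt, prompt_field, template_text) if m is not None]
    if len(chosen) != 1:
        raise ValueError("Specify exactly one prompt mode")

    if static_prompt is not None:
        # Mode 1: the same fixed string for every entry.
        return static_prompt
    if prompt_field is not None:
        # Mode 2: the prompt is stored on the entry itself.
        return entry[prompt_field]
    # Mode 3: a template (assumed already loaded from YAML) with $topic filled in.
    return Template(template_text).substitute(topic=entry["topic"])


if __name__ == "__main__":
    entry = {"topic": "jazz", "custom_prompt": "Discuss the weather."}
    print(resolve_prompt(entry, static_prompt="Write a short dialogue."))
    print(resolve_prompt(entry, prompt_field="custom_prompt"))
    print(resolve_prompt(entry, template_text="Write a dialogue about $topic."))
```

`Template.substitute` raises `KeyError` when a placeholder has no value, which surfaces a malformed template early instead of sending a prompt with a literal `$topic` to the model.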

Key design decisions:

Uses batched inference with configurable retry rounds to maximize GPU utilization and handle LLM output validation failures gracefully.
JSON output validation includes Unicode cleaning, speakable-content checks, and overlap-type enforcement.
TopicExpander uses an instance-level random.Random to avoid polluting global random state.
Lazy LLM initialization (_ensure_llm) delegates to setup() to avoid code duplication.
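
The batched retry rounds described above might look something like the following sketch. It assumes a hypothetical `generate_batch` callable standing in for the vLLM call, and a deliberately simplified `is_valid` check (a non-empty JSON list of turns, each with non-blank `text`); the real validation in this PR also covers Unicode cleaning and overlap-type enforcement, which are not reproduced here.

```python
import json


def is_valid(raw):
    """Simplified, hypothetical validator: non-empty JSON list of turns with text."""
    try:
        turns = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(turns, list)
        and len(turns) > 0
        and all(isinstance(t, dict) and str(t.get("text", "")).strip() for t in turns)
    )


def generate_with_retries(prompts, generate_batch, max_retries=2):
    """Run batched generation, re-submitting only the prompts whose output failed.

    `generate_batch` is an assumed callable: list of prompts in, list of raw
    strings out, in the same order. Returns (results, failed_indices).
    """
    results = [None] * len(prompts)
    pending = list(range(len(prompts)))

    # One initial round plus up to `max_retries` retry rounds.
    for _ in range(max_retries + 1):
        if not pending:
            break
        outputs = generate_batch([prompts[i] for i in pending])
        still_failing = []
        for idx, raw in zip(pending, outputs):
            if is_valid(raw):
                results[idx] = json.loads(raw)
            else:
                still_failing.append(idx)
        pending = still_failing

    return results, pending
```

Re-batching only the failed prompts keeps each round a single batched call, so the retry cost shrinks with the failure rate rather than scaling with the full input size.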

Usage

from nemo_curator.stages.audio import vLLMInference, TopicExpander, DocumentToAudioStage

# Expand topics into conversation-generation tasks
expander = TopicExpander(num_conversations=100, seed=42)

# Generate conversations via vLLM
inference = vLLMInference(
    prompt_file="prompts/conversation.yaml",
    model={"model": "meta-llama/Llama-3.1-8B-Instruct"},
    inference={"temperature": 0.8, "max_tokens": 2048},
    apply_chat_template={"tokenize": False, "add_generation_prompt": True},
)
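
To illustrate what `seed=42` buys in the usage above, here is a small sketch of seeded fan-out with an instance-level `random.Random`. `TopicFanOut`, its `expand` method, and the output keys are hypothetical stand-ins, not the TopicExpander implementation; the only detail taken from the description is that the generator lives on the instance and leaves the global `random` state untouched.

```python
import random


class TopicFanOut:
    """Hypothetical stand-in for TopicExpander, for illustration only."""

    def __init__(self, num_conversations, seed=None):
        self.num_conversations = num_conversations
        # Private generator: seeding it does not touch the module-level RNG.
        self._rng = random.Random(seed)

    def expand(self, topics):
        """Assign one randomly chosen topic to each conversation task."""
        return [
            {"conversation_id": i, "topic": self._rng.choice(topics)}
            for i in range(self.num_conversations)
        ]


if __name__ == "__main__":
    topics = ["travel", "cooking", "sports"]
    first = TopicFanOut(num_conversations=5, seed=42).expand(topics)
    second = TopicFanOut(num_conversations=5, seed=42).expand(topics)
    print(first == second)  # same seed, same assignment
```

Two instances built with the same seed produce the same assignment, while other code that relies on `random.seed(...)` elsewhere in the pipeline is unaffected.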

Checklist

I am familiar with the Contributing Guide.
New or Existing tests cover these changes.
The documentation is up to date with these changes.

Signed-off-by: Ssofja <sofiakostandian@gmail.com>

copy-pr-bot · 2026-05-13T12:30:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Ssofja added 2 commits May 13, 2026 15:56

added vllm inference stage to the curator pipeline

c468f54

Adding vllm inference to the curator

895d9c9

Signed-off-by: Ssofja <sofiakostandian@gmail.com>

Ssofja wants to merge 2 commits into NVIDIA-NeMo:main from Ssofja:vllm_inference
