
Vllm inference #1975

Draft
Ssofja wants to merge 2 commits into NVIDIA-NeMo:main from Ssofja:vllm_inference


Ssofja commented May 13, 2026

Description

Add a vLLM-based inference stage (vLLMInference) to the NeMo Curator audio pipeline for generating multi-speaker conversations from topic prompts.

New stages:

  • vLLMInference — Generates multi-turn conversations via batched vLLM inference with automatic retry logic for failed outputs. Supports three prompt modes: static string, per-entry field, or YAML template file with $topic substitution.
  • DocumentToAudioStage — Converts DocumentBatch inputs into individual AudioTask objects for downstream processing.
  • TopicExpander — Fans out a set of topics into N conversation-generation tasks with reproducible random assignment.
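A minimal sketch of how topic fan-out and the three prompt modes could fit together. It is not the stages' actual API: `expand_topics` and `resolve_prompt` are hypothetical helpers, the task dict keys (`task_id`, `topic`) are assumptions, and the YAML template is assumed to reduce to a single string containing a `$topic` placeholder. Substitution uses Python's `string.Template`, whose `$name` syntax matches the `$topic` placeholder mentioned above. The PR notes that TopicExpander uses an instance-level `random.Random`, and the sketch does the same.

```python
import random
from string import Template
from typing import Optional


def expand_topics(topics: list[str], num_conversations: int, seed: int) -> list[dict]:
    """Fan out topics into num_conversations tasks (hypothetical helper).

    A private random.Random instance makes the assignment reproducible for a
    given seed without touching the global random module's state.
    """
    if not topics:
        raise ValueError("topics must not be empty")
    rng = random.Random(seed)
    return [{"task_id": i, "topic": rng.choice(topics)} for i in range(num_conversations)]


def resolve_prompt(
    entry: dict,
    *,
    prompt: Optional[str] = None,
    prompt_field: Optional[str] = None,
    template_text: Optional[str] = None,
) -> str:
    """Pick one of three prompt sources (hypothetical helper).

    Assumes exactly one source is configured: a static string, the name of a
    field on the entry, or template text (e.g. loaded from YAML) using $topic.
    """
    if sum(s is not None for s in (prompt, prompt_field, template_text)) != 1:
        raise ValueError("set exactly one of prompt, prompt_field, template_text")
    if prompt is not None:
        return prompt
    if prompt_field is not None:
        return entry[prompt_field]
    # string.Template uses $name placeholders; substitute() raises KeyError
    # if the template references any placeholder other than $topic.
    return Template(template_text).substitute(topic=entry["topic"])


topics = ["cooking", "space travel", "job interviews"]
tasks = expand_topics(topics, num_conversations=5, seed=42)
template = "Write a two-speaker conversation about $topic."
for task in tasks:
    print(resolve_prompt(task, template_text=template))
```

Loading the template from a YAML file would need a parser such as PyYAML; the sketch takes the template text directly to stay dependency-free.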

Key design decisions:

  • Uses batched inference with configurable retry rounds to maximize GPU utilization and handle LLM output validation failures gracefully.
  • JSON output validation includes Unicode cleaning, speakable-content checks, and overlap-type enforcement.
  • TopicExpander uses an instance-level random.Random to avoid polluting global random state.
  • Lazy LLM initialization (_ensure_llm) delegates to setup() to avoid code duplication.
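The retry and validation decisions above can be sketched as a loop that resubmits only the prompts whose outputs failed validation, for a bounded number of rounds. This is an illustration under stated assumptions: `generate_batch` is a hypothetical stand-in for one batched vLLM generate call, and the output schema (a JSON list of turns with `speaker` and `text` keys plus an optional `overlap_type` from a fixed set) is invented for the example, since the PR does not show the real schema or the allowed overlap types.

```python
import json
import unicodedata
from typing import Callable, Optional

# Hypothetical set of allowed overlap types; the PR does not list the real ones.
ALLOWED_OVERLAP_TYPES = {"none", "backchannel", "interruption"}


def clean_text(text: str) -> str:
    """NFKC-normalize, then drop control/format characters such as U+200B."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")


def is_speakable(text: str) -> bool:
    """Require at least one letter or digit so a TTS model has something to say."""
    return any(ch.isalnum() for ch in text)


def validate_output(raw: str) -> Optional[list[dict]]:
    """Parse and validate one LLM output; return cleaned turns, or None on failure."""
    try:
        turns = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(turns, list) or not turns:
        return None
    cleaned = []
    for turn in turns:
        if not isinstance(turn, dict):
            return None
        text = clean_text(str(turn.get("text", "")))
        if not is_speakable(text):
            return None
        # Assumption: a missing overlap_type means "none".
        if turn.get("overlap_type", "none") not in ALLOWED_OVERLAP_TYPES:
            return None
        cleaned.append({**turn, "text": text})
    return cleaned


def generate_with_retries(
    prompts: list[str],
    generate_batch: Callable[[list[str]], list[str]],
    max_rounds: int = 3,
) -> list[Optional[list[dict]]]:
    """Batch-generate, then resubmit only the failed prompts for up to max_rounds."""
    results: list[Optional[list[dict]]] = [None] * len(prompts)
    pending = list(range(len(prompts)))
    for _ in range(max_rounds):
        if not pending:
            break
        outputs = generate_batch([prompts[i] for i in pending])
        still_failed = []
        for idx, raw in zip(pending, outputs):
            parsed = validate_output(raw)
            if parsed is None:
                still_failed.append(idx)
            else:
                results[idx] = parsed
        pending = still_failed
    return results  # entries still None failed validation in every round
```

Keeping each round as one batch over all still-pending prompts, rather than retrying prompts one at a time, is what lets the retries reuse the GPU efficiently.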

Usage

```python
from nemo_curator.stages.audio import vLLMInference, TopicExpander, DocumentToAudioStage

# Expand topics into conversation-generation tasks
expander = TopicExpander(num_conversations=100, seed=42)

# Generate conversations via vLLM
inference = vLLMInference(
    prompt_file="prompts/conversation.yaml",
    model={"model": "meta-llama/Llama-3.1-8B-Instruct"},
    inference={"temperature": 0.8, "max_tokens": 2048},
    apply_chat_template={"tokenize": False, "add_generation_prompt": True},
)
```
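The usage example passes three plain dicts. One plausible reading, which this sketch assumes rather than confirms, is that `model` is forwarded to `vllm.LLM(...)`, `inference` to `vllm.SamplingParams(...)`, and `apply_chat_template` to the tokenizer's `apply_chat_template` call. The sketch also illustrates the lazy-initialization pattern from the design notes, where `_ensure_llm` simply calls `setup()`. The class name `LazyVLLMRunner` and the injectable `llm_factory` are hypothetical; the factory exists only so the pattern can run without vLLM or a GPU.

```python
from typing import Any, Callable, Optional


class LazyVLLMRunner:
    """Hypothetical illustration of lazy setup; not the PR's vLLMInference class."""

    def __init__(
        self,
        model: dict,
        inference: dict,
        apply_chat_template: dict,
        llm_factory: Optional[Callable[..., Any]] = None,
    ):
        self.model_kwargs = model          # assumed to go to vllm.LLM(**model)
        self.sampling_kwargs = inference   # assumed to go to vllm.SamplingParams(**inference)
        self.chat_template_kwargs = apply_chat_template
        self._llm_factory = llm_factory    # injectable so the sketch runs without a GPU
        self._llm = None

    def setup(self) -> None:
        """Construct the LLM once; later calls are no-ops."""
        if self._llm is not None:
            return
        factory = self._llm_factory
        if factory is None:
            from vllm import LLM  # deferred import: only needed when building a real model
            factory = LLM
        self._llm = factory(**self.model_kwargs)

    def _ensure_llm(self) -> Any:
        # Delegate to setup() rather than duplicating the construction logic.
        if self._llm is None:
            self.setup()
        return self._llm

    def build_prompts(self, tokenizer: Any, conversations: list[list[dict]]) -> list[str]:
        """Render chat messages into prompt strings using the configured kwargs."""
        return [
            tokenizer.apply_chat_template(messages, **self.chat_template_kwargs)
            for messages in conversations
        ]


runner = LazyVLLMRunner(
    model={"model": "meta-llama/Llama-3.1-8B-Instruct"},
    inference={"temperature": 0.8, "max_tokens": 2048},
    apply_chat_template={"tokenize": False, "add_generation_prompt": True},
    llm_factory=lambda **kwargs: kwargs,  # stand-in that just echoes its arguments
)
print(runner._llm)           # None: nothing is built at construction time
print(runner._ensure_llm())  # built on first use
```

With a real `vllm.LLM`, the tokenizer would come from `llm.get_tokenizer()` and generation would be a call like `llm.generate(prompts, SamplingParams(**inference))`; whether the PR wires the dicts exactly this way is not shown here.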

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
