Draft: PnC with LLM for audio pipeline#2006
Conversation
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Greptile SummaryThis PR introduces LLM-based Punctuation and Capitalization (PNC) into the NeMo Curator audio tagging pipeline via two new stages —
Confidence Score: 3/5The core PNC stage logic is sound, but two issues in the shared VLLMInference class need attention before merging. The get_entry_prompt method has a dead unreachable fallback — the early return at line 404 means the if self.use_chat_api guard inside the except block is always False when reached, so a template error in chat-API mode silently returns [] instead of entry_chat. Separately, process_entry_prompts destroys the GPU engine after every call with no clear signal to callers; a second call without setup() raises RuntimeError immediately. Both issues live in the shared VLLMInference class used beyond the audio PNC stage. nemo_curator/models/vllm_model.py for the dead branch in get_entry_prompt and the engine teardown in process_entry_prompts; tests/stages/audio/tagging/text/test_pnc_vllm.py for the missing setup() calls that leave _full_vocab_set empty. Important Files Changed
Sequence DiagramsequenceDiagram
participant Pipeline
participant PNCStage as PNCwithvLLMInferenceStage
participant VLLMInference
participant VLLMBase
participant CleanStage as CleanLLMOutputStage
Pipeline->>PNCStage: setup_on_node()
PNCStage->>VLLMInference: setup_on_node()
Pipeline->>PNCStage: setup()
PNCStage->>VLLMInference: setup()
VLLMInference->>VLLMBase: _init_tokenizer()
VLLMInference->>VLLMBase: _init_engine()
Pipeline->>PNCStage: process_batch(tasks)
PNCStage->>VLLMInference: process_batch(prompts)
VLLMInference->>VLLMBase: _generate(prompts, use_chat)
VLLMBase-->>PNCStage: RequestOutput list
PNCStage-->>Pipeline: tasks with generation field
Pipeline->>CleanStage: process(task)
CleanStage->>CleanStage: clean_llm_output + CER + is_valid_text
CleanStage-->>Pipeline: task with generation_cleaned + use_bert_pnc
Pipeline->>PNCStage: teardown()
PNCStage->>VLLMInference: clean_up()
VLLMInference->>VLLMBase: _cleanup_gpu()
Reviews (1): Last reviewed commit: "Merge with main" | Re-trigger Greptile |
| if self.use_chat_api: | ||
| return entry_chat |
There was a problem hiding this comment.
The if self.use_chat_api: return entry_chat check inside the except block can never execute. When use_chat_api=True the function always returns entry_chat at line 404–405, before tokenizer.apply_chat_template is even called — the except block is only reachable when use_chat_api=False. The inner guard evaluates to False every time, and the fallback return is dead code. A developer expecting chat-API mode to fall back to entry_chat on a template error will find it silently returns [] instead.
| def process_entry_prompts(self, entry_prompts: list, batch_size: int = 10000) -> list: | ||
| """Generate in batches, then clean up.""" |
There was a problem hiding this comment.
process_entry_prompts silently destroys the vLLM engine at the end of every call via self.clean_up(). This is a heavyweight, irreversible operation (GPU memory deallocation, process-group teardown) that callers will not expect from a method named process_entry_prompts. A second call without an intervening setup() raises RuntimeError. The docstring buries the side effect; at minimum expand it to warn callers that the engine cannot be reused after this call.
| def process_entry_prompts(self, entry_prompts: list, batch_size: int = 10000) -> list: | |
| """Generate in batches, then clean up.""" | |
| def process_entry_prompts(self, entry_prompts: list, batch_size: int = 10000) -> list: | |
| """Generate in batches, then destroy the vLLM engine. | |
| .. warning:: | |
| This method calls :meth:`clean_up` after generation, which releases | |
| all GPU memory and tears down the distributed process group. | |
| **The engine cannot be used again** without calling :meth:`setup` first. | |
| For repeated inference calls, use :meth:`process_batch` directly. | |
| """ |
| def _init_engine(self, model_kwargs: dict[str, Any], sampling_kwargs: dict[str, Any]) -> None: | ||
| """Create the vLLM ``LLM`` engine and ``SamplingParams``. | ||
|
|
||
| Args: | ||
| model_kwargs: Keyword arguments forwarded to ``vllm.LLM``. | ||
| sampling_kwargs: Keyword arguments forwarded to ``vllm.SamplingParams``. | ||
| """ | ||
| os.environ.setdefault("VLLM_USE_V1", "0") |
There was a problem hiding this comment.
os.environ.setdefault("VLLM_USE_V1", "0") is already called at module import time inside the try block that imports vllm, so the identical call here is redundant.
| def _init_engine(self, model_kwargs: dict[str, Any], sampling_kwargs: dict[str, Any]) -> None: | |
| """Create the vLLM ``LLM`` engine and ``SamplingParams``. | |
| Args: | |
| model_kwargs: Keyword arguments forwarded to ``vllm.LLM``. | |
| sampling_kwargs: Keyword arguments forwarded to ``vllm.SamplingParams``. | |
| """ | |
| os.environ.setdefault("VLLM_USE_V1", "0") | |
| def _init_engine(self, model_kwargs: dict[str, Any], sampling_kwargs: dict[str, Any]) -> None: | |
| """Create the vLLM ``LLM`` engine and ``SamplingParams``. | |
| Args: | |
| model_kwargs: Keyword arguments forwarded to ``vllm.LLM``. | |
| sampling_kwargs: Keyword arguments forwarded to ``vllm.SamplingParams``. | |
| """ |
| @staticmethod | ||
| def remove_pncs(text: str, punct_marks: str) -> str: | ||
| text = re.sub(r"[.,?،؟.、?¿!,?।:;]", "", text.lower()) # noqa: RUF001 | ||
| pattern = f"[{re.escape(punct_marks)}]" | ||
| text = re.sub(pattern, " ", text.lower()) | ||
| return " ".join(text.split()) |
There was a problem hiding this comment.
text.lower() is called twice: once as the argument to the first re.sub (producing an already-lowercase string), then again on the already-lowercase result for the second re.sub. The second call is a no-op.
| @staticmethod | |
| def remove_pncs(text: str, punct_marks: str) -> str: | |
| text = re.sub(r"[.,?،؟.、?¿!,?।:;]", "", text.lower()) # noqa: RUF001 | |
| pattern = f"[{re.escape(punct_marks)}]" | |
| text = re.sub(pattern, " ", text.lower()) | |
| return " ".join(text.split()) | |
| @staticmethod | |
| def remove_pncs(text: str, punct_marks: str) -> str: | |
| text = text.lower() | |
| text = re.sub(r"[.,?،؟.、?¿!,?।:;]", "", text) # noqa: RUF001 | |
| pattern = f"[{re.escape(punct_marks)}]" | |
| text = re.sub(pattern, " ", text) | |
| return " ".join(text.split()) |
| stage = CleanLLMOutputStage( | ||
| cer_threshold=0, | ||
| update_alignment=True, | ||
| alignment_key="alignment", | ||
| segments_key="__none__", | ||
| ) | ||
| result = stage.process(AudioTask(data=data)) | ||
|
|
||
| assert result.data["use_bert_pnc"] is False | ||
| words = [w["word"] for w in result.data["alignment"]] | ||
| assert words == ["Hello,", "world.", "Good", "morning,", "everyone."] | ||
| assert result.data["alignment"][0]["start"] == 0.0 | ||
| assert result.data["alignment"][0]["confidence"] == 0.9 | ||
|
|
There was a problem hiding this comment.
setup() not called — _full_vocab_set is empty
CleanLLMOutputStage._full_vocab_set is populated in setup(). Without it, the field defaults to an empty set, so is_valid_text(llm_cleaned, self._full_vocab_set) returns False for any non-empty text. Tests like test_invalid_chars_trigger_bert_fallback pass by coincidence (the CER or digit check fires first, or chars match so the vocab check is never reached), not because the vocab logic is exercised. Adding stage.setup() before stage.process(task) would make each test actually validate what it claims.
Description
Usage
# Add snippet demonstrating usageChecklist