adding part of pipeline as smoke merge to main by Jorjeous · Pull Request #2007 · NVIDIA-NeMo/Curator

Jorjeous · 2026-05-21T12:50:53Z

Description

Usage

# Add snippet demonstrating usage

Checklist

I am familiar with the Contributing Guide.
New or Existing tests cover these changes.
The documentation is up to date with these changes.

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

copy-pr-bot · 2026-05-21T12:50:58Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-21T12:56:49Z

Greptile Summary

This PR adds a speaker ID pipeline (embedding extraction, AHC/BIRCH clustering, per-utterance confidence scoring) and a UTMOSv2 MOS-scoring stage to the NeMo Curator audio processing path, along with unit tests and Fern documentation pages.

speaker_embedding_lhotse.py imports NeMo at module level and raises RuntimeError immediately if NeMo is absent; since the audio package re-exports this class eagerly, the whole nemo_curator.stages.audio namespace breaks for users without NeMo installed.
utmosv2_score.py's _waveform_to_wav calls waveform.mean(axis=1) for stereo mix-down, which is the wrong axis for channels-first layout, silently writing a 1-sample WAV and scoring garbage audio.

Confidence Score: 3/5

Two concrete defects in changed code: one breaks the entire audio stages namespace on import without NeMo, and one silently writes malformed WAV data for multi-channel waveforms passed to the UTMOSv2 scorer.

The NeMo hard-import causes import nemo_curator.stages.audio to fail in any NeMo-free environment, affecting all audio pipeline users. The wrong-axis mix-down produces silently wrong MOS scores for multi-channel audio from NemoTarShardReaderStage without raising any exception.

nemo_curator/stages/audio/speaker_id/speaker_embedding_lhotse.py and nemo_curator/stages/audio/metrics/utmosv2_score.py

Important Files Changed

Filename	Overview
nemo_curator/stages/audio/metrics/utmosv2_score.py	New UTMOSv2 MOS scoring stage; contains a wrong-axis stereo mix-down bug that silently produces garbage scores for multi-channel waveforms, and relies on undocumented UTMOSv2 result ordering.
nemo_curator/stages/audio/speaker_id/speaker_embedding_lhotse.py	New Lhotse-based speaker embedding stage; hard top-level NeMo import raises RuntimeError at import time in environments without NeMo, breaking the entire audio stages namespace.
nemo_curator/stages/audio/__init__.py	Eagerly re-exports SpeakerEmbeddingLhotseStage, transitively forcing a hard NeMo import at package load time.
nemo_curator/stages/audio/speaker_id/speaker_clustering_and_scoring.py	New AHC + confidence-scoring stage; global/shard/grouped clustering modes look correct; offset-based shard label slice-back logic is sound.
nemo_curator/stages/audio/speaker_id/clustering/large_scale_clustering_and_scoring.py	New BIRCH + AHC large-scale clustering pipeline; leaf-cap backoff logic and tiled assignment are well-implemented.
nemo_curator/stages/audio/speaker_id/speaker_embedding_audiotask.py	New AudioTask-native speaker embedding stage; per-batch NPZ flush with clear() after save looks correct.
nemo_curator/stages/audio/speaker_id/embedding/model_loader.py	Custom WeSpeaker model loader bypassing wespeaker/__init__.py via importlib; logic is sound.
nemo_curator/stages/audio/speaker_id/clustering/ahc.py	New AHC utility module; cosine-distance AHC, cluster quality, and per-utterance confidence all look correct.

Reviews (1): Last reviewed commit: "adding part of pipeline as smoke merge t..."

greptile-apps · 2026-05-21T12:56:56Z

+        return f"gpu{cv.split(',')[0]}"
+    return f"pid{os.getpid()}"
+
+if TYPE_CHECKING:
+    from lhotse import Cut, CutSet
+
+try:
+    from nemo.collections.common.data.lhotse.nemo_adapters import (
+        LazyNeMoIterator,
+        LazyNeMoTarredIterator,


Unconditional top-level NeMo import breaks environments without NeMo

The module raises RuntimeError at import time if NeMo is not installed, because the try/except around the NeMo import re-raises as a hard error. Since speaker_id/__init__.py eagerly imports SpeakerEmbeddingLhotseStage from this module, and nemo_curator/stages/audio/__init__.py re-exports it, any import nemo_curator.stages.audio will fail in environments where NeMo is not installed. Moving the LazyNeMoIterator/LazyNeMoTarredIterator imports inside the process() method would make NeMo a lazy dependency and avoid breaking all other audio stages.
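A minimal sketch of the lazy-import approach suggested above. Only the adapter module path and the two iterator names come from the diff; `load_nemo_iterators` and `LazyImportStageSketch` are hypothetical stand-ins, not the PR's actual classes.

```python
import importlib

# Module path taken from the diff above; everything else here is illustrative.
_NEMO_ADAPTERS = "nemo.collections.common.data.lhotse.nemo_adapters"


def load_nemo_iterators(module_name: str = _NEMO_ADAPTERS):
    """Import the NeMo iterators on first use instead of at module import time."""
    try:
        adapters = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for this stage; install NeMo to use it."
        ) from exc
    return adapters.LazyNeMoIterator, adapters.LazyNeMoTarredIterator


class LazyImportStageSketch:
    """Hypothetical stand-in for SpeakerEmbeddingLhotseStage."""

    def __init__(self, adapters_module: str = _NEMO_ADAPTERS):
        # Importing or constructing the stage never touches NeMo.
        self.adapters_module = adapters_module

    def process(self, manifest_path: str):
        # NeMo is resolved only when the stage actually runs.
        lazy_cls, tarred_cls = load_nemo_iterators(self.adapters_module)
        # A real stage would build its cuts from manifest_path here.
        return lazy_cls, tarred_cls
```

With this shape, `import nemo_curator.stages.audio` would succeed without NeMo, and the error would surface only when the speaker embedding stage is actually executed.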

greptile-apps · 2026-05-21T12:56:59Z

+        if waveform.ndim > 1:
+            waveform = waveform.mean(axis=1)


Wrong axis for multi-channel stereo mix-down silently corrupts audio

When the waveform is in channels-first format (channels, samples) as used by NeMo and Lhotse, mean(axis=1) averages over the time dimension, yielding shape (channels,) instead of (samples,). That array is then resampled and written as a 1-sample WAV, so UTMOSv2 scores garbage audio with no error. The correct axis for channels-first is 0.

Suggested change

         if waveform.ndim > 1:
-            waveform = waveform.mean(axis=1)
+            waveform = waveform.mean(axis=0)
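A quick shape check illustrates the difference between the two axes, assuming a channels-first `(channels, samples)` NumPy array. The helper name `mix_down_channels_first` is hypothetical and only mirrors the two lines under review.

```python
import numpy as np


def mix_down_channels_first(waveform: np.ndarray) -> np.ndarray:
    """Average across channels for a (channels, samples) array."""
    if waveform.ndim > 1:
        waveform = waveform.mean(axis=0)
    return waveform


# One second of fake stereo audio at 16 kHz: shape (2, 16000).
stereo = np.stack([np.zeros(16000), np.ones(16000)])

wrong = stereo.mean(axis=1)              # averages over time
right = mix_down_channels_first(stereo)  # averages over channels

print(wrong.shape)  # (2,)
print(right.shape)  # (16000,)
```

If the stage may also receive channels-last input, an explicit layout check would be needed; this sketch does not attempt to guess the layout.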

greptile-apps · 2026-05-21T12:57:01Z

+        results = self._model.predict(
+            input_dir=wav_dir,
+            batch_size=self.inference_batch_size,
+            num_repetitions=self.num_repetitions,
+            predict_dataset=self.predict_dataset,
+            num_workers=0,
+            verbose=False,
+        )


_score_dir result ordering is not guaranteed

model.predict(input_dir=wav_dir) returns results whose order UTMOSv2 does not document. The code zips results directly against valid_indices relying on lexicographic file-name order. The zero-padded names happen to work today, but if a future UTMOSv2 version changes traversal order, MOS scores will be silently assigned to the wrong entries.
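One way to remove the ordering dependency is to match each result to its entry by file name rather than by position. The sketch below assumes each result is a dict with `file_path` and `predicted_mos` keys and that the WAVs were written as zero-padded indices (for example `000003.wav`); both are assumptions to verify against the installed UTMOSv2 version and the stage's naming scheme. `align_scores` and `name_width` are hypothetical names.

```python
from pathlib import Path


def align_scores(results, valid_indices, name_width=6):
    """Map MOS scores back to entry indices by file stem, not list position."""
    # Assumed result shape: {"file_path": "...", "predicted_mos": 3.2}
    by_stem = {Path(r["file_path"]).stem: float(r["predicted_mos"]) for r in results}
    scores = {}
    for idx in valid_indices:
        stem = f"{idx:0{name_width}d}"  # assumed zero-padded naming
        if stem not in by_stem:
            raise KeyError(f"no prediction returned for {stem}.wav")
        scores[idx] = by_stem[stem]
    return scores


# Results arrive in arbitrary order; the mapping is still correct.
shuffled = [
    {"file_path": "/tmp/wavs/000007.wav", "predicted_mos": 2.5},
    {"file_path": "/tmp/wavs/000002.wav", "predicted_mos": 4.0},
]
print(align_scores(shuffled, valid_indices=[2, 7]))  # {2: 4.0, 7: 2.5}
```

A missing file raises instead of silently shifting scores onto the wrong entries.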

greptile-apps · 2026-05-21T12:57:03Z

+    try:
+        import librosa
+        return librosa.resample(audio, orig_sr=orig_sr, target_sr=target_sr)
+    except ImportError:
+        ratio = target_sr / orig_sr
+        indices = np.round(np.arange(0, len(audio), 1 / ratio)).astype(int)
+        indices = indices[indices < len(audio)]
+        return audio[indices]


Fallback resampler performs nearest-neighbor, not linear interpolation

The fallback uses np.round to pick the nearest existing sample index, which is nearest-neighbor resampling. The comment should say so, or the implementation should use np.interp for true linear interpolation.

Suggested change

     try:
         import librosa
         return librosa.resample(audio, orig_sr=orig_sr, target_sr=target_sr)
     except ImportError:
+        # Nearest-neighbour fallback (low quality - prefer installing librosa).
         ratio = target_sr / orig_sr
         indices = np.round(np.arange(0, len(audio), 1 / ratio)).astype(int)
         indices = indices[indices < len(audio)]
         return audio[indices]
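If true linear interpolation is preferred over relabelling the comment, the fallback could use `np.interp` as mentioned above. This is a sketch under the assumption that `audio` is a 1-D NumPy array; `resample_linear` is a hypothetical name, and it applies no anti-aliasing filter, so it is still lower quality than `librosa.resample` when downsampling.

```python
import numpy as np


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Linear-interpolation resampler for 1-D audio (no anti-aliasing)."""
    if orig_sr == target_sr or len(audio) == 0:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_t = np.arange(len(audio)) / orig_sr  # timestamps of input samples
    new_t = np.arange(n_out) / target_sr     # timestamps of output samples
    # np.interp clamps to the last input value past the final timestamp.
    return np.interp(new_t, old_t, audio).astype(audio.dtype)


ramp = np.array([0.0, 1.0, 2.0, 3.0])
print(resample_linear(ramp, orig_sr=2, target_sr=4))
# [0.  0.5 1.  1.5 2.  2.5 3.  3. ]
```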

adding part of pipeline as smoke merge to main

1a15b8f

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous requested a review from a team as a code owner May 21, 2026 12:50

Jorjeous requested review from oyilmaz-nvidia and removed request for a team May 21, 2026 12:50

greptile-apps Bot reviewed May 21, 2026

Jorjeous wants to merge 1 commit into main from Test_pipeline_MR

1 participant

Conversation

Jorjeous commented May 21, 2026

Description

Usage

Checklist

Uh oh!

copy-pr-bot Bot commented May 21, 2026

Uh oh!

greptile-apps Bot commented May 21, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant