Add Qwen Omni first inference stage #1967

Open
mohammadaaftabv wants to merge 7 commits into NVIDIA-NeMo:main from mohammadaaftabv:aaftabv/granary-v2-qwen-omni-first-stage

Contributor

@mohammadaaftabv mohammadaaftabv commented May 11, 2026

Summary

Adds a small Granary v2 Qwen-Omni first-stage inference path for audio. Curator keeps algorithm-only scope: Qwen3-Omni vLLM inference, local staged NeMo-tarred reader, sharded manifest writer, and explicit YAML tutorial stage graph. NvLLMOps remains responsible for data delivery, Kratos/MPI/Ray launch, GPU allocation, rank upload, and perf merge.

Pipeline shape

NemoTarredAudioReader -> InferenceQwenOmniStage -> ShardedManifestWriterStage

Scope boundary (Curator vs NvLLMOps)

  • Curator owns: Qwen3-Omni vLLM in-process inference (two-turn: transcription + disfluency refinement), NemoTarredAudioReader (streaming tar reads via lhotse, manifest-keyed lookup with deduplication, optional duration filtering, checkpoint skip-completed-shards), ShardedManifestWriterStage (actor-mode single-writer with .done marker, per-shard perf JSONL, aggregate perf_summary.json), and the Hydra YAML tutorial that wires these into an explicit stage graph.
  • NvLLMOps owns: Swift staging, MPI/Ray launch, GPU allocation, rank upload, merged perf summary, Kratos workflow generation.
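
Illustrative sketch (not code from this PR): the Curator bullet above mentions a `.done` marker on the writer side and skip-completed-shards on the reader side. A minimal version of that checkpoint pattern could look like the following. The file layout (`<shard>.jsonl` plus `<shard>.jsonl.done`) and all helper names are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def write_shard(output_dir: Path, shard_key: str, records: list[dict]) -> Path:
    """Write one shard's manifest, then drop a marker once the write finished."""
    out_path = output_dir / f"{shard_key}.jsonl"
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    # The marker is created last, so a crash mid-write leaves no marker behind.
    (output_dir / f"{shard_key}.jsonl.done").touch()
    return out_path


def completed_shards(output_dir: Path) -> set[str]:
    """Collect shard keys that already have a marker file."""
    return {p.name.removesuffix(".jsonl.done") for p in output_dir.glob("*.jsonl.done")}


def pending_shards(all_shards: list[str], output_dir: Path) -> list[str]:
    """Return the shards a restarted run still has to process."""
    done = completed_shards(output_dir)
    return [s for s in all_shards if s not in done]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    write_shard(out, "shard_0", [{"text": "hello"}])
    remaining = pending_shards(["shard_0", "shard_1"], out)
    print(remaining)
```

Writing the marker only after the manifest is closed means a restarted run redoes any shard that was interrupted mid-write.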

Files in this PR

Source:

  • nemo_curator/models/qwen_omni.py
  • nemo_curator/stages/audio/inference/__init__.py
  • nemo_curator/stages/audio/inference/qwen_omni.py
  • nemo_curator/stages/audio/io/nemo_tarred_reader.py
  • nemo_curator/stages/audio/io/sharded_manifest_writer.py
  • nemo_curator/stages/audio/metrics/performance.py

Tutorial:

  • tutorials/audio/qwen_omni_inprocess/main.py
  • tutorials/audio/qwen_omni_inprocess/qwen_omni_inprocess.yaml
  • tutorials/audio/qwen_omni_inprocess/README.md
  • tutorials/audio/README.md (index entry)

Tests:

  • tests/stages/audio/inference/test_qwen_omni.py
  • tests/stages/audio/io/test_nemo_tarred_reader.py
  • tests/stages/audio/io/test_sharded_manifest_writer.py

Packaging:

  • pyproject.toml (add the audio_cuda12 extra; pulls in vLLM + qwen-omni-utils)
  • uv.lock

How to run the tutorial (single rank)

uv sync --extra audio_cuda12
source .venv/bin/activate

python tutorials/audio/qwen_omni_inprocess/main.py \
    --config-path=tutorials/audio/qwen_omni_inprocess \
    --config-name=qwen_omni_inprocess \
    workspace_dir=/work \
    input_manifest=/data/data_config.yaml

Hydra config knobs of note: tensor_parallel_size, max_model_len, max_num_seqs, gpu_memory_utilization, batch_size, max_output_tokens, keep_waveform, prefetch_fail_on_error, default_language, prompt_text / en_prompt_file / followup_prompt, system_prompt.

Testing

Local:

  • git diff --check — clean
  • python -m py_compile on every PR-touched file — clean
  • pytest --confcutdir=tests/stages/audio tests/stages/audio/inference/test_qwen_omni.py tests/stages/audio/io/test_nemo_tarred_reader.py tests/stages/audio/io/test_sharded_manifest_writer.py — passes

Live multi-node (via the NvLLMOps Kratos workflow, not part of this PR):

  • 4 nodes x 4 GPUs (16 x L40S) — qwen_omni_inprocess pipeline only (reader + Qwen-Omni + writer)
  • 806.91 audio-hours / hour aggregate; 50.43 audio-hours / GPU-hour; 115.24 utterances / s; 2,587.45 output tokens / s
  • Pipeline completed in 13.6 min with 68,819 output tasks; outputs uploaded; MPIJob Succeeded (Kratos run 3eca039b-7c40-4adc-ab4a-26660edc6c29)
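
Illustrative arithmetic (not part of the PR): the per-GPU figure follows from the aggregate rate divided by the 16 GPUs. The two per-utterance values below are derived here, not reported in the run, and assume the quoted rates are averages over the same time window.

```python
# Figures copied from the run report above.
audio_hours_per_hour = 806.91
utterances_per_s = 115.24
output_tokens_per_s = 2587.45
num_gpus = 4 * 4  # 4 nodes x 4 GPUs

per_gpu = audio_hours_per_hour / num_gpus
# Audio-hours per wall-clock hour equals audio-seconds per wall-clock second.
avg_utterance_s = audio_hours_per_hour / utterances_per_s
avg_tokens_per_utterance = output_tokens_per_s / utterances_per_s

print(f"{per_gpu:.2f} audio-hours / GPU-hour")
print(f"{avg_utterance_s:.2f} s of audio per utterance (derived)")
print(f"{avg_tokens_per_utterance:.2f} output tokens per utterance (derived)")
```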

Out of scope for this PR

This PR is intentionally only the first Qwen-Omni inference stage. It does not include Qwen-ASR recovery, hallucination / cross-lingual filtering, LID, regex cleanup, PnC, ITN, SED, diarization, or any of the later Granary v2 stages — those will land in follow-up PRs.

Risk

  • Imports the heavyweight vllm/transformers/qwen-omni-utils stack via the optional audio_cuda12 extra. The vllm import keeps the existing try/except ImportError + VLLM_AVAILABLE guard (matches 5 sister model wrappers on main) so Mac/ARM Curator installs can still import nemo_curator.models.qwen_omni cleanly.
  • trust_remote_code=True is required by Qwen3-Omni-30B-A3B-Instruct today (the model card ships custom modeling code).
  • Single-tutorial scope keeps blast radius small; the live 16x L40S run shows the stage graph executes correctly end-to-end.

@mohammadaaftabv mohammadaaftabv requested a review from a team as a code owner May 11, 2026 12:49
@mohammadaaftabv mohammadaaftabv requested review from suiyoubi and removed request for a team May 11, 2026 12:49

copy-pr-bot Bot commented May 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR introduces the first Granary v2 Qwen-Omni audio inference stage for NeMo Curator, wiring together a NemoTarredAudioReader, an InferenceQwenOmniStage (two-turn vLLM transcription + disfluency refinement), and a ShardedManifestWriterStage with .done checkpoint markers.

  • QwenOmni model wrapper (nemo_curator/models/qwen_omni.py): imports qwen_omni_utils and Qwen3OmniMoeProcessor unconditionally at module level — outside the try/except ImportError guard that protects the vllm import — breaking import nemo_curator.models.qwen_omni on any non-audio_cuda12 install, including the Mac/ARM case the PR description claims is safe.
  • InferenceQwenOmniStage (nemo_curator/stages/audio/inference/qwen_omni.py): waveform_key and sample_rate_key are listed as optional inputs in inputs(), so validate_input never checks for them; process_batch then accesses both via direct dict lookup (t.data[self.waveform_key]), raising an opaque KeyError instead of the descriptive ValueError the validation path was meant to produce.
  • pyproject.toml: qwen-asr==0.0.6 and fasttext==0.9.3 are added for future out-of-scope stages, pulling five NLP/audio-DSP packages into audio_common (affects audio_cpu users) and forcing exact global uv overrides for transformers, huggingface-hub, and accelerate that lock the resolved versions for all extras today.

Confidence Score: 3/5

Two known unfixed issues in the model and inference-stage files will cause import failures on non-GPU installs and opaque KeyError crashes for tasks with missing waveform data.

The unconditional top-level imports of qwen_omni_utils and Qwen3OmniMoeProcessor in qwen_omni.py break the import on any install without the audio_cuda12 extra, contradicting the stated Mac/ARM compatibility guarantee. Separately, InferenceQwenOmniStage.process_batch uses bare dict access for keys that validate_input never actually checks, so a missing waveform produces an opaque KeyError at runtime. Both issues remain unfixed at the current head.

nemo_curator/models/qwen_omni.py (unconditional imports at module level) and nemo_curator/stages/audio/inference/qwen_omni.py (direct dict access bypasses validate_input) need attention before this is safe to merge for general use.
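
Illustrative sketch (not code from this PR): one way to turn the bare dict lookup into a descriptive failure is to check the keys the stage will read before touching them. The helper name `require_keys` and the key name `sample_rate` are assumptions for the example.

```python
from typing import Any


def require_keys(data: dict[str, Any], required: tuple[str, ...], stage_name: str) -> None:
    """Raise a descriptive ValueError when a task lacks keys the stage will read."""
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"{stage_name}: task is missing required keys {missing}; got {sorted(data)}")


task_data = {"audio_filepath": "a.wav", "sample_rate": 16000}  # no "waveform"
try:
    require_keys(task_data, ("waveform", "sample_rate"), "InferenceQwenOmniStage")
    error_message = ""
except ValueError as err:
    error_message = str(err)
print(error_message)
```

The same effect could presumably be reached by listing both keys as required inputs so the existing validation path covers them.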

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_curator/models/qwen_omni.py | New Qwen3-Omni vLLM model wrapper with two-turn inference; qwen_omni_utils and Qwen3OmniMoeProcessor are imported unconditionally at module level, breaking non-audio_cuda12 installs despite the stated Mac/ARM compatibility claim. |
| nemo_curator/stages/audio/inference/qwen_omni.py | New batched inference stage wrapping QwenOmni; waveform/sample-rate keys are placed in the optional inputs list while process_batch accesses them via direct dict lookup, meaning a missing key raises an opaque KeyError rather than the descriptive ValueError the validation path would produce. |
| nemo_curator/stages/audio/io/nemo_tarred_reader.py | New NeMo tarred audio reader with shard discovery, checkpoint skip, and in-memory audio decoding via lhotse/soundfile; extractfile None-return is guarded, deduplication logic and duration filtering are correct. |
| nemo_curator/stages/audio/io/sharded_manifest_writer.py | New actor-mode single-writer stage producing per-shard JSONL with .done markers and aggregate perf summary; numpy/tensor arrays are correctly dropped before serialisation, and a TypeError guard handles any remaining non-serialisable values. |
| tutorials/audio/qwen_omni_inprocess/main.py | Hydra entry point for the Qwen-Omni pipeline; hf_token and other secrets are redacted via _safe_config_yaml before logging, and the executor factory is cleanly selected by backend name. |
| pyproject.toml | Adds audio_cuda12 extra with vLLM and qwen-omni-utils; also pre-adds qwen-asr==0.0.6 and fasttext==0.9.3 for future out-of-scope stages, forcing exact global uv overrides for transformers, huggingface-hub, and accelerate that affect all extras. |

Reviews (5): Last reviewed commit: "Move qwen-asr runtime stack into audio_c..."

Comment on lines +106 to +107
with open(out_path, "a", encoding="utf-8") as f:
    f.write(json.dumps(task.data, ensure_ascii=False) + "\n")
Contributor

P1 json.dumps will crash when keep_waveform=True

task.data can contain a numpy ndarray (keyed by waveform_key) when the upstream InferenceQwenOmniStage is configured with keep_waveform: true. json.dumps has no numpy serializer and raises TypeError: Object of type ndarray is not JSON serializable, crashing the entire shard write for all tasks in that batch. Either strip the waveform here before serialising, or document that keep_waveform must be false when this writer is used (and add a validation guard in setup or __post_init__).

Contributor Author

Thanks for flagging. This is already addressed on the current head (936fa17f):

  • _manifest_data first drops keys named in drop_manifest_keys (defaults to ("waveform",)) so the configured waveform_key never reaches serialisation, regardless of keep_waveform.
  • Anything else with .shape and .dtype (numpy ndarrays, torch tensors, etc.) is dropped via a duck-typing guard before json.dumps is called.
  • The remaining json.dumps call is wrapped in try/except TypeError, so a previously-unseen non-serialisable value raises a focused TypeError with the offending key instead of crashing the shard.

Citation: nemo_curator/stages/audio/io/sharded_manifest_writer.py:96-111. Resolving as already-fixed.
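
Illustrative sketch (not the PR's `_manifest_data`): the three steps described in the reply can be approximated as below. `FakeArray` is a stand-in so the example runs without numpy or torch, and checking each value individually is just one way to name the offending key.

```python
import json
from typing import Any


class FakeArray:
    """Stand-in for a numpy array or torch tensor: it only exposes .shape and .dtype."""

    shape = (16000,)
    dtype = "float32"


def manifest_data(data: dict[str, Any], drop_keys: tuple[str, ...] = ("waveform",)) -> dict[str, Any]:
    """Drop configured keys and anything array-like, then confirm the rest serialises."""
    cleaned: dict[str, Any] = {}
    for key, value in data.items():
        if key in drop_keys:
            continue
        if hasattr(value, "shape") and hasattr(value, "dtype"):
            continue  # duck-typed array or tensor
        try:
            json.dumps(value, ensure_ascii=False)
        except TypeError as err:
            raise TypeError(f"manifest key {key!r} is not JSON serialisable: {err}") from err
        cleaned[key] = value
    return cleaned


row = manifest_data({"text": "hi", "waveform": FakeArray(), "features": FakeArray(), "duration": 1.0})
line = json.dumps(row, ensure_ascii=False)
print(line)
```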

Comment on lines +83 to +88
cfg = value if OmegaConf.is_config(value) else OmegaConf.create(value)
if "_target_" in cfg:
    return hydra.utils.instantiate(cfg)
raw = OmegaConf.to_container(cfg, resolve=True)
return Resources(**raw)
msg = f"Invalid resources override: {value!r}"
Contributor

P1 security Credentials exposed in startup log

logger.info(f"Hydra config:\n{OmegaConf.to_yaml(cfg)}") prints the full resolved config, including any hf_token passed as a Hydra override. The credential ends up in every log sink (stdout, files, observability stacks) in plaintext. Consider redacting the hf_token field before logging — for example, by building a sanitised copy of the config dict — or logging only a subset of non-sensitive keys.

Contributor Author

Thanks for flagging. This was addressed in commit 936fa17f:

  • The startup log call no longer uses raw OmegaConf.to_yaml(cfg). It now uses _safe_config_yaml(cfg) (tutorial main.py:346), which builds a redacted copy of the config before rendering to YAML.
  • _redact_secret_values walks the config recursively (main.py:170-179) and replaces values of any key matching _SECRET_KEY_NAMES (which explicitly includes hf_token, password, secret_key, token, credentials, …) or _SECRET_KEY_PARTS substrings with <redacted>.
  • Trailing-suffix matching also catches any custom secret named *_token, *_secret, or *_password.

Note: the regression test that covered this behaviour (tests/stages/audio/inference/test_qwen_omni_tutorial.py::test_safe_config_yaml_redacts_hf_token_but_keeps_token_counts) was removed in this revision per @sarahyurick's "we shouldn't need pytests for tutorials" comment. The helper code itself is unchanged.

Citations: tutorials/audio/qwen_omni_inprocess/main.py:60-72, :160-184, :346. Resolving as already-fixed.
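
Illustrative sketch (not the tutorial's helper): a recursive redaction pass of the kind described above, written against plain dicts and lists. The key lists here are hypothetical and the real `_SECRET_KEY_NAMES` and `_SECRET_KEY_PARTS` may differ. An OmegaConf config would first need converting, for example with `OmegaConf.to_container`.

```python
from typing import Any

# Hypothetical key lists; the PR's real _SECRET_KEY_NAMES and _SECRET_KEY_PARTS may differ.
SECRET_KEY_NAMES = {"hf_token", "password", "secret_key", "token", "credentials"}
SECRET_KEY_SUFFIXES = ("_token", "_secret", "_password")
REDACTED = "<redacted>"


def is_secret_key(key: str) -> bool:
    """Match exact secret names or keys ending in a secret-like suffix."""
    lowered = key.lower()
    return lowered in SECRET_KEY_NAMES or lowered.endswith(SECRET_KEY_SUFFIXES)


def redact_secret_values(node: Any) -> Any:
    """Return a copy of a nested dict/list config with secret values replaced."""
    if isinstance(node, dict):
        return {
            key: REDACTED if is_secret_key(str(key)) else redact_secret_values(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact_secret_values(item) for item in node]
    return node


cfg = {
    "hf_token": "hf_example",
    "max_output_tokens": 256,
    "model": {"api_token": "abc", "name": "qwen"},
}
safe = redact_secret_values(cfg)
print(safe)
```

Suffix matching on `_token` rather than substring matching on `token` is what keeps a count such as `max_output_tokens` visible in the log.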

completed.add(shard_key)
return completed

@staticmethod
Contributor

P2 Unchecked None return from extractfile

tarfile.TarFile.extractfile returns None for members that are not regular files (hard links, directory entries embedded in some tar formats). The preceding tar_info.isfile() guard does not cover all cases where extractfile may return None — calling .read() on a None result would raise AttributeError. Add a None check before .read().
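
Illustrative sketch (not code from this PR): the guard itself is a one-line `None` check. The example below builds an in-memory tar with one regular file and one directory entry. The directory case is used because the standard library documentation states that `extractfile` returns `None` for members that are neither regular files nor links. Member names are made up.

```python
import io
import tarfile


def build_demo_tar() -> bytes:
    """Create an in-memory tar holding one regular file and one directory entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"fake-audio-bytes"
        file_info = tarfile.TarInfo(name="clips/a.wav")
        file_info.size = len(payload)
        tar.addfile(file_info, io.BytesIO(payload))

        dir_info = tarfile.TarInfo(name="clips/subdir")
        dir_info.type = tarfile.DIRTYPE
        tar.addfile(dir_info)
    return buf.getvalue()


def read_members(tar_bytes: bytes) -> dict[str, bytes]:
    """Read every member that yields a file object, skipping the ones that do not."""
    contents: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is None:  # e.g. a directory entry
                continue
            contents[member.name] = extracted.read()
    return contents


members = read_members(build_demo_tar())
print(sorted(members))
```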

Comment thread nemo_curator/models/qwen_omni.py Outdated

def _get_prompt_text(self, language: str | None) -> str:
    """Return the EN-specific prompt for English, otherwise the default prompt."""
    if language and language == "English" and self.en_prompt_text:
Contributor

P2 The leading `language and` is redundant: if `language == "English"` evaluates to True, the string is already truthy, so the extra truthiness check is dead code and can mislead readers into thinking an empty-string case needs guarding here.

Suggested change
- if language and language == "English" and self.en_prompt_text:
+ if language == "English" and self.en_prompt_text:

Comment on lines +121 to +132
max_num_seqs=self.max_num_seqs,
max_model_len=self.max_model_len,
seed=1234,
enable_prefix_caching=True,
prefix_caching_hash_algo="xxhash",
)

from transformers import Qwen3OmniMoeProcessor

self._processor = Qwen3OmniMoeProcessor.from_pretrained(self.model_id)

self._sampling_params = SamplingParams(
Contributor

P2 trust_remote_code=True hardcoded without user control

trust_remote_code=True executes arbitrary Python bundled with the model weights; there is no constructor parameter or config knob to disable it. For downstream users who want to run audited/frozen snapshots, or who apply security policies, this silently bypasses those controls. Exposing it as an __init__ parameter (defaulting to True for backward-compat) would let callers opt out.
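
Illustrative sketch (not code from this PR): exposing the flag could be as small as one extra field that is forwarded with the other engine arguments. `QwenOmniEngineConfig`, `engine_kwargs`, and the numeric defaults are hypothetical placeholders. Only the keyword names are taken from the diff quoted above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class QwenOmniEngineConfig:
    """Hypothetical config holder; field names mirror the kwargs quoted in the diff."""

    model_id: str
    max_num_seqs: int = 8  # placeholder value
    max_model_len: int = 32768  # placeholder value
    trust_remote_code: bool = True  # default keeps today's behaviour

    def engine_kwargs(self) -> dict[str, Any]:
        """Build the keyword arguments that would be forwarded to the engine constructor."""
        return {
            "model": self.model_id,
            "max_num_seqs": self.max_num_seqs,
            "max_model_len": self.max_model_len,
            "trust_remote_code": self.trust_remote_code,
        }


default_kwargs = QwenOmniEngineConfig(model_id="local/snapshot").engine_kwargs()
audited_kwargs = QwenOmniEngineConfig(model_id="local/snapshot", trust_remote_code=False).engine_kwargs()
print(default_kwargs["trust_remote_code"], audited_kwargs["trust_remote_code"])
```

Since the PR description says the model currently requires `trust_remote_code=True`, opting out may simply fail at load time. The knob would mainly make that choice explicit.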

@mohammadaaftabv mohammadaaftabv requested a review from a team as a code owner May 11, 2026 16:56
Contributor

@sarahyurick sarahyurick left a comment

I did an initial pass to start familiarizing myself with the PR for now. Left some minor comments.

Comment thread nemo_curator/models/qwen_omni.py Outdated
model_id: str = _QWEN3_OMNI_MODEL_ID,
prompt_text: str = "Transcribe the audio.",
en_prompt_text: str | None = None,
followup_prompt: str = "Now listen to the audio again and add any false starts, filler words and preserve colloquial words (like lemme, gonna, wanna, etc) as is spoken in the audio.",
Contributor

Instead of having the long string here, let's create a script variable called _FOLLOWUP_PROMPT or similar.

Comment thread nemo_curator/models/qwen_omni.py Outdated
prefix_caching_hash_algo="xxhash",
)

from transformers import Qwen3OmniMoeProcessor
Contributor

Top-level import?

Comment thread nemo_curator/models/qwen_omni.py Outdated
self._sampling_params = None
gc.collect()
try:
import torch
Contributor

Top-level import?

Comment thread nemo_curator/models/qwen_omni.py Outdated
def _prepare_turn2_single(
self, waveform_16k: np.ndarray, pred_text: str, language: str | None = None,
) -> dict[str, Any] | None:
from qwen_omni_utils import process_mm_info
Contributor

Top-level import?

Comment thread nemo_curator/stages/audio/inference/__init__.py
"""

name: str = "sharded_manifest_writer"
output_dir: str = ""
Contributor

Suggested change
- output_dir: str = ""
+ output_dir: str

instead of checking for it in the post init.

from nemo_curator.stages.audio.inference.qwen_omni import InferenceQwenOmniStage
from nemo_curator.stages.audio.io.nemo_tarred_reader import NemoTarredAudioReader, NemoTarShardReaderStage
from nemo_curator.stages.audio.io.sharded_manifest_writer import ShardedManifestWriterStage
from tutorials.audio.qwen_omni_inprocess.main import (
Contributor

We shouldn't need pytests for tutorials.

and prefetches common HuggingFace model attributes without hardcoding a
full Granary v2 post-processing graph in this entry point.
"""
from huggingface_hub import hf_hub_download, snapshot_download
Contributor

Top-level import?

Comment thread tutorials/audio/qwen_omni_inprocess/qwen_omni_inprocess.yaml
Comment on lines +35 to +39
If you do not have `uv`, use pip:

```bash
pip install -e ".[audio_cuda12]"
```
Contributor

We should only encourage uv and not pip.

- hoist followup_prompt default into _FOLLOWUP_PROMPT module constant
- move 6 lazy imports to module scope (torch, transformers.Qwen3OmniMoeProcessor,
  qwen_omni_utils.process_mm_info, huggingface_hub.{snapshot_download,
  hf_hub_download}, yaml); keep existing vllm try/except guard
- drop redundant `language and` short-circuit in _get_prompt_text
- guard tar.extractfile(...) against None before .read() in
  NemoTarShardReaderStage; add hard-link regression test
- make ShardedManifestWriterStage.output_dir a required field; drop empty-
  string post_init check
- add Apache/NVIDIA copyright headers to inference/__init__.py and
  qwen_omni_inprocess.yaml
- drop pip-fallback install block from tutorial README (uv-only)
- remove tests/stages/audio/inference/test_qwen_omni_tutorial.py per
  "no pytests for tutorials"
- retarget @patch decorators in test_qwen_omni.py to the use-site
  (nemo_curator.stages.audio.inference.qwen_omni.snapshot_download) so
  the patches still bind after the import hoist

Signed-off-by: Aaftab V <aaftabv@nvidia.com>
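
Illustrative sketch (not code from this PR): the last bullet of the commit message relies on the rule that `patch` must target the module where a name is looked up, not where it is defined. The two throwaway modules below (`fake_hub`, `fake_stage`) are invented so the example runs without any real packages.

```python
import sys
import types
from unittest import mock

# Build two throwaway modules so the example runs without any real packages.
fake_hub = types.ModuleType("fake_hub")
exec("def snapshot_download(repo_id):\n    return f'real:{repo_id}'\n", fake_hub.__dict__)
sys.modules["fake_hub"] = fake_hub

fake_stage = types.ModuleType("fake_stage")
exec(
    "from fake_hub import snapshot_download\n"  # module-scope import, as after the hoist
    "def prefetch(repo_id):\n"
    "    return snapshot_download(repo_id)\n",
    fake_stage.__dict__,
)
sys.modules["fake_stage"] = fake_stage

# Patching the definition site leaves the name already bound in fake_stage untouched.
with mock.patch("fake_hub.snapshot_download", return_value="mocked"):
    via_definition_site = fake_stage.prefetch("m")

# Patching the use site replaces the name that prefetch() actually looks up.
with mock.patch("fake_stage.snapshot_download", return_value="mocked"):
    via_use_site = fake_stage.prefetch("m")

print(via_definition_site, via_use_site)
```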
Comment on lines +24 to +25
from qwen_omni_utils import process_mm_info
from transformers import Qwen3OmniMoeProcessor
Contributor

P1 Unconditional top-level imports break non-audio_cuda12 installs

qwen_omni_utils is only shipped with the audio_cuda12 optional extra, but from qwen_omni_utils import process_mm_info and from transformers import Qwen3OmniMoeProcessor are both imported at module level, outside any guard. On a standard Curator installation (including the Mac/ARM case the PR description explicitly claims will work), import nemo_curator.models.qwen_omni fails immediately with ImportError: No module named 'qwen_omni_utils'. The vllm import immediately below correctly uses try/except ImportError + VLLM_AVAILABLE; these two imports need the same treatment — either fold them into that same try block, or defer them into setup() where VLLM_AVAILABLE is already checked.
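
Illustrative sketch (not code from this PR): the guard pattern the comment asks for, applied to the `qwen_omni_utils` import. The flag name `QWEN_OMNI_UTILS_AVAILABLE` and the class `QwenOmniSketch` are hypothetical. The point is only that importing the module and constructing the wrapper succeed without the extra, and the failure moves to `setup()` with a clear message.

```python
try:
    # Optional dependency: only present when the GPU audio extra is installed.
    from qwen_omni_utils import process_mm_info

    QWEN_OMNI_UTILS_AVAILABLE = True
except ImportError:
    process_mm_info = None
    QWEN_OMNI_UTILS_AVAILABLE = False


class QwenOmniSketch:
    """Hypothetical wrapper showing where the availability check could live."""

    def setup(self) -> None:
        if not QWEN_OMNI_UTILS_AVAILABLE:
            msg = "qwen_omni_utils is not installed; install the audio_cuda12 extra to use this model."
            raise ImportError(msg)
        # Heavy initialisation would follow here.


model = QwenOmniSketch()  # constructing the wrapper never touches the optional package
try:
    model.setup()
    setup_error = ""
except ImportError as err:
    setup_error = str(err)
print(QWEN_OMNI_UTILS_AVAILABLE, setup_error)
```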

Add qwen-asr and its lazy-imported runtime companions to Curator's audio
extras so that the harvest.curator Docker image gets the full qwen-asr stack
via Curator's uv sync rather than via post-uv pip installs in NvLLMOps. This
honors the Algorithmic vs Data-Mover Dep Ownership Rule: algorithmic libraries
belong in Curator, NvLLMOps owns only data-mover clients.

audio_common gains the qwen-asr forced-aligner text-norm and audio-feature
companions that qwen-asr 0.0.6's qwen3_forced_aligner.py imports lazily:
nagisa==0.2.11, soynlp==0.0.493, pyarabic, opencc-python-reimplemented, and
nnAudio.

audio_cuda12 gains qwen-asr==0.0.6 itself for the Granary v2 Qwen-ASR
recovery stage, and fasttext==0.9.3 for the Granary v2 LID stage. fasttext
already lives in text_cpu but audio_cuda12 does not pull text_cpu, so the
declaration is duplicated here.

[tool.uv] override-dependencies replaces the broad huggingface-hub>=0.34,<1.0
override with three exact pins proven against qwen-asr by NvLLMOps commit
68f18e9b: transformers==4.57.6, accelerate==1.12.0, and
huggingface-hub==0.36.0. These force-override qwen-asr's declared
(incompatible) version pins so the resolver picks the proven-compatible
versions for the entire graph.

This change extends PR1967's scope from "first-stage Qwen-Omni inference
only" to also cover Granary v2 algorithmic-dep self-containment, so that
later Granary v2 PRs (Qwen-ASR recovery, text filtering, PnC, ITN, SED) can
rely on Curator's audio_cuda12 extra without further pip-after-uv overrides
in NvLLMOps.

Lock churn: +18 packages including qwen-asr 0.0.6, nagisa, soynlp, pyarabic,
opencc-python-reimplemented, nnaudio, and qwen-asr's gradio/flask transitive
demo deps. transformers/huggingface-hub/accelerate stayed at the
override-pinned versions, so no version drift for the qwen-omni stack.

Signed-off-by: Aaftab V <aaftabv@nvidia.com>
"""

name: str = "nemo_tar_shard_discovery"
yaml_path: str = ""
Contributor

Suggested change
- yaml_path: str = ""
+ yaml_path: str

This can be a required field (no default) instead of checking for an empty string in the post init.

"""

name: str = "nemo_tarred_audio_reader"
yaml_path: str = ""
Contributor

Suggested change
- yaml_path: str = ""
+ yaml_path: str

Same comment as above.

Comment thread pyproject.toml
Contributor

cc @ayushdg to review.

Contributor

@mohammadaaftabv For changes that override the deps and might have an impact beyond the intended modality, can we run the full benchmark suite with the changes from this PR to ensure nothing regressed?

Comment thread pyproject.toml
# qwen-asr 0.0.6's qwen3_forced_aligner.py lazily imports the following text-norm
# and audio-feature helpers. Keep them in audio_common (not audio_cuda12) so the
# CPU audio extra picks them up too if a future qwen-asr CPU path appears.
"nagisa==0.2.11",
Contributor

We try to avoid hard pins if possible since it may cause incompatibility with other packages due to more restrictive pinning.

Comment thread pyproject.toml
# qwen-asr by NvLLMOps commit 68f18e9b. Lazy-imported runtime companions
# (nagisa, soynlp, pyarabic, opencc-python-reimplemented, nnAudio) come from
# audio_common above.
"qwen-asr==0.0.6",
Contributor

Same here: is it possible to avoid the hard pin? The comment here is also too verbose.

Comment thread pyproject.toml
"qwen-asr==0.0.6",
# Granary v2 LID stage uses fasttext directly; it also lives in text_cpu but
# audio_cuda12 does not pull text_cpu, so declare it explicitly here.
"fasttext==0.9.3",
Contributor

Same comment about hard pins.

Comment thread pyproject.toml
Comment on lines +354 to +356
"huggingface-hub==0.36.0", # Pinned to qwen-asr 0.0.6 runtime compat (NvLLMOps commit 68f18e9b); also covers transformers vs data-designer disagreement
"transformers==4.57.6", # Pinned to qwen-asr 0.0.6 runtime compat (NvLLMOps commit 68f18e9b); overrides qwen-asr's incompatible declared transformers pin
"accelerate==1.12.0", # Pinned to qwen-asr 0.0.6 runtime compat (NvLLMOps commit 68f18e9b); overrides qwen-asr's incompatible declared accelerate pin
Contributor

Is it possible to avoid hard pins here? Generally hf-hub and transformers are used across multiple modalities, so it might be harder to unify this here.

Comment thread pyproject.toml
  "distance; sys_platform == 'never'",
- "huggingface-hub>=0.34,<1.0", # Override huggingface-hub, transformers and data-designer require two different versions of hugging-face hub
+ "huggingface-hub==0.36.0", # Pinned to qwen-asr 0.0.6 runtime compat (NvLLMOps commit 68f18e9b); also covers transformers vs data-designer disagreement
+ "transformers==4.57.6", # Pinned to qwen-asr 0.0.6 runtime compat (NvLLMOps commit 68f18e9b); overrides qwen-asr's incompatible declared transformers pin
Contributor

@praateekmahajan can you review these as well?

3 participants