Fix component management by Jayce-Ping · Pull Request #27 · X-GenGroup/Flow-Factory

Jayce-Ping · 2026-02-08T03:18:31Z

No description provided.

… unused import) Each fix below maps to a specific Copilot review comment from Round-3 (2026-04-17). All five sites are documentation / comment changes that bring the docstrings in line with current behavior, plus one trivial unused-import removal flagged in the same review. - audio.py load_audio (Comment id 3098052328): Removed the false `Raises: ImportError` claim — load_audio never raises ImportError because the implementation falls back to the stdlib `wave` module if both torchaudio and soundfile are unavailable. Added a Note: section listing all three backends in priority order (torchaudio primary; soundfile wav/flac/ogg; stdlib wave WAV-only 16/32-bit PCM) so callers know which formats survive each fallback. - audio.py _load_audio_backend (related to Comment 3098052328, found via audit): Replaced the cryptic "Priority: torchaudio > soundfile + torch" line with a numbered backend chain that documents: 1. torchaudio.load — primary 2. soundfile.read — (T, C) transposed to (C, T) 3. stdlib wave — WAV-only last-resort Added Raises: section for the WAV-fallback ValueError. - audio.py hash_audio (Comment id 3098052292): The docstring claimed "Longer audio is uniformly subsampled", but the implementation uses `step = n // max_samples`, which collapses to step == 1 (effective truncation to the first max_samples samples) when max_samples < n < 2 * max_samples. Added a Note: section that documents the truncation-vs-uniform-stride boundary at n = 2*max_samples and clarifies that determinism (cache-key intent), not collision resistance, is the design goal — keeping the existing implementation intact and accurately documented. - loader.py import torch (Comment id 3098052303): Removed unused `import torch` (line 21). Only `torch.utils.data` is used and that import on line 22 is satisfied independently. - loader.py is_orchestrator block (Comment id 3098052316): Rewrote the comment to be precise per case. The old comment claimed "is_orchestrator prevents multi-node races ... when the FS is shared", which is only true in "global" mode. Clarified: - non-distributed: the lone process - "global" mode: rank-0 globally — required when cache_dir is on a shared FS (eliminates cross-node race on shutil.rmtree / sentinel writes) - "local" mode: per-node local main — ASSUMES cache_dir is on node-local storage; pointing "local" mode at a shared FS WILL race across node-local mains and is unsupported The user's contract (already documented on _create_or_load_dataset's `preprocess_parallelism` arg) is now consistent with the comment at the actual orchestrator selection site. Comment id 3098052262 (save_audio "converted to int16") was already addressed by the Round-3 commit (49d5a54) when save_audio was reduced to a single torchaudio.save call — no separate fix needed here. Per .agents/knowledge/constraints.md #27 (Google-style docstrings) and philosophy.md #3 (fail-fast, no speculative code): the docstrings now faithfully reflect what the code does, including the corner cases that Round-2 narrowing exposed. Verification: ReadLints — clean black --check / diff — 358 lines, byte-identical to parent (49d5a54) isort --check / diff — 18 lines, byte-identical to parent (49d5a54) (all 358 + 18 are pre-existing on origin/main, outside the edited regions; not touched per "structural vs behavioral separation" — same rationale as Phase 2/3.) smoke test — load_audio docstring has no ImportError + names all 3 backends; _load_audio_backend uses Backend chain phrasing + names wave; hash_audio mentions truncation + 2*max_samples boundary; loader.py is_orchestrator comment includes 'unsupported' and 'node-local'; no `^import torch$` survives in loader.py. Made-with: Cursor

… merge (#129) * [utils] feat: add audio utility module with type aliases, standardization, and loading Add `utils/audio.py` following the exact pattern of `utils/image.py` and `utils/video.py`. Provides a complete audio waveform toolkit: - Type aliases: `AudioSingle` (C,T), `AudioBatch` (B,C,T) - Validation: `is_audio()`, `is_audio_batch()` - Loading/saving: `load_audio()`, `save_audio()` with 3-tier backend fallback (torchaudio → soundfile → stdlib wave) - Conversion: `audio_to_tensor()`, `audio_to_numpy()`, `convert_audio()` for resampling and mono/stereo conversion - Standardization: `standardize_audio_batch()` with output_type='pt'|'np' - Hashing: `hash_audio()`, `hash_audio_list()` with int16 quantization Design conventions follow diffusers/audiocraft: - Channel-first (C, T) tensor layout - [-1.0, 1.0] float32 value range - Channel conversion: downmix=mean, upmix=repeat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Made-with: Cursor * [data] feat: support audio loading and preprocessing in dataset pipeline - Add `audio_dir` field to DataArguments (defaults to 'audios' subfolder) - Add audio loading block in GeneralDataset._preprocess_batch() that mirrors the existing video loading pattern: detect 'audio' column in JSONL, load via load_audio(), pass to preprocess_func(audios=...) - Add 'audios' to PREPROCESS_KEYS for metadata exclusion - Pass audio_dir through fn_kwargs to .map() call The audio_dir parameter flows automatically through loader.py via filter_kwargs — no changes needed in loader.py. Fully backward compatible: datasets without audio columns are unaffected. JSONL format: {"prompt": "...", "audio": "file.wav"} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [data_utils] perf: eliminate redundant preprocessed-dataset writes Each rank's preprocessed Arrow shard is now written exactly once: the orchestrator routes Dataset.map output directly to the final per-rank location via cache_file_name= (under {merged_cache_path}.tmp/_parts/), and the consolidator writes only state.json + dataset_info.json before atomically renaming .tmp -> merged_cache_path. No row data is re-copied during the merge, no duplicate cache lands under ~/.cache/huggingface, and the build-dir sentinel _build_meta.json enables crash recovery for unchanged num_shards while wiping cleanly on num_shards changes. Single-process and distributed paths are unified through the same flow (N=1 case for single-process); enable_preprocess=False bypasses the consolidate pipeline entirely to preserve pre-refactor behavior. I/O budget per cache build drops from ~4*N*S to ~N*S bytes touched. Made-with: Cursor * [data_utils] fix: address PR #129 review - loader.py: gate build-dir prep on a shared is_orchestrator predicate (rank-0 globally for "global" mode, local main for "local" mode) and reuse it for the consolidator. Avoids concurrent shutil.rmtree / _build_meta.json writes when multiple nodes share the FS. - loader.py: align shard log with dataset.py convention {idx}/{N-1}. Addresses Copilot review comments #2 and #3 on PR #129. Made-with: Cursor * [utils] feat: add ndim validation to standardize_audio_batch Cherry-picked from a902429 (audio.py portion only). Adds explicit ValueError when input ndim is not in {1, 2, 3}, replacing silent fall-through. Aligns with the no-defensive-except.mdc fail-fast rule. The data_utils path only calls load_audio, never standardize_audio_batch, so this is a pure independent safety improvement to the audio utility module. Made-with: Cursor * [utils] fix: address PR #129 Comments 4-7 — fail-fast in audio.py + meta_matches recovery - audio.py save_audio: narrow torchaudio block to ImportError only, using try/import + else/save so genuine save failures (permission, encoding, disk full) propagate instead of silently falling back to soundfile or wave (Copilot Comment #4). - audio.py _load_audio_backend: same fix for torchaudio.load and soundfile.read blocks; the always-available wave fallback at the end stays as-is (Comment #5). - audio.py _resample: same fix for torchaudio.functional.resample; the scipy block already uses except ImportError correctly (Comment #6). - loader.py _meta_matches: catch (json.JSONDecodeError, OSError) so a corrupted sentinel from a previously crashed mid-write is treated as "stale" and triggers the wipe-and-recreate path, matching the existing "missing -> return False" semantics. Behavior is documented inline per no-defensive-except.mdc (Comment #7). All five sites now obey .cursor/rules/no-defensive-except.mdc: narrow, documented try/except limited to optional-import detection or recovery from a known on-disk failure mode. Made-with: Cursor * [utils] refactor: drop fallback chains in save_audio and _resample (fail-fast) torchaudio>=2.4.0 is a hard core dependency in pyproject.toml, so the optional-backend fallback chains in save_audio (soundfile + stdlib wave) and _resample (scipy + linear interpolation) were never genuinely optional - they only ever ran on broken installs, and silently substituted a different backend's behavior for what the caller asked for. Per .cursor/rules/no-defensive-except.mdc, the fail-fast posture is to surface the failure directly: - save_audio: now a single torchaudio.save call. If torchaudio is missing (broken install) the ImportError surfaces; if torchaudio.save raises (permission, encoding, disk full, unsupported extension) the exception surfaces with the original cause and full traceback. - _resample: now a single torchaudio.functional.resample call. dtype, device, and rate-mismatch errors surface directly instead of being silently masked by scipy or linear interp. Round-2 narrowed the broad except (ImportError, Exception) to ImportError-only (commit f8fb452); Round-3 goes further and removes the fallbacks themselves, since the underlying premise - "torchaudio might be missing" - is false for this codebase. _load_audio_backend's soundfile fallback is intentionally retained: soundfile is NOT a hard dep, and is needed for users who load formats torchaudio doesn't support out-of-the-box. That stays as a documented optional-backend pattern. Made-with: Cursor * [utils,data_utils] docs: address PR #129 Round-3 review (docstrings + unused import) Each fix below maps to a specific Copilot review comment from Round-3 (2026-04-17). All five sites are documentation / comment changes that bring the docstrings in line with current behavior, plus one trivial unused-import removal flagged in the same review. - audio.py load_audio (Comment id 3098052328): Removed the false `Raises: ImportError` claim — load_audio never raises ImportError because the implementation falls back to the stdlib `wave` module if both torchaudio and soundfile are unavailable. Added a Note: section listing all three backends in priority order (torchaudio primary; soundfile wav/flac/ogg; stdlib wave WAV-only 16/32-bit PCM) so callers know which formats survive each fallback. - audio.py _load_audio_backend (related to Comment 3098052328, found via audit): Replaced the cryptic "Priority: torchaudio > soundfile + torch" line with a numbered backend chain that documents: 1. torchaudio.load — primary 2. soundfile.read — (T, C) transposed to (C, T) 3. stdlib wave — WAV-only last-resort Added Raises: section for the WAV-fallback ValueError. - audio.py hash_audio (Comment id 3098052292): The docstring claimed "Longer audio is uniformly subsampled", but the implementation uses `step = n // max_samples`, which collapses to step == 1 (effective truncation to the first max_samples samples) when max_samples < n < 2 * max_samples. Added a Note: section that documents the truncation-vs-uniform-stride boundary at n = 2*max_samples and clarifies that determinism (cache-key intent), not collision resistance, is the design goal — keeping the existing implementation intact and accurately documented. - loader.py import torch (Comment id 3098052303): Removed unused `import torch` (line 21). Only `torch.utils.data` is used and that import on line 22 is satisfied independently. - loader.py is_orchestrator block (Comment id 3098052316): Rewrote the comment to be precise per case. The old comment claimed "is_orchestrator prevents multi-node races ... when the FS is shared", which is only true in "global" mode. Clarified: - non-distributed: the lone process - "global" mode: rank-0 globally — required when cache_dir is on a shared FS (eliminates cross-node race on shutil.rmtree / sentinel writes) - "local" mode: per-node local main — ASSUMES cache_dir is on node-local storage; pointing "local" mode at a shared FS WILL race across node-local mains and is unsupported The user's contract (already documented on _create_or_load_dataset's `preprocess_parallelism` arg) is now consistent with the comment at the actual orchestrator selection site. Comment id 3098052262 (save_audio "converted to int16") was already addressed by the Round-3 commit (49d5a54) when save_audio was reduced to a single torchaudio.save call — no separate fix needed here. Per .agents/knowledge/constraints.md #27 (Google-style docstrings) and philosophy.md #3 (fail-fast, no speculative code): the docstrings now faithfully reflect what the code does, including the corner cases that Round-2 narrowing exposed. Verification: ReadLints — clean black --check / diff — 358 lines, byte-identical to parent (49d5a54) isort --check / diff — 18 lines, byte-identical to parent (49d5a54) (all 358 + 18 are pre-existing on origin/main, outside the edited regions; not touched per "structural vs behavioral separation" — same rationale as Phase 2/3.) smoke test — load_audio docstring has no ImportError + names all 3 backends; _load_audio_backend uses Backend chain phrasing + names wave; hash_audio mentions truncation + 2*max_samples boundary; loader.py is_orchestrator comment includes 'unsupported' and 'node-local'; no `^import torch$` survives in loader.py. Made-with: Cursor * [data_utils,guidance] docs: complete audio_dir docstring coverage (audit) Round-5 follow-up to the design-principle audit on PR #129 — the audit found three audio_dir-related docstring drifts introduced by our own commits (27b5967, fa2e4fd) that Copilot did not flag: 1. dataset.py GeneralDataset.__init__: - We added audio_dir/target_arrow_path to the signature in fa2e4fd and only documented the latter. This commit adds Args entries for image_dir + video_dir + audio_dir + extra_hash_strs (the directory trio kept identical wording for symmetry; extra_hash_strs was pre-PR undocumented but is the natural workaround for the cache issue documented in the new Note). - Added a Note: section calling out that image_dir/video_dir/audio_dir are NOT in compute_cache_path's fingerprint (pre-existing and not fixed in this PR per "structural vs behavioral separation" — see audit report on the PR), so a directory swap with relative JSONL paths needs force_reprocess=True or extra_hash_strs to bust the cache. Users were silently inheriting this footgun without warning. 2. dataset.py GeneralDataset._preprocess_batch: - Added audio_dir to the Args section. - Renumbered the Workflow from 5+Returns to 7 explicit steps; the original 4-step body (text, images, videos, call) became 1/2/3/5, and we inserted "4. Load and prepare audio inputs" + an explicit metadata-pack step (7) that the prose previously left implicit. - Added a cross-reference to utils.audio.load_audio so readers don't have to grep for the loader. 3. guidance/workflow.md Stage 1 _preprocess_batch snippet: - Signature now matches the real one (audio_dir is the third dir parameter). - Comment list extended to 7 steps, mirroring the docstring fix above. Audit summary that prompted this commit (full report posted as PR comment): - 0 new Copilot comments since Round-4 push (a790e8a). - 12 historical Copilot comments verified addressed across Rounds 1-4. - Distributed barriers correct: prep -> map -> consolidate, three wait_for_everyone() boundaries, no double-fire on enable_preprocess=False. - consolidate_parts is idempotent on retry (state.json/dataset_info.json rewrites are last-write-wins; os.replace recovers from partial wipe). - audio.py fail-fast posture is clean: only ImportError around optional backend imports; everything else (decode/resample/save) propagates. - Pre-existing patterns surfaced by the audit but intentionally NOT touched per Phase-2/3 convention (structural vs behavioral separation): * set_format try/except Exception: pass at L267-270 + L446-449 of dataset.py (introduced in aa0715e, predates our PR). * Bare 'except:' in _compute_function_hash (introduced in 884a9db, predates our PR). * data_args.py import order is third-party-before-stdlib (predates our PR; isort baseline on origin/main is 18 lines, identical here). - Latent risk worth user-side awareness (already covered by the new Note: in __init__): image_dir/video_dir/audio_dir omitted from fingerprint -> stale cache on directory swap with relative paths. Affects all three sibling fields equally; documenting was the appropriate scope for this PR. Verification: - ReadLints: 0 errors on both edited files. - 13/13 smoke-test assertions pass (audio_dir in __init__ Args, image_dir/video_dir entries added, Note: re fingerprint present, workflow numbered 1-7, audio_dir in _preprocess_batch Args, load_audio cross-ref present, workflow.md signature matches, workflow.md steps 4-7 in expected positions). - black --check fingerprint on dataset.py: 563 lines before my edits, 563 lines after my edits (byte-identical) - no new formatting drift. Made-with: Cursor * [data_utils,utils,models] refactor: homogenize multi-media batch types in _preprocess_batch + add encode_audio abstract method Why --- `_preprocess_batch` in `dataset.py` was producing three different per-sample types for the `audios` column within a single batch: empty sample -> None 1 audio -> Tensor N audios -> List[Tensor] This breaks HF Arrow's homogeneous-column requirement for `Dataset.map(batched=True)` / cache-write, and forces every downstream consumer to handle three input shapes instead of one. Image/video had a parallel latent bug: their empty-branch only updated `xx_args` but not `batch[xx]`, leaving `batch[images]`/`batch[videos]` shorter than the input batch size — only "didn't surface" because real datasets are typically all-or-nothing per modality. Additionally, the BaseAdapter ABC documented `prompt`/`images`/`videos` but had no `audios` plumbing, so the audio inputs prepared by the dataset preprocess step had no path to the model's encode step. What ---- `dataset.py _preprocess_batch`: * Drop the unwrap-single line (`audios[0] if len(audios) == 1 else audios`); audio always stored as `List[torch.Tensor]` per sample. * Empty image/video/audio branches now append `[]` to BOTH `xx_args` AND `batch`, so every column has length == input batch size and a uniform `List[List[Media]]` type. (`MultiImageBatch` / `MultiVideoBatch` / `MultiAudioBatch` invariant.) * Docstring rewritten to specify the new contract per modality and a Note explaining why `[]`-for-empty matters for HF Arrow. `utils/audio.py`: * Add `MultiAudioBatch = Union[List[AudioBatch], torch.Tensor, np.ndarray]` type alias mirroring `MultiImageBatch` (image.py) and `MultiVideoBatch` (video.py); register in `__all__` and module-level type-hierarchy docstring. `models/abc.py`: * Add `audios: Optional[List[Union[torch.Tensor, List[torch.Tensor]]]]` parameter to `BaseAdapter.preprocess_func` (mirrors the existing `images`/`videos` Union style). * Iterate `(audios, self.encode_audio)` in the encoder dispatch loop so audio inputs flow through to the model when present. * Add new `@abstractmethod encode_audio()` modeled on `encode_video`, documenting the (single / batch-of-single / batch-of-list) input forms and the `condition_audios` output convention. 11 concrete adapters (z_image, wan2_v2v/t2v/i2v, sd3_5, qwen_image/_edit_plus, flux1/1_kontext/2/2_klein): * Add a one-line `def encode_audio(...): pass` stub mirroring each file's local `encode_video` style and explanatory docstring (e.g., "Not needed for X models." / "X does not support audio encoding."). The ABC call path already short-circuits via `if input is not None`, so unused stubs are never invoked. Verified -------- * 0 lint errors on all 14 touched files. * AST smoke test passes: every dataset.py edit site landed; audio.py exports `MultiAudioBatch`; abc.py defines `encode_audio` as abstract with the expected docstring; all 11 adapters define `encode_audio`. * black-diff hunks confirm new lines follow each file's existing style (single-quoted dict keys in dataset.py; aligned-comment Union members in audio.py; `audios : Optional[...]` in abc.py to match `prompt :`, `images :`, `videos :`). No NEW pre-existing-style violations introduced. * No current adapter overrides `encode_audio` (audited via Grep), so the new abstract is a forward-looking contract for the LTX2 adapter on the feat branch and any future audio-enabled model. Refs ---- PR #129 follow-up (Round 6). Made-with: Cursor * [models] refactor: make all 4 BaseAdapter encoders non-abstract with no-op defaults Why --- Round-6 promoted `encode_audio` to a 4th `@abstractmethod` on `BaseAdapter`, which forced all 11 existing concrete adapters to gain a one-line `def encode_audio(...): pass` stub just to remain instantiable. That penalises adapters that legitimately don't need the modality (e.g., SD3.5, Flux.1/2/Kontext, Z-Image, Qwen-Image*, Wan2-T2V/I2V/V2V) — they have to write boilerplate forwarding nothing — and it scales linearly with every new modality we add (audio today, who-knows-what tomorrow). The right invariant for `BaseAdapter` is the OPPOSITE: an adapter only implements the encoders for the modalities it actually consumes; missing modalities should be a silent no-op handled by the base class. `preprocess_func` already has the right plumbing — it iterates over `(text, encode_prompt), (images, encode_image), (videos, encode_video), (audios, encode_audio)` and merges only non-`None` results — so a default `pass` (which returns `None`) integrates cleanly with that loop. What ---- 1. `models/abc.py`: - Drop `@abstractmethod` from `encode_prompt`, `encode_image`, `encode_video`, `encode_audio` (all 4 encoders). - Replace each body with a single `pass` (Python returns `None`, which `preprocess_func`'s "skip if not isinstance(res, dict)" branch already handles correctly — see lines ~1928-1948). - Tighten signatures to use the homogeneous batch types produced by `dataset.py._preprocess_batch` after Round-6: encode_prompt(prompt: List[str], **kwargs) encode_image(images: MultiImageBatch, **kwargs) encode_video(videos: MultiVideoBatch, **kwargs) encode_audio(audios: MultiAudioBatch, **kwargs) — i.e., always a batch (`List[List[Media]]` ragged or uniform-shape Tensor/ndarray), never a single sample. This matches what the dataset actually hands to the encoders and lets every override stop re-implementing batch detection. - Return type narrowed to `Optional[Dict[str, Union[List[Any], torch.Tensor]]]` to make the no-op default's `None` return explicit in the contract. - Rewrite docstrings in Google style (consistent with existing base methods, satisfies repo design-principle re docstrings): state the no-op default, the dispatch contract, the input `Multi*Batch` shape, and the canonical output keys (`condition_images` / `condition_videos` / `condition_audios`). 2. `models/{z_image,wan,stable_diffusion,qwen_image,flux}/*.py` (11 adapters): revert the Round-6 `def encode_audio(...): pass` stub. Each adapter now inherits the no-op default from `BaseAdapter`. Net diff vs `origin/main` for these 11 files is now exactly **0 lines** — they match the PR base again, so the PR's footprint shrinks from 12 files to 1 (`models/abc.py` only, on the `models/` side). 3. New imports in `models/abc.py`: `from ..utils.image import MultiImageBatch` `from ..utils.video import MultiVideoBatch` `from ..utils.audio import MultiAudioBatch` (the audio one was already added in Round-6 via `utils/audio.py`). Compat / risk ------------- - LTX-2 adapters (`ltx2_t2av.py`, `ltx2_i2v.py`) live on `feat/ltx2-audio-video-support` and already implement `encode_audio` properly. The new ABC accepts a more general `MultiAudioBatch` (a superset of what they currently expect) so their existing `Union[torch.Tensor, List[torch.Tensor]]` typed override is still valid Python. **No LTX-2 changes needed.** - Adapters that DO need text/image/video encoding still override the corresponding encoder (their override silently shadows the base no-op), so behaviour is unchanged for everyone. - `preprocess_func`'s strict-validation branch (lines ~1937-1948, "must return a non-empty dict ...") only fires when the encoder returns something OTHER than `None` and that something is not a proper dict — `None` from the default still skips integration cleanly (see `if res is not None:` guard above the validation). Verification ------------ - `git diff origin/main -- src/flow_factory/models/<adapter>.py` → 0 lines for all 11 adapters. - AST smoke test: - `BaseAdapter` defines exactly 4 encoders; none decorated with `@abstractmethod`. - Each encoder body is `pass` (with optional docstring). - Each encoder's first non-self arg has the expected `Multi*Batch` annotation; return type is `Optional[...]`. - No adapter (of the 11) defines `encode_audio` anymore. - `ReadLints` clean on all 12 touched files. - `black --check --diff` line counts on all 12 files DECREASED vs HEAD (Δ ranges -37 to -219), so this commit introduces zero new formatting violations. Made-with: Cursor * [docs] update: align BaseAdapter contract docs with R6/R7 (4 abstract + 4 no-op encoders, MultiAudioBatch) constraints.md #12, ff-new-model/SKILL.md, guidance/new_model.md were written for the pre-R6/R7 contract. Bring them in sync with what src/flow_factory/models/abc.py actually declares after this PR: - 4 abstract methods: load_pipeline, decode_latents, forward, inference (was 7). - 4 per-modality encoders (encode_prompt/image/video/audio) are non-abstract no-op defaults. Subclasses override only the modalities they consume; preprocess_func dispatches to all four and skips when the encoder returns None. - Multi-modal batch convention extended to include audios: every modality column from _preprocess_batch is List[List[Media]] (empty samples contribute [], single-item samples contribute [item]). - guidance/new_model.md gains an encode_audio block, audios in the inference signature comment, and an encode_audio checklist item. Pure docs change; no Python touched. Made-with: Cursor --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resync .agents/, .cursor/, guidance/, AGENTS.md and CLAUDE.md with the current code after plugin growth (9 trainers, 14 model adapters, 13 reward models). Fixes registry drift, wrong config/API facts and broken cross-references found in a full audit. - architecture.md/AGENTS.md: add diffusion-opd trainer, clap/imagebind/ geneval rewards, Bagel/LTX2 models; fix RationalRewards* class names - constraints.md: evaluate() is concrete (not abstract); index #28-29; paradigm (#7) and training-args (#16) lists; de-numbered line refs - philosophy.md: Accelerate (DDP/DeepSpeed ZeRO-1-2/FSDP) backend; fix #27 ref - guidance: scheduler.* config keys, real sample()/compute_advantages snippets, GenEval metadata convention, audio reward param, Bagel link - skills: model_name_or_path, default_target_modules, data.datasets, rewards-as-list, 9 trainers; CLAUDE.md imports AGENTS.md to avoid drift - topics/samplers.md: correct _resolve_sampler_type + AdvantageProcessor group_distributed paths; parity_testing set_scheduler_timesteps - hparams/model_args.py: model_type Literal now matches registry keys Co-authored-by: Cursor <cursoragent@cursor.com>

… merge (X-GenGroup#129) * [utils] feat: add audio utility module with type aliases, standardization, and loading Add `utils/audio.py` following the exact pattern of `utils/image.py` and `utils/video.py`. Provides a complete audio waveform toolkit: - Type aliases: `AudioSingle` (C,T), `AudioBatch` (B,C,T) - Validation: `is_audio()`, `is_audio_batch()` - Loading/saving: `load_audio()`, `save_audio()` with 3-tier backend fallback (torchaudio → soundfile → stdlib wave) - Conversion: `audio_to_tensor()`, `audio_to_numpy()`, `convert_audio()` for resampling and mono/stereo conversion - Standardization: `standardize_audio_batch()` with output_type='pt'|'np' - Hashing: `hash_audio()`, `hash_audio_list()` with int16 quantization Design conventions follow diffusers/audiocraft: - Channel-first (C, T) tensor layout - [-1.0, 1.0] float32 value range - Channel conversion: downmix=mean, upmix=repeat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Made-with: Cursor * [data] feat: support audio loading and preprocessing in dataset pipeline - Add `audio_dir` field to DataArguments (defaults to 'audios' subfolder) - Add audio loading block in GeneralDataset._preprocess_batch() that mirrors the existing video loading pattern: detect 'audio' column in JSONL, load via load_audio(), pass to preprocess_func(audios=...) - Add 'audios' to PREPROCESS_KEYS for metadata exclusion - Pass audio_dir through fn_kwargs to .map() call The audio_dir parameter flows automatically through loader.py via filter_kwargs — no changes needed in loader.py. Fully backward compatible: datasets without audio columns are unaffected. JSONL format: {"prompt": "...", "audio": "file.wav"} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [data_utils] perf: eliminate redundant preprocessed-dataset writes Each rank's preprocessed Arrow shard is now written exactly once: the orchestrator routes Dataset.map output directly to the final per-rank location via cache_file_name= (under {merged_cache_path}.tmp/_parts/), and the consolidator writes only state.json + dataset_info.json before atomically renaming .tmp -> merged_cache_path. No row data is re-copied during the merge, no duplicate cache lands under ~/.cache/huggingface, and the build-dir sentinel _build_meta.json enables crash recovery for unchanged num_shards while wiping cleanly on num_shards changes. Single-process and distributed paths are unified through the same flow (N=1 case for single-process); enable_preprocess=False bypasses the consolidate pipeline entirely to preserve pre-refactor behavior. I/O budget per cache build drops from ~4*N*S to ~N*S bytes touched. Made-with: Cursor * [data_utils] fix: address PR X-GenGroup#129 review - loader.py: gate build-dir prep on a shared is_orchestrator predicate (rank-0 globally for "global" mode, local main for "local" mode) and reuse it for the consolidator. Avoids concurrent shutil.rmtree / _build_meta.json writes when multiple nodes share the FS. - loader.py: align shard log with dataset.py convention {idx}/{N-1}. Addresses Copilot review comments X-GenGroup#2 and X-GenGroup#3 on PR X-GenGroup#129. Made-with: Cursor * [utils] feat: add ndim validation to standardize_audio_batch Cherry-picked from a902429 (audio.py portion only). Adds explicit ValueError when input ndim is not in {1, 2, 3}, replacing silent fall-through. Aligns with the no-defensive-except.mdc fail-fast rule. The data_utils path only calls load_audio, never standardize_audio_batch, so this is a pure independent safety improvement to the audio utility module. Made-with: Cursor * [utils] fix: address PR X-GenGroup#129 Comments 4-7 — fail-fast in audio.py + meta_matches recovery - audio.py save_audio: narrow torchaudio block to ImportError only, using try/import + else/save so genuine save failures (permission, encoding, disk full) propagate instead of silently falling back to soundfile or wave (Copilot Comment X-GenGroup#4). - audio.py _load_audio_backend: same fix for torchaudio.load and soundfile.read blocks; the always-available wave fallback at the end stays as-is (Comment X-GenGroup#5). - audio.py _resample: same fix for torchaudio.functional.resample; the scipy block already uses except ImportError correctly (Comment X-GenGroup#6). - loader.py _meta_matches: catch (json.JSONDecodeError, OSError) so a corrupted sentinel from a previously crashed mid-write is treated as "stale" and triggers the wipe-and-recreate path, matching the existing "missing -> return False" semantics. Behavior is documented inline per no-defensive-except.mdc (Comment X-GenGroup#7). All five sites now obey .cursor/rules/no-defensive-except.mdc: narrow, documented try/except limited to optional-import detection or recovery from a known on-disk failure mode. Made-with: Cursor * [utils] refactor: drop fallback chains in save_audio and _resample (fail-fast) torchaudio>=2.4.0 is a hard core dependency in pyproject.toml, so the optional-backend fallback chains in save_audio (soundfile + stdlib wave) and _resample (scipy + linear interpolation) were never genuinely optional - they only ever ran on broken installs, and silently substituted a different backend's behavior for what the caller asked for. Per .cursor/rules/no-defensive-except.mdc, the fail-fast posture is to surface the failure directly: - save_audio: now a single torchaudio.save call. If torchaudio is missing (broken install) the ImportError surfaces; if torchaudio.save raises (permission, encoding, disk full, unsupported extension) the exception surfaces with the original cause and full traceback. - _resample: now a single torchaudio.functional.resample call. dtype, device, and rate-mismatch errors surface directly instead of being silently masked by scipy or linear interp. Round-2 narrowed the broad except (ImportError, Exception) to ImportError-only (commit f8fb452); Round-3 goes further and removes the fallbacks themselves, since the underlying premise - "torchaudio might be missing" - is false for this codebase. _load_audio_backend's soundfile fallback is intentionally retained: soundfile is NOT a hard dep, and is needed for users who load formats torchaudio doesn't support out-of-the-box. That stays as a documented optional-backend pattern. Made-with: Cursor * [utils,data_utils] docs: address PR X-GenGroup#129 Round-3 review (docstrings + unused import) Each fix below maps to a specific Copilot review comment from Round-3 (2026-04-17). All five sites are documentation / comment changes that bring the docstrings in line with current behavior, plus one trivial unused-import removal flagged in the same review. - audio.py load_audio (Comment id 3098052328): Removed the false `Raises: ImportError` claim — load_audio never raises ImportError because the implementation falls back to the stdlib `wave` module if both torchaudio and soundfile are unavailable. Added a Note: section listing all three backends in priority order (torchaudio primary; soundfile wav/flac/ogg; stdlib wave WAV-only 16/32-bit PCM) so callers know which formats survive each fallback. - audio.py _load_audio_backend (related to Comment 3098052328, found via audit): Replaced the cryptic "Priority: torchaudio > soundfile + torch" line with a numbered backend chain that documents: 1. torchaudio.load — primary 2. soundfile.read — (T, C) transposed to (C, T) 3. stdlib wave — WAV-only last-resort Added Raises: section for the WAV-fallback ValueError. - audio.py hash_audio (Comment id 3098052292): The docstring claimed "Longer audio is uniformly subsampled", but the implementation uses `step = n // max_samples`, which collapses to step == 1 (effective truncation to the first max_samples samples) when max_samples < n < 2 * max_samples. Added a Note: section that documents the truncation-vs-uniform-stride boundary at n = 2*max_samples and clarifies that determinism (cache-key intent), not collision resistance, is the design goal — keeping the existing implementation intact and accurately documented. - loader.py import torch (Comment id 3098052303): Removed unused `import torch` (line 21). Only `torch.utils.data` is used and that import on line 22 is satisfied independently. - loader.py is_orchestrator block (Comment id 3098052316): Rewrote the comment to be precise per case. The old comment claimed "is_orchestrator prevents multi-node races ... when the FS is shared", which is only true in "global" mode. Clarified: - non-distributed: the lone process - "global" mode: rank-0 globally — required when cache_dir is on a shared FS (eliminates cross-node race on shutil.rmtree / sentinel writes) - "local" mode: per-node local main — ASSUMES cache_dir is on node-local storage; pointing "local" mode at a shared FS WILL race across node-local mains and is unsupported The user's contract (already documented on _create_or_load_dataset's `preprocess_parallelism` arg) is now consistent with the comment at the actual orchestrator selection site. Comment id 3098052262 (save_audio "converted to int16") was already addressed by the Round-3 commit (49d5a54) when save_audio was reduced to a single torchaudio.save call — no separate fix needed here. Per .agents/knowledge/constraints.md X-GenGroup#27 (Google-style docstrings) and philosophy.md X-GenGroup#3 (fail-fast, no speculative code): the docstrings now faithfully reflect what the code does, including the corner cases that Round-2 narrowing exposed. Verification: ReadLints — clean black --check / diff — 358 lines, byte-identical to parent (49d5a54) isort --check / diff — 18 lines, byte-identical to parent (49d5a54) (all 358 + 18 are pre-existing on origin/main, outside the edited regions; not touched per "structural vs behavioral separation" — same rationale as Phase 2/3.) smoke test — load_audio docstring has no ImportError + names all 3 backends; _load_audio_backend uses Backend chain phrasing + names wave; hash_audio mentions truncation + 2*max_samples boundary; loader.py is_orchestrator comment includes 'unsupported' and 'node-local'; no `^import torch$` survives in loader.py. Made-with: Cursor * [data_utils,guidance] docs: complete audio_dir docstring coverage (audit) Round-5 follow-up to the design-principle audit on PR X-GenGroup#129 — the audit found three audio_dir-related docstring drifts introduced by our own commits (27b5967, fa2e4fd) that Copilot did not flag: 1. dataset.py GeneralDataset.__init__: - We added audio_dir/target_arrow_path to the signature in fa2e4fd and only documented the latter. This commit adds Args entries for image_dir + video_dir + audio_dir + extra_hash_strs (the directory trio kept identical wording for symmetry; extra_hash_strs was pre-PR undocumented but is the natural workaround for the cache issue documented in the new Note). - Added a Note: section calling out that image_dir/video_dir/audio_dir are NOT in compute_cache_path's fingerprint (pre-existing and not fixed in this PR per "structural vs behavioral separation" — see audit report on the PR), so a directory swap with relative JSONL paths needs force_reprocess=True or extra_hash_strs to bust the cache. Users were silently inheriting this footgun without warning. 2. dataset.py GeneralDataset._preprocess_batch: - Added audio_dir to the Args section. - Renumbered the Workflow from 5+Returns to 7 explicit steps; the original 4-step body (text, images, videos, call) became 1/2/3/5, and we inserted "4. Load and prepare audio inputs" + an explicit metadata-pack step (7) that the prose previously left implicit. - Added a cross-reference to utils.audio.load_audio so readers don't have to grep for the loader. 3. guidance/workflow.md Stage 1 _preprocess_batch snippet: - Signature now matches the real one (audio_dir is the third dir parameter). - Comment list extended to 7 steps, mirroring the docstring fix above. Audit summary that prompted this commit (full report posted as PR comment): - 0 new Copilot comments since Round-4 push (a790e8a). - 12 historical Copilot comments verified addressed across Rounds 1-4. - Distributed barriers correct: prep -> map -> consolidate, three wait_for_everyone() boundaries, no double-fire on enable_preprocess=False. - consolidate_parts is idempotent on retry (state.json/dataset_info.json rewrites are last-write-wins; os.replace recovers from partial wipe). - audio.py fail-fast posture is clean: only ImportError around optional backend imports; everything else (decode/resample/save) propagates. - Pre-existing patterns surfaced by the audit but intentionally NOT touched per Phase-2/3 convention (structural vs behavioral separation): * set_format try/except Exception: pass at L267-270 + L446-449 of dataset.py (introduced in aa0715e, predates our PR). * Bare 'except:' in _compute_function_hash (introduced in 884a9db, predates our PR). * data_args.py import order is third-party-before-stdlib (predates our PR; isort baseline on origin/main is 18 lines, identical here). - Latent risk worth user-side awareness (already covered by the new Note: in __init__): image_dir/video_dir/audio_dir omitted from fingerprint -> stale cache on directory swap with relative paths. Affects all three sibling fields equally; documenting was the appropriate scope for this PR. Verification: - ReadLints: 0 errors on both edited files. - 13/13 smoke-test assertions pass (audio_dir in __init__ Args, image_dir/video_dir entries added, Note: re fingerprint present, workflow numbered 1-7, audio_dir in _preprocess_batch Args, load_audio cross-ref present, workflow.md signature matches, workflow.md steps 4-7 in expected positions). - black --check fingerprint on dataset.py: 563 lines before my edits, 563 lines after my edits (byte-identical) - no new formatting drift. Made-with: Cursor * [data_utils,utils,models] refactor: homogenize multi-media batch types in _preprocess_batch + add encode_audio abstract method Why --- `_preprocess_batch` in `dataset.py` was producing three different per-sample types for the `audios` column within a single batch: empty sample -> None 1 audio -> Tensor N audios -> List[Tensor] This breaks HF Arrow's homogeneous-column requirement for `Dataset.map(batched=True)` / cache-write, and forces every downstream consumer to handle three input shapes instead of one. Image/video had a parallel latent bug: their empty-branch only updated `xx_args` but not `batch[xx]`, leaving `batch[images]`/`batch[videos]` shorter than the input batch size — only "didn't surface" because real datasets are typically all-or-nothing per modality. Additionally, the BaseAdapter ABC documented `prompt`/`images`/`videos` but had no `audios` plumbing, so the audio inputs prepared by the dataset preprocess step had no path to the model's encode step. What ---- `dataset.py _preprocess_batch`: * Drop the unwrap-single line (`audios[0] if len(audios) == 1 else audios`); audio always stored as `List[torch.Tensor]` per sample. * Empty image/video/audio branches now append `[]` to BOTH `xx_args` AND `batch`, so every column has length == input batch size and a uniform `List[List[Media]]` type. (`MultiImageBatch` / `MultiVideoBatch` / `MultiAudioBatch` invariant.) * Docstring rewritten to specify the new contract per modality and a Note explaining why `[]`-for-empty matters for HF Arrow. `utils/audio.py`: * Add `MultiAudioBatch = Union[List[AudioBatch], torch.Tensor, np.ndarray]` type alias mirroring `MultiImageBatch` (image.py) and `MultiVideoBatch` (video.py); register in `__all__` and module-level type-hierarchy docstring. `models/abc.py`: * Add `audios: Optional[List[Union[torch.Tensor, List[torch.Tensor]]]]` parameter to `BaseAdapter.preprocess_func` (mirrors the existing `images`/`videos` Union style). * Iterate `(audios, self.encode_audio)` in the encoder dispatch loop so audio inputs flow through to the model when present. * Add new `@abstractmethod encode_audio()` modeled on `encode_video`, documenting the (single / batch-of-single / batch-of-list) input forms and the `condition_audios` output convention. 11 concrete adapters (z_image, wan2_v2v/t2v/i2v, sd3_5, qwen_image/_edit_plus, flux1/1_kontext/2/2_klein): * Add a one-line `def encode_audio(...): pass` stub mirroring each file's local `encode_video` style and explanatory docstring (e.g., "Not needed for X models." / "X does not support audio encoding."). The ABC call path already short-circuits via `if input is not None`, so unused stubs are never invoked. Verified -------- * 0 lint errors on all 14 touched files. * AST smoke test passes: every dataset.py edit site landed; audio.py exports `MultiAudioBatch`; abc.py defines `encode_audio` as abstract with the expected docstring; all 11 adapters define `encode_audio`. * black-diff hunks confirm new lines follow each file's existing style (single-quoted dict keys in dataset.py; aligned-comment Union members in audio.py; `audios : Optional[...]` in abc.py to match `prompt :`, `images :`, `videos :`). No NEW pre-existing-style violations introduced. * No current adapter overrides `encode_audio` (audited via Grep), so the new abstract is a forward-looking contract for the LTX2 adapter on the feat branch and any future audio-enabled model. Refs ---- PR X-GenGroup#129 follow-up (Round 6). Made-with: Cursor * [models] refactor: make all 4 BaseAdapter encoders non-abstract with no-op defaults Why --- Round-6 promoted `encode_audio` to a 4th `@abstractmethod` on `BaseAdapter`, which forced all 11 existing concrete adapters to gain a one-line `def encode_audio(...): pass` stub just to remain instantiable. That penalises adapters that legitimately don't need the modality (e.g., SD3.5, Flux.1/2/Kontext, Z-Image, Qwen-Image*, Wan2-T2V/I2V/V2V) — they have to write boilerplate forwarding nothing — and it scales linearly with every new modality we add (audio today, who-knows-what tomorrow). The right invariant for `BaseAdapter` is the OPPOSITE: an adapter only implements the encoders for the modalities it actually consumes; missing modalities should be a silent no-op handled by the base class. `preprocess_func` already has the right plumbing — it iterates over `(text, encode_prompt), (images, encode_image), (videos, encode_video), (audios, encode_audio)` and merges only non-`None` results — so a default `pass` (which returns `None`) integrates cleanly with that loop. What ---- 1. `models/abc.py`: - Drop `@abstractmethod` from `encode_prompt`, `encode_image`, `encode_video`, `encode_audio` (all 4 encoders). - Replace each body with a single `pass` (Python returns `None`, which `preprocess_func`'s "skip if not isinstance(res, dict)" branch already handles correctly — see lines ~1928-1948). - Tighten signatures to use the homogeneous batch types produced by `dataset.py._preprocess_batch` after Round-6: encode_prompt(prompt: List[str], **kwargs) encode_image(images: MultiImageBatch, **kwargs) encode_video(videos: MultiVideoBatch, **kwargs) encode_audio(audios: MultiAudioBatch, **kwargs) — i.e., always a batch (`List[List[Media]]` ragged or uniform-shape Tensor/ndarray), never a single sample. This matches what the dataset actually hands to the encoders and lets every override stop re-implementing batch detection. - Return type narrowed to `Optional[Dict[str, Union[List[Any], torch.Tensor]]]` to make the no-op default's `None` return explicit in the contract. - Rewrite docstrings in Google style (consistent with existing base methods, satisfies repo design-principle re docstrings): state the no-op default, the dispatch contract, the input `Multi*Batch` shape, and the canonical output keys (`condition_images` / `condition_videos` / `condition_audios`). 2. `models/{z_image,wan,stable_diffusion,qwen_image,flux}/*.py` (11 adapters): revert the Round-6 `def encode_audio(...): pass` stub. Each adapter now inherits the no-op default from `BaseAdapter`. Net diff vs `origin/main` for these 11 files is now exactly **0 lines** — they match the PR base again, so the PR's footprint shrinks from 12 files to 1 (`models/abc.py` only, on the `models/` side). 3. New imports in `models/abc.py`: `from ..utils.image import MultiImageBatch` `from ..utils.video import MultiVideoBatch` `from ..utils.audio import MultiAudioBatch` (the audio one was already added in Round-6 via `utils/audio.py`). Compat / risk ------------- - LTX-2 adapters (`ltx2_t2av.py`, `ltx2_i2v.py`) live on `feat/ltx2-audio-video-support` and already implement `encode_audio` properly. The new ABC accepts a more general `MultiAudioBatch` (a superset of what they currently expect) so their existing `Union[torch.Tensor, List[torch.Tensor]]` typed override is still valid Python. **No LTX-2 changes needed.** - Adapters that DO need text/image/video encoding still override the corresponding encoder (their override silently shadows the base no-op), so behaviour is unchanged for everyone. - `preprocess_func`'s strict-validation branch (lines ~1937-1948, "must return a non-empty dict ...") only fires when the encoder returns something OTHER than `None` and that something is not a proper dict — `None` from the default still skips integration cleanly (see `if res is not None:` guard above the validation). Verification ------------ - `git diff origin/main -- src/flow_factory/models/<adapter>.py` → 0 lines for all 11 adapters. - AST smoke test: - `BaseAdapter` defines exactly 4 encoders; none decorated with `@abstractmethod`. - Each encoder body is `pass` (with optional docstring). - Each encoder's first non-self arg has the expected `Multi*Batch` annotation; return type is `Optional[...]`. - No adapter (of the 11) defines `encode_audio` anymore. - `ReadLints` clean on all 12 touched files. - `black --check --diff` line counts on all 12 files DECREASED vs HEAD (Δ ranges -37 to -219), so this commit introduces zero new formatting violations. Made-with: Cursor * [docs] update: align BaseAdapter contract docs with R6/R7 (4 abstract + 4 no-op encoders, MultiAudioBatch) constraints.md X-GenGroup#12, ff-new-model/SKILL.md, guidance/new_model.md were written for the pre-R6/R7 contract. Bring them in sync with what src/flow_factory/models/abc.py actually declares after this PR: - 4 abstract methods: load_pipeline, decode_latents, forward, inference (was 7). - 4 per-modality encoders (encode_prompt/image/video/audio) are non-abstract no-op defaults. Subclasses override only the modalities they consume; preprocess_func dispatches to all four and skips when the encoder returns None. - Multi-modal batch convention extended to include audios: every modality column from _preprocess_batch is List[List[Media]] (empty samples contribute [], single-item samples contribute [item]). - guidance/new_model.md gains an encode_audio block, audios in the inference signature comment, and an encode_audio checklist item. Pure docs change; no Python touched. Made-with: Cursor --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix component management

8ba9ac4

Jayce-Ping merged commit 7e5d5b2 into main Feb 8, 2026

Jayce-Ping deleted the fix_comp_manage branch February 8, 2026 14:46

Jayce-Ping mentioned this pull request Apr 4, 2026

[trainer,advantage,utils] feat: add Diffusion-DPO algorithm and fix distributed metric aggregation #115

Merged

5 tasks

Jayce-Ping mentioned this pull request Apr 17, 2026

[data_utils] feat+perf: audio loading support and metadata-only shard merge #129

Merged

5 tasks

Jayce-Ping added a commit to Jayce-Ping/Flow-Factory-Private that referenced this pull request Jul 2, 2026

Fix component management (X-GenGroup#27)

a4f44e9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix component management#27

Fix component management#27
Jayce-Ping merged 1 commit into
mainfrom
fix_comp_manage

Jayce-Ping commented Feb 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Jayce-Ping commented Feb 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant