Skip to content

Fix component management#27

Merged
Jayce-Ping merged 1 commit into
mainfrom
fix_comp_manage
Feb 8, 2026
Merged

Fix component management#27
Jayce-Ping merged 1 commit into
mainfrom
fix_comp_manage

Conversation

@Jayce-Ping

Copy link
Copy Markdown
Collaborator

No description provided.

@Jayce-Ping Jayce-Ping merged commit 7e5d5b2 into main Feb 8, 2026
@Jayce-Ping Jayce-Ping deleted the fix_comp_manage branch February 8, 2026 14:46
Jayce-Ping added a commit that referenced this pull request Apr 17, 2026
… unused import)

Each fix below maps to a specific Copilot review comment from Round-3
(2026-04-17). All five sites are documentation / comment changes that
bring the docstrings in line with current behavior, plus one trivial
unused-import removal flagged in the same review.

- audio.py load_audio (Comment id 3098052328):
    Removed the false `Raises: ImportError` claim — load_audio never
    raises ImportError because the implementation falls back to the stdlib
    `wave` module if both torchaudio and soundfile are unavailable. Added
    a Note: section listing all three backends in priority order
    (torchaudio primary; soundfile wav/flac/ogg; stdlib wave WAV-only
    16/32-bit PCM) so callers know which formats survive each fallback.

- audio.py _load_audio_backend (related to Comment 3098052328, found via
  audit):
    Replaced the cryptic "Priority: torchaudio > soundfile + torch" line
    with a numbered backend chain that documents:
      1. torchaudio.load — primary
      2. soundfile.read  — (T, C) transposed to (C, T)
      3. stdlib wave     — WAV-only last-resort
    Added Raises: section for the WAV-fallback ValueError.

- audio.py hash_audio (Comment id 3098052292):
    The docstring claimed "Longer audio is uniformly subsampled", but the
    implementation uses `step = n // max_samples`, which collapses to
    step == 1 (effective truncation to the first max_samples samples)
    when max_samples < n < 2 * max_samples. Added a Note: section that
    documents the truncation-vs-uniform-stride boundary at n = 2*max_samples
    and clarifies that determinism (cache-key intent), not collision
    resistance, is the design goal — keeping the existing implementation
    intact and accurately documented.

- loader.py import torch (Comment id 3098052303):
    Removed unused `import torch` (line 21). Only `torch.utils.data` is
    used and that import on line 22 is satisfied independently.

- loader.py is_orchestrator block (Comment id 3098052316):
    Rewrote the comment to be precise per case. The old comment claimed
    "is_orchestrator prevents multi-node races ... when the FS is shared",
    which is only true in "global" mode. Clarified:
      - non-distributed: the lone process
      - "global" mode:   rank-0 globally — required when cache_dir is on a
                         shared FS (eliminates cross-node race on
                         shutil.rmtree / sentinel writes)
      - "local"  mode:   per-node local main — ASSUMES cache_dir is on
                         node-local storage; pointing "local" mode at a
                         shared FS WILL race across node-local mains and
                         is unsupported
    The user's contract (already documented on _create_or_load_dataset's
    `preprocess_parallelism` arg) is now consistent with the comment at
    the actual orchestrator selection site.

Comment id 3098052262 (save_audio "converted to int16") was already
addressed by the Round-3 commit (49d5a54) when save_audio was reduced to
a single torchaudio.save call — no separate fix needed here.

Per .agents/knowledge/constraints.md #27 (Google-style docstrings) and
philosophy.md #3 (fail-fast, no speculative code): the docstrings now
faithfully reflect what the code does, including the corner cases that
Round-2 narrowing exposed.

Verification: ReadLints              — clean
  black --check / diff   — 358 lines, byte-identical to parent (49d5a54)
  isort  --check / diff  — 18 lines, byte-identical to parent (49d5a54)
                           (all 358 + 18 are pre-existing on origin/main,
                           outside the edited regions; not touched per
                           "structural vs behavioral separation" — same
                           rationale as Phase 2/3.)
  smoke test             — load_audio docstring has no ImportError + names
                           all 3 backends; _load_audio_backend uses Backend
                           chain phrasing + names wave; hash_audio mentions
                           truncation + 2*max_samples boundary; loader.py
                           is_orchestrator comment includes 'unsupported'
                           and 'node-local'; no `^import torch$` survives
                           in loader.py.
Made-with: Cursor
Jayce-Ping added a commit that referenced this pull request Apr 17, 2026
… merge (#129)

* [utils] feat: add audio utility module with type aliases, standardization, and loading

Add `utils/audio.py` following the exact pattern of `utils/image.py` and
`utils/video.py`. Provides a complete audio waveform toolkit:

- Type aliases: `AudioSingle` (C,T), `AudioBatch` (B,C,T)
- Validation: `is_audio()`, `is_audio_batch()`
- Loading/saving: `load_audio()`, `save_audio()` with 3-tier backend
  fallback (torchaudio → soundfile → stdlib wave)
- Conversion: `audio_to_tensor()`, `audio_to_numpy()`, `convert_audio()`
  for resampling and mono/stereo conversion
- Standardization: `standardize_audio_batch()` with output_type='pt'|'np'
- Hashing: `hash_audio()`, `hash_audio_list()` with int16 quantization

Design conventions follow diffusers/audiocraft:
- Channel-first (C, T) tensor layout
- [-1.0, 1.0] float32 value range
- Channel conversion: downmix=mean, upmix=repeat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Made-with: Cursor

* [data] feat: support audio loading and preprocessing in dataset pipeline

- Add `audio_dir` field to DataArguments (defaults to 'audios' subfolder)
- Add audio loading block in GeneralDataset._preprocess_batch() that
  mirrors the existing video loading pattern: detect 'audio' column in
  JSONL, load via load_audio(), pass to preprocess_func(audios=...)
- Add 'audios' to PREPROCESS_KEYS for metadata exclusion
- Pass audio_dir through fn_kwargs to .map() call

The audio_dir parameter flows automatically through loader.py via
filter_kwargs — no changes needed in loader.py. Fully backward
compatible: datasets without audio columns are unaffected.

JSONL format: {"prompt": "...", "audio": "file.wav"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [data_utils] perf: eliminate redundant preprocessed-dataset writes

Each rank's preprocessed Arrow shard is now written exactly once: the
orchestrator routes Dataset.map output directly to the final per-rank
location via cache_file_name= (under {merged_cache_path}.tmp/_parts/),
and the consolidator writes only state.json + dataset_info.json before
atomically renaming .tmp -> merged_cache_path. No row data is re-copied
during the merge, no duplicate cache lands under ~/.cache/huggingface,
and the build-dir sentinel _build_meta.json enables crash recovery for
unchanged num_shards while wiping cleanly on num_shards changes.

Single-process and distributed paths are unified through the same flow
(N=1 case for single-process); enable_preprocess=False bypasses the
consolidate pipeline entirely to preserve pre-refactor behavior. I/O
budget per cache build drops from ~4*N*S to ~N*S bytes touched.

Made-with: Cursor

* [data_utils] fix: address PR #129 review

- loader.py: gate build-dir prep on a shared is_orchestrator predicate
  (rank-0 globally for "global" mode, local main for "local" mode) and
  reuse it for the consolidator. Avoids concurrent shutil.rmtree /
  _build_meta.json writes when multiple nodes share the FS.
- loader.py: align shard log with dataset.py convention {idx}/{N-1}.

Addresses Copilot review comments #2 and #3 on PR #129.

Made-with: Cursor

* [utils] feat: add ndim validation to standardize_audio_batch

Cherry-picked from a902429 (audio.py portion only). Adds explicit
ValueError when input ndim is not in {1, 2, 3}, replacing silent
fall-through. Aligns with the no-defensive-except.mdc fail-fast rule.

The data_utils path only calls load_audio, never standardize_audio_batch,
so this is a pure independent safety improvement to the audio utility module.

Made-with: Cursor

* [utils] fix: address PR #129 Comments 4-7 — fail-fast in audio.py + meta_matches recovery

- audio.py save_audio: narrow torchaudio block to ImportError only,
  using try/import + else/save so genuine save failures (permission,
  encoding, disk full) propagate instead of silently falling back to
  soundfile or wave (Copilot Comment #4).
- audio.py _load_audio_backend: same fix for torchaudio.load and
  soundfile.read blocks; the always-available wave fallback at the end
  stays as-is (Comment #5).
- audio.py _resample: same fix for torchaudio.functional.resample;
  the scipy block already uses except ImportError correctly (Comment #6).
- loader.py _meta_matches: catch (json.JSONDecodeError, OSError) so a
  corrupted sentinel from a previously crashed mid-write is treated as
  "stale" and triggers the wipe-and-recreate path, matching the existing
  "missing -> return False" semantics. Behavior is documented inline
  per no-defensive-except.mdc (Comment #7).

All five sites now obey .cursor/rules/no-defensive-except.mdc: narrow,
documented try/except limited to optional-import detection or recovery
from a known on-disk failure mode.

Made-with: Cursor

* [utils] refactor: drop fallback chains in save_audio and _resample (fail-fast)

torchaudio>=2.4.0 is a hard core dependency in pyproject.toml, so the
optional-backend fallback chains in save_audio (soundfile + stdlib wave)
and _resample (scipy + linear interpolation) were never genuinely optional
- they only ever ran on broken installs, and silently substituted a different
backend's behavior for what the caller asked for.

Per .cursor/rules/no-defensive-except.mdc, the fail-fast posture is to
surface the failure directly:

- save_audio: now a single torchaudio.save call. If torchaudio is missing
  (broken install) the ImportError surfaces; if torchaudio.save raises
  (permission, encoding, disk full, unsupported extension) the exception
  surfaces with the original cause and full traceback.
- _resample: now a single torchaudio.functional.resample call. dtype,
  device, and rate-mismatch errors surface directly instead of being
  silently masked by scipy or linear interp.

Round-2 narrowed the broad except (ImportError, Exception) to ImportError-only
(commit f8fb452); Round-3 goes further and removes the fallbacks themselves,
since the underlying premise - "torchaudio might be missing" - is false for
this codebase.

_load_audio_backend's soundfile fallback is intentionally retained: soundfile
is NOT a hard dep, and is needed for users who load formats torchaudio
doesn't support out-of-the-box. That stays as a documented optional-backend
pattern.

Made-with: Cursor

* [utils,data_utils] docs: address PR #129 Round-3 review (docstrings + unused import)

Each fix below maps to a specific Copilot review comment from Round-3
(2026-04-17). All five sites are documentation / comment changes that
bring the docstrings in line with current behavior, plus one trivial
unused-import removal flagged in the same review.

- audio.py load_audio (Comment id 3098052328):
    Removed the false `Raises: ImportError` claim — load_audio never
    raises ImportError because the implementation falls back to the stdlib
    `wave` module if both torchaudio and soundfile are unavailable. Added
    a Note: section listing all three backends in priority order
    (torchaudio primary; soundfile wav/flac/ogg; stdlib wave WAV-only
    16/32-bit PCM) so callers know which formats survive each fallback.

- audio.py _load_audio_backend (related to Comment 3098052328, found via
  audit):
    Replaced the cryptic "Priority: torchaudio > soundfile + torch" line
    with a numbered backend chain that documents:
      1. torchaudio.load — primary
      2. soundfile.read  — (T, C) transposed to (C, T)
      3. stdlib wave     — WAV-only last-resort
    Added Raises: section for the WAV-fallback ValueError.

- audio.py hash_audio (Comment id 3098052292):
    The docstring claimed "Longer audio is uniformly subsampled", but the
    implementation uses `step = n // max_samples`, which collapses to
    step == 1 (effective truncation to the first max_samples samples)
    when max_samples < n < 2 * max_samples. Added a Note: section that
    documents the truncation-vs-uniform-stride boundary at n = 2*max_samples
    and clarifies that determinism (cache-key intent), not collision
    resistance, is the design goal — keeping the existing implementation
    intact and accurately documented.

- loader.py import torch (Comment id 3098052303):
    Removed unused `import torch` (line 21). Only `torch.utils.data` is
    used and that import on line 22 is satisfied independently.

- loader.py is_orchestrator block (Comment id 3098052316):
    Rewrote the comment to be precise per case. The old comment claimed
    "is_orchestrator prevents multi-node races ... when the FS is shared",
    which is only true in "global" mode. Clarified:
      - non-distributed: the lone process
      - "global" mode:   rank-0 globally — required when cache_dir is on a
                         shared FS (eliminates cross-node race on
                         shutil.rmtree / sentinel writes)
      - "local"  mode:   per-node local main — ASSUMES cache_dir is on
                         node-local storage; pointing "local" mode at a
                         shared FS WILL race across node-local mains and
                         is unsupported
    The user's contract (already documented on _create_or_load_dataset's
    `preprocess_parallelism` arg) is now consistent with the comment at
    the actual orchestrator selection site.

Comment id 3098052262 (save_audio "converted to int16") was already
addressed by the Round-3 commit (49d5a54) when save_audio was reduced to
a single torchaudio.save call — no separate fix needed here.

Per .agents/knowledge/constraints.md #27 (Google-style docstrings) and
philosophy.md #3 (fail-fast, no speculative code): the docstrings now
faithfully reflect what the code does, including the corner cases that
Round-2 narrowing exposed.

Verification: ReadLints              — clean
  black --check / diff   — 358 lines, byte-identical to parent (49d5a54)
  isort  --check / diff  — 18 lines, byte-identical to parent (49d5a54)
                           (all 358 + 18 are pre-existing on origin/main,
                           outside the edited regions; not touched per
                           "structural vs behavioral separation" — same
                           rationale as Phase 2/3.)
  smoke test             — load_audio docstring has no ImportError + names
                           all 3 backends; _load_audio_backend uses Backend
                           chain phrasing + names wave; hash_audio mentions
                           truncation + 2*max_samples boundary; loader.py
                           is_orchestrator comment includes 'unsupported'
                           and 'node-local'; no `^import torch$` survives
                           in loader.py.
Made-with: Cursor

* [data_utils,guidance] docs: complete audio_dir docstring coverage (audit)

Round-5 follow-up to the design-principle audit on PR #129 — the audit
found three audio_dir-related docstring drifts introduced by our own
commits (27b5967, fa2e4fd) that Copilot did not flag:

1. dataset.py GeneralDataset.__init__:
   - We added audio_dir/target_arrow_path to the signature in fa2e4fd
     and only documented the latter. This commit adds Args entries for
     image_dir + video_dir + audio_dir + extra_hash_strs (the directory
     trio kept identical wording for symmetry; extra_hash_strs was
     pre-PR undocumented but is the natural workaround for the cache
     issue documented in the new Note).
   - Added a Note: section calling out that image_dir/video_dir/audio_dir
     are NOT in compute_cache_path's fingerprint (pre-existing and not
     fixed in this PR per "structural vs behavioral separation" — see
     audit report on the PR), so a directory swap with relative JSONL
     paths needs force_reprocess=True or extra_hash_strs to bust the
     cache. Users were silently inheriting this footgun without warning.

2. dataset.py GeneralDataset._preprocess_batch:
   - Added audio_dir to the Args section.
   - Renumbered the Workflow from 5+Returns to 7 explicit steps; the
     original 4-step body (text, images, videos, call) became 1/2/3/5,
     and we inserted "4. Load and prepare audio inputs" + an explicit
     metadata-pack step (7) that the prose previously left implicit.
   - Added a cross-reference to utils.audio.load_audio so readers
     don't have to grep for the loader.

3. guidance/workflow.md Stage 1 _preprocess_batch snippet:
   - Signature now matches the real one (audio_dir is the third dir
     parameter).
   - Comment list extended to 7 steps, mirroring the docstring fix above.

Audit summary that prompted this commit (full report posted as PR comment):
- 0 new Copilot comments since Round-4 push (a790e8a).
- 12 historical Copilot comments verified addressed across Rounds 1-4.
- Distributed barriers correct: prep -> map -> consolidate, three
  wait_for_everyone() boundaries, no double-fire on enable_preprocess=False.
- consolidate_parts is idempotent on retry (state.json/dataset_info.json
  rewrites are last-write-wins; os.replace recovers from partial wipe).
- audio.py fail-fast posture is clean: only ImportError around optional
  backend imports; everything else (decode/resample/save) propagates.
- Pre-existing patterns surfaced by the audit but intentionally NOT
  touched per Phase-2/3 convention (structural vs behavioral separation):
  * set_format try/except Exception: pass at L267-270 + L446-449 of
    dataset.py (introduced in aa0715e, predates our PR).
  * Bare 'except:' in _compute_function_hash (introduced in 884a9db,
    predates our PR).
  * data_args.py import order is third-party-before-stdlib (predates
    our PR; isort baseline on origin/main is 18 lines, identical here).
- Latent risk worth user-side awareness (already covered by the new
  Note: in __init__): image_dir/video_dir/audio_dir omitted from
  fingerprint -> stale cache on directory swap with relative paths.
  Affects all three sibling fields equally; documenting was the
  appropriate scope for this PR.

Verification:
- ReadLints: 0 errors on both edited files.
- 13/13 smoke-test assertions pass (audio_dir in __init__ Args,
  image_dir/video_dir entries added, Note: re fingerprint present,
  workflow numbered 1-7, audio_dir in _preprocess_batch Args,
  load_audio cross-ref present, workflow.md signature matches,
  workflow.md steps 4-7 in expected positions).
- black --check fingerprint on dataset.py: 563 lines before my edits,
  563 lines after my edits (byte-identical) - no new formatting drift.

Made-with: Cursor

* [data_utils,utils,models] refactor: homogenize multi-media batch types in _preprocess_batch + add encode_audio abstract method

Why
---
`_preprocess_batch` in `dataset.py` was producing three different per-sample
types for the `audios` column within a single batch:

  empty sample -> None
  1 audio      -> Tensor
  N audios     -> List[Tensor]

This breaks HF Arrow's homogeneous-column requirement for
`Dataset.map(batched=True)` / cache-write, and forces every downstream
consumer to handle three input shapes instead of one. Image/video had a
parallel latent bug: their empty-branch only updated `xx_args` but not
`batch[xx]`, leaving `batch[images]`/`batch[videos]` shorter than the input
batch size — only "didn't surface" because real datasets are typically
all-or-nothing per modality.

Additionally, the BaseAdapter ABC documented `prompt`/`images`/`videos` but
had no `audios` plumbing, so the audio inputs prepared by the dataset
preprocess step had no path to the model's encode step.

What
----
`dataset.py _preprocess_batch`:
  * Drop the unwrap-single line (`audios[0] if len(audios) == 1 else audios`);
    audio always stored as `List[torch.Tensor]` per sample.
  * Empty image/video/audio branches now append `[]` to BOTH `xx_args` AND
    `batch`, so every column has length == input batch size and a uniform
    `List[List[Media]]` type. (`MultiImageBatch` / `MultiVideoBatch` /
    `MultiAudioBatch` invariant.)
  * Docstring rewritten to specify the new contract per modality and a Note
    explaining why `[]`-for-empty matters for HF Arrow.

`utils/audio.py`:
  * Add `MultiAudioBatch = Union[List[AudioBatch], torch.Tensor, np.ndarray]`
    type alias mirroring `MultiImageBatch` (image.py) and `MultiVideoBatch`
    (video.py); register in `__all__` and module-level type-hierarchy
    docstring.

`models/abc.py`:
  * Add `audios: Optional[List[Union[torch.Tensor, List[torch.Tensor]]]]`
    parameter to `BaseAdapter.preprocess_func` (mirrors the existing
    `images`/`videos` Union style).
  * Iterate `(audios, self.encode_audio)` in the encoder dispatch loop so
    audio inputs flow through to the model when present.
  * Add new `@abstractmethod encode_audio()` modeled on `encode_video`,
    documenting the (single / batch-of-single / batch-of-list) input forms
    and the `condition_audios` output convention.

11 concrete adapters (z_image, wan2_v2v/t2v/i2v, sd3_5,
qwen_image/_edit_plus, flux1/1_kontext/2/2_klein):
  * Add a one-line `def encode_audio(...): pass` stub mirroring each file's
    local `encode_video` style and explanatory docstring (e.g., "Not needed
    for X models." / "X does not support audio encoding."). The ABC call
    path already short-circuits via `if input is not None`, so unused stubs
    are never invoked.

Verified
--------
  * 0 lint errors on all 14 touched files.
  * AST smoke test passes: every dataset.py edit site landed; audio.py
    exports `MultiAudioBatch`; abc.py defines `encode_audio` as abstract
    with the expected docstring; all 11 adapters define `encode_audio`.
  * black-diff hunks confirm new lines follow each file's existing style
    (single-quoted dict keys in dataset.py; aligned-comment Union members
    in audio.py; `audios : Optional[...]` in abc.py to match `prompt :`,
    `images :`, `videos :`). No NEW pre-existing-style violations
    introduced.
  * No current adapter overrides `encode_audio` (audited via Grep), so the
    new abstract is a forward-looking contract for the LTX2 adapter on the
    feat branch and any future audio-enabled model.

Refs
----
PR #129 follow-up (Round 6).

Made-with: Cursor

* [models] refactor: make all 4 BaseAdapter encoders non-abstract with no-op defaults

Why
---
Round-6 promoted `encode_audio` to a 4th `@abstractmethod` on
`BaseAdapter`, which forced all 11 existing concrete adapters to gain a
one-line `def encode_audio(...): pass` stub just to remain instantiable.
That penalises adapters that legitimately don't need the modality (e.g.,
SD3.5, Flux.1/2/Kontext, Z-Image, Qwen-Image*, Wan2-T2V/I2V/V2V) — they
have to write boilerplate forwarding nothing — and it scales linearly
with every new modality we add (audio today, who-knows-what tomorrow).

The right invariant for `BaseAdapter` is the OPPOSITE: an adapter only
implements the encoders for the modalities it actually consumes; missing
modalities should be a silent no-op handled by the base class.
`preprocess_func` already has the right plumbing — it iterates over
`(text, encode_prompt), (images, encode_image), (videos, encode_video),
(audios, encode_audio)` and merges only non-`None` results — so a default
`pass` (which returns `None`) integrates cleanly with that loop.

What
----
1. `models/abc.py`:
   - Drop `@abstractmethod` from `encode_prompt`, `encode_image`,
     `encode_video`, `encode_audio` (all 4 encoders).
   - Replace each body with a single `pass` (Python returns `None`,
     which `preprocess_func`'s "skip if not isinstance(res, dict)"
     branch already handles correctly — see lines ~1928-1948).
   - Tighten signatures to use the homogeneous batch types produced
     by `dataset.py._preprocess_batch` after Round-6:
       encode_prompt(prompt: List[str], **kwargs)
       encode_image(images: MultiImageBatch, **kwargs)
       encode_video(videos: MultiVideoBatch, **kwargs)
       encode_audio(audios: MultiAudioBatch, **kwargs)
     — i.e., always a batch (`List[List[Media]]` ragged or
     uniform-shape Tensor/ndarray), never a single sample. This
     matches what the dataset actually hands to the encoders and
     lets every override stop re-implementing batch detection.
   - Return type narrowed to
     `Optional[Dict[str, Union[List[Any], torch.Tensor]]]` to make
     the no-op default's `None` return explicit in the contract.
   - Rewrite docstrings in Google style (consistent with existing
     base methods, satisfies repo design-principle re docstrings):
     state the no-op default, the dispatch contract, the input
     `Multi*Batch` shape, and the canonical output keys
     (`condition_images` / `condition_videos` / `condition_audios`).
2. `models/{z_image,wan,stable_diffusion,qwen_image,flux}/*.py`
   (11 adapters): revert the Round-6 `def encode_audio(...): pass`
   stub. Each adapter now inherits the no-op default from
   `BaseAdapter`. Net diff vs `origin/main` for these 11 files
   is now exactly **0 lines** — they match the PR base again, so
   the PR's footprint shrinks from 12 files to 1
   (`models/abc.py` only, on the `models/` side).
3. New imports in `models/abc.py`:
   `from ..utils.image import MultiImageBatch`
   `from ..utils.video import MultiVideoBatch`
   `from ..utils.audio import MultiAudioBatch`
   (the audio one was already added in Round-6 via `utils/audio.py`).

Compat / risk
-------------
- LTX-2 adapters (`ltx2_t2av.py`, `ltx2_i2v.py`) live on
  `feat/ltx2-audio-video-support` and already implement
  `encode_audio` properly. The new ABC accepts a more general
  `MultiAudioBatch` (a superset of what they currently expect) so
  their existing `Union[torch.Tensor, List[torch.Tensor]]` typed
  override is still valid Python. **No LTX-2 changes needed.**
- Adapters that DO need text/image/video encoding still override
  the corresponding encoder (their override silently shadows the
  base no-op), so behaviour is unchanged for everyone.
- `preprocess_func`'s strict-validation branch (lines ~1937-1948,
  "must return a non-empty dict ...") only fires when the encoder
  returns something OTHER than `None` and that something is not a
  proper dict — `None` from the default still skips integration
  cleanly (see `if res is not None:` guard above the validation).

Verification
------------
- `git diff origin/main -- src/flow_factory/models/<adapter>.py`
  → 0 lines for all 11 adapters.
- AST smoke test:
  - `BaseAdapter` defines exactly 4 encoders; none decorated with
    `@abstractmethod`.
  - Each encoder body is `pass` (with optional docstring).
  - Each encoder's first non-self arg has the expected
    `Multi*Batch` annotation; return type is `Optional[...]`.
  - No adapter (of the 11) defines `encode_audio` anymore.
- `ReadLints` clean on all 12 touched files.
- `black --check --diff` line counts on all 12 files DECREASED vs
  HEAD (Δ ranges -37 to -219), so this commit introduces zero new
  formatting violations.

Made-with: Cursor

* [docs] update: align BaseAdapter contract docs with R6/R7 (4 abstract + 4 no-op encoders, MultiAudioBatch)

constraints.md #12, ff-new-model/SKILL.md, guidance/new_model.md were
written for the pre-R6/R7 contract. Bring them in sync with what
src/flow_factory/models/abc.py actually declares after this PR:

- 4 abstract methods: load_pipeline, decode_latents, forward, inference
  (was 7).
- 4 per-modality encoders (encode_prompt/image/video/audio) are
  non-abstract no-op defaults. Subclasses override only the modalities
  they consume; preprocess_func dispatches to all four and skips when
  the encoder returns None.
- Multi-modal batch convention extended to include audios: every
  modality column from _preprocess_batch is List[List[Media]] (empty
  samples contribute [], single-item samples contribute [item]).
- guidance/new_model.md gains an encode_audio block, audios in the
  inference signature comment, and an encode_audio checklist item.

Pure docs change; no Python touched.

Made-with: Cursor

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jayce-Ping added a commit that referenced this pull request Jun 14, 2026
Resync .agents/, .cursor/, guidance/, AGENTS.md and CLAUDE.md with the
current code after plugin growth (9 trainers, 14 model adapters, 13 reward
models). Fixes registry drift, wrong config/API facts and broken
cross-references found in a full audit.

- architecture.md/AGENTS.md: add diffusion-opd trainer, clap/imagebind/
  geneval rewards, Bagel/LTX2 models; fix RationalRewards* class names
- constraints.md: evaluate() is concrete (not abstract); index #28-29;
  paradigm (#7) and training-args (#16) lists; de-numbered line refs
- philosophy.md: Accelerate (DDP/DeepSpeed ZeRO-1-2/FSDP) backend; fix #27 ref
- guidance: scheduler.* config keys, real sample()/compute_advantages
  snippets, GenEval metadata convention, audio reward param, Bagel link
- skills: model_name_or_path, default_target_modules, data.datasets,
  rewards-as-list, 9 trainers; CLAUDE.md imports AGENTS.md to avoid drift
- topics/samplers.md: correct _resolve_sampler_type + AdvantageProcessor
  group_distributed paths; parity_testing set_scheduler_timesteps
- hparams/model_args.py: model_type Literal now matches registry keys

Co-authored-by: Cursor <cursoragent@cursor.com>
Jayce-Ping added a commit to Jayce-Ping/Flow-Factory-Private that referenced this pull request Jul 2, 2026
Jayce-Ping added a commit to Jayce-Ping/Flow-Factory-Private that referenced this pull request Jul 2, 2026
… merge (X-GenGroup#129)

* [utils] feat: add audio utility module with type aliases, standardization, and loading

Add `utils/audio.py` following the exact pattern of `utils/image.py` and
`utils/video.py`. Provides a complete audio waveform toolkit:

- Type aliases: `AudioSingle` (C,T), `AudioBatch` (B,C,T)
- Validation: `is_audio()`, `is_audio_batch()`
- Loading/saving: `load_audio()`, `save_audio()` with 3-tier backend
  fallback (torchaudio → soundfile → stdlib wave)
- Conversion: `audio_to_tensor()`, `audio_to_numpy()`, `convert_audio()`
  for resampling and mono/stereo conversion
- Standardization: `standardize_audio_batch()` with output_type='pt'|'np'
- Hashing: `hash_audio()`, `hash_audio_list()` with int16 quantization

Design conventions follow diffusers/audiocraft:
- Channel-first (C, T) tensor layout
- [-1.0, 1.0] float32 value range
- Channel conversion: downmix=mean, upmix=repeat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Made-with: Cursor

* [data] feat: support audio loading and preprocessing in dataset pipeline

- Add `audio_dir` field to DataArguments (defaults to 'audios' subfolder)
- Add audio loading block in GeneralDataset._preprocess_batch() that
  mirrors the existing video loading pattern: detect 'audio' column in
  JSONL, load via load_audio(), pass to preprocess_func(audios=...)
- Add 'audios' to PREPROCESS_KEYS for metadata exclusion
- Pass audio_dir through fn_kwargs to .map() call

The audio_dir parameter flows automatically through loader.py via
filter_kwargs — no changes needed in loader.py. Fully backward
compatible: datasets without audio columns are unaffected.

JSONL format: {"prompt": "...", "audio": "file.wav"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [data_utils] perf: eliminate redundant preprocessed-dataset writes

Each rank's preprocessed Arrow shard is now written exactly once: the
orchestrator routes Dataset.map output directly to the final per-rank
location via cache_file_name= (under {merged_cache_path}.tmp/_parts/),
and the consolidator writes only state.json + dataset_info.json before
atomically renaming .tmp -> merged_cache_path. No row data is re-copied
during the merge, no duplicate cache lands under ~/.cache/huggingface,
and the build-dir sentinel _build_meta.json enables crash recovery for
unchanged num_shards while wiping cleanly on num_shards changes.

Single-process and distributed paths are unified through the same flow
(N=1 case for single-process); enable_preprocess=False bypasses the
consolidate pipeline entirely to preserve pre-refactor behavior. I/O
budget per cache build drops from ~4*N*S to ~N*S bytes touched.

Made-with: Cursor

* [data_utils] fix: address PR X-GenGroup#129 review

- loader.py: gate build-dir prep on a shared is_orchestrator predicate
  (rank-0 globally for "global" mode, local main for "local" mode) and
  reuse it for the consolidator. Avoids concurrent shutil.rmtree /
  _build_meta.json writes when multiple nodes share the FS.
- loader.py: align shard log with dataset.py convention {idx}/{N-1}.

Addresses Copilot review comments X-GenGroup#2 and X-GenGroup#3 on PR X-GenGroup#129.

Made-with: Cursor

* [utils] feat: add ndim validation to standardize_audio_batch

Cherry-picked from a902429 (audio.py portion only). Adds explicit
ValueError when input ndim is not in {1, 2, 3}, replacing silent
fall-through. Aligns with the no-defensive-except.mdc fail-fast rule.

The data_utils path only calls load_audio, never standardize_audio_batch,
so this is a pure independent safety improvement to the audio utility module.

Made-with: Cursor

* [utils] fix: address PR X-GenGroup#129 Comments 4-7 — fail-fast in audio.py + meta_matches recovery

- audio.py save_audio: narrow torchaudio block to ImportError only,
  using try/import + else/save so genuine save failures (permission,
  encoding, disk full) propagate instead of silently falling back to
  soundfile or wave (Copilot Comment X-GenGroup#4).
- audio.py _load_audio_backend: same fix for torchaudio.load and
  soundfile.read blocks; the always-available wave fallback at the end
  stays as-is (Comment X-GenGroup#5).
- audio.py _resample: same fix for torchaudio.functional.resample;
  the scipy block already uses except ImportError correctly (Comment X-GenGroup#6).
- loader.py _meta_matches: catch (json.JSONDecodeError, OSError) so a
  corrupted sentinel from a previously crashed mid-write is treated as
  "stale" and triggers the wipe-and-recreate path, matching the existing
  "missing -> return False" semantics. Behavior is documented inline
  per no-defensive-except.mdc (Comment X-GenGroup#7).

All five sites now obey .cursor/rules/no-defensive-except.mdc: narrow,
documented try/except limited to optional-import detection or recovery
from a known on-disk failure mode.

Made-with: Cursor

* [utils] refactor: drop fallback chains in save_audio and _resample (fail-fast)

torchaudio>=2.4.0 is a hard core dependency in pyproject.toml, so the
optional-backend fallback chains in save_audio (soundfile + stdlib wave)
and _resample (scipy + linear interpolation) were never genuinely optional
- they only ever ran on broken installs, and silently substituted a different
backend's behavior for what the caller asked for.

Per .cursor/rules/no-defensive-except.mdc, the fail-fast posture is to
surface the failure directly:

- save_audio: now a single torchaudio.save call. If torchaudio is missing
  (broken install) the ImportError surfaces; if torchaudio.save raises
  (permission, encoding, disk full, unsupported extension) the exception
  surfaces with the original cause and full traceback.
- _resample: now a single torchaudio.functional.resample call. dtype,
  device, and rate-mismatch errors surface directly instead of being
  silently masked by scipy or linear interp.

Round-2 narrowed the broad except (ImportError, Exception) to ImportError-only
(commit f8fb452); Round-3 goes further and removes the fallbacks themselves,
since the underlying premise - "torchaudio might be missing" - is false for
this codebase.

_load_audio_backend's soundfile fallback is intentionally retained: soundfile
is NOT a hard dep, and is needed for users who load formats torchaudio
doesn't support out-of-the-box. That stays as a documented optional-backend
pattern.

Made-with: Cursor

* [utils,data_utils] docs: address PR X-GenGroup#129 Round-3 review (docstrings + unused import)

Each fix below maps to a specific Copilot review comment from Round-3
(2026-04-17). All five sites are documentation / comment changes that
bring the docstrings in line with current behavior, plus one trivial
unused-import removal flagged in the same review.

- audio.py load_audio (Comment id 3098052328):
    Removed the false `Raises: ImportError` claim — load_audio never
    raises ImportError because the implementation falls back to the stdlib
    `wave` module if both torchaudio and soundfile are unavailable. Added
    a Note: section listing all three backends in priority order
    (torchaudio primary; soundfile wav/flac/ogg; stdlib wave WAV-only
    16/32-bit PCM) so callers know which formats survive each fallback.

- audio.py _load_audio_backend (related to Comment 3098052328, found via
  audit):
    Replaced the cryptic "Priority: torchaudio > soundfile + torch" line
    with a numbered backend chain that documents:
      1. torchaudio.load — primary
      2. soundfile.read  — (T, C) transposed to (C, T)
      3. stdlib wave     — WAV-only last-resort
    Added Raises: section for the WAV-fallback ValueError.

- audio.py hash_audio (Comment id 3098052292):
    The docstring claimed "Longer audio is uniformly subsampled", but the
    implementation uses `step = n // max_samples`, which collapses to
    step == 1 (effective truncation to the first max_samples samples)
    when max_samples < n < 2 * max_samples. Added a Note: section that
    documents the truncation-vs-uniform-stride boundary at n = 2*max_samples
    and clarifies that determinism (cache-key intent), not collision
    resistance, is the design goal — keeping the existing implementation
    intact and accurately documented.

- loader.py import torch (Comment id 3098052303):
    Removed unused `import torch` (line 21). Only `torch.utils.data` is
    used and that import on line 22 is satisfied independently.

- loader.py is_orchestrator block (Comment id 3098052316):
    Rewrote the comment to be precise per case. The old comment claimed
    "is_orchestrator prevents multi-node races ... when the FS is shared",
    which is only true in "global" mode. Clarified:
      - non-distributed: the lone process
      - "global" mode:   rank-0 globally — required when cache_dir is on a
                         shared FS (eliminates cross-node race on
                         shutil.rmtree / sentinel writes)
      - "local"  mode:   per-node local main — ASSUMES cache_dir is on
                         node-local storage; pointing "local" mode at a
                         shared FS WILL race across node-local mains and
                         is unsupported
    The user's contract (already documented on _create_or_load_dataset's
    `preprocess_parallelism` arg) is now consistent with the comment at
    the actual orchestrator selection site.

Comment id 3098052262 (save_audio "converted to int16") was already
addressed by the Round-3 commit (49d5a54) when save_audio was reduced to
a single torchaudio.save call — no separate fix needed here.

Per .agents/knowledge/constraints.md X-GenGroup#27 (Google-style docstrings) and
philosophy.md X-GenGroup#3 (fail-fast, no speculative code): the docstrings now
faithfully reflect what the code does, including the corner cases that
Round-2 narrowing exposed.

Verification: ReadLints              — clean
  black --check / diff   — 358 lines, byte-identical to parent (49d5a54)
  isort  --check / diff  — 18 lines, byte-identical to parent (49d5a54)
                           (all 358 + 18 are pre-existing on origin/main,
                           outside the edited regions; not touched per
                           "structural vs behavioral separation" — same
                           rationale as Phase 2/3.)
  smoke test             — load_audio docstring has no ImportError + names
                           all 3 backends; _load_audio_backend uses Backend
                           chain phrasing + names wave; hash_audio mentions
                           truncation + 2*max_samples boundary; loader.py
                           is_orchestrator comment includes 'unsupported'
                           and 'node-local'; no `^import torch$` survives
                           in loader.py.
Made-with: Cursor

* [data_utils,guidance] docs: complete audio_dir docstring coverage (audit)

Round-5 follow-up to the design-principle audit on PR X-GenGroup#129 — the audit
found three audio_dir-related docstring drifts introduced by our own
commits (27b5967, fa2e4fd) that Copilot did not flag:

1. dataset.py GeneralDataset.__init__:
   - We added audio_dir/target_arrow_path to the signature in fa2e4fd
     and only documented the latter. This commit adds Args entries for
     image_dir + video_dir + audio_dir + extra_hash_strs (the directory
     trio kept identical wording for symmetry; extra_hash_strs was
     pre-PR undocumented but is the natural workaround for the cache
     issue documented in the new Note).
   - Added a Note: section calling out that image_dir/video_dir/audio_dir
     are NOT in compute_cache_path's fingerprint (pre-existing and not
     fixed in this PR per "structural vs behavioral separation" — see
     audit report on the PR), so a directory swap with relative JSONL
     paths needs force_reprocess=True or extra_hash_strs to bust the
     cache. Users were silently inheriting this footgun without warning.

2. dataset.py GeneralDataset._preprocess_batch:
   - Added audio_dir to the Args section.
   - Renumbered the Workflow from 5+Returns to 7 explicit steps; the
     original 4-step body (text, images, videos, call) became 1/2/3/5,
     and we inserted "4. Load and prepare audio inputs" + an explicit
     metadata-pack step (7) that the prose previously left implicit.
   - Added a cross-reference to utils.audio.load_audio so readers
     don't have to grep for the loader.

3. guidance/workflow.md Stage 1 _preprocess_batch snippet:
   - Signature now matches the real one (audio_dir is the third dir
     parameter).
   - Comment list extended to 7 steps, mirroring the docstring fix above.

Audit summary that prompted this commit (full report posted as PR comment):
- 0 new Copilot comments since Round-4 push (a790e8a).
- 12 historical Copilot comments verified addressed across Rounds 1-4.
- Distributed barriers correct: prep -> map -> consolidate, three
  wait_for_everyone() boundaries, no double-fire on enable_preprocess=False.
- consolidate_parts is idempotent on retry (state.json/dataset_info.json
  rewrites are last-write-wins; os.replace recovers from partial wipe).
- audio.py fail-fast posture is clean: only ImportError around optional
  backend imports; everything else (decode/resample/save) propagates.
- Pre-existing patterns surfaced by the audit but intentionally NOT
  touched per Phase-2/3 convention (structural vs behavioral separation):
  * set_format try/except Exception: pass at L267-270 + L446-449 of
    dataset.py (introduced in aa0715e, predates our PR).
  * Bare 'except:' in _compute_function_hash (introduced in 884a9db,
    predates our PR).
  * data_args.py import order is third-party-before-stdlib (predates
    our PR; isort baseline on origin/main is 18 lines, identical here).
- Latent risk worth user-side awareness (already covered by the new
  Note: in __init__): image_dir/video_dir/audio_dir omitted from
  fingerprint -> stale cache on directory swap with relative paths.
  Affects all three sibling fields equally; documenting was the
  appropriate scope for this PR.

Verification:
- ReadLints: 0 errors on both edited files.
- 13/13 smoke-test assertions pass (audio_dir in __init__ Args,
  image_dir/video_dir entries added, Note: re fingerprint present,
  workflow numbered 1-7, audio_dir in _preprocess_batch Args,
  load_audio cross-ref present, workflow.md signature matches,
  workflow.md steps 4-7 in expected positions).
- black --check fingerprint on dataset.py: 563 lines before my edits,
  563 lines after my edits (byte-identical) - no new formatting drift.

Made-with: Cursor

* [data_utils,utils,models] refactor: homogenize multi-media batch types in _preprocess_batch + add encode_audio abstract method

Why
---
`_preprocess_batch` in `dataset.py` was producing three different per-sample
types for the `audios` column within a single batch:

  empty sample -> None
  1 audio      -> Tensor
  N audios     -> List[Tensor]

This breaks HF Arrow's homogeneous-column requirement for
`Dataset.map(batched=True)` / cache-write, and forces every downstream
consumer to handle three input shapes instead of one. Image/video had a
parallel latent bug: their empty-branch only updated `xx_args` but not
`batch[xx]`, leaving `batch[images]`/`batch[videos]` shorter than the input
batch size — only "didn't surface" because real datasets are typically
all-or-nothing per modality.

Additionally, the BaseAdapter ABC documented `prompt`/`images`/`videos` but
had no `audios` plumbing, so the audio inputs prepared by the dataset
preprocess step had no path to the model's encode step.

What
----
`dataset.py _preprocess_batch`:
  * Drop the unwrap-single line (`audios[0] if len(audios) == 1 else audios`);
    audio always stored as `List[torch.Tensor]` per sample.
  * Empty image/video/audio branches now append `[]` to BOTH `xx_args` AND
    `batch`, so every column has length == input batch size and a uniform
    `List[List[Media]]` type. (`MultiImageBatch` / `MultiVideoBatch` /
    `MultiAudioBatch` invariant.)
  * Docstring rewritten to specify the new contract per modality and a Note
    explaining why `[]`-for-empty matters for HF Arrow.

`utils/audio.py`:
  * Add `MultiAudioBatch = Union[List[AudioBatch], torch.Tensor, np.ndarray]`
    type alias mirroring `MultiImageBatch` (image.py) and `MultiVideoBatch`
    (video.py); register in `__all__` and module-level type-hierarchy
    docstring.

`models/abc.py`:
  * Add `audios: Optional[List[Union[torch.Tensor, List[torch.Tensor]]]]`
    parameter to `BaseAdapter.preprocess_func` (mirrors the existing
    `images`/`videos` Union style).
  * Iterate `(audios, self.encode_audio)` in the encoder dispatch loop so
    audio inputs flow through to the model when present.
  * Add new `@abstractmethod encode_audio()` modeled on `encode_video`,
    documenting the (single / batch-of-single / batch-of-list) input forms
    and the `condition_audios` output convention.

11 concrete adapters (z_image, wan2_v2v/t2v/i2v, sd3_5,
qwen_image/_edit_plus, flux1/1_kontext/2/2_klein):
  * Add a one-line `def encode_audio(...): pass` stub mirroring each file's
    local `encode_video` style and explanatory docstring (e.g., "Not needed
    for X models." / "X does not support audio encoding."). The ABC call
    path already short-circuits via `if input is not None`, so unused stubs
    are never invoked.

Verified
--------
  * 0 lint errors on all 14 touched files.
  * AST smoke test passes: every dataset.py edit site landed; audio.py
    exports `MultiAudioBatch`; abc.py defines `encode_audio` as abstract
    with the expected docstring; all 11 adapters define `encode_audio`.
  * black-diff hunks confirm new lines follow each file's existing style
    (single-quoted dict keys in dataset.py; aligned-comment Union members
    in audio.py; `audios : Optional[...]` in abc.py to match `prompt :`,
    `images :`, `videos :`). No NEW pre-existing-style violations
    introduced.
  * No current adapter overrides `encode_audio` (audited via Grep), so the
    new abstract is a forward-looking contract for the LTX2 adapter on the
    feat branch and any future audio-enabled model.

Refs
----
PR X-GenGroup#129 follow-up (Round 6).

Made-with: Cursor

* [models] refactor: make all 4 BaseAdapter encoders non-abstract with no-op defaults

Why
---
Round-6 promoted `encode_audio` to a 4th `@abstractmethod` on
`BaseAdapter`, which forced all 11 existing concrete adapters to gain a
one-line `def encode_audio(...): pass` stub just to remain instantiable.
That penalises adapters that legitimately don't need the modality (e.g.,
SD3.5, Flux.1/2/Kontext, Z-Image, Qwen-Image*, Wan2-T2V/I2V/V2V) — they
have to write boilerplate forwarding nothing — and it scales linearly
with every new modality we add (audio today, who-knows-what tomorrow).

The right invariant for `BaseAdapter` is the OPPOSITE: an adapter only
implements the encoders for the modalities it actually consumes; missing
modalities should be a silent no-op handled by the base class.
`preprocess_func` already has the right plumbing — it iterates over
`(text, encode_prompt), (images, encode_image), (videos, encode_video),
(audios, encode_audio)` and merges only non-`None` results — so a default
`pass` (which returns `None`) integrates cleanly with that loop.

What
----
1. `models/abc.py`:
   - Drop `@abstractmethod` from `encode_prompt`, `encode_image`,
     `encode_video`, `encode_audio` (all 4 encoders).
   - Replace each body with a single `pass` (Python returns `None`,
     which `preprocess_func`'s "skip if not isinstance(res, dict)"
     branch already handles correctly — see lines ~1928-1948).
   - Tighten signatures to use the homogeneous batch types produced
     by `dataset.py._preprocess_batch` after Round-6:
       encode_prompt(prompt: List[str], **kwargs)
       encode_image(images: MultiImageBatch, **kwargs)
       encode_video(videos: MultiVideoBatch, **kwargs)
       encode_audio(audios: MultiAudioBatch, **kwargs)
     — i.e., always a batch (`List[List[Media]]` ragged or
     uniform-shape Tensor/ndarray), never a single sample. This
     matches what the dataset actually hands to the encoders and
     lets every override stop re-implementing batch detection.
   - Return type narrowed to
     `Optional[Dict[str, Union[List[Any], torch.Tensor]]]` to make
     the no-op default's `None` return explicit in the contract.
   - Rewrite docstrings in Google style (consistent with existing
     base methods, satisfies repo design-principle re docstrings):
     state the no-op default, the dispatch contract, the input
     `Multi*Batch` shape, and the canonical output keys
     (`condition_images` / `condition_videos` / `condition_audios`).
2. `models/{z_image,wan,stable_diffusion,qwen_image,flux}/*.py`
   (11 adapters): revert the Round-6 `def encode_audio(...): pass`
   stub. Each adapter now inherits the no-op default from
   `BaseAdapter`. Net diff vs `origin/main` for these 11 files
   is now exactly **0 lines** — they match the PR base again, so
   the PR's footprint shrinks from 12 files to 1
   (`models/abc.py` only, on the `models/` side).
3. New imports in `models/abc.py`:
   `from ..utils.image import MultiImageBatch`
   `from ..utils.video import MultiVideoBatch`
   `from ..utils.audio import MultiAudioBatch`
   (the audio one was already added in Round-6 via `utils/audio.py`).

Compat / risk
-------------
- LTX-2 adapters (`ltx2_t2av.py`, `ltx2_i2v.py`) live on
  `feat/ltx2-audio-video-support` and already implement
  `encode_audio` properly. The new ABC accepts a more general
  `MultiAudioBatch` (a superset of what they currently expect) so
  their existing `Union[torch.Tensor, List[torch.Tensor]]` typed
  override is still valid Python. **No LTX-2 changes needed.**
- Adapters that DO need text/image/video encoding still override
  the corresponding encoder (their override silently shadows the
  base no-op), so behaviour is unchanged for everyone.
- `preprocess_func`'s strict-validation branch (lines ~1937-1948,
  "must return a non-empty dict ...") only fires when the encoder
  returns something OTHER than `None` and that something is not a
  proper dict — `None` from the default still skips integration
  cleanly (see `if res is not None:` guard above the validation).

Verification
------------
- `git diff origin/main -- src/flow_factory/models/<adapter>.py`
  → 0 lines for all 11 adapters.
- AST smoke test:
  - `BaseAdapter` defines exactly 4 encoders; none decorated with
    `@abstractmethod`.
  - Each encoder body is `pass` (with optional docstring).
  - Each encoder's first non-self arg has the expected
    `Multi*Batch` annotation; return type is `Optional[...]`.
  - No adapter (of the 11) defines `encode_audio` anymore.
- `ReadLints` clean on all 12 touched files.
- `black --check --diff` line counts on all 12 files DECREASED vs
  HEAD (Δ ranges -37 to -219), so this commit introduces zero new
  formatting violations.

Made-with: Cursor

* [docs] update: align BaseAdapter contract docs with R6/R7 (4 abstract + 4 no-op encoders, MultiAudioBatch)

constraints.md X-GenGroup#12, ff-new-model/SKILL.md, guidance/new_model.md were
written for the pre-R6/R7 contract. Bring them in sync with what
src/flow_factory/models/abc.py actually declares after this PR:

- 4 abstract methods: load_pipeline, decode_latents, forward, inference
  (was 7).
- 4 per-modality encoders (encode_prompt/image/video/audio) are
  non-abstract no-op defaults. Subclasses override only the modalities
  they consume; preprocess_func dispatches to all four and skips when
  the encoder returns None.
- Multi-modal batch convention extended to include audios: every
  modality column from _preprocess_batch is List[List[Media]] (empty
  samples contribute [], single-item samples contribute [item]).
- guidance/new_model.md gains an encode_audio block, audios in the
  inference signature comment, and an encode_audio checklist item.

Pure docs change; no Python touched.

Made-with: Cursor

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant