[Optimization] Add selective trajectory storage#28
Merged
Jayce-Ping added a commit that referenced this pull request on Apr 25, 2026
…, sync agent knowledge

- Restructure examples/ to algorithm/ft/model/variant.yaml with examples/README.md
- Add LTX-2/2.3 to README (News, model table, install note)
- Add .scratch/ constraint for agent temp files (#28), examples convention (#29)
- Sync agent knowledge: GroupDistributedSampler in samplers.md, LTX2 + RationalRewards in architecture.md
- Clean up .docs/ltx2-research/ dev artifacts
- Update LTX2 configs: guidance_scale=1.0, comment out attn_backend

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Jayce-Ping added a commit that referenced this pull request on Apr 25, 2026
… support (#118) * docs: comprehensive analysis of samples, rewards, and video adapter architecture Add detailed documentation covering: - Complete BaseSample dataclass fields and sample type hierarchy (T2ISample, T2VSample, etc.) - RewardModelOutput and reward model abstract classes (PointwiseRewardModel, GroupwiseRewardModel) - WAN2 text-to-video adapter implementation and video generation pipeline - Sample canonicalization and unique_id computation via SHA256 hashing - Video format specifications (Tensor(T, C, H, W) with frame/dimension constraints) - Reward model input/output modality handling and tensor input configuration - Data flow examples from generation through sampling to reward computation - Quick reference cards and integration patterns for implementing new adapters Files created: - START_HERE.md: Quick navigation guide - ANALYSIS_REPORT.md: Comprehensive technical analysis - QUICK_REFERENCE.md: Copy-paste templates and patterns - MODALITY_FLOW.md: Complete data flow walkthrough - CODEBASE_EXPLORATION.md: File structure and discovery process - DOCUMENTATION_INDEX.md: Structured index of all findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move research docs to .docs/ltx2-research/ Move auto-generated analysis documents out of project root into a dedicated temporary folder for LTX2 integration research. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [utils] feat: add audio utility module with type aliases, standardization, and loading Add `utils/audio.py` following the exact pattern of `utils/image.py` and `utils/video.py`. 
Provides a complete audio waveform toolkit: - Type aliases: `AudioSingle` (C,T), `AudioBatch` (B,C,T) - Validation: `is_audio()`, `is_audio_batch()` - Loading/saving: `load_audio()`, `save_audio()` with 3-tier backend fallback (torchaudio → soundfile → stdlib wave) - Conversion: `audio_to_tensor()`, `audio_to_numpy()`, `convert_audio()` for resampling and mono/stereo conversion - Standardization: `standardize_audio_batch()` with output_type='pt'|'np' - Hashing: `hash_audio()`, `hash_audio_list()` with int16 quantization Design conventions follow diffusers/audiocraft: - Channel-first (C, T) tensor layout - [-1.0, 1.0] float32 value range - Channel conversion: downmix=mean, upmix=repeat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [samples] feat: add audio field to BaseSample and T2AVSample class - Add `audio: Optional[torch.Tensor]` field to BaseSample with automatic promotion from 1D (T,) to 2D (C, T) via audio_to_tensor() - Add T2AVSample(BaseSample) for text-to-audio-video generation tasks - Update __init__.py exports The audio field follows the same pattern as image/video: stored without batch dimension, standardized in __post_init__, supports stack/to_dict. Fully backward compatible — defaults to None. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [data] feat: support audio loading and preprocessing in dataset pipeline - Add `audio_dir` field to DataArguments (defaults to 'audios' subfolder) - Add audio loading block in GeneralDataset._preprocess_batch() that mirrors the existing video loading pattern: detect 'audio' column in JSONL, load via load_audio(), pass to preprocess_func(audios=...) - Add 'audios' to PREPROCESS_KEYS for metadata exclusion - Pass audio_dir through fn_kwargs to .map() call The audio_dir parameter flows automatically through loader.py via filter_kwargs — no changes needed in loader.py. Fully backward compatible: datasets without audio columns are unaffected. 
JSONL format: {"prompt": "...", "audio": "file.wav"} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models] feat: add encode_audio interface and audio_vae support to BaseAdapter - Add `encode_audio()` default method that returns None — non-abstract, so existing adapters need no changes - Add `audio_vae` property with getter and setter (mirrors vae pattern) - Update `preprocess_func()` to accept `audios` parameter and route it through `encode_audio()` in the same loop as prompt/image/video - Update `_freeze_vae()` to also freeze audio_vae when present, keeping `_freeze_components()` clean Fully backward compatible: encode_audio returns None by default, audio_vae returns None when pipeline has no audio_vae component. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [rewards] feat: add audio parameter to reward model interfaces - Add `audio: Optional[List[torch.Tensor]]` parameter to both PointwiseRewardModel.__call__ and GroupwiseRewardModel.__call__, positioned after `video` and before `condition_images` - Add 'audio' to RewardProcessor.MEDIA_FIELDS - Add audio branch in _convert_media_format(): tensor passthrough when use_tensor_inputs=True, numpy conversion otherwise Audio is always tensor-based (no PIL equivalent). Existing reward models (PickScore, CLIP, etc.) are unaffected — their concrete __call__ signatures don't include `audio`, so filter_kwargs strips it automatically with zero overhead. 
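The claim that existing reward models are unaffected rests on signature-based filtering. A minimal sketch of that idea follows, assuming `filter_kwargs` inspects the callee's signature; the real helper's implementation is not shown in this PR, and `LegacyImageReward` and `AudioAwareReward` are hypothetical stand-ins, not classes from the repository.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` can accept (assumed behaviour)."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means the callee accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class LegacyImageReward:
    """Hypothetical image-only reward model: its __call__ has no `audio` parameter."""

    def __call__(self, prompt, image=None):
        return [float(len(p)) for p in prompt]


class AudioAwareReward:
    """Hypothetical reward model that opts in to the new `audio` parameter."""

    def __call__(self, prompt, video=None, audio=None):
        return [0.0 if audio is None else float(len(audio)) for _ in prompt]


# One batch dict carrying every media field, as a processor might assemble it.
batch = {"prompt": ["a dog barking"], "image": None, "video": None, "audio": [[0.1, -0.2]]}

# `audio` is silently dropped for the legacy model and forwarded to the new one.
legacy_inputs = filter_kwargs(LegacyImageReward.__call__, **batch)
audio_inputs = filter_kwargs(AudioAwareReward.__call__, **batch)

legacy_scores = LegacyImageReward()(**legacy_inputs)
audio_scores = AudioAwareReward()(**audio_inputs)
```

With this pattern, adding a media field to the batch never requires touching a model whose signature does not name it.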
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update Step 6 sub-plan to v4 with verified diffusers 0.38.0.dev0 API Key corrections from runtime verification: - Connectors use additive mask, not padding_side - Transformer forward has no sigma/audio_timestep/STG/modality params - CFG is velocity-space with [uncond, cond] chunk order - Compression ratios are instance attributes, not class properties - Audio params from pipeline attributes, not transformer config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clean up obsolete research docs, consolidate plan Remove 8 early exploration documents that have been fully superseded by actual code implementation and the two remaining plan files: - COMMIT_PLAN.md: overall progress tracker (updated with final status) - STEP6_SUBPLAN.md: detailed LTX2 adapter plan (v4, API-verified) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: scaffold LTX2 adapter with sample dataclass and pipeline loading Add LTX2 text-to-audio-video adapter scaffold: - LTX2Sample(T2AVSample): dataclass with audio trajectory fields (audio_all_latents, audio_latent_index_map), connector embedding fields for both video/audio streams, and negative prompt fields for CFG during training - LTX2_T2AV_Adapter(BaseAdapter): skeleton with: - load_pipeline(): LTX2Pipeline.from_pretrained with low_cpu_mem_usage=False - _create_audio_scheduler(): ODE-only FlowMatchEulerDiscreteSDEScheduler (separate instance to avoid step_index collision with video) - default_target_modules: 28 Linear layers per block, verified against LTX2VideoTransformerBlock.named_modules() - preprocessing_modules: ['text_encoders', 'connectors'] - inference_modules: ['transformer', 'vae', 'audio_vae', 'connectors', 'vocoder'] - Stub methods for encode/decode/forward/inference (NotImplementedError) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: implement 
encode_prompt and decode_latents for LTX2 encode_prompt(): - Delegates to pipeline.encode_prompt() for Gemma3 all-layer encoding + _pack_text_embeds normalization - Passes through connectors with additive attention mask (1 - binary_mask) * -1e6, additive_mask=True - Splits [negative, positive] connector outputs for CFG - Returns prompt_ids + video/audio connector embeddings + masks decode_latents(): - Video: unpack → denormalize → optional timestep conditioning with decode noise injection → VAE decode → postprocess - Audio: denormalize → unpack → audio_vae decode → vocoder (denormalize BEFORE unpack — order differs from video!) - All operations match pipeline source L1172-1218 exactly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: implement forward() with CFG and dual scheduler steps Single denoising step matching pipeline L1097-1154: 1. Prepare CFG inputs: duplicate latents + concat [neg, pos] embeddings 2. Joint transformer forward with cache_context("cond_uncond") 3. CFG in velocity-space: uncond + gs * (cond - uncond), with optional guidance_rescale 4. Video: SDE scheduler step (stochastic, with log_prob for RL) 5. Audio: ODE scheduler step (deterministic, no log_prob) 6. Attach audio_next_latents on video output for trajectory tracking RoPE coords are computed on demand if not cached from inference loop. 
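Step 3 of this commit, velocity-space CFG with optional rescale, can be sketched in plain Python. This is an illustration under assumptions, not the adapter's code: it works on flat lists of floats rather than tensors, and the rescale step follows the common std-matching form (as in diffusers' `rescale_noise_cfg`), which the adapter may or may not use verbatim. A later commit in this PR replaces velocity-space CFG with x0-space guidance.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple


def split_cfg_batch(batched: Sequence[Sequence[float]]) -> Tuple[list, list]:
    """Split a doubled batch in [uncond, cond] chunk order."""
    half = len(batched) // 2
    return list(batched[:half]), list(batched[half:])


def cfg_velocity(
    uncond: Sequence[float],
    cond: Sequence[float],
    guidance_scale: float,
    guidance_rescale: float = 0.0,
) -> List[float]:
    """Combine one sample's velocities: uncond + gs * (cond - uncond)."""
    guided = [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]
    if guidance_rescale > 0.0:
        std_cond = pstdev(cond)
        std_guided = pstdev(guided)
        if std_guided > 0.0:
            # Match the guided prediction's spread to the conditional one,
            # then blend with the un-rescaled result.
            rescaled = [g * std_cond / std_guided for g in guided]
            guided = [
                guidance_rescale * r + (1.0 - guidance_rescale) * g
                for r, g in zip(rescaled, guided)
            ]
    return guided


# Doubled batch of size 2: first half unconditional, second half conditional.
uncond_batch, cond_batch = split_cfg_batch([[0.0, 0.0], [1.0, 2.0]])
plain = cfg_velocity(uncond_batch[0], cond_batch[0], guidance_scale=2.0)
rescaled = cfg_velocity(uncond_batch[0], cond_batch[0], guidance_scale=2.0, guidance_rescale=1.0)
```

With `guidance_scale=1.0` the result reduces to the conditional prediction.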
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: implement inference loop and register ltx2_t2av adapter inference(): - Full denoising loop following pipeline L908-1226 - Encode prompts (with fallback to pre-encoded inputs) - Compute latent dimensions: video (32x spatial, 8x temporal), audio (4x mel, 4x temporal, sr=16kHz, hop=160) - Prepare video + audio latents via pipeline.prepare_latents/audio_latents - Timestep shift with LTX2-specific mu (base_seq=1024, shift=0.95-2.05) - Positional coords via transformer.rope / audio_rope - Dual trajectory collection: video (for RL) + audio (for reconstruction) - Decode both modalities and construct LTX2Sample per batch element Registry: - Add 'ltx2_t2av' entry mapping to LTX2_T2AV_Adapter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: improve type safety and code cleanup for LTX2 adapter - Add self.scheduler type declaration (consistent with all other adapters) - Replace SDESchedulerOutput with FlowMatchEulerDiscreteSDESchedulerOutput - Change forward() return type to Tuple instead of dynamic attribute - Fix Optional parameter annotations for forward() signature - Remove unused imports (Any, DISTILLED_SIGMA_VALUES) and variables - Replace unicode arrows with ASCII in comments for encoding safety Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN with completed steps summary and next steps (7-9) Steps 1-6 + type cleanup are all committed. 
Remaining work: - Step 7: Design multi-modal forward() return pattern for optimize() - Step 8: Example YAML configs (GRPO, NFT, AWM) - Step 9: Audio-video dataset integration for testing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: promote num_frames and frame_rate to explicit LTX2Sample fields Move num_frames and frame_rate from extra_kwargs to explicit dataclass fields on LTX2Sample, consistent with height/width on BaseSample. Add num_frames to _shared_fields (shared across batch, not stacked). Keep duration_s in extra_kwargs as a derived value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: unified latent interface for forward() Redesign forward() to accept concatenated video+audio latents as a single tensor (B, video_seq + audio_seq, C). Internally splits by video_seq_len, runs the joint transformer, steps video SDE and audio ODE schedulers separately, then concatenates next_latents back into a unified output. 
This makes the trainer interface identical to single-modality adapters: - forward() returns a single FlowMatchEulerDiscreteSDESchedulerOutput - Trainers access output.next_latents and output.log_prob directly - No trainer changes needed for multi-modal generation Key changes: - forward(): accept unified latents, split/cat internally, return single output - inference(): cat video+audio before loop, use single latent_collector - LTX2Sample: replace audio_all_latents with video_seq_len split point - Remove Tuple return type, keep dual scheduler (video SDE + audio ODE) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN — Step 7 complete, Steps 8-9 remaining Step 7 resolved via unified latent interface: - 7a: Type safety cleanup (FlowMatchEulerDiscreteSDESchedulerOutput) - 7b: Promote num_frames/frame_rate to explicit LTX2Sample fields - 7c: Unified forward() — cat(video,audio) input, single output, dual scheduler internal Remaining: Step 8 (example YAML configs), Step 9 (test dataset) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN to reflect current state, remove obsolete STEP6_SUBPLAN - Step 7 fully completed (7a-7c): type cleanup, explicit fields, unified latents - Steps 8-9 remain: example configs + test dataset - Add deferred features table with diffusers source availability - Delete STEP6_SUBPLAN.md (diverged from implementation, no longer useful) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: correct deferred features — STG, prompt enhancement, x0-guidance all exist in diffusers All three features are available in installed diffusers 0.38.0.dev0: - STG: extra transformer forward with spatio_temporal_guidance_blocks - Modality Isolation: extra forward with isolate_modalities=True - Prompt Enhancement: Gemma3 text_encoder.generate() with system prompt - x0-space guidance: convert_velocity_to_x0/convert_x0_to_velocity Note: current adapter uses 
velocity-space CFG; x0-space is prerequisite for STG and Modality Isolation Guidance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: align forward() with official x0-space multi-guidance pipeline Rewrite the guidance section to match official diffusers pipeline_ltx2.py: 1. x0-space guidance: convert velocity predictions to x0-space, compute all guidance deltas (CFG + STG + Modality Isolation) in x0-space, then convert back to velocity. This replaces the previous velocity-space CFG. 2. STG (Spatio-Temporal Guidance): optional extra transformer forward with spatio_temporal_guidance_blocks to perturb specific blocks. Separate stg_scale / audio_stg_scale for video and audio. 3. Modality Isolation Guidance: optional extra transformer forward with isolate_modalities=True (disables A2V/V2A cross-attention). Separate modality_scale / audio_modality_scale. 4. Prompt Enhancement: inference() supports system_prompt parameter to rewrite prompts via Gemma3 text_encoder.generate(). 5. LTX-2.3 compatibility: pass sigma=timestep and use_cross_timestep to all transformer forward calls. 6. Independent audio guidance: separate audio_guidance_scale, audio_guidance_rescale, audio_stg_scale, audio_modality_scale (all default to their video counterparts). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN with refined next steps (8a/8b/9) Restructured remaining steps: - 8a: Inference alignment refinements (validation, num_frames rounding, coord pre-duplication, distilled sigmas, embed concatenation) - 8b: Example YAML configs (GRPO/NFT/AWM x lora/full) - 9: VGGSound-50k test dataset integration Updated design decisions to reflect x0-space guidance, sigma-based conversion, and adapter design philosophy (inference=__call__, forward=step). Removed obsolete deferred features (STG, prompt enhancement, x0-guidance now implemented). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: align inference() with official pipeline - Add _check_inputs(): validate height/width divisibility, STG block spec, auto-round num_frames to VAE-temporal-compatible value - Pre-duplicate RoPE coords for CFG before denoising loop (official L1201) - Pre-concatenate [neg, pos] connector embeds before loop to avoid re-catting every step; forward() receives _cfg_prepared flag - Extract positive-only embeds from pre-catted tensors for STG/modality passes when _cfg_prepared=True Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [examples] feat: add LTX2 T2AV GRPO+LoRA example config Verified with ff-train: config parses correctly, model architecture resolves to ltx2_t2av, all parameters (resolution, num_frames, frame_rate, audio_dir, guidance_scale, LoRA settings) are properly propagated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update config and dataset * [logger,samples] feat: mux audio into video when logging T2AV samples T2AV samples (e.g. LTX2) carry both video and audio, but the logging pipeline silently dropped the audio track. This commit adds audio-video muxing so wandb/other loggers receive a single MP4 with sound. 
Changes: - BaseSample: add `audio_sample_rate` field alongside `audio` - LTX2 adapter: populate `audio_sample_rate` from pipeline vocoder config - LogVideo: add optional `audio`/`audio_sample_rate` fields; when present, `get_value('mp4')` muxes H.264 video + AAC audio via PyAV (mirrors diffusers' encode_video); falls back to silent MP4 if PyAV unavailable - LogFormatter: add T2AVSample dispatch → `_process_t2av_samples()` that creates LogVideo with audio attached and correct fps from frame_rate Made-with: Cursor * Fix _resolve_component_names * [models/ltx2] fix: align connectors call with current diffusers API Pass binary attention mask directly to LTX2TextConnectors.forward() instead of pre-computing additive mask. The current diffusers version handles binary-to-additive conversion internally and no longer accepts the `additive_mask` keyword argument. Made-with: Cursor * update * Fix * [models] refactor: unify CFG control via guidance_scale across all adapters Remove explicit `do_classifier_free_guidance` parameter from encode_prompt, inference, and forward signatures. Instead, derive the CFG flag internally from `guidance_scale` (>1.0 standard, >0.0 for Z-Image), ensuring data_preprocessing and inference stages always use the same CFG decision. Affected adapters: SD3.5, Z-Image, Wan T2V/I2V/V2V, LTX2 T2AV, Flux2 Klein. Made-with: Cursor * update * Fix cfg * Update config * [models/ltx2] refactor: inline Gemma3 encoding into adapter encode_prompt Eliminate delegation to pipeline.encode_prompt() and redundant tokenizer call by inlining _get_gemma_prompt_embeds logic into a new _encode_text() helper. This produces prompt_ids and negative_prompt_ids from the same tokenization pass used for embeddings. Made-with: Cursor * [samples] refactor: include negative_prompt in unique_id and extract _hash_id_fields Add negative_prompt/negative_prompt_ids to _id_fields and hash computation so samples with different negative prompts are correctly distinguished. 
Extract _hash_id_fields(hasher) to eliminate duplicated prompt hashing in ImageConditionSample and VideoConditionSample. Also parameterize digest length via num_bytes (default 16 = 128-bit). Made-with: Cursor * Update config * Fix bytes * Fix audio_sample_rate * [models/ltx2,logger,utils] fix: address PR review — input validation, mux guard, ndim check - Extend _check_inputs with prompt/embedding presence and CFG+negative consistency validation (Copilot comments #3, #5) - Narrow audio mux guard to format == 'mp4' (comment #6) - Add ndim validation in standardize_audio_batch (comment #7) Made-with: Cursor * [models/ltx2,utils] feat: integrate prompt enhancement with RNG isolation Add isolated_rng context manager to utils/base.py for safe global RNG seeding. Implement _enhance_prompt_batch in LTX2 adapter using official pipeline.enhance_prompt with deterministic seed and RNG state isolation, preventing seed leakage into downstream noise sampling. Enhancement is opt-in via system_prompt config (null=disabled, "default"=Lightricks prompt). Clean up placeholder enhancement code from inference(). Made-with: Cursor * Reorder unique_id priority * [reward] feat: add CLAP and ImageBind audio reward models Add two audio-specific reward models for LTX2 audio-video generation: - CLAPRewardModel: audio-text alignment via LAION CLAP (48 kHz mono, cosine similarity, zero new deps via transformers.ClapModel) - ImageBindRewardModel: audio-video semantic alignment via Meta ImageBind (16 kHz mel-spectrogram, video spatial crops, multi-mode scoring) Register both in reward registry, replace PickScore with CLAP + ImageBind in ltx2_t2av.yaml config, and update COMMIT_PLAN.md Step 11 as done. 
Made-with: Cursor * [docs] feat: add flat inheritance rules for adapters and trainers - Constraint #11: trainers MUST inherit from BaseTrainer directly (only GRPOGuardTrainer → GRPOTrainer sanctioned) - Constraint #12: adapters MUST inherit from BaseAdapter directly; shared logic uses helpers, code duplication, or mixins - Update architecture.md and cursor rule to reference both rules Made-with: Cursor * [docs] feat: add I2AV adapter and audio reward research plans - I2AV_PLAN: LTX2 Image-to-Audio-Video adapter design (BaseAdapter flat hierarchy, conditioning mask, first-frame preservation) - AUDIO_REWARD_PLAN: CLAP and ImageBind reward model integration - COMMIT_PLAN: update GPU validation status and prompt enhancement analysis Made-with: Cursor * chore: update diffusers submodule Made-with: Cursor * [docs] feat: add two-layer sample hierarchy constraint Model-specific samples must inherit from task-level samples (e.g. LTX2I2AVSample → I2AVSample), never from other model-specific samples. Updated constraint #14, architecture.md, and base-class-contract rule. Made-with: Cursor * [docs] update: expand I2AV plan with full implementation details - Complete LTX2I2AVSample dataclass with all duplicated LTX2 fields - Code duplication strategy (no _common.py shared module) - encode_image with condition_image_size (flux2/qwen pattern) - Dual-path inference: raw images or pre-encoded condition_images - Full forward()/inference() structure with conditioning mask semantics - enhance_prompt I2AV multimodal difference documented Made-with: Cursor * [reward] fix: add transformers v5 compatibility for PickScore get_*_features() API In transformers >=5.0, get_text_features()/get_image_features() return BaseModelOutputWithPooling instead of a tensor. Add _extract_feature_tensor() helper to handle both v4 (tensor) and v5 (ModelOutput) return types. 
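The v4/v5 compatibility shim can be sketched as a duck-typed unwrap. The commit does not show the body of `_extract_feature_tensor()`, so the choice of `pooler_output` below is an assumption based on the field names of `BaseModelOutputWithPooling`; `FakePoolingOutput` is a hypothetical stand-in that keeps the example free of heavy dependencies.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakePoolingOutput:
    """Hypothetical stand-in for a v5-style ModelOutput; only the fields used here."""

    pooler_output: Any
    last_hidden_state: Any = None


def extract_feature_tensor(result: Any) -> Any:
    """Return the feature tensor from either return style.

    v4-style: `result` is already the tensor, so pass it through.
    v5-style: `result` is an output object, so unwrap `pooler_output` (assumed field).
    """
    pooled = getattr(result, "pooler_output", None)
    return pooled if pooled is not None else result


v4_features = extract_feature_tensor([0.1, 0.2])                      # already a "tensor"
v5_features = extract_feature_tensor(FakePoolingOutput([0.3, 0.4]))   # wrapped output
```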
Made-with: Cursor * [samples,logger] feat: add I2AVSample dataclass and logger support Introduce I2AVSample (Image-to-Audio-Video) task-level sample and its logger handler, combining I2V condition-image table layout with T2AV audio-muxed LogVideo for complete I2AV logging across backends. Made-with: Cursor * [adapter,registry] feat: add LTX2 I2AV adapter for image-conditioned audio-video generation - LTX2_I2AV_Adapter(BaseAdapter) with conditioning_mask for frame-0 preservation - forward(): CFG-doubles conditioning_mask internally, per-token video timestep masking, scheduler.step on generated frames only - inference(): dual-path image input (raw PIL via encode_image or pre-encoded tensor) - _enhance_prompt_batch(): multimodal Gemma3 enhancement with raw PIL images - _standardize_image_input(): MultiImageBatch flattening for single-condition model - Registry entry 'ltx2_i2av' and example YAML config Made-with: Cursor * [adapter] fix: align MultiImageBatch/MultiVideoBatch type annotations across adapters Wan2 I2V and Wan2 V2V had correct runtime logic (is_multi_image_batch check in _standardize_*_input) but inference() and encode_image() annotations used ImageBatch/VideoBatch instead of MultiImageBatch/MultiVideoBatch. 
Made-with: Cursor * [docs] update: add nested batch convention to adapter docs - adapter_conventions.md: document that inference() receives MultiImageBatch/ MultiVideoBatch from collator; single-condition adapters must flatten via _standardize_*_input; append Gotcha #5; add cross-refs - ff-new-model/SKILL.md: extend Pitfall #6 with single-condition handling guidance and back-ref to adapter_conventions.md Gotcha #5 Made-with: Cursor * [style] fix: apply PR review fixes — logging, section-divider policy, unique_id - Replace bare print() with logger.warning() in formatting.py - Update section-divider rule: allow between methods, forbid inside functions - Convert in-function decorative dividers to plain numbered comments - Include negative_prompt in unique_id hash via _hash_id_fields() refactor - Fix compute_unique_id default to 8 bytes (fits torch.int64) Made-with: Cursor * [reward,models/ltx2] fix: CLAP BatchNorm dtype + I2AV forward bugs - CLAP: load model in float32 (BatchNorm requires it); fix audios->audio deprecation - I2AV: remove conditioning_mask from _shared_fields to preserve batch dim - I2AV: ensure next_latents in return_kwargs for frame-slicing logic - Remove dtype: bfloat16 from CLAP reward config in example YAMLs Made-with: Cursor * Update config * update config * Update config * Fix dtype * Update * [diffusers] fix: cherry-pick flash_3_varlen_hub mask dtype fix Cherry-pick commit 0f8a83fa6 from diffusers to support additive (bfloat16) attention masks in _flash_attention_3_varlen_hub. This fixes the ValueError when LTX2 connectors pass non-bool masks. 
Made-with: Cursor * [data_utils] perf: eliminate redundant preprocessed-dataset writes Each rank's preprocessed Arrow shard is now written exactly once: the orchestrator routes Dataset.map output directly to the final per-rank location via cache_file_name= (under {merged_cache_path}.tmp/_parts/), and the consolidator writes only state.json + dataset_info.json before atomically renaming .tmp -> merged_cache_path. No row data is re-copied during the merge, no duplicate cache lands under ~/.cache/huggingface, and the build-dir sentinel _build_meta.json enables crash recovery for unchanged num_shards while wiping cleanly on num_shards changes. Single-process and distributed paths are unified through the same flow (N=1 case for single-process); enable_preprocess=False bypasses the consolidate pipeline entirely to preserve pre-refactor behavior. I/O budget per cache build drops from ~4*N*S to ~N*S bytes touched. Made-with: Cursor * [docs] update: comprehensive audio + R7 no-op-default encoder docs sweep Bring agent docs and the developer guides on feat/ltx2-audio-video-support in line with the R6/R7 BaseAdapter contract changes (PR #129) AND extend every audio-aware section to cover the new modality. Critical contract fixes (mirror Wave A on data-utils-perf so the 0-hit verification gate passes on this branch independently of merge order): - constraints.md #12: 7 abstract methods -> 4 (load_pipeline, decode_latents, forward, inference); new "Optional encoder overrides (no-op default)" subsection naming all 4 encoders incl. encode_audio; preprocess_func note now explains the audios dispatch and "skip when None" semantics. Trailing "Adapter hierarchy" paragraph also fixed ("7-method contract" -> "4-abstract-method contract" with explanation). - ff-new-model/SKILL.md: frontmatter description fixed; Phase-1 step-3 mapping gains audio encoder/VAE row; Phase-2 step-3 implementation table reorders so the 4 truly-abstract methods are on top, marks all 4 encoders as Abstract? 
No (no-op default; override if your model consumes this modality), adds the encode_audio row; Pitfall #6 extended to include audios + the []-for-empty / never-unwrap contract (preserved the existing _standardize_*_input guidance and added a cross-ref to adapter_conventions.md Gotcha #6). - guidance/new_model.md: Step 4 narrative replaced "three encoding methods" with "Override the encoders your model consumes"; updated the dispatcher pseudocode to enumerate all 4 modalities with the "if encoded is not None" skip; new #### encode_audio block parallel to encode_video (MultiAudioBatch signature, default return None); Step-5 inference signature comment adds "audios: Optional[MultiAudioBatch]" (commented as opt-in); checklist gains explicit encode_video() / encode_audio() items framed as override-only. Audio-aware sweep (Wave-B-only, lives here because LTX-2 actually consumes audio): - adapter_conventions.md: Batch Dimension Convention extended to include audios/MultiAudioBatch on inference(); new bullet codifying the multi-media batch homogeneity guarantee; new Gotcha #6 for the []-for-empty / no-unwrap rule (applies symmetrically to images, videos, audios). - guidance/new_model.md Data Format Conventions: new ### Audio table with audios/condition_audios/audio_features rows and a callout pointing to flow_factory.utils.audio.MultiAudioBatch; cross-cutting batch-boundary callout extended to include encode_audio(). - guidance/workflow.md Stage 1: goal narrative + Input row extended to include "audio files"; new audio-symmetry callout after the Flux.2 example explaining audio_dir is the third optional input handled by _preprocess_batch. - architecture.md: Stage-1 ASCII box now lists audio + audio_features; Adapter Pattern subsection gains a one-liner that all 4 encoders are no-op by default (override only the modalities your model consumes). 
- ff-develop/SKILL.md sec.2 Adapter Hierarchy: appended the R7 design lesson bullet (non-abstract no-op default + opt-in override over @AbstractMethod for new modalities; the 4 abstract methods are intentionally minimal). - fix_patterns.md Recorded Fix Patterns: replaced "(No records yet)" with two full entries using the documented template — R6 multi-modal batch homogeneity and R7 non-abstract encoder defaults; both link back to the actual code locations. Pure docs change; no Python touched. Verified: zero hits across .agents/ + guidance/ for "7 abstract methods", "7-method contract", "three encoding methods", "Implement the three encoding"; encode_audio + MultiAudioBatch present in every doc per the matrix; ReadLints clean on all 8 touched files. Made-with: Cursor * [diffusers] sync: bump submodule to upstream main 77f8cf8bf Drops the locally cherry-picked commit 620286eb5 ("support ltx-2 type masking in flash_3_hub_varlen") and resets the submodule to the official huggingface/diffusers main HEAD as of 2026-04-18. Functional consequence: _flash_attention_3_varlen_hub no longer casts non-bool attn_mask via `attn_mask > -1`. Upstream already carries the `isinstance(result, tuple)` defensive unpack, so only the bool-cast is missing. Waiting for upstream to land an equivalent fix; until then, LTX2 paths that pass a non-bool attn_mask to flash-attn-3 varlen-hub may misbehave inside _normalize_attn_mask. Made-with: Cursor * [utils] feat: add move_tensors_to_device recursive helper Adds a shape-agnostic device-move utility in utils/base.py that walks list / tuple / dict containers depth-first, copying torch.Tensor leaves to the target device. Non-tensor leaves (PIL, str, int, np.ndarray) pass through unchanged. Containers are reconstructed immutably; the input is not modified. Signature: move_tensors_to_device(value, device, max_depth=None) The optional max_depth bounds recursion (None = unbounded; 0 = only move when value itself is a Tensor; N = walk N levels). 
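A rough sketch of the recursive helper described in this commit, following the stated signature and `max_depth` semantics. To stay dependency-free it uses a hypothetical `FakeTensor` in place of `torch.Tensor`; the real function in utils/base.py presumably checks `isinstance(value, torch.Tensor)` instead, and its handling of other container types is not documented here.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor: `.to()` returns a copy on the new device."""

    name: str
    device: str = "cuda:0"

    def to(self, device: str) -> "FakeTensor":
        return replace(self, device=device)


def move_tensors_to_device(value: Any, device: str, max_depth: Optional[int] = None) -> Any:
    """Walk list/tuple/dict containers depth-first and move tensor leaves.

    max_depth=None is unbounded; 0 moves `value` only if it is itself a tensor;
    N descends through at most N container levels.
    """
    if isinstance(value, FakeTensor):
        return value.to(device)
    if max_depth is not None and max_depth <= 0:
        return value  # depth budget exhausted: leave the container untouched
    next_depth = None if max_depth is None else max_depth - 1
    if isinstance(value, dict):
        return {k: move_tensors_to_device(v, device, next_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [move_tensors_to_device(v, device, next_depth) for v in value]
    if isinstance(value, tuple):
        return tuple(move_tensors_to_device(v, device, next_depth) for v in value)
    return value  # non-tensor leaf (str, int, PIL image, ...) passes through


nested = {"video": [FakeTensor("v")], "meta": ("caption", 3)}
moved = move_tensors_to_device(nested, "cpu")
shallow = move_tensors_to_device([[FakeTensor("x")]], "cpu", max_depth=1)
```

New containers are built on the way back up, so the input structure is never mutated.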
Designed for the upcoming reward path device adaptation, but kept as a general utility so future callers (e.g., a future BaseSample.to refactor delegating with max_depth=1) can reuse it. Pure addition; no consumers in this commit. Made-with: Cursor * [reward] refactor: route reward inputs through move_tensors_to_device Inserts a move_tensors_to_device call between _convert_media_format and model(**batch_input) in three reward computation sites: - _compute_pointwise_batch - _compute_groupwise_group - _compute_groupwise_local (inner per-group loop) The recursive helper walks list/tuple/dict containers and copies tensor leaves to model.device. The local batch_input dict is reconstructed; sample objects are NOT mutated. Behavior with current GPU-resident samples: same-device .to() is a no-op, so reward outputs remain bit-identical. The change is defensive prep for the upcoming sample-loop CPU offload (commit 6) where samples will arrive on CPU and reward models still run on their declared device. The distributed groupwise path (_compute_groupwise_distributed) needs no change: it already passes device=self.accelerator.device to gather_samples, so its inputs are GPU-resident regardless of caller-side device. Made-with: Cursor * [hparams,trainer] feat: add offload_samples_to_cpu config and BaseTrainer helper Adds the configuration switch and the producer-side helper for the upcoming sample CPU-offload + lazy-reload pipeline. hparams/training_args.py: TrainingArguments gains a new field offload_samples_to_cpu: bool = False placed next to enable_gradient_checkpointing (sibling memory switch). The help string documents the trade-off (D2H per sample + per-reward H2D ~100ms/epoch vs sample/optimize GPU peak reduction). trainers/abc.py: BaseTrainer gains _maybe_offload_samples_to_cpu(samples), a non- abstract helper that no-ops when the config is False and otherwise walks the sample list calling BaseSample.to('cpu'). 
The docstring records the ordering invariant required by the consumer trainers (must be called BEFORE reward_buffer.add_samples) and points to RewardProcessor's move_tensors_to_device for the consumer-side H2D. No call sites yet -- behaviour is unchanged in this commit. The helper is wired into the five trainers' sample() loops in commit 6. Made-with: Cursor * [trainer] refactor: lazy per-batch reload in GRPO/GRPO-Guard/DPO optimize() Replaces the eager pre-stacked sample_batches list with a single per-batch loop that lazily reconstructs each micro-batch: for batch_idx in range(num_batches): batch_samples = [sample.to(device) for sample in shuffled_samples[...]] batch = BaseSample.stack(batch_samples) ... Affected sites: - trainers/grpo.py GRPOTrainer.optimize() - trainers/grpo.py GRPOGuardTrainer.optimize() - trainers/dpo.py DPOTrainer.optimize() (chosen_samples / rejected_samples extraction inside the per-pair-batch loop) Behaviour-preserving for the current GPU-resident sample buffer: every sample.to(device) is a same-device no-op, so loss values, gradients, and optimizer steps remain bit-identical to HEAD~1. The change is the consumer-side prerequisite for the upcoming sample-loop CPU offload (commit 6): once samples may be CPU-resident, this lazy reload is what keeps optimize()'s GPU footprint bounded by a single micro-batch instead of the full epoch. NFT/AWM are deliberately not touched in this commit -- their optimize() has an extra eager precompute layer that requires structural restructuring (commit 5). Made-with: Cursor * [trainer] refactor: NFT/AWM optimize() per-batch precompute interleave Restructures NFT and AWM optimize() from the previous double-pass design (eager precompute over ALL batches under sampling_context, then training over ALL batches under current params) to a single-pass per-batch interleave that matches the official DiffusionNFT and AWM implementations: for each micro-batch: 1. 
lazy reload sample tensors to GPU and stack into a batch dict 2. precompute under sampling policy: adapter.rollout() with sampling_context(): compute (_all_timesteps, _all_random_noise, _old_v_pred_list or _old_log_probs) for THIS batch only 3. train under current policy: adapter.train() with self.autocast(): for t_idx in range(num_train_timesteps): forward / loss / backward / optimizer step Memory savings: only the current batch's _all_random_noise plus _old_v_pred_list (NFT) or _old_log_probs (AWM) lives on GPU at any time. The previous design held all num_batches_per_epoch batches' precompute output simultaneously, costing ~5+ GB on FLUX1 1024^2 LoRA at B=4 / T=40 and tens of GB on Wan video models (often the OOM trigger). Train-inference consistency (philosophy #1; see .agents/knowledge/topics/train_inference_consistency.md item #4): - Rollout (sample()/adapter.inference()) is unchanged. - EMA params are loaded via sampling_context() and restored before each batch's training forward, identical to the per-batch behavior of the previous design. - ema_step() runs only once per outer epoch in start(), so every batch within an optimize() call sees the SAME EMA snapshot regardless of interleave timing -> per-batch and eager designs are equivalent on the EMA invariant. Note on RNG (regression test guidance): randn_tensor for batch K is now called after batch K-1's backward step (vs all noises sampled upfront). The CUDA RNG consumption order changes; under the same seed, the per-batch noise sequences are NOT bit-identical to the eager design. The algorithm is unchanged (noise is augmentation; equivalent in expectation). Regression tests should use statistical metrics (loss mean / reward trend across an epoch), NOT a numeric diff of loss values, when comparing against HEAD~1. Sample-level lazy reload (`[sample.to(device) for sample in slice]`) is folded into this same restructure -- NFT/AWM now share the same lazy reload pattern as GRPO/DPO from commit 4. 
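The lazy reload pattern shared by GRPO/DPO and NFT/AWM can be sketched roughly as below. `Sample` is a hypothetical stand-in for BaseSample whose `to` returns a copy (a simplification; the real method may behave differently), and `iter_micro_batches` is an illustrative name, not a function from the trainers.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Sample:
    """Hypothetical stand-in for BaseSample; `device` mimics where its tensors live."""

    idx: int
    device: str = "cpu"

    def to(self, device: str) -> "Sample":
        # Simplification: return a copy instead of moving real tensors.
        return Sample(self.idx, device)


def iter_micro_batches(samples: List[Sample], batch_size: int, device: str) -> Iterator[List[Sample]]:
    """Yield one micro-batch at a time, reloading only that slice onto `device`."""
    num_batches = len(samples) // batch_size
    for batch_idx in range(num_batches):
        chunk = samples[batch_idx * batch_size:(batch_idx + 1) * batch_size]
        # Lazy reload: the rest of the epoch's samples stay where they are.
        yield [sample.to(device) for sample in chunk]


cpu_samples = [Sample(i) for i in range(6)]
for batch in iter_micro_batches(cpu_samples, batch_size=2, device="cuda"):
    # A real optimize() would stack the batch, precompute, then train here.
    pass
```

Because the generator materialises one slice per iteration, the device-resident footprint in this sketch is bounded by a single micro-batch rather than the whole epoch.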
The KL paths (use_ref_parameters / use_ema_parameters per timestep) and all loss / backward / optimizer logic are unchanged in body and order. Made-with: Cursor * [trainer] feat: wire sample() loop CPU offload across all trainers Inserts self._maybe_offload_samples_to_cpu(sample_batch) into every trainer's sample() loop, immediately after adapter.inference() and BEFORE both samples.extend() and reward_buffer.add_samples(): sample_batch = self.adapter.inference(...) self._maybe_offload_samples_to_cpu(sample_batch) # synchronous D2H samples.extend(sample_batch) self.reward_buffer.add_samples(sample_batch) Affected sites (5 total): - GRPOTrainer.sample() (trainers/grpo.py) - GRPOGuardTrainer.sample() (trainers/grpo.py) - DiffusionNFTTrainer.sample() (trainers/nft.py) - AWMTrainer.sample() (trainers/awm.py) - DPOTrainer.sample() (trainers/dpo.py) Why BEFORE add_samples (not after): reward_buffer.add_samples() in the async-reward path records a CUDA sync_event and dispatches workers that read sample.image / sample.video / etc. Calling the offload BEFORE add_samples guarantees the recorded event captures "D2H complete + data ready on CPU"; workers wait on the event and then deterministically see CPU-resident samples. Inverse order would race the worker thread's getattr against the main thread's in-place setattr that BaseSample.to('cpu') performs. Behaviour gating: The helper short-circuits when training_args.offload_samples_to_cpu is False (the default), so the entire pipeline is wired but inert. Setting the flag to True in any trainer YAML now activates the producer side (D2H here), and the previously-landed pieces handle the consumer side: * commit 2: reward_processor moves the CPU input dict to model.device via move_tensors_to_device. * commits 4 & 5: optimize() loops lazily reload [sample.to(device) for sample in slice] per micro-batch. 
End-to-end VRAM saving (roughly num_batches_per_epoch x per_batch_size of sample tensors) is unlocked for the first time in this commit. Wan video YAMLs that opt in (commit 7) will exercise it. Evaluate paths are intentionally NOT touched -- eval samples are usually small and one-shot, and eval logging may rely on tensors being on the adapter device. Made-with: Cursor * [examples] feat: enable offload_samples_to_cpu for Wan video models Adds `offload_samples_to_cpu: true` to all 13 Wan video example configs (GRPO LoRA / Full and NFT LoRA / Full across Wan2.1 and Wan2.2, T2V / I2V / V2V variants). Inserted next to `enable_gradient_checkpointing` so the two memory switches sit together. Why required for video models: per-sample tensors (all_latents, condition videos, image_embeds, ...) are GB-scale on Wan; without the offload, sample()/optimize() OOMs as soon as num_batches_per_epoch > 1. The plumbing wired in commits 1-6 is now actually exercised on these configs. Files (13 total): examples/grpo/lora/wan21_t2v.yaml examples/grpo/lora/wan21_i2v.yaml examples/grpo/lora/wan21_v2v.yaml examples/grpo/lora/wan22_t2v.yaml examples/grpo/lora/wan22_i2v.yaml examples/grpo/full/wan21_t2v.yaml examples/grpo/full/wan21_i2v.yaml examples/grpo/full/wan22_t2v.yaml examples/grpo/full/wan22_i2v.yaml examples/nft/lora/wan21_t2v.yaml examples/nft/lora/wan21_i2v.yaml examples/nft/lora/wan22_t2v.yaml examples/nft/full/wan22_t2v.yaml Non-video model YAMLs are intentionally not touched in this commit; moderate-VRAM-pressure image models (Flux2, Qwen-Image-Edit-Plus) get an explicit `false` + pros/cons comment in commit 8 so users see the option as a documented decision point, while small/standard image models (FLUX1 / SD3 / Qwen-Image / Z-Image / DPO / etc.) rely on the code default `False` to avoid YAML noise. 
Made-with: Cursor * [examples] docs: expose offload_samples_to_cpu option in Flux2 and Qwen-Image-Edit-Plus configs Adds an explicit `offload_samples_to_cpu: false` to 13 example configs (11 Flux2 variants + 2 Qwen-Image-Edit-Plus) preceded by a multi-line comment that documents the parameter, its pros/cons, and the conditions under which a user should flip it to true: # offload_samples_to_cpu: CPU-offload sample tensor fields between # sample() and optimize() to reduce GPU peak memory. # Pros (true): saves N x per_batch_size GPU memory ... no correctness # or convergence impact. # Cons (true): adds ~100ms/epoch H2D in reward path; tiny per-batch # H2D in optimize (<5ms each). # Recommended (true) for higher resolutions, larger batch sizes, or # any sample()/optimize() OOM. Default false works for current # example settings. offload_samples_to_cpu: false Tier rationale (3-tier YAML strategy, deviating intentionally from the strict ALL-YAML rule in .cursor/rules/examples-yaml-sync.mdc): T1 (commit 7, Wan video, 13 YAMLs): explicit `true` -- required to avoid OOM. T2 (this commit, Flux2 + Qwen-Edit, 13 YAMLs): explicit `false` + pros/cons comment -- moderate VRAM pressure, decision left to the user with documentation right next to it. T3 (untouched, 23 YAMLs of FLUX1 / SD3 / Qwen-Image / Z-Image / DPO / AWM-non-Flux2 / template): no field added, sane defaults via the code-level default. The field is documented in the upcoming topics/sample_lifecycle.md (commit 9). This three-tier policy keeps T3 YAMLs noise-free while making T1/T2 both behaviour-correct (T1) and discoverable (T2). 
Files (13 total): examples/grpo/lora/flux2_t2i.yaml examples/grpo/lora/flux2_i2i.yaml examples/grpo/lora/flux2_klein.yaml examples/grpo/lora/flux2_klein_base.yaml examples/grpo/full/flux2_t2i.yaml examples/grpo/full/flux2_i2i.yaml examples/grpo/full/flux2_klein.yaml examples/grpo/full/flux2_klein_base.yaml examples/awm/lora/flux2_klein_base.yaml examples/nft/lora/flux2_klein_base.yaml examples/nft/full/flux2_klein_base.yaml examples/grpo/lora/qwen_image_edit_plus.yaml examples/grpo/full/qwen_image_edit_plus.yaml Made-with: Cursor * [docs] feat: add sample lifecycle topic and README routing New leaf doc .agents/knowledge/topics/sample_lifecycle.md (per .cursor/rules/agents-docs-maintenance.mdc), covering: - default sample lifecycle with the offload pipeline - the offload_samples_to_cpu switch and its effect at each stage - the 3-tier example YAML adoption matrix (T1 Wan video / T2 Flux2 + Qwen-Image-Edit-Plus / T3 the rest), including the rationale for intentionally deviating from the strict ALL-YAML rule in .cursor/rules/examples-yaml-sync.mdc - reward path device responsibility (move_tensors_to_device contract) - async-reward race-free argument for offload-before-add_samples order - NFT/AWM per-batch precompute interleave summary (memory savings, train-inference consistency, RNG-order caveat) - extra_kwargs device asymmetry caveat (rewards on CPU, advantage on GPU, neither moved by BaseSample.to) - Cross-refs to constraints #11, #14, #15, train_inference_consistency item #4, dtype_precision README.md routing table gains a corresponding row pointing at the new topic with explicit triggers (sample/optimize data flow changes, debugging sample/optimize OOM, adding high-resolution / video example configs). The new topic is the authoritative reference for the offload_samples_to_cpu switch -- T3 YAMLs do not list the field, so users discover it through this routing entry. 
This is the documentation closing the 9-commit refactor of the sample CPU offload + lazy reload pipeline (commits 1 through 8). No code changes in this commit. Made-with: Cursor * [trainer] docs: simplify NFT/AWM optimize() docstrings The previous docstrings (introduced in commit 49e7b49) inlined the full memory analysis, train-inference consistency proof, and RNG-order caveat -- ~30 lines each. All of that material lives in the authoritative .agents/knowledge/topics/sample_lifecycle.md (commit 8dc4a78), so the docstrings now keep only the essential "what": - one-line summary - per-batch interleave shape (3-step pipeline) - AWM-specific note on decoupled sampling/training timesteps - pointer to topics/sample_lifecycle.md for "why" Net change: -33 lines across the two files; bodies unchanged. Made-with: Cursor * [ltx2] fix: handle batched per-sample timestep in adapter forward() `t.expand(batch_size * 2)` raised RuntimeError when forward() was called during training with `t` of shape (B,) (distinct per-sample timesteps from `batch['timesteps'][:, ti]`). `expand` cannot stretch a non-singleton dim from B to 2B. Normalize `t` to (B,) at function entry with fail-fast shape validation, then use `torch.cat([t, t])` for the CFG-doubled batch (matching the `torch.cat([lat, lat])` ordering of `[t0..tB-1, t0..tB-1]`). Accepts 0-D scalar (inference), (1,) singleton, and (B,) per-sample inputs; other shapes raise an informative ValueError. Applied identically to LTX2_T2AV_Adapter and LTX2_I2AV_Adapter. Made-with: Cursor * [data_utils] fix: stabilize preprocess cache key via deep signature collection compute_cache_path previously hashed ALL preprocess_kwargs — including training-infrastructure fields like num_batches_per_epoch and gradient_accumulation_steps that leak through **training_args unpacking and filter_kwargs pass-through. 
These fields have non-deterministic values across launches (world-size-derived, __post_init__ timing, etc.), causing merged_cache_path to differ even with identical YAML and force_reprocess=False. Fix: add _select_cache_relevant_kwargs() which uses "deep signature collection" — it inspects the named parameters of preprocess_func AND (when preprocess_func accepts **kwargs and is a bound adapter method) the named parameters of all encode_* forwarding targets on the same adapter instance. Only kwargs whose key appears in this union are included in kwargs_hash. Training-only fields that no encoder declares are excluded. Safety: over-hash (hashing an encoder param that doesn't run at runtime) is harmless; under-hash (missing a param that affects output) is prevented by collecting from all four encoder methods regardless of runtime data presence. Made-with: Cursor * [examples] switch LoRA configs from DDP to DeepSpeed ZeRO-2 DeepSpeed ZeRO-2 provides optimizer-state sharding with negligible overhead, making plain multi_gpu (DDP) redundant for LoRA training. Made-with: Cursor * [examples] docs: note LTX-2.3-Diffusers as an option in LTX2 configs Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [examples] fix: enable offload_samples_to_cpu for LTX2 video configs Matches Wan video-model configs. Without this, per-sample audio+video tensors are GB-scale and sample()/optimize() OOMs at num_batches_per_epoch > 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [examples] docs: rewrite stale Qwen attn_backend comment for LTX2 Replace misleading "for Qwen-Image Series" comment with a model-agnostic description of available backend options. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [models/ltx2,rewards] style: apply black and isort to new PR files Cosmetic-only changes (import reorder, string-quote normalization, long-line wrapping). 
Scoped to the 5 files this PR adds; existing unclean files on main are out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix dtype source * Update submodule * [models,docs] refactor: align CFG handling across all adapters with forward-stage warning Ensure every CFG-capable adapter follows a consistent two-stage pattern: 1. encode_prompt: derive do_classifier_free_guidance from guidance_scale (>1.0 standard, >0.0 for Z-Image); default negative_prompt to "" when None. 2. forward: if guidance_scale > threshold but negative_prompt_embeds is None, emit logger.warning and gracefully fallback to the no-CFG path. Adapter-specific changes: - flux2_klein: extract do_classifier_free_guidance variable in _forward - sd3_5: add forward warning; migrate to setup_logger; drop unused import - z_image: add forward warning (threshold > 0.0) - wan2_t2v/v2v/i2v: add forward warning; unify negative_prompt expansion - qwen_image: add guidance_scale to encode_prompt; add forward warning - qwen_image_edit_plus: add guidance_scale to encode_prompt; rename true_cfg_scale/do_true_cfg to guidance_scale/do_classifier_free_guidance; add _forward warning - ltx2_t2av/i2av: add forward warning for multi-guidance (video + audio) Document the CFG convention in .agents/knowledge/topics/adapter_conventions.md with reference implementation, model-specific extensions table, and gotcha #7. 
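The two-stage CFG convention above could look roughly like this. The function names are hypothetical and only illustrate the decision logic (threshold comparison, empty-string default, warn-and-fallback); they are not taken from any adapter.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def derive_cfg_flag(guidance_scale: float, threshold: float = 1.0) -> bool:
    """Stage 1 (encode_prompt): CFG is on when guidance_scale exceeds the threshold."""
    return guidance_scale > threshold


def default_negative_prompt(negative_prompt: Optional[str]) -> str:
    """Stage 1: a missing negative prompt defaults to the empty string."""
    return "" if negative_prompt is None else negative_prompt


def resolve_forward_cfg(
    guidance_scale: float,
    negative_prompt_embeds: Optional[Any],
    threshold: float = 1.0,
) -> bool:
    """Stage 2 (forward): warn and fall back to the no-CFG path if negatives are missing."""
    do_cfg = derive_cfg_flag(guidance_scale, threshold)
    if do_cfg and negative_prompt_embeds is None:
        logger.warning(
            "guidance_scale=%s requests CFG but negative_prompt_embeds is None; "
            "falling back to the no-CFG path.",
            guidance_scale,
        )
        return False
    return do_cfg


standard = resolve_forward_cfg(3.5, negative_prompt_embeds=[0.1])   # CFG active
fallback = resolve_forward_cfg(3.5, negative_prompt_embeds=None)    # warns, no CFG
z_image = derive_cfg_flag(0.5, threshold=0.0)                       # Z-Image style threshold
```

Keeping the threshold a parameter lets one helper cover both the `> 1.0` adapters and the `> 0.0` Z-Image case mentioned in the commit.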
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [examples,docs,agents] refactor: restructure examples, add LTX-2 docs, sync agent knowledge - Restructure examples/ to algorithm/ft/model/variant.yaml with examples/README.md - Add LTX-2/2.3 to README (News, model table, install note) - Add .scratch/ constraint for agent temp files (#28), examples convention (#29) - Sync agent knowledge: GroupDistributedSampler in samplers.md, LTX2 + RationalRewards in architecture.md - Clean up .docs/ltx2-research/ dev artifacts - Update LTX2 configs: guidance_scale=1.0, comment out attn_backend Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jayce-Ping
added a commit
that referenced
this pull request
Jun 14, 2026
Resync .agents/, .cursor/, guidance/, AGENTS.md and CLAUDE.md with the current code after plugin growth (9 trainers, 14 model adapters, 13 reward models). Fixes registry drift, wrong config/API facts and broken cross-references found in a full audit. - architecture.md/AGENTS.md: add diffusion-opd trainer, clap/imagebind/ geneval rewards, Bagel/LTX2 models; fix RationalRewards* class names - constraints.md: evaluate() is concrete (not abstract); index #28-29; paradigm (#7) and training-args (#16) lists; de-numbered line refs - philosophy.md: Accelerate (DDP/DeepSpeed ZeRO-1-2/FSDP) backend; fix #27 ref - guidance: scheduler.* config keys, real sample()/compute_advantages snippets, GenEval metadata convention, audio reward param, Bagel link - skills: model_name_or_path, default_target_modules, data.datasets, rewards-as-list, 9 trainers; CLAUDE.md imports AGENTS.md to avoid drift - topics/samplers.md: correct _resolve_sampler_type + AdvantageProcessor group_distributed paths; parity_testing set_scheduler_timesteps - hparams/model_args.py: model_type Literal now matches registry keys Co-authored-by: Cursor <cursoragent@cursor.com>
Jayce-Ping
added a commit
to Jayce-Ping/Flow-Factory-Private
that referenced
this pull request
Jul 2, 2026
* Move `samples` * Update trajectory collector * update extra_callbacks selective * Remove undefined log_prob * Fix grpo guard
Jayce-Ping
added a commit
to Jayce-Ping/Flow-Factory-Private
that referenced
this pull request
Jul 2, 2026
… support (X-GenGroup#118) * docs: comprehensive analysis of samples, rewards, and video adapter architecture Add detailed documentation covering: - Complete BaseSample dataclass fields and sample type hierarchy (T2ISample, T2VSample, etc.) - RewardModelOutput and reward model abstract classes (PointwiseRewardModel, GroupwiseRewardModel) - WAN2 text-to-video adapter implementation and video generation pipeline - Sample canonicalization and unique_id computation via SHA256 hashing - Video format specifications (Tensor(T, C, H, W) with frame/dimension constraints) - Reward model input/output modality handling and tensor input configuration - Data flow examples from generation through sampling to reward computation - Quick reference cards and integration patterns for implementing new adapters Files created: - START_HERE.md: Quick navigation guide - ANALYSIS_REPORT.md: Comprehensive technical analysis - QUICK_REFERENCE.md: Copy-paste templates and patterns - MODALITY_FLOW.md: Complete data flow walkthrough - CODEBASE_EXPLORATION.md: File structure and discovery process - DOCUMENTATION_INDEX.md: Structured index of all findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move research docs to .docs/ltx2-research/ Move auto-generated analysis documents out of project root into a dedicated temporary folder for LTX2 integration research. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [utils] feat: add audio utility module with type aliases, standardization, and loading Add `utils/audio.py` following the exact pattern of `utils/image.py` and `utils/video.py`. 
Provides a complete audio waveform toolkit: - Type aliases: `AudioSingle` (C,T), `AudioBatch` (B,C,T) - Validation: `is_audio()`, `is_audio_batch()` - Loading/saving: `load_audio()`, `save_audio()` with 3-tier backend fallback (torchaudio → soundfile → stdlib wave) - Conversion: `audio_to_tensor()`, `audio_to_numpy()`, `convert_audio()` for resampling and mono/stereo conversion - Standardization: `standardize_audio_batch()` with output_type='pt'|'np' - Hashing: `hash_audio()`, `hash_audio_list()` with int16 quantization Design conventions follow diffusers/audiocraft: - Channel-first (C, T) tensor layout - [-1.0, 1.0] float32 value range - Channel conversion: downmix=mean, upmix=repeat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [samples] feat: add audio field to BaseSample and T2AVSample class - Add `audio: Optional[torch.Tensor]` field to BaseSample with automatic promotion from 1D (T,) to 2D (C, T) via audio_to_tensor() - Add T2AVSample(BaseSample) for text-to-audio-video generation tasks - Update __init__.py exports The audio field follows the same pattern as image/video: stored without batch dimension, standardized in __post_init__, supports stack/to_dict. Fully backward compatible — defaults to None. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [data] feat: support audio loading and preprocessing in dataset pipeline - Add `audio_dir` field to DataArguments (defaults to 'audios' subfolder) - Add audio loading block in GeneralDataset._preprocess_batch() that mirrors the existing video loading pattern: detect 'audio' column in JSONL, load via load_audio(), pass to preprocess_func(audios=...) - Add 'audios' to PREPROCESS_KEYS for metadata exclusion - Pass audio_dir through fn_kwargs to .map() call The audio_dir parameter flows automatically through loader.py via filter_kwargs — no changes needed in loader.py. Fully backward compatible: datasets without audio columns are unaffected. 
JSONL format: {"prompt": "...", "audio": "file.wav"} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models] feat: add encode_audio interface and audio_vae support to BaseAdapter - Add `encode_audio()` default method that returns None — non-abstract, so existing adapters need no changes - Add `audio_vae` property with getter and setter (mirrors vae pattern) - Update `preprocess_func()` to accept `audios` parameter and route it through `encode_audio()` in the same loop as prompt/image/video - Update `_freeze_vae()` to also freeze audio_vae when present, keeping `_freeze_components()` clean Fully backward compatible: encode_audio returns None by default, audio_vae returns None when pipeline has no audio_vae component. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [rewards] feat: add audio parameter to reward model interfaces - Add `audio: Optional[List[torch.Tensor]]` parameter to both PointwiseRewardModel.__call__ and GroupwiseRewardModel.__call__, positioned after `video` and before `condition_images` - Add 'audio' to RewardProcessor.MEDIA_FIELDS - Add audio branch in _convert_media_format(): tensor passthrough when use_tensor_inputs=True, numpy conversion otherwise Audio is always tensor-based (no PIL equivalent). Existing reward models (PickScore, CLIP, etc.) are unaffected — their concrete __call__ signatures don't include `audio`, so filter_kwargs strips it automatically with zero overhead. 
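To illustrate why existing reward models are unaffected by the new `audio` argument, here is a rough sketch of signature-based kwarg filtering. `filter_kwargs_sketch` and `ImageOnlyReward` are hypothetical; the project's own `filter_kwargs` may differ in signature and details.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional


def filter_kwargs_sketch(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Keep only kwargs that `func` names; pass everything through if it takes **kwargs."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class ImageOnlyReward:
    """Hypothetical reward model whose __call__ predates the audio parameter."""

    def __call__(self, prompt: List[str], image: Optional[list] = None) -> List[float]:
        # Toy score: prompt length, just to have a return value.
        return [float(len(p)) for p in prompt]


reward = ImageOnlyReward()
batch_input = {"prompt": ["a cat"], "image": None, "audio": [object()]}
accepted = filter_kwargs_sketch(reward.__call__, **batch_input)
scores = reward(**accepted)
```

In this sketch `audio` never reaches the model because its `__call__` does not declare it, so older reward models keep working without changes.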
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update Step 6 sub-plan to v4 with verified diffusers 0.38.0.dev0 API Key corrections from runtime verification: - Connectors use additive mask, not padding_side - Transformer forward has no sigma/audio_timestep/STG/modality params - CFG is velocity-space with [uncond, cond] chunk order - Compression ratios are instance attributes, not class properties - Audio params from pipeline attributes, not transformer config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clean up obsolete research docs, consolidate plan Remove 8 early exploration documents that have been fully superseded by actual code implementation and the two remaining plan files: - COMMIT_PLAN.md: overall progress tracker (updated with final status) - STEP6_SUBPLAN.md: detailed LTX2 adapter plan (v4, API-verified) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: scaffold LTX2 adapter with sample dataclass and pipeline loading Add LTX2 text-to-audio-video adapter scaffold: - LTX2Sample(T2AVSample): dataclass with audio trajectory fields (audio_all_latents, audio_latent_index_map), connector embedding fields for both video/audio streams, and negative prompt fields for CFG during training - LTX2_T2AV_Adapter(BaseAdapter): skeleton with: - load_pipeline(): LTX2Pipeline.from_pretrained with low_cpu_mem_usage=False - _create_audio_scheduler(): ODE-only FlowMatchEulerDiscreteSDEScheduler (separate instance to avoid step_index collision with video) - default_target_modules: 28 Linear layers per block, verified against LTX2VideoTransformerBlock.named_modules() - preprocessing_modules: ['text_encoders', 'connectors'] - inference_modules: ['transformer', 'vae', 'audio_vae', 'connectors', 'vocoder'] - Stub methods for encode/decode/forward/inference (NotImplementedError) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: implement 
encode_prompt and decode_latents for LTX2 encode_prompt(): - Delegates to pipeline.encode_prompt() for Gemma3 all-layer encoding + _pack_text_embeds normalization - Passes through connectors with additive attention mask (1 - binary_mask) * -1e6, additive_mask=True - Splits [negative, positive] connector outputs for CFG - Returns prompt_ids + video/audio connector embeddings + masks decode_latents(): - Video: unpack → denormalize → optional timestep conditioning with decode noise injection → VAE decode → postprocess - Audio: denormalize → unpack → audio_vae decode → vocoder (denormalize BEFORE unpack — order differs from video!) - All operations match pipeline source L1172-1218 exactly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: implement forward() with CFG and dual scheduler steps Single denoising step matching pipeline L1097-1154: 1. Prepare CFG inputs: duplicate latents + concat [neg, pos] embeddings 2. Joint transformer forward with cache_context("cond_uncond") 3. CFG in velocity-space: uncond + gs * (cond - uncond), with optional guidance_rescale 4. Video: SDE scheduler step (stochastic, with log_prob for RL) 5. Audio: ODE scheduler step (deterministic, no log_prob) 6. Attach audio_next_latents on video output for trajectory tracking RoPE coords are computed on demand if not cached from inference loop. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: implement inference loop and register ltx2_t2av adapter inference(): - Full denoising loop following pipeline L908-1226 - Encode prompts (with fallback to pre-encoded inputs) - Compute latent dimensions: video (32x spatial, 8x temporal), audio (4x mel, 4x temporal, sr=16kHz, hop=160) - Prepare video + audio latents via pipeline.prepare_latents/audio_latents - Timestep shift with LTX2-specific mu (base_seq=1024, shift=0.95-2.05) - Positional coords via transformer.rope / audio_rope - Dual trajectory collection: video (for RL) + audio (for reconstruction) - Decode both modalities and construct LTX2Sample per batch element Registry: - Add 'ltx2_t2av' entry mapping to LTX2_T2AV_Adapter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: improve type safety and code cleanup for LTX2 adapter - Add self.scheduler type declaration (consistent with all other adapters) - Replace SDESchedulerOutput with FlowMatchEulerDiscreteSDESchedulerOutput - Change forward() return type to Tuple instead of dynamic attribute - Fix Optional parameter annotations for forward() signature - Remove unused imports (Any, DISTILLED_SIGMA_VALUES) and variables - Replace unicode arrows with ASCII in comments for encoding safety Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN with completed steps summary and next steps (7-9) Steps 1-6 + type cleanup are all committed. 
Remaining work: - Step 7: Design multi-modal forward() return pattern for optimize() - Step 8: Example YAML configs (GRPO, NFT, AWM) - Step 9: Audio-video dataset integration for testing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: promote num_frames and frame_rate to explicit LTX2Sample fields Move num_frames and frame_rate from extra_kwargs to explicit dataclass fields on LTX2Sample, consistent with height/width on BaseSample. Add num_frames to _shared_fields (shared across batch, not stacked). Keep duration_s in extra_kwargs as a derived value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] refactor: unified latent interface for forward() Redesign forward() to accept concatenated video+audio latents as a single tensor (B, video_seq + audio_seq, C). Internally splits by video_seq_len, runs the joint transformer, steps video SDE and audio ODE schedulers separately, then concatenates next_latents back into a unified output. 
This makes the trainer interface identical to single-modality adapters: - forward() returns a single FlowMatchEulerDiscreteSDESchedulerOutput - Trainers access output.next_latents and output.log_prob directly - No trainer changes needed for multi-modal generation Key changes: - forward(): accept unified latents, split/cat internally, return single output - inference(): cat video+audio before loop, use single latent_collector - LTX2Sample: replace audio_all_latents with video_seq_len split point - Remove Tuple return type, keep dual scheduler (video SDE + audio ODE) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN — Step 7 complete, Steps 8-9 remaining Step 7 resolved via unified latent interface: - 7a: Type safety cleanup (FlowMatchEulerDiscreteSDESchedulerOutput) - 7b: Promote num_frames/frame_rate to explicit LTX2Sample fields - 7c: Unified forward() — cat(video,audio) input, single output, dual scheduler internal Remaining: Step 8 (example YAML configs), Step 9 (test dataset) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN to reflect current state, remove obsolete STEP6_SUBPLAN - Step 7 fully completed (7a-7c): type cleanup, explicit fields, unified latents - Steps 8-9 remain: example configs + test dataset - Add deferred features table with diffusers source availability - Delete STEP6_SUBPLAN.md (diverged from implementation, no longer useful) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: correct deferred features — STG, prompt enhancement, x0-guidance all exist in diffusers All three features are available in installed diffusers 0.38.0.dev0: - STG: extra transformer forward with spatio_temporal_guidance_blocks - Modality Isolation: extra forward with isolate_modalities=True - Prompt Enhancement: Gemma3 text_encoder.generate() with system prompt - x0-space guidance: convert_velocity_to_x0/convert_x0_to_velocity Note: current adapter uses 
velocity-space CFG; x0-space is prerequisite for STG and Modality Isolation Guidance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [models/ltx2] feat: align forward() with official x0-space multi-guidance pipeline Rewrite the guidance section to match official diffusers pipeline_ltx2.py: 1. x0-space guidance: convert velocity predictions to x0-space, compute all guidance deltas (CFG + STG + Modality Isolation) in x0-space, then convert back to velocity. This replaces the previous velocity-space CFG. 2. STG (Spatio-Temporal Guidance): optional extra transformer forward with spatio_temporal_guidance_blocks to perturb specific blocks. Separate stg_scale / audio_stg_scale for video and audio. 3. Modality Isolation Guidance: optional extra transformer forward with isolate_modalities=True (disables A2V/V2A cross-attention). Separate modality_scale / audio_modality_scale. 4. Prompt Enhancement: inference() supports system_prompt parameter to rewrite prompts via Gemma3 text_encoder.generate(). 5. LTX-2.3 compatibility: pass sigma=timestep and use_cross_timestep to all transformer forward calls. 6. Independent audio guidance: separate audio_guidance_scale, audio_guidance_rescale, audio_stg_scale, audio_modality_scale (all default to their video counterparts). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update COMMIT_PLAN with refined next steps (8a/8b/9) Restructured remaining steps: - 8a: Inference alignment refinements (validation, num_frames rounding, coord pre-duplication, distilled sigmas, embed concatenation) - 8b: Example YAML configs (GRPO/NFT/AWM x lora/full) - 9: VGGSound-50k test dataset integration Updated design decisions to reflect x0-space guidance, sigma-based conversion, and adapter design philosophy (inference=__call__, forward=step). Removed obsolete deferred features (STG, prompt enhancement, x0-guidance now implemented). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [models/ltx2] refactor: align inference() with official pipeline

- Add _check_inputs(): validate height/width divisibility, STG block spec, auto-round num_frames to VAE-temporal-compatible value
- Pre-duplicate RoPE coords for CFG before denoising loop (official L1201)
- Pre-concatenate [neg, pos] connector embeds before loop to avoid re-catting every step; forward() receives _cfg_prepared flag
- Extract positive-only embeds from pre-catted tensors for STG/modality passes when _cfg_prepared=True

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [examples] feat: add LTX2 T2AV GRPO+LoRA example config

Verified with ff-train: config parses correctly, model architecture resolves to ltx2_t2av, all parameters (resolution, num_frames, frame_rate, audio_dir, guidance_scale, LoRA settings) are properly propagated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update config and dataset

* [logger,samples] feat: mux audio into video when logging T2AV samples

T2AV samples (e.g. LTX2) carry both video and audio, but the logging pipeline silently dropped the audio track. This commit adds audio-video muxing so wandb/other loggers receive a single MP4 with sound.
Changes:
- BaseSample: add `audio_sample_rate` field alongside `audio`
- LTX2 adapter: populate `audio_sample_rate` from pipeline vocoder config
- LogVideo: add optional `audio`/`audio_sample_rate` fields; when present, `get_value('mp4')` muxes H.264 video + AAC audio via PyAV (mirrors diffusers' encode_video); falls back to silent MP4 if PyAV unavailable
- LogFormatter: add T2AVSample dispatch → `_process_t2av_samples()` that creates LogVideo with audio attached and correct fps from frame_rate

Made-with: Cursor

* Fix _resolve_component_names

* [models/ltx2] fix: align connectors call with current diffusers API

Pass binary attention mask directly to LTX2TextConnectors.forward() instead of pre-computing additive mask. The current diffusers version handles binary-to-additive conversion internally and no longer accepts the `additive_mask` keyword argument.

Made-with: Cursor

* update

* Fix

* [models] refactor: unify CFG control via guidance_scale across all adapters

Remove explicit `do_classifier_free_guidance` parameter from encode_prompt, inference, and forward signatures. Instead, derive the CFG flag internally from `guidance_scale` (>1.0 standard, >0.0 for Z-Image), ensuring data_preprocessing and inference stages always use the same CFG decision.

Affected adapters: SD3.5, Z-Image, Wan T2V/I2V/V2V, LTX2 T2AV, Flux2 Klein.

Made-with: Cursor

* update

* Fix cfg

* Update config

* [models/ltx2] refactor: inline Gemma3 encoding into adapter encode_prompt

Eliminate delegation to pipeline.encode_prompt() and redundant tokenizer call by inlining _get_gemma_prompt_embeds logic into a new _encode_text() helper. This produces prompt_ids and negative_prompt_ids from the same tokenization pass used for embeddings.

Made-with: Cursor

* [samples] refactor: include negative_prompt in unique_id and extract _hash_id_fields

Add negative_prompt/negative_prompt_ids to _id_fields and hash computation so samples with different negative prompts are correctly distinguished.
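A minimal sketch of why the negative prompt has to enter the hash, assuming a SHA256-based ID truncated to `num_bytes`. The function name, the length-prefixed field encoding, and the presence byte are illustrative assumptions, not the repository's actual `_hash_id_fields` implementation.

```python
import hashlib


def compute_unique_id_sketch(prompt, negative_prompt=None, num_bytes=16):
    """Hypothetical stand-in: hash prompt and negative prompt into one integer ID."""
    hasher = hashlib.sha256()
    for field in (prompt, negative_prompt):
        # A presence byte keeps None distinct from the empty string.
        hasher.update(b"\x00" if field is None else b"\x01")
        data = b"" if field is None else field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    # Truncate the digest; 8 bytes gives a 64-bit value, 16 bytes a 128-bit one.
    return int.from_bytes(hasher.digest()[:num_bytes], "big")


same_prompt_a = compute_unique_id_sketch("a cat", "blurry")
same_prompt_b = compute_unique_id_sketch("a cat", "low resolution")
```

With only the prompt hashed, `same_prompt_a` and `same_prompt_b` would collide; with both fields in the hash they differ.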
Extract _hash_id_fields(hasher) to eliminate duplicated prompt hashing in ImageConditionSample and VideoConditionSample. Also parameterize digest length via num_bytes (default 16 = 128-bit).

Made-with: Cursor

* Update config

* Fix bytes

* Fix audio_sample_rate

* [models/ltx2,logger,utils] fix: address PR review (input validation, mux guard, ndim check)

- Extend _check_inputs with prompt/embedding presence and CFG+negative consistency validation (Copilot comments #3, #5)
- Narrow audio mux guard to format == 'mp4' (comment #6)
- Add ndim validation in standardize_audio_batch (comment #7)

Made-with: Cursor

* [models/ltx2,utils] feat: integrate prompt enhancement with RNG isolation

Add isolated_rng context manager to utils/base.py for safe global RNG seeding. Implement _enhance_prompt_batch in LTX2 adapter using official pipeline.enhance_prompt with deterministic seed and RNG state isolation, preventing seed leakage into downstream noise sampling. Enhancement is opt-in via system_prompt config (null=disabled, "default"=Lightricks prompt). Clean up placeholder enhancement code from inference().

Made-with: Cursor

* Reorder unique_id priority

* [reward] feat: add CLAP and ImageBind audio reward models

Add two audio-specific reward models for LTX2 audio-video generation:
- CLAPRewardModel: audio-text alignment via LAION CLAP (48 kHz mono, cosine similarity, zero new deps via transformers.ClapModel)
- ImageBindRewardModel: audio-video semantic alignment via Meta ImageBind (16 kHz mel-spectrogram, video spatial crops, multi-mode scoring)

Register both in reward registry, replace PickScore with CLAP + ImageBind in ltx2_t2av.yaml config, and update COMMIT_PLAN.md Step 11 as done.
Made-with: Cursor

* [docs] feat: add flat inheritance rules for adapters and trainers

- Constraint #11: trainers MUST inherit from BaseTrainer directly (only GRPOGuardTrainer → GRPOTrainer sanctioned)
- Constraint #12: adapters MUST inherit from BaseAdapter directly; shared logic uses helpers, code duplication, or mixins
- Update architecture.md and cursor rule to reference both rules

Made-with: Cursor

* [docs] feat: add I2AV adapter and audio reward research plans

- I2AV_PLAN: LTX2 Image-to-Audio-Video adapter design (BaseAdapter flat hierarchy, conditioning mask, first-frame preservation)
- AUDIO_REWARD_PLAN: CLAP and ImageBind reward model integration
- COMMIT_PLAN: update GPU validation status and prompt enhancement analysis

Made-with: Cursor

* chore: update diffusers submodule

Made-with: Cursor

* [docs] feat: add two-layer sample hierarchy constraint

Model-specific samples must inherit from task-level samples (e.g. LTX2I2AVSample → I2AVSample), never from other model-specific samples. Updated constraint #14, architecture.md, and base-class-contract rule.

Made-with: Cursor

* [docs] update: expand I2AV plan with full implementation details

- Complete LTX2I2AVSample dataclass with all duplicated LTX2 fields
- Code duplication strategy (no _common.py shared module)
- encode_image with condition_image_size (flux2/qwen pattern)
- Dual-path inference: raw images or pre-encoded condition_images
- Full forward()/inference() structure with conditioning mask semantics
- enhance_prompt I2AV multimodal difference documented

Made-with: Cursor

* [reward] fix: add transformers v5 compatibility for PickScore get_*_features() API

In transformers >=5.0, get_text_features()/get_image_features() return BaseModelOutputWithPooling instead of a tensor. Add _extract_feature_tensor() helper to handle both v4 (tensor) and v5 (ModelOutput) return types.
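The v4/v5 handling above can be sketched with duck typing. This is an illustration only: it assumes the pooled features live on a `pooler_output` attribute of the output object, and it uses plain Python stand-ins instead of real tensors and ModelOutput instances. The actual `_extract_feature_tensor()` may read a different field.

```python
from types import SimpleNamespace


def extract_feature_tensor_sketch(output):
    """Return the feature tensor from either return style.

    Assumption: the ModelOutput-style object exposes the pooled features as
    `pooler_output`; anything without that attribute is treated as the
    tensor itself.
    """
    pooled = getattr(output, "pooler_output", None)
    return output if pooled is None else pooled


# Stand-ins: a plain list plays the role of a tensor, and SimpleNamespace
# plays the role of a ModelOutput-like object.
v4_style = [0.1, 0.2]
v5_style = SimpleNamespace(pooler_output=[0.1, 0.2], last_hidden_state=None)

features_from_v4 = extract_feature_tensor_sketch(v4_style)
features_from_v5 = extract_feature_tensor_sketch(v5_style)
```

Both calls hand back the same feature values, so downstream similarity code does not need to know which transformers version produced them.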
Made-with: Cursor

* [samples,logger] feat: add I2AVSample dataclass and logger support

Introduce I2AVSample (Image-to-Audio-Video) task-level sample and its logger handler, combining I2V condition-image table layout with T2AV audio-muxed LogVideo for complete I2AV logging across backends.

Made-with: Cursor

* [adapter,registry] feat: add LTX2 I2AV adapter for image-conditioned audio-video generation

- LTX2_I2AV_Adapter(BaseAdapter) with conditioning_mask for frame-0 preservation
- forward(): CFG-doubles conditioning_mask internally, per-token video timestep masking, scheduler.step on generated frames only
- inference(): dual-path image input (raw PIL via encode_image or pre-encoded tensor)
- _enhance_prompt_batch(): multimodal Gemma3 enhancement with raw PIL images
- _standardize_image_input(): MultiImageBatch flattening for single-condition model
- Registry entry 'ltx2_i2av' and example YAML config

Made-with: Cursor

* [adapter] fix: align MultiImageBatch/MultiVideoBatch type annotations across adapters

Wan2 I2V and Wan2 V2V had correct runtime logic (is_multi_image_batch check in _standardize_*_input) but inference() and encode_image() annotations used ImageBatch/VideoBatch instead of MultiImageBatch/MultiVideoBatch.
Made-with: Cursor

* [docs] update: add nested batch convention to adapter docs

- adapter_conventions.md: document that inference() receives MultiImageBatch/MultiVideoBatch from collator; single-condition adapters must flatten via _standardize_*_input; append Gotcha #5; add cross-refs
- ff-new-model/SKILL.md: extend Pitfall #6 with single-condition handling guidance and back-ref to adapter_conventions.md Gotcha #5

Made-with: Cursor

* [style] fix: apply PR review fixes (logging, section-divider policy, unique_id)

- Replace bare print() with logger.warning() in formatting.py
- Update section-divider rule: allow between methods, forbid inside functions
- Convert in-function decorative dividers to plain numbered comments
- Include negative_prompt in unique_id hash via _hash_id_fields() refactor
- Fix compute_unique_id default to 8 bytes (fits torch.int64)

Made-with: Cursor

* [reward,models/ltx2] fix: CLAP BatchNorm dtype + I2AV forward bugs

- CLAP: load model in float32 (BatchNorm requires it); fix audios->audio deprecation
- I2AV: remove conditioning_mask from _shared_fields to preserve batch dim
- I2AV: ensure next_latents in return_kwargs for frame-slicing logic
- Remove dtype: bfloat16 from CLAP reward config in example YAMLs

Made-with: Cursor

* Update config

* update config

* Update config

* Fix dtype

* Update

* [diffusers] fix: cherry-pick flash_3_varlen_hub mask dtype fix

Cherry-pick commit 0f8a83fa6 from diffusers to support additive (bfloat16) attention masks in _flash_attention_3_varlen_hub. This fixes the ValueError when LTX2 connectors pass non-bool masks.
Made-with: Cursor

* [data_utils] perf: eliminate redundant preprocessed-dataset writes

Each rank's preprocessed Arrow shard is now written exactly once: the orchestrator routes Dataset.map output directly to the final per-rank location via cache_file_name= (under {merged_cache_path}.tmp/_parts/), and the consolidator writes only state.json + dataset_info.json before atomically renaming .tmp -> merged_cache_path. No row data is re-copied during the merge, no duplicate cache lands under ~/.cache/huggingface, and the build-dir sentinel _build_meta.json enables crash recovery for unchanged num_shards while wiping cleanly on num_shards changes.

Single-process and distributed paths are unified through the same flow (N=1 case for single-process); enable_preprocess=False bypasses the consolidate pipeline entirely to preserve pre-refactor behavior. I/O budget per cache build drops from ~4*N*S to ~N*S bytes touched.

Made-with: Cursor

* [docs] update: comprehensive audio + R7 no-op-default encoder docs sweep

Bring agent docs and the developer guides on feat/ltx2-audio-video-support in line with the R6/R7 BaseAdapter contract changes (PR #129) AND extend every audio-aware section to cover the new modality.

Critical contract fixes (mirror Wave A on data-utils-perf so the 0-hit verification gate passes on this branch independently of merge order):
- constraints.md #12: 7 abstract methods -> 4 (load_pipeline, decode_latents, forward, inference); new "Optional encoder overrides (no-op default)" subsection naming all 4 encoders incl. encode_audio; preprocess_func note now explains the audios dispatch and "skip when None" semantics. Trailing "Adapter hierarchy" paragraph also fixed ("7-method contract" -> "4-abstract-method contract" with explanation).
- ff-new-model/SKILL.md: frontmatter description fixed; Phase-1 step-3 mapping gains audio encoder/VAE row; Phase-2 step-3 implementation table reorders so the 4 truly-abstract methods are on top, marks all 4 encoders as Abstract? No (no-op default; override if your model consumes this modality), adds the encode_audio row; Pitfall #6 extended to include audios + the []-for-empty / never-unwrap contract (preserved the existing _standardize_*_input guidance and added a cross-ref to adapter_conventions.md Gotcha #6).
- guidance/new_model.md: Step 4 narrative replaced "three encoding methods" with "Override the encoders your model consumes"; updated the dispatcher pseudocode to enumerate all 4 modalities with the "if encoded is not None" skip; new #### encode_audio block parallel to encode_video (MultiAudioBatch signature, default return None); Step-5 inference signature comment adds "audios: Optional[MultiAudioBatch]" (commented as opt-in); checklist gains explicit encode_video() / encode_audio() items framed as override-only.

Audio-aware sweep (Wave-B-only, lives here because LTX-2 actually consumes audio):
- adapter_conventions.md: Batch Dimension Convention extended to include audios/MultiAudioBatch on inference(); new bullet codifying the multi-media batch homogeneity guarantee; new Gotcha #6 for the []-for-empty / no-unwrap rule (applies symmetrically to images, videos, audios).
- guidance/new_model.md Data Format Conventions: new ### Audio table with audios/condition_audios/audio_features rows and a callout pointing to flow_factory.utils.audio.MultiAudioBatch; cross-cutting batch-boundary callout extended to include encode_audio().
- guidance/workflow.md Stage 1: goal narrative + Input row extended to include "audio files"; new audio-symmetry callout after the Flux.2 example explaining audio_dir is the third optional input handled by _preprocess_batch.
- architecture.md: Stage-1 ASCII box now lists audio + audio_features; Adapter Pattern subsection gains a one-liner that all 4 encoders are no-op by default (override only the modalities your model consumes).
- ff-develop/SKILL.md sec.2 Adapter Hierarchy: appended the R7 design lesson bullet (non-abstract no-op default + opt-in override over @abstractmethod for new modalities; the 4 abstract methods are intentionally minimal).
- fix_patterns.md Recorded Fix Patterns: replaced "(No records yet)" with two full entries using the documented template: R6 multi-modal batch homogeneity and R7 non-abstract encoder defaults; both link back to the actual code locations.

Pure docs change; no Python touched. Verified: zero hits across .agents/ + guidance/ for "7 abstract methods", "7-method contract", "three encoding methods", "Implement the three encoding"; encode_audio + MultiAudioBatch present in every doc per the matrix; ReadLints clean on all 8 touched files.

Made-with: Cursor

* [diffusers] sync: bump submodule to upstream main 77f8cf8bf

Drops the locally cherry-picked commit 620286eb5 ("support ltx-2 type masking in flash_3_hub_varlen") and resets the submodule to the official huggingface/diffusers main HEAD as of 2026-04-18.

Functional consequence: _flash_attention_3_varlen_hub no longer casts non-bool attn_mask via `attn_mask > -1`. Upstream already carries the `isinstance(result, tuple)` defensive unpack, so only the bool-cast is missing. Waiting for upstream to land an equivalent fix; until then, LTX2 paths that pass a non-bool attn_mask to flash-attn-3 varlen-hub may misbehave inside _normalize_attn_mask.

Made-with: Cursor

* [utils] feat: add move_tensors_to_device recursive helper

Adds a shape-agnostic device-move utility in utils/base.py that walks list / tuple / dict containers depth-first, copying torch.Tensor leaves to the target device. Non-tensor leaves (PIL, str, int, np.ndarray) pass through unchanged.
Containers are reconstructed immutably; the input is not modified.

Signature:

    move_tensors_to_device(value, device, max_depth=None)

The optional max_depth bounds recursion (None = unbounded; 0 = only move when value itself is a Tensor; N = walk N levels). Designed for the upcoming reward path device adaptation, but kept as a general utility so future callers (e.g., a future BaseSample.to refactor delegating with max_depth=1) can reuse it.

Pure addition; no consumers in this commit.

Made-with: Cursor

* [reward] refactor: route reward inputs through move_tensors_to_device

Inserts a move_tensors_to_device call between _convert_media_format and model(**batch_input) in three reward computation sites:
- _compute_pointwise_batch
- _compute_groupwise_group
- _compute_groupwise_local (inner per-group loop)

The recursive helper walks list/tuple/dict containers and copies tensor leaves to model.device. The local batch_input dict is reconstructed; sample objects are NOT mutated.

Behavior with current GPU-resident samples: same-device .to() is a no-op, so reward outputs remain bit-identical. The change is defensive prep for the upcoming sample-loop CPU offload (commit 6) where samples will arrive on CPU and reward models still run on their declared device.

The distributed groupwise path (_compute_groupwise_distributed) needs no change: it already passes device=self.accelerator.device to gather_samples, so its inputs are GPU-resident regardless of caller-side device.

Made-with: Cursor

* [hparams,trainer] feat: add offload_samples_to_cpu config and BaseTrainer helper

Adds the configuration switch and the producer-side helper for the upcoming sample CPU-offload + lazy-reload pipeline.

hparams/training_args.py: TrainingArguments gains a new field offload_samples_to_cpu: bool = False placed next to enable_gradient_checkpointing (sibling memory switch). The help string documents the trade-off (D2H per sample + per-reward H2D ~100ms/epoch vs sample/optimize GPU peak reduction).
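For reference, the move_tensors_to_device semantics described in the [utils] commit above (depth-first walk, immutable reconstruction, max_depth bound) can be sketched as follows. This is not the code in utils/base.py: `FakeTensor` is a hypothetical stand-in for torch.Tensor so the sketch runs without a GPU stack, and the `_sketch` suffix marks the function as illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor: only tracks which device it lives on."""
    device: str

    def to(self, device):
        # Returns a new object, like an out-of-place Tensor.to().
        return replace(self, device=device)


def move_tensors_to_device_sketch(value, device, max_depth=None):
    # A tensor leaf is always moved, even when max_depth == 0.
    if isinstance(value, FakeTensor):
        return value.to(device)
    # Depth budget exhausted: leave containers untouched.
    if max_depth is not None and max_depth <= 0:
        return value
    child_depth = None if max_depth is None else max_depth - 1
    # Rebuild containers instead of mutating the input.
    if isinstance(value, dict):
        return {k: move_tensors_to_device_sketch(v, device, child_depth)
                for k, v in value.items()}
    if isinstance(value, list):
        return [move_tensors_to_device_sketch(v, device, child_depth) for v in value]
    if isinstance(value, tuple):
        return tuple(move_tensors_to_device_sketch(v, device, child_depth) for v in value)
    # Non-tensor leaves (str, int, ...) pass through unchanged.
    return value


batch_input = {"latents": [FakeTensor("cuda:0")], "prompt": "a cat"}
moved = move_tensors_to_device_sketch(batch_input, "cpu")
shallow = move_tensors_to_device_sketch(batch_input, "cpu", max_depth=1)
```

`moved` has its nested tensor on the CPU while `batch_input` is untouched; `shallow` stops one level down, so the tensor inside the list stays where it was.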
trainers/abc.py: BaseTrainer gains _maybe_offload_samples_to_cpu(samples), a non-abstract helper that no-ops when the config is False and otherwise walks the sample list calling BaseSample.to('cpu'). The docstring records the ordering invariant required by the consumer trainers (must be called BEFORE reward_buffer.add_samples) and points to RewardProcessor's move_tensors_to_device for the consumer-side H2D.

No call sites yet -- behaviour is unchanged in this commit. The helper is wired into the five trainers' sample() loops in commit 6.

Made-with: Cursor

* [trainer] refactor: lazy per-batch reload in GRPO/GRPO-Guard/DPO optimize()

Replaces the eager pre-stacked sample_batches list with a single per-batch loop that lazily reconstructs each micro-batch:

    for batch_idx in range(num_batches):
        batch_samples = [sample.to(device) for sample in shuffled_samples[...]]
        batch = BaseSample.stack(batch_samples)
        ...

Affected sites:
- trainers/grpo.py GRPOTrainer.optimize()
- trainers/grpo.py GRPOGuardTrainer.optimize()
- trainers/dpo.py DPOTrainer.optimize() (chosen_samples / rejected_samples extraction inside the per-pair-batch loop)

Behaviour-preserving for the current GPU-resident sample buffer: every sample.to(device) is a same-device no-op, so loss values, gradients, and optimizer steps remain bit-identical to HEAD~1. The change is the consumer-side prerequisite for the upcoming sample-loop CPU offload (commit 6): once samples may be CPU-resident, this lazy reload is what keeps optimize()'s GPU footprint bounded by a single micro-batch instead of the full epoch.

NFT/AWM are deliberately not touched in this commit -- their optimize() has an extra eager precompute layer that requires structural restructuring (commit 5).
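The lazy per-batch reload above can be sketched as a generator that moves one slice at a time. `FakeSample` and `iter_micro_batches` are hypothetical names for illustration; the real loop calls BaseSample.to(device) and BaseSample.stack, which are omitted here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeSample:
    """Stand-in for a sample; `device` models where its tensors live."""
    index: int
    device: str = "cpu"

    def to(self, device):
        return replace(self, device=device)


def iter_micro_batches(samples, batch_size, device):
    """Yield one micro-batch at a time, moving only that slice to `device`."""
    for start in range(0, len(samples), batch_size):
        # Only this slice is moved to the device; the rest of the epoch
        # stays where it was (e.g. on the CPU).
        yield [s.to(device) for s in samples[start:start + batch_size]]


cpu_samples = [FakeSample(i) for i in range(5)]
batch_sizes = []
for micro_batch in iter_micro_batches(cpu_samples, batch_size=2, device="cuda:0"):
    batch_sizes.append(len(micro_batch))
```

Because each micro-batch goes out of scope before the next one is built, the device-side footprint is bounded by `batch_size` samples rather than the whole list.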
Made-with: Cursor

* [trainer] refactor: NFT/AWM optimize() per-batch precompute interleave

Restructures NFT and AWM optimize() from the previous double-pass design (eager precompute over ALL batches under sampling_context, then training over ALL batches under current params) to a single-pass per-batch interleave that matches the official DiffusionNFT and AWM implementations:

    for each micro-batch:
        1. lazy reload sample tensors to GPU and stack into a batch dict
        2. precompute under sampling policy:
               adapter.rollout()
               with sampling_context():
                   compute (_all_timesteps, _all_random_noise, _old_v_pred_list or _old_log_probs) for THIS batch only
        3. train under current policy:
               adapter.train()
               with self.autocast():
                   for t_idx in range(num_train_timesteps):
                       forward / loss / backward / optimizer step

Memory savings: only the current batch's _all_random_noise plus _old_v_pred_list (NFT) or _old_log_probs (AWM) lives on GPU at any time. The previous design held all num_batches_per_epoch batches' precompute output simultaneously, costing ~5+ GB on FLUX1 1024^2 LoRA at B=4 / T=40 and tens of GB on Wan video models (often the OOM trigger).

Train-inference consistency (philosophy #1; see .agents/knowledge/topics/train_inference_consistency.md item #4):
- Rollout (sample()/adapter.inference()) is unchanged.
- EMA params are loaded via sampling_context() and restored before each batch's training forward, identical to the per-batch behavior of the previous design.
- ema_step() runs only once per outer epoch in start(), so every batch within an optimize() call sees the SAME EMA snapshot regardless of interleave timing -> per-batch and eager designs are equivalent on the EMA invariant.

Note on RNG (regression test guidance): randn_tensor for batch K is now called after batch K-1's backward step (vs all noises sampled upfront). The CUDA RNG consumption order changes; under the same seed, the per-batch noise sequences are NOT bit-identical to the eager design.
The algorithm is unchanged (noise is augmentation; equivalent in expectation). Regression tests should use statistical metrics (loss mean / reward trend across an epoch), NOT a numeric diff of loss values, when comparing against HEAD~1.

Sample-level lazy reload (`[sample.to(device) for sample in slice]`) is folded into this same restructure -- NFT/AWM now share the same lazy reload pattern as GRPO/DPO from commit 4.

The KL paths (use_ref_parameters / use_ema_parameters per timestep) and all loss / backward / optimizer logic are unchanged in body and order.

Made-with: Cursor

* [trainer] feat: wire sample() loop CPU offload across all trainers

Inserts self._maybe_offload_samples_to_cpu(sample_batch) into every trainer's sample() loop, immediately after adapter.inference() and BEFORE both samples.extend() and reward_buffer.add_samples():

    sample_batch = self.adapter.inference(...)
    self._maybe_offload_samples_to_cpu(sample_batch)  # synchronous D2H
    samples.extend(sample_batch)
    self.reward_buffer.add_samples(sample_batch)

Affected sites (5 total):
- GRPOTrainer.sample() (trainers/grpo.py)
- GRPOGuardTrainer.sample() (trainers/grpo.py)
- DiffusionNFTTrainer.sample() (trainers/nft.py)
- AWMTrainer.sample() (trainers/awm.py)
- DPOTrainer.sample() (trainers/dpo.py)

Why BEFORE add_samples (not after): reward_buffer.add_samples() in the async-reward path records a CUDA sync_event and dispatches workers that read sample.image / sample.video / etc. Calling the offload BEFORE add_samples guarantees the recorded event captures "D2H complete + data ready on CPU"; workers wait on the event and then deterministically see CPU-resident samples. Inverse order would race the worker thread's getattr against the main thread's in-place setattr that BaseSample.to('cpu') performs.

Behaviour gating: The helper short-circuits when training_args.offload_samples_to_cpu is False (the default), so the entire pipeline is wired but inert.
Setting the flag to True in any trainer YAML now activates the producer side (D2H here), and the previously-landed pieces handle the consumer side:
  * commit 2: reward_processor moves the CPU input dict to model.device via move_tensors_to_device.
  * commits 4 & 5: optimize() loops lazily reload [sample.to(device) for sample in slice] per micro-batch.

End-to-end VRAM saving (roughly num_batches_per_epoch x per_batch_size of sample tensors) is unlocked for the first time in this commit. Wan video YAMLs that opt in (commit 7) will exercise it.

Evaluate paths are intentionally NOT touched -- eval samples are usually small and one-shot, and eval logging may rely on tensors being on the adapter device.

Made-with: Cursor

* [examples] feat: enable offload_samples_to_cpu for Wan video models

Adds `offload_samples_to_cpu: true` to all 13 Wan video example configs (GRPO LoRA / Full and NFT LoRA / Full across Wan2.1 and Wan2.2, T2V / I2V / V2V variants). Inserted next to `enable_gradient_checkpointing` so the two memory switches sit together.

Why required for video models: per-sample tensors (all_latents, condition videos, image_embeds, ...) are GB-scale on Wan; without the offload, sample()/optimize() OOMs as soon as num_batches_per_epoch > 1. The plumbing wired in commits 1-6 is now actually exercised on these configs.
Files (13 total):
    examples/grpo/lora/wan21_t2v.yaml
    examples/grpo/lora/wan21_i2v.yaml
    examples/grpo/lora/wan21_v2v.yaml
    examples/grpo/lora/wan22_t2v.yaml
    examples/grpo/lora/wan22_i2v.yaml
    examples/grpo/full/wan21_t2v.yaml
    examples/grpo/full/wan21_i2v.yaml
    examples/grpo/full/wan22_t2v.yaml
    examples/grpo/full/wan22_i2v.yaml
    examples/nft/lora/wan21_t2v.yaml
    examples/nft/lora/wan21_i2v.yaml
    examples/nft/lora/wan22_t2v.yaml
    examples/nft/full/wan22_t2v.yaml

Non-video model YAMLs are intentionally not touched in this commit; moderate-VRAM-pressure image models (Flux2, Qwen-Image-Edit-Plus) get an explicit `false` + pros/cons comment in commit 8 so users see the option as a documented decision point, while small/standard image models (FLUX1 / SD3 / Qwen-Image / Z-Image / DPO / etc.) rely on the code default `False` to avoid YAML noise.

Made-with: Cursor

* [examples] docs: expose offload_samples_to_cpu option in Flux2 and Qwen-Image-Edit-Plus configs

Adds an explicit `offload_samples_to_cpu: false` to 13 example configs (11 Flux2 variants + 2 Qwen-Image-Edit-Plus) preceded by a multi-line comment that documents the parameter, its pros/cons, and the conditions under which a user should flip it to true:

    # offload_samples_to_cpu: CPU-offload sample tensor fields between
    # sample() and optimize() to reduce GPU peak memory.
    # Pros (true): saves N x per_batch_size GPU memory ... no correctness
    # or convergence impact.
    # Cons (true): adds ~100ms/epoch H2D in reward path; tiny per-batch
    # H2D in optimize (<5ms each).
    # Recommended (true) for higher resolutions, larger batch sizes, or
    # any sample()/optimize() OOM. Default false works for current
    # example settings.
    offload_samples_to_cpu: false

Tier rationale (3-tier YAML strategy, deviating intentionally from the strict ALL-YAML rule in .cursor/rules/examples-yaml-sync.mdc):

T1 (commit 7, Wan video, 13 YAMLs): explicit `true` -- required to avoid OOM.
T2 (this commit, Flux2 + Qwen-Edit, 13 YAMLs): explicit `false` + pros/cons comment -- moderate VRAM pressure, decision left to the user with documentation right next to it.
T3 (untouched, 23 YAMLs of FLUX1 / SD3 / Qwen-Image / Z-Image / DPO / AWM-non-Flux2 / template): no field added, sane defaults via the code-level default. The field is documented in the upcoming topics/sample_lifecycle.md (commit 9).

This three-tier policy keeps T3 YAMLs noise-free while making T1/T2 both behaviour-correct (T1) and discoverable (T2).

Files (13 total):
    examples/grpo/lora/flux2_t2i.yaml
    examples/grpo/lora/flux2_i2i.yaml
    examples/grpo/lora/flux2_klein.yaml
    examples/grpo/lora/flux2_klein_base.yaml
    examples/grpo/full/flux2_t2i.yaml
    examples/grpo/full/flux2_i2i.yaml
    examples/grpo/full/flux2_klein.yaml
    examples/grpo/full/flux2_klein_base.yaml
    examples/awm/lora/flux2_klein_base.yaml
    examples/nft/lora/flux2_klein_base.yaml
    examples/nft/full/flux2_klein_base.yaml
    examples/grpo/lora/qwen_image_edit_plus.yaml
    examples/grpo/full/qwen_image_edit_plus.yaml

Made-with: Cursor

* [docs] feat: add sample lifecycle topic and README routing

New leaf doc .agents/knowledge/topics/sample_lifecycle.md (per .cursor/rules/agents-docs-maintenance.mdc), covering:
- default sample lifecycle with the offload pipeline
- the offload_samples_to_cpu switch and its effect at each stage
- the 3-tier example YAML adoption matrix (T1 Wan video / T2 Flux2 + Qwen-Image-Edit-Plus / T3 the rest), including the rationale for intentionally deviating from the strict ALL-YAML rule in .cursor/rules/examples-yaml-sync.mdc
- reward path device responsibility (move_tensors_to_device contract)
- async-reward race-free argument for offload-before-add_samples order
- NFT/AWM per-batch precompute interleave summary (memory savings, train-inference consistency, RNG-order caveat)
- extra_kwargs device asymmetry caveat (rewards on CPU, advantage on GPU, neither moved by BaseSample.to)
- Cross-refs to constraints #11,
#14, #15, train_inference_consistency item #4, dtype_precision

README.md routing table gains a corresponding row pointing at the new topic with explicit triggers (sample/optimize data flow changes, debugging sample/optimize OOM, adding high-resolution / video example configs). The new topic is the authoritative reference for the offload_samples_to_cpu switch -- T3 YAMLs do not list the field, so users discover it through this routing entry.

This is the documentation closing the 9-commit refactor of the sample CPU offload + lazy reload pipeline (commits 1 through 8). No code changes in this commit.

Made-with: Cursor

* [trainer] docs: simplify NFT/AWM optimize() docstrings

The previous docstrings (introduced in commit 49e7b49) inlined the full memory analysis, train-inference consistency proof, and RNG-order caveat -- ~30 lines each. All of that material lives in the authoritative .agents/knowledge/topics/sample_lifecycle.md (commit 8dc4a78), so the docstrings now keep only the essential "what":
- one-line summary
- per-batch interleave shape (3-step pipeline)
- AWM-specific note on decoupled sampling/training timesteps
- pointer to topics/sample_lifecycle.md for "why"

Net change: -33 lines across the two files; bodies unchanged.

Made-with: Cursor

* [ltx2] fix: handle batched per-sample timestep in adapter forward()

`t.expand(batch_size * 2)` raised RuntimeError when forward() was called during training with `t` of shape (B,) (distinct per-sample timesteps from `batch['timesteps'][:, ti]`). `expand` cannot stretch a non-singleton dim from B to 2B.

Normalize `t` to (B,) at function entry with fail-fast shape validation, then use `torch.cat([t, t])` for the CFG-doubled batch (matching the `torch.cat([lat, lat])` ordering of `[t0..tB-1, t0..tB-1]`). Accepts 0-D scalar (inference), (1,) singleton, and (B,) per-sample inputs; other shapes raise an informative ValueError.

Applied identically to LTX2_T2AV_Adapter and LTX2_I2AV_Adapter.
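A minimal sketch of the timestep normalization described in the [ltx2] fix above, assuming PyTorch is available. The helper names `normalize_timestep` and `cfg_double` are hypothetical; the adapter performs the equivalent steps inline in forward().

```python
import torch


def normalize_timestep(t, batch_size):
    """Bring `t` to shape (B,), accepting 0-D, (1,), or (B,) inputs."""
    t = torch.as_tensor(t)
    if t.ndim == 0:
        # Scalar timestep (inference): broadcast to every sample.
        return t.reshape(1).expand(batch_size)
    if t.ndim == 1 and t.shape[0] in (1, batch_size):
        # (1,) is stretched to (B,); (B,) is returned with the same shape.
        return t.expand(batch_size)
    raise ValueError(
        f"Expected timestep of shape (), (1,) or ({batch_size},), got {tuple(t.shape)}"
    )


def cfg_double(t):
    """Duplicate per-sample timesteps as [t0..tB-1, t0..tB-1] for the CFG batch."""
    # cat works for per-sample values; expand(2 * B) would fail on a (B,) tensor.
    return torch.cat([t, t])


per_sample = normalize_timestep(torch.tensor([1.0, 2.0]), batch_size=2)
doubled = cfg_double(per_sample)
```

Using `torch.cat` keeps the timestep ordering aligned with latents that were doubled the same way, which is the property the fix relies on.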
Made-with: Cursor

* [data_utils] fix: stabilize preprocess cache key via deep signature collection

compute_cache_path previously hashed ALL preprocess_kwargs, including training-infrastructure fields like num_batches_per_epoch and gradient_accumulation_steps that leak through **training_args unpacking and filter_kwargs pass-through. These fields have non-deterministic values across launches (world-size-derived, __post_init__ timing, etc.), causing merged_cache_path to differ even with identical YAML and force_reprocess=False.

Fix: add _select_cache_relevant_kwargs() which uses "deep signature collection": it inspects the named parameters of preprocess_func AND (when preprocess_func accepts **kwargs and is a bound adapter method) the named parameters of all encode_* forwarding targets on the same adapter instance. Only kwargs whose key appears in this union are included in kwargs_hash. Training-only fields that no encoder declares are excluded.

Safety: over-hash (hashing an encoder param that doesn't run at runtime) is harmless; under-hash (missing a param that affects output) is prevented by collecting from all four encoder methods regardless of runtime data presence.

Made-with: Cursor

* [examples] switch LoRA configs from DDP to DeepSpeed ZeRO-2

DeepSpeed ZeRO-2 provides optimizer-state sharding with negligible overhead, making plain multi_gpu (DDP) redundant for LoRA training.

Made-with: Cursor

* [examples] docs: note LTX-2.3-Diffusers as an option in LTX2 configs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [examples] fix: enable offload_samples_to_cpu for LTX2 video configs

Matches Wan video-model configs. Without this, per-sample audio+video tensors are GB-scale and sample()/optimize() OOMs at num_batches_per_epoch > 1.
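The "deep signature collection" idea from the [data_utils] cache-key fix above can be illustrated with `inspect.signature`. This is a sketch under assumptions, not the repository's `_select_cache_relevant_kwargs()`: the forwarding targets are passed in explicitly rather than discovered on an adapter instance, and the example functions and the `max_sequence_length` parameter are hypothetical.

```python
import inspect


def select_cache_relevant_kwargs_sketch(preprocess_func, forwarding_targets, kwargs):
    """Keep only kwargs that some relevant signature names explicitly."""

    def named_params(func):
        params = inspect.signature(func).parameters.values()
        return {p.name for p in params
                if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)}

    def accepts_var_kwargs(func):
        params = inspect.signature(func).parameters.values()
        return any(p.kind is p.VAR_KEYWORD for p in params)

    relevant = named_params(preprocess_func)
    # Only look through the forwarding targets when **kwargs can reach them.
    if accepts_var_kwargs(preprocess_func):
        for target in forwarding_targets:
            relevant |= named_params(target)
    return {k: v for k, v in kwargs.items() if k in relevant}


# Hypothetical encoder and preprocess signatures, for illustration only.
def encode_prompt(prompt, max_sequence_length=512): ...
def encode_image(images, condition_image_size=1024): ...
def preprocess_func(prompt, **kwargs): ...


all_kwargs = {
    "max_sequence_length": 256,
    "condition_image_size": 512,
    "num_batches_per_epoch": 8,        # training-only, should be dropped
    "gradient_accumulation_steps": 4,  # training-only, should be dropped
}
selected = select_cache_relevant_kwargs_sketch(
    preprocess_func, [encode_prompt, encode_image], all_kwargs
)
```

Hashing `selected` instead of `all_kwargs` keeps the cache key stable when only training-infrastructure fields change between launches.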
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [examples] docs: rewrite stale Qwen attn_backend comment for LTX2

Replace misleading "for Qwen-Image Series" comment with a model-agnostic description of available backend options.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [models/ltx2,rewards] style: apply black and isort to new PR files

Cosmetic-only changes (import reorder, string-quote normalization, long-line wrapping). Scoped to the 5 files this PR adds; existing unclean files on main are out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix dtype source

* Update submodule

* [models,docs] refactor: align CFG handling across all adapters with forward-stage warning

Ensure every CFG-capable adapter follows a consistent two-stage pattern:

1. encode_prompt: derive do_classifier_free_guidance from guidance_scale (>1.0 standard, >0.0 for Z-Image); default negative_prompt to "" when None.
2. forward: if guidance_scale > threshold but negative_prompt_embeds is None, emit logger.warning and gracefully fallback to the no-CFG path.

Adapter-specific changes:

- flux2_klein: extract do_classifier_free_guidance variable in _forward
- sd3_5: add forward warning; migrate to setup_logger; drop unused import
- z_image: add forward warning (threshold > 0.0)
- wan2_t2v/v2v/i2v: add forward warning; unify negative_prompt expansion
- qwen_image: add guidance_scale to encode_prompt; add forward warning
- qwen_image_edit_plus: add guidance_scale to encode_prompt; rename true_cfg_scale/do_true_cfg to guidance_scale/do_classifier_free_guidance; add _forward warning
- ltx2_t2av/i2av: add forward warning for multi-guidance (video + audio)

Document the CFG convention in .agents/knowledge/topics/adapter_conventions.md with reference implementation, model-specific extensions table, and gotcha X-GenGroup#7.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [examples,docs,agents] refactor: restructure examples, add LTX-2 docs, sync agent knowledge

- Restructure examples/ to algorithm/ft/model/variant.yaml with examples/README.md
- Add LTX-2/2.3 to README (News, model table, install note)
- Add .scratch/ constraint for agent temp files (X-GenGroup#28), examples convention (X-GenGroup#29)
- Sync agent knowledge: GroupDistributedSampler in samplers.md, LTX2 + RationalRewards in architecture.md
- Clean up .docs/ltx2-research/ dev artifacts
- Update LTX2 configs: guidance_scale=1.0, comment out attn_backend

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
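The two-stage CFG convention described in the [models,docs] commit above can be sketched in a few lines of plain Python. This is a toy illustration, not any adapter's real code: embeddings are stand-in floats, `toy_denoise` is a hypothetical model, the `threshold` parameter is an assumed way to express the >1.0 versus >0.0 (Z-Image) rule, and the guided output uses the standard classifier-free guidance combination, which individual adapters may extend.

```python
import logging

logger = logging.getLogger("cfg_sketch")


def toy_embed(text):
    """Hypothetical text encoder: maps a string to a single float."""
    return float(len(text))


def toy_denoise(embeds):
    """Hypothetical model call: any deterministic function works here."""
    return 2.0 * embeds


def encode_prompt(prompt, negative_prompt=None, guidance_scale=1.0, threshold=1.0):
    """Stage 1: derive the CFG flag from guidance_scale and encode prompts."""
    do_classifier_free_guidance = guidance_scale > threshold
    negative_prompt_embeds = None
    if do_classifier_free_guidance:
        if negative_prompt is None:
            negative_prompt = ""  # default described in the commit
        negative_prompt_embeds = toy_embed(negative_prompt)
    return {
        "prompt_embeds": toy_embed(prompt),
        "negative_prompt_embeds": negative_prompt_embeds,
    }


def forward(prompt_embeds, negative_prompt_embeds=None, guidance_scale=1.0, threshold=1.0):
    """Stage 2: warn and fall back to the no-CFG path if embeds are missing."""
    do_classifier_free_guidance = guidance_scale > threshold
    if do_classifier_free_guidance and negative_prompt_embeds is None:
        logger.warning(
            "guidance_scale=%s requests CFG but negative_prompt_embeds is None; "
            "falling back to the no-CFG path.",
            guidance_scale,
        )
        do_classifier_free_guidance = False

    cond = toy_denoise(prompt_embeds)
    if not do_classifier_free_guidance:
        return cond
    uncond = toy_denoise(negative_prompt_embeds)
    # Standard classifier-free guidance combination.
    return uncond + guidance_scale * (cond - uncond)
```

Deriving the flag from guidance_scale in both stages keeps them consistent: a caller that skips encode_prompt or passes stale embeddings gets a warning and an unguided prediction instead of an exception deep inside the model call.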
No description provided.