[Feat] Extend EMA schedules#26

Merged
Jayce-Ping merged 6 commits into main from ema on Feb 6, 2026

@Jayce-Ping

Collaborator

No description provided.

@Jayce-Ping Jayce-Ping merged commit 857553b into main Feb 6, 2026
@Jayce-Ping Jayce-Ping deleted the ema branch February 6, 2026 23:26
87003697 pushed a commit to 87003697/Flow-Factory that referenced this pull request Apr 17, 2026
…ctured rewards

Replace the ad-hoc scalar UnifiedReward family with the structured
pointwise family that mirrors upstream UnifiedReward-2.0's official
ACS (image) and APS (video) API prompts, so Flow-Factory can drive
GRPO/NFT/AWM training against a vLLM-served UnifiedReward-2.0-qwen3vl
scorer.

Rewards layer (src/flow_factory/rewards/unified_reward.py):
- Keep UnifiedRewardAPIBase (OpenAI-compatible transport, semaphore,
  retries, FIFO text cache) and document the
  _replace_nan_with_mean batch-mean fallback as a Constraint X-GenGroup#26
  fail-fast exemption matching Pref-GRPO's policy.
- Delete the scalar family (UnifiedRewardScalarPointwiseBase,
  UnifiedRewardImageGenRewardModel, UnifiedRewardVideoGenRewardModel)
  -- no matching upstream 2.0 prompt and no users.
- UnifiedRewardStructuredPointwiseBase now owns _pack_results: packs
  (aggregated, per_axis) tuples into RewardModelOutput, filling
  aggregated NaNs with the batch mean and exposing per-axis scores
  as {axis}_scores in extra_info.
- UnifiedRewardImageGenACSRewardModel (unified_reward_image_acs):
  aligned to PointwiseRewardModel.__call__ signature, factored into
  _build_cache_key / _build_messages / _score_single. Prompt
  template uses a __PROMPT__ placeholder with str.replace so user
  captions with braces no longer KeyError.
- UnifiedRewardVideoGenAPSRewardModel (unified_reward_video_aps):
  adds max_frames (default 16, matches the upstream APS script) and
  np.linspace-based uniform frame sampling; supports I2V by
  accepting condition_images and prepending the first reference
  image to the frame sequence (its hash is folded into the cache
  key so different references do not alias).
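
The behaviours listed above can be illustrated with a minimal sketch. It is not the code in unified_reward.py: the helper names, the template text, the SHA-256 cache key, and the all-NaN fallback to 0.0 are assumptions made for illustration. Only the __PROMPT__ placeholder with str.replace, the np.linspace sampling with max_frames defaulting to 16, the batch-mean NaN fill, and the idea of folding the reference image hash into the cache key come from the commit message.

```python
import hashlib
from typing import Optional

import numpy as np

# Hypothetical template text; the real ACS prompt is not reproduced here.
TEMPLATE = "Rate the image for this caption: __PROMPT__"


def fill_prompt(template: str, caption: str) -> str:
    """Substitute the caption with str.replace so braces in it stay literal."""
    return template.replace("__PROMPT__", caption)


def sample_frame_indices(num_frames: int, max_frames: int = 16) -> list:
    """Pick at most max_frames indices spread uniformly over the clip."""
    if num_frames <= max_frames:
        return list(range(num_frames))
    # Endpoints are included, so the first and last frames are always kept.
    return np.linspace(0, num_frames - 1, max_frames).round().astype(int).tolist()


def replace_nan_with_mean(scores: list) -> list:
    """Fill NaN entries with the mean of the valid scores in the batch."""
    arr = np.asarray(scores, dtype=float)
    valid = ~np.isnan(arr)
    if not valid.any():
        # Assumption: with no valid score in the batch, fall back to 0.0.
        return [0.0] * len(scores)
    arr[~valid] = arr[valid].mean()
    return arr.tolist()


def cache_key(prompt: str, reference_image: Optional[bytes] = None) -> str:
    """Hash the prompt, plus the reference image bytes when one is given."""
    digest = hashlib.sha256(prompt.encode("utf-8"))
    if reference_image is not None:
        # Folding the image in keeps different references from aliasing.
        digest.update(hashlib.sha256(reference_image).digest())
    return digest.hexdigest()
```

For example, `sample_frame_indices(100)` returns 16 strictly increasing indices that start at 0 and end at 99, and `fill_prompt(TEMPLATE, "a {red} cube")` keeps the braces in the caption untouched.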

Registry + examples:
- registry.py drops the two deprecated keys; only
  unified_reward_image_acs / unified_reward_video_aps are exposed.
- Add examples/grpo/lora/flux1_unified_reward_t2i.yaml (FLUX.1 T2I
  ACS) and examples/grpo/lora/wan21_i2v_unified_reward.yaml
  (Wan2.1 I2V APS with condition_images + max_frames: 16), both
  with async_reward: true and a documented prerequisite comment
  block for spinning up the vLLM server.

Docs:
- guidance/rewards.md: trim scalar-family rows from the reward
  table, simplify the class hierarchy diagram, add the APS
  max_frames note and the I2V condition_images behaviour.
- .agents/knowledge/architecture.md: drop deprecated rows from the
  reward registry table.

Verification:
- Python smoke test (CUDA_VISIBLE_DEVICES="" against a live
  CodeGoat24/UnifiedReward-2.0-qwen3vl-8b vLLM server on :8080)
  returns rewards in [0,1] and non-zero alignment / coherence /
  style scores for both samples, no NaN warnings.
- ff-train on examples/grpo/lora/flux1_unified_reward_t2i.yaml runs
  through the first epoch cleanly:
  * Step 0000 eval/reward_unified_reward_image_acs mean=0.6613
    std=0.0421
  * Step 0000 train/reward_unified_reward_image_acs mean=0.6399
    std=0.0481 zero_std_ratio=0
  * Async rewards path engaged; no UnifiedReward API failure
    warnings observed.

Made-with: Cursor
Jayce-Ping added a commit that referenced this pull request Jun 27, 2026
… plugin layer (lossless)

Introduce a registry-based acceleration plugin layer that respects the
algorithm/model decoupling, plus the first lossless accelerators.

- acceleration/: BaseAccelerator (safety/stage contract), registry with
  direct-path fallback, paradigm-gated validator, CompileAccelerator
  (torch.compile, regional/full), AttentionBackendAccelerator (exact backends).
- hparams: AccelerationArguments with shared/rollout slots; wired into Arguments
  (field + nested_map) and exported. Off by default (backward compatible).
- trainers: BaseTrainer builds and validates accelerators after prepare, applies
  the shared accelerator via setup() and wraps the Stage-3 rollout loop with the
  rollout accelerator context; per-trainer paradigm tags (coupled/decoupled/
  distillation) drive the lossy-safety gate (constraints.md #7, #20a, #26).
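
As a rough illustration of the registry idea described above, the sketch below resolves an accelerator either by a registered short name or, failing that, by a dotted Python import path. Reading "direct-path fallback" as an import-path lookup is an assumption, and every name here (register, resolve, NoOpAccelerator, the setup and rollout_context methods) is hypothetical rather than the project's actual API.

```python
import importlib
from contextlib import nullcontext

# Hypothetical registry mapping short names to accelerator classes.
_REGISTRY = {}


def register(name):
    """Class decorator that records an accelerator under a short name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


class BaseAccelerator:
    """Hypothetical base contract: lossless by default, no-op hooks."""
    lossless = True

    def setup(self, model):
        # Shared-slot hook: return the (possibly wrapped) model.
        return model

    def rollout_context(self):
        # Rollout-slot hook: a context manager wrapped around the rollout loop.
        return nullcontext()


@register("noop")
class NoOpAccelerator(BaseAccelerator):
    pass


def resolve(identifier):
    """Look up a registered name, else treat the identifier as 'module.Class'."""
    if identifier in _REGISTRY:
        return _REGISTRY[identifier]
    module_path, _, class_name = identifier.rpartition(".")
    if not module_path:
        raise KeyError(f"Unknown accelerator: {identifier!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With this shape, `resolve("noop")` returns the registered class, while a string such as `"my_pkg.accel.MyAccelerator"` (a made-up path) would be imported on demand without touching the registry.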
Jayce-Ping added a commit to Jayce-Ping/Flow-Factory-Private that referenced this pull request Jul 2, 2026
1 participant