
Commit b630229

docs: fix documentation drift across evaluators, trainers, and clients references
- TrainingContext field list corrected: optimizer, lr_scheduler, metadata, and run_history are not on TrainingContext; note where each actually lives
- Composing Trainers example fixed: replaced non-existent _configure_composable with direct super().__init__() call and corrected GradientClipStepStrategy to GradientClippingStepStrategy
- context.callback_handler clarified: TrainingContext has no callback_handler; use trainer.callback_handler from within strategies
- FedNova removed from trainer strategy list; noted it is server-side only
- nanochat_core marked as not registry-registered in built-in evaluators table with explicit note that it requires trainer.type = nanochat
- Runtime flow step 3 expanded to describe both the registry path and the evaluator_override path used by the nanochat trainer
- evaluation.md nanochat_core example now includes [trainer] type = nanochat and a warning admonition about the trainer-type requirement
- clients.md import example made consistent (all five Default strategies from plato.clients.strategies, not a split across two sub-paths)
1 parent 0a9e7a0 commit b630229

4 files changed

Lines changed: 64 additions & 31 deletions

docs/docs/configurations/evaluation.md

Lines changed: 14 additions & 2 deletions
@@ -23,7 +23,10 @@ If `[evaluation]` is omitted, Plato only records the trainer's normal scalar met
 Built-in values include:
 
 - `lighteval` for Hugging Face's Lighteval benchmark runner.
-- `nanochat_core` for Nanochat's CORE benchmark.
+- `nanochat_core` for Nanochat's CORE benchmark. **Requires `trainer.type = "nanochat"`.**
+  This evaluator is not registered in the general evaluator registry; it is wired
+  internally by the nanochat trainer. Using it with any other trainer type produces
+  no evaluation output and no error.
 
 !!! example "fail_on_error"
     Whether evaluator failures should abort the run.
@@ -37,7 +40,7 @@ If `[evaluation]` is omitted, Plato only records the trainer's normal scalar met
 | Evaluator | Install path | Primary output style | Typical use |
 | --- | --- | --- | --- |
 | `lighteval` | `uv sync --extra llm_eval` | Named benchmark metrics such as `ifeval_avg` and `arc_avg` | Server-side LLM evaluation |
-| `nanochat_core` | `uv sync --extra nanochat` | `core_metric` | Nanochat benchmark runs |
+| `nanochat_core` | `uv sync --extra nanochat` | `core_metric` | Nanochat benchmark runs — requires `trainer.type = "nanochat"` |
 
 ## Lighteval
 
@@ -176,7 +179,16 @@ Nanochat's CORE benchmark is also available through `[evaluation]`.
 
 ### Example
 
+!!! warning "Requires the nanochat trainer"
+    `nanochat_core` is only wired up when `trainer.type = "nanochat"`. The nanochat
+    trainer creates the evaluator internally rather than looking it up in the registry.
+    Setting `[evaluation] type = "nanochat_core"` with any other trainer type silently
+    produces no evaluation output.
+
 ```toml
+[trainer]
+type = "nanochat"
+
 [evaluation]
 type = "nanochat_core"
 max_per_task = 16

docs/docs/references/clients.md

Lines changed: 1 addition & 1 deletion
@@ -35,10 +35,10 @@ from plato.clients import base
 from plato.clients.strategies import (
     DefaultCommunicationStrategy,
     DefaultLifecycleStrategy,
+    DefaultPayloadStrategy,
     DefaultReportingStrategy,
     DefaultTrainingStrategy,
 )
-from plato.clients.strategies.defaults import DefaultPayloadStrategy
 
 
 class AugmentedPayloadStrategy(DefaultPayloadStrategy):
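The `AugmentedPayloadStrategy` context line above shows the intended pattern: subclass one default strategy and leave the other four stock. A self-contained sketch of that pattern follows; the classes here are stand-ins, and the `outbound_payload` hook name is hypothetical (Plato's real `DefaultPayloadStrategy` interface lives in `plato.clients.strategies`):

```python
# Stand-in for the real DefaultPayloadStrategy; the hook name is hypothetical.
class DefaultPayloadStrategy:
    def outbound_payload(self, payload: dict) -> dict:
        # Default behavior: forward the payload unchanged.
        return payload

class AugmentedPayloadStrategy(DefaultPayloadStrategy):
    """Override one hook; everything else defers to the default."""

    def outbound_payload(self, payload: dict) -> dict:
        augmented = dict(super().outbound_payload(payload))
        augmented["client_metadata"] = {"augmented": True}  # extra outbound data
        return augmented

report = AugmentedPayloadStrategy().outbound_payload({"weights": [0.1, 0.2]})
assert report["client_metadata"] == {"augmented": True}
```

The same shape applies to the other four strategies: override only the hook you need and keep the rest of the default behavior.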

docs/docs/references/evaluators.md

Lines changed: 17 additions & 5 deletions
@@ -10,7 +10,11 @@ The evaluation path is:
 
 1. `TestingStrategy.test_model(...)` computes the trainer's scalar test metric.
 2. `plato.evaluators.runner.run_configured_evaluation(...)` reads `Config().evaluation`.
-3. The evaluator registry instantiates the requested evaluator.
+3. The evaluator is resolved in one of two ways:
+    - For `lighteval` (and any custom registered evaluator), the evaluator registry
+      looks up the factory by name and instantiates it.
+    - For `nanochat_core`, the nanochat trainer pre-builds a `NanochatCoreEvaluator`
+      and passes it as `evaluator_override`; the registry is bypassed entirely.
 4. The evaluator returns an `EvaluationResult`.
 5. Plato stores the serialized payload in `TrainingContext.state` under:
     - `evaluation_results`
@@ -64,10 +68,18 @@ def evaluate(self, request: EvaluationInput) -> EvaluationResult:
 
 ## Built-in evaluators
 
-| Name | Class | Notes |
-| --- | --- | --- |
-| `lighteval` | `plato.evaluators.lighteval.LightevalEvaluator` | Server-side LLM evaluation through Hugging Face Lighteval. |
-| `nanochat_core` | `plato.evaluators.nanochat_core.NanochatCoreEvaluator` | Nanochat CORE benchmark integration. |
+| Name | Class | Registration | Notes |
+| --- | --- | --- | --- |
+| `lighteval` | `plato.evaluators.lighteval.LightevalEvaluator` | Auto-registered via `registry.register` | Server-side LLM evaluation through Hugging Face Lighteval. |
+| `nanochat_core` | `plato.evaluators.nanochat_core.NanochatCoreEvaluator` | **Not** registry-registered; wired by the nanochat trainer only | Nanochat CORE benchmark integration. Requires `trainer.type = "nanochat"`. |
+
+!!! note "nanochat_core availability"
+    `nanochat_core` is **not** registered in the evaluator registry. Plato's nanochat
+    trainer (`plato/trainers/nanochat.py`) creates a `NanochatCoreEvaluator` directly
+    and supplies it as an override when `[evaluation] type = "nanochat_core"` is set.
+    Using this evaluator type with any other trainer (e.g., `HuggingFace`, `basic`,
+    or `composable`) produces no evaluation output and no error — the runner silently
+    skips it.
 
 ## Evaluator registry
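The two resolution paths described in runtime-flow step 3 can be condensed into a toy dispatcher. This is a sketch with stand-in values, not Plato's actual runner code (which lives in `plato.evaluators.runner`):

```python
# Toy registry holding factories for registered evaluators only; nanochat_core
# is deliberately absent, mirroring the docs above.
REGISTRY = {"lighteval": lambda: "LightevalEvaluator()"}

def resolve_evaluator(eval_type: str, evaluator_override=None):
    """Mimic the two paths: explicit override first, then registry lookup."""
    if evaluator_override is not None:
        return evaluator_override  # nanochat trainer path: registry bypassed
    factory = REGISTRY.get(eval_type)
    if factory is None:
        return None  # unregistered type without an override: silently skipped
    return factory()  # registry path: instantiate via the named factory

assert resolve_evaluator("lighteval") == "LightevalEvaluator()"
assert resolve_evaluator("nanochat_core") is None  # silent skip
assert resolve_evaluator("nanochat_core", "core-eval") == "core-eval"
```

The `None` return on the middle line is the behavior the note warns about: a `nanochat_core` request without the nanochat trainer's override yields no evaluator, no output, and no error.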

docs/docs/references/trainers.md

Lines changed: 32 additions & 23 deletions
@@ -62,45 +62,54 @@ perplexity, loss, and so on), and an optional `[evaluation]` section can then
 run a named benchmark adapter such as Lighteval or Nanochat CORE. See
 [Evaluators](evaluators.md) for that layer.
 
-Each concrete strategy inherits optional `setup`/`teardown` hooks and can emit
-callback events via `context.callback_handler`.
+Each concrete strategy inherits optional `setup`/`teardown` hooks. To fire
+callback events from within a strategy, hold a reference to the trainer and
+call `trainer.callback_handler.call_event(...)` directly. The
+`TrainingContext` passed to strategies does not carry a `callback_handler`
+attribute; only `ClientContext` (for client strategies) does.
 
 ## Composing Trainers
 
 `ComposableTrainer` accepts either concrete strategy instances or `None` for the defaults. You can start from `plato.trainers.basic.Trainer` (which simply wraps the defaults) and override only the pieces you need:
 
 ```py
-from plato.trainers.basic import Trainer
-from plato.trainers.strategies.training_step import GradientClipStepStrategy
+from plato.trainers.composable import ComposableTrainer
+from plato.trainers.strategies.training_step import GradientClippingStepStrategy
 
-class ClippedTrainer(Trainer):
+class ClippedTrainer(ComposableTrainer):
     def __init__(self, *, model=None, callbacks=None, max_norm=1.0):
-        super().__init__(model=model, callbacks=callbacks)
-        self._configure_composable(
-            loss_strategy=self.loss_strategy,
-            optimizer_strategy=self.optimizer_strategy,
-            training_step_strategy=GradientClipStepStrategy(max_norm=max_norm),
-            lr_scheduler_strategy=self.lr_scheduler_strategy,
-            model_update_strategy=self.model_update_strategy,
-            data_loader_strategy=self.data_loader_strategy,
-            testing_strategy=self.testing_strategy,
+        super().__init__(
+            model=model,
+            callbacks=callbacks,
+            training_step_strategy=GradientClippingStepStrategy(max_norm=max_norm),
+            # All other strategies default to their standard implementations.
         )
 ```
 
-Strategies can also be registered in experiment configs—see the references under
-`plato.trainers.strategies` for ready-made options such as FedNova, Scaffold,
-and adaptation methods.
+See the references under `plato.trainers.strategies` for ready-made options
+such as Scaffold, FedProx, FedDyn, and personalised-FL adaptation strategies.
+FedNova is a server-side aggregation algorithm and lives under
+`plato.servers.strategies`, not the trainer strategies.
 
 ## Trainer Context and Run History
 
-`TrainingContext` exposes:
+`TrainingContext` carries the following fields:
+
+- `model`: the neural network being trained.
+- `device`: the active `torch.device`.
+- `client_id`, `current_round`, `current_epoch`: client identifier and round/epoch counters.
+- `config`: the training configuration dictionary for the current round.
+- `state`: a plain dictionary for cross-strategy coordination at runtime.
 
-- `model`, `optimizer`, `lr_scheduler`, and active data loaders.
-- `client_id`, `current_round`, `current_epoch`, and `device`.
-- `state` and `metadata` dictionaries for cross-strategy coordination.
-- `run_history`, which records loss and accuracy per epoch/round.
+Note that `optimizer`, `lr_scheduler`, and `run_history` are attributes of
+`ComposableTrainer` itself, not of `TrainingContext`. The active data loader
+is stored at `context.state["train_loader"]` during training. A `metadata`
+dictionary exists on `ClientContext` (for client strategies) but not on
+`TrainingContext`.
 
-Use these fields instead of storing state on the trainer subclass directly.
+Prefer `context.state` for sharing transient values between strategies, and
+`trainer.run_history` when you need to read or update per-epoch metrics from
+callbacks.
 
 ## Structured Evaluators and Trainer State
 