
Commit b630229

docs: fix documentation drift across evaluators, trainers, and clients references
- TrainingContext field list corrected: optimizer, lr_scheduler, metadata, and run_history are not on TrainingContext; note where each actually lives
- Composing Trainers example fixed: replaced non-existent _configure_composable with direct super().__init__() call and corrected GradientClipStepStrategy to GradientClippingStepStrategy
- context.callback_handler clarified: TrainingContext has no callback_handler; use trainer.callback_handler from within strategies
- FedNova removed from trainer strategy list; noted it is server-side only
- nanochat_core marked as not registry-registered in built-in evaluators table with explicit note that it requires trainer.type = nanochat
- Runtime flow step 3 expanded to describe both the registry path and the evaluator_override path used by the nanochat trainer
- evaluation.md nanochat_core example now includes [trainer] type = nanochat and a warning admonition about the trainer-type requirement
- clients.md import example made consistent (all five Default strategies from plato.clients.strategies, not a split across two sub-paths)
1 parent 0a9e7a0 commit b630229

4 files changed

Lines changed: 64 additions & 31 deletions

docs/docs/configurations/evaluation.md

Lines changed: 14 additions & 2 deletions
@@ -23,7 +23,10 @@ If `[evaluation]` is omitted, Plato only records the trainer's normal scalar met
 Built-in values include:
 
 - `lighteval` for Hugging Face's Lighteval benchmark runner.
-- `nanochat_core` for Nanochat's CORE benchmark.
+- `nanochat_core` for Nanochat's CORE benchmark. **Requires `trainer.type = "nanochat"`.**
+  This evaluator is not registered in the general evaluator registry; it is wired
+  internally by the nanochat trainer. Using it with any other trainer type produces
+  no evaluation output and no error.
 
 !!! example "fail_on_error"
     Whether evaluator failures should abort the run.
@@ -37,7 +40,7 @@ If `[evaluation]` is omitted, Plato only records the trainer's normal scalar met
 | Evaluator | Install path | Primary output style | Typical use |
 | --- | --- | --- | --- |
 | `lighteval` | `uv sync --extra llm_eval` | Named benchmark metrics such as `ifeval_avg` and `arc_avg` | Server-side LLM evaluation |
-| `nanochat_core` | `uv sync --extra nanochat` | `core_metric` | Nanochat benchmark runs |
+| `nanochat_core` | `uv sync --extra nanochat` | `core_metric` | Nanochat benchmark runs — requires `trainer.type = "nanochat"` |
 
 ## Lighteval
 
@@ -176,7 +179,16 @@ Nanochat's CORE benchmark is also available through `[evaluation]`.
 
 ### Example
 
+!!! warning "Requires the nanochat trainer"
+    `nanochat_core` is only wired up when `trainer.type = "nanochat"`. The nanochat
+    trainer creates the evaluator internally rather than looking it up in the registry.
+    Setting `[evaluation] type = "nanochat_core"` with any other trainer type silently
+    produces no evaluation output.
+
 ```toml
+[trainer]
+type = "nanochat"
+
 [evaluation]
 type = "nanochat_core"
 max_per_task = 16

docs/docs/references/clients.md

Lines changed: 1 addition & 1 deletion
@@ -35,10 +35,10 @@ from plato.clients import base
 from plato.clients.strategies import (
     DefaultCommunicationStrategy,
     DefaultLifecycleStrategy,
+    DefaultPayloadStrategy,
     DefaultReportingStrategy,
     DefaultTrainingStrategy,
 )
-from plato.clients.strategies.defaults import DefaultPayloadStrategy
 
 
 class AugmentedPayloadStrategy(DefaultPayloadStrategy):
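The `AugmentedPayloadStrategy` context line above shows the intended pattern: subclass one default strategy and leave the other four stock. A self-contained sketch of that pattern follows; the classes here are stand-ins, and the `outbound_payload` hook name is hypothetical (Plato's real `DefaultPayloadStrategy` interface lives in `plato.clients.strategies`):

```python
# Stand-in for the real DefaultPayloadStrategy; the hook name is hypothetical.
class DefaultPayloadStrategy:
    def outbound_payload(self, payload: dict) -> dict:
        # Default behavior: forward the payload unchanged.
        return payload

class AugmentedPayloadStrategy(DefaultPayloadStrategy):
    """Override one hook; everything else defers to the default."""

    def outbound_payload(self, payload: dict) -> dict:
        augmented = dict(super().outbound_payload(payload))
        augmented["client_metadata"] = {"augmented": True}  # extra outbound data
        return augmented

report = AugmentedPayloadStrategy().outbound_payload({"weights": [0.1, 0.2]})
assert report["client_metadata"] == {"augmented": True}
```

The same shape applies to the other four strategies: override only the hook you need and keep the rest of the default behavior.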

docs/docs/references/evaluators.md

Lines changed: 17 additions & 5 deletions
@@ -10,7 +10,11 @@ The evaluation path is:
 
 1. `TestingStrategy.test_model(...)` computes the trainer's scalar test metric.
 2. `plato.evaluators.runner.run_configured_evaluation(...)` reads `Config().evaluation`.
-3. The evaluator registry instantiates the requested evaluator.
+3. The evaluator is resolved in one of two ways:
+    - For `lighteval` (and any custom registered evaluator), the evaluator registry
+      looks up the factory by name and instantiates it.
+    - For `nanochat_core`, the nanochat trainer pre-builds a `NanochatCoreEvaluator`
+      and passes it as `evaluator_override`; the registry is bypassed entirely.
 4. The evaluator returns an `EvaluationResult`.
 5. Plato stores the serialized payload in `TrainingContext.state` under:
     - `evaluation_results`
@@ -64,10 +68,18 @@ def evaluate(self, request: EvaluationInput) -> EvaluationResult:
 
 ## Built-in evaluators
 
-| Name | Class | Notes |
-| --- | --- | --- |
-| `lighteval` | `plato.evaluators.lighteval.LightevalEvaluator` | Server-side LLM evaluation through Hugging Face Lighteval. |
-| `nanochat_core` | `plato.evaluators.nanochat_core.NanochatCoreEvaluator` | Nanochat CORE benchmark integration. |
+| Name | Class | Registration | Notes |
+| --- | --- | --- | --- |
+| `lighteval` | `plato.evaluators.lighteval.LightevalEvaluator` | Auto-registered via `registry.register` | Server-side LLM evaluation through Hugging Face Lighteval. |
+| `nanochat_core` | `plato.evaluators.nanochat_core.NanochatCoreEvaluator` | **Not** registry-registered; wired by the nanochat trainer only | Nanochat CORE benchmark integration. Requires `trainer.type = "nanochat"`. |
+
+!!! note "nanochat_core availability"
+    `nanochat_core` is **not** registered in the evaluator registry. Plato's nanochat
+    trainer (`plato/trainers/nanochat.py`) creates a `NanochatCoreEvaluator` directly
+    and supplies it as an override when `[evaluation] type = "nanochat_core"` is set.
+    Using this evaluator type with any other trainer (e.g., `HuggingFace`, `basic`,
+    or `composable`) produces no evaluation output and no error — the runner silently
+    skips it.
 
 ## Evaluator registry
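The two resolution paths described in runtime-flow step 3 can be condensed into a toy dispatcher. This is a sketch with stand-in values, not Plato's actual runner code (which lives in `plato.evaluators.runner`):

```python
# Toy registry holding factories for registered evaluators only; nanochat_core
# is deliberately absent, mirroring the docs above.
REGISTRY = {"lighteval": lambda: "LightevalEvaluator()"}

def resolve_evaluator(eval_type: str, evaluator_override=None):
    """Mimic the two paths: explicit override first, then registry lookup."""
    if evaluator_override is not None:
        return evaluator_override  # nanochat trainer path: registry bypassed
    factory = REGISTRY.get(eval_type)
    if factory is None:
        return None  # unregistered type without an override: silently skipped
    return factory()  # registry path: instantiate via the named factory

assert resolve_evaluator("lighteval") == "LightevalEvaluator()"
assert resolve_evaluator("nanochat_core") is None  # silent skip
assert resolve_evaluator("nanochat_core", "core-eval") == "core-eval"
```

The `None` return on the middle line is the behavior the note warns about: a `nanochat_core` request without the nanochat trainer's override yields no evaluator, no output, and no error.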

docs/docs/references/trainers.md

Lines changed: 32 additions & 23 deletions
@@ -62,45 +62,54 @@ perplexity, loss, and so on), and an optional `[evaluation]` section can then
 run a named benchmark adapter such as Lighteval or Nanochat CORE. See
 [Evaluators](evaluators.md) for that layer.
 
-Each concrete strategy inherits optional `setup`/`teardown` hooks and can emit
-callback events via `context.callback_handler`.
+Each concrete strategy inherits optional `setup`/`teardown` hooks. To fire
+callback events from within a strategy, hold a reference to the trainer and
+call `trainer.callback_handler.call_event(...)` directly. The
+`TrainingContext` passed to strategies does not carry a `callback_handler`
+attribute; only `ClientContext` (for client strategies) does.
 
 ## Composing Trainers
 
 `ComposableTrainer` accepts either concrete strategy instances or `None` for the defaults. You can start from `plato.trainers.basic.Trainer` (which simply wraps the defaults) and override only the pieces you need:
 
 ```py
-from plato.trainers.basic import Trainer
-from plato.trainers.strategies.training_step import GradientClipStepStrategy
+from plato.trainers.composable import ComposableTrainer
+from plato.trainers.strategies.training_step import GradientClippingStepStrategy
 
-class ClippedTrainer(Trainer):
+class ClippedTrainer(ComposableTrainer):
     def __init__(self, *, model=None, callbacks=None, max_norm=1.0):
-        super().__init__(model=model, callbacks=callbacks)
-        self._configure_composable(
-            loss_strategy=self.loss_strategy,
-            optimizer_strategy=self.optimizer_strategy,
-            training_step_strategy=GradientClipStepStrategy(max_norm=max_norm),
-            lr_scheduler_strategy=self.lr_scheduler_strategy,
-            model_update_strategy=self.model_update_strategy,
-            data_loader_strategy=self.data_loader_strategy,
-            testing_strategy=self.testing_strategy,
+        super().__init__(
+            model=model,
+            callbacks=callbacks,
+            training_step_strategy=GradientClippingStepStrategy(max_norm=max_norm),
+            # All other strategies default to their standard implementations.
         )
 ```
 
-Strategies can also be registered in experiment configs—see the references under
-`plato.trainers.strategies` for ready-made options such as FedNova, Scaffold,
-and adaptation methods.
+See the references under `plato.trainers.strategies` for ready-made options
+such as Scaffold, FedProx, FedDyn, and personalised-FL adaptation strategies.
+FedNova is a server-side aggregation algorithm and lives under
+`plato.servers.strategies`, not the trainer strategies.
 
 ## Trainer Context and Run History
 
-`TrainingContext` exposes:
+`TrainingContext` carries the following fields:
+
+- `model`: the neural network being trained.
+- `device`: the active `torch.device`.
+- `client_id`, `current_round`, `current_epoch`: client identifier and round/epoch counters.
+- `config`: the training configuration dictionary for the current round.
+- `state`: a plain dictionary for cross-strategy coordination at runtime.
 
-- `model`, `optimizer`, `lr_scheduler`, and active data loaders.
-- `client_id`, `current_round`, `current_epoch`, and `device`.
-- `state` and `metadata` dictionaries for cross-strategy coordination.
-- `run_history`, which records loss and accuracy per epoch/round.
+Note that `optimizer`, `lr_scheduler`, and `run_history` are attributes of
+`ComposableTrainer` itself, not of `TrainingContext`. The active data loader
+is stored at `context.state["train_loader"]` during training. A `metadata`
+dictionary exists on `ClientContext` (for client strategies) but not on
+`TrainingContext`.
 
-Use these fields instead of storing state on the trainer subclass directly.
+Prefer `context.state` for sharing transient values between strategies, and
+`trainer.run_history` when you need to read or update per-epoch metrics from
+callbacks.
 
 ## Structured Evaluators and Trainer State
 