chore(core): integrate component tasks via supervision trees#1942
Conversation
Binary Size Analysis (Agent Data Plane)Baseline: de7b786 · Comparison: 636fc4a · diff ✅ Binary size difference within thresholdChanges by Module
Detailed Symbol Changes |
Regression Detector (Agent Data Plane)Run ID: Optimization Goals: ✅ No significant changes detectedFine details of change detection per experiment (5)Experiments configured
Bounds Checks: ✅ Passed (5)
ExplanationA change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression ( |
This comment has been minimized.
This comment has been minimized.
4f8f870 to
636fc4a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 636fc4ac07
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let mut topology_shutdown_trigger = Some(topology_shutdown_trigger); | ||
| loop { | ||
| select! { | ||
| result = &mut run => return result.map_err(Into::into), |
There was a problem hiding this comment.
Preserve shutdown timeout failures
When shutdown is requested and a component does not exit before shutdown_timeout, the per-component supervisor aborts that worker, but supervisor shutdown drains workers without surfacing those aborts as an error; this line then maps the topology supervisor's requested-shutdown Ok(()) straight through. That regresses the previous RunningTopology::shutdown_with_timeout behavior, which returned an error after the timeout, so ADP can now log a successful shutdown even when a component had to be forcefully stopped.
Useful? React with 👍 / 👎.

Summary
This PR moves the management of component tasks over to supervision trees.
After adding support for managing dynamic child processes in #1874, we're following up with moving the management of component tasks themselves over to supervision trees to work towards full alignment of all asynchronous tasks being managed by a supervisor.
The main change here is that all components run under supervision (duh!) but it's how we do it that matters most. We've designed the topology supervisor construction such that every component runs under its own supervisor, and it gets access to a handle to spawn dynamic child processes on that supervisor. Coupled together with the support we added for significant children in #1874, and this gives us a mechanism to dynamically spawn child processes on a per-component basis that functionally are scoped to the component such that when the component dies, the dynamic child processes also die to. This is poor man's structured concurrency.
None of this replaces (yet) stuff like the global thread pool, which is specifically about compute-bound tasks rather than structured concurrency by itself... but we'll approach that in a future PR.
Overall, I'm not entirely thrilled with the boilerplate traded into here, and I also plan to get rid of most of it in a follow-up PR, but it's deeply tied to the patterns we use for defining component bounds, health check names, and so on... it requires a holistic refactoring that unifies everything in a single go, and it would have just been too much of a change for a single PR. 😅
Change Type
How did you test this PR?
Existing tests.
References
DADP-2