chore(core): integrate component tasks via supervision trees by tobz · Pull Request #1942 · DataDog/saluki

tobz · 2026-06-29T13:29:10Z

Summary

This PR moves the management of component tasks over to supervision trees.

After adding support for managing dynamic child processes in #1874, we're following up by moving the management of component tasks themselves over to supervision trees, working towards having every asynchronous task managed by a supervisor.

The main change here is that all components run under supervision (duh!), but it's how we do it that matters most. We've designed the topology supervisor construction such that every component runs under its own supervisor and gets a handle for spawning dynamic child processes on that supervisor. Coupled with the support for significant children added in #1874, this gives us a mechanism to dynamically spawn child processes that are functionally scoped to the component: when the component dies, its dynamic child processes die too. This is poor man's structured concurrency.
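To make the scoping idea concrete, here is a minimal sketch using only the standard library. It is not the saluki supervisor API: `ComponentScope`, `spawn_child`, and `children_alive_after_component_exit` are hypothetical names, and OS threads with a cooperative stop flag stand in for supervised async tasks. What it shows is the property described above: children spawned through the component's handle cannot outlive the component.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a per-component supervisor: it owns every
/// dynamic child spawned through it. This is an illustration, not saluki code.
pub struct ComponentScope {
    stop: Arc<AtomicBool>,
    children: Vec<JoinHandle<()>>,
}

impl ComponentScope {
    pub fn new() -> Self {
        Self { stop: Arc::new(AtomicBool::new(false)), children: Vec::new() }
    }

    /// Spawns a child whose lifetime is bounded by this scope. The child is
    /// handed the stop flag and is expected to poll it.
    pub fn spawn_child<F>(&mut self, work: F)
    where
        F: FnOnce(Arc<AtomicBool>) + Send + 'static,
    {
        let stop = Arc::clone(&self.stop);
        self.children.push(thread::spawn(move || work(stop)));
    }
}

impl Drop for ComponentScope {
    /// When the component goes away, signal every child and wait for it.
    fn drop(&mut self) {
        self.stop.store(true, Ordering::SeqCst);
        for child in self.children.drain(..) {
            let _ = child.join();
        }
    }
}

/// Runs a "component" that spawns `n` children and then exits. Returns how
/// many children are still running once the component's scope is gone.
pub fn children_alive_after_component_exit(n: usize) -> usize {
    let alive = Arc::new(AtomicUsize::new(0));
    {
        let mut scope = ComponentScope::new();
        for _ in 0..n {
            let alive = Arc::clone(&alive);
            scope.spawn_child(move |stop| {
                alive.fetch_add(1, Ordering::SeqCst);
                while !stop.load(Ordering::SeqCst) {
                    thread::sleep(Duration::from_millis(1));
                }
                alive.fetch_sub(1, Ordering::SeqCst);
            });
        }
        // The component body would run here; leaving the block drops the scope.
    }
    alive.load(Ordering::SeqCst)
}
```

Calling `children_alive_after_component_exit(4)` returns `0`, because dropping the scope joins every child before control returns. A real supervisor would abort tasks rather than rely on cooperative polling; the sketch only illustrates the ownership relationship.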

None of this replaces (yet) stuff like the global thread pool, which is specifically about compute-bound tasks rather than structured concurrency by itself... but we'll approach that in a future PR.

Overall, I'm not entirely thrilled with the boilerplate this introduces, and I plan to get rid of most of it in a follow-up PR. It's deeply tied to the patterns we use for defining component bounds, health check names, and so on, so removing it requires a holistic refactoring that unifies everything in a single go, which would have been too much of a change for a single PR. 😅

Change Type

Bug fix
New feature
Non-functional (chore, refactoring, docs)
Performance

How did you test this PR?

Existing tests.

References

DADP-2

tobz · 2026-06-29T13:29:32Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

pr-commenter · 2026-06-29T13:37:32Z

Binary Size Analysis (Agent Data Plane)

Baseline: de7b786 · Comparison: 636fc4a · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 40.49 MiB (baseline) vs 40.54 MiB (comparison)
Size Change: +45.73 KiB (+0.11%)

✅ Binary size difference within threshold

Changes by Module

Module	File Size	Symbols
`prost`	+40.20 KiB	58
`tonic_prost`	-32.19 KiB	11
`figment`	-31.59 KiB	138
`h2`	+27.29 KiB	121
`tokio`	-26.59 KiB	510
`saluki_components::sources::dogstatsd`	+24.66 KiB	30
`saluki_components::common::datadog`	+21.00 KiB	85
`saluki_core::topology::component_worker`	+18.03 KiB	25
`&mut rmp_serde`	+15.98 KiB	15
`datadog_protos::trace_piecemeal_include::datadog`	+15.17 KiB	15
`piecemeal`	-15.14 KiB	17
`[sections]`	+13.06 KiB	8
`datadog_protos::trace_include::stats`	+13.01 KiB	7
`hyper_util`	-12.24 KiB	7
`saluki_core::observability::metrics`	+11.27 KiB	23
`saluki_core::topology::built`	+10.01 KiB	26
`saluki_common::task::instrument`	+9.89 KiB	30
`quick_cache`	-9.48 KiB	14
`saluki_core::runtime::supervisor`	-9.36 KiB	14
`rustls`	-9.19 KiB	9

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.4% +40.1Ki  +0.3% +23.4Ki    [12066 Others]
  [NEW] +35.6Ki  [NEW] +35.4Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h0d88aff4d972d2cd
  [NEW] +35.3Ki  [NEW] +35.1Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::h921cc9074c46833e
  [NEW] +24.4Ki  [NEW] +24.1Ki    _<saluki_components::sources::dogstatsd::_::<impl serde_core::de::Deserialize for saluki_components::sources::dogstatsd::DogStatsDConfiguration>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map::hea4482349ead91bc
  [NEW] +24.3Ki  [NEW] +24.2Ki    saluki_core::topology::built::BuiltTopology::spawn_inner::_{{closure}}::h90af9313dc7b3fc4
  [NEW] +19.7Ki  [NEW] +19.5Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::he995747a3e461420
  [NEW] +14.3Ki  [NEW] +14.0Ki    saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h7b2647e9ff4c48d5
  [NEW] +11.7Ki  [NEW] +11.5Ki    _<agent_data_plane::components::tag_filterlist::TagFilterlist as saluki_core::components::transforms::Transform>::run::_{{closure}}::h9cdc707c1ce90c5e
  [NEW] +10.5Ki  [NEW] +10.3Ki    _<saluki_components::decoders::otlp::OtlpDecoder as saluki_core::components::decoders::Decoder>::run::_{{closure}}::hc0426e72c80e7068
  [NEW] +9.50Ki  [NEW] +9.43Ki    prost::message::Message::decode::h11c8acdcc783b7a2
  [NEW] +9.47Ki  [NEW] +9.27Ki    _<saluki_components::transforms::mrf_gateway::MrfMetricsGateway as saluki_core::components::transforms::Transform>::run::_{{closure}}::he898fb702241462f
  [DEL] -8.85Ki  [DEL]     -79    anon.1bcd48b9eb44f8503c1708e7a4889852.175.llvm.3131483667763844983
  [DEL] -8.86Ki  [DEL] -8.65Ki    _<saluki_components::destinations::dsd_debug_log::DogStatsDDebugLog as saluki_core::components::destinations::Destination>::run::_{{closure}}::hb4248ffd301bb19d
  [DEL] -10.5Ki  [DEL] -10.3Ki    _<saluki_components::decoders::otlp::OtlpDecoder as saluki_core::components::decoders::Decoder>::run::_{{closure}}::h2b7bae3fddd88f8c
  [DEL] -11.6Ki  [DEL] -11.4Ki    _<agent_data_plane::components::tag_filterlist::TagFilterlist as saluki_core::components::transforms::Transform>::run::_{{closure}}::h5b32a9c4b891b3d3
  [DEL] -14.5Ki  [DEL] -14.5Ki    figment::figment::Figment::extract::h56f08f81ae1237a9
  [DEL] -19.4Ki  [DEL] -19.2Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::hf859e66ba4df9b30
 -96.2% -20.6Ki -96.9% -20.6Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_any::hac67c4eb03d63f7d
  [DEL] -20.8Ki  [DEL] -20.6Ki    saluki_core::topology::built::BuiltTopology::spawn_inner::_{{closure}}::h1372376b023b6f43
  [DEL] -33.6Ki  [DEL] -33.4Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::ha19f43032363b377
  [DEL] -40.3Ki  [DEL] -40.1Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h8563f916d92d3ae1
  +0.1% +45.7Ki  +0.1% +37.3Ki    TOTAL

pr-commenter · 2026-06-29T13:51:51Z

Regression Detector (Agent Data Plane)

Run ID: 8492aa85-0731-48a4-a796-d170744ccdb6
Baseline: de7b786d · Comparison: 636fc4ac · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (5)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

experiment	goal	Δ mean %	links
quality_gates_rss_idle	memory	⚪ +0.94	metrics profiles logs
quality_gates_rss_dsd_low	memory	⚪ +0.46	metrics profiles logs
quality_gates_rss_dsd_medium	memory	⚪ +0.23	metrics profiles logs
quality_gates_rss_dsd_ultraheavy	memory	⚪ +0.13	metrics profiles logs
quality_gates_rss_dsd_heavy	memory	⚪ -0.20	metrics profiles logs

Bounds Checks: ✅ Passed (5)

experiment	check	replicates	observed	links
quality_gates_rss_dsd_heavy	memory_usage	10/10	✅ 131 MiB ≤ 140 MiB	metrics profiles logs
quality_gates_rss_dsd_low	memory_usage	10/10	✅ 42.7 MiB ≤ 50 MiB	metrics profiles logs
quality_gates_rss_dsd_medium	memory_usage	10/10	✅ 64.3 MiB ≤ 75 MiB	metrics profiles logs
quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	✅ 192 MiB ≤ 200 MiB	metrics profiles logs
quality_gates_rss_idle	memory_usage	10/10	✅ 28.5 MiB ≤ 40 MiB	metrics profiles logs

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
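The decision rule described above can be sketched as a small function. This is an illustration of the stated criteria only, not the detector's actual code: `Goal`, `Verdict`, `classify`, and the `smp_improvement` flag are hypothetical names (the text names only `is_regression` and `erratic: true`), and returning `None` for ignored experiments is an assumption about how "skipped outright" might be modeled.

```rust
/// Which direction counts as "worse" for an experiment's optimization goal.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Goal {
    /// CPU or memory: an increase is a regression.
    LowerIsBetter,
    /// Ingress throughput: a decrease is a regression.
    HigherIsBetter,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Improvement,
    Regression,
    Neutral,
}

/// Threshold on |Δ mean %| taken from the explanation text.
pub const THRESHOLD_PCT: f64 = 5.0;

/// Classifies one experiment. Returns `None` when the experiment is
/// configured `erratic: true`, modeling "skipped outright" (an assumption).
pub fn classify(
    goal: Goal,
    delta_mean_pct: f64,
    smp_regression: bool,
    smp_improvement: bool, // hypothetical counterpart of `is_regression`
    configured_erratic: bool,
) -> Option<Verdict> {
    if configured_erratic {
        return None;
    }
    if delta_mean_pct.abs() <= THRESHOLD_PCT {
        return Some(Verdict::Neutral);
    }
    // Past the threshold: work out whether the change moved in the bad direction.
    let worse = match goal {
        Goal::LowerIsBetter => delta_mean_pct > 0.0,
        Goal::HigherIsBetter => delta_mean_pct < 0.0,
    };
    Some(if worse && smp_regression {
        Verdict::Regression
    } else if !worse && smp_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    })
}
```

Under these assumptions, the `quality_gates_rss_idle` row above (memory goal, +0.94) classifies as `Neutral`, matching its ⚪ marker.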

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 636fc4ac07

chatgpt-codex-connector · 2026-06-29T19:17:58Z

+            let mut topology_shutdown_trigger = Some(topology_shutdown_trigger);
+            loop {
+                select! {
+                    result = &mut run => return result.map_err(Into::into),


Preserve shutdown timeout failures

When shutdown is requested and a component does not exit before shutdown_timeout, the per-component supervisor aborts that worker. Supervisor shutdown, however, drains workers without surfacing those aborts as an error, so this line maps the topology supervisor's requested-shutdown Ok(()) straight through. That regresses the previous RunningTopology::shutdown_with_timeout behavior, which returned an error after the timeout: ADP can now log a successful shutdown even when a component had to be forcefully stopped.
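The behavior being asked for can be sketched in isolation with the standard library. This is not the saluki implementation: `ShutdownError` and `stop_worker_with_deadline` are hypothetical names, and a thread that sleeps for `work_time` stands in for a component that is slow to exit. The point is only that a missed deadline is reported as an error rather than folded into `Ok(())`.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ShutdownError {
    /// The worker did not exit before the deadline and had to be abandoned.
    TimedOut,
    /// The worker went away without reporting completion (for example, it panicked).
    WorkerLost,
}

/// Waits up to `timeout` for a worker that needs `work_time` to wind down.
/// A forced stop is surfaced as `Err`, so callers cannot log it as a clean shutdown.
pub fn stop_worker_with_deadline(
    work_time: Duration,
    timeout: Duration,
) -> Result<(), ShutdownError> {
    let (done_tx, done_rx) = mpsc::channel::<()>();

    // Stand-in for a component draining its work after shutdown is requested.
    thread::spawn(move || {
        thread::sleep(work_time);
        // The receiver may already have given up; ignoring the send error is fine.
        let _ = done_tx.send(());
    });

    match done_rx.recv_timeout(timeout) {
        Ok(()) => Ok(()),
        Err(RecvTimeoutError::Timeout) => Err(ShutdownError::TimedOut),
        Err(RecvTimeoutError::Disconnected) => Err(ShutdownError::WorkerLost),
    }
}
```

With a 1 ms worker and a 2 s deadline the function returns `Ok(())`; with a 500 ms worker and a 20 ms deadline it returns `Err(ShutdownError::TimedOut)`.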

tobz mentioned this pull request Jun 29, 2026

enhancement(core): add support for dynamically added child processes in Supervisor #1874

Merged

4 tasks

dd-octo-sts Bot added the area/core Core functionality, event model, etc. label Jun 29, 2026

Base automatically changed from tobz/dynamic-supervisor to main June 29, 2026 13:53

tobz added 7 commits June 29, 2026 14:27

chore(core): integrate component tasks via supervision trees

2c0489b

doc comments:

0f0cd6b

fix

0a46bb4

cleannnnnnup

5a55be1

unwind component topology sup changes

0d5b922

missed a spot

528762f

chore(core): refactor topology creation to run components under supervision

636fc4a

tobz changed the base branch from main to graphite-base/1942 June 29, 2026 18:53

tobz force-pushed the tobz/component-supervision-trees branch from 4f8f870 to 636fc4a Compare June 29, 2026 18:53

tobz added the type/chore Updates to dependencies or general "administrative" tasks necessary to maintain the codebase/repo. label Jun 29, 2026

tobz changed the base branch from graphite-base/1942 to main June 29, 2026 19:11

tobz marked this pull request as ready for review June 29, 2026 19:11

tobz requested a review from a team as a code owner June 29, 2026 19:11

chatgpt-codex-connector Bot reviewed Jun 29, 2026

thieman approved these changes Jun 30, 2026

tobz wants to merge 7 commits into main from tobz/component-supervision-trees