Async profiler: V1 (TaskAsync) instrumentation + tests.#129043

Draft
lateralusX wants to merge 36 commits into dotnet:main from lateralusX:lateralusX/async-profiler-asyncv1-support-v2

@lateralusX lateralusX commented Jun 5, 2026

Member

WIP

Summary

Enables the async profiler for the V1 TaskAsync (state-machine-based) async path. Both V1 (TaskAsync) and V2 (RuntimeAsync) now emit a uniform, well-defined event stream that downstream tools can consume without knowing which async model produced a given chain. There are small variations in which events V1 can support, but all important events are supported on both async models. The callstack events are also typed, since their payloads differ slightly (native IP only on V2; method start native IP and state on V1).

The PR includes an extensive test suite covering both paths, as well as some refactoring.

Motivation

V1 async (the C#-compiler-generated state-machine IAsyncStateMachineBox model) is the dominant async path in the wild. Until now the async profiler only instrumented the V2 (RuntimeAsync) path, so V1 chains were invisible to consumers. This PR extends the instrumentation to V1 while keeping the event stream identical in shape between the two models.

What's in this PR

Runtime V1 instrumentation

  • AsyncTaskDispatcher — class deriving from Task that wraps the actual state-machine box, with per-invocation TLS-pushed state held in AsyncTaskDispatcherInfo (ref struct).
  • Per-dispatcher fields (Suspended, LastContinuation, ReachedLastContinuation, InnerBox) track the cascade so we can emit accurate Resume, Complete, and append events as chains grow.
  • Cooperative append mechanism: when a parent registers after a child has already started walking, the runtime emits AppendAsyncCallstack to backfill the visible chain. Three race outcomes are considered: the parent registers before the child completes, the parent registers during suspend, and the unrecoverable late-parent case, which is a known design limit.
  • StateMachineDiagnosticData / GetDiagnosticData plumbed through AsyncTaskMethodBuilderT, AsyncMethodBuilderCore, and IAsyncStateMachineBox to expose the walkable chain to the profiler. NativeAOT returns false from GetDiagnosticData due to the lack of access to the native method IP and state field.
  • InstrumentCheckPoint guards at all V1 builder await-completion sites (TaskAwaiter, ValueTaskAwaiter, ConfiguredValueTaskAwaitable, YieldAwaitable, PoolingAsyncValueTaskMethodBuilderT); these are linked out when EventSource is not supported.
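A minimal sketch of the wrapper-plus-TLS idea, under heavy simplifying assumptions: `DispatcherSketch`, `DispatcherState`, and the string event log are hypothetical stand-ins, not the runtime's `AsyncTaskDispatcher`/`AsyncTaskDispatcherInfo`. The real dispatcher derives from `Task`, wraps an `IAsyncStateMachineBox` rather than an `Action`, and keeps its per-invocation state in a ref struct; this model only shows the push, Resume, run, Complete, pop ordering.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical per-invocation state (the runtime keeps this in a ref struct).
internal sealed class DispatcherState
{
    public DispatcherState? Previous;
    public long ContextId;
}

// Hypothetical wrapper placed at the head of a continuation chain.
internal sealed class DispatcherSketch
{
    [ThreadStatic]
    private static DispatcherState? t_current;

    // Stand-in for event emission.
    public static readonly List<string> Events = new();

    private readonly Action _inner; // stands in for the wrapped state-machine box
    private readonly long _contextId;

    public DispatcherSketch(Action inner, long contextId)
    {
        _inner = inner;
        _contextId = contextId;
    }

    // Lets code running inside the chain find the active dispatcher state.
    public static DispatcherState? Current => t_current;

    public void Invoke()
    {
        // Push this invocation's state onto the thread-static slot.
        var state = new DispatcherState { Previous = t_current, ContextId = _contextId };
        t_current = state;
        Events.Add($"Resume {_contextId}");
        try
        {
            _inner();
        }
        finally
        {
            // Complete marks where this resume stopped executing on the thread.
            Events.Add($"Complete {_contextId}");
            t_current = state.Previous; // pop
        }
    }
}
```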

Async profiler V1 event model

The runtime emits Create + Resume + Complete per dispatcher MoveNext, but no Suspend* events. Suspend is not possible on V1, since a state machine can decide to continue executing the chain. Instead, V1 emits Create events when it hits a possible suspend point (using the same context id); tooling, which can see what happened later in the execution history, can then calculate the actual suspends. The most important use of the Suspend event is to track when the current resume stopped executing on a thread. On V1 the Complete event provides that information, so a parser that consumes both Suspend and Complete to detect when a resume stopped executing works on both V1 and V2.
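To illustrate the parser side, here is a hypothetical sketch that closes a resume on either Suspend or Complete, so the same logic covers V2 and V1 traces. The `AsyncEvent` shape and its fields are assumptions, not the real event payload; the sketch also assumes Suspend/Complete carry the same context id as the Resume they end, and keeps a per-thread stack in case resumes nest.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified event shape; real async-profiler events carry more payload.
public enum AsyncEventKind { Create, Resume, Suspend, Complete }

public readonly record struct AsyncEvent(AsyncEventKind Kind, long ContextId, int ThreadId, long Timestamp);

public static class ResumeIntervals
{
    // Pairs each Resume with the next Suspend (V2) or Complete (V1) on the same thread.
    // Create events are ignored here.
    public static List<(long ContextId, long Start, long End)> Compute(IEnumerable<AsyncEvent> events)
    {
        var running = new Dictionary<int, Stack<AsyncEvent>>(); // thread id -> nested resumes
        var result = new List<(long ContextId, long Start, long End)>();
        foreach (var e in events)
        {
            if (e.Kind == AsyncEventKind.Resume)
            {
                if (!running.TryGetValue(e.ThreadId, out var pushStack))
                    running[e.ThreadId] = pushStack = new Stack<AsyncEvent>();
                pushStack.Push(e);
            }
            else if (e.Kind == AsyncEventKind.Suspend || e.Kind == AsyncEventKind.Complete)
            {
                // Assumption: Suspend/Complete carry the context id of the Resume they end.
                if (running.TryGetValue(e.ThreadId, out var popStack) && popStack.Count > 0 &&
                    popStack.Peek().ContextId == e.ContextId)
                {
                    var resume = popStack.Pop();
                    result.Add((resume.ContextId, resume.Timestamp, e.Timestamp));
                }
            }
        }
        return result;
    }
}
```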

The same limitation applies to Create callstacks on V1: the continuation chain is built and finalized after the Create event is emitted. Create callstacks can instead be calculated when parsing the whole trace, since the next Resume for that context carries the callstack.
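A possible tooling-side backfill, sketched under assumptions: `ParsedEvent` is a hypothetical parsed record, and when several Creates for one context precede a Resume, this sketch assumes the most recent one is the real suspend point and fills only that one.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical parsed event; Frames is null when the event carried no callstack.
public sealed class ParsedEvent
{
    public string Kind = "";
    public long ContextId;
    public ulong[]? Frames;
}

public static class CreateCallstackBackfill
{
    // Gives a callstack-less Create the callstack carried by the next Resume
    // for the same context id.
    public static void Apply(IReadOnlyList<ParsedEvent> trace)
    {
        var pending = new Dictionary<long, ParsedEvent>(); // context id -> latest Create
        foreach (var e in trace)
        {
            if (e.Kind == "Create" && e.Frames is null)
            {
                // Assumption: a later Create for the same context supersedes an
                // earlier one that did not actually suspend.
                pending[e.ContextId] = e;
            }
            else if (e.Kind == "Resume" && e.Frames is not null &&
                     pending.Remove(e.ContextId, out var create))
            {
                create.Frames = e.Frames;
            }
        }
    }
}
```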

On V2, continuation chains are built and finalized before being scheduled for execution. On V1 this happens in parallel, and chains can end up truncated if completion races between the thread yielding and the thread executing the resumed continuation chain. A continuation chain might also keep growing after a thread has started executing it. To handle this, a new event was added to the async profiler, AppendAsyncCallstack, which can fire several times between the resume of an async context and its completion. This gives a parser the ability to recreate the full resumed async callstack at the resume point, even if it did not exist at that point during runtime.
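The reconstruction can be sketched as follows. `CallstackEvent`, the `"Resume"`/`"Append"`/`"Complete"` kind strings, and the frame ordering (appended frames extend the caller end of the chain) are all assumptions for illustration; the real AppendAsyncCallstack payload may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical trace record; Kind is "Resume", "Append" or "Complete".
public sealed record CallstackEvent(string Kind, long ContextId, IReadOnlyList<ulong> Frames);

public static class ResumeCallstackBuilder
{
    // Starts from the frames carried by Resume and extends them with every
    // AppendAsyncCallstack seen for the same context until Complete.
    // Assumption: appended frames extend the caller (outer) end of the chain.
    public static Dictionary<long, List<ulong>> Build(IEnumerable<CallstackEvent> trace)
    {
        var active = new Dictionary<long, List<ulong>>();
        var finished = new Dictionary<long, List<ulong>>();
        foreach (var e in trace)
        {
            switch (e.Kind)
            {
                case "Resume":
                    active[e.ContextId] = new List<ulong>(e.Frames);
                    break;
                case "Append":
                    if (active.TryGetValue(e.ContextId, out var frames))
                        frames.AddRange(e.Frames);
                    break;
                case "Complete":
                    if (active.Remove(e.ContextId, out var full))
                        finished[e.ContextId] = full; // last resume per context wins in this sketch
                    break;
            }
        }
        return finished;
    }
}
```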

Tests

Test files split by async model to keep each focused:

  • AsyncProfilerTests.cs — shared partial-class infrastructure: parsers, event listeners, scenario runners.
  • AsyncProfilerV1Tests.cs — tests under the TaskAsync_* prefix covering V1 scenarios.
  • AsyncProfilerV2Tests.cs — tests under the RuntimeAsync_* prefix covering V2 scenarios.

Total of 112 tests covering both V1 and V2 scenarios.

Out of scope (follow-ups)

  • Using AsyncInstrumentation makes it possible to fold the existing TPL and debugger checks into the same guard. This will be handled in a follow-up PR and could improve performance on the existing TPL and debugger paths where the async profiler coexists.
  • The code currently puts a dispatcher box at the head of the continuation chain to push/pop the needed async dispatcher info on TLS. Some code paths are known (thread pool, default sync context and scheduler); if we knew they were always taken at the create location, we could optimize those paths and remove the allocation. That said, every continuation in the chain is already allocated, so this might not be a big deal in the end.
  • AsyncV1 has a late-attach issue; the same applies to TPL. Unless the async dispatcher info is always created/enabled for AsyncV1, there will be potential blind spots in the stack when attaching late if the profiler was not enabled at startup.
  • PoolingAsyncValueTaskMethodBuilderT instrumentation; can be added later if needed.
  • NativeAOT V1 callstack support. NativeAOT async support is already limited in tooling. Since the lack of async callstacks would rule out the majority of scenarios, NAOT is currently not supported on V1. It could be added in the future if needed.
  • Run V1 tests on Mono. AsyncProfiler + V1 instrumentation builds on Mono, but currently no tests are executed. Can be revisited later.

Copilot AI left a comment

Contributor

Pull request overview

Extends the async-profiler runtime instrumentation to cover the V1 Task/state-machine (compiler-generated) async path, aiming to make V1 and V2 produce a uniform async-profiler event stream (including callstacks and V1 append/backfill behavior), and adds a large test suite split by async model.

Changes:

  • Adds V1 instrumentation via an AsyncTaskDispatcher wrapper (plus TLS-pushed dispatcher state) and inserts instrumentation checkpoints across key await/builder scheduling sites.
  • Plumbs continuation-walk diagnostics through IAsyncStateMachineBox.GetDiagnosticData and adds AsyncStateMachineDiagnostics<TStateMachine> to support V1 callstack capture.
  • Adds/organizes async-profiler tests into V1/V2-specific files and updates the test project to compile them.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
src/libraries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Threading.Tasks.Tests.csproj | Includes new V1/V2 async-profiler test files in the test project.
src/libraries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Runtime.CompilerServices/AsyncProfilerV1Tests.cs | Adds V1 (TaskAsync) async-profiler scenario coverage and validations.
src/libraries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Runtime.CompilerServices/AsyncProfilerV2Tests.cs | Adds V2 (runtime-async) async-profiler scenario coverage and validations.
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/TaskContinuation.cs | Wraps scheduled async state-machine boxes with dispatcher when async-profiler instrumentation is enabled.
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs | Exposes continuation object for diagnostics to support continuation walking.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs | Wraps state-machine box with dispatcher in key await continuation paths under async-profiler.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/ValueTaskAwaiter.cs | Adds dispatcher wrapping in ValueTask await continuation paths under async-profiler.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/ConfiguredValueTaskAwaitable.cs | Adds dispatcher wrapping in configured ValueTask await continuation paths under async-profiler.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/YieldAwaitable.cs | Adds dispatcher wrapping in Yield await continuation paths under async-profiler.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/IAsyncStateMachineBox.cs | Adds GetDiagnosticData API to support profiler continuation walking.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs | Emits V1 method/unwind instrumentation and implements diagnostic-data plumbing for state-machine boxes.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/PoolingAsyncValueTaskMethodBuilderT.cs | Adds a stub GetDiagnosticData implementation returning false (not yet supported).
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskDispatcher.cs | Introduces dispatcher wrapper + TLS state used for V1 Create/Resume/Complete + append callstack behavior.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncStateMachineDiagnostics.cs | Adds per-state-machine cached method-id + state-field-offset resolution for diagnostics.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs | Adds helper to recover an IAsyncStateMachineBox from continuation Actions/wrappers.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncProfiler.cs | Adds AppendAsyncCallstack event and shared callstack emission/walking logic used by V1.
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncProfiler.CoreCLR.cs | Refactors V2 callstack emission to use shared helpers and adjusts suspend emission ordering.
src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new compiler-services files and adjusts shared inclusion for async-profiler/instrumentation sources.

@dotnet-policy-service

Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service

Contributor

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:7232

  • Typo in nearby comment: "istelf" -> "itself".
        internal object? ContinuationForDiagnostics => m_continuationObject != this ? m_continuationObject : null;

        internal virtual Delegate[]? GetDelegateContinuationsForDebugger()
        {
            // Avoid an infinite loop by making sure the continuation object is not a reference to istelf.
            if (m_continuationObject != this)

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.

@noahfalk

Member

@tarekgh - do you know who reviews System.Threading.Task stuff with Toub not around? I'll be looking at this and wouldn't be surprised if @jkotas does too, but wanted to give a heads up if there is a BCL owner that would also like to review?

@tarekgh

tarekgh commented Jun 11, 2026

Member

@noahfalk the owners are @dotnet/area-system-threading-tasks, as stated in the doc https://github.com/dotnet/runtime/blob/main/docs/area-owners.md.

// they will be pre-allocated, so code should be linked out when diagnostics is not supported.
// Given the added complexity on Native AOT, the fact that this is only used for diagnostics,
// and that Native AOT currently has limited asyncv1 diagnostics support in tooling, we can
// postpone the support until proven needed.

@jkotas jkotas Jun 12, 2026

Member

This is still adding 100kB to sample ASP.NET TodoApi app published with NativeAOT: MichalStrehovsky/rt-sz#230 . That is quite a bit for something that does nothing useful.

@lateralusX lateralusX Jun 12, 2026

Member Author

Is that when event support is enabled? If so, then until we implement the IP and state for Native AOT, we should make sure most of the code added in this PR (asyncv1 support) is not included on Native AOT.

Member

Is that when event support is enabled?

Yes

Member

ASP.NET enables event source by default, the rest have it disabled by default. MichalStrehovsky/rt-sz#230 now also has results for EventSource force enabled everywhere and I've also kicked off a run with EventSource force disabled everywhere, it should post results in the same issue shortly.

Member Author

Since we won't be able to get the method id for MoveNext state machines or their current state under Native AOT without changes to ILC, I dropped support for async profiler async V1 events on Native AOT until proven needed; async V1 diagnostics support is already limited on Native AOT. If we rerun MichalStrehovsky/rt-sz#230 against this PR, it should show a very small or no delta when EventSource is enabled.

Copilot AI review requested due to automatic review settings June 15, 2026 12:08
@lateralusX lateralusX force-pushed the lateralusX/async-profiler-asyncv1-support-v2 branch from 29713ac to fbe10e8

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings June 16, 2026 08:15

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.

@noahfalk noahfalk left a comment

Member

This looks good Johan! A couple questions/suggestions inline.

using Microsoft.DotNet.XUnitExtensions;
using Xunit;

namespace System.Threading.Tasks.Tests

Member

I didn't notice any tests covering a custom awaitable at the leaf of the await chain. I suspect it currently doesn't work because all the dispatcher injection logic is based on well-known awaiters and custom awaitables won't run any of those code paths.

@lateralusX lateralusX Jun 16, 2026

Member Author

I assume the same limitation applies to TPL as well?

{
Debug.Assert(!IsCompleted);

if (AsyncTaskDispatcherInfo.AsyncProfilerInstrumentCheckPoint)

Member

You mentioned it in the out-of-scope part and it's fine not to have it in this PR, but I do think we'll need the pooling builder + StateMachineBox. Without it I assume we'd just have gaps in the trace anywhere folks used the feature.

Member Author

Yes, there was no reason not to do it, except that it would have a slightly different state model. I suggest we do it in a follow-up PR.

{
Debug.Assert(!IsCompleted);

if (AsyncTaskDispatcherInfo.AsyncProfilerInstrumentCheckPoint)

Member

Any reason that this check is profiler-specific? Elsewhere in the async V2 implementation I thought we kept the checks client neutral (at least in terms of the exposed naming).

Member Author

I have an idea for a follow-up PR that merges this with the TPL/debugger checks happening at the same location. I didn't want to do it in this PR to reduce the impact on those code paths, but once that is done there will be a similar pattern to V2: first a generic instrumentation check, then the individual instrumentation targets.

public static bool CreateAsyncCallstackEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.CreateAsyncCallstack) != 0;
public static bool ResumeAsyncCallstackEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.ResumeAsyncCallstack) != 0;
public static bool SuspendAsyncCallstackEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.SuspendAsyncCallstack) != 0;
public static bool ResumeAsyncMethodEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.ResumeAsyncMethod) != 0;

Member

What is the plan to determine the splice point in an AsyncV1 callstack? Without the continuation wrapper ids from V2 I'm guessing we'd need to enable ResumeAsyncMethod events? If so I assume we'd want separate keywords for the AsyncV1 and V2 method events so that a high performance tracer can enable the V1 method events only.

Member Author

I have been thinking about this for some time. My initial ambition was to reuse the history in the sync callstack to get the data needed to "complete" methods; if that fails, we would need to fall back to the CompleteAsyncMethod event, or alternatively introduce the wrapper logic for V1 as well. That said, filtering V1 events from V2 at the emit layer would still be good to have, making sure V1 decisions won't impact V2 performance. Good call!
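For the keyword split discussed above, a minimal sketch using `System.Diagnostics.Tracing.EventKeywords`. The `SketchKeywords` names and bit values, including the separate `ResumeAsyncMethodV1`/`ResumeAsyncMethodV2` bits, are hypothetical and not the actual `AsyncProfilerEventSource.Keywords` layout; the checks mirror the style of the helpers in the diff.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical keyword layout; bit values and the V1/V2 split are assumptions.
public static class SketchKeywords
{
    public const EventKeywords ResumeAsyncCallstack = (EventKeywords)0x1;
    public const EventKeywords ResumeAsyncMethodV1 = (EventKeywords)0x2;
    public const EventKeywords ResumeAsyncMethodV2 = (EventKeywords)0x4;
}

public static class KeywordFilters
{
    // Separate bits let a tracer enable V1 method events without also enabling V2 ones.
    public static bool ResumeAsyncMethodV1Event(EventKeywords eventKeywords) =>
        (eventKeywords & SketchKeywords.ResumeAsyncMethodV1) != 0;

    public static bool ResumeAsyncMethodV2Event(EventKeywords eventKeywords) =>
        (eventKeywords & SketchKeywords.ResumeAsyncMethodV2) != 0;
}
```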

{
Debug.Assert(stateMachineBox != null);

if (AsyncTaskDispatcherInfo.AsyncProfilerInstrumentCheckPoint)

Member

I don't know if it's worth doing, but one thought was that we could probably hoist this wrapping behavior up to AwaitUnsafeOnCompleted. Something like:

IAsyncStateMachineBox box = GetMachineStateBox(...);
if (awaiter is not IDispatcherAwareAwaiter dispatcherAware ||
    dispatcherAware.IsDispatcherNeeded())
{
    box = AsyncTaskDispatcherInfo.Create(box);
}
AwaitUnsafeOnCompleted(awaiter, box);

where awaiters might look like this:

struct TaskAwaiter : IDispatcherAwareAwaiter
{
    public bool IsDispatcherNeeded() => m_task is not IAsyncStateMachineBox;
}

On its own I don't know that it matters much though it is one way to get custom awaiters wrapped. If you wanted to get rid of the allocation for AsyncTaskDispatcher in the future instead of wrapping the box you could have GetMachineStateBox() create an alternate box type up-front.
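A self-contained model of the suggested opt-out, assuming the hypothetical `IDispatcherAwareAwaiter` interface from the comment (it is not an existing runtime API). `KnownAwaiter`, `CustomAwaiter`, and `DispatcherPolicy` are illustrative stand-ins; the point is that any awaiter that does not implement the interface is wrapped by default, which is how custom awaitables would get coverage.

```csharp
using System;

// Hypothetical opt-in interface from the suggestion; not an existing runtime API.
public interface IDispatcherAwareAwaiter
{
    bool IsDispatcherNeeded();
}

// Stand-in for a well-known awaiter (like TaskAwaiter) that knows when wrapping is unnecessary.
public readonly struct KnownAwaiter : IDispatcherAwareAwaiter
{
    private readonly bool _awaitedTaskIsStateMachineBox;

    public KnownAwaiter(bool awaitedTaskIsStateMachineBox) =>
        _awaitedTaskIsStateMachineBox = awaitedTaskIsStateMachineBox;

    public bool IsDispatcherNeeded() => !_awaitedTaskIsStateMachineBox;
}

// A custom awaiter that knows nothing about the dispatcher.
public readonly struct CustomAwaiter { }

public static class DispatcherPolicy
{
    // Unknown awaiters are wrapped by default; aware awaiters decide for themselves.
    public static bool ShouldWrap<TAwaiter>(TAwaiter awaiter) =>
        awaiter is not IDispatcherAwareAwaiter aware || aware.IsDispatcherNeeded();
}
```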

@lateralusX lateralusX Jun 16, 2026

Member Author

I will look into this, but sounds promising, maybe more suitable as a follow-up optimization?

Copilot AI review requested due to automatic review settings June 16, 2026 19:19

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.

Comment on lines +759 to +762
if (Environment.CurrentManagedThreadId == callerThreadId)
{
SynchronizationContext.SetSynchronizationContext(prev);
}

6 participants