Async profiler: V1 (TaskAsync) instrumentation + tests #129043
lateralusX wants to merge 36 commits into
Pull request overview
Extends the async-profiler runtime instrumentation to cover the V1 Task/state-machine (compiler-generated) async path, aiming to make V1 and V2 produce a uniform async-profiler event stream (including callstacks and V1 append/backfill behavior), and adds a large test suite split by async model.
Changes:
- Adds V1 instrumentation via an AsyncTaskDispatcher wrapper (plus TLS-pushed dispatcher state) and inserts instrumentation checkpoints across key await/builder scheduling sites.
- Plumbs continuation-walk diagnostics through IAsyncStateMachineBox.GetDiagnosticData and adds AsyncStateMachineDiagnostics<TStateMachine> to support V1 callstack capture.
- Adds/organizes async-profiler tests into V1/V2-specific files and updates the test project to compile them.
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/libraries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Threading.Tasks.Tests.csproj | Includes new V1/V2 async-profiler test files in the test project. |
| src/libraries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Runtime.CompilerServices/AsyncProfilerV1Tests.cs | Adds V1 (TaskAsync) async-profiler scenario coverage and validations. |
| src/libraries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Runtime.CompilerServices/AsyncProfilerV2Tests.cs | Adds V2 (runtime-async) async-profiler scenario coverage and validations. |
| src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/TaskContinuation.cs | Wraps scheduled async state-machine boxes with dispatcher when async-profiler instrumentation is enabled. |
| src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs | Exposes continuation object for diagnostics to support continuation walking. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs | Wraps state-machine box with dispatcher in key await continuation paths under async-profiler. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/ValueTaskAwaiter.cs | Adds dispatcher wrapping in ValueTask await continuation paths under async-profiler. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/ConfiguredValueTaskAwaitable.cs | Adds dispatcher wrapping in configured ValueTask await continuation paths under async-profiler. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/YieldAwaitable.cs | Adds dispatcher wrapping in Yield await continuation paths under async-profiler. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/IAsyncStateMachineBox.cs | Adds GetDiagnosticData API to support profiler continuation walking. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs | Emits V1 method/unwind instrumentation and implements diagnostic-data plumbing for state-machine boxes. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/PoolingAsyncValueTaskMethodBuilderT.cs | Adds a stub GetDiagnosticData implementation returning false (not yet supported). |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskDispatcher.cs | Introduces dispatcher wrapper + TLS state used for V1 Create/Resume/Complete + append callstack behavior. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncStateMachineDiagnostics.cs | Adds per-state-machine cached method-id + state-field-offset resolution for diagnostics. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs | Adds helper to recover an IAsyncStateMachineBox from continuation Actions/wrappers. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncProfiler.cs | Adds AppendAsyncCallstack event and shared callstack emission/walking logic used by V1. |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncProfiler.CoreCLR.cs | Refactors V2 callstack emission to use shared helpers and adjusts suspend emission ordering. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new compiler-services files and adjusts shared inclusion for async-profiler/instrumentation sources. |
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
Tagging subscribers to this area: @dotnet/area-system-threading-tasks
Pull request overview
Copilot reviewed 19 out of 21 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:7232
- Typo in nearby comment: "istelf" -> "itself".
internal object? ContinuationForDiagnostics => m_continuationObject != this ? m_continuationObject : null;
internal virtual Delegate[]? GetDelegateContinuationsForDebugger()
{
// Avoid an infinite loop by making sure the continuation object is not a reference to istelf.
if (m_continuationObject != this)
@noahfalk the owners are @dotnet/area-system-threading-tasks, as stated in the doc https://github.com/dotnet/runtime/blob/main/docs/area-owners.md.
// they will be pre-allocated, so code should be linked out when diagnostics is not supported.
// Given the added complexity on Native AOT, the fact that this is only used for diagnostics,
// and that Native AOT currently has limited asyncv1 diagnostics support in tooling, we can
// postpone the support until proven needed.
This is still adding 100kB to the sample ASP.NET TodoApi app published with NativeAOT: MichalStrehovsky/rt-sz#230. That is quite a bit for something that does nothing useful.
Is that when event support is enabled? If so, until we implement IP and state support for Native AOT, we should make sure most of the code added in this PR (asyncv1 support) is not included on Native AOT.
Is that when event support is enabled?
Yes
ASP.NET enables EventSource by default; the other apps have it disabled by default. MichalStrehovsky/rt-sz#230 now also has results with EventSource force-enabled everywhere, and I've kicked off a run with EventSource force-disabled everywhere; it should post results in the same issue shortly.
Since we can't get the method id of state machine MoveNext methods, or their current state, under Native AOT without changes to ILC, I dropped support for async profiler V1 events on Native AOT until proven needed; async V1 diagnostics support is already limited on Native AOT. Rerunning MichalStrehovsky/rt-sz#230 against this PR should show a very small or no delta when EventSource is enabled.
29713ac to fbe10e8
noahfalk left a comment
This looks good, Johan! A couple of questions/suggestions inline.
using Microsoft.DotNet.XUnitExtensions;
using Xunit;

namespace System.Threading.Tasks.Tests
I didn't notice any tests covering a custom awaitable at the leaf of the await chain. I suspect it currently doesn't work because all the dispatcher injection logic is based on well-known awaiters and custom awaitables won't run any of those code paths.
I assume the same limitation applies to TPL as well?
{
    Debug.Assert(!IsCompleted);

    if (AsyncTaskDispatcherInfo.AsyncProfilerInstrumentCheckPoint)
You mentioned it in the out-of-scope part, and it's fine not to have it in this PR, but I do think we'll need the pooling builder + StateMachineBox. Without it, I assume we'd just have gaps in the trace anywhere folks used the feature.
Yes, there was no reason not to do it, other than that it has a slightly different state model. I suggest we do it in a follow-up PR.
{
    Debug.Assert(!IsCompleted);

    if (AsyncTaskDispatcherInfo.AsyncProfilerInstrumentCheckPoint)
Any reason this check is profiler-specific? Elsewhere in the async V2 implementation I thought we kept the checks client-neutral (at least in terms of the exposed naming).
I have an idea for a follow-up PR that merges this with the TPL/Debugger instrumentation happening at the same location. I didn't want to do it in this PR to reduce the impact on those code paths, but once that is done the pattern will be similar to V2: first a generic instrumentation check, then individual instrumentation targets.
public static bool CreateAsyncCallstackEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.CreateAsyncCallstack) != 0;
public static bool ResumeAsyncCallstackEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.ResumeAsyncCallstack) != 0;
public static bool SuspendAsyncCallstackEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.SuspendAsyncCallstack) != 0;
public static bool ResumeAsyncMethodEvent(EventKeywords eventKeywords) => (eventKeywords & AsyncProfilerEventSource.Keywords.ResumeAsyncMethod) != 0;
What is the plan for determining the splice point in an AsyncV1 callstack? Without the continuation wrapper ids from V2, I'm guessing we'd need to enable ResumeAsyncMethod events? If so, I assume we'd want separate keywords for the AsyncV1 and V2 method events so that a high-performance tracer can enable only the V1 method events.
I have been thinking about this for some time. My initial ambition was to reuse the history in the sync callstack to get the data needed to "complete" methods; if that fails, we would need to fall back to the CompleteAsyncMethod event, or alternatively introduce the wrapper logic for V1 as well. That said, filtering V1 events from V2 events at the emit layer would still be good to have, making sure V1 decisions won't impact V2 performance. Good call!
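A minimal sketch of the separate-keyword idea discussed above. It assumes hypothetical ResumeAsyncMethodV1/ResumeAsyncMethodV2 keyword bits; their names and values are illustrative only and are not the real AsyncProfilerEventSource.Keywords. It keeps the same predicate shape as the existing checks. A tracer that only needs V1 splice points enables just the V1 bit, and the V2 emit path stays a single mask test that V1 decisions never touch.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical keyword bits, for illustration only: the real AsyncProfilerEventSource.Keywords
// names and values may differ. The point is that V1 and V2 method events get separate bits.
const EventKeywords ResumeAsyncMethodV1 = (EventKeywords)0x10;
const EventKeywords ResumeAsyncMethodV2 = (EventKeywords)0x20;

// A high-performance tracer that only needs V1 splice points enables just the V1 bit.
EventKeywords tracerKeywords = ResumeAsyncMethodV1;

Console.WriteLine($"V1 method events: {ResumeAsyncMethodV1Event(tracerKeywords)}");
Console.WriteLine($"V2 method events: {ResumeAsyncMethodV2Event(tracerKeywords)}");

// Same shape as the existing predicates: one cheap mask test per event kind, so enabling
// V1 method events never turns on V2 method-event emission and vice versa.
bool ResumeAsyncMethodV1Event(EventKeywords keywords) => (keywords & ResumeAsyncMethodV1) != 0;
bool ResumeAsyncMethodV2Event(EventKeywords keywords) => (keywords & ResumeAsyncMethodV2) != 0;
```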
{
    Debug.Assert(stateMachineBox != null);

    if (AsyncTaskDispatcherInfo.AsyncProfilerInstrumentCheckPoint)
I don't know if it's worth doing, but one thought was that we could probably hoist this wrapping behavior up to AwaitUnsafeOnCompleted. Something like:
IAsyncStateMachineBox box = GetStateMachineBox(...);
if (awaiter is not IDispatcherAwareAwaiter dispatcherAware ||
    dispatcherAware.IsDispatcherNeeded())
{
    box = AsyncTaskDispatcherInfo.Create(box);
}
AwaitUnsafeOnCompleted(awaiter, box);
where awaiters might look like this:
struct TaskAwaiter : IDispatcherAwareAwaiter
{
    public bool IsDispatcherNeeded() => m_task is not IAsyncStateMachineBox;
}
On its own I don't know that it matters much, though it is one way to get custom awaiters wrapped. If you wanted to get rid of the allocation for AsyncTaskDispatcher in the future, instead of wrapping the box you could have GetStateMachineBox() create an alternate box type up-front.
I will look into this; it sounds promising. Maybe it's more suitable as a follow-up optimization?
if (Environment.CurrentManagedThreadId == callerThreadId)
{
    SynchronizationContext.SetSynchronizationContext(prev);
}
WIP
Summary
Enables the async profiler for the V1 TaskAsync (state-machine-based) async path. Both V1 (TaskAsync) and V2 (RuntimeAsync) now emit a uniform, well-defined event stream that downstream tools can consume without knowing which async model produced a given chain. There are minor variations in which events V1 supports, but all important events are supported on both async models. The callstack events are also typed, since their data differs slightly (native IP only on V2; method start native IP and state on V1).
The PR includes an extensive test suite covering both paths, as well as refactoring.
Motivation
V1 async (the C#-compiler-generated state-machine IAsyncStateMachineBox model) is the dominant async path in the wild. Until now the async profiler only instrumented the V2 (RuntimeAsync) path, so V1 chains were invisible to consumers. This PR extends the instrumentation to V1 while keeping the event stream identical in shape between the two models.
What's in this PR
Runtime V1 instrumentation
Async profiler V1 event model
The runtime emits Create, Resume, and Complete events per dispatcher MoveNext, but no Suspend* events. Emitting Suspend on V1 is not possible, since at a potential suspend point the state machine can still decide to continue executing the chain. Instead, V1 emits Create events when it hits a possible suspend point (using the same context id). Tooling can see what happened later in the execution history, so it can then calculate the actual suspends. The most important use of the Suspend event is to track when the current resume stopped executing on a thread. On V1 the Complete event provides that information, so a parser that treats either Suspend or Complete as the end of a resume works for both V1 and V2.
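A minimal tooling-side sketch of how a parser could consume this model. It assumes a simplified, hypothetical event shape (Kind, Thread, ContextId, Time) rather than the real AsyncProfilerEventSource payloads, and ExecutionSegments and ConfirmedSuspendPoints are illustrative names. The first function ends a resume at either Suspend or Complete, so it works for V1 and V2. The second looks ahead in the trace to decide which V1 Create events were real suspends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified event shape: Kind is "Create", "Resume", "Suspend" or "Complete".
// Real async-profiler payloads carry more fields; only what this sketch needs is modeled.
var trace = new List<(string Kind, int Thread, long ContextId, long Time)>
{
    ("Create",   1, 42, 10), // V1: possible suspend point for context 42
    ("Complete", 1, 42, 11), // V1: MoveNext returned on thread 1 (no open resume, so ignored)
    ("Resume",   2, 42, 20), // context 42 resumes on thread 2, so the Create at t=10 was a real suspend
    ("Suspend",  2, 42, 25), // a V2-style Suspend ends the segment exactly like a V1 Complete
    ("Create",   2, 7,  30), // never resumed in this trace: not treated as a suspend
};

Console.WriteLine(string.Join(", ", ExecutionSegments(trace)));
Console.WriteLine(string.Join(", ", ConfirmedSuspendPoints(trace)));

// A resumed chain stops running on a thread at the next Suspend (V2) or Complete (V1)
// for the same context on that thread, so one rule covers both async models.
static List<(long ContextId, int Thread, long Start, long End)> ExecutionSegments(
    IEnumerable<(string Kind, int Thread, long ContextId, long Time)> events)
{
    var open = new Dictionary<(int Thread, long ContextId), long>();
    var result = new List<(long ContextId, int Thread, long Start, long End)>();
    foreach (var e in events.OrderBy(x => x.Time))
    {
        var key = (e.Thread, e.ContextId);
        if (e.Kind == "Resume")
        {
            open[key] = e.Time;
        }
        else if ((e.Kind == "Suspend" || e.Kind == "Complete") && open.Remove(key, out long start))
        {
            result.Add((e.ContextId, e.Thread, start, e.Time));
        }
    }
    return result;
}

// A V1 Create only marks a possible suspend point. Looking ahead in the trace, it was a real
// suspend if the next Create/Resume event for the same context id is a Resume.
static List<(long ContextId, long Time)> ConfirmedSuspendPoints(
    IEnumerable<(string Kind, int Thread, long ContextId, long Time)> events)
{
    var ordered = events.OrderBy(x => x.Time).ToList();
    var result = new List<(long ContextId, long Time)>();
    for (int i = 0; i < ordered.Count; i++)
    {
        var e = ordered[i];
        if (e.Kind != "Create") continue;
        var next = ordered.Skip(i + 1)
            .FirstOrDefault(n => n.ContextId == e.ContextId && (n.Kind == "Create" || n.Kind == "Resume"));
        if (next.Kind == "Resume") result.Add((e.ContextId, e.Time));
    }
    return result;
}
```

The look-ahead in ConfirmedSuspendPoints is quadratic for simplicity. A real parser would more likely make a single backwards pass.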
The same limitation applies to create callstacks on V1, because the continuation chain is built and finalized after the Create event is emitted. Create callstacks can instead be calculated when parsing the whole trace, since the next resume for that context carries the callstack.
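A minimal sketch of that backfill. It assumes a hypothetical, simplified event shape in which Frames is empty on Create, and BackfillCreateCallstacks is an illustrative helper name, not a runtime or tooling API. Walking the trace backwards lets each Create pick up the callstack of the next Resume for the same context id in a single pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified events; Frames lists method names innermost first and is empty on Create.
var trace = new List<(string Kind, long ContextId, long Time, string[] Frames)>
{
    ("Create", 42, 5,  Array.Empty<string>()),
    ("Resume", 42, 10, new[] { "Leaf.MoveNext", "Middle.MoveNext" }),
    ("Create", 42, 12, Array.Empty<string>()),
    ("Resume", 42, 20, new[] { "Middle.MoveNext" }),
    ("Create", 7,  25, Array.Empty<string>()), // never resumed, so nothing to backfill
};

foreach (var (createTime, frames) in BackfillCreateCallstacks(trace))
    Console.WriteLine($"create@{createTime}: {string.Join(", ", frames)}");

// Walk the trace backwards, remembering the most recently seen (i.e. next in time) Resume
// callstack per context id; each Create then borrows the callstack of the Resume that follows it.
static Dictionary<long, string[]> BackfillCreateCallstacks(
    IEnumerable<(string Kind, long ContextId, long Time, string[] Frames)> events)
{
    var nextResume = new Dictionary<long, string[]>(); // context id -> frames of the following Resume
    var result = new Dictionary<long, string[]>();     // Create timestamp -> backfilled frames
    foreach (var e in events.OrderByDescending(x => x.Time))
    {
        if (e.Kind == "Resume")
            nextResume[e.ContextId] = e.Frames;
        else if (e.Kind == "Create" && nextResume.TryGetValue(e.ContextId, out var frames))
            result[e.Time] = frames;
    }
    return result;
}
```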
On V2, continuation chains are built and finalized before being scheduled for execution. On V1 this happens concurrently, and chains can end up truncated if completion races between the thread yielding and the thread executing the resumed continuation chain. A continuation chain might also keep building after a thread has started executing it. To handle this, a new event, AppendAsyncCallstack, was added to the async profiler. It can fire several times between the resume of an async context and its completion. This lets a parser recreate the full resumed async callstack at the resume point, even if it didn't exist at that point at runtime.
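A minimal parser-side sketch of that reconstruction. It assumes a hypothetical, simplified event shape in which Frames are listed innermost first and "Append" stands in for AppendAsyncCallstack. For illustration, it also assumes that appended frames extend the chain toward the root and therefore go at the end. ReconstructResumedCallstack is an illustrative name, not a runtime or TraceEvent API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified events. Frames are listed innermost first; this sketch assumes
// appended frames extend the chain outward (toward the root) and so are added at the end.
var trace = new List<(string Kind, long ContextId, long Time, string[] Frames)>
{
    ("Resume",   42, 10, new[] { "Leaf.MoveNext", "Middle.MoveNext" }),
    ("Append",   42, 12, new[] { "Outer.MoveNext" }),     // chain grew after the resume started
    ("Append",   7,  13, new[] { "Unrelated.MoveNext" }), // different context, ignored
    ("Append",   42, 14, new[] { "Root.MoveNext" }),
    ("Complete", 42, 15, Array.Empty<string>()),
    ("Append",   42, 16, new[] { "Later.MoveNext" }),     // after Complete: not part of this resume
};

Console.WriteLine(string.Join(", ", ReconstructResumedCallstack(trace, 42)));

// Full callstack for the first resume of a context: the Resume frames plus the frames of every
// Append event for that context, up to its Complete.
static List<string> ReconstructResumedCallstack(
    IEnumerable<(string Kind, long ContextId, long Time, string[] Frames)> events, long contextId)
{
    var stack = new List<string>();
    bool resumed = false;
    foreach (var e in events.Where(x => x.ContextId == contextId).OrderBy(x => x.Time))
    {
        switch (e.Kind)
        {
            case "Resume" when !resumed:
                stack.AddRange(e.Frames);
                resumed = true;
                break;
            case "Append" when resumed:
                stack.AddRange(e.Frames);
                break;
            case "Complete" when resumed:
                return stack;
        }
    }
    return stack; // trace ended before Complete: best effort, possibly truncated
}
```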
Tests
Test files are split by async model to keep each one focused:
A total of 112 tests cover both V1 and V2 scenarios.
Out of scope (follow-ups)