Move thin lock acquire/release in CoreCLR to managed code by VSadov · Pull Request #129502 · dotnet/runtime

VSadov · 2026-06-17T05:40:43Z

Moves the thin lock acquire/release to managed code.
Cross-ports various tweaks between the CoreCLR and NativeAOT implementations where applicable, mostly adjusting CoreCLR code to be like NativeAOT, though a couple of tweaks went in the other direction too.
Fixes an issue with TryAcquire spinning extensively if a thin lock is owned by somebody else.

The overall effect of this PR is a lot more sharing between CoreCLR and NativeAOT.
The performance impact is:

Mostly improvements: up to 30% higher throughput, depending on the platform and on the thin/fat lock scenario.
A couple of scenarios see minor regressions of up to 3%.
These scenarios appear to be quite sensitive to minor changes in codegen or to hardware differences
(i.e. a minor improvement or a minor regression depending on whether the machine is an AMD Epyc 7763 or an AMD Ryzen 7950), so I think we can treat small differences as a kind of "architectural noise".

dotnet-policy-service · 2026-06-17T05:42:07Z

Tagging subscribers to this area: @JulieLeeMSFT, @VSadov
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR moves the thin-lock (object header) acquire/release fast paths from CoreCLR native code into managed implementations in System.Private.CoreLib, removing the associated FCALL/ecall surface and native inline helpers.

Changes:

Removed native thin-lock helpers (syncblk.inl, ObjHeader::*HeaderThinLock, FCALL entries) and associated includes/build references.
Implemented thin-lock acquire/release in managed System.Threading.ObjectHeader (CoreCLR) and updated Monitor to call the new managed entrypoints.
Kept NativeAOT parity by renaming/updating its thin-lock entrypoints and adjusting call sites accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

File	Description
src/coreclr/vm/syncblk.inl	Removes native inline thin-lock acquire/release implementation.
src/coreclr/vm/syncblk.h	Removes native `HeaderLockResult` and thin-lock method declarations from `ObjHeader`.
src/coreclr/vm/ecalllist.h	Drops the `ObjectHeader` FCALL mapping entries.
src/coreclr/vm/comsynchronizable.h	Removes FCDECLs for thin-lock FCALL entrypoints.
src/coreclr/vm/comsynchronizable.cpp	Removes FCIMPL implementations for thin-lock FCALL entrypoints.
src/coreclr/vm/common.h	Removes `syncblk.inl` from global inline includes.
src/coreclr/vm/CMakeLists.txt	Removes `syncblk.inl` from VM header lists.
src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs	Adds managed thin-lock acquire/release logic and exposes `AcquireThinLock(...)` and managed `Release(...)`.
src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs	Routes `Monitor.Enter/TryEnter/...` to the new managed thin-lock entrypoints.
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/ObjectHeader.cs	Renames/reshapes NativeAOT thin-lock entrypoint to `AcquireThinLock(...)` and adjusts uncommon-path handling.
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Monitor.NativeAot.cs	Updates NativeAOT `Monitor` to call `AcquireThinLock(...)`.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

VSadov · 2026-06-17T20:27:37Z

+                        return HeaderLockResult.UseSlowPath;
+                    }
+
+                    if (Interlocked.CompareExchange(pHeader, oldBits | currentThreadID, oldBits) == oldBits)


Doing the CAS in managed code is one of the motivations. The native CAS needs to check for the presence of LSE on ARM64; JITed code does not need that check.

This affects Linux-arm64 perhaps even more than Windows-arm64.
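
As a rough illustration of the fast path being discussed, here is a minimal sketch of a CAS-based thin lock in C#. It is not the code from `ObjectHeader.CoreCLR.cs`: the header is modeled as a plain `int`, and the bit layout (`OwnerMask`), the spin limit (`MaxSpins`), and the class and method names are assumptions made for this example. Only `HeaderLockResult`, its three values, and the `isOneShot` parameter name come from the PR discussion.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: models the header word as an int instead of a real object header.
public static class ThinLockSketch
{
    public enum HeaderLockResult { Success, Failure, UseSlowPath }

    // Assumed layout for illustration: the low 16 bits hold the owner thread ID,
    // and any higher bit means "other state present, take the slow path".
    private const int OwnerMask = 0xFFFF;
    private const int MaxSpins = 16; // assumed spin limit, not taken from the PR

    public static HeaderLockResult TryAcquire(ref int header, int currentThreadID, bool isOneShot)
    {
        // Thread IDs that do not fit the owner field cannot use the thin lock.
        if (currentThreadID <= 0 || currentThreadID > OwnerMask)
            return HeaderLockResult.UseSlowPath;

        int attempts = isOneShot ? 1 : MaxSpins;
        for (int i = 0; i < attempts; i++)
        {
            int oldBits = Volatile.Read(ref header);

            if ((oldBits & ~OwnerMask) != 0)
                return HeaderLockResult.UseSlowPath;

            if (oldBits == 0)
            {
                // Unowned: try to publish our ID with a single compare-and-swap.
                if (Interlocked.CompareExchange(ref header, oldBits | currentThreadID, oldBits) == oldBits)
                    return HeaderLockResult.Success;
                continue; // lost a race; re-read the header if attempts remain
            }

            // Recursive acquisition is left to the slow path in this sketch.
            if (oldBits == currentThreadID)
                return HeaderLockResult.UseSlowPath;

            // Owned by another thread: a one-shot attempt gives up without spinning.
            if (isOneShot)
                return HeaderLockResult.Failure;

            Thread.SpinWait(1);
        }

        return isOneShot ? HeaderLockResult.Failure : HeaderLockResult.UseSlowPath;
    }

    public static HeaderLockResult Release(ref int header, int currentThreadID)
    {
        int oldBits = Volatile.Read(ref header);
        if (oldBits != currentThreadID)
            return HeaderLockResult.UseSlowPath; // not a simple thin lock held by us

        return Interlocked.CompareExchange(ref header, 0, oldBits) == oldBits
            ? HeaderLockResult.Success
            : HeaderLockResult.UseSlowPath;
    }

    public static void Main()
    {
        int header = 0;
        Console.WriteLine(TryAcquire(ref header, 7, isOneShot: true)); // Success
        Console.WriteLine(TryAcquire(ref header, 9, isOneShot: true)); // Failure
        Console.WriteLine(Release(ref header, 7));                     // Success
    }
}
```

Because `Interlocked.CompareExchange` is compiled by the JIT for the machine it runs on, the sketch needs no runtime feature check of its own, which is the point made in the comment above.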

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

VSadov · 2026-06-18T15:53:12Z

@MihuBot benchmark System.Collections.Concurrent -arm -intel

VSadov · 2026-06-18T15:55:21Z

@MihuBot benchmark System.Threading -arm

MihuBot · 2026-06-18T16:18:06Z

System.Collections.Concurrent.IsEmpty_String_

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Size	Mean	Error	Ratio	Allocated	Alloc Ratio
Dictionary	Main	0	128.152 ns	0.1316 ns	1.00	-	NA
Dictionary	PR	0	130.493 ns	0.0555 ns	1.02	-	NA

Queue	Main	0	3.229 ns	0.0026 ns	1.00	-	NA
Queue	PR	0	3.282 ns	0.0184 ns	1.02	-	NA

Stack	Main	0	1.505 ns	0.0004 ns	1.00	-	NA
Stack	PR	0	1.505 ns	0.0007 ns	1.00	-	NA

Bag	Main	0	8.534 ns	0.0014 ns	1.00	-	NA
Bag	PR	0	8.587 ns	0.0023 ns	1.01	-	NA

Dictionary	Main	512	3.116 ns	0.0011 ns	1.00	-	NA
Dictionary	PR	512	3.154 ns	0.0062 ns	1.01	-	NA

Queue	Main	512	2.558 ns	0.0006 ns	1.00	-	NA
Queue	PR	512	2.562 ns	0.0007 ns	1.00	-	NA

Stack	Main	512	1.504 ns	0.0008 ns	1.00	-	NA
Stack	PR	512	1.505 ns	0.0004 ns	1.00	-	NA

Bag	Main	512	8.039 ns	0.0013 ns	1.00	-	NA
Bag	PR	512	8.064 ns	0.0089 ns	1.00	-	NA

System.Collections.Concurrent.IsEmpty_Int32_

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Size	Mean	Error	Ratio	Allocated	Alloc Ratio
Dictionary	Main	0	128.425 ns	0.0369 ns	1.00	-	NA
Dictionary	PR	0	131.646 ns	0.0505 ns	1.03	-	NA

Queue	Main	0	3.250 ns	0.0022 ns	1.00	-	NA
Queue	PR	0	3.264 ns	0.0014 ns	1.00	-	NA

Stack	Main	0	1.500 ns	0.0008 ns	1.00	-	NA
Stack	PR	0	1.501 ns	0.0003 ns	1.00	-	NA

Bag	Main	0	4.861 ns	0.0015 ns	1.00	-	NA
Bag	PR	0	4.872 ns	0.0136 ns	1.00	-	NA

Dictionary	Main	512	3.159 ns	0.0008 ns	1.00	-	NA
Dictionary	PR	512	3.154 ns	0.0087 ns	1.00	-	NA

Queue	Main	512	2.592 ns	0.0091 ns	1.00	-	NA
Queue	PR	512	2.593 ns	0.0004 ns	1.00	-	NA

Stack	Main	512	1.502 ns	0.0010 ns	1.00	-	NA
Stack	PR	512	1.505 ns	0.0012 ns	1.00	-	NA

Bag	Main	512	4.312 ns	0.0016 ns	1.00	-	NA
Bag	PR	512	4.334 ns	0.0016 ns	1.00	-	NA

System.Collections.Concurrent.Count_String_

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Size	Mean	Error	Ratio	Allocated	Alloc Ratio
Dictionary	Main	512	129.939 ns	0.0379 ns	1.00	-	NA
Dictionary	PR	512	131.634 ns	0.0412 ns	1.01	-	NA

Queue	Main	512	4.702 ns	0.0046 ns	1.00	-	NA
Queue	PR	512	4.524 ns	0.0097 ns	0.96	-	NA

Queue_EnqueueCountDequeue	Main	512	21.786 ns	0.0044 ns	1.00	-	NA
Queue_EnqueueCountDequeue	PR	512	22.770 ns	0.0188 ns	1.05	-	NA

Stack	Main	512	610.766 ns	0.0793 ns	1.00	-	NA
Stack	PR	512	610.898 ns	0.0927 ns	1.00	-	NA

Bag	Main	512	43.029 ns	0.4188 ns	1.00	-	NA
Bag	PR	512	40.945 ns	0.2831 ns	0.95	-	NA

System.Collections.Concurrent.Count_Int32_

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Size	Mean	Error	Ratio	Allocated	Alloc Ratio
Dictionary	Main	512	139.712 ns	0.1150 ns	1.00	-	NA
Dictionary	PR	512	131.381 ns	0.0345 ns	0.94	-	NA

Queue	Main	512	4.366 ns	0.0104 ns	1.00	-	NA
Queue	PR	512	4.437 ns	0.0013 ns	1.02	-	NA

Queue_EnqueueCountDequeue	Main	512	23.008 ns	0.0437 ns	1.00	-	NA
Queue_EnqueueCountDequeue	PR	512	22.597 ns	0.0213 ns	0.98	-	NA

Stack	Main	512	610.776 ns	0.0428 ns	1.00	-	NA
Stack	PR	512	610.720 ns	0.0490 ns	1.00	-	NA

Bag	Main	512	42.047 ns	0.1616 ns	1.00	-	NA
Bag	PR	512	40.420 ns	0.0097 ns	0.96	-	NA

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs:90

Monitor.TryEnter(object, int) always falls back to GetLockObject(obj).TryEnter(millisecondsTimeout) after a failed one-shot thin-lock attempt. For millisecondsTimeout == 0, this can unnecessarily allocate/create the Lock (inflation) even though we already know the thin lock is currently owned by another thread. Since TryEnter(…, 0) is a one-shot operation, it can return false immediately on HeaderLockResult.Failure and avoid the slow path/inflation cost.

        public static bool TryEnter(object obj, int millisecondsTimeout)
        {
            ArgumentOutOfRangeException.ThrowIfLessThan(millisecondsTimeout, -1);

            ObjectHeader.HeaderLockResult result = ObjectHeader.AcquireThinLock(obj, isOneShot: true);
            if (result == ObjectHeader.HeaderLockResult.Success)
                return true;

            return GetLockObject(obj).TryEnter(millisecondsTimeout);
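
The early return suggested in this suppressed comment could look roughly like the sketch below. It is only an illustration of the suggestion, not code from the PR, and the thread does not say whether the change was adopted. The two delegates are hypothetical stand-ins for `ObjectHeader.AcquireThinLock(obj, isOneShot: true)` and `GetLockObject(obj).TryEnter(...)`, so that the example runs without runtime internals.

```csharp
using System;

// Hypothetical sketch of the zero-timeout early return; delegate parameters replace runtime internals.
public static class TryEnterSketch
{
    public enum HeaderLockResult { Success, Failure, UseSlowPath }

    public static bool TryEnter(
        Func<HeaderLockResult> acquireThinLockOneShot, // stand-in for the one-shot thin-lock attempt
        Func<int, bool> slowPathTryEnter,              // stand-in for the inflated Lock's TryEnter
        int millisecondsTimeout)
    {
        if (millisecondsTimeout < -1)
            throw new ArgumentOutOfRangeException(nameof(millisecondsTimeout));

        HeaderLockResult result = acquireThinLockOneShot();
        if (result == HeaderLockResult.Success)
            return true;

        // A zero timeout is a one-shot request. If the thin lock is known to be
        // owned by another thread, report failure without touching the slow path.
        if (millisecondsTimeout == 0 && result == HeaderLockResult.Failure)
            return false;

        return slowPathTryEnter(millisecondsTimeout);
    }

    public static void Main()
    {
        bool slowPathUsed = false;
        bool taken = TryEnter(
            () => HeaderLockResult.Failure,
            timeout => { slowPathUsed = true; return false; },
            millisecondsTimeout: 0);

        Console.WriteLine($"taken={taken}, slowPathUsed={slowPathUsed}"); // taken=False, slowPathUsed=False
    }
}
```

`UseSlowPath` still falls through to the slow path even with a zero timeout, since in that case the thin-lock attempt did not establish that another thread owns the lock.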

VSadov · 2026-06-19T17:50:06Z

@MihuBot benchmark System.Threading -arm

VSadov · 2026-06-19T18:19:52Z

@MihuBot benchmark System.Collections.Concurrent -arm

MihuBot · 2026-06-19T18:43:12Z

System.Threading.Tests.Perf_Volatile

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
Write_double	Main	2.132 ns	0.0124 ns	1.00	-	NA
Write_double	PR	2.135 ns	0.0123 ns	1.00	-	NA

Read_double	Main	3.298 ns	0.0239 ns	1.00	-	NA
Read_double	PR	3.284 ns	0.0055 ns	1.00	-	NA

System.Threading.Tests.Perf_Timer

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
ShortScheduleAndDispose	Main	130.0 ns	1.52 ns	1.00	120 B	1.00
ShortScheduleAndDispose	PR	130.3 ns	1.61 ns	1.00	120 B	1.00

LongScheduleAndDispose	Main	129.9 ns	1.26 ns	1.00	120 B	1.00
LongScheduleAndDispose	PR	131.0 ns	1.04 ns	1.01	120 B	1.00

ScheduleManyThenDisposeMany	Main	244,645,956.9 ns	2,605,208.19 ns	1.00	144000000 B	1.00
ScheduleManyThenDisposeMany	PR	249,305,648.6 ns	3,958,714.20 ns	1.02	144000000 B	1.00

ShortScheduleAndDisposeWithFiringTimers	Main	152.3 ns	4.49 ns	1.00	144 B	1.00
ShortScheduleAndDisposeWithFiringTimers	PR	149.8 ns	3.75 ns	0.98	144 B	1.00

SynchronousContention	Main	5,404,397,834.9 ns	34,069,649.97 ns	1.00	1152000760 B	1.00
SynchronousContention	PR	5,571,240,872.0 ns	33,764,941.19 ns	1.03	1152000760 B	1.00

AsynchronousContention	Main	4,851,066,960.9 ns	92,741,774.32 ns	1.00	1344002232 B	1.00
AsynchronousContention	PR	4,873,755,580.8 ns	40,387,695.59 ns	1.00	1344002232 B	1.00

System.Threading.Tests.Perf_ThreadStatic

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
GetThreadStatic	Main	1.509 ns	0.0003 ns	1.00	-	NA
GetThreadStatic	PR	1.507 ns	0.0004 ns	1.00	-	NA

SetThreadStatic	Main	2.515 ns	0.0023 ns	1.00	-	NA
SetThreadStatic	PR	2.503 ns	0.0034 ns	1.00	-	NA

System.Threading.Tests.Perf_ThreadPool

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1  Gen0=9000.0000

Method	Toolchain	WorkItemsPerCore	Mean	Error	Ratio	Allocated	Alloc Ratio
QueueUserWorkItem_WaitCallback_Throughput	Main	20000000	9.043 s	0.0811 s	1.00	610.35 MB	1.00
QueueUserWorkItem_WaitCallback_Throughput	PR	20000000	9.258 s	0.1603 s	1.02	610.35 MB	1.00

System.Threading.Tests.Perf_Thread

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
CurrentThread	Main	1.503 ns	0.0002 ns	1.00	-	NA
CurrentThread	PR	1.502 ns	0.0003 ns	1.00	-	NA

GetCurrentProcessorId	Main	2.889 ns	0.0135 ns	1.00	-	NA
GetCurrentProcessorId	PR	2.867 ns	0.0098 ns	0.99	-	NA

System.Threading.Tests.Perf_SpinLock

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
EnterExit	Main	14.179 ns	0.0047 ns	1.00	-	NA
EnterExit	PR	15.610 ns	0.0049 ns	1.10	-	NA

TryEnterExit	Main	15.625 ns	0.0044 ns	1.00	-	NA
TryEnterExit	PR	14.189 ns	0.0073 ns	0.91	-	NA

TryEnter_Fail	Main	1.769 ns	0.0004 ns	1.00	-	NA
TryEnter_Fail	PR	1.770 ns	0.0003 ns	1.00	-	NA

System.Threading.Tests.Perf_SemaphoreSlim

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
ReleaseWait	Main	42.44 ns	0.016 ns	1.00	-	NA
ReleaseWait	PR	41.75 ns	0.340 ns	0.98	-	NA

ReleaseWaitAsync	Main	36.57 ns	0.862 ns	1.00	-	NA
ReleaseWaitAsync	PR	35.66 ns	0.697 ns	0.98	-	NA

ReleaseWaitAsync_WithCancellationToken	Main	21,264.63 ns	1,525.520 ns	1.00	583 B	1.00
ReleaseWaitAsync_WithCancellationToken	PR	22,065.87 ns	640.506 ns	1.05	583 B	1.00

ReleaseWaitAsync_WithTimeout	Main	21,788.43 ns	821.668 ns	1.00	679 B	1.00
ReleaseWaitAsync_WithTimeout	PR	22,014.71 ns	729.106 ns	1.01	679 B	1.00

ReleaseWaitAsync_WithCancellationTokenAndTimeout	Main	19,469.35 ns	1,524.439 ns	1.00	672 B	1.00
ReleaseWaitAsync_WithCancellationTokenAndTimeout	PR	21,466.58 ns	853.651 ns	1.11	678 B	1.01

System.Threading.Tests.Perf_Monitor

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
EnterExit	Main	15.04 ns	0.038 ns	1.00	-	NA
EnterExit	PR	16.67 ns	0.007 ns	1.11	-	NA

TryEnterExit	Main	17.25 ns	0.056 ns	1.00	-	NA
TryEnterExit	PR	16.78 ns	0.025 ns	0.97	-	NA

System.Threading.Tests.Perf_Lock

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1  StdDev=0.004 ns

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
ReaderWriterLockSlimPerf	Main	19.50 ns	0.004 ns	1.00	-	NA
ReaderWriterLockSlimPerf	PR	19.28 ns	0.004 ns	0.99	-	NA

System.Threading.Tests.Perf_Interlocked

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
Increment_int	Main	5.889 ns	0.0012 ns	1.00	-	NA
Increment_int	PR	5.884 ns	0.0012 ns	1.00	-	NA

Decrement_int	Main	5.885 ns	0.0011 ns	1.00	-	NA
Decrement_int	PR	5.903 ns	0.0016 ns	1.00	-	NA

Increment_long	Main	5.875 ns	0.0010 ns	1.00	-	NA
Increment_long	PR	5.876 ns	0.0022 ns	1.00	-	NA

Decrement_long	Main	5.875 ns	0.0011 ns	1.00	-	NA
Decrement_long	PR	5.874 ns	0.0015 ns	1.00	-	NA

Add_int	Main	5.884 ns	0.0013 ns	1.00	-	NA
Add_int	PR	5.885 ns	0.0017 ns	1.00	-	NA

Add_long	Main	5.876 ns	0.0011 ns	1.00	-	NA
Add_long	PR	5.875 ns	0.0009 ns	1.00	-	NA

Exchange_int	Main	5.829 ns	0.0013 ns	1.00	-	NA
Exchange_int	PR	5.868 ns	0.0014 ns	1.01	-	NA

Exchange_long	Main	5.836 ns	0.0019 ns	1.00	-	NA
Exchange_long	PR	5.835 ns	0.0010 ns	1.00	-	NA

CompareExchange_int	Main	5.922 ns	0.0010 ns	1.00	-	NA
CompareExchange_int	PR	5.926 ns	0.0013 ns	1.00	-	NA

CompareExchange_long	Main	5.927 ns	0.0009 ns	1.00	-	NA
CompareExchange_long	PR	5.921 ns	0.0008 ns	1.00	-	NA

CompareExchange_object_Match	Main	8.777 ns	0.0070 ns	1.00	-	NA
CompareExchange_object_Match	PR	8.757 ns	0.0024 ns	1.00	-	NA

CompareExchange_object_NoMatch	Main	7.916 ns	0.0058 ns	1.00	-	NA
CompareExchange_object_NoMatch	PR	8.846 ns	0.0018 ns	1.12	-	NA

System.Threading.Tests.Perf_EventWaitHandle

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
Set_Reset	Main	60.64 ns	0.027 ns	1.00	-	NA
Set_Reset	PR	61.39 ns	0.036 ns	1.01	-	NA

System.Threading.Tests.Perf_CancellationToken

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
RegisterAndUnregister_Serial	Main	23.597 ns	0.0082 ns	1.00	-	NA
RegisterAndUnregister_Serial	PR	23.387 ns	0.0083 ns	0.99	-	NA

Cancel	Main	82.476 ns	0.4866 ns	1.00	192 B	1.00
Cancel	PR	83.192 ns	0.4460 ns	1.01	192 B	1.00

CreateLinkedTokenSource1	Main	31.213 ns	0.6452 ns	1.00	64 B	1.00
CreateLinkedTokenSource1	PR	29.633 ns	0.1386 ns	0.95	64 B	1.00

CreateLinkedTokenSource2	Main	49.607 ns	0.0680 ns	1.00	80 B	1.00
CreateLinkedTokenSource2	PR	50.340 ns	0.2822 ns	1.01	80 B	1.00

CreateLinkedTokenSource3	Main	89.048 ns	0.2250 ns	1.00	128 B	1.00
CreateLinkedTokenSource3	PR	90.959 ns	0.3102 ns	1.02	128 B	1.00

CreateTokenDispose	Main	6.723 ns	0.0733 ns	1.00	48 B	1.00
CreateTokenDispose	PR	7.064 ns	0.1054 ns	1.05	48 B	1.00

CreateRegisterDispose	Main	53.404 ns	0.8601 ns	1.00	192 B	1.00
CreateRegisterDispose	PR	53.420 ns	0.3030 ns	1.00	192 B	1.00

CreateManyRegisterDispose	Main	23.430 ns	0.0164 ns	1.00	-	NA
CreateManyRegisterDispose	PR	23.416 ns	0.0249 ns	1.00	-	NA

CreateManyRegisterMultipleDispose	Main	127.455 ns	0.2441 ns	1.00	-	NA
CreateManyRegisterMultipleDispose	PR	125.995 ns	0.2425 ns	0.99	-	NA

CancelAfter	Main	93.419 ns	0.9790 ns	1.00	144 B	1.00
CancelAfter	PR	93.466 ns	1.0823 ns	1.00	144 B	1.00

System.Threading.Tasks.Tests.Perf_AsyncMethods

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
EmptyAsyncMethodInvocation	Main	5.542 ns	0.0074 ns	1.00	-	NA
EmptyAsyncMethodInvocation	PR	5.542 ns	0.0395 ns	1.00	-	NA

SingleYieldMethodInvocation	Main	181.643 ns	0.8766 ns	1.00	168 B	1.00
SingleYieldMethodInvocation	PR	172.875 ns	0.2976 ns	0.95	168 B	1.00

Yield	Main	79.109 ns	0.0667 ns	1.00	24 B	1.00
Yield	PR	76.516 ns	0.0672 ns	0.97	24 B	1.00

System.Threading.Tasks.ValueTaskPerfTest

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-UXTJFQ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-WCARJH : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-XVCCJK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HMQJNI : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MaxWarmupIterationCount=10  MinIterationCount=15
MinWarmupIterationCount=2  WarmupCount=-1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
Await_FromResult	Main	7.583 ns	0.0985 ns	1.00	-	NA
Await_FromResult	PR	7.528 ns	0.0245 ns	0.99	-	NA

Await_FromCompletedTask	Main	13.598 ns	0.1548 ns	1.00	72 B	1.00
Await_FromCompletedTask	PR	14.026 ns	0.1929 ns	1.03	72 B	1.00

Await_FromCompletedValueTaskSource	Main	25.403 ns	0.4806 ns	1.00	72 B	1.00
Await_FromCompletedValueTaskSource	PR	26.530 ns	0.8105 ns	1.04	72 B	1.00

CreateAndAwait_FromResult	Main	7.554 ns	0.0129 ns	1.00	-	NA
CreateAndAwait_FromResult	PR	7.566 ns	0.0097 ns	1.00	-	NA

CreateAndAwait_FromResult_ConfigureAwait	Main	9.240 ns	0.1781 ns	1.00	-	NA
CreateAndAwait_FromResult_ConfigureAwait	PR	7.495 ns	0.0418 ns	0.81	-	NA

CreateAndAwait_FromCompletedTask	Main	9.429 ns	0.0373 ns	1.00	-	NA
CreateAndAwait_FromCompletedTask	PR	9.535 ns	0.0658 ns	1.01	-	NA

CreateAndAwait_FromCompletedTask_ConfigureAwait	Main	9.361 ns	0.0832 ns	1.00	-	NA
CreateAndAwait_FromCompletedTask_ConfigureAwait	PR	9.312 ns	0.0225 ns	0.99	-	NA

CreateAndAwait_FromCompletedValueTaskSource	Main	10.362 ns	0.0258 ns	1.00	-	NA
CreateAndAwait_FromCompletedValueTaskSource	PR	10.630 ns	0.0887 ns	1.03	-	NA

CreateAndAwait_FromYieldingAsyncMethod	Main	290.620 ns	0.8332 ns	1.00	392 B	1.00
CreateAndAwait_FromYieldingAsyncMethod	PR	293.741 ns	1.1317 ns	1.01	392 B	1.00

CreateAndAwait_FromDelayedTCS	Main	21,915.089 ns	464.6644 ns	1.00	519 B	1.00
CreateAndAwait_FromDelayedTCS	PR	21,953.612 ns	480.2800 ns	1.00	518 B	1.00

Copy_PassAsArgumentAndReturn_FromResult	Main	4.835 ns	0.0076 ns	1.00	-	NA
Copy_PassAsArgumentAndReturn_FromResult	PR	4.843 ns	0.0007 ns	1.00	-	NA

Copy_PassAsArgumentAndReturn_FromTask	Main	8.296 ns	0.0055 ns	1.00	-	NA
Copy_PassAsArgumentAndReturn_FromTask	PR	8.311 ns	0.0122 ns	1.00	-	NA

Copy_PassAsArgumentAndReturn_FromValueTaskSource	Main	13.139 ns	0.0048 ns	1.00	-	NA
Copy_PassAsArgumentAndReturn_FromValueTaskSource	PR	13.156 ns	0.1064 ns	1.00	-	NA

CreateAndAwait_FromCompletedValueTaskSource_ConfigureAwait	Main	10.558 ns	0.1272 ns	1.00	-	NA
CreateAndAwait_FromCompletedValueTaskSource_ConfigureAwait	PR	10.565 ns	0.1278 ns	1.00	-	NA

System.Threading.Channels.Tests.UnboundedChannelPerfTests

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
TryWriteThenTryRead	Main	37.79 ns	0.007 ns	1.00	-	NA
TryWriteThenTryRead	PR	40.39 ns	0.028 ns	1.07	-	NA

WriteAsyncThenReadAsync	Main	50.86 ns	0.067 ns	1.00	-	NA
WriteAsyncThenReadAsync	PR	45.99 ns	0.079 ns	0.90	-	NA

ReadAsyncThenWriteAsync	Main	77.99 ns	0.490 ns	1.00	-	NA
ReadAsyncThenWriteAsync	PR	75.48 ns	0.334 ns	0.97	-	NA

PingPong	Main	5,514,871.03 ns	106,290.832 ns	1.00	1087 B	1.00
PingPong	PR	5,255,181.47 ns	73,386.517 ns	0.95	1087 B	1.00

System.Threading.Channels.Tests.SpscUnboundedChannelPerfTests

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
TryWriteThenTryRead	Main	30.80 ns	0.032 ns	1.00	-	NA
TryWriteThenTryRead	PR	27.20 ns	0.012 ns	0.88	-	NA

WriteAsyncThenReadAsync	Main	43.33 ns	0.598 ns	1.00	-	NA
WriteAsyncThenReadAsync	PR	38.68 ns	0.122 ns	0.89	-	NA

ReadAsyncThenWriteAsync	Main	74.22 ns	0.290 ns	1.00	-	NA
ReadAsyncThenWriteAsync	PR	73.65 ns	1.433 ns	0.99	-	NA

PingPong	Main	5,692,114.84 ns	79,885.611 ns	1.00	1087 B	1.00
PingPong	PR	5,464,477.26 ns	116,552.029 ns	0.96	1087 B	1.00

System.Threading.Channels.Tests.BoundedChannelPerfTests

BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1

Method	Toolchain	Mean	Error	Ratio	Allocated	Alloc Ratio
TryWriteThenTryRead	Main	52.18 ns	0.061 ns	1.00	-	NA
TryWriteThenTryRead	PR	48.51 ns	0.693 ns	0.93	-	NA

WriteAsyncThenReadAsync	Main	60.95 ns	0.328 ns	1.00	-	NA
WriteAsyncThenReadAsync	PR	60.13 ns	0.562 ns	0.99	-	NA

ReadAsyncThenWriteAsync	Main	73.42 ns	0.340 ns	1.00	-	NA
ReadAsyncThenWriteAsync	PR	70.95 ns	0.137 ns	0.97	-	NA

PingPong	Main	5,425,824.17 ns	102,591.823 ns	1.00	1091 B	1.00
PingPong	PR	5,359,848.63 ns	105,279.337 ns	0.99	1087 B	1.00

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

jkotas · 2026-06-27T04:00:34Z

reconcile the last change with NativeAOT

Move the reconciled code to https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/Monitor.cs ? (It is ok to put it under #if !MONO).

VSadov · 2026-06-27T19:45:08Z

> reconcile the last change with NativeAOT

> Move the reconciled code to https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/Monitor.cs ? (It is ok to put it under #if !MONO).

I think all other parts of Monitor.CoreCLR.cs and Monitor.NativeAot.cs can also be unified, except for GetLockObject, which can go to the corresponding ObjectHeader.Xxx.cs.
After that, the CoreCLR- and NativeAOT-specific Monitor files can be deleted.

Copilot

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.


            Enter(obj);
            lockTaken = true;
        }


@@ -72,12 +91,47 @@
            lockTaken = TryEnter(obj, millisecondsTimeout);
        }


Copilot

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.

VSadov · 2026-06-27T22:59:14Z

Benchmark results on x64 (AMD Ryzen 9 7950X 16-Core, 32 logical)

Higher throughput score is better.

=== Baseline:

MonitorEnterExitThroughput_ThinLock
Score: 201576.726397
Score: 202436.333271
Score: 203180.694334
Score: 204067.363702
MonitorEnterExitThroughput_FatLock
Score: 39479.625025
Score: 38443.514581
Score: 38061.173308
Score: 38238.165156
MonitorReliableEnterExitThroughput_ThinLock
Score: 188490.232410
Score: 191453.478039
Score: 194553.775943
Score: 194445.956914
MonitorReliableEnterExitThroughput_FatLock
Score: 42856.454652
Score: 43312.340304
Score: 48272.270894
Score: 45227.479887
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 198000.253556
Score: 198732.319686
Score: 194977.181678
Score: 198214.666421
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 50053.625368
Score: 50594.531324
Score: 51643.817642
Score: 54752.335148
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 187.621943
Score: 187.446513
Score: 186.867217
Score: 185.050039
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 122036.117604
Score: 121911.685806
Score: 121771.070935
Score: 121101.464243
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 15091.656060
Score: 15534.213503
Score: 14993.479097
Score: 19736.586282

=== The PR:

MonitorEnterExitThroughput_ThinLock
Score: 154464.095607
Score: 217711.380406
Score: 218026.091756
Score: 217935.895951
MonitorEnterExitThroughput_FatLock
Score: 79986.064468
Score: 82071.964781
Score: 82068.629759
Score: 82160.283568
MonitorReliableEnterExitThroughput_ThinLock
Score: 209866.727373
Score: 209585.926338
Score: 209941.740823
Score: 209841.104732
MonitorReliableEnterExitThroughput_FatLock
Score: 79885.028700
Score: 79803.379456
Score: 80225.828491
Score: 79852.378934
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 189936.575859
Score: 206003.327189
Score: 206216.118004
Score: 121574.662526
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 74653.816314
Score: 70082.300140
Score: 74374.349556
Score: 73745.227369
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 164699.232703
Score: 173589.528792
Score: 171988.211683
Score: 176175.936385
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 117888.342466
Score: 118657.910006
Score: 119202.648298
Score: 119379.729877
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 15060.329220
Score: 14092.070436
Score: 14038.439235
Score: 14360.385036

VSadov · 2026-06-27T23:01:10Z

Benchmark results on x64 (AMD EPYC 7763, 32core VM)

Higher throughput score is better.

=== Baseline:

MonitorEnterExitThroughput_ThinLock
Score: 112428.900747
Score: 111783.752488
Score: 113419.808611
Score: 113442.662281
MonitorEnterExitThroughput_FatLock
Score: 46157.123683
Score: 48000.371799
Score: 48099.023496
Score: 47988.831344
MonitorReliableEnterExitThroughput_ThinLock
Score: 106102.947661
Score: 106286.004254
Score: 106221.572736
Score: 106272.136345
MonitorReliableEnterExitThroughput_FatLock
Score: 47304.596231
Score: 47163.620709
Score: 47399.212754
Score: 47329.591684
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 113513.919261
Score: 113225.298418
Score: 113564.640819
Score: 112520.718262
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 42566.084846
Score: 42940.802887
Score: 43032.479556
Score: 42996.606459
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 173.389527
Score: 173.548475
Score: 172.357821
Score: 171.551225
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 66307.880609
Score: 67656.424536
Score: 67674.791044
Score: 67664.123052
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 8149.181411
Score: 8233.202662
Score: 8035.122987
Score: 7812.506877

=== The PR:

MonitorEnterExitThroughput_ThinLock
Score: 96649.044730
Score: 121305.114272
Score: 121486.719146
Score: 121451.534836
MonitorEnterExitThroughput_FatLock
Score: 46248.808055
Score: 47248.444368
Score: 47263.382549
Score: 47251.438058
MonitorReliableEnterExitThroughput_ThinLock
Score: 123780.517946
Score: 123860.514063
Score: 123708.817702
Score: 123277.294370
MonitorReliableEnterExitThroughput_FatLock
Score: 46393.301474
Score: 46339.351840
Score: 46410.241802
Score: 46388.893502
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 107954.966750
Score: 119534.704969
Score: 119669.035695
Score: 119581.321136
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 45128.384040
Score: 45677.127578
Score: 45672.856415
Score: 45685.326649
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 98841.406193
Score: 106145.134669
Score: 106016.043160
Score: 104742.230834
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 70098.952991
Score: 70709.638975
Score: 70713.785228
Score: 70706.522298
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 9018.821144
Score: 9383.411536
Score: 9614.167826
Score: 9359.461361

VSadov · 2026-06-27T23:01:34Z

Benchmark results on ARM64 (Ampere Altra, 32core VM)

Higher throughput score is better.

=== Baseline:

MonitorEnterExitThroughput_ThinLock
Score: 56660.840612
Score: 56774.778486
Score: 56776.290530
Score: 56773.270009
MonitorEnterExitThroughput_FatLock
Score: 33092.046995
Score: 33860.236575
Score: 33629.040063
Score: 33671.006944
MonitorReliableEnterExitThroughput_ThinLock
Score: 50486.523951
Score: 50489.640400
Score: 50498.821481
Score: 50495.320848
MonitorReliableEnterExitThroughput_FatLock
Score: 33005.903001
Score: 33006.283203
Score: 33024.192097
Score: 33009.303281
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 56415.386677
Score: 56388.496396
Score: 56422.446890
Score: 56382.934901
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 32905.674919
Score: 33031.659251
Score: 33084.943202
Score: 33067.449134
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 259.025333
Score: 259.219450
Score: 259.235537
Score: 259.250594
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 71438.197628
Score: 73018.990510
Score: 73017.067410
Score: 73017.266668
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 12254.268761
Score: 15645.607837
Score: 14481.160326
Score: 12710.443639

=== The PR:

MonitorEnterExitThroughput_ThinLock
Score: 49461.468399
Score: 54036.115980
Score: 54025.064366
Score: 53988.524878
MonitorEnterExitThroughput_FatLock
Score: 41047.923291
Score: 41925.404542
Score: 41902.637321
Score: 41904.975306
MonitorReliableEnterExitThroughput_ThinLock
Score: 50360.361549
Score: 50340.437096
Score: 50364.993324
Score: 50385.710838
MonitorReliableEnterExitThroughput_FatLock
Score: 40316.685411
Score: 40270.641518
Score: 40252.428043
Score: 40308.981007
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 53033.949830
Score: 53847.974752
Score: 53886.469440
Score: 53849.428887
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 40812.373842
Score: 41376.893741
Score: 41364.518289
Score: 41342.095638
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 94713.681424
Score: 110839.692580
Score: 110892.094065
Score: 110858.858214
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 80283.533073
Score: 81486.092133
Score: 81497.282190
Score: 81493.168143
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 11633.823851
Score: 12972.612262
Score: 11602.659564
Score: 11747.000660

VSadov · 2026-06-27T23:21:46Z

The impact of the accidental inlining was fairly noticeable. Undoing it reverted a good portion of the gains.
I have also made some changes that introduce more sharing and help x64 a bit by reducing temp usage in native code.

The overall effect of this PR is a lot more sharing between CoreCLR and NativeAOT.
The performance impact is:

mostly improvements, up to +30% in throughput, depending on the platform and on the thin/fat lock scenario.
a couple of scenarios see minor regressions, up to -3%.
These scenarios appear to be quite sensitive to minor changes in codegen or hardware differences
(i.e. a minor improvement vs. a regression depending on AMD EPYC 7763 vs. AMD Ryzen 9 7950X), so I think we can treat small differences as a kind of "architectural noise".

The baseline also benefited from partial inlining of the entry points, since the entry points on CoreCLR were not NoInline.
There was also some benefit from better codegen in C++.
Some of that is by design: since an FCall does not need to pin, it does not need to save anything to the stack. But some differences in the native code I cannot explain. It is possible that the presence of pinned locals forces the JIT into some less common paths and results in unnecessary temps, which leads to longer code, pushing/popping on x64, and so on.

I do not think getting codegen on par with C++ here is a huge concern though. Pinning is a relatively rare case for the JIT, and the overall impact on real apps may be uninteresting.
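
For readers who want a feel for what the MonitorTryEnterWhenLockedThroughput scenario in the results above exercises, here is a minimal sketch. It is not the benchmark that produced the numbers in this thread: the class name, the iteration count, and the reporting format are all assumptions made for illustration, and whether the lock stays thin during the run is a runtime implementation detail that this code does not control.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch: repeatedly call Monitor.TryEnter on a lock that is
// held by another thread, and report how many attempts complete per millisecond.
class TryEnterWhenLockedSketch
{
    static void Main()
    {
        object gate = new object();
        using var held = new ManualResetEventSlim(false);
        using var release = new ManualResetEventSlim(false);

        // The owner thread takes the lock and keeps it until told to release.
        var owner = new Thread(() =>
        {
            lock (gate)
            {
                held.Set();
                release.Wait();
            }
        });
        owner.Start();
        held.Wait();

        const int iterations = 1_000_000; // assumed value, not from the PR
        int acquired = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // No timeout: the call should fail promptly instead of spinning,
            // because the owner thread holds the lock for the whole loop.
            if (Monitor.TryEnter(gate))
            {
                acquired++;
                Monitor.Exit(gate);
            }
        }
        sw.Stop();

        release.Set();
        owner.Join();

        Console.WriteLine($"acquired: {acquired}");
        Console.WriteLine($"{iterations / sw.Elapsed.TotalMilliseconds:F1} attempts/ms");
    }
}
```

The `acquired` counter stays at 0 because the owner never releases the lock while the loop runs; only the attempts-per-millisecond figure varies between runtimes and machines.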

Copilot AI review requested due to automatic review settings June 17, 2026 05:40

VSadov added the area-System.Threading label Jun 17, 2026

dotnet-policy-service Bot assigned VSadov Jun 17, 2026

Copilot started reviewing on behalf of VSadov June 17, 2026 05:41

Copilot AI reviewed Jun 17, 2026

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated

This was referenced Jun 17, 2026

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

jkotas reviewed Jun 17, 2026

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs Outdated

Copilot AI review requested due to automatic review settings June 17, 2026 15:22

Copilot started reviewing on behalf of VSadov June 17, 2026 15:23

Copilot AI reviewed Jun 17, 2026

Copilot AI review requested due to automatic review settings June 18, 2026 02:01

Copilot started reviewing on behalf of VSadov June 18, 2026 02:01

Copilot AI reviewed Jun 18, 2026

MihuBot mentioned this pull request Jun 18, 2026

[Benchmark ARM64] [VSadov] Move thin lock acquire/release in CoreCLR to mana ... MihuBot/runtime-utils#1998

Open

MihuBot mentioned this pull request Jun 18, 2026

[Benchmark ARM64] [VSadov] Move thin lock acquire/release in CoreCLR to mana ... MihuBot/runtime-utils#1999

Open

Copilot AI review requested due to automatic review settings June 19, 2026 16:50

VSadov force-pushed the manThin branch from cab63d4 to 276e18b June 19, 2026 16:51

Copilot started reviewing on behalf of VSadov June 19, 2026 16:51

Copilot AI reviewed Jun 19, 2026

MihuBot mentioned this pull request Jun 19, 2026

[Benchmark ARM64] [VSadov] Move thin lock acquire/release in CoreCLR to mana ... MihuBot/runtime-utils#2010

Open

MihuBot mentioned this pull request Jun 19, 2026

[Benchmark ARM64] [VSadov] Move thin lock acquire/release in CoreCLR to mana ... MihuBot/runtime-utils#2011

Open

MihuBot mentioned this pull request Jun 19, 2026

[Benchmark ARM64] [VSadov] Move thin lock acquire/release in CoreCLR to mana ... MihuBot/runtime-utils#2012

Open

Copilot AI review requested due to automatic review settings June 26, 2026 22:15

Copilot started reviewing on behalf of VSadov June 26, 2026 22:15

Copilot AI reviewed Jun 26, 2026

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated

reconcile the last change with NativeAOT

fd42487

VSadov added 2 commits June 27, 2026 11:56

more Core/Aot unification

6048f2d

reorder Enter/TryEnter/Exit

5ea979e

Delete specialized Monitor files

ced7660

Copilot AI review requested due to automatic review settings June 27, 2026 19:47

Copilot started reviewing on behalf of VSadov June 27, 2026 19:48

Copilot AI reviewed Jun 27, 2026

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated

Potential fix for pull request finding

ab5d4c7

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 27, 2026 20:31

Copilot started reviewing on behalf of VSadov June 27, 2026 20:32

comments use TryAcquireUncommon to match the method's name

57fbd1d

Copilot AI reviewed Jun 27, 2026

Copilot AI review requested due to automatic review settings June 27, 2026 20:41

Copilot started reviewing on behalf of VSadov June 27, 2026 20:41

NoInlining on synchronized method helpers - just in case and for consistency

0540700

Copilot AI reviewed Jun 27, 2026

Copilot AI review requested due to automatic review settings June 27, 2026 20:49

Copilot started reviewing on behalf of VSadov June 27, 2026 20:49

Copilot AI reviewed Jun 27, 2026

This was referenced Jun 28, 2026

Unable to pull image from mcr.microsoft.com #117164

Open

TestNativeDigits fails for ur-IN on Apple platforms (xunit v3 exposed previously non-running test) #125933

Open
