[VSD] DispatchCache self-loop: unlocked UnlinkEntry during collectible ALC unload races Insert → process hang #128859

## Summary

A macOS x64 hang dump from CI (runtime `11.0.0-preview.5.26272.112`, macOS 15.2) shows a
process hang caused by a **cycle in a Virtual Stub Dispatch (VSD) `DispatchCache` bucket chain**.
One thread spins forever walking the cycle while holding `m_writeLock`; every other VSD resolve
piles up behind it, hanging the whole process.

## Evidence from the dump

- 32 threads. **16** are blocked identically:
  `__psynch_mutexwait → CrstBase::Enter → DispatchCache::Insert →
   VirtualCallStubManager::ResolveWorker → VSD_ResolveWorker`
- Thread 25 (tid 0x18) is **running**, PC at `DispatchCache::Insert+192`, inside the inlined
  `Lookup` collision-chain walk (`movq 0x18(%rax),%rax` = follow `pNext`, loop if no match
  and not the sentinel).
- Walking that bucket in memory, `ResolveCacheElem @ 0x10c887480` has **`pNext` (+0x18)
  pointing at itself**:
  - `pMT=0x104ce34d0  token=0x200000001  target=0x592ff3a80  pNext=0x10c887480` (self)
- So thread 25 walks `… → 0x10c887480 → 0x10c887480 → …` forever while holding `m_writeLock`;
  the other 16 threads block indefinitely in `CrstBase::Enter()`. The hung host then trips Helix's
  15-min inactivity timer → `createdump` → "Test host process crashed."
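The cycle above was found by following `pNext` by hand. The same check can be done mechanically, for example from a debugger script or a diagnostic build. Below is a minimal C++ sketch, assuming a simplified stand-in for `ResolveCacheElem` (`Elem` and `FindCycle` are hypothetical names; the field order follows the dump listing so that `pNext` sits at +0x18 on x64). It uses Floyd's tortoise-and-hare algorithm, which needs no extra memory and always terminates, unlike the unbounded walk thread 25 is stuck in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ResolveCacheElem. Field order follows the dump
// listing (pMT, token, target, pNext), which puts pNext at +0x18 on x64.
struct Elem {
    const void* pMT;
    std::uint64_t token;
    const void* target;
    Elem* pNext;
};

// Returns the element where the chain starting at `head` enters a cycle,
// or nullptr if the walk reaches `sentinel` (the chain's terminator).
// Floyd's tortoise-and-hare: `fast` moves two links per step, `slow` one;
// they can only meet if the chain loops.
Elem* FindCycle(Elem* head, Elem* sentinel) {
    Elem* slow = head;
    Elem* fast = head;
    while (fast != sentinel && fast->pNext != sentinel) {
        slow = slow->pNext;
        fast = fast->pNext->pNext;
        if (slow == fast) {
            // Restart one pointer from the head; moving both one link at a
            // time, they meet at the first element of the cycle.
            slow = head;
            while (slow != fast) {
                slow = slow->pNext;
                fast = fast->pNext;
            }
            return slow;
        }
    }
    return nullptr;
}
```

For a chain shaped like the one in the dump (an element whose `pNext` points at itself), `FindCycle` returns that element; for a healthy chain it returns `nullptr`.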

## Root cause

The cache enforces a single-writer invariant — `DispatchCache::SetCacheEntry` asserts
`m_writeLock.OwnedByCurrentThread()` (CHAIN_LOOKUP). The normal writers honor it:

- `DispatchCache::Insert` — takes `m_writeLock` ✔
- `DispatchCache::PromoteChainEntry` — takes `m_writeLock` ✔

But `VirtualCallStubManager::~VirtualCallStubManager()` purges this manager's entries from the
**shared global** `g_resolveCache` via `DispatchCache::Iterator::UnlinkEntry()` **without taking
`m_writeLock`** (it rewrites chain pointers directly, bypassing `SetCacheEntry` and its assert).
When a **collectible** LoaderAllocator (assembly/ALC) is unloaded while other live threads
concurrently `Insert`/`PromoteChainEntry` into the same buckets, the two writers interleave and
can splice `elem->pNext = elem` (or a larger cycle).
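To make the race concrete, here is one interleaving that yields exactly the `pNext == self` shape seen in the dump, replayed deterministically in a single thread. This is a hypothetical, heavily simplified model (`Elem`, `Bucket`, and `ReplayRace` are illustrative names, not coreclr code): a bucket chain `head → dying → live → empty`, where `dying` belongs to the manager being torn down and `live` is being moved to the head under the write lock, as `PromoteChainEntry` does. The order of the promote writes is an assumption and the real code may differ in detail; this is not claimed to be the interleaving behind this particular dump.

```cpp
#include <cassert>

// Hypothetical, simplified model of one DispatchCache bucket: a singly
// linked collision chain ending in a shared sentinel, like the cache's
// `empty` element. Illustrative only; not coreclr code.
struct Elem {
    int id;
    Elem* pNext;
};

struct Bucket {
    Elem empty{-1, nullptr};  // sentinel that terminates the chain
    Elem* head = &empty;
};

// Replays, step by step, a locked move-to-front of `live` interleaved with
// an unlocked unlink of the head element `dying`. Returns the final head.
Elem* ReplayRace(Bucket& b, Elem& dying, Elem& live) {
    // Initial chain: head -> dying -> live -> empty
    live.pNext = &b.empty;
    dying.pNext = &live;
    b.head = &dying;

    // [Promoter, holds the write lock] walk the chain; find `live`
    // with prev == dying.
    Elem* prev = b.head;
    Elem* cur = prev->pNext;
    assert(cur == &live);

    // [Unlinker, no lock] remove the head element, modeled on an iterator
    // unlink: head = dying->pNext; dying->pNext = empty.
    b.head = dying.pNext;      // head now points at `live`
    dying.pNext = &b.empty;

    // [Promoter] finish the move-to-front using its now-stale walk.
    prev->pNext = cur->pNext;  // writes into the already-unlinked `dying`
    cur->pNext = b.head;       // b.head == cur, so this is cur->pNext = cur
    b.head = cur;
    return b.head;
}
```

The promoter's walk is correct when it runs; the corruption comes from the unlinker rewriting `head` between that walk and the final writes, which is precisely the window that holding `m_writeLock` across the purge closes.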

This is a collectible-unload-only path (non-collectible managers only run it at process exit),
which matches the failure profile: a different random assembly hangs each run, only under heavy
ALC churn.

> Note: the racy code is old (≈2015) and we did **not** find a recent change to this file. The
> recent spike most likely comes from increased collectible-ALC unload activity rather than a
> change here; a repro would confirm.

## Proposed fix

Restore the single-writer invariant by taking the cache write lock around the unlink loops.
`m_writeLock` is `CrstStubDispatchCache` (level 0 / leaf, `CRST_UNSAFE_ANYMODE`), so it's
compatible with the destructor's `GC_NOTRIGGER`/`NOTHROW` contract and adds no lock-ordering risk.

```diff
diff --git a/src/coreclr/vm/virtualcallstub.h b/src/coreclr/vm/virtualcallstub.h
--- a/src/coreclr/vm/virtualcallstub.h
+++ b/src/coreclr/vm/virtualcallstub.h
@@ class DispatchCache
     // ... existing public members ...
+
+    // The cache enforces a single-writer invariant: callers that mutate a
+    // bucket chain (Insert / PromoteChainEntry / Iterator::UnlinkEntry) must
+    // hold this lock. Exposed so the collectible-unload purge in
+    // ~VirtualCallStubManager can serialize against concurrent inserts.
+    Crst *GetWriteLock() { LIMITED_METHOD_CONTRACT; return &m_writeLock; }

 private:
     Crst m_writeLock;

diff --git a/src/coreclr/vm/virtualcallstub.cpp b/src/coreclr/vm/virtualcallstub.cpp
--- a/src/coreclr/vm/virtualcallstub.cpp
+++ b/src/coreclr/vm/virtualcallstub.cpp
@@ VirtualCallStubManager::~VirtualCallStubManager()
 #ifdef FEATURE_VIRTUAL_STUB_DISPATCH
     // Go through each cache entry and if the cache element there is in
     // the cache entry heap of the manager being deleted, then we just
     // set the cache entry to empty.
+    // Serialize against concurrent Insert/PromoteChainEntry on the shared
+    // global cache; otherwise unlinking can race a concurrent insert and
+    // splice a chain into a self-referential cycle (process-wide VSD hang).
+    CrstHolder lh(g_resolveCache->GetWriteLock());
     DispatchCache::Iterator it(g_resolveCache);
     while (it.IsValid())
     {
         while (it.IsValid() && cache_entry_rangeList.IsInRange((TADDR)it.Entry()))
         {
             it.UnlinkEntry();
         }
         it.Next();
     }
 #endif // FEATURE_VIRTUAL_STUB_DISPATCH

@@ (stats-reset / manager-delete purge loop, ~line 420)
 #ifdef FEATURE_VIRTUAL_STUB_DISPATCH
     ...
+    CrstHolder lh(g_resolveCache->GetWriteLock());
     DispatchCache::Iterator it(g_resolveCache);
     while (it.IsValid())
     {
         it.UnlinkEntry();
     }
 #endif // FEATURE_VIRTUAL_STUB_DISPATCH
```
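The locking discipline the diff restores can be modeled in portable C++. This is a sketch under stated assumptions: `std::mutex` stands in for `Crst`, `std::lock_guard` for `CrstHolder`, and `OwnedLock`, `Cache`, `PurgeOwner`, and `StressIsAcyclic` are hypothetical names. The head setter asserts lock ownership, loosely mirroring the `OwnedByCurrentThread()` assert in `SetCacheEntry`, and the unloader holds the lock for the whole purge, as the `CrstHolder` added before the `Iterator` loop does.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model, not coreclr code.
struct Elem {
    int owner;    // which (hypothetical) manager the entry belongs to
    Elem* pNext;
};

// Mutex that records its owner so writers can assert they hold it,
// in the spirit of Crst::OwnedByCurrentThread().
class OwnedLock {
public:
    void lock() { m_.lock(); owner_.store(std::this_thread::get_id()); }
    void unlock() { owner_.store(std::thread::id()); m_.unlock(); }
    bool OwnedByCurrentThread() const { return owner_.load() == std::this_thread::get_id(); }
private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

class Cache {
public:
    OwnedLock& GetWriteLock() { return lock_; }

    // Insert at the head of the chain under the write lock.
    void Insert(Elem* e) {
        std::lock_guard<OwnedLock> lh(lock_);
        e->pNext = head_;
        SetHead(e);
    }

    // Unlink every entry belonging to `owner`; the caller must hold the lock.
    void PurgeOwner(int owner) {
        assert(lock_.OwnedByCurrentThread());
        Elem** pp = &head_;
        while (*pp != &empty_) {
            if ((*pp)->owner == owner) {
                Elem* dead = *pp;
                *pp = dead->pNext;  // splice the entry out
                dead->pNext = &empty_;
            } else {
                pp = &(*pp)->pNext;
            }
        }
    }

    // Chain length, or -1 if the walk exceeds `limit` (only a cycle can
    // cause that in this model).
    int Length(int limit) {
        std::lock_guard<OwnedLock> lh(lock_);
        int n = 0;
        for (Elem* p = head_; p != &empty_; p = p->pNext) {
            if (++n > limit) return -1;
        }
        return n;
    }

private:
    void SetHead(Elem* e) {
        assert(lock_.OwnedByCurrentThread());
        head_ = e;
    }

    OwnedLock lock_;
    Elem empty_{-1, nullptr};
    Elem* head_ = &empty_;
};

// One thread inserts live entries while another repeatedly inserts and then
// purges "collectible" entries, holding the lock for the whole purge.
bool StressIsAcyclic(int n) {
    Cache c;
    std::vector<Elem> live(n), dying(n);
    std::thread inserter([&] {
        for (int i = 0; i < n; ++i) c.Insert(&live[i]);
    });
    std::thread unloader([&] {
        for (int i = 0; i < n; ++i) {
            dying[i].owner = 1;
            c.Insert(&dying[i]);
            std::lock_guard<OwnedLock> lh(c.GetWriteLock());
            c.PurgeOwner(1);
        }
    });
    inserter.join();
    unloader.join();
    return c.Length(3 * n) == n;  // all live entries kept, all dying ones gone
}
```

Unlike the unlocked `UnlinkEntry` path described in the root cause, `PurgeOwner` in this model asserts ownership itself, so a caller that forgot the lock would trip the check in debug builds rather than silently corrupting the chain.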

Optional defense-in-depth (not in the diff): bound the `Lookup`/`Insert` chain walk so any
future cycle degrades gracefully instead of hanging the host.
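The optional bounded walk could look roughly like the following sketch. `BoundedLookup`, `LookupResult`, and the cap `kMaxChainWalk = 64` are assumptions for illustration, not existing coreclr names or constants. The walk counts hops and reports a corrupt chain once the cap is exceeded, so the caller can fall back to the slow resolve path (or fail fast with a diagnostic) instead of spinning while holding `m_writeLock`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ResolveCacheElem (illustrative only).
struct Elem {
    const void* pMT;
    std::uint64_t token;
    const void* target;
    Elem* pNext;
};

enum class LookupResult { Hit, Miss, ChainCorrupt };

// Assumed cap for illustration; a real value would be derived from the
// cache's sizing and must exceed the longest legitimate chain.
constexpr std::size_t kMaxChainWalk = 64;

// Walks the collision chain from `head` to `sentinel`, giving up after
// kMaxChainWalk links instead of looping forever on a corrupted chain.
LookupResult BoundedLookup(Elem* head, Elem* sentinel, const void* pMT,
                           std::uint64_t token, const void** target) {
    std::size_t steps = 0;
    for (Elem* e = head; e != sentinel; e = e->pNext) {
        if (++steps > kMaxChainWalk) {
            return LookupResult::ChainCorrupt;  // caller takes the slow path
        }
        if (e->pMT == pMT && e->token == token) {
            *target = e->target;
            return LookupResult::Hit;
        }
    }
    return LookupResult::Miss;
}
```

A step counter adds only an increment and a compare per hop, which suits a hot path; it bounds the damage but does not identify the cycle, so it complements the lock fix rather than replacing it.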

## Open questions for the runtime team

1. Confirm the collectible-unload teardown runs concurrently with other managed threads (the
   evidence says yes; if it were stop-the-world, the lock would be a no-op and the cause would lie elsewhere).
2. Confirm no lock held across `~VirtualCallStubManager` would invert ordering with the leaf lock.
3. Decide whether the bounded-walk hardening is worth adding alongside the lock fix.


### Known Issue Error Message

**DO NOT USE JSON BELOW IF THIS IS A BUILD BREAK** otherwise build analysis will allow pull requests to merge that break the build worse. For a build break, do not use this issue form. Make a regular new issue.

Fill the error message using [step by step known issues guidance](https://github.com/dotnet/arcade/blob/main/Documentation/Projects/Build%20Analysis/KnownIssueJsonStepByStep.md).



```json
{
  "ErrorMessage": "",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
```




### Report
#### Summary
|24-Hour Hit Count|7-Day Hit Count|1-Month Count|
|---|---|---|
|0|0|0|
