[Wasm RyuJIT] Spill live ref/byref values to pinned stack slots at calls#129059
[Wasm RyuJIT] Spill live ref/byref values to pinned stack slots at calls#129059kg wants to merge 24 commits into
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR introduces a Wasm-specific mechanism intended to make ref/byref values GC-visible across call sites by injecting a new IR node (GT_WASM_SPILL_REF) and a new Wasm phase (WasmSpillRefs) that inserts these nodes and allocates pinned stack spill slots used during Wasm codegen.
Changes:
- Add
GT_WASM_SPILL_REFnode kind and operand iteration support. - Add
Compiler::WasmSpillRefsphase to insert spill nodes around calls and allocate spill locals. - Extend Wasm codegen/regalloc to track a spill index and (temporarily) force-enregister a scratch “splash zone” local.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/regallocwasm.cpp | Forces the “splash zone” local to be treated as a reg candidate. |
| src/coreclr/jit/gtlist.h | Adds new Wasm node WASM_SPILL_REF. |
| src/coreclr/jit/gentree.cpp | Updates operand-edge iterator to treat GT_WASM_SPILL_REF as unary. |
| src/coreclr/jit/fgwasm.cpp | Implements Compiler::WasmSpillRefs to insert spill nodes and allocate spill locals. |
| src/coreclr/jit/compphases.h | Adds PHASE_WASM_SPILL_REFS. |
| src/coreclr/jit/compmemkind.h | Adds WasmSpillRefs memory kind. |
| src/coreclr/jit/compiler.h | Adds m_wasmSpillSlots field and WasmSpillRefs declaration. |
| src/coreclr/jit/compiler.cpp | Wires WasmSpillRefs into the Wasm compilation pipeline. |
| src/coreclr/jit/codegenwasm.cpp | Emits Wasm for GT_WASM_SPILL_REF and resets spill index at calls. |
| src/coreclr/jit/codegenlinear.cpp | Resets spill index at block boundaries on Wasm. |
| src/coreclr/jit/codegen.h | Adds wasmSpillRefIndex state. |
AndyAyersMS
left a comment
There was a problem hiding this comment.
Generally looks good.
I'm curious how this intersects/overlaps with LSRA's spill temp mechanism. Not saying we should use that here, but it likely serves a similar purpose.
|
Still working on this, going to try moving to STORE_LCL_VAR/LCL_VAR like Jakob suggested. |
|
Updated to use STORE_LCL_VAR and LCL_VAR instead, removing the splash zone var. |
| spillSlot = lvaGrabTemp(false DEBUGARG("WasmSpillRefs spill slot")); | ||
| LclVarDsc* const varDsc = lvaGetDesc(spillSlot); | ||
| varDsc->lvType = TYP_BYREF; | ||
| varDsc->lvPinned = true; |
There was a problem hiding this comment.
Does this need to be pinned? This will still have the unfortunate side effect that if someone introduces a local for an edge before this point then they will see spurious regressions because the handling here differs from the handling in all other places.
There was a problem hiding this comment.
It doesn't have to be pinned, but the idea was to avoid needing to read the value back later. A GC ref can live on the wasm evaluation stack across a call if we know it's value can't change.
There was a problem hiding this comment.
I understand, but I think optimizing the LIR edge case but not the short-lived local case is unfortunate. I wouldn't expect the LIR edge case to be all that common in comparison, so it is a bit odd to me to care about that case but not about the short-lived local case. But I suppose a separate pinning optimization for short-lived locals could always be done at a later point.
|
I had to come up with a different test program to get failures without this PR's changes - the previous test program was only failing because of some other bugs that AndyA has since fixed. public class C {
public string S = "";
public byte[]? Trash;
[MethodImpl(MethodImplOptions.NoInlining)]
public void AppendToS (string value) {
Trash = new byte[32768];
S += value;
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int Main () {
Console.WriteLine("Hello world!");
string s = "";
for (int i = 0; i < 1000; i++) {
C c = new C();
for (int j = 0; j < 2000; j++)
c.AppendToS(j.ToString());
s = c.S;
}
Console.WriteLine(s);
return 42;
}With the spilling pass disabled this asserts inside the GC during execution, i.e. |
AndyAyersMS
left a comment
There was a problem hiding this comment.
Think this is close, though we probably still need slot zeroing.
Another option would be to try and report these slots as tracked, since we know their lifetimes, but that can happen later.
Interested to see the code size impact.
AndyAyersMS
left a comment
There was a problem hiding this comment.
LGTM
@jakobbotsch you want to take another look?
| // For the spill slots we used, zero them at the end of the block to avoid keeping objects pinned way longer | ||
| // than absolutely necessary. | ||
| if (spillSlotsToZeroAtEndOfBlock.size()) | ||
| { | ||
| // There's no point in wasting time zeroing slots immediately before a return at the end of the block | ||
| // since the GC will have no opportunity to inspect the slots. We could potentially do this for throws | ||
| // too, but it's possible the exception would be caught and the method would continue running | ||
| if (!block->KindIs(BBJ_RETURN)) | ||
| { | ||
| for (unsigned lclNum : spillSlotsToZeroAtEndOfBlock) | ||
| { | ||
| GenTree* zero = gtNewZeroConNode(TYP_I_IMPL); | ||
| GenTree* store = gtNewStoreLclVarNode(lclNum, zero); | ||
| LIR::Range range = LIR::SeqTree(this, store); | ||
| LIR::InsertBeforeTerminator(block, std::move(range)); | ||
| } | ||
| } | ||
|
|
||
| spillSlotsToZeroAtEndOfBlock.clear(); | ||
| } |
There was a problem hiding this comment.
I kind of wonder how much this slot reuse and pinning gains us compared to a simpler approach of just creating new unpinned locals.
There was a problem hiding this comment.
I think the slot reuse probably isn't that important, it's an artifact of the original design.
I think making this work without pinning is a lot more complex, though.
jakobbotsch
left a comment
There was a problem hiding this comment.
In general I would have preferred to keep the slot reuse and pinning optimization separate if a simpler "just create a new local for the LIR edge" would have sufficed for correctness.
That aside this looks good to me, I left mainly nits as comments. Feel free to ignore them.
GC refs/byrefs can end up temporarily living on the Wasm shadow stack - where the GC can't see them - across calls, which creates a GC hole. This PR adds a pass that 'spills' these live refs/byrefs to dedicated pinned locals so that the GC won't move or collect those refs.