
JIT: Allow more containment opts in Tier0#117622

Merged: tannergooding merged 4 commits into dotnet:main from saucecontrol:more-t0-opts on Jul 22, 2025
@saucecontrol saucecontrol commented Jul 14, 2025

Member

This enables embedded broadcast of non-const values in Tier0.

Diffs are a net improvement, although there are a few regressions where an extra temp ends up being introduced due to arg swapping.

There are also a few 1- or 2-byte regressions where we swapped from containing a full vector load arg to containing a broadcast arg, which then forces EVEX encoding. It would be interesting to look at optimizing around that separately, since it would impact FullOpts as well.

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 14, 2025
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Jul 14, 2025
@saucecontrol

Member Author

cc @tannergooding

@tannergooding

Member

> There are also a few 1- or 2-byte regressions where we swapped from containing a full vector load arg to containing a broadcast arg

We view this as an explicit improvement; the real "issue" is that SPMI doesn't surface the savings in data section size. That is, while the codegen is 1-2 bytes bigger, we save 8-60 bytes of data section and improve cache locality.
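As a rough illustration of that trade-off, here is a minimal sketch of the arithmetic. It assumes the 8-60 byte range comes from replacing a 16-, 32-, or 64-byte vector constant with a single 4- or 8-byte broadcast element. The helper names are hypothetical and are not part of the JIT or SPMI.

```cpp
#include <cassert>

// Hypothetical helper (not a JIT API): bytes of data section saved when a
// full vector constant is replaced by one broadcast element.
constexpr int DataBytesSaved(int vectorBytes, int elementBytes)
{
    return vectorBytes - elementBytes;
}

// Hypothetical helper: net saving once the extra code bytes are subtracted.
constexpr int NetBytesSaved(int vectorBytes, int elementBytes, int extraCodeBytes)
{
    return DataBytesSaved(vectorBytes, elementBytes) - extraCodeBytes;
}

// Smallest case under the assumption above: 16-byte vector, 8-byte element.
static_assert(DataBytesSaved(16, 8) == 8, "minimum data saving");
// Largest case: 64-byte vector, 4-byte element.
static_assert(DataBytesSaved(64, 4) == 60, "maximum data saving");
```

Even in the least favorable case of this sketch (8 bytes of data saved against 2 extra code bytes), the net result is still a reduction in total bytes.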

Comment thread src/coreclr/jit/lowerxarch.cpp
Comment thread src/coreclr/jit/lowerxarch.cpp Outdated
@saucecontrol

Member Author

> We view this as an explicit improvement; the real "issue" is that SPMI doesn't surface the savings in data section size. That is, while the codegen is 1-2 bytes bigger, we save 8-60 bytes of data section and improve cache locality.

The cases I'm referring to are like this:
[image]

where it's a broadcast either way, and we can contain either the broadcast or the full vector. It's always 2 instructions because they can't both be contained. Switching from containing the full vector to containing the broadcast means you have to switch to EVEX, so it's a net increase in size.

This particular regression only applies to instructions where we swap operands in order to be able to contain one, so I think we could simply give lower preference to CnsVec operands that might be turned into broadcast. Or something like that?
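The 1- or 2-byte growth is consistent with the prefix sizes alone: a VEX prefix is 2 or 3 bytes, while an EVEX prefix is always 4 bytes. The sketch below shows only that arithmetic. The names are hypothetical, and it deliberately ignores other encoding differences such as displacement size.

```cpp
#include <cassert>

// Hypothetical names for illustration; not the JIT emitter's types.
enum class Prefix
{
    Vex2, // 2-byte VEX form (C5)
    Vex3, // 3-byte VEX form (C4)
    Evex, // 4-byte EVEX form (62)
};

// Size in bytes of each prefix form.
constexpr int PrefixBytes(Prefix p)
{
    return (p == Prefix::Vex2) ? 2 : (p == Prefix::Vex3) ? 3 : 4;
}

// Extra code bytes when an instruction moves from a VEX form to EVEX,
// counting the prefix only.
constexpr int EvexGrowth(Prefix from)
{
    return PrefixBytes(Prefix::Evex) - PrefixBytes(from);
}
```

Under these assumptions, `EvexGrowth(Prefix::Vex3)` is 1 and `EvexGrowth(Prefix::Vex2)` is 2.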

Comment thread src/coreclr/jit/lowerxarch.cpp
@tannergooding

Member

> This particular regression only applies to instructions where we swap operands in order to be able to contain one, so I think we could simply give lower preference to CnsVec operands that might be turned into broadcast. Or something like that?

Ah, I see.

Yeah, in general we want to prefer loads from arbitrary memory, then broadcastable constants, then regular constants.
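One way to picture that ordering is as a priority ranking over the two candidate operands. This is only a sketch of the stated preference, not the logic in `lowerxarch.cpp`. The enum, the function names, and the tie-breaking rule (keep the second operand so no swap is needed) are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical classification of a candidate operand; not the JIT's real types.
enum class OperandKind
{
    NotContainable,
    RegularConstant,
    BroadcastableConstant,
    MemoryLoad,
};

// Higher value means more preferred for containment:
// memory load > broadcastable constant > regular constant.
inline int ContainmentPriority(OperandKind kind)
{
    switch (kind)
    {
        case OperandKind::MemoryLoad:
            return 3;
        case OperandKind::BroadcastableConstant:
            return 2;
        case OperandKind::RegularConstant:
            return 1;
        default:
            return 0;
    }
}

// Returns 1 or 2 for the operand to contain, or 0 if neither qualifies.
// Assumption: on a tie, op2 is kept so the operands do not need swapping.
inline int ChooseOperandToContain(OperandKind op1, OperandKind op2)
{
    const int p1 = ContainmentPriority(op1);
    const int p2 = ContainmentPriority(op2);

    if ((p1 == 0) && (p2 == 0))
    {
        return 0;
    }
    return (p1 > p2) ? 1 : 2;
}
```

With this ranking, a memory load in the first position wins over a regular constant in the second, which corresponds to the operand-swapping case discussed above.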

@saucecontrol

Member Author

Disabled the aligned load containment. Diffs are smaller but still a net improvement.

@saucecontrol

Member Author

I've split the TryFoldCnsVecForEmbeddedBroadcast changes out into #117700

@tannergooding tannergooding left a comment

Member

LGTM. CC. @dotnet/jit-contrib for secondary review

@tannergooding

Member

/ba-g unrelated arm64 timeouts

@tannergooding tannergooding merged commit 0b2f272 into dotnet:main Jul 22, 2025
102 of 110 checks passed
@saucecontrol saucecontrol deleted the more-t0-opts branch July 22, 2025 04:09
@github-actions github-actions Bot locked and limited conversation to collaborators Aug 21, 2025

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member


3 participants