Skip to content

Refactor GELU and Sigmoid epilogue to use a common template (and add SiLu, Hardswish epilogue)#379

Merged
hwu36 merged 12 commits into
NVIDIA:masterfrom
masahi:gelu-sigmoid-refactor
Dec 18, 2021
Merged

Refactor GELU and Sigmoid epilogue to use a common template (and add SiLu, Hardswish epilogue)#379
hwu36 merged 12 commits into
NVIDIA:masterfrom
masahi:gelu-sigmoid-refactor

Conversation

@masahi

@masahi masahi commented Dec 13, 2021

Copy link
Copy Markdown
Contributor

Since Sigmoid and GeLU epilogue functors are almost identical, I propose to refactor them. The motivation is to make it easy to add another functors.

Now Sigmoid and GeLU epilogue functors are defined as:

using LinearCombinationSigmoid = LinearCombinationGeneric<Sigmoid, ElementOutput_, Count, ElementAccumulator_,
                                                          ElementCompute_, FloatRoundStyle::round_to_nearest>;

using LinearCombinationGELU = LinearCombinationGeneric<GELU, ElementOutput_, Count, ElementAccumulator_,
                                                          ElementCompute_, FloatRoundStyle::round_to_nearest>;

I'm going to add two new functors for SiLU and HardSwish activation, which can be implemented in the same manner.

What do you think? @hwu36

@masahi

masahi commented Dec 14, 2021

Copy link
Copy Markdown
Contributor Author

ok I've pushed two activation using this common template.

@masahi masahi changed the title Refactor GELU and Sigmoid epilogue to use a common template Refactor GELU and Sigmoid epilogue to use a common template (and add SiLu, Hardswish epilogue) Dec 14, 2021
@d-k-b

d-k-b commented Dec 14, 2021

Copy link
Copy Markdown
Collaborator

Thanks for helping clean this up! Can you add documentation with a short description and links to resources to the SiLu and HardSwish structs for future reference?

@masahi

masahi commented Dec 14, 2021

Copy link
Copy Markdown
Contributor Author

@d-k-b Thanks for the feedback, the doc is updated.

@hwu36

hwu36 commented Dec 17, 2021

Copy link
Copy Markdown
Collaborator

Overall, looks good to me. I think just change FloatRoundStyle::round_to_nearest to Round everywhere.

@masahi

masahi commented Dec 18, 2021

Copy link
Copy Markdown
Contributor Author

Oops thanks for spotting this, fixed.

@masahi

masahi commented Dec 18, 2021

Copy link
Copy Markdown
Contributor Author

Not sure why Ci has failed

Comment thread include/cutlass/epilogue/thread/linear_combination_gelu.h Outdated
Comment thread include/cutlass/epilogue/thread/activation.h Outdated
@hwu36

hwu36 commented Dec 18, 2021

Copy link
Copy Markdown
Collaborator

These 4 activations look expensive to me. You can try to set kIsHeavy to true, check the generated code is different, and compare the performance to verify it.

If you setting them to true, the epilogue will not be fully unrolled which is good for the I$. Usually complex activations such as gelu need to be set to true.

@masahi

masahi commented Dec 18, 2021

Copy link
Copy Markdown
Contributor Author

These 4 activations look expensive to me. You can try to set kIsHeavy to true, check the generated code is different, and compare the performance to verify it.

If you setting them to true, the epilogue will not be fully unrolled which is good for the I$. Usually complex activations such as gelu need to be set to true.

My patch respects the current choice for the existing epilogues: isHeavy = true for GELU

and whatever the default for Sigmoid which I assumed was false .
// If the epilogue functor does not define `kIsHeavy` or if it is `false`, then
// the behavior from CUTLASS 2.5 and before is retained. The epilogue is fully
// unrolled and inlined.
//

For the two new activations, my reasoning was false for Silu because it is just sigmoid + mul, and also false for hardswish since it is similar to relu.

@hwu36

hwu36 commented Dec 18, 2021

Copy link
Copy Markdown
Collaborator

SGTM.

kIsHeavy was newly introduced. We benchmarked gelu, but not sigmoid. If you find sigmoid is expensive in the future, you can create a PR and flip the bool.

@masahi

masahi commented Dec 18, 2021

Copy link
Copy Markdown
Contributor Author

Making sigmoid and silu isHeavy = true did improve performance on efficientnetv2 slightly: 8.58 msec -> 8.26 msec where 8.58 was the number from my PR apache/tvm#9746. So I pushed that change. The change didn't help when the vectorized variant of sigmoid is used.

@hwu36

hwu36 commented Dec 18, 2021

Copy link
Copy Markdown
Collaborator

The change didn't help when the vectorized variant of sigmoid is used.

Vectorized one has less instruction and less burden to I$.

@hwu36

hwu36 commented Dec 18, 2021

Copy link
Copy Markdown
Collaborator

Making sigmoid and silu isHeavy = true did improve performance on efficientnetv2 slightly: 8.58 msec -> 8.26 msec where 8.58 was the number from my PR apache/tvm#9746.

You can update your TVM PR now. :)

@hwu36 hwu36 merged commit 0dc3ba6 into NVIDIA:master Dec 18, 2021
Albresky pushed a commit to Albresky/cutlass that referenced this pull request Oct 11, 2025
…SiLu, Hardswish epilogue) (NVIDIA#379)

* Support half precision sigmoid activation

* introduce a vectorized variant using fast_tanh

* refactored sigmoid using the new interface

* refactored gelu

* add silu activation

* add hardswish

* remove sigmoid for now

* add description to silu and hardswish, and other doc update

* Do not ignore Round

* use constant N

* Set isHeavy = true in sigmoid and silu epilogue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants