aes_gcm/x86_64: Tweak `gcm_ghash_vpclmulqdq_avx2_16`. by briansmith · Pull Request #2478 · briansmith/ring

briansmith · 2025-03-12T19:16:20Z

Instead of starting with the body of the original
gcm_ghash_vpclmulqdq_avx2 and removing the multi-block support, start with
gcm_gmult_vpclmulqdq_avx2 and add the XOR of aad.

The instruction scheduling seems a bit better. Also, this computes
bswap(Xi ^ aad) instead of bswap(Xi) ^ bswap(aad), saving one pshufb.

Rename the function to gcm_ghash_vpclmulqdq_avx2_16 to better reflect its
constraint on aad_len_16.

This is the diff between this function and BoringSSL's
gcm_gmult_vpclmulqdq_avx2, as of
14d05a3.

--- a/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
+++ b/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
@@ -436,10 +436,17 @@ sub _ghash_4x {
     return $code;
 }

-# void gcm_gmult_vpclmulqdq_avx2(uint8_t Xi[16], const u128 Htable[16]);
-$code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1;
+# void gcm_ghash_vpclmulqdq_avx2_16(uint8_t Xi[16], const u128 Htable[16],
+#                                   const uint8_t aad[16], size_t aad_len_16);
+#
+# Using the key |Htable|, update the GHASH accumulator |Xi| with the data given
+# by |aad| and |aad_len_16|. |aad_len_16| must be exactly 16.
+#
+# This has the same signature `gcm_ghash_vpclmulqdq_avx2` but uses the
+# implementation from `gcm_gmult_vpclmulqdq_avx2`, with the XOR of `aad` added.
+$code .= _begin_func "gcm_ghash_vpclmulqdq_avx2_16", 1;
 {
-    my ( $GHASH_ACC_PTR, $HTABLE ) = @argregs[ 0 .. 1 ];
+    my ( $GHASH_ACC_PTR, $HTABLE, $AAD, $AAD_LEN_16 ) = @argregs[ 0 .. 3 ];
     my ( $GHASH_ACC, $BSWAP_MASK, $H_POW1, $GFPOLY, $T0, $T1, $T2 ) =
       map( "%xmm$_", ( 0 .. 6 ) );

@@ -448,6 +455,10 @@ $code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1;
     .seh_endprologue

     vmovdqu         ($GHASH_ACC_PTR), $GHASH_ACC
+
+    # XOR the AAD into the accumulator.
+    vpxor           ($AAD), $GHASH_ACC, $GHASH_ACC
+
     vmovdqu         .Lbswap_mask(%rip), $BSWAP_MASK
     vmovdqu         $OFFSETOFEND_H_POWERS-16($HTABLE), $H_POW1
     vmovdqu         .Lgfpoly(%rip), $GFPOLY
@@ -463,108 +474,6 @@ ___
 }
 $code .= _end_func;

See the full diff:

git difftool 14d05a3b27f6706f9842583af405163b434c5c0e \
    crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl

codecov · 2025-03-12T19:17:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.60%. Comparing base (ec4f5be) to head (aa42898).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2478      +/-   ##
==========================================
- Coverage   96.60%   96.60%   -0.01%     
==========================================
  Files         180      180              
  Lines       21820    21820              
  Branches      539      539              
==========================================
- Hits        21080    21079       -1     
  Misses        623      623              
- Partials      117      118       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Instead of starting with the body of the original `gcm_ghash_vpclmulqdq_avx2` and removing the multi-block support, start with `gcm_gmult_vpclmulqdq_avx2` and add the XOR of `aad`. The instruction scheduling seems a bit better. Also, this computes `bswap(Xi ^ aad)` instead of `bswap(Xi) ^ bswap(aad)`, saving one pshufb. Rename the function to `gcm_ghash_vpclmulqdq_avx2_16` to better reflect its constraint on `aad_len_16`. This is the diff between this function and BoringSSL's `gcm_gmult_vpclmulqdq_avx2`, as of 14d05a3. ```diff --- a/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl +++ b/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl @@ -436,10 +436,17 @@ sub _ghash_4x { return $code; } -# void gcm_gmult_vpclmulqdq_avx2(uint8_t Xi[16], const u128 Htable[16]); -$code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1; +# void gcm_ghash_vpclmulqdq_avx2_16(uint8_t Xi[16], const u128 Htable[16], +# const uint8_t aad[16], size_t aad_len_16); +# +# Using the key |Htable|, update the GHASH accumulator |Xi| with the data given +# by |aad| and |aad_len_16|. |aad_len_16| must be exactly 16. +# +# This has the same signature `gcm_ghash_vpclmulqdq_avx2` but uses the +# implementation from `gcm_gmult_vpclmulqdq_avx2`, with the XOR of `aad` added. +$code .= _begin_func "gcm_ghash_vpclmulqdq_avx2_16", 1; { - my ( $GHASH_ACC_PTR, $HTABLE ) = @argregs[ 0 .. 1 ]; + my ( $GHASH_ACC_PTR, $HTABLE, $AAD, $AAD_LEN_16 ) = @argregs[ 0 .. 3 ]; my ( $GHASH_ACC, $BSWAP_MASK, $H_POW1, $GFPOLY, $T0, $T1, $T2 ) = map( "%xmm$_", ( 0 .. 6 ) ); @@ -448,6 +455,10 @@ $code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1; .seh_endprologue vmovdqu ($GHASH_ACC_PTR), $GHASH_ACC + + # XOR the AAD into the accumulator. + vpxor ($AAD), $GHASH_ACC, $GHASH_ACC + vmovdqu .Lbswap_mask(%rip), $BSWAP_MASK vmovdqu $OFFSETOFEND_H_POWERS-16($HTABLE), $H_POW1 vmovdqu .Lgfpoly(%rip), $GFPOLY @@ -463,108 +474,6 @@ ___ } $code .= _end_func; ``` See the full diff: ``` git difftool 14d05a3 \ crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl ```

briansmith self-assigned this Mar 12, 2025

briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch from 70a769c to cd9adfe Compare March 12, 2025 19:18

This was referenced Mar 12, 2025

aes-gcm/x86-64: Tweak AVX2 VAES-VCLMUL. #2477

Closed

aes-gcm: Enable AVX-512 implementation. #2444

Open

briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch from cd9adfe to c41cfcb Compare March 12, 2025 19:34

briansmith added this to the 0.17.15 milestone Mar 12, 2025

briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch 9 times, most recently from d5bfd1e to aa412fa Compare March 19, 2025 23:19

briansmith changed the title ~~aes_gcm/x86_64: Tweak gcm_ghash_vpclmulqdq_avx2_1.~~ aes_gcm/x86_64: Tweak gcm_ghash_vpclmulqdq_avx2_16. Mar 19, 2025

briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch from aa412fa to aa42898 Compare March 19, 2025 23:21

briansmith merged commit bcf68dd into main Mar 20, 2025

briansmith deleted the b/aes-gcm-avx2-tweak-2 branch March 20, 2025 00:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aes_gcm/x86_64: Tweak `gcm_ghash_vpclmulqdq_avx2_16`.#2478

aes_gcm/x86_64: Tweak `gcm_ghash_vpclmulqdq_avx2_16`.#2478
briansmith merged 1 commit into
mainfrom
b/aes-gcm-avx2-tweak-2

briansmith commented Mar 12, 2025 •

edited

Loading

Uh oh!

codecov Bot commented Mar 12, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

briansmith commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

briansmith commented Mar 12, 2025 •

edited

Loading

codecov Bot commented Mar 12, 2025 •

edited

Loading