Skip to content

aes_gcm/x86_64: Tweak gcm_ghash_vpclmulqdq_avx2_16.#2478

Merged
briansmith merged 1 commit into
mainfrom
b/aes-gcm-avx2-tweak-2
Mar 20, 2025
Merged

aes_gcm/x86_64: Tweak gcm_ghash_vpclmulqdq_avx2_16.#2478
briansmith merged 1 commit into
mainfrom
b/aes-gcm-avx2-tweak-2

Conversation

@briansmith

@briansmith briansmith commented Mar 12, 2025

Copy link
Copy Markdown
Owner

Instead of starting with the body of the original
gcm_ghash_vpclmulqdq_avx2 and removing the multi-block support, start with
gcm_gmult_vpclmulqdq_avx2 and add the XOR of aad.

The instruction scheduling seems a bit better. Also, this computes
bswap(Xi ^ aad) instead of bswap(Xi) ^ bswap(aad), saving one pshufb.

Rename the function to gcm_ghash_vpclmulqdq_avx2_16 to better reflect its
constraint on aad_len_16.

This is the diff between this function and BoringSSL's
gcm_gmult_vpclmulqdq_avx2, as of
14d05a3.

--- a/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
+++ b/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
@@ -436,10 +436,17 @@ sub _ghash_4x {
     return $code;
 }

-# void gcm_gmult_vpclmulqdq_avx2(uint8_t Xi[16], const u128 Htable[16]);
-$code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1;
+# void gcm_ghash_vpclmulqdq_avx2_16(uint8_t Xi[16], const u128 Htable[16],
+#                                   const uint8_t aad[16], size_t aad_len_16);
+#
+# Using the key |Htable|, update the GHASH accumulator |Xi| with the data given
+# by |aad| and |aad_len_16|. |aad_len_16| must be exactly 16.
+#
+# This has the same signature `gcm_ghash_vpclmulqdq_avx2` but uses the
+# implementation from `gcm_gmult_vpclmulqdq_avx2`, with the XOR of `aad` added.
+$code .= _begin_func "gcm_ghash_vpclmulqdq_avx2_16", 1;
 {
-    my ( $GHASH_ACC_PTR, $HTABLE ) = @argregs[ 0 .. 1 ];
+    my ( $GHASH_ACC_PTR, $HTABLE, $AAD, $AAD_LEN_16 ) = @argregs[ 0 .. 3 ];
     my ( $GHASH_ACC, $BSWAP_MASK, $H_POW1, $GFPOLY, $T0, $T1, $T2 ) =
       map( "%xmm$_", ( 0 .. 6 ) );

@@ -448,6 +455,10 @@ $code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1;
     .seh_endprologue

     vmovdqu         ($GHASH_ACC_PTR), $GHASH_ACC
+
+    # XOR the AAD into the accumulator.
+    vpxor           ($AAD), $GHASH_ACC, $GHASH_ACC
+
     vmovdqu         .Lbswap_mask(%rip), $BSWAP_MASK
     vmovdqu         $OFFSETOFEND_H_POWERS-16($HTABLE), $H_POW1
     vmovdqu         .Lgfpoly(%rip), $GFPOLY
@@ -463,108 +474,6 @@ ___
 }
 $code .= _end_func;

See the full diff:

git difftool 14d05a3b27f6706f9842583af405163b434c5c0e \
    crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl

@briansmith briansmith self-assigned this Mar 12, 2025
@codecov

codecov Bot commented Mar 12, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.60%. Comparing base (ec4f5be) to head (aa42898).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2478      +/-   ##
==========================================
- Coverage   96.60%   96.60%   -0.01%     
==========================================
  Files         180      180              
  Lines       21820    21820              
  Branches      539      539              
==========================================
- Hits        21080    21079       -1     
  Misses        623      623              
- Partials      117      118       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@briansmith briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch from 70a769c to cd9adfe Compare March 12, 2025 19:18
@briansmith briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch from cd9adfe to c41cfcb Compare March 12, 2025 19:34
@briansmith briansmith added this to the 0.17.15 milestone Mar 12, 2025
@briansmith briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch 9 times, most recently from d5bfd1e to aa412fa Compare March 19, 2025 23:19
@briansmith briansmith changed the title aes_gcm/x86_64: Tweak gcm_ghash_vpclmulqdq_avx2_1. aes_gcm/x86_64: Tweak gcm_ghash_vpclmulqdq_avx2_16. Mar 19, 2025
Instead of starting with the body of the original
`gcm_ghash_vpclmulqdq_avx2` and removing the multi-block support, start with
`gcm_gmult_vpclmulqdq_avx2` and add the XOR of `aad`.

The instruction scheduling seems a bit better. Also, this computes
`bswap(Xi ^ aad)` instead of `bswap(Xi) ^ bswap(aad)`, saving one pshufb.

Rename the function to `gcm_ghash_vpclmulqdq_avx2_16` to better reflect its
constraint on `aad_len_16`.

This is the diff between this function and BoringSSL's
`gcm_gmult_vpclmulqdq_avx2`, as of
14d05a3.

```diff
--- a/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
+++ b/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
@@ -436,10 +436,17 @@ sub _ghash_4x {
     return $code;
 }

-# void gcm_gmult_vpclmulqdq_avx2(uint8_t Xi[16], const u128 Htable[16]);
-$code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1;
+# void gcm_ghash_vpclmulqdq_avx2_16(uint8_t Xi[16], const u128 Htable[16],
+#                                   const uint8_t aad[16], size_t aad_len_16);
+#
+# Using the key |Htable|, update the GHASH accumulator |Xi| with the data given
+# by |aad| and |aad_len_16|. |aad_len_16| must be exactly 16.
+#
+# This has the same signature `gcm_ghash_vpclmulqdq_avx2` but uses the
+# implementation from `gcm_gmult_vpclmulqdq_avx2`, with the XOR of `aad` added.
+$code .= _begin_func "gcm_ghash_vpclmulqdq_avx2_16", 1;
 {
-    my ( $GHASH_ACC_PTR, $HTABLE ) = @argregs[ 0 .. 1 ];
+    my ( $GHASH_ACC_PTR, $HTABLE, $AAD, $AAD_LEN_16 ) = @argregs[ 0 .. 3 ];
     my ( $GHASH_ACC, $BSWAP_MASK, $H_POW1, $GFPOLY, $T0, $T1, $T2 ) =
       map( "%xmm$_", ( 0 .. 6 ) );

@@ -448,6 +455,10 @@ $code .= _begin_func "gcm_gmult_vpclmulqdq_avx2", 1;
     .seh_endprologue

     vmovdqu         ($GHASH_ACC_PTR), $GHASH_ACC
+
+    # XOR the AAD into the accumulator.
+    vpxor           ($AAD), $GHASH_ACC, $GHASH_ACC
+
     vmovdqu         .Lbswap_mask(%rip), $BSWAP_MASK
     vmovdqu         $OFFSETOFEND_H_POWERS-16($HTABLE), $H_POW1
     vmovdqu         .Lgfpoly(%rip), $GFPOLY
@@ -463,108 +474,6 @@ ___
 }
 $code .= _end_func;
```

See the full diff:
```
git difftool 14d05a3 \
    crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl
```
@briansmith briansmith force-pushed the b/aes-gcm-avx2-tweak-2 branch from aa412fa to aa42898 Compare March 19, 2025 23:21
@briansmith briansmith merged commit bcf68dd into main Mar 20, 2025
@briansmith briansmith deleted the b/aes-gcm-avx2-tweak-2 branch March 20, 2025 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant