
Commit e7a08f7

Add x86-64 runtime selection target
1 parent 282e493 commit e7a08f7

19 files changed

Lines changed: 2511 additions & 72 deletions

.github/workflows/ci.yml

Lines changed: 28 additions & 0 deletions
@@ -45,3 +45,31 @@ jobs:
 
       - name: Run Unit Tests
         run: bin/generic64/UnitTests -a
+
+  all_x86-64_implementation:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out the code
+        uses: actions/checkout@v3
+
+      - name: Initialize submodules
+        run: git submodule update --init --recursive
+
+      - name: Install dependencies
+        run: sudo apt-get install -y xsltproc
+
+      - name: Build x86-64 Unit Tests
+        run: make x86-64/UnitTests
+
+      - name: Run x86-64 Unit Tests
+        run: bin/x86-64/UnitTests -a
+
+      - name: Run x86-64 Unit Tests with AVX512 disabled
+        run: bin/x86-64/UnitTests -a --disableAVX512
+
+      - name: Run x86-64 Unit Tests with AVX2 disabled
+        run: bin/x86-64/UnitTests -a --disableAVX512 --disableAVX2
+
+      - name: Run x86-64 Unit Tests with SSSE3 disabled
+        run: bin/x86-64/UnitTests -a --disableAVX512 --disableAVX2 --disableSSSE3
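
Reviewer note: the four test runs above disable instruction sets cumulatively, so each dispatch level (AVX-512, AVX2, SSSE3, plain C) gets exercised on the same runner. The option handling itself is not part of the rendered diff. The following is only a sketch of how a test binary might turn such options into a feature mask. The flag names come from the workflow; the `DEMO_*` constants and `demo_apply_disable_flags` are hypothetical and are not taken from XKCP.

```c
#include <string.h>

/* Hypothetical feature bits, for illustration only. */
enum { DEMO_SSSE3 = 1, DEMO_AVX2 = 2, DEMO_AVX512 = 4 };

/* Clear from `detected` every feature named by a --disableXXX argument.
   Unknown arguments (such as -a) are ignored. */
int demo_apply_disable_flags(int detected, int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--disableSSSE3") == 0)
            detected &= ~DEMO_SSSE3;
        else if (strcmp(argv[i], "--disableAVX2") == 0)
            detected &= ~DEMO_AVX2;
        else if (strcmp(argv[i], "--disableAVX512") == 0)
            detected &= ~DEMO_AVX512;
    }
    return detected;
}
```

With all three features detected, the arguments of the "AVX2 disabled" step would leave only the SSSE3 bit set, so the dispatcher would be forced onto its SSSE3 paths.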

CI/Dockerfile.ci

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ COPY ./CI/ci_tests.sh ./
 # the execution of the script
 ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
 
-RUN sh ci_tests.sh; echo $? > artifacts/status; tar -czvf artifacts.tar.gz artifacts/
+RUN bash ci_tests.sh; echo $? > artifacts/status; tar -czvf artifacts.tar.gz artifacts/
 
 # Prepare our binary
 ARG FAILURE_ARTIFACTS

CI/ci_tests.sh

Lines changed: 69 additions & 66 deletions
Large diffs are not rendered by default.

Makefile.build

Lines changed: 4 additions & 1 deletion
@@ -169,6 +169,9 @@ http://creativecommons.org/publicdomain/zero/1.0/
         <msvc>/arch:AVX512</msvc>
     </fragment>
 
+    <!-- Dispatcher over implementations selected for x86_64 platforms -->
+    <fragment name="x86-64" inherits="K1600-x86-64 Xoodoo-x86-64"/>
+
     <!-- Implementations selected for ARMv6 -->
     <fragment name="ARMv6" inherits="K1600-ARMv6M-u2 Xoodoo-ARMv6"/>
     <!-- Implementations selected for ARMv6M -->
@@ -187,7 +190,7 @@ http://creativecommons.org/publicdomain/zero/1.0/
     <!-- Target names are of the form x/y where x is taken from the first set and y from the second set. -->
     <group all="XKCP">
         <product delimiter="/">
-            <factor set="reference reference32bits compact generic32 generic32lc generic64 generic64lc SSSE3 AVX XOP AVX2 AVX2noAsm AVX512 AVX512noAsm ARMv6 ARMv6M ARMv7M ARMv7A ARMv8A AVR8"/>
+            <factor set="reference reference32bits compact generic32 generic32lc generic64 generic64lc SSSE3 AVX XOP AVX2 AVX2noAsm AVX512 AVX512noAsm x86-64 ARMv6 ARMv6M ARMv7M ARMv7A ARMv8A AVR8"/>
             <factor set="UnitTests Benchmarks KeccakSum libXKCP.a libXKCP.so libXKCP.dylib"/>
         </product>
     </group>

README.markdown

Lines changed: 9 additions & 4 deletions
@@ -33,7 +33,7 @@ Then, to build **libXKCP**, the quick answer is to launch:
 make <target>/libXKCP.so
 ```
 
-where `<target>` is to be replaced with the actual target (e.g., `ARMv6M` or `AVX512`), and where `.so` can be replaced with `.a` for a static library or with `.dylib` for a dynamic library on macOS.
+where `<target>` is to be replaced with the actual target (e.g., `x86-64` or `ARMv6M`), and where `.so` can be replaced with `.a` for a static library or with `.dylib` for a dynamic library on macOS.
 More details, and in particular the list of targets, can be found in the section on how to build the XKCP below.
 
 If your compiler supports it, you may add `EXTRA_CFLAGS="-march=native -mtune=native"` at the end of the command line so that the code is further optimized for the platform on which it is compiled.
@@ -135,13 +135,17 @@ make generic64/UnitTests
 or
 
 ```
-make AVX512/Benchmarks
+make x86-64/Benchmarks
 ```
 
-to build UnitTests using plain 64-bit code or to build the Benchmarks tool with AVX-512 code. The name before the slash indicates the target, i.e., the platform or instruction set used, while the part after the slash is the executable or library to build. As another example, the static (resp. dynamic) library is built by typing `make ARMv7M/libXKCP.a` (resp. `.so`) or similarly with `ARMv7M` replaced with the appropriate platform or instruction set name. An alternate C compiler can be specified via the `CC` environment variable.
+to build UnitTests using plain 64-bit code or to build the Benchmarks tool with x86-64 code.
+The name before the slash indicates the target, i.e., the platform or instruction set used, while the part after the slash is the executable or library to build.
+As another example, the static (resp. dynamic) library is built by typing `make ARMv7M/libXKCP.a` (resp. `.so`) or similarly with `ARMv7M` replaced with the appropriate platform or instruction set name.
+An alternate C compiler can be specified via the `CC` environment variable.
 
 At the time of this writing, the possible target names before the slash are:
 
+* `x86-64`: automatic runtime selection among 64-bit plain C and SSSE3, AVX2 and AVX-512 instruction sets (**recommended for x86-64 platforms**);
 * `compact`: plain C compact implementations;
 * `generic32`: plain C implementation, generically optimized for 32-bit platforms;
 * `generic32lc`: same as `generic32` but featuring the lane complementing technique for platforms without a "and not" instruction;
@@ -164,7 +168,7 @@ If your compiler supports it, you may add `EXTRA_CFLAGS="-march=native -mtune=na
 Instead of building an executable with *GCC*, one can choose to select the files needed and make a package. For this, simply append `.pack` to the target name, e.g.,
 
 ```
-make generic64/UnitTests.pack
+make x86-64/UnitTests.pack
 ```
 
 This creates a `.tar.gz` archive with all the necessary files to build the given target.
@@ -270,6 +274,7 @@ We wish to thank all the contributors, and in particular:
 - Kent Ross for various improvements in [XKCP/K12](https://github.com/XKCP/K12) imported here
 - Larry Bassham, NIST for the original `genKAT.c` developed during the SHA-3 contest
 - Ryad Benadjila for adding continuous integration on different platforms with qemu
+- Samuel Neves and Jack O'Connor for their processor capability detection code
 - Stéphane Léon for helping support macOS
 - And to all those who fixed bugs or brought improvements (in no specific order): Tyler Young, Robert J Spencer, amane-c, Øystein Heskestad, Norman (Hongyu) Xu, Jorrit Jongma, David Adrian, Sebastian Ramacher, lvd2, Sam Chen, Thom Wiggers, Thomas van der Burgt, Donald Tsang, MoorayJenkins, UnePierre, Diggory Hardy, Joost Rijneveld, Steve Thomas, Benoît Viguier, Ko Stoffelen, Bogdan Vaneev, Alf Watt, surrim, Robert Crossfield, David Leon Gil, Matt Kelly, Ross Biro
 

lib/LowLevel.build

Lines changed: 22 additions & 0 deletions
@@ -37,6 +37,7 @@ The fragments below allow to select the desired implementation of the permutatio
 * K1600-AVX512-C: an optimized implementation taking advantage of the AVX-512 instruction set (in C) [obsolete: K1600-AVX512 is faster]
 * K1600-XOP-u6: an optimized implementation taking advantage of the XOP instruction set, with 6 rounds unrolled
 * K1600-XOP-ua: same as K1600-XOP-u6, but with all rounds unrolled
+* K1600-x86-64: an implementation that chooses among { plain-64bits-ua, SSE3, AVX2, AVX512 }, { x2-SSSE3, x2-AVX512 }, { x4-AVX2, x4-AVX512 }, { x8-AVX512 } upon runtime
 * K1600-ARMv6M-u1: an assembly-optimized implementation for ARMv6M (no round unrolling)
 * K1600-ARMv6M-u2: same as K1600-ARMv6M-u1 but with 2 rounds unrolled
 * K1600-ARMv7A-NEON: an assembly-optimized implementation for ARMv7A
@@ -80,6 +81,7 @@ The fragments below allow to select the desired implementation of the permutatio
 * Xoodoo-AVR8: an assembly-optimized implementation for AVR8
 * Xoodoo-SSSE3: an optimized implementation taking advantage of the SSSE3 instruction set
 * Xoodoo-AVX512: an optimized implementation taking advantage of the AVX-512 instruction set
+* Xoodoo-x86-64: an implementation that chooses among { plain-ua, SSSE3, AVX512 }, { x4-SSSE3, x4-AVX512 }, { x8-AVX512 }, { x16-AVX512 } upon runtime
 
 # For Xoodoo×4:
 
@@ -242,6 +244,16 @@ The fragments below allow to select the desired implementation of the permutatio
         <config>KeccakP1600_XOP_fullUnrolling</config>
     </fragment>
 
+    <fragment name="K1600-x86-64" inherits="K1600-plain-64bits-pure K1600-AVX2-pure K1600-AVX512-pure K1600x2-SSSE3-pure K1600x2-AVX512-pure K1600x4-AVX2-pure K1600x4-AVX512-pure K1600x8-AVX512-pure">
+        <config>XKCP_has_x86_64_CPU_detection</config>
+        <c>lib/low/x86-64-dispatch/x86-64-dispatch.c</c>
+        <h>lib/low/x86-64-dispatch/x86-64-dispatch.h</h>
+        <h>lib/low/x86-64-dispatch/KeccakP-1600-SnP.h</h>
+        <h>lib/low/x86-64-dispatch/KeccakP-1600-times2-SnP.h</h>
+        <h>lib/low/x86-64-dispatch/KeccakP-1600-times4-SnP.h</h>
+        <h>lib/low/x86-64-dispatch/KeccakP-1600-times8-SnP.h</h>
+    </fragment>
+
     <fragment name="K1600-ARMv6M" inherits="K1600 optimized">
         <h>lib/low/KeccakP-1600/ARM/KeccakP-1600-SnP.h</h>
     </fragment>
@@ -514,6 +526,16 @@ The fragments below allow to select the desired implementation of the permutatio
         <h>lib/low/Xoodoo/AVX512/SnP/Xoodoo-SnP.h</h>
     </fragment>
 
+    <fragment name="Xoodoo-x86-64" inherits="Xoodoo-plain-pure Xoodoo-SSSE3-pure Xoodoo-AVX512-pure Xoodoox4-SSSE3-pure Xoodoox4-AVX512-pure Xoodoox8-AVX2-pure Xoodoox8-AVX512-pure Xoodoox16-AVX512-pure">
+        <config>XKCP_has_x86_64_CPU_detection</config>
+        <c>lib/low/x86-64-dispatch/x86-64-dispatch.c</c>
+        <h>lib/low/x86-64-dispatch/x86-64-dispatch.h</h>
+        <h>lib/low/x86-64-dispatch/Xoodoo-SnP.h</h>
+        <h>lib/low/x86-64-dispatch/Xoodoo-times4-SnP.h</h>
+        <h>lib/low/x86-64-dispatch/Xoodoo-times8-SnP.h</h>
+        <h>lib/low/x86-64-dispatch/Xoodoo-times16-SnP.h</h>
+    </fragment>
+
     <!-- Xoodoo×4 -->
 
     <fragment name="Xoodoox4">
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+/*
+The eXtended Keccak Code Package (XKCP)
+https://github.com/XKCP/XKCP
+
+The Keccak-p permutations, designed by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche.
+
+Implementation by Gilles Van Assche and Ronny Van Keer, hereby denoted as "the implementer".
+
+For more information, feedback or questions, please refer to the Keccak Team website:
+https://keccak.team/
+
+To the extent possible under law, the implementer has waived all copyright
+and related or neighboring rights to the source code in this file.
+http://creativecommons.org/publicdomain/zero/1.0/
+
+---
+
+Please refer to SnP-documentation.h for more details.
+*/
+
+#ifndef _KeccakP_1600_SnP_h_
+#define _KeccakP_1600_SnP_h_
+
+#include "KeccakP-1600-plain64.h"
+#include "KeccakP-1600-AVX2.h"
+#include "KeccakP-1600-AVX512.h"
+
+typedef union {
+    KeccakP1600_plain64_state plain64_state;
+    KeccakP1600_AVX2_state AVX2_state;
+    KeccakP1600_plain64_state AVX512_state;
+} KeccakP1600_state;
+
+const char * KeccakP1600_GetImplementation();
+int KeccakP1600_GetFeatures();
+
+void KeccakP1600_StaticInitialize();
+void KeccakP1600_Initialize(KeccakP1600_state *state);
+void KeccakP1600_AddByte(KeccakP1600_state *state, unsigned char data, unsigned int offset);
+void KeccakP1600_AddBytes(KeccakP1600_state *state, const unsigned char *data, unsigned int offset, unsigned int length);
+void KeccakP1600_OverwriteBytes(KeccakP1600_state *state, const unsigned char *data, unsigned int offset, unsigned int length);
+void KeccakP1600_OverwriteWithZeroes(KeccakP1600_state *state, unsigned int byteCount);
+void KeccakP1600_Permute_Nrounds(KeccakP1600_state *state, unsigned int nrounds);
+void KeccakP1600_Permute_12rounds(KeccakP1600_state *state);
+void KeccakP1600_Permute_24rounds(KeccakP1600_state *state);
+void KeccakP1600_ExtractBytes(const KeccakP1600_state *state, unsigned char *data, unsigned int offset, unsigned int length);
+void KeccakP1600_ExtractAndAddBytes(const KeccakP1600_state *state, const unsigned char *input, unsigned char *output, unsigned int offset, unsigned int length);
+size_t KeccakF1600_FastLoop_Absorb(KeccakP1600_state *state, unsigned int laneCount, const unsigned char *data, size_t dataByteLen);
+size_t KeccakP1600_12rounds_FastLoop_Absorb(KeccakP1600_state *state, unsigned int laneCount, const unsigned char *data, size_t dataByteLen);
+size_t KeccakP1600_ODDuplexingFastInOut(KeccakP1600_state *state, unsigned int laneCount, const unsigned char *idata, size_t len, unsigned char *odata, const unsigned char *odataAdd, uint64_t trailencAsLane);
+size_t KeccakP1600_12rounds_ODDuplexingFastInOut(KeccakP1600_state *state, unsigned int laneCount, const unsigned char *idata, size_t len, unsigned char *odata, const unsigned char *odataAdd, uint64_t trailencAsLane);
+size_t KeccakP1600_ODDuplexingFastOut(KeccakP1600_state *state, unsigned int laneCount, unsigned char *odata, size_t len, const unsigned char *odataAdd, uint64_t trailencAsLane);
+size_t KeccakP1600_12rounds_ODDuplexingFastOut(KeccakP1600_state *state, unsigned int laneCount, unsigned char *odata, size_t len, const unsigned char *odataAdd, uint64_t trailencAsLane);
+size_t KeccakP1600_ODDuplexingFastIn(KeccakP1600_state *state, unsigned int laneCount, const uint8_t *idata, size_t len, uint64_t trailencAsLane);
+size_t KeccakP1600_12rounds_ODDuplexingFastIn(KeccakP1600_state *state, unsigned int laneCount, const uint8_t *idata, size_t len, uint64_t trailencAsLane);
+
+#endif
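
Reviewer note: the union in this header lets one `KeccakP1600_state` hold whichever backend's state layout ends up being used, with the backend chosen once in `KeccakP1600_StaticInitialize()`. The dispatcher source (`x86-64-dispatch.c`) is not rendered on this page, so the following is only a sketch of the general pattern, assuming GCC or Clang and their `__builtin_cpu_supports` builtin. All `demo_*` names, the state sizes and the `disableAVX2` parameter are hypothetical and are not XKCP's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the per-backend state types. */
typedef struct { unsigned char A[200]; } demo_plain64_state;
typedef struct { unsigned char A[200]; unsigned char pad[24]; } demo_AVX2_state;

/* One storage area, large enough for any backend. */
typedef union {
    demo_plain64_state plain64_state;
    demo_AVX2_state AVX2_state;
} demo_state;

/* A backend is a name plus a table of function pointers (only one shown). */
typedef struct {
    const char *name;
    void (*initialize)(demo_state *state);
} demo_backend;

static void demo_plain64_initialize(demo_state *state)
{
    memset(&state->plain64_state, 0, sizeof state->plain64_state);
}

static void demo_AVX2_initialize(demo_state *state)
{
    memset(&state->AVX2_state, 0, sizeof state->AVX2_state);
}

static const demo_backend demo_plain64 = { "plain64", demo_plain64_initialize };
static const demo_backend demo_AVX2 = { "AVX2", demo_AVX2_initialize };
static const demo_backend *demo_selected = &demo_plain64;

/* Pick the backend once; a non-zero disableAVX2 forces the plain C fallback. */
void demo_StaticInitialize(int disableAVX2)
{
    demo_selected = &demo_plain64;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (!disableAVX2 && __builtin_cpu_supports("avx2"))
        demo_selected = &demo_AVX2;
#else
    (void)disableAVX2; /* no CPU detection available: keep the fallback */
#endif
}

const char *demo_GetImplementation(void) { return demo_selected->name; }
void demo_Initialize(demo_state *state) { demo_selected->initialize(state); }
```

Callers only ever see `demo_state` and the `demo_*` entry points, so the choice of backend stays invisible to them. The cost is that the state is sized for the largest member of the union.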
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+/*
+The eXtended Keccak Code Package (XKCP)
+https://github.com/XKCP/XKCP
+
+The Keccak-p permutations, designed by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche.
+
+Implementation by Gilles Van Assche and Ronny Van Keer, hereby denoted as "the implementer".
+
+For more information, feedback or questions, please refer to the Keccak Team website:
+https://keccak.team/
+
+To the extent possible under law, the implementer has waived all copyright
+and related or neighboring rights to the source code in this file.
+http://creativecommons.org/publicdomain/zero/1.0/
+
+---
+
+Please refer to SnP-documentation.h for more details.
+*/
+
+#ifndef _KeccakP_1600_times2_SnP_h_
+#define _KeccakP_1600_times2_SnP_h_
+
+#include "KeccakP-1600-times2-SIMD128.h"
+#include "KeccakP-1600-times2-AVX512.h"
+#include "PlSnP-common.h"
+
+typedef union {
+    KeccakP1600times2_SIMD128_states SSSE3_states;
+    KeccakP1600times2_align512SIMD128_states AVX512_states;
+} KeccakP1600times2_states;
+
+const char * KeccakP1600times2_GetImplementation();
+int KeccakP1600times2_GetFeatures();
+
+void KeccakP1600times2_StaticInitialize();
+void KeccakP1600times2_InitializeAll(KeccakP1600times2_states *states);
+void KeccakP1600times2_AddByte(KeccakP1600times2_states *states, unsigned int instanceIndex, unsigned char data, unsigned int offset);
+void KeccakP1600times2_AddBytes(KeccakP1600times2_states *states, unsigned int instanceIndex, const unsigned char *data, unsigned int offset, unsigned int length);
+void KeccakP1600times2_AddLanesAll(KeccakP1600times2_states *states, const unsigned char *data, unsigned int laneCount, unsigned int laneOffset);
+void KeccakP1600times2_OverwriteBytes(KeccakP1600times2_states *states, unsigned int instanceIndex, const unsigned char *data, unsigned int offset, unsigned int length);
+void KeccakP1600times2_OverwriteLanesAll(KeccakP1600times2_states *states, const unsigned char *data, unsigned int laneCount, unsigned int laneOffset);
+void KeccakP1600times2_OverwriteWithZeroes(KeccakP1600times2_states *states, unsigned int instanceIndex, unsigned int byteCount);
+void KeccakP1600times2_PermuteAll_4rounds(KeccakP1600times2_states *states);
+void KeccakP1600times2_PermuteAll_6rounds(KeccakP1600times2_states *states);
+void KeccakP1600times2_PermuteAll_12rounds(KeccakP1600times2_states *states);
+void KeccakP1600times2_PermuteAll_24rounds(KeccakP1600times2_states *states);
+void KeccakP1600times2_ExtractBytes(const KeccakP1600times2_states *states, unsigned int instanceIndex, unsigned char *data, unsigned int offset, unsigned int length);
+void KeccakP1600times2_ExtractLanesAll(const KeccakP1600times2_states *states, unsigned char *data, unsigned int laneCount, unsigned int laneOffset);
+void KeccakP1600times2_ExtractAndAddBytes(const KeccakP1600times2_states *states, unsigned int instanceIndex, const unsigned char *input, unsigned char *output, unsigned int offset, unsigned int length);
+void KeccakP1600times2_ExtractAndAddLanesAll(const KeccakP1600times2_states *states, const unsigned char *input, unsigned char *output, unsigned int laneCount, unsigned int laneOffset);
+
+size_t KeccakF1600times2_FastLoop_Absorb(KeccakP1600times2_states *states, unsigned int laneCount, unsigned int laneOffsetParallel, unsigned int laneOffsetSerial, const unsigned char *data, size_t dataByteLen);
+size_t KeccakP1600times2_12rounds_FastLoop_Absorb(KeccakP1600times2_states *states, unsigned int laneCount, unsigned int laneOffsetParallel, unsigned int laneOffsetSerial, const unsigned char *data, size_t dataByteLen);
+
+size_t KeccakP1600times2_KravatteCompress(uint64_t *xAccu, uint64_t *kRoll, const unsigned char *input, size_t inputByteLen);
+size_t KeccakP1600times2_KravatteExpand(uint64_t *yAccu, const uint64_t *kRoll, unsigned char *output, size_t outputByteLen);
+
+void KeccakP1600times2_KT128ProcessLeaves(const unsigned char *input, unsigned char *output);
+void KeccakP1600times2_KT256ProcessLeaves(const unsigned char *input, unsigned char *output);
+
+#endif
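
Reviewer note: the times2 functions take an `instanceIndex` because a single `KeccakP1600times2_states` object carries two independent Keccak-p[1600] states side by side. The actual memory layout is defined in the SIMD128 and AVX512 headers, which are not shown here. The sketch below assumes a lane-interleaved layout (lane i of instance j stored at index 2*i + j), which is a common choice for SIMD code but is only an assumption here. All `demo_*` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_LANES 25      /* 25 lanes of 64 bits = 1600-bit state */
#define DEMO_INSTANCES 2   /* two states processed in parallel */

/* Assumed layout: lanes of the two instances are interleaved. */
typedef struct {
    uint64_t lanes[DEMO_LANES * DEMO_INSTANCES];
} demo_times2_states;

void demo_times2_InitializeAll(demo_times2_states *states)
{
    memset(states, 0, sizeof *states);
}

/* XOR one byte into instance `instanceIndex` at byte `offset` (0..199)
   of that instance's state, leaving the other instance untouched. */
void demo_times2_AddByte(demo_times2_states *states, unsigned int instanceIndex,
                         unsigned char data, unsigned int offset)
{
    unsigned int lane = offset / 8;          /* which 64-bit lane */
    unsigned int shift = (offset % 8) * 8;   /* byte position inside the lane */
    states->lanes[lane * DEMO_INSTANCES + instanceIndex] ^= (uint64_t)data << shift;
}
```

Interleaving keeps the same lane of both instances adjacent in memory, so a 128-bit vector load can fetch lane i of both states at once.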
