This graph compares sorting performance for various int32 array sizes on one Zen 2 core (in an AMD EPYC 7742, boost disabled, gcc 10.2.1, clang 11.0.1-2) using the following libraries (as of 20240116):
- djbsort (first introduced 2018, before Zen 2 existed)
- far (first introduced 2020)
- vxsort (first introduced 2020)
- Google's vqsort (first introduced 2022)
- Intel's x86-simd-sort (first introduced 2023)
There are also many slower sorting libraries.
The graph includes one of those, std::sort
, for comparison.
The most important lesson from the graph
is that there's still room for speed improvement in all of these libraries
(including djbsort, which wins for some sizes but not others).
This page explains how to rebuild the above graph. It would be interesting to systematically track how the graph changes as CPUs and compilers vary. This page also gives some performance numbers for other libraries.
This page focuses purely on int32 sorting speed using AVX2. Some libraries obtain an extra speed boost from AVX-512. Aside from the speed battle, a reason to use a library other than djbsort is support for more data types. Reasons to use djbsort include security and verification.
The sortbench script
The script generating the above graph is called sortbench. Here's how to run sortbench.
These instructions are for Debian/Ubuntu systems running on CPUs with AVX2. Other modern Linux/BSD/UNIX systems should work with minor adjustments to the instructions. These instructions need the following packages:
- gcc and other compiler tools:
apt install build-essential
- cmake:
apt install cmake
- meson:
apt install meson
- Python 3:
apt install python3
- matplotlib:
apt install python3-matplotlib
In a root terminal, create a sortbench
user:
adduser --disabled-password --gecos sortbench sortbench
Run a shell as that user:
su - sortbench
As that user, download, unpack, and run the latest version of sortbench:
wget -m https://sorting.cr.yp.to/sortbench-latest-version.txt
version=$(cat sorting.cr.yp.to/sortbench-latest-version.txt)
wget -m https://sorting.cr.yp.to/sortbench-$version.tar.gz
tar -xzf sorting.cr.yp.to/sortbench-$version.tar.gz
cd sortbench-$version
./do
This should finish in well under an hour.
The exact time depends on your CPU.
The resulting graph will be in sortbench-$version/plot.pdf
.
sortbench archives
sortbench-20240116.tar.gz
browse
Other libraries
Software predating djbsort
(see the separate list of references)
takes much more time than djbsort does.
The following software
is now incorporated into the crypto_sort/int32
directory in the
SUPERCOP
benchmarking framework:
cycles on kizomba |
cycles on samba |
cycles on titan0 |
constant-time | software |
---|---|---|---|---|
4929 | 5250 | 6596 | yes | avx2 from djbsort |
9664 | fail | fail | no | sid1607 from 2016 Santurkar |
13756 | 13466 | 15136 | no | krasnov from 2017 Gueron–Krasnov |
15682 | 15875 | 18548 | yes | oldavx2 from 2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal |
19517 | 19832 | 21844 | no | herf based on float32 code from 2001 Herf |
26998 | 28664 | 25608 | no | stdsort using C++ standard library |
83041 | 86442 | 95672 | yes | portable4 from djbsort |
90155 | 97191 | 100840 | yes | portable3 from djbsort |
fail | fail | fail | no | aspas from 2018 Hou–Wang–Feng |
These cycle counts are for int32[768] arrays with random contents;
variable-time software could be
considerably slower for some array contents than for others.
The sid1607
failures are because sid1607
needs C++11,
which is not supported with the default compiler options
for the compilers shipped with Ubuntu before 18.04.
The aspas
failures have not been diagnosed,
but other measurements suggest that aspas
is slower than stdsort
.
Version: This is version 2024.01.17 of the "Comparison" web page.