djbsort: Intro

djbsort is a software library for sorting arrays of int32 or uint32 or float32 or int64 or uint64 or float64. It provides the following features:

Speed: djbsort is the fastest sorting library available for typical array sizes on typical 64-bit CPUs from Intel, AMD, and ARM.
Security: djbsort is designed to be safe for cryptographic and non-cryptographic contexts.
Verification: djbsort includes not just conventional tests but also tools to automatically verify correctness.

These features are not separate options: there is unified code providing all of these features simultaneously, rather than separate sort_fast and sort_secure and sort_verified functions.

Latest release: 20260210.

Limitations

Interface limitations:

This library is only for sorting numeric types. Numeric types are important targets for sorting (for example, Google's vqsort paper reported that half of Google's std::sort calls were without specialized comparators, and that integer-array sorting in particular was taking 2/3 of the total time for those calls) but not the only targets for sorting. To sort other data types, you need to switch to another library.
This library is only for sorting memory-mapped data (for example, arrays that fit into RAM). This is not an issue for typical 64-bit machines, but means that a 32-bit machine cannot use this library to sort (e.g.) 10GB of numbers on disk. The underlying techniques can be adapted to arrays stored on disk, supporting a regular data-access pattern along with standard techniques to further reduce disk accesses, but the API does not directly support disks.

Speed limitations:

The library's speed records are for various CPUs that support the vector-instruction sets targeted in the library (currently AVX2 and NEON, plus SSE4.2 for some older CPUs if this is enabled at compilation time). On CPUs that don't support these instruction sets (or that support AVX-512), the library works but is less competitive in speed.
For very large arrays, this library provides suboptimal latency, in part because arrays are sorted using one core on one machine and in part because the library uses sorting networks that scale as Θ(n log²n). However, the underlying techniques can easily be parallelized across cores and across machines. Furthermore, applications that need top speed for very large arrays, and that do not need the security advantages and verification advantages of sorting networks, can use sorting networks up to a cutoff and then a partitioning algorithm such as quicksort, mergesort, or MSD radixsort past the cutoff. (This is already how various other libraries work, but with small cutoffs since their sorting networks are not optimized as well as djbsort.)

Verification limitations:

The verification has been applied to binaries produced by various versions of gcc and clang on various machines but is likely to need tweaks to handle new platforms and new compiler versions. It is also possible that the C code has portability problems that damage correctness: for example, perhaps the C code triggers compiler bugs on some platforms where the verification has not been applied.
The verification is only for the int32 and int64 sorting code. The uint and float sorting code consists of much smaller wrappers around the int sorting code, but those wrappers could still have bugs that slip past tests; the verification does not inspect those wrappers.
The verification does not check memory safety. The tests in djbsort include tests under valgrind, but it is possible for memory-safety bugs to slip past valgrind.
The verification runs separately for each array size, and becomes slower as the array size increases. The verification has been run for many specific array sizes, including some important array sizes for post-quantum cryptography, but this does not guarantee correctness for other sizes.
The verification relies on large binary-analysis tools and on djbsort's sortverif toolkit. Bugs in sorting code could be hidden by bugs in these tools.

Credits

The djbsort author is Daniel J. Bernstein.

djbsort builds upon results from the following paper: Daniel J. Bernstein, Chitchanok Chuengsatiansup, Tanja Lange, Christine van Vredendaal, "NTRU Prime: reducing attack surface at low cost", Selected Areas in Cryptography 2017. The NTRU Prime paper explained how to make constant-time sorting software run faster than Intel's Integrated Performance Primitives sorting software on Intel CPUs, and demonstrated this with a software release in 2017. djbsort includes verification and provides another 4x speedup.

See the list of references for related work.

Version: This is version 2026.02.10 of the "Intro" web page.