djbsort: Comparison

This page compares djbsort to some other numeric-array sorting libraries. The libraries selected here for comparison are the same as djbsort's main competitors on the speed page: "the vxsort library used in Microsoft's .NET starting in 2020; the vqsort/highway library introduced by Google; the far library, which is the (sometimes faster) parent of vqsort; and the x86simdsort library introduced by Intel".

One can do better on some of these comparison axes when speed is not an issue: for example, one can simultaneously (1) use less code and (2) sort a wider range of data types.

Interface

Data types:

Array lengths allowed by API on 64-bit CPUs:

API is CPU-independent, with implementation automatically selected based on run-time CPU:

Correctness and security

Memory safety and protection against other bugs:

Protection against timing leaks:

Protection against denial of service:

More research is required to quantify the cost of the far, vqsort, vxsort, and x86simdsort libraries when array contents are chosen adversarially. Note that a library that allows a larger relative slowdown may still provide better protection against denial of service if it is faster to begin with.

Cycle counts for random data

CPU cycles/byte to sort int32[16] on AMD Zen 3 (the three numbers are stabilized quartiles of many measurements excluding 31 cycles of cycle-counting overhead):

CPU cycles/byte to sort int32[1024] on AMD Zen 3 (the three numbers are stabilized quartiles of many measurements):

CPU cycles/byte to sort int32[65536] on AMD Zen 3 (the three numbers are stabilized quartiles of many measurements):

See the separate speed page for a more comprehensive comparison of cycle counts.

Code size

Compiled AVX2 code size for a simple int32.cc that defines void sortint32(int32_t *y,int n) by calling the library (text+data+bss for int32.o, plus text+data+bss for the relevant *.o files in the library):

An application with many calls to a big-header library such as far or vxsort will incur many copies of the first number, but can avoid this by instead calling an intermediate function such as sortint32, so the overall code size includes only one copy of that number.

Compiled AVX2 code size for a simple all32.cc that defines sortint32, sortuint32, and sortfloat32 (text+data+bss for all32.o, plus text+data+bss for the relevant *.o files in the library):

The uint64 and float64 functions in djbsort are similarly small wrappers around int64, which is separate code from int32.


Version: This is version 2026.01.27 of the "Comparison" web page.