This page compares djbsort to some other numeric-array sorting libraries. The libraries selected here for comparison are the same as djbsort's main competitors on the speed page: "the vxsort library used in Microsoft's .NET starting in 2020; the vqsort/highway library introduced by Google; the far library, which is the (sometimes faster) parent of vqsort; and the x86simdsort library introduced by Intel".
One can do better on some of these comparison axes when speed is not an issue: for example, one can simultaneously (1) use less code and (2) sort a wider range of data types.
Interface
Data types:
- djbsort: int32, uint32, float32, int64, uint64, float64
- far: int32
- vqsort: int16, uint16, float16, int32, uint32, float32, int64, uint64, float64, uint128
- vxsort: int16, uint16, int32, uint32, float32, int64, uint64, float64
- x86simdsort: int16, uint16, float16, int32, uint32, float32, int64, uint64, float64
Array lengths allowed by API on 64-bit CPUs:
- djbsort: up to 263 (
void djbsort_int32(int32_t *,long long)) - far: up to 231 (
void quicksort(int *arr, int n)) - vqsort: up to 264 (
void VQSort(int32_t* HWY_RESTRICT keys, size_t n, SortAscending)) - vxsort: up to 264 (
void sort(T* left, T* right, T left_hint = std::numeric_limits<T>::min(), T right_hint = std::numeric_limits<T>::max())) - x86simdsort: up to 264 (
void qsort(T *arr, size_t arrsize, bool hasnan = false, bool descending = false))
API is CPU-independent, with implementation automatically selected based on run-time CPU:
- djbsort: yes (chooses between AVX2 and portable fallback)
- far: no (provides AVX2 code; caller handles fallback)
- vqsort: yes (chooses between AVX-512, AVX2, ARM NEON, more options, and portable fallback)
- vxsort: no (provides AVX-512 code and AVX2 code; caller handles selection and fallback)
- x86simdsort: yes (chooses between AVX-512, AVX2, portable fallback)
Correctness and security
Memory safety and protection against other bugs:
- djbsort: conventional tests; tests under valgrind; verification
- far: conventional tests
- vqsort: conventional tests
- vxsort: conventional tests
- x86simdsort: conventional tests
Protection against timing leaks:
- djbsort: yes (constant cycle counts when array size is constant)
- far: no
- vqsort: no
- vxsort: no
- x86simdsort: no
Protection against denial of service:
- djbsort: clear (changing array contents does not increase cycle counts)
- far: unclear (chooses pivot from 72 pseudorandom elements but deterministically; recognizes constant subarrays; falls back to radix exchange)
- vqsort: unclear (chooses pivot from 576 bytes using per-thread call to OS RNG; falls back to heapsort)
- vxsort: unclear (chooses deterministic median-of-three pivot; falls back to heapsort)
- x86simdsort: unclear (similar to the far library; falls back to
std::sort)
More research is required to quantify the cost of the far, vqsort, vxsort, and x86simdsort libraries when array contents are chosen adversarially. Note that a library that allows a larger relative slowdown may still provide better protection against denial of service if it is faster to begin with.
Cycle counts for random data
CPU cycles/byte to sort int32[16] on AMD Zen 3
(the three numbers are stabilized quartiles of many measurements excluding 31 cycles of cycle-counting overhead):
- djbsort: 0.33, 0.34, 0.35
- far: 1.14, 1.14, 1.14
- vqsort: 0.92, 0.92, 0.92
- vxsort: 1.85, 1.93, 2.09
- x86simdsort: 0.91, 1.00, 1.01
CPU cycles/byte to sort int32[1024] on AMD Zen 3
(the three numbers are stabilized quartiles of many measurements):
- djbsort: 1.14, 1.15, 1.16
- far: 1.85, 1.90, 1.95
- vqsort: 2.34, 2.46, 2.56
- vxsort: 2.61, 2.63, 2.64
- x86simdsort: 2.61, 2.78, 2.96
CPU cycles/byte to sort int32[65536] on AMD Zen 3
(the three numbers are stabilized quartiles of many measurements):
- djbsort: 3.27, 3.28, 3.31
- far: 3.28, 3.30, 3.31
- vqsort: 3.51, 3.55, 3.60
- vxsort: 3.67, 3.71, 3.76
- x86simdsort: 4.15, 4.18, 4.21
See the separate speed page for a more comprehensive comparison of cycle counts.
Code size
Compiled AVX2 code size for a simple int32.cc that defines void sortint32(int32_t *y,int n) by calling the library
(text+data+bss for int32.o, plus text+data+bss for the relevant *.o files in the library):
- djbsort: 56+0+0 for
int32.oplus 25050+0+0 forint32_avx2_C0-sort.oplus 1772+0+0 for portable fallback and dispatch (dispatch_auto-sort_int32.o,int32_portable4_C3-sort.o,compilers*.o) - far: 24387+0+0 for
int32.oplus 0+0+0 (far is header-only) - vqsort: 277+8+0 for
int32.oplus 266420+1224+24 forvqsort_i32a.cc.oplus some small support routines for dispatch etc. - vxsort: 23483+0+0 for
int32.oplus 4096+0+0 foravx2_masks.cpp.o - x86simdsort: 60+0+0 for
int32.oplus 3951712+8+0 forx86simdsort-avx2.cpp.o(all data types in the library are in a single.ofile) plus some small support routines
An application with many calls to a big-header library such as far or vxsort
will incur many copies of the first number,
but can avoid this by instead calling an intermediate function such as sortint32,
so the overall code size includes only one copy of that number.
Compiled AVX2 code size for a simple all32.cc that defines sortint32, sortuint32, and sortfloat32
(text+data+bss for all32.o, plus text+data+bss for the relevant *.o files in the library):
- djbsort: 128+0+0 for
all32.oplus 25050+0+0 forint32_avx2_C0-sort.oplus 445+0+0 foruint32_avx2useint32_C0-sort.oplus 493+0+0 forfloat32_avx2useint32_C0-sort.oplus 3960+0+0 for portable fallback and dispatch (dispatch_auto-sort_int32.o,dispatch_auto-sort_uint32.o,dispatch_auto-sort_float32.o,int32_portable4_C3-sort.o,uint32_useint32_C3-sort.o,float32_useint32_C3-sort.o,compilers*.o) - far: not applicable
- vqsort: 677+8+0 for
all32.oplus 266420+1224+24 forvqsort_i32a.cc.oplus 279990+1224+24 forvqsort_u32a.cc.oplus 244433+1224+24 forvqsort_f32a.cc.oplus some small support routines - vxsort: 53557+0+0 for
all32.oplus 4096+0+0 foravx2_masks.cpp.o - x86simdsort: 132+0+0 for
all32.oplus 3951712+8+0 forx86simdsort-avx2.cpp.oplus some small support routines
The uint64 and float64 functions in djbsort are similarly small wrappers around int64,
which is separate code from int32.
Version: This is version 2026.01.27 of the "Comparison" web page.