-rw-r--r-- 7832 djbsort-20260127/doc/refs.md raw
There is a vast literature on sorting, so this page cannot hope to be comprehensive. Reasons to include something here: * This is, or could be, something to credit for ideas used in djbsort. * The results are, or have been claimed to be, competitive with djbsort in [speed](speed.html). Speed results are not excluded on the basis of [security](security.html) or [verification](verif.html). Results are also not excluded on the basis of array-size limitations (only multiples of 64, only powers of 2, etc.) or array-alignment requirements. ## Available software <a name="2001-herf"></a> 2001 Herf ["radix"](http://stereopsis.com/radix.html) <a name="2016-santurkar"></a> 2016 Santurkar ["avx2-merge-sort"](https://github.com/sid1607/avx2-merge-sort) <a name="2017-gueron"></a> 2017 Gueron–Krasnov ["avx_qsort"](https://github.com/vkrasnov/avx_qsort) <a name="2017-bernstein"></a> 2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal ["ntruprime-20170815.tar.gz"](https://ntruprime.cr.yp.to/software.html): constant time; direct ancestor of djbsort; see also [paper](https://ntruprime.cr.yp.to/papers.html) "NTRU Prime: reducing attack surface at low cost" 2017 Bramas ["avx-512-sort"](https://gitlab.inria.fr/bramas/avx-512-sort): only for AVX-512; see also [paper](https://thesai.org/Downloads/Volume8No10/Paper_44-A_Novel_Hybrid_Quicksort_Algorithm_Vectorized.pdf) "A novel hybrid quicksort algorithm vectorized using AVX-512 on Intel Skylake" <a name="2018-hou"></a> 2018 Hou–Wang–Feng ["aspas"](https://github.com/vtsynergy/aspas_sort): see also paper "A framework for the automatic vectorization of parallel sort on x86-based processors" <a name="2020-blacher"></a> 2020 Blacher–Giesen–Kühne ["fast-and-robust"](https://github.com/simd-sorting/fast-and-robust): see also [paper](https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2021.3) "Fast and robust vectorized in-place sorting of primitive types" <a name="2020-shechter"></a> 2020 Shechter ["vxsort"](https://github.com/damageboy/vxsort-cpp): see also [blog post](https://bits.houmus.org/2020-01-28/this-goes-to-eleven-pt1) 2020 Bingmann–Marianczuk–Sanders ["SmallSorters"](https://github.com/JMarianczuk/SmallSorters): only for small sizes; see also [paper](https://arxiv.org/abs/2002.05599) "Engineering faster sorters for small sets of items" <a name="2022-blacher"></a> 2022 Blacher–Giesen–Sanders–Wassenberg ["vqsort"/"highway"](https://github.com/google/highway): see also [paper](https://arxiv.org/abs/2205.05982) "Vectorized and performance-portable Quicksort" <a name="2022-intel"></a> 2022 Intel ["x86-simd-sort"](https://github.com/intel/x86-simd-sort.git) (development taken over by numpy in 2025) ## Harder-to-verify performance claims [1987 Rönsch–Strauss](https://www.sciencedirect.com/science/article/pii/0167819187900627) "Timing results of some internal sorting algorithms on vector computers" [2007 Furtak–Amaral–Niewiadomski](https://webdocs.cs.ualberta.ca/~amaral/papers/furtak-spaa07.pdf) "Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms" [2008 Chhugani–Macy–Baransi–Nguyen–Hagog–Kumar–Lee–Chen–Dubey](https://web.archive.org/web/20120508095814/http://www.vldb.org/pvldb/1/1454171.pdf) "Efficient implementation of sorting on multi-core SIMD CPU architecture" [2013 Xiaochen–Rocki–Suda](https://web.archive.org/web/20170809023503/http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/xiaochen_rocki_IA3_SC13.pdf) "Register level sort algorithm on multi-core SIMD processors" [2015 Hayes–Palomar–Unsal–Cristal–Valero](https://www.semanticscholar.org/paper/VSR-sort%3A-A-novel-vectorised-sorting-algorithm-%26-Hayes-Palomar/dbbd5cccbbae8bb7ebed4854371901e763d2c08c) "VSR sort: A novel vectorised sorting algorithm &amp; architecture extensions for future microprocessors" [2015 Hou–Wang–Feng](https://synergy.cs.vt.edu/pubs/papers/hou-aspas-ics15.pdf) "ASPaS: a framework for automatic SIMDization of parallel sorting on x86-based many-core processors": reports, e.g., about 220 million cycles (55 cycles/byte) on 1.05GHz Xeon Phi 5110P (Knights Corner) for n=1000000 [2016 Xu–Xu–Yin–Zheng](https://web.archive.org/web/20210418053109/https://journal.uad.ac.id/index.php/TELKOMNIKA/article/view/2741) "Optimized merge sort on modern commodity multi-core CPUs", Telkomnika **14** (2016), 209–318: reports, e.g., about 110 million cycles (27 cycles/byte) on 2.4GHz Core i7 4700MQ (Haswell) for n=1048576 [2016 Codish–Cruz-Filipe–Nebel–Schneider-Kamp](https://web.archive.org/web/20240815205312/https://www.imada.sdu.dk/u/lcf/pubs/paper36.pdf) "Optimizing sorting algorithms by using sorting networks", Formal Aspects of Computing **29** (2017), 559–579: reports, e.g., about 150 cycles (2.3 cycles/byte) on unspecified "Core i7" for n=16; about 120000 cycles (3 cycles/byte) for n=10000 [2016 Gueron–Krasnov](https://web.archive.org/web/20260123055800/https://valeriyvan.com/assets/docs/Fast%20Quicksort%20Implementation%20Using%20AVX%20Instructions.pdf) "Fast quicksort implementation using AVX instructions", The Computer Journal **59** (2016), 83–90: reports, e.g., about 160000 cycles (40 cycles/byte) for n=1000 `int32` on Haswell (Figure 4), and about 130000 cycles (32 cycles/byte) for n=1000 `int32` on Haswell using radix sort in Intel's Integrated Performance Primitives library, "the fastest in-memory sort we are aware of" [2019 Yin–Zhang–Müller–Liu–Wei–Schmidt–Liu](https://ieeexplore.ieee.org/abstract/document/8855628) "Efficient parallel sort on AVX-512-based multi-core and many-core Architectures" ## Sorting networks [1968 Batcher](https://www.cs.kent.edu/~batcher/sort.pdf) "Sorting networks and their applications" [1983 Kumar–Hirschberg](https://www.ics.uci.edu/~dan/pubs/BatcherMerge.pdf) "An efficient implementation of Batcher's odd-even merge algorithm and its application in parallel sorting schemes" [1986 Marberg–Gafni](https://web.archive.org/web/20250221203539/https://www.cal.cs.ucla.edu/tech-report/198_-reports/860043.pdf) "Sorting in constant number of row and column phases on a mesh" <a name="1987-chung"></a> [1987 Chung–Ravikumar](https://www.sciencedirect.com/science/article/pii/0012365X9090173F) "Bounds on the size of test sets for sorting and related networks", Discrete Mathematics (1990) [1994 Lee–Batcher](https://link.springer.com/chapter/10.1007/3-540-58325-4_233) "A multiway merging network" <a name="1998-lang"></a> [1998 Lang](https://hwlang.de/algorithmen/sortieren/bitonic/oddn.htm) "Bitonic sorting network for n not a power of 2" [2006 Chaudhry–Cormen](https://www.cs.dartmouth.edu/~thc/papers/slabpose.pdf) "Slabpose columnsort: a new oblivious algorithm for out-of-core sorting on distributed-memory clusters" <a name="2007-even"></a> [2007 Even–Levi–Litman](https://web.archive.org/web/20250103112116/https://www.eng.tau.ac.il/~guy/Papers/conclusive.pdf) "Optimal conclusive sets for comparator networks", Theoretical Computer Science (2009) [2015 Ebbesen](https://www.cs.au.dk/~gerth/advising/thesis/kris-vestergaard-ebbesen.pdf) "On the practicality of data-oblivious sorting" ## Symbolic execution <a name="2007-nethercote"></a> [2007 Nethercote–Seward](https://www.valgrind.org/docs/valgrind2007.pdf) "Valgrind: a framework for heavyweight dynamic binary instrumentation", ACM SIGPLAN Conference on Programming Language Design and Implementation <a name="2016-shoshitaishvili"></a> [2016 Shoshitaishvili–Wang–Salls–Stephens–Polino–Dutcher–Grosen–Feng–Hauser–Kruegel–Vigna](https://www.cs.ucsb.edu/~vigna/publications/2016_SP_angrSoK.pdf) "SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis", IEEE Symposium on Security and Privacy