-rw-r--r-- 7832 djbsort-20260127/doc/refs.md rawThere is a vast literature on sorting,
so this page cannot hope to be comprehensive.
Reasons to include something here:

* This is, or could be, something to credit for ideas used in djbsort.

* The results are, or have been claimed to be,
  competitive with djbsort in [speed](speed.html).
  Speed results are not excluded on the basis of
  [security](security.html) or
  [verification](verif.html).
  Results are also not excluded on the basis of array-size limitations
  (only multiples of 64, only powers of 2, etc.) or array-alignment requirements.

## Available software

<a name="2001-herf"></a>
2001 Herf
["radix"](http://stereopsis.com/radix.html)

<a name="2016-santurkar"></a>
2016 Santurkar
["avx2-merge-sort"](https://github.com/sid1607/avx2-merge-sort)

<a name="2017-gueron"></a>
2017 Gueron–Krasnov
["avx_qsort"](https://github.com/vkrasnov/avx_qsort)

<a name="2017-bernstein"></a>
2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal
["ntruprime-20170815.tar.gz"](https://ntruprime.cr.yp.to/software.html):
constant time;
direct ancestor of djbsort;
see also
[paper](https://ntruprime.cr.yp.to/papers.html)
"NTRU Prime: reducing attack surface at low cost"

2017 Bramas
["avx-512-sort"](https://gitlab.inria.fr/bramas/avx-512-sort):
only for AVX-512;
see also
[paper](https://thesai.org/Downloads/Volume8No10/Paper_44-A_Novel_Hybrid_Quicksort_Algorithm_Vectorized.pdf)
"A novel hybrid quicksort algorithm vectorized using AVX-512 on Intel Skylake"

<a name="2018-hou"></a>
2018 Hou–Wang–Feng
["aspas"](https://github.com/vtsynergy/aspas_sort):
see also paper
"A framework for the automatic vectorization of parallel sort on x86-based processors"

<a name="2020-blacher"></a>
2020 Blacher–Giesen–Kühne
["fast-and-robust"](https://github.com/simd-sorting/fast-and-robust):
see also
[paper](https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2021.3)
"Fast and robust vectorized in-place sorting of primitive types"

<a name="2020-shechter"></a>
2020 Shechter
["vxsort"](https://github.com/damageboy/vxsort-cpp):
see also
[blog post](https://bits.houmus.org/2020-01-28/this-goes-to-eleven-pt1)

2020 Bingmann–Marianczuk–Sanders
["SmallSorters"](https://github.com/JMarianczuk/SmallSorters):
only for small sizes;
see also
[paper](https://arxiv.org/abs/2002.05599)
"Engineering faster sorters for small sets of items"

<a name="2022-blacher"></a>
2022 Blacher–Giesen–Sanders–Wassenberg
["vqsort"/"highway"](https://github.com/google/highway):
see also
[paper](https://arxiv.org/abs/2205.05982)
"Vectorized and performance-portable Quicksort"

<a name="2022-intel"></a>
2022 Intel
["x86-simd-sort"](https://github.com/intel/x86-simd-sort.git)
(development taken over by numpy in 2025)

## Harder-to-verify performance claims

[1987 Rönsch–Strauss](https://www.sciencedirect.com/science/article/pii/0167819187900627)
"Timing results of some internal sorting algorithms on vector computers"

[2007 Furtak–Amaral–Niewiadomski](https://webdocs.cs.ualberta.ca/~amaral/papers/furtak-spaa07.pdf)
"Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms"

[2008 Chhugani–Macy–Baransi–Nguyen–Hagog–Kumar–Lee–Chen–Dubey](https://web.archive.org/web/20120508095814/http://www.vldb.org/pvldb/1/1454171.pdf)
"Efficient implementation of sorting on multi-core SIMD CPU architecture"

[2013 Xiaochen–Rocki–Suda](https://web.archive.org/web/20170809023503/http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/xiaochen_rocki_IA3_SC13.pdf)
"Register level sort algorithm on multi-core SIMD processors"

[2015 Hayes–Palomar–Unsal–Cristal–Valero](https://www.semanticscholar.org/paper/VSR-sort%3A-A-novel-vectorised-sorting-algorithm-%26-Hayes-Palomar/dbbd5cccbbae8bb7ebed4854371901e763d2c08c)
"VSR sort: A novel vectorised sorting algorithm &amp; architecture extensions for future microprocessors"

[2015 Hou–Wang–Feng](https://synergy.cs.vt.edu/pubs/papers/hou-aspas-ics15.pdf)
"ASPaS: a framework for automatic SIMDization of parallel sorting on x86-based many-core processors":
reports, e.g.,
about 220 million cycles (55 cycles/byte) on 1.05GHz Xeon Phi 5110P (Knights Corner) for n=1000000

[2016 Xu–Xu–Yin–Zheng](https://web.archive.org/web/20210418053109/https://journal.uad.ac.id/index.php/TELKOMNIKA/article/view/2741)
"Optimized merge sort on modern commodity multi-core CPUs",
Telkomnika **14** (2016), 209–318:
reports, e.g.,
about 110 million cycles (27 cycles/byte) on 2.4GHz Core i7 4700MQ (Haswell) for n=1048576

[2016 Codish–Cruz-Filipe–Nebel–Schneider-Kamp](https://web.archive.org/web/20240815205312/https://www.imada.sdu.dk/u/lcf/pubs/paper36.pdf)
"Optimizing sorting algorithms by using sorting networks",
Formal Aspects of Computing **29** (2017), 559–579:
reports, e.g.,
about 150 cycles (2.3 cycles/byte) on unspecified "Core i7" for n=16;
about 120000 cycles (3 cycles/byte) for n=10000

[2016 Gueron–Krasnov](https://web.archive.org/web/20260123055800/https://valeriyvan.com/assets/docs/Fast%20Quicksort%20Implementation%20Using%20AVX%20Instructions.pdf)
"Fast quicksort implementation using AVX instructions",
The Computer Journal **59** (2016), 83–90:
reports, e.g.,
about 160000 cycles (40 cycles/byte) for n=1000 `int32` on Haswell (Figure 4),
and
about 130000 cycles (32 cycles/byte) for n=1000 `int32` on Haswell
using radix sort in Intel's Integrated Performance Primitives library,
"the fastest in-memory sort we are aware of"

[2019 Yin–Zhang–Müller–Liu–Wei–Schmidt–Liu](https://ieeexplore.ieee.org/abstract/document/8855628)
"Efficient parallel sort on AVX-512-based multi-core and many-core Architectures"

## Sorting networks

[1968 Batcher](https://www.cs.kent.edu/~batcher/sort.pdf)
"Sorting networks and their applications"

[1983 Kumar–Hirschberg](https://www.ics.uci.edu/~dan/pubs/BatcherMerge.pdf)
"An efficient implementation of Batcher's odd-even merge algorithm
and its application in parallel sorting schemes"

[1986 Marberg–Gafni](https://web.archive.org/web/20250221203539/https://www.cal.cs.ucla.edu/tech-report/198_-reports/860043.pdf)
"Sorting in constant number of row and column phases on a mesh"

<a name="1987-chung"></a>
[1987 Chung–Ravikumar](https://www.sciencedirect.com/science/article/pii/0012365X9090173F)
"Bounds on the size of test sets for sorting and related networks",
Discrete Mathematics (1990)

[1994 Lee–Batcher](https://link.springer.com/chapter/10.1007/3-540-58325-4_233)
"A multiway merging network"

<a name="1998-lang"></a>
[1998 Lang](https://hwlang.de/algorithmen/sortieren/bitonic/oddn.htm)
"Bitonic sorting network for n not a power of 2"

[2006 Chaudhry–Cormen](https://www.cs.dartmouth.edu/~thc/papers/slabpose.pdf)
"Slabpose columnsort: a new oblivious algorithm for out-of-core sorting on distributed-memory clusters"

<a name="2007-even"></a>
[2007 Even–Levi–Litman](https://web.archive.org/web/20250103112116/https://www.eng.tau.ac.il/~guy/Papers/conclusive.pdf)
"Optimal conclusive sets for comparator networks",
Theoretical Computer Science (2009)

[2015 Ebbesen](https://www.cs.au.dk/~gerth/advising/thesis/kris-vestergaard-ebbesen.pdf)
"On the practicality of data-oblivious sorting"

## Symbolic execution

<a name="2007-nethercote"></a>
[2007 Nethercote–Seward](https://www.valgrind.org/docs/valgrind2007.pdf)
"Valgrind: a framework for heavyweight dynamic binary instrumentation",
ACM SIGPLAN Conference on Programming Language Design and Implementation

<a name="2016-shoshitaishvili"></a>
[2016 Shoshitaishvili–Wang–Salls–Stephens–Polino–Dutcher–Grosen–Feng–Hauser–Kruegel–Vigna](https://www.cs.ucsb.edu/~vigna/publications/2016_SP_angrSoK.pdf)
"SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis",
IEEE Symposium on Security and Privacy