-r--r--r-- 12179 djbsort-20260127/doc/html/comparison.html raw<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<style type="text/css">
html{overflow-y:scroll;background-color:#012345}
body{font-family:"Noto Sans","Droid Sans","DejaVu Sans","Arial",sans-serif;line-height:1.5}
tt,code{background-color:#f0f0f0;font-family:"Noto Sans Mono","Droid Sans Mono","DejaVu Sans Mono","Courier New",monospace,sans-serif;font-size:1em;}
pre{margin-left:3em}
p,ul,ol,blockquote,pre{font-size:1.0em;line-height:1.6}
li p{font-size:1.0em}
blockquote p{font-size:1.0em}
h1{font-size:1.5em}
h2{font-size:1.3em}
h3{font-size:1.0em}
h1 a{text-decoration:none}
table{border-collapse:collapse}
th,td{border:1px solid black}
table a{text-decoration:none}
table tr{font-size:1.0em;line-height:1.6em}
table tr{font-size:1.0em;line-height:1.5}
tbody tr:nth-child(2n+1){background-color:#f0ffff}
tbody tr:nth-child(2n+2){background-color:#fffff0}

#headline{display:block;margin:0;padding:0;color:#ffffff;background-color:#012345}
#headline .text{font-weight:bold;font-size:1.0em}
#headline input{display:none}
#nav ul{margin:0;padding:0}
#nav li{list-style-type:none;margin:0;padding:0}
.navtop{padding-bottom:0.5em;font-weight:bold;font-size:1.0em}
.navtop{background-color:#012345;color:#ffffff}
#nav .here{background-color:#012345;color:#ffffff}
#nav .away{background-color:#012345;color:#ffffff}
#nav .away a{text-decoration:none;display:block;color:#ffffff}
#nav .away a:hover,.away a:active{text-decoration:underline}
#hidemenu{visibility:hidden;display:none;overflow:hidden;position:fixed;top:0;left:0;height:100%;width:100%}

.main{padding:5px}
.main{background-color:#ffffff}
.pagetitle{font-size:1.4em;font-weight:bold}
@media only screen and (min-width:512px) {
  .navtop{padding-top:5px}
  #headline{top:0;margin:0;width:160px;height:100%;position:fixed;overflow:auto}
  #headline .noselect{display:none}
  #headline #nav{visibility:visible;display:block;width:auto;height:auto}
  .main{margin-left:170px}
  #headline #hidemenu{visibility:hidden}
}
@media not screen and (min-width:512px) {
  #headline .noselect{-webkit-user-select:none;-ms-user-select:none;user-select:none;}
  #headline #nav #navbot{visibility:hidden;position:fixed;top:0;left:-70%;z-index:2;transition:0.2s;margin:0;padding:0}
  #headline input:checked ~ #nav #navbot{height:100%;position:fixed;top:0;left:0;visibility:visible;display:block;box-sizing:border-box;-moz-box-sizing:border-box;-webkit-box-sizing:border-box;vertical-align:center;font-size:1.0em;width:70%;overflow:auto}
  #headline input:checked ~ #hidemenu{visibility:visible;display:block;background:black;opacity:0.3;z-index:1}
}
</style>
<title>
djbsort: Comparison</title>
</head>
<body>
<label id=headline>
<input type=checkbox />
<nav id=nav>
<div class=navtop>
<span class=noselect>≡</span>
djbsort</div>
<ul id=navbot>
<li class=away><a href=index.html>Intro</a>
</li><li class=away><a href=download.html>Download</a>
</li><li class=away><a href=install.html>Install</a>
</li><li class=away><a href=test.html>Test</a>
</li><li class=away><a href=api.html>API</a>
</li><li class=away><a href=speed.html>Speed</a>
</li><li class=away><a href=security.html>Security</a>
</li><li class=away><a href=verif.html>Verification</a>
</li><li class=away><a href=internals.html>Internals</a>
</li><li class=away><a href=license.html>License</a>
</li><li class=away><a href=refs.html>References</a>
</li><li class=here>Comparison
</li></ul></nav>
<div id=hidemenu></div>
</label>
<div class=main>
<div class=pagetitle>djbsort: Comparison</div>
<p>This page compares djbsort to some other numeric-array sorting libraries.
The libraries selected here for comparison are the same as djbsort's main competitors on the <a href="speed.html">speed page</a>:
"the <a href="https://github.com/damageboy/vxsort-cpp">vxsort</a> library
<a href="https://github.com/dotnet/runtime/pull/37159">used in Microsoft's .NET</a>
starting in 2020;
the <a href="https://github.com/simd-sorting/fast-and-robust">vqsort/highway</a> library
<a href="https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html">introduced by Google</a>;
the <a href="https://github.com/simd-sorting/fast-and-robust">far</a> library, which is the (sometimes faster) parent of vqsort;
and the <a href="https://github.com/intel/x86-simd-sort.git">x86simdsort</a> library
<a href="https://www.phoronix.com/news/Intel-AVX-512-Quicksort-Numpy">introduced by Intel</a>".</p>
<p>One can do better on some of these comparison axes when speed is not an issue:
for example, one can simultaneously (1) use less code and (2) sort a wider range of data types.</p>
<h2>Interface</h2>
<p>Data types:</p>
<ul>
<li>djbsort: int32, uint32, float32, int64, uint64, float64</li>
<li>far: int32</li>
<li>vqsort: int16, uint16, float16, int32, uint32, float32, int64, uint64, float64, uint128</li>
<li>vxsort: int16, uint16, int32, uint32, float32, int64, uint64, float64</li>
<li>x86simdsort: int16, uint16, float16, int32, uint32, float32, int64, uint64, float64</li>
</ul>
<p>Array lengths allowed by API on 64-bit CPUs:</p>
<ul>
<li>djbsort: up to 2<sup>63</sup> (<code>void djbsort_int32(int32_t *,long long)</code>)</li>
<li>far: up to 2<sup>31</sup> (<code>void quicksort(int *arr, int n)</code>)</li>
<li>vqsort: up to 2<sup>64</sup> (<code>void VQSort(int32_t* HWY_RESTRICT keys, size_t n, SortAscending)</code>)</li>
<li>vxsort: up to 2<sup>64</sup> (<code>void sort(T* left, T* right, T left_hint = std::numeric_limits&lt;T&gt;::min(), T right_hint = std::numeric_limits&lt;T&gt;::max())</code>)</li>
<li>x86simdsort: up to 2<sup>64</sup> (<code>void qsort(T *arr, size_t arrsize, bool hasnan = false, bool descending = false)</code>)</li>
</ul>
<p>API is CPU-independent, with implementation automatically selected based on run-time CPU:</p>
<ul>
<li>djbsort: yes (chooses between AVX2 and portable fallback)</li>
<li>far: no (provides AVX2 code; caller handles fallback)</li>
<li>vqsort: yes (chooses between AVX-512, AVX2, ARM NEON, more options, and portable fallback)</li>
<li>vxsort: no (provides AVX-512 code and AVX2 code; caller handles selection and fallback)</li>
<li>x86simdsort: yes (chooses between AVX-512, AVX2, portable fallback)</li>
</ul>
<h2>Correctness and security</h2>
<p><a href="security.html#memory">Memory safety</a> and protection against <a href="security.html#bugs">other bugs</a>:</p>
<ul>
<li>djbsort: <a href="test.html">conventional tests</a>; <a href="test.html#dataflow">tests under valgrind</a>; <a href="verif.html">verification</a></li>
<li>far: conventional tests</li>
<li>vqsort: conventional tests</li>
<li>vxsort: conventional tests</li>
<li>x86simdsort: conventional tests</li>
</ul>
<p>Protection against <a href="security.html#timing">timing leaks</a>:</p>
<ul>
<li>djbsort: yes (constant cycle counts when array size is constant)</li>
<li>far: no</li>
<li>vqsort: no</li>
<li>vxsort: no</li>
<li>x86simdsort: no</li>
</ul>
<p>Protection against <a href="security.html#dos">denial of service</a>:</p>
<ul>
<li>djbsort: clear (changing array contents does not increase cycle counts)</li>
<li>far: unclear (chooses pivot from 72 pseudorandom elements but deterministically; recognizes constant subarrays; falls back to radix exchange)</li>
<li>vqsort: unclear (chooses pivot from 576 bytes using per-thread call to OS RNG; falls back to heapsort)</li>
<li>vxsort: unclear (chooses deterministic median-of-three pivot; falls back to heapsort)</li>
<li>x86simdsort: unclear (similar to the far library; falls back to <code>std::sort</code>)</li>
</ul>
<p>More research is required to quantify the cost of the far, vqsort, vxsort, and x86simdsort libraries
when array contents are chosen adversarially.
Note that a library that allows a larger relative slowdown
may still provide better protection against denial of service
if it is faster to begin with.</p>
<h2>Cycle counts for random data</h2>
<p>CPU cycles/byte to sort <code>int32[16]</code> on AMD Zen 3
(the three numbers are stabilized quartiles of many measurements <em>excluding</em> 31 cycles of cycle-counting overhead):</p>
<ul>
<li>djbsort: 0.33, 0.34, 0.35</li>
<li>far: 1.14, 1.14, 1.14</li>
<li>vqsort: 0.92, 0.92, 0.92</li>
<li>vxsort: 1.85, 1.93, 2.09</li>
<li>x86simdsort: 0.91, 1.00, 1.01</li>
</ul>
<p>CPU cycles/byte to sort <code>int32[1024]</code> on AMD Zen 3
(the three numbers are stabilized quartiles of many measurements):</p>
<ul>
<li>djbsort: 1.14, 1.15, 1.16</li>
<li>far: 1.85, 1.90, 1.95</li>
<li>vqsort: 2.34, 2.46, 2.56</li>
<li>vxsort: 2.61, 2.63, 2.64</li>
<li>x86simdsort: 2.61, 2.78, 2.96</li>
</ul>
<p>CPU cycles/byte to sort <code>int32[65536]</code> on AMD Zen 3
(the three numbers are stabilized quartiles of many measurements):</p>
<ul>
<li>djbsort: 3.27, 3.28, 3.31</li>
<li>far: 3.28, 3.30, 3.31</li>
<li>vqsort: 3.51, 3.55, 3.60</li>
<li>vxsort: 3.67, 3.71, 3.76</li>
<li>x86simdsort: 4.15, 4.18, 4.21</li>
</ul>
<p>See the separate <a href="speed.html">speed page</a>
for a more comprehensive comparison of cycle counts.</p>
<h2>Code size</h2>
<p>Compiled AVX2 code size for a simple <code>int32.cc</code> that defines <code>void sortint32(int32_t *y,int n)</code> by calling the library
(text+data+bss for <code>int32.o</code>, plus text+data+bss for the relevant <code>*.o</code> files in the library):</p>
<ul>
<li>djbsort: 56+0+0 for <code>int32.o</code>
  plus 25050+0+0 for <code>int32_avx2_C0-sort.o</code>
  plus 1772+0+0 for portable fallback and dispatch (<code>dispatch_auto-sort_int32.o</code>, <code>int32_portable4_C3-sort.o</code>, <code>compilers*.o</code>)</li>
<li>far: 24387+0+0 for <code>int32.o</code>
  plus 0+0+0 (far is header-only)</li>
<li>vqsort: 277+8+0 for <code>int32.o</code>
  plus 266420+1224+24 for <code>vqsort_i32a.cc.o</code> plus some small support routines for dispatch etc.</li>
<li>vxsort: 23483+0+0 for <code>int32.o</code>
  plus 4096+0+0 for <code>avx2_masks.cpp.o</code></li>
<li>x86simdsort: 60+0+0 for <code>int32.o</code>
  plus 3951712+8+0 for <code>x86simdsort-avx2.cpp.o</code> (all data types in the library are in a single <code>.o</code> file) plus some small support routines</li>
</ul>
<p>An application with many calls to a big-header library such as far or vxsort
will incur many copies of the first number,
but can avoid this by instead calling an intermediate function such as <code>sortint32</code>,
so the overall code size includes only one copy of that number.</p>
<p>Compiled AVX2 code size for a simple <code>all32.cc</code> that defines <code>sortint32</code>, <code>sortuint32</code>, and <code>sortfloat32</code>
(text+data+bss for <code>all32.o</code>, plus text+data+bss for the relevant <code>*.o</code> files in the library):</p>
<ul>
<li>djbsort: 128+0+0 for <code>all32.o</code>
  plus 25050+0+0 for <code>int32_avx2_C0-sort.o</code>
  plus 445+0+0 for <code>uint32_avx2useint32_C0-sort.o</code>
  plus 493+0+0 for <code>float32_avx2useint32_C0-sort.o</code>
  plus 3960+0+0 for portable fallback and dispatch (<code>dispatch_auto-sort_int32.o</code>, <code>dispatch_auto-sort_uint32.o</code>, <code>dispatch_auto-sort_float32.o</code>, <code>int32_portable4_C3-sort.o</code>, <code>uint32_useint32_C3-sort.o</code>, <code>float32_useint32_C3-sort.o</code>, <code>compilers*.o</code>)</li>
<li>far: not applicable</li>
<li>vqsort: 677+8+0 for <code>all32.o</code> plus 266420+1224+24 for <code>vqsort_i32a.cc.o</code> plus 279990+1224+24 for <code>vqsort_u32a.cc.o</code> plus 244433+1224+24 for <code>vqsort_f32a.cc.o</code> plus some small support routines</li>
<li>vxsort: 53557+0+0 for <code>all32.o</code> plus 4096+0+0 for <code>avx2_masks.cpp.o</code></li>
<li>x86simdsort: 132+0+0 for <code>all32.o</code> plus 3951712+8+0 for <code>x86simdsort-avx2.cpp.o</code> plus some small support routines</li>
</ul>
<p>The <code>uint64</code> and <code>float64</code> functions in djbsort are similarly small wrappers around <code>int64</code>,
which is separate code from <code>int32</code>.</p><hr><font size=1><b>Version:</b>
This is version 2026.01.27 of the "Comparison" web page.
</font>
</div>
</body>
</html>