-r--r--r-- 12179 djbsort-20260127/doc/html/comparison.html raw
<html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <style type="text/css"> html{overflow-y:scroll;background-color:#012345} body{font-family:"Noto Sans","Droid Sans","DejaVu Sans","Arial",sans-serif;line-height:1.5} tt,code{background-color:#f0f0f0;font-family:"Noto Sans Mono","Droid Sans Mono","DejaVu Sans Mono","Courier New",monospace,sans-serif;font-size:1em;} pre{margin-left:3em} p,ul,ol,blockquote,pre{font-size:1.0em;line-height:1.6} li p{font-size:1.0em} blockquote p{font-size:1.0em} h1{font-size:1.5em} h2{font-size:1.3em} h3{font-size:1.0em} h1 a{text-decoration:none} table{border-collapse:collapse} th,td{border:1px solid black} table a{text-decoration:none} table tr{font-size:1.0em;line-height:1.6em} table tr{font-size:1.0em;line-height:1.5} tbody tr:nth-child(2n+1){background-color:#f0ffff} tbody tr:nth-child(2n+2){background-color:#fffff0} #headline{display:block;margin:0;padding:0;color:#ffffff;background-color:#012345} #headline .text{font-weight:bold;font-size:1.0em} #headline input{display:none} #nav ul{margin:0;padding:0} #nav li{list-style-type:none;margin:0;padding:0} .navtop{padding-bottom:0.5em;font-weight:bold;font-size:1.0em} .navtop{background-color:#012345;color:#ffffff} #nav .here{background-color:#012345;color:#ffffff} #nav .away{background-color:#012345;color:#ffffff} #nav .away a{text-decoration:none;display:block;color:#ffffff} #nav .away a:hover,.away a:active{text-decoration:underline} #hidemenu{visibility:hidden;display:none;overflow:hidden;position:fixed;top:0;left:0;height:100%;width:100%} .main{padding:5px} .main{background-color:#ffffff} .pagetitle{font-size:1.4em;font-weight:bold} @media only screen and (min-width:512px) { .navtop{padding-top:5px} #headline{top:0;margin:0;width:160px;height:100%;position:fixed;overflow:auto} #headline .noselect{display:none} #headline #nav{visibility:visible;display:block;width:auto;height:auto} .main{margin-left:170px} #headline #hidemenu{visibility:hidden} } @media not screen and (min-width:512px) { #headline .noselect{-webkit-user-select:none;-ms-user-select:none;user-select:none;} #headline #nav #navbot{visibility:hidden;position:fixed;top:0;left:-70%;z-index:2;transition:0.2s;margin:0;padding:0} #headline input:checked ~ #nav #navbot{height:100%;position:fixed;top:0;left:0;visibility:visible;display:block;box-sizing:border-box;-moz-box-sizing:border-box;-webkit-box-sizing:border-box;vertical-align:center;font-size:1.0em;width:70%;overflow:auto} #headline input:checked ~ #hidemenu{visibility:visible;display:block;background:black;opacity:0.3;z-index:1} } </style> <title> djbsort: Comparison</title> </head> <body> <label id=headline> <input type=checkbox /> <nav id=nav> <div class=navtop> <span class=noselect>≡</span> djbsort</div> <ul id=navbot> <li class=away><a href=index.html>Intro</a> </li><li class=away><a href=download.html>Download</a> </li><li class=away><a href=install.html>Install</a> </li><li class=away><a href=test.html>Test</a> </li><li class=away><a href=api.html>API</a> </li><li class=away><a href=speed.html>Speed</a> </li><li class=away><a href=security.html>Security</a> </li><li class=away><a href=verif.html>Verification</a> </li><li class=away><a href=internals.html>Internals</a> </li><li class=away><a href=license.html>License</a> </li><li class=away><a href=refs.html>References</a> </li><li class=here>Comparison </li></ul></nav> <div id=hidemenu></div> </label> <div class=main> <div class=pagetitle>djbsort: Comparison</div> <p>This page compares djbsort to some other numeric-array sorting libraries. The libraries selected here for comparison are the same as djbsort's main competitors on the <a href="speed.html">speed page</a>: "the <a href="https://github.com/damageboy/vxsort-cpp">vxsort</a> library <a href="https://github.com/dotnet/runtime/pull/37159">used in Microsoft's .NET</a> starting in 2020; the <a href="https://github.com/simd-sorting/fast-and-robust">vqsort/highway</a> library <a href="https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html">introduced by Google</a>; the <a href="https://github.com/simd-sorting/fast-and-robust">far</a> library, which is the (sometimes faster) parent of vqsort; and the <a href="https://github.com/intel/x86-simd-sort.git">x86simdsort</a> library <a href="https://www.phoronix.com/news/Intel-AVX-512-Quicksort-Numpy">introduced by Intel</a>".</p> <p>One can do better on some of these comparison axes when speed is not an issue: for example, one can simultaneously (1) use less code and (2) sort a wider range of data types.</p> <h2>Interface</h2> <p>Data types:</p> <ul> <li>djbsort: int32, uint32, float32, int64, uint64, float64</li> <li>far: int32</li> <li>vqsort: int16, uint16, float16, int32, uint32, float32, int64, uint64, float64, uint128</li> <li>vxsort: int16, uint16, int32, uint32, float32, int64, uint64, float64</li> <li>x86simdsort: int16, uint16, float16, int32, uint32, float32, int64, uint64, float64</li> </ul> <p>Array lengths allowed by API on 64-bit CPUs:</p> <ul> <li>djbsort: up to 2<sup>63</sup> (<code>void djbsort_int32(int32_t *,long long)</code>)</li> <li>far: up to 2<sup>31</sup> (<code>void quicksort(int *arr, int n)</code>)</li> <li>vqsort: up to 2<sup>64</sup> (<code>void VQSort(int32_t* HWY_RESTRICT keys, size_t n, SortAscending)</code>)</li> <li>vxsort: up to 2<sup>64</sup> (<code>void sort(T* left, T* right, T left_hint = std::numeric_limits&lt;T&gt;::min(), T right_hint = std::numeric_limits&lt;T&gt;::max())</code>)</li> <li>x86simdsort: up to 2<sup>64</sup> (<code>void qsort(T *arr, size_t arrsize, bool hasnan = false, bool descending = false)</code>)</li> </ul> <p>API is CPU-independent, with implementation automatically selected based on run-time CPU:</p> <ul> <li>djbsort: yes (chooses between AVX2 and portable fallback)</li> <li>far: no (provides AVX2 code; caller handles fallback)</li> <li>vqsort: yes (chooses between AVX-512, AVX2, ARM NEON, more options, and portable fallback)</li> <li>vxsort: no (provides AVX-512 code and AVX2 code; caller handles selection and fallback)</li> <li>x86simdsort: yes (chooses between AVX-512, AVX2, portable fallback)</li> </ul> <h2>Correctness and security</h2> <p><a href="security.html#memory">Memory safety</a> and protection against <a href="security.html#bugs">other bugs</a>:</p> <ul> <li>djbsort: <a href="test.html">conventional tests</a>; <a href="test.html#dataflow">tests under valgrind</a>; <a href="verif.html">verification</a></li> <li>far: conventional tests</li> <li>vqsort: conventional tests</li> <li>vxsort: conventional tests</li> <li>x86simdsort: conventional tests</li> </ul> <p>Protection against <a href="security.html#timing">timing leaks</a>:</p> <ul> <li>djbsort: yes (constant cycle counts when array size is constant)</li> <li>far: no</li> <li>vqsort: no</li> <li>vxsort: no</li> <li>x86simdsort: no</li> </ul> <p>Protection against <a href="security.html#dos">denial of service</a>:</p> <ul> <li>djbsort: clear (changing array contents does not increase cycle counts)</li> <li>far: unclear (chooses pivot from 72 pseudorandom elements but deterministically; recognizes constant subarrays; falls back to radix exchange)</li> <li>vqsort: unclear (chooses pivot from 576 bytes using per-thread call to OS RNG; falls back to heapsort)</li> <li>vxsort: unclear (chooses deterministic median-of-three pivot; falls back to heapsort)</li> <li>x86simdsort: unclear (similar to the far library; falls back to <code>std::sort</code>)</li> </ul> <p>More research is required to quantify the cost of the far, vqsort, vxsort, and x86simdsort libraries when array contents are chosen adversarially. Note that a library that allows a larger relative slowdown may still provide better protection against denial of service if it is faster to begin with.</p> <h2>Cycle counts for random data</h2> <p>CPU cycles/byte to sort <code>int32[16]</code> on AMD Zen 3 (the three numbers are stabilized quartiles of many measurements <em>excluding</em> 31 cycles of cycle-counting overhead):</p> <ul> <li>djbsort: 0.33, 0.34, 0.35</li> <li>far: 1.14, 1.14, 1.14</li> <li>vqsort: 0.92, 0.92, 0.92</li> <li>vxsort: 1.85, 1.93, 2.09</li> <li>x86simdsort: 0.91, 1.00, 1.01</li> </ul> <p>CPU cycles/byte to sort <code>int32[1024]</code> on AMD Zen 3 (the three numbers are stabilized quartiles of many measurements):</p> <ul> <li>djbsort: 1.14, 1.15, 1.16</li> <li>far: 1.85, 1.90, 1.95</li> <li>vqsort: 2.34, 2.46, 2.56</li> <li>vxsort: 2.61, 2.63, 2.64</li> <li>x86simdsort: 2.61, 2.78, 2.96</li> </ul> <p>CPU cycles/byte to sort <code>int32[65536]</code> on AMD Zen 3 (the three numbers are stabilized quartiles of many measurements):</p> <ul> <li>djbsort: 3.27, 3.28, 3.31</li> <li>far: 3.28, 3.30, 3.31</li> <li>vqsort: 3.51, 3.55, 3.60</li> <li>vxsort: 3.67, 3.71, 3.76</li> <li>x86simdsort: 4.15, 4.18, 4.21</li> </ul> <p>See the separate <a href="speed.html">speed page</a> for a more comprehensive comparison of cycle counts.</p> <h2>Code size</h2> <p>Compiled AVX2 code size for a simple <code>int32.cc</code> that defines <code>void sortint32(int32_t *y,int n)</code> by calling the library (text+data+bss for <code>int32.o</code>, plus text+data+bss for the relevant <code>*.o</code> files in the library):</p> <ul> <li>djbsort: 56+0+0 for <code>int32.o</code> plus 25050+0+0 for <code>int32_avx2_C0-sort.o</code> plus 1772+0+0 for portable fallback and dispatch (<code>dispatch_auto-sort_int32.o</code>, <code>int32_portable4_C3-sort.o</code>, <code>compilers*.o</code>)</li> <li>far: 24387+0+0 for <code>int32.o</code> plus 0+0+0 (far is header-only)</li> <li>vqsort: 277+8+0 for <code>int32.o</code> plus 266420+1224+24 for <code>vqsort_i32a.cc.o</code> plus some small support routines for dispatch etc.</li> <li>vxsort: 23483+0+0 for <code>int32.o</code> plus 4096+0+0 for <code>avx2_masks.cpp.o</code></li> <li>x86simdsort: 60+0+0 for <code>int32.o</code> plus 3951712+8+0 for <code>x86simdsort-avx2.cpp.o</code> (all data types in the library are in a single <code>.o</code> file) plus some small support routines</li> </ul> <p>An application with many calls to a big-header library such as far or vxsort will incur many copies of the first number, but can avoid this by instead calling an intermediate function such as <code>sortint32</code>, so the overall code size includes only one copy of that number.</p> <p>Compiled AVX2 code size for a simple <code>all32.cc</code> that defines <code>sortint32</code>, <code>sortuint32</code>, and <code>sortfloat32</code> (text+data+bss for <code>all32.o</code>, plus text+data+bss for the relevant <code>*.o</code> files in the library):</p> <ul> <li>djbsort: 128+0+0 for <code>all32.o</code> plus 25050+0+0 for <code>int32_avx2_C0-sort.o</code> plus 445+0+0 for <code>uint32_avx2useint32_C0-sort.o</code> plus 493+0+0 for <code>float32_avx2useint32_C0-sort.o</code> plus 3960+0+0 for portable fallback and dispatch (<code>dispatch_auto-sort_int32.o</code>, <code>dispatch_auto-sort_uint32.o</code>, <code>dispatch_auto-sort_float32.o</code>, <code>int32_portable4_C3-sort.o</code>, <code>uint32_useint32_C3-sort.o</code>, <code>float32_useint32_C3-sort.o</code>, <code>compilers*.o</code>)</li> <li>far: not applicable</li> <li>vqsort: 677+8+0 for <code>all32.o</code> plus 266420+1224+24 for <code>vqsort_i32a.cc.o</code> plus 279990+1224+24 for <code>vqsort_u32a.cc.o</code> plus 244433+1224+24 for <code>vqsort_f32a.cc.o</code> plus some small support routines</li> <li>vxsort: 53557+0+0 for <code>all32.o</code> plus 4096+0+0 for <code>avx2_masks.cpp.o</code></li> <li>x86simdsort: 132+0+0 for <code>all32.o</code> plus 3951712+8+0 for <code>x86simdsort-avx2.cpp.o</code> plus some small support routines</li> </ul> <p>The <code>uint64</code> and <code>float64</code> functions in djbsort are similarly small wrappers around <code>int64</code>, which is separate code from <code>int32</code>.</p><hr><font size=1><b>Version:</b> This is version 2026.01.27 of the "Comparison" web page. </font> </div> </body> </html>