149 points by matslina on June 11, 2014 | hide | past | favorite | 52 comments

 One semi-interesting thing here is that the "killer adversary" described on that page requires knowing that the algorithm you're trying to exploit is quicksort. Long ago, I wrote adversaries that attempt to force _any_ algorithm extracting a partial or total order from data toward its worst case. These adversaries, like McIlroy's, sit in the compare() function, but each asks itself: "what answer can I give, consistent with the data so far, to force the algorithm to ask me the largest number of subsequent questions?" For non-introspective quicksorts this will be O(n^2), but it should also bring out the worst constant factors in O(n log n) worst-case time algorithms. I did some experiments: http://nicknash.me/2012/07/31/adversaries/ A proper reference is Kahn's "Sorting and Entropy", IIRC.
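 For the curious, the gas/solid trick behind McIlroy's adversary is compact enough to sketch. This is an illustrative Python re-implementation (the class and function names are mine, not from the paper), driven against a textbook first-element-pivot quicksort:

```python
class Adversary:
    """Sketch of McIlroy's killer adversary: every element starts as
    unresolved 'gas'; a comparison between two gas elements freezes one
    to the next-smallest solid value. Answers are always consistent
    with some total order, yet steer a non-randomized quicksort toward
    its quadratic worst case."""

    def __init__(self, n):
        self.gas = n              # sentinel: larger than any frozen value
        self.val = [self.gas] * n
        self.nsolid = 0           # next value to freeze to
        self.candidate = 0        # gas element we suspect is the pivot
        self.ncmp = 0             # comparisons answered so far

    def compare(self, x, y):
        self.ncmp += 1
        if self.val[x] == self.gas and self.val[y] == self.gas:
            # Freeze the suspected pivot so almost everything compares above it.
            victim = x if x == self.candidate else y
            self.val[victim] = self.nsolid
            self.nsolid += 1
        if self.val[x] == self.gas:
            self.candidate = x
        elif self.val[y] == self.gas:
            self.candidate = y
        return self.val[x] - self.val[y]


def quicksort(items, cmp):
    # Textbook first-element-pivot quicksort driven by cmp().
    if len(items) <= 1:
        return list(items)
    pivot = items[0]
    less = [x for x in items[1:] if cmp(x, pivot) < 0]
    more = [x for x in items[1:] if cmp(x, pivot) >= 0]
    return quicksort(less, cmp) + [pivot] + quicksort(more, cmp)
```

 Against n elements, the quicksort above ends up performing on the order of n^2 comparisons, even though the adversary never contradicts itself.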
 Very interesting writeup. How did you determine the number of topological sortings for your "Yes" and "No" examples? I'm getting 360 and 144, rather than the 120 and 48 that you wrote.
 I'm surprised the article does not point out more directly that it's usually pretty simple to mitigate this attack vector by switching to mergesort. The worst case is O(n log n), so there's no real "killer input". I think it's a good rule of thumb to say "if you need to sort large quantities of untrusted data you should probably use mergesort".
 I've always liked mergesort for this reason. It's easier to understand and there's no wondering about complexity.
 Well, sometimes you have space constraints. The O(n) extra space required by mergesort isn't always available. But if you don't have space constraints, mergesort is a very good choice, because you can easily write a highly scalable parallel version of the algorithm.
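 The O(n) auxiliary space shows up in the merge step. A minimal top-down sketch (illustrative, not any particular libc's implementation):

```python
def mergesort(a):
    """Top-down mergesort: O(n log n) comparisons on every input,
    so there is no adversarial 'killer input'."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    # Merge the two sorted halves into an O(n) auxiliary list.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

 The two halves can also be sorted by separate workers, which is why the parallel version falls out so easily.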
 You can implement merge sort in place too: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.... Looks like the algorithm is more complicated, but certainly implementable. On most modern architectures, the key to performance is how one exploits caching behavior. Quicksort is very cache friendly, as is the common implementation of merge sort. Not so for heapsort.
 phkahler on June 11, 2014 "The O(n) extra-space required by mergesort isn't always available." Heapsort
 lsb on June 11, 2014 No, but mergesort can easily fall back to dumping its sorted contents to disk for later merging; this disk-based merge sort was invented by von Neumann in the 1940s, and is how the unix utility sort works (dozen-way merge of sorted shards of a file).
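 A toy sketch of that run-then-merge scheme (the function name and run size here are made up for illustration): sort bounded runs in memory, spill each run to a temp file, then k-way merge the runs, in the spirit of sort(1):

```python
import heapq
import itertools
import tempfile

def external_sort(in_path, out_path, lines_per_run=100_000):
    """Disk-based merge sort sketch: only lines_per_run lines are
    ever held in memory at once."""
    run_files = []
    with open(in_path) as f:
        while True:
            # Read and sort one bounded run in memory.
            run = list(itertools.islice(f, lines_per_run))
            if not run:
                break
            run.sort()
            tmp = tempfile.TemporaryFile(mode="w+")
            tmp.writelines(run)
            tmp.seek(0)
            run_files.append(tmp)
    with open(out_path, "w") as out:
        # heapq.merge lazily k-way merges the already-sorted runs.
        out.writelines(heapq.merge(*run_files))
    for tmp in run_files:
        tmp.close()
```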
 nly on June 11, 2014 Mergesort gets pretty damn efficient, even with moderate buffer space.
 Mergesort would be a poor choice for qsort() due to the linear space complexity. With N bytes of RAM, you'd only be able to sort (a bit less than) N/2 bytes of data. An in-place algorithm is preferable. Glibc is the only libc I'm aware of that implements mergesort. It still falls back to quicksort for large inputs, though.
 mzs on June 11, 2014 I just heard about smoothsort: http://code.google.com/p/combsortcs2p-and-other-sorting-algo... Also, nice little write-up, thanks. I also dig the extracted sort implementations you dug out: https://github.com/matslina/qsort
 tedunangst on June 11, 2014 All BSD-derived libcs contain mergesort(), and heapsort() as well.
 mzs on June 11, 2014 True, but parent meant used in the qsort and qsort_r implementations.
 If you are ever handling input data from basically anything outside of the program, it's better to use an algorithm with guaranteed worst-case behavior, like merge sort, precisely because of attacks like this. I'm really surprised algorithmic complexity attacks like this haven't come to the forefront recently. Application-level DDoSes leave the kernel/network/socket layers exposed for exploitation, since their tasks will take priority over users' applications.
 > Application level DDoS's leave the kernel/network/sockets layers exposed for exploitation. Since their tasks will take priority over user's applications.

 So what's the benefit of that, just that you can maximize the chaos you cause by attacking in two different ways?
 Use one attack to obfuscate the other. If your server is at 90% CPU use, and the vast majority of it is the server application, you won't consider anything strange to be happening. You can keep attacking the system with algorithmic exploits, or stop; nobody will suspect an attack, since "well, last time I checked, the box was at 90% CPU and it was the server. I'd check again, but maybe we just have a lot of traffic today," or something similar. Basically it's the old can't-see-the-forest-for-the-trees routine. It's more about gaming the operators than the system.
 Ah okay, I wondered if it was that simple.
 I think heapsort is a better replacement - in-place, guaranteed O(n log n) time (so also quite resistant to information leakage via timing disclosure, which could be important in higher security applications.)
 Just in case anyone is curious, this does not affect any implementation of std::sort in C++ I am aware of. They all use introsort -- basically, use quicksort until the recursion depth exceeds a threshold, then switch to heapsorting the offending partition. This ensures fast quicksort performance in the common case while guaranteeing O(n log n) worst case.
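 A rough sketch of the introsort idea (illustrative Python, not libstdc++'s actual code): quicksort with a depth budget of about 2*log2(n), heapsorting any partition that exhausts it:

```python
import heapq
import math

def introsort(a):
    """Sort list a in place: quicksort until the depth budget runs
    out, then heapsort the remaining partition (O(n log n) worst case)."""
    if a:
        _introsort(a, 0, len(a) - 1, 2 * max(1, int(math.log2(len(a)))))

def _introsort(a, lo, hi, depth):
    while hi - lo > 0:
        if depth == 0:
            # Depth limit hit: heapsort this partition to cap the worst case.
            sub = a[lo:hi + 1]
            heapq.heapify(sub)
            for i in range(lo, hi + 1):
                a[i] = heapq.heappop(sub)
            return
        depth -= 1
        p = _partition(a, lo, hi)
        # Recurse on the smaller side, loop on the larger: O(log n) stack.
        if p - lo < hi - p:
            _introsort(a, lo, p - 1, depth)
            lo = p + 1
        else:
            _introsort(a, p + 1, hi, depth)
            hi = p - 1

def _partition(a, lo, hi):
    # Median-of-three pivot selection, then Lomuto-style partitioning.
    mid = (lo + hi) // 2
    if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]: a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi - 1] = a[hi - 1], a[mid]  # stash pivot at hi-1
    pivot = a[hi - 1]
    i = lo
    for j in range(lo, hi - 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]      # pivot into final position
    return i
```

 Non-pathological inputs never hit the depth limit, so the common-case behavior is just median-of-three quicksort.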
 In Linux-land, glibc's qsort also uses introsort, and musl libc's uses smoothsort:
 As someone who implemented an open source quicksort (http://www.mqseries.net/phpBB2/viewtopic.php?p=273722) which is used by many in production: Now this is very interesting, but it could also be prevented with a random number generator (which influences the selection of the pivot element). If this had been exploited, people would have looked into mitigating it. Edit: You can detect recursion-tree degeneration (just check the depth) and react to it (pivot selection).
 Yeah, random-pivot quicksort is n log n expected case. It's pretty easy to show that it is realistically impossible to significantly exceed n log n for large data sets. The constants are quite high, though, due to the need to generate a random number for each pivot selection. Additionally, quicksort is just as easy to implement in parallel as mergesort and runs about as fast, since it can be performed in-place.
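 A minimal sketch of the random-pivot variant (illustrative; a production version would partition in place rather than build lists):

```python
import random

def random_quicksort(a):
    """Quicksort with a uniformly random pivot: expected O(n log n)
    comparisons on every input, so no fixed killer input exists.
    An adversary would have to predict the PRNG instead."""
    if len(a) <= 1:
        return list(a)
    pivot = a[random.randrange(len(a))]
    # Three-way split also neutralizes the all-equal-keys pathology.
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    more = [x for x in a if x > pivot]
    return random_quicksort(less) + equal + random_quicksort(more)
```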
 Some historic quick-sort mis-implementations had O(n^2) runtime when trying to sort constant data (all keys equal, the effect was due to not picking the partition carefully in that case). See: http://www.win-vector.com/blog/2008/04/sorting-in-anger/
 "Inspecting a diff between the qsort() of 4.4BSD-Lite and that of current day FreeBSD reveals that very little has changed since 1994."

 That reminds me :)

```
$ expr \( -2147483648 \) \* -1
-2147483648
$ expr \( -2147483648 \) / -1
Floating point exception: 8 (core dumped)
$ uname -srm
FreeBSD 8.4-RELEASE-p9 i386
```
 Looks like it is fixed now:

```
$ expr \( -2147483648 \) \* -1
2147483648
$ expr \( -2147483648 \) / -1
2147483648
$ uname -srm
FreeBSD 10.0-RELEASE amd64
```

 Or maybe it doesn't work on i386?
 mzs on June 11, 2014 It's likely an i386 thing:

```
$ echo | awk '{ print 2 ** 31 }'
2147483648
```

 Make sure you are using /bin/expr, and maybe give -9223372036854775808 a whirl too.
 It now reports overflow instead of crashing, but the first case is still incorrect:

```
$ which expr
/bin/expr
$ expr \( -9223372036854775808 \) \* -1
-9223372036854775808
$ expr \( -9223372036854775808 \) / -1
expr: overflow
```

 Looking at the source code:

```c
void
assert_times(intmax_t a, intmax_t b, intmax_t r)
{
	/*
	 * if first operand is 0, no overflow is possible,
	 * else result of division test must match second operand
	 */
	if (a != 0 && r / a != b)
		errx(ERR_EXIT, "overflow");
}

struct val *
op_times(struct val *a, struct val *b)
{
	struct val *r;

	assert_to_integer(a);
	assert_to_integer(b);
	r = make_integer(a->u.i * b->u.i);
	assert_times(a->u.i, b->u.i, r->u.i);
	free_value(a);
	free_value(b);
	return (r);
}

void
assert_div(intmax_t a, intmax_t b)
{
	if (b == 0)
		errx(ERR_EXIT, "division by zero");
	/* only INTMAX_MIN / -1 causes overflow */
	if (a == INTMAX_MIN && b == -1)
		errx(ERR_EXIT, "overflow");
}

struct val *
op_div(struct val *a, struct val *b)
{
	struct val *r;

	assert_to_integer(a);
	assert_to_integer(b);
	/* assert based on operands only, not on result */
	assert_div(a->u.i, b->u.i);
	r = make_integer(a->u.i / b->u.i);
	free_value(a);
	free_value(b);
	return (r);
}
```

 Looks like the check for overflow in the multiplication case is broken, since the check itself does not account for overflow. I'll try to remember to submit a patch when I get home.
 mzs on June 11, 2014 thanks!
 In comparison, the Go standard sort function is not vulnerable to this, and references the same paper as this article:
 I don't see it; it seems to be just a median-of-three quicksort. The median of three for pivot selection is there to mitigate quicksort's poor performance on sorted and reverse-sorted inputs. edit: I'm an idiot, I totally glossed over the switch-to-heap-sort logic.
 This is the same approach used by the .NET framework in Array.Sort: http://referencesource.microsoft.com/#mscorlib/system/collec... You can see the fallback to heapsort here: http://referencesource.microsoft.com/#mscorlib/system/collec...
 Huh? And by what black magic is its qsort not O(n^2) (worst case)?
 It sets a stack limit proportional to log(n) and gives up and heapsorts the subproblem if the limit is exceeded. You get no more than n log(n) quicksort operations, plus heapsorting of any partition of the input, which is also n log(n). In other words, they use the worst-case n log(n) heapsort, with an optimization to use the much faster quicksort for non-pathological inputs, which is almost all of them.
 chowells on June 11, 2014 Well, Tarjan co-developed an O(n) worst-case order statistics algorithm, which you could use to implement exact pivot selection and partitioning in O(n) time. The constant factors would be horrid, but it would be a valid implementation of quicksort with a worst-case running time of O(n log n).
 The bogosort is well known to be invulnerable to such attacks.
 A similar DoS-with-degenerate-input these days is hash flooding. In the same way that quicksort can degrade from O(n lg n) to O(n^2) with degenerate input, so can hash tables degrade from O(1) to O(n) when all of the keys have hash collisions. This is the main motivation behind SipHash, a new hash function that is designed to be cryptographically collision-resistant but fast enough to use in hash tables: https://131002.net/siphash/
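 A toy chained hash table makes the degradation concrete (illustrative code; SipHash itself is not shown): with a weak, easily colliding hash, an attacker can choose keys that all land in one bucket, so every operation degenerates into a linear scan of one chain:

```python
class ChainedTable:
    """Toy separate-chaining hash table with a deliberately weak hash,
    to illustrate hash flooding. probes counts entry inspections."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]
        self.probes = 0

    def _hash(self, key):
        # Weak hash: any two anagrams collide, so colliding
        # keys are trivial for an attacker to generate.
        return sum(map(ord, key)) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._hash(key)]
        for i, (k, _) in enumerate(bucket):
            self.probes += 1
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._hash(key)]:
            self.probes += 1
            if k == key:
                return v
        raise KeyError(key)
```

 Inserting n such colliding keys costs n(n-1)/2 probes in total, versus roughly n probes with a well-distributed hash.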
 You mean the one mentioned in the article? Though you and the article both are perpetuating the lie that hash table operations are O(1). What magical hash function distributes n inputs into O(n) buckets in O(1) time? (Hint: distributing into O(n) buckets requires looking at O(log n) bits.)
 > Though you and the article both are perpetuating the lie that hash table operations are O(1).

 Hash table operations are O(1) on average, assuming a uniform hash function, and assuming that the number of buckets is kept greater than the number of elements. More info here: http://en.wikipedia.org/wiki/Hash_table#Performance_analysis

 Computing the hash function itself is not usually factored into this analysis, probably because it's an orthogonal concern (i.e. hash functions and hash table implementations can be mixed and matched), and because computing the hash function for every operation is not strictly required (the hash of the key can be cached). But even if you are factoring in computation of the hash function, this is a strange argument:

 > Hint: distributing into O(n) buckets requires looking at O(log n) bits.

 Computers can look at O(log n) bits in a single operation for any "n" that can reasonably be held in memory. For example, if your hash table has fewer than 4B entries, "log n" is less than 32. Any 32-bit computer can "look at" 32 bits in a single operation. So I can't see why you think that "log n" needs to be a factor in the complexity analysis.
 There is no "killer input" for randomized quicksort. I'm surprised libc doesn't select pivots randomly already.
 Except if you know the PRNG state, but I imagine that's really hard to pull off.
 I wonder if it's possible to do some statistical attacks by sending sample input and measuring response time? That is, after getting the response times for a million sorts, perhaps you can make inferences about the current PRNG state? I suspect fluctuations in latency would make this exceedingly difficult, but I am constantly amazed by the statistical attacks people pull off.
 Considering the fact that timing attacks based on memcmp short-circuiting were shown to be remotely practical a decade ago, latency isn't going to be an issue. The bigger question is whether you can get your inputs down to a small enough number to practically determine PRNG state (you probably can).
 Quite nifty. Imagine seeding a service with data over a very long period of time, and then at just the wrong moment making a single request that causes the server to fall over. This sort of thing might even work where the target comes to you for data (such as a search engine crawling you).
 PostgreSQL removed the switch to insertion sort even before NetBSD, in 2006.
 Was anyone able to view the linked research paper on Hash Table exploits?
 You gotta dig a bit. But here: