Proving that Android’s, Java’s and Python’s sorting algorithm is broken (envisage-project.eu)
608 points by amund on Feb 24, 2015 | 140 comments



Their corrected version:

    if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]  
    || n-1 > 0 && runLen[n-2] <= runLen[n] + runLen[n-1])
In the first clause they add earlier to later runLen elements, but in the second they add later to earlier elements. Switching the order just makes the expression harder to understand, like reusing variables within a scope. Addition is commutative and there's nothing technically wrong with it, but this construction makes it appear like there may be something special about element n when there's not.

The programmer also has to do mental arithmetic to check the bounds. Bounds checks can be written so that the largest index subtracted is also the value tested:

    if (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1]  
     || n >= 2 && runLen[n-2] <= runLen[n-1] + runLen[n])
This makes it easier to see that the bounds are correct. Formal methods found an important bug that would not have been found otherwise, but a lot of lesser bugs can be prevented just by writing clear and consistent code.


Question: Formal methods found the bug, but now that we know about it, what lessons can human programmers learn? That is, in the spirit of "20/20 hindsight," can we see the false assumption that made the bug possible as an instance of a certain kind of mistake, which we can look for and avoid in the future?

Or do you think the only lesson here is "Never fully trust anything that hasn't been formally verified"?


To me, https://mail.python.org/pipermail/python-dev/2002-July/02689... reads like the bug was known (err, the possibility of running out of slots in the bookkeeping stack was understood). That indicates that it could be an intentional tradeoff.


The source code for the python version has this comment:

  /* The maximum number of entries in a MergeState's pending-runs stack.
  * This is enough to sort arrays of size up to about
  *     32 * phi ** MAX_MERGE_PENDING
  * where phi ~= 1.618.  85 is ridiculously large enough, good for an array
  * with 2**64 elements.
  */
According to the linked article that's incorrect, it is only good enough for an array with 2^49 elements. So, if the (implicit) design guarantee is for lists up to 2^64, the implementation is technically bugged. But it'd be pretty hard to run into it in practice. 2^49 of python's plain integers consume more than 4.5 petabytes of RAM.
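The CPython comment's own arithmetic is easy to sanity-check (this only verifies the claimed formula; per the article, the merge invariant can be violated, which is why the true safe bound is closer to 2^49):

```python
import math

phi = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

# The CPython comment claims 85 pending-run slots suffice for arrays
# of about 32 * phi ** 85 elements:
claimed_bound = 32 * phi ** 85
print(math.log2(claimed_bound))  # ~64, matching the "2**64 elements" claim
```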


2⁴⁹ item references in a Python list will consume about 2⁵² bytes of RAM on an LP64 system, which is 4 pebibytes (4.5 petabytes, as you said). You might also need to allocate space for the objects that the references refer to, which will almost certainly be bigger. I don’t understand the bug well enough to know if it will work with large numbers of duplicate/identical items.

Nitpick: the idiomatic term is “buggy”, not “bugged”. Something is “bugged” if it has a hidden microphone in it transmitting to spies, not if it contains a software “bug”.


4.5 petabytes of RAM... that's what swap is for.


I would say both are correct. From a psychological/SE standpoint, know your faults and develop defenses for them. That could mean using a language that doesn't allow such bugs to be expressed in the first place, or developing defensive coding standards that prevent them. But that answer is stochastic: it helps, but anything short of formal proof only raises the likelihood of success; it's not a guarantee.

I also wonder if a static analysis tool would have caught the bug. Are there any good static analysis tools for Java or Python that might have caught the bug without the overhead of writing the formal proof?


As I understand, they didn't write a formal proof, but only the pre- and post-conditions, along with loop invariants. The formal proof was then completed by automatic theorem provers.

There has been some research into contract inference, but with very limited results. Even if you added all the pre- and post-conditions manually, I'm not sure the loop invariants could be inferred (especially as they can get pretty complex).


True. I more meant that their method required O(n) effort, since they had to write pre/post-conditions and invariants for the tool, whereas something like CodeSonar or PVS-Studio requires O(1) effort: just point it at your build and press go. The bug they found was an implementation defect, something I believe the low-effort tools would have caught. Where their tool should shine is in detecting things the low-effort analysis tools miss.


> something I believe the low effort tools would have caught.

It's worth noting that the formal verification method brought to light the root cause, where prior attempts at fixing this bug in TimSort had merely made it less common.

Would codeSonar or PVS Studio actually have provided the insight, in "O(1)", required to fix the root cause? I'm not familiar with those tools, but I sincerely doubt it.


Working with KeY is similar to writing a proof. The development of pre-, post-conditions and invariants is done interactively with the theorem prover, which shows you what is missing.


From the article: The reaction of the Java developer community to our report is somewhat disappointing: instead of using our fixed (and verified!) version of mergeCollapse(), they opted to increase the allocated runLen “sufficiently”. As we showed, this is not necessary. In consequence, whoever uses java.utils.Collection.sort() is forced to over allocate space. Given the astronomical number of program runs that such a central routine is used in, this leads to a considerable waste of energy.


It looks like the Python version that's good for 2^49 items uses 1360 bytes for this array, allocated on the stack.

I wouldn't worry about an extra couple of bytes nearly as much as I would worry about changing the behaviour of a function used in an "astronomical number of program runs", so this looks like a pretty reasonable and conservative choice, at least for an immediate patch.


Does it "change behavior"? As far as I understand, the only change is that this test wouldn't crash anymore. That certainly is "changing the behavior", but so is allocating more memory. If so, I don't see any reason not to use the formally verified version. Not that it is really important, but it seems quite reasonable.


They fixed it. The question of whether or not it was the "proper" fix (out of the two proposed solutions) is complicated. The maximum working array length was changed from 40 (ints) to 49. In this case the difference means that not even an additional cache line is needed, and the extra space only matters for arrays over 120K elements. This may have been judged better than adding more branches to each iteration (and lengthening the code), which would need to run regardless of input size.

My guess is that they ran a JMH benchmark with each proposed fix and picked the better one. In either case, I doubt it makes much difference.


Which costs more, doubling the number of runLength comparisons or allocating more memory? Without knowing that answer it's hard to say which is the better solution (assuming more memory actually fixes the bug).


Exchanging memory usage for equality comparisons is an almost universal no-brainer. The number of comparisons would have to increase hundreds of times to compensate for doubling the memory usage... and that's just for performance; for energy savings you'd need even more.


They've added 36 bytes for input sizes > 120K elements. That's not even one cache line. This, instead of a few more branches that would need to run every iteration, no matter the input size. I haven't run a benchmark (they probably did), but my gut says they've made the right choice.


Sounds like the amount of memory involved is quite small. Below a certain threshold, doubling is practically free. And IIUC, even over-allocating memory doesn't necessarily mean all of it is used - only for the inputs that would trigger the bug. The extra comparison, however, would be performed for all inputs.


Below what threshold though? L1 cache size? O(64 bytes), no?


Depends upon the operation of the rest of the program. 64 bytes is the size of a cache line, the actual L1 cache (on recent Intel processors) is 1024 of those cache lines. If the extra memory just means that you pull an extra line into cache for a bookkeeping array, the effect is negligible. If it means that you thrash the cache and have to evict some of your sorted data, the effect could be orders of magnitude.

I'd really hope that the JDK maintainers benchmarked against real-world programs before deciding which fix to use. My intuition is that the set of programs where increasing the memory usage matters is tiny, but the effect is large for them - it's the set of programs where the working set is greater than the L1 cache size. Equality comparisons aren't free either, they can often cause branch mispredicts. But the point is that performance tuning on real-world programs is very counterintuitive, and so the best way to do it is to run some actual numbers on actual programs.


All Intel CPUs worth mentioning in the last 10 years have 512 L1 data cache lines, not 1024. Including Haswell and Sandy Bridge.

I do agree with your analysis otherwise.


The extra comparison ought to return the same value on all but the inputs that cause the bug too, so it should be perfectly predicted.


Memory isn't being doubled though, some bookkeeping that is a fraction of the total memory being used is being increased by less than 2x.


What do you base these numbers on? If everything still fits inside the L1 cache, I'd assume that increasing the number of comparisons has a relatively bigger impact?


Yes, that's right. The thing is that nearly all of the times you want to order arrays that fit inside the L1 cache, neither execution time nor allocation size are a problem.


Really symptomatic of the way the JDC works.


[flagged]


It is widely known that every corporate developer has endless time for his work. Mundane aspects like costs of development should have no impact on our code, the pureness of which is a goal in itself.


I thought it was nodejs evangelists that did that.


Java did that with data. Node.js did it with your file system (NPM, I'm looking at you).


This is actually really cool. They tried to verify Timsort as implemented in Python and Java using formal methods. When it didn't seem to work, they discovered that there was actually a missing case in both implementations, which could lead to array out of bounds exceptions.

Really shines as an example of how important proof is in computer science.


In the Haskell community there is a tool called QuickCheck [1]: it is able to generate inputs based on preconditions, and you provide a function to verify the postconditions.

I am not sure why these kinds of testing methods aren't used more often in other communities, since they make it really easy to catch corner cases; in the Haskell world, at least, using QuickCheck is fairly pervasive.

[1] https://wiki.haskell.org/Introduction_to_QuickCheck2

EDIT: I have never done this before, but could anyone explain why I am being downvoted? I wasn't making the claim that this is a substitute for a formal proof, I was merely adding this information to the discussion since it seemed relevant.


There's a difference, though: QuickCheck randomly tries a large-ish number of possible inputs to try and disprove a postcondition, whereas formal methods prove the postcondition for all inputs. Practically, if the number of possible inputs is huge compared to the number of failure cases, it's unlikely that QuickCheck will find it. I strongly suspect - and this would be an interesting experiment - that QuickCheck wouldn't have found this particular bug, since Timsort has been in use for years without anyone noticing it.
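For readers outside the Haskell world, the QuickCheck idea can be sketched in a few lines of plain Python (a hand-rolled version for illustration; the Hypothesis library is the mature Python equivalent):

```python
import random
from collections import Counter

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def check_sort(sort_fn, trials=1000):
    """Try random inputs against the postconditions:
    output is sorted, and is a permutation of the input."""
    for _ in range(trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 100))]
        ys = sort_fn(xs)
        assert is_sorted(ys), (xs, ys)
        assert Counter(xs) == Counter(ys), (xs, ys)
    return True

print(check_sort(sorted))  # True: no counterexample found
```

As the parent comment notes, passing such trials is weak evidence: the TimSort bug needs inputs vastly larger than any generator would plausibly produce.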


What about AFL?


What's AFL?


American Fuzzy Lop (http://lcamtuf.coredump.cx/afl/), a fuzzer which instruments programs at compile-time to help find interesting inputs faster than brute-force fuzzing.

It may have indeed found a bug in TimSort, since it's more apt at exercising branches, but I think AFL is C/C++ only.


It wouldn't catch this bug. The algorithm works for lists of size less than 2^49.


In C# there's a similar tool called Code Digger[1]. It analyzes all possible branches of a function and generates input to cover all of them, this includes not only your own "if's" but also branches inside library functions and stuff like divide by zero and dereferencing null pointers. It's really impressive and can even generate strings and arrays with data that causes a different branch to be taken.

Sadly, I found it to be no more than that: cool. On real-life code it's not useful. Real code usually has a lot of mutable state, while this tool works best with pure functions. Real code does pretty complex things like parsing XML, which takes too much time for these tools to analyze. Bugs in real code aren't always exceptions or error codes: maybe you assigned foo.x to bar.y instead of bar.x, or maybe the sort doesn't crash but your numbers come out unsorted - how would the tool know that's a fault? Real bugs happen across threads, or even across modules running on different machines. And the list goes on. Only if you have functions that are very pure and mostly work with simple data types - a few strings and lists - can it be a good starting point for a test suite's inputs, but it doesn't replace one fully.

[1] https://visualstudiogallery.msdn.microsoft.com/fb5badda-4ea3...


Here is an implementation of Timsort in Haskell that also exhibits the bug:

https://github.com/timjb/haskell-timsort/commit/d9c3e1e454c7...

I find it interesting that this Haskell version of Timsort also exhibits the bug, despite Haskell having a more advanced type system than Python or Java. It is a reminder that type checking is not a magic bullet, and is hardly enough to prove a program's correctness. Formal proofs and whole-program static analysis are still necessary, with or without an advanced type system:

http://www.cl.cam.ac.uk/~srk31/blog/2014/10/07/


I'd say that QuickCheck would have a hard time coming up with a failing test case since the input needs to be rather large to trigger the bug.


I am trying (with some success) to get QuickCheck used in the Rust community.

Listing my projects that use QuickCheck would be cheating, so I'll link projects that aren't my own. The most impressive, by far, is this linear algebra library: https://github.com/japaric/linalg.rs/tree/master/tests --- Just pick a file at random and look for `#[quickcheck]`. :-)

Here's some fancy testing for a "concurrent, lock-free, linearizable, insert-only, sorted map": https://github.com/danburkert/pawn/blob/1afe50c618a733de63b9...

Another: https://github.com/apasel422/tree/blob/6656140c44687793e31ae...

And OK, I'll stop now, but you can see more here: https://github.com/search?q=extern+crate+quickcheck&type=Cod...


At a guess: A lack of practical ways to perform such checks without extensive extra knowledge and annotation as seen in the article. A large factor is likely that Haskell is pure by default - something that is reasonably rare in many languages (where a method likely reads (or mutates) state from instance variables or even globals).


I think it also has to do with Haskell's type system. QuickCheck is able to "discover" the bounds within which any arbitrary type can operate, which makes generating random values within those bounds fairly easy.


Eh, you still have to write instances of Arbitrary. And often those instances are not great; way too many libraries are tested with Arbitrary String instances that only generate ASCII, for example.


The use of these sorting functions is already orders of magnitude more than random testing would cover. This is a very rare failure, only affecting huge arrays, arrays so large that quickcheck would not even test it.


It would be interesting to see whether QC would easily find data that breaks the implementations, or whether it would require some tuning in how it generates test data.

And sorry, I have downvoted you by mistake when I wanted to reply.


Just FYI for anyone who's interested, but there's a Swift implementation of QuickCheck described in this book:

http://www.objc.io/books/

It's very minimal, but the chapter outlines what you'd need to do to make it more complete and get it on par with the Haskell version. There are also a few versions on GitHub too.


Has anyone used the KeY project in industry?

I'd certainly prefer proofs over unit tests. However, I don't understand formal proof systems well enough to know whether they would work for your typical "app" that makes RPC calls, makes DB changes, and generally has lots of moving parts and statefulness.


I have tried to use Coq and Frama-C to prove commercial OCaml and C programs, without, it has to be said, any success. I'm waiting for someone to write the brilliant tutorial.


You're waiting for a tutorial on Coq integrated with Frama-C, or for a tutorial on any of them?


I'm waiting for a tutorial on how to prove correctness of either OCaml or (especially) C code in real programs.


Research tools are getting closer to what people consider "real" programs, even if they often focus on embedded systems software. CompCert already deals with a quite large subset of C. Frama-C can deal with most if not all syntactic features of C, but then the next challenge is the C standard library, e.g. specifying every useful function (and even "simple" ones such as memcpy can be quite tricky). Afterwards, you have to deal with the glibc, then other high-level libraries, etc...

Most of these tools are either still in a mostly-academic setting (where "documentation = conference paper"), or do not have enough funding to pay for the development of more user-friendly features and extensive documentation. But with the ever-increasing security issues receiving media attention lately, we can hope more funding will allow these tools to reach a more mainstream status.

By the way, could you give an example of a small program that you would consider "real"? Just to have an idea of its size and complexity.


Firstly I don't expect to prove a whole program. However being able to prove the correctness of key functions or algorithms in a program could be useful.

You mention glibc, and it would be great to prove that (for example) 'qsort' is correct. That wouldn't be entirely trivial:

https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msor... https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/qsor...

For an example of a larger program, I'd like to prove various invariants of C programs, such as that 'reply_with_error' is called exactly once on every error path in this program:

https://github.com/libguestfs/libguestfs/tree/master/daemon


It is possible, see e.g. http://ssrg.nicta.com.au/projects/TS/, though it's not really the tutorial you've been hoping for.


Oh, crap. I suppose this also means we have the bug in our GNU Octave implementation:

http://hg.savannah.gnu.org/hgweb/octave/file/0486a29d780f/li...

Well, time to patch it there too.


Nice. As usual, entry and exit conditions aren't that hard to write; it's loop invariants that are hard.

  /*@ loop_invariant
    @  (\forall int i; 0<=i && i<stackSize-4; 
    @             runLen[i] > runLen[i+1] + runLen[i+2])
    @  && runLen[stackSize-4] > runLen[stackSize-3]
    @*/
It's surprising how close their notation is to our Pascal-F verifier from 30 years ago.[1] Formal verification went away in the 1980s because of the dominance of C, where the language doesn't know how big anything is. There were also a lot of diversions into exotic logic systems (I used to refer to this as the "logic of the month club"). The KeY system is back to plain old first-order predicate calculus, which is where program verification started in the 1970s.

For the invariant, you have to prove three theorems: 1) that the invariant is true the first time the loop is executed, given the entry conditions, 2) that the invariant is true for each iteration after the first if it was true on the previous iteration, and 3) that the exit condition is true given that the invariant is true on the last iteration. You also have to prove loop termination, which you do by showing that a nonnegative integer gets smaller on each iteration. (That, by the way, is how the halting problem is dealt with in practice.) #2 is usually the hardest, because it requires an inductive proof. The others can usually be handled by a simple prover. There's a complete decision procedure by Oppen and Nelson for theorems which contain only integer (really rational) addition, subtraction, multiplication by constants, inequalities, subscripts, and structures. For those, you're guaranteed a proof or a counterexample. But when you have an internal quantifier (the "forall int i") above, proof gets harder. Provers are better now, though.
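The obligations above can be illustrated by checking them dynamically on a toy loop (runtime assertions only sample executions, of course, whereas a verifier proves them for all inputs):

```python
def summed(xs):
    """Sum a list while asserting the loop invariant total == sum(xs[:i])."""
    total, i = 0, 0
    assert total == sum(xs[:i])               # 1) invariant holds on entry
    while i < len(xs):
        variant = len(xs) - i                 # nonnegative termination measure
        total += xs[i]
        i += 1
        assert total == sum(xs[:i])           # 2) invariant preserved each iteration
        assert 0 <= len(xs) - i < variant     # measure strictly decreases
    assert i == len(xs) and total == sum(xs)  # 3) invariant + exit cond => result
    return total

print(summed([3, 1, 4, 1, 5]))  # 14
```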

A big practical problem with verification systems is that they usually require a lot of annotation. Somebody has to write all those entry and exit conditions, and it's usually not the original programmer. A practical system has to automate as much of that as possible. In the example shown, someone had to tell the system that a function was "pure" (no side effects, no inputs other than the function arguments). That could be detected automatically. The tools have to make the process much, much easier. Most verification is done by people into theory, not shipping products.

[1] http://www.animats.com/papers/verifier/verifiermanual.pdf


This is an excellent use case of formal verification. I can definitely see the merit in using tools like this in my code.

They mention KeY http://www.key-project.org/ . Is anyone using this here? Are there any good resources on it except for the official site (and this blog post)?


It is in use in undergraduate courses at the universities that develop it, and I have used it too. It ships with small training problems. What would you like to know? There are many documents on JML. The prover works with you: you just left-click logic expressions in KeY and can prove large parts by point-and-click. It can also finish trivial proofs for you. Often, you only need to specify the loop invariants.

I'd recommend opening up KeY, loading one of the trivial examples first: Contraposition. This will be a quick reminder on the logic concepts like implication, but you pretty much cannot screw it up. Try to understand why the proof tree branches when you choose certain steps.

Afterwards, try something else from "Getting started", like the examples the proof searcher can prove automatically (for example SumAndMax); explore the proof tree, then try it yourself from scratch. The automatic proofs are not always pretty, so it's better to do the manual work first. KeY will only let you take valid proof steps, so you learn quickly how proofs work.


Hi, I will ask the authors of the blog post and corresponding academic paper about additional KeY resources and follow-up.


Thanks, and thanks for the interesting read. I've been looking for real uses of formal verification for a long time. I've played a lot with code contracts in C# and I've played some with languages like Eiffel - the advantage of this approach is that it's static and it performs actual proof rather than runtime enforcement.

These forms of formal verification could really help with building robust software and if someone makes them easy enough to use I can definitely see them as useful alongside if not instead of unit tests.


Everyone here is talking about Timsort, but you should also check out the materials they've published about the KeY project, which they used to carry out this verification. http://www.key-project.org/~key/eclipse/SED/index.html describes their "symbolic execution debugger", which lets you debug any Java code even if you don't know all the inputs (by simply taking the input as a symbolic value). The screencast is very accessible.


Spark 1.1 switched its default sorting algorithm from quicksort to TimSort as well, in both the map and reduce phases: https://databricks.com/blog/2014/10/10/spark-petabyte-sort.h...


Where can I get that PS1 from the shell in the video? It appears to have a green check mark if the previous command returned 0 otherwise some red symbol. Looks cool



The default oh-my-zsh theme does something similar - first character in the PS1 is `➜`, in grey if the previous command returned 0, red otherwise. It just checks `$?`, I assume.


This is a good example of why formal verification is incredibly useful. I tried to invest some time in learning some of the proof verification languages and tools, but so far I wasn't too successful.

It looks like, to formulate a proof, I always have to rewrite the algorithm/problem in the tool's language first, which is often not easy. I could see myself making mistakes in writing the proof just as easily as I make them when programming.

Proof validation is also tricky. Coq isn't fully automatic as I initially was expecting. I actually used "prover9" which is first-order only, but does automatic validation. I guess Coq is really useful when you need to understand the proof and interactive validation can guide you, whereas prover9 could help with automation.

The thing is, it's still too much work, even for seemingly simple algorithms, to write a proof in either system in order to improve on the current situation of unit testing (that is: if I wanted to get something with more intrinsic value than a test case).

Formally verified languages are nice, but for a gazillion of reasons you still need to verify what's running currently.


With tools like Coq, you may even have the benefit of extracting an implementation from your proof! Some assembly may be required though, and extraction only works to functional languages.


The authors have posted a follow-up posting: about KeY - the tool used to prove the bug in TimSort - http://envisage-project.eu/key-deductive-verification-of-sof...


And here I always thought sort functions were tested with millions of randomly generated sequences, with the results compared against other implementations known to be good.


That was done. It's talked about here:

http://svn.python.org/projects/python/trunk/Objects/listsort...

The bug would only be triggered by generating a truly massive array (the implementation mentions that it will work for up to 2^64 elements: http://svn.python.org/projects/python/trunk/Objects/listobje... search for MAX_MERGE_PENDING).


As mentioned by another commenter, the bug is that it's (for python) documented to work for 2^64 elements, but "only" works for 2^49. 2^49 is still pretty big... note that for 64-bit integers, 2^49 integers is 2^(49+3)=2^52 bytes... or 4 petabytes of raw data. Even if you're sorting single bits (one and zero) it's quite a bit of data to chew through.

[ed: Hm, to double-check that: a 64-bit integer is 2^3 bytes, so 2^49 of them is 2^(49+3) = 2^52 bytes, or about 4.5 petabytes, as above. I think :-) ]
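The byte math checks out quickly (assuming plain 8-byte machine integers, not Python's boxed ints):

```python
n = 2 ** 49                 # elements, the article's actual safe bound
raw_bytes = n * 8           # 8 bytes per 64-bit integer
print(raw_bytes == 2 ** 52)            # True
print(round(raw_bytes / 10 ** 15, 1))  # 4.5 (petabytes)
```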


Sure, but what is the probability that you generate exactly the type of sequence needed to trigger this bug.


Very often random sequences are very different from sequences encountered in the wild.


As long as your < operator works, you don't really need another sort implementation to compare with.

Also, it seems like the bug is only truly present in the Java version due to a slightly different implementation, even though the original Python (and the "fixed" Java) is technically incorrect.


> As long as your < operator works

That's not true. sort() can be broken in many different ways.

For instance

  sort([3, 7, 5]) -> [1, 2, 3]
The result is sorted and has the correct length, but it's wrong.


Terasort handles this by taking a sum of md5sums for each value. The teravalidate program then checks that all the items are in order and that the checksum is the same.

It's so unlikely that you could generate data that passes teravalidate without actually being correct that doing so would probably be an important work of computational science in its own right.
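The scheme described above can be sketched as an order check plus an order-independent checksum (md5 here only to mirror Terasort; any hash would do):

```python
import hashlib

def order_independent_checksum(xs):
    """Sum of per-item digests, so any permutation of xs yields the same value."""
    return sum(int(hashlib.md5(repr(x).encode()).hexdigest(), 16)
               for x in xs) % (1 << 128)

def validate(original, result):
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return (in_order and
            order_independent_checksum(original) == order_independent_checksum(result))

print(validate([3, 7, 5], [3, 5, 7]))  # True
print(validate([3, 7, 5], [1, 2, 3]))  # False: sorted, but wrong elements
```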


The parametricity (edit: thanks imaginenore, I misremembered the name) theorem implies that a generic sort function written in a safe subset of the language could never do that though.


Is that the idea that since the code should neither invent new values, nor duplicate values, it will always end up returning a permutation?

It seems like inventing values not being possible is fine, but duplicating a value seems very plausible - what safe subset of Java would ensure that doesn't happen?


Maybe http://types.cs.washington.edu/checker-framework/current/che... ? (I've used other parts of the checkers framework, but not that one yet)


The parameterization theorem doesn't imply such thing.

http://en.wikipedia.org/wiki/Smn_theorem


That's different from what is being referred to, which is (very briefly) described at http://en.wikipedia.org/wiki/Parametricity#History.

Edit: Ah, I didn't realize that there was an edit involved in the post you were referring to. At any rate, things are hopefully now clear to all.


This is bad-ass. Thanks for taking the time and investigating something that so many overlook yet use every day (myself included)



anyone know the list of bugs for this?

https://bugs.openjdk.java.net/browse/JDK-8072909

It seems they haven't submitted it to any other trackers, which is a bit unfortunate.


There is also http://bugs.java.com/view_bug.do?bug_id=8011944, where IIUC the suggested "fix" was to use a VM switch to enable the old (and slower) sorting...


I'm surprised that modern implementations don't use something like quicksort, since it sorts in place (needing only logarithmic auxiliary space) and has good constants on average for its n*log2(n) running time.


Quicksort has poor worst-case behaviour, so it's easy to carry out a DoS attack on network services that use it. Language maintainers don't like restricting their library to input from friendly users, at least not when there are other algorithms that work reasonably with unfriendly input.
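The worst case is easy to demonstrate with a naive (first-element) pivot; real library quicksorts pick pivots more carefully, but adversarial inputs exist for most deterministic schemes:

```python
import random

def qsort_comparisons(xs):
    """First-element-pivot quicksort; returns the number of comparisons made."""
    if len(xs) <= 1:
        return 0
    pivot, rest = xs[0], xs[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    return len(rest) + qsort_comparisons(less) + qsort_comparisons(more)

n = 500
print(qsort_comparisons([random.random() for _ in range(n)]))  # ~ n log n
print(qsort_comparisons(list(range(n))))  # n*(n-1)/2 = 124750: quadratic
```

On an already-sorted input, every pivot splits off nothing, so the comparison count degenerates to n*(n-1)/2.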


Isn't the malicious attack fought by just picking a random pivot?


There are still inputs that will make it go quadratic, that just obscures it slightly. A proper fix is to fall back on a different sort if it looks like quicksort is doing that: https://en.wikipedia.org/wiki/Introsort


> There are still inputs that will make it go quadratic

Can you elaborate on this? To force quicksort into a quadratic running time, you need to ensure that each pivot splits off a bounded number of elements (e.g. no more than three, or no more than twenty million) from the rest of the list. If the pivot is being chosen at random, then it looks to me like the guarantee you'd need to make is "every single element of this list [because any of them might be chosen] is larger, and smaller, than no more than k other elements of the list". But as the size of the list grows, that condition forces it to be mostly composed of the same element repeated over and over again, which is really easy to sort, and in particular is really easy for quicksort to handle.


Depends on how you interact with the quicksort, really.

If you have control of the comparison routine, there's Doug McIlroy's classic "quicksort killer"[0]. If you have information about the state of the random number generator used to pick the pivot, then you can do the same without having to actually interact through the comparison routine. Many libraries use an LCG[1] with 32 to 64 bit state, which is trivial to reconstruct through a small sample of its output not much longer than the internal state.

I don't know about you, but requiring my sort() routine to have access to a cryptographically secure random number generator doesn't seem right; I much prefer any algorithm (e.g. heapsort, mergesort or TimSort) that can guarantee n log(n) behavior deterministically (and especially, with no secure random generator requirement!).

[0] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

[1] http://en.wikipedia.org/wiki/Linear_congruential_generator


You might be interested in my new sorting algorithm, Pattern-defeating Quicksort (pdqsort): https://github.com/orlp/pdqsort

It's fully deterministic, as fast as quicksort in the average case, O(n) in the best case (all elements equal, scaling smoothly as the number of unique elements decreases), optimized for common inputs like ascending/descending (both O(n)), and it uses a novel idea to guarantee an O(n log n) worst case that is no more than about three times slower than the average case (assuming heapsort is twice as slow as quicksort).

I'm still tweaking the final details and working on the paper, so it's not ready for a full release yet, but a sneak peek here and there is nice :)

This is a benchmark comparing pdqsort against introsort (std::sort), heap sort and timsort: http://i.imgur.com/TSvXnG5.png .


Which timsort implementation did you compare to? (your profile.cpp on github doesn't include it).

Heapsort implementations vary widely in their cache coherence. The vast majority are extremely simple but result in essentially complete incoherence. I was once able to speed up a heapsort about 2x by rearranging the scan order into a cache-oblivious one (mostly the heap-building part; I'm not aware of a way to make the extraction part cache-friendly).

Heapsorts, quicksorts and bubble sorts have the ability to sort just the "top-n" and stop there (unlike mergesort), which is often useful, and a significant speed-up (goes from n log(n) to n + m log(n) to get the top m, or from n^2 to n*m for bubble sort) - I wish library routines actually provided that as part of a standard interface. Perhaps you could be the torchbearer in your publication?

Personally, I think just about every standard library except APL/J/K got sorting wrong. The primary sort operation should be "compute stable ordering permutation", with "sort this array" (basically only available operation in most libraries) being at most a shortcut when you don't care about the permutation.
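Both primitives are easy to express in Python, for what it's worth: `heapq.nsmallest` is a top-m that avoids sorting the whole input, and the APL/J/K-style "grade" (stable ordering permutation) is just a sort over indices. A small sketch:

```python
import heapq

def top_m(xs, m):
    # partial sort: the m smallest elements, in order,
    # without sorting the whole input
    return heapq.nsmallest(m, xs)

def grade(xs):
    # APL/J/K-style "grade": the stable permutation that sorts xs;
    # sorted() is stable, so ties keep their original relative order
    return sorted(range(len(xs)), key=xs.__getitem__)

xs = [30, 10, 20, 10]
assert top_m(xs, 2) == [10, 10]
p = grade(xs)
assert p == [1, 3, 2, 0]          # the two 10s keep their input order
assert [xs[i] for i in p] == sorted(xs)
```

Applying the permutation reproduces the sorted array, and it can also be reused to reorder any parallel array.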


I'm sorry, the profiling code for timsort wasn't included in my remote repository yet. I profiled against https://github.com/gfx/cpp-TimSort .

Also, C++ has always had the "top-n" interface called partial sort: http://en.cppreference.com/w/cpp/algorithm/partial_sort .


I wrote a partial quicksort for Newzbin - once I saw how simple it was to add I was surprised it wasn't more widespread: https://github.com/Freaky/pqsort




OK. But if you have control of the comparison routine, you can easily force mergesort to perform a quadratic number of comparisons by having the comparison function call itself an appropriate number of times. It feels a little unfair to call that a vulnerability in quicksort but not in mergesort.


Control comes in many flavors - e.g. you might be able to influence random element selection but not general code flow. (e.g - you might be able to set the seed before the call to quicksort).

And I only mentioned control because that's what McIlroy's example required to use as-is. But as I already stated, it is enough to know in advance the output of the (pseudo) random used to select a pivot to make it go quadratic, which is usually easy -- unless it is cryptographically secure (I've never seen qsort use one. have you?)

So, no, it is completely fair: there is no way to cause mergesort to go quadratic. However, it is quite easy to cause quicksort to go quadratic if you know how it selects its pivots (whether or not you can directly influence it at runtime).


See my bug report on libc++: http://llvm.org/bugs/show_bug.cgi?id=20837 .


But that adversary works by causing the sorting agent to run arbitrary, adversary-supplied code every time it makes a comparison. It has to do that because it invents the values in its list on the fly when it detects them being accessed by the sort (also, in order to detect them being accessed by the sort). That's not really the same thing. I mean, if you want to tie up a process, and you can already make it run arbitrary code that you supply, just give it something like while(1); .

If this adversary were forced to realize all the values it fed to the sorter before the sorter did any work, or if it were unable to supply its own code to the sorter, random pivot selection would be a defense.

edit:

I feel like pointing out that a quicksort implementation could defeat this adversary, without hurting its O(n log n) running time, by just comparing the first element of the sublist it was working with to every other element in the sublist -- and throwing away the results -- and then proceeding as normal. This is O(n) comparisons, which violates the vulnerability criterion of making only O(1) comparisons per call, but doesn't affect the big-O running time at all. What it does do, with an eye to this particular adversary, is realize all the values before doing any sorting work. It still doesn't fix the actual vulnerability the paper identifies, which is that you're running adversary-supplied code. I'm growing to feel like your bug report was frivolous.


What you fail to realize is that this attack is simply an academic proof that shows the libc++ implementation is broken and can have quadratic performance. This is a bug, because the C++ standard mandates a worst case of O(n log n).

What you also don't seem to realize is that this attack merely uses a comparison function to find the worst case. Once the worst case is found you can feed this input to any program using libc++'s std::sort, without comparison function, and trigger the worst case.

So no, this is not frivolous at all.


> What you also don't seem to realize is that this attack merely uses a comparison function to find the worst case. Once the worst case is found you can feed this input to any program using libc++'s std::sort, without comparison function, and trigger the worst case.

This is true iff pivots are selected deterministically. In which case, why did you post it as a response to "how can an adversary force quadratic behavior when pivots are chosen randomly?"


> This is a bug, because the C++ standard mandates a worst case of O(n log n).

Specifically, this is a requirement in C++11. Earlier C++ standards only require it to be average-case O(n log n).


So, another algorithmic-complexity bug in libc++. I found one in `std::make_heap` a while back when testing one of my own programs (http://llvm.org/bugs/show_bug.cgi?id=20161). So much for the bug always being in your own code nowadays.


libc++ doesn't use randomized quicksort so that bug is irrelevant. It is an attack on a clever pivot selection method, but the exploit derives from the fact that the pivot isn't actually random, but deterministically pseudorandom. A truly random quicksort isn't vulnerable (since the pivots will be chosen in a different order every time).

The real problem with random quicksort is that randomness is really, really hard, and good RNGs are slow. Also, timsort is adaptive, while quicksort is not.


Putting a cryptographically secure pseudo-random number generator into a sorting library is a bit awkward, and if it's not a good random number generator, then it's still DoS-able.


Pretend you're the maintainer of OpenJDK. Do you want to go down that road, or do you want to change to another algorithm? Remember that randomness can be difficult to find, and that an attacker has a copy of the JDK to use for experiments.

Is Quicksort that much better, or do you just change?


I don't think so. If you implement quicksort the way I've normally seen it implemented (elements less than the pivot to the left, others to the right), you could just give it an array whose elements are all the same value.

Then, you would easily get the O(n^2) runtime because your pivot would always be the leftmost element.
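This is easy to check with a comparison-counting sketch of that naive partition scheme (pivot = first element, strictly-smaller elements to the left): on an all-equal input every split is 0/(n-1), so the count is n(n-1)/2.

```python
def quicksort_comparisons(xs):
    # naive quicksort: pivot = first element, "less" goes left,
    # everything else goes right; returns the comparison count
    count = 0
    def qs(xs):
        nonlocal count
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        count += len(rest)   # one logical comparison per partitioned element
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return qs(left) + [pivot] + qs(right)
    qs(xs)
    return count

n = 100
# all-equal input: every element lands right of the pivot, so quadratic
assert quicksort_comparisons([7] * n) == n * (n - 1) // 2
```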


Only if you use a secure random, otherwise you are just obscuring the problem and are still vulnerable.


No, it is fought by switching to heapsort if the recursion depth exceeds 2*log2(n).


Heapsort has poor cache performance though. Its asymptotic complexity depends on the assumption that memory access is O(1), which is not borne out in practice.


This is true, but the worst case of quicksort is so rare that this does not matter for average performance.


It doesn't have to have poor worst-case behavior: https://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pd... (from the author of the Algorithms textbook used at my college).


Correct me if I'm wrong, but I got the impression that Prof. Sedgewick's slides were about quicksort behaviour when the input has many equal keys. That would still result in quadratic runtime against maliciously crafted input.


I may have oversold Sedgewick's claim: he doesn't actually say that a properly implemented quicksort will never go quadratic; only that "The average number of compares per element ... is always within a constant factor of the entropy" and "No comparison-based algorithm can do better" (page 16).

I've never looked into whether a Sedgewick-approved Quicksort can avoid worst case behavior against antiquicksort ( http://www.cs.dartmouth.edu/~doug/aqsort.c , http://www.cs.dartmouth.edu/~doug/mdmspe.pdf ).


You can always pick the median element in O(n) time (e.g. with median-of-medians), which makes quicksort's worst case O(n log n).


Got it. That definitely makes sense :-)


Timsort is stable, has an O(n) best case and O(n log n) worst case and has specific optimisations for e.g. partially-sorted sequences.

It can also be implemented in-place to reduce space requirements from O(n) to O(log n).

> since it has linear space requirements (sorts in-place)

Quicksort has a O(log n) worst-case space requirement


There is an in-situ quicksort that has the same asymptotic time bounds as the usual kind. It (kind of) encodes the stack by shuffling some elements in the unsorted data.


I for one appreciate the fact that it's stable. That has definite benefits sometimes.


This is making me wonder if there is a theorem that would imply a stable sort on data that doesn't need it would be less efficient than a comparable unstable sort. There has to be a cost for stability, right?


Not in asymptotic terms, at least.

There exists a stable sorting algorithm that's O(1) auxiliary space and O(n log n) worst-case time - wikisort. As both of those are provable lower bounds, this seems false.


TimSort is supposed to have better performance on real world datasets.


Agree. More info: http://en.wikipedia.org/wiki/Timsort

The original mail with the proposal is: http://bugs.python.org/file4451/timsort.txt


Timsort is especially good at reducing the number of comparisons needed by taking advantage of already ordered subsequences in the input. This advantage comes at a cost of more overhead compared to e.g. quicksort. Therefore Timsort is faster if the comparison operations are expensive, quicksort if they are cheap. Java takes advantage of this by using Timsort to sort arrays of objects with a user specified comparison function, and quicksort to sort arrays of primitives. (Or at least it used to.) In Python everything including comparisons is relatively slow so they just use Timsort all the time.
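The comparison-saving behavior is observable from Python with `functools.cmp_to_key`: on already-sorted input CPython's Timsort does exactly n-1 comparisons (a single run-detection pass), while shuffled input needs on the order of n*log2(n). A small instrumentation sketch:

```python
import functools
import random

def count_comparisons(xs):
    # wrap each comparison so we can count how many sorted() makes
    calls = 0
    def cmp(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)
    sorted(xs, key=functools.cmp_to_key(cmp))
    return calls

random.seed(1234)   # reproducible shuffle
n = 1024
ordered = list(range(n))
shuffled = ordered[:]
random.shuffle(shuffled)

assert count_comparisons(ordered) == n - 1   # one run, found in one pass
assert count_comparisons(shuffled) > 3 * n   # general case: ~n*log2(n)
```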


Many years ago, I ran into an intermittent performance problem that was caused by a self-propagating worst case in our quicksort implementation. If you sorted an array with one item out of place but otherwise sorted, you would get an (n-1)/1 split, and the pattern would repeat itself on the larger side of the split.

We fixed this by picking a pseudo-random pivot, I think.


> its n*log2(n) running time

If you're going to talk about asymptotic running time, why specify log_2? As soon as you elide a multiplicative constant, all logs are equivalent.


The JDK uses quicksort (or rather a variation of it) for sorting primitives. (Not being stable doesn't matter for primitives.)


Wish they'd put that up on Github or Bitbucket. There are so many things to be learned from that. I don't want to just download things or just use a tool (although it's pretty awesome that they made the tool available.)


What are you looking for?


To watch how change sets go into the tool. To see what is being done, and why, as it happens. I find that one of the best ways to learn these things, especially when the work is theory-heavy and you can watch the implementation congeal around the theoretical framework.


Sorry to keep following up, but we'd really like to know what we can improve -- into which tool? Are you talking about the KeY tool, or the various libraries they are now catching up on this issue?


Looks like there is a clone on bitbucket, but it hasn't been updated in a while.

https://bitbucket.org/adoptopenjdk


Just a nitpick, but the article makes a mathematical mistake. In the last paragraph of section 1.2, it says

    For performance reasons, it is crucial to allocate as
    little memory as possible for runLen, but still enough to
    store all the runs.  *If the invariant is satisfied by all
    runs, the length of each run grows exponentially (even faster
    than fibonacci: the length of the current run must be
    strictly bigger than the sum of the next two runs lengths).*
However, fibonacci growth is strictly faster than exponential. In fact, this is why n * log(n) is the lower bound on the number of comparisons a comparison-based sorting algorithm must use: because n! is approximately n * log(n).


None of this is true.

>fibonacci growth is strictly faster than exponential

There is even an explicit formula for F(n) that shows fibonacci growth is exactly exponential[1]

>In fact, this is why nlog(n) is the lower bound on the number of comparisons a comparison-based sorting algorithm

It's true that the lower bound for comparison-based sorting is n log(n), but it has nothing to do with fibonacci

>n! is approximately nlog(n)

Stirling's formula[2] says otherwise (n! grows like (n/e)^n, i.e. faster than exponentially)

[1] http://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_ex...

[2] http://en.wikipedia.org/wiki/Stirling%27s_approximation


How embarrassing! It looks like I flipped around Fibonacci and factorial in my head when writing my reply.

You're completely right that the comparison-based sorting bound has nothing to do with Fibonacci numbers. And in fact log_2(n!) is about n*log_2(n), which is where the comparison-based sorting bound comes from. And Fibonacci growth is absolutely exponential, just with a base smaller than 2 (the golden ratio, about 1.618).
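The log_2(n!) claim is easy to check numerically: Stirling gives log2(n!) = n*log2(n) - n*log2(e) + O(log n), so the ratio log2(n!) / (n*log2(n)) tends to 1. A quick check:

```python
import math

n = 1000
lg_fact = math.log2(math.factorial(n))   # exact value of log2(n!)
stirling = n * math.log2(n) - n * math.log2(math.e)

# the remaining error term is O(log n)
assert abs(lg_fact - stirling) < 2 * math.log2(n)
# and log2(n!) / (n*log2(n)) is already close to 1 (about 0.86 at n=1000)
assert 0.8 < lg_fact / (n * math.log2(n)) < 1.0
```

This is exactly the information-theoretic bound: a comparison sort must distinguish n! permutations, which takes at least log2(n!) ≈ n*log2(n) yes/no questions.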


> fibonacci growth is strictly faster than exponential.

This is wrong. The easiest intuition I can come up with: the next Fibonacci number is no bigger than double the previous one, and so fib(n) <= 2^n.


Fibonacci is n(i) = n(i-1) + n(i-2). Factorial is n! = n * (n-1) * (n-2) * ... * 2 * 1.

In the second sentence, you might have wanted "lg(n!) is approximately n lg n". That would be a good reason for needing n lg n comparisons to distinguish between the n! permutations.

I have edited this comment as my understanding of the parent comment developed >_>


Thanks for pointing this out! Yes, I meant log_2(n!) is approximately n*log_2(n). I appreciate you spotting the error. :-)


Fibonacci( n ) = ( P^n - ( -P )^-n ) / sqrt(5)

Where P is Golden ratio ( about 1.618 )

So yes, it is exponential growth, up to the "- ( -P )^-n" correction term, which alternates in sign and vanishes as n grows.
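A quick check of that closed form against the recurrence (Python floats are accurate enough here for small n), including the fib(n) <= 2^n bound mentioned upthread:

```python
import math

PHI = (1 + math.sqrt(5)) / 2        # golden ratio, ~1.618

def fib_binet(n):
    # Binet's formula: F(n) = (phi^n - (-phi)^(-n)) / sqrt(5)
    return round((PHI ** n - (-PHI) ** -n) / math.sqrt(5))

def fib_iter(n):
    # reference implementation from the recurrence
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert [fib_binet(k) for k in range(20)] == [fib_iter(k) for k in range(20)]
# growth is exponential with base phi < 2, so fib(n) <= 2^n
assert all(fib_iter(k) <= 2 ** k for k in range(40))
```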



