if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]
|| n-1 > 0 && runLen[n-2] <= runLen[n] + runLen[n-1])
The programmer also has to do mental arithmetic to check the bounds. Bounds checks can be written so that the largest index subtracted is also the value tested:
if (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1]
|| n >= 2 && runLen[n-2] <= runLen[n-1] + runLen[n])
Or do you think the only lesson here is "Never fully trust anything that hasn't been formally verified"?
/* The maximum number of entries in a MergeState's pending-runs stack.
* This is enough to sort arrays of size up to about
* 32 * phi ** MAX_MERGE_PENDING
* where phi ~= 1.618. 85 is ridiculouslylarge enough, good for an array
* with 2**64 elements.
Nitpick: the idiomatic term is “buggy”, not “bugged”. Something is “bugged” if it has a hidden microphone in it transmitting to spies, not if it contains a software “bug”.
I also wonder if a static analysis tool would have caught the bug. Are there any good static analysis tools for Java or Python that might have caught the bug without the overhead of writing the formal proof?
There has been some research into contracts inference, but with very limited results. Even if you added all the pre- and post-conditions manually, I'm not sure loop invariants could be inferred (especially as they can get pretty complex).
It's worth noting that the formal verification method brought to light the root cause, where prior attempts at fixing this bug in TimSort had merely made it less common.
Would codeSonar or PVS Studio actually have provided the insight, in "O(1)", required to fix the root cause? I'm not familiar with those tools, but I sincerely doubt it.
I wouldn't worry about an extra couple of bytes nearly as much as I would worry about changing the behaviour of a function used in an "astronomical number of program", so this looks like a pretty reasonable and conservative choice, at least for an immediate patch.
My guess is that they ran a JMH benchmark with each proposed fixed, and picked the better one. In either case, I doubt that makes much difference.
I'd really hope that the JDK maintainers benchmarked against real-world programs before deciding which fix to use. My intuition is that the set of programs where increasing the memory usage matters is tiny, but the effect is large for them - it's the set of programs where the working set is greater than the L1 cache size. Equality comparisons aren't free either, they can often cause branch mispredicts. But the point is that performance tuning on real-world programs is very counterintuitive, and so the best way to do it is to run some actual numbers on actual programs.
I do agree with your analysis otherwise.
Really shines as an example of how important proof is in computer science.
I am not sure why these kind of testing methods aren't used more often in other communities, since it makes it really easy to catch corner cases; in the Haskell world, at least, using QuickCheck is somewhat pervasive.
EDIT: I have never done this before, but could anyone explain why I am being downvoted? I wasn't making the claim that this is a substitute for a formal proof, I was merely adding this information to the discussion since it seemed relevant.
It may have indeed found a bug in TimSort, since it's more apt at exercising branches, but I think AFL is C/C++ only.
Sadly i found it to be no more than that, cool. On real life code it's not useful. Real code usually has a lot of mutable state, this tool works best with pure functions. Real code does pretty complex things like parsing xml, that takes too much time to analyze for these tools. Bugs in real code are not always exceptions or error codes returned, maybe you assigned foo.x to bar.y instead of bar.x, maybe the sort doesn't crash but your numbers came out unsorted, how would the tool know that's a fault? Real bugs happen across threads or even modules running on different machines. And the list goes on. Only if you have functions that are very pure and mostly work with simple data types, few strings and lists, it can be good to use as startng point for input to a test suite but doesn't replace it fully.
Listing my projects that us QuickCheck would be cheating, so I'll link projects that aren't my own. The most impressive, by far, is this linear algebra library: https://github.com/japaric/linalg.rs/tree/master/tests --- Just pick a file at random and look for `#[quickcheck]`. :-)
Here's some fancy testing for a "concurrent, lock-free, linearizable, insert-only, sorted map": https://github.com/danburkert/pawn/blob/1afe50c618a733de63b9...
And OK, I'll stop now, but you can see more here: https://github.com/search?q=extern+crate+quickcheck&type=Cod...
I find interesting that this Haskell version of Timsort also exhibits the bug, despite Haskell having a type system more advanced than Python or Java. This is a reminder that type checking is not a magic bullet, and is hardly enough to prove a program correctness. Formal proofs and whole program static analysis are still necessary, with or without an advanced type system:
And sorry, I have downvoted you by mistake when I wanted to reply.
It's very minimal, but the chapter outlines what you'd need to do to make it more complete and get it on par with the Haskell version. There are also a few versions on GitHub too.
I'd certainly prefer proofs over unit tests. However, I don't understand formal proof systems sufficiently well to know whether these they would work in terms of your typical "app" that makes RPC calls, DB changes, and generally has lots of moving parts and statefulness.
Most of these tools are either still in a mostly-academic setting (where "documentation = conference paper"), or do not have enough funding to pay for the development of more user-friendly features and extensive documentation. But with the ever-increasing security issues receiving media attention lately, we can hope more funding will allow these tools to reach a more mainstream status.
By the way, could you give an example of a small program that you would consider "real"? Just to have an idea of its size and complexity.
You mention glibc, and it would be great to prove that (for example) 'qsort' is correct. That wouldn't be entirely trivial:
For an example of a larger program, I'd like to prove various invariants of C programs, such as that 'reply_with_error' is called exactly once on every error path in this program:
Well, time to patch it there too.
@ (\forall int i; 0<=i && i<stackSize-4;
@ runLen[i] > runLen[i+1] + runLen[i+2])
@ && runLen[stackSize-4] > runLen[stackSize-3])
For the invariant, you have to prove three theorems: 1) that the invariant is true the first time the loop is executed, given the entry conditions, 2) that the invariant is true for each iteration after the first if it was true on the previous iteration, and 3) that the exit condition is true given that the invariant is true on the last iteration. You also have to prove loop termination, which you do by showing that a nonnegative integer gets smaller on each iteration. (That, by the way, is how the halting problem is dealt with in practice.) #2 is usually the hardest, because it requires an inductive proof. The others can usually be handled by a simple prover. There's a complete decision procedure by Oppen and Nelson for theorems which contain only integer (really rational) addition, subtraction, multiplication by constants, inequalities, subscripts, and structures. For those, you're guaranteed a proof or a counterexample. But when you have an internal quantifier (the "forall int i") above, proof gets harder. Provers are better now, though.
A big practical problem with verification systems is that they usually require a lot of annotation.
Somebody has to write all those entry and exit conditions, and it's usually not the original programmer. A practical system has to automate as much of that as possible. In the example shown, someone had to tell the system that a function was "pure" (no side effects, no inputs other than the function arguments). That could be detected automatically. The tools have to make the process much, much easier. Most verification is done by people into theory, not shipping products.
They mention KeY http://www.key-project.org/ . Is anyone using this here? Are there any good resources on it except for the official site (and this blog post)?
I'd recommend opening up KeY, loading one of the trivial examples first: Contraposition. This will be a quick reminder on the logic concepts like implication, but you pretty much cannot screw it up. Try to understand why the proof tree branches when you choose certain steps.
Afterwards, try something else from "Getting started" like the examples that the proof searcher can prove (for example SumAndMax) and exploring the proof tree, and trying out yourself from scratch. The automatic proofs are not always pretty, so it's more for to do the manual work first. KeY will only let you do valid proof steps, so you learn quickly how proofs work.
These forms of formal verification could really help with building robust software and if someone makes them easy enough to use I can definitely see them as useful alongside if not instead of unit tests.
It looks like that to formulate a proof, I always have to rewrite the algorithm/problem first in the tool's language, which is often not easy. I could see myself making mistakes in writing the proof just as well as I do when I'm programming.
Proof validation is also tricky. Coq isn't fully automatic as I initially was expecting. I actually used "prover9" which is first-order only, but does automatic validation. I guess Coq is really useful when you need to understand the proof and interactive validation can guide you, whereas prover9 could help with automation.
The thing is, it's still too much work, even for seemingly simple algorithms, to write a proof in either system in order to improve on the current situation of unit testing (that is: if I wanted to get something with more intrinsic value than a test case).
Formally verified languages are nice, but for a gazillion of reasons you still need to verify what's running currently.
The bug would only be triggered by generating a truly massive array (the implementation mentions that it will work for up to 2^64 elements: http://svn.python.org/projects/python/trunk/Objects/listobje... search for MAX_MERGE_PENDING).
[ed: Hm, that's not quite right. A 64-bit integer is 2^6, so that should be 2^(49+6)=2^55, or 32 petabytes). I think :-) ]
Also, it seems like the bug is only truly present in the Java version due to a slightly different implementation, even though the original Python (and the "fixed" Java) is technically incorrect.
That's not true. sort() can be broken in many different ways.
sort([3, 7, 5]) -> [1, 2, 3]
It's so unlikely that could generate data that passes teravalidate but not really be correct that it would probably be an important work of computational science in it's own right.
It seems like inventing values not being possible is fine, but duplicating a value seems very plausible - what safe subset of Java would ensure that doesn't happen?
Edit: Ah, I didn't realize that there was an edit involved in the post you were referring to. At any rate, things are hopefully now clear to all.
It seems they havn't submitted to any other trackers, which is a bit unfortunate
Can you elaborate on this? To force quicksort into a quadratic running time, you need to ensure that each pivot splits off a bounded number of elements (e.g. no more than three, or no more than twenty million) from the rest of the list. If the pivot is being chosen at random, then it looks to me like the guarantee you'd need to make is "every single element of this list [because any of them might be chosen] is larger, and smaller, than no more than k other elements of the list". But as the size of the list grows, that condition forces it to be mostly composed of the same element repeated over and over again, which is really easy to sort, and in particular is really easy for quicksort to handle.
If you have control of the comparison routine, there's Doug McIlroy's classic "quicksort killer". If you have information about the state of the random number generator used to pick the pivot, then you can do the same without having to actually interact through the comparison routine. Many libraries use an LCG with 32 to 64 bit state, which is trivial to reconstruct through a small sample of its output not much longer than the internal state.
I don't know about you, but requiring my sort() routine to have access to a cryptographically secure random number generator doesn't seem right; I much prefer any algorithm (e.g. HeapSort, Mergesort or TimSort) that can guarantee n\*log(n) behavior with deterministic (and especially, no secure random generator requirement!) behavior.
It's fully deterministic, as fast as Quicksort in the average case, O(n) in the best case (all equal, but it scales smoothly as the number of unique elements decreases), is optimized for common inputs like ascending/descending (both O(n)) and uses a novel idea of guaranteeing O(n log n) worst case, that guarantees the worst case is no slower than three times the average case, assuming heapsort is twice as slow as Quicksort.
I'm still tweaking the final details and working on the paper, so it's not ready for a full release yet, but a sneak peek here and there is nice :)
This is a benchmark comparing pdqsort against introsort (std::sort), heap sort and timsort: http://i.imgur.com/TSvXnG5.png .
Heapsort implementation vary widely in their cache coherence. The vast majority are extremely simple but result in essentially complete incoherence. I was once able to speed a heapsort about 2x by rearranging the scan order into a cache-oblivious one (Mostly the heap building part; I'm not aware of a way to make the extraction part cache-friendy).
Heapsorts, quicksorts and bubble sorts have the ability to sort just the "top-n" and stop there (unlike mergesort), which is often useful, and a significat speed up (goes from nlog(n) to n+mlog(n) to get top m, or from n^2 to n times m for bubble sort) - I wish library routines actually provided that as part of a standard interface. Perhaps you could be the torchbearer in your publication?
Personally, I think just about every standard library except APL/J/K got sorting wrong. The primary sort operation should be "compute stable ordering permutation", with "sort this array" (basically only available operation in most libraries) being at most a shortcut when you don't care about the permutation.
Also, C++ has always had the "top-n" interface called partial sort: http://en.cppreference.com/w/cpp/algorithm/partial_sort .
And I only mentioned control because that's what McIlroy's example required to use as-is. But as I already stated, it is enough to know in advance the output of the (pseudo) random used to select a pivot to make it go quadratic, which is usually easy -- unless it is cryptographically secure (I've never seen qsort use one. have you?)
So, no, it is completely fair: There is no way to cause mergesort to go quadratic. However, it is quite easy to cause quicksort to go quadratic if you know how it selects it pivots (whether or not you can directly influence it during runtime or not).
If this adversary were forced to realize all the values it fed to the sorter before the sorter did any work, or if it were unable to supply its own code to the sorter, random pivot selection would be a defense.
I feel like pointing out that a quicksort implementation could defeat this adversary, without hurting its O(n log n) running time, by just comparing the first element of the sublist it was working with to every other element in the sublist -- and throwing away the results -- and then proceeding as normal. This is O(n) comparisons, which violates the vulnerability criterion of making only O(1) comparisons per call, but doesn't affect the big-O running time at all. What it does do, with an eye to this particular adversary, is realize all the values before doing any sorting work. It still doesn't fix the actual vulnerability the paper identifies, which is that you're running adversary-supplied code. I'm growing to feel like your bug report was frivolous.
What you also don't seem to realize is that this attack merely uses a comparison function to find the worst case. Once the worst case is found you can feed this input to any program using libc++'s std::sort, without comparison function, and trigger the worst case.
So no, this is not frivolous at all.
This is true iff pivots are selected deterministically. In which case, why did you post it as a response to "how can an adversary force quadratic behavior when pivots are chosen randomly?"
Specifically, this is a requirement in C++11. Earlier C++ standards only require it to be average-case O(n log n).
The real problem with random quicksort is that randomness is really, really hard, and good RNGs are slow. Also, timsort is adaptive, while quicksort is not.
Is Quicksort that much better, or do you just change?
Then, you would easily get the O(n^2) runtime because your pivot would always be the left most element.
I've never looked into whether a Sedgewick-approved Quicksort can avoid worst case behavior against antiquicksort ( http://www.cs.dartmouth.edu/~doug/aqsort.c , http://www.cs.dartmouth.edu/~doug/mdmspe.pdf ).
It can also be implemented in-place to reduce space requirements from O(n) to O(log n).
> since it has linear space requirements (sorts in-place)
Quicksort has a O(log n) worst-case space requirement
There exists a stable sorting algorithm that's O(1) axillary space, and O(n log n) worst-case time - wikisort. As both of those are provable lower bounds, this seems false.
The original mail with the proposal is: http://bugs.python.org/file4451/timsort.txt
We fixed this by picking a pseudo-random pivot, I think.
If you're going to talk about asymptotic running time, why specify log_2? As soon as you elide a multiplicative constant, all logs are equivalent.
For performance reasons, it is crucial to allocate as
little memory as possible for runLen, but still enough to
store all the runs. *If the invariant is satisfied by all
runs, the length of each run grows exponentially (even faster
than fibonacci: the length of the current run must be
strictly bigger than the sum of the next two runs lengths).*
>fibonacci growth is strictly faster than exponential
There is even an explicit formula for F(n) that shows fibonacci growth is exactly exponential
>In fact, this is why nlog(n) is the lower bound on the number of comparisons a comparison-based sorting algorithm
It's true that lower bound for comparison-based sorting is nlog(n) but it has nothing to do with fibonacci
>n! is approximately nlog(n)
Stirling's formula says otherwise (that n! is exponential)
You're completely right that the comparison-based sorting bound has nothing to do with Fibonacci numbers. And in fact log_2(n!) is about n*log_2(n), which is where the comparison-based sorting bound comes from. And Fibonacci growth is absolutely exponential, just with an exponent larger than 2.
This is wrong. The easiest intuition for why this is true that I can come up with is that next Fibonacci number is no bigger than double the previous, and so fib(n) <= 2^n.
In the second sentence, you might have wanted "lg(n!) is approximately n lg n". That would be a good reason for needing n lg n comparisons to distinguish between the n! permutations.
I have edited this comment as my understanding of the parent comment developed >_>
Where P is Golden ratio ( about 1.618 )
So yes it resembles exponential growth but as you can see it is slightly less due to the "- ( -P )^-n" part.