> It would appear that Moore’s law provides a disincentive for developing polynomial algorithms. After all, if an algorithm is exponential, why not wait it out until Moore’s law makes it feasible? But in reality the exact opposite happens: Moore’s law is a huge incentive for developing efficient algorithms, because such algorithms are needed in order to take advantage of the exponential increase in computer speed.
Here is why. If, for example, an O(2n) algorithm for Boolean satisfiability (SAT) were given an hour to run, it would have solved instances with 25 variables back in 1975, 31 variables on the faster computers available in 1985, 38 variables in 1995, and about 45 variables with today’s machines. Quite a bit of progress—except that each extra variable requires a year and a half’s wait, while the appetite of applications (many of which are, ironically, related to computer design) grows much faster. In contrast, the size of the instances solved by an O(n) or O(n log n) algorithm would be multiplied by a factor of about 100 each decade. In the case of an O(n2) algorithm, the instance size solvable in a fixed time would be multiplied by about 10 each decade. Even an O(n6) algorithm, polynomial yet unappetizing, would more than double the size of the instances solved each decade. When it comes to the growth of the size of problems we can attack with an algorithm, we have a reversal: exponential algorithms make polynomially slow progress, while polynomial algorithms advance exponentially fast! For Moore’s law to be reflected in the world we need efficient algorithms.
Nitpick, but this is technically wrong. There needs to be a "no faster than" somewhere in there. Merge sort is, for example, included in O(2^n) by the actual definition of big O (vs big theta, which is defined as a tight bound). So there exist algorithms that are O(N) that don't grow linearly.
f(x) is O(g(x)) just means there is some constant K such that f(x) <= Kg(x) for all x beyond some value. This has nothing to do with 'worst case performance'.
Thus, most times people say O(n) on practice, they mean O(n). But most times people define O(n), they cite the Theta(n) definition.
It isn't hard to put an "up to", "no more than" or "or less" at the definition, and it does not make it threatening or hard to read.
 There's probably a small corner in Hell reserved for people that mean O(n) on average but refuse both to say that qualifier and care about what data distribution created that average.
When we say "quicksort is O(g(n))" for some g we also need to specify, or assume, the distribution of the input (i.e. best/average/worst case), and we are free to assume whatever distribution we like. Then, assuming that distribution, the running time of the algorithm is just a function of the input size, and we can use the above definition to make a statement about that function (i.e. what complexity class it is in).
A complete statement looks like "quicksort is O(n^2) in the worst case", which means "for every n, pick the list of length n that takes the longest to sort (i.e. the worst-case input), and the number of steps executed by quicksort will be no more than Kn^2 as long as n is large enough, for some fixed K".
The difference between O, Theta and Omega only concerns what functions the set contains - in particular, Theta(g(x)) is a subset of O(g(x)). This is a separate issue to whether we are talking about best/average/worst case input.
In my experience, outside of technicality-penis-measuring contests (like this thread, or a few particular minutes during the class I was leading a few hours ago), every time someone says O(foo), they're describing an algorithm that they believe to be Theta(foo). On the other hand, I've never seen someone _define_ big-Oh and accidentally define big-Theta, which is unsurprising, given that the big-Theta definition is literally twice as much work to write out as the big-Oh.
Relatedly, Theta is a highly useful concept, except for that fact that it's much easier to type O(n) and also mildly faster to whiteboard it (relative to Theta(n)). Why do you think it's almost useless? Mind you, I don't think it's useful enough to care about the fact that most non-specialists can't keep track of the difference between these complexity classes.
Also, I question your claims about Hell, and what I take to be your implication that you disapprove of both saying using big-Oh to refer to average-case without saying so, and of saying "average" without specifying a distribution or an aggregation function.
To the former, that's how people use the language, and meaning average-case is 100% as reasonable as assuming that big-Oh always means worst-case. Big-Oh is a system for dividing arbitrary functions into nested equality classes (well, I guess maybe the right name for this is inequality classes?), and <<average-case resource consumption vs input size>> is just as useful an input function as <<worst-case resource consumption vs input size>>. There's no technical reason, and no pragmatic reason, to insist otherwise.
As for distribution, if someone doesn't specify distribution, it means they're implying that it's true across all the reasonable distributions that they can think would apply, just like it always means something like that when a human leaves out a detail. Every single time I've looked into a case like this, the assumed distribution is a uniform random distribution of all possible inputs. There are always things left unsaid, why claim that this particular elision is a sin?
From higher up in the article. You picked the O(N) section description to pick a nit with the whole article when it covers exactly what you want it to earlier.
Espcially the graph which hammers home how quickly things go wrong.
The chart says you can search a Cartesian tree in O(log n) expected time, but I don't see how that's possible given that there's no ordering between a node's children.
Also, it's possible to do mergesort in O(1) space; in fact, it's one of the exercises in TAOCP, if I recall correctly. But the algorithm is so complicated that nobody uses it in practice.
> An example of an O(2^N) function is the recursive calculation of Fibonacci numbers
No, the naive recursive evaluation of the Fibonacci sequence has complexity O(1.618^N) (or just O(Fib(n)). It is unequal to O(2^N) because the base of the power is different.
This example is helpful: https://en.wikipedia.org/wiki/Big_O_notation#Use_in_computer...
Informally we tend to use big-O to mean big-theta which only adds to the confusion.
For example Quicksort is O(n^2) but Omega(nlogn) it is neither Theta(nlogn) nor Theta(n^2).
You probably meant that informally we just assume that the stated bound is as tight as possible.
No, this is flat out wrong. Big O is an upper bound on an asymptotic growth rate. Big Omega is a lower bound on the asymptotic growth rate. Big Theta is a tight bound on the asymptotic growth rate. These are independent to the average case, best case, or worst case run time of a given algorithm.
Quicksort has an average run time of Theta(n lg n). Equivalently, its average run time is O(n lg n) and Omega(n lg n). It has a worst case run time of Theta(n^2). Equivalently, its worst case run time is O(n^2) and Omega(n^2).
> Actually we don't use it informally as big-Theta,
That f is in O(g(n)) means there exist constants C and N_0 such that for all x in S, provided that n(x) >= N_0, it is true that |f(x)| <= |C g(n(x))|.
And that f is in Omega(g(n)) means there exist C > 0 and N_0 such that for all x in S, provided that n(x) >= N_0, it is true that |f(x)| >= |C g(n(x))|.
They are independent only if you explicitly state that you are computing them for best/average/worst case independently. Is it commonly done? Sure. But there is nothing in the definition that forces us to do that.
If the things were as you state would you have any use for Omega and Omicron then? Wouldn't just Theta suffice?
That's true. You're right.
> If the things were as you state would you have any use for Omega and Omicron then? Wouldn't just Theta suffice?
Couldn't there be cases where we don't have a known tight asymptotic bound but do have an upper and/or lower bound? And although it's an abuse of notation, you do often see big-O used in place of Theta. From CLRS:
"In the literature, we sometimes find O-notation informally describing asymptotically tight bounds, that is, what we have defined using Theta-notation."
The reason is that f(n) = O(g(n)) means that for some constant k and integer N, if N < n then |f(n)| < kg(n). In other words "grows no faster than a constant times". However when you've got exponential growth the slightest of things can cause the exponent to be slightly different, or there to be a small polynomial factor.
That was the case in his example. (The exponent was different.)
One I've used is sorting with a deck of cards. Take a single suit, A-K, and have people sort it using bubble or insertion or other sorts. It's only 13 cards, but the difference even between some of the O(n^2) algorithms becomes obvious when swapping is postponed until the latest stage versus earliest.
Then you can also demonstrate radix sort. One of the most underrated sorting algorithms, in my opinion. A lot of people focus on the (potentially) large constant. But it can be appropriate, or may be reduced depending on your use-case. And it's a fantastic sort when you can't give a neat numeric quality for sorting and have to sort over multiple dimensions. Again with cards, you can do two passes. One with buckets by value, and one with buckets by suit. After the second, you collect each of the 4 stacks and the deck is sorted.
Cards, phone books, recipes, these are things that most people encounter (ok, maybe not phone books anymore), in their lives. They offer great aids to demonstrating CS concepts to either new students or people who have only a passing curiosity (or none but they shouldn't have asked :P).
I've skipped some details here: string comparisons can finish early, and maybe N should be the size of the phone book rather than the number of names in the phone book -- but even if you consider these factors, the runtime complexity is still more than O(log N).
Try with mergesort and see how it ends being nlogn
(In this case the problem is probably made worse by the mental interpretation of "NP" as "Not Polynomial", when it really means "Nondeterministic machine can solve in Polynomial time", and if a deterministic machine can solve something in polynomial time, a nondeterministic machine can do so as well!)
I like how Rob Bell's piece has headings with the most common time costs--sorta makes it easier to see at a glance what you're going to get, and I imagine gives folks a quick sense of, "Right, I've seen that! Been /wondering/ what that means."
Argh, I hate this every time I see Big O notation covered.
Big O != performance. If you have an O(N) algorithm that walks the data set in the order that it's laid out in memory(assuming contiguous data) it will beat your O(NlogN) and sometimes even O(logN).
meant to omit nlogn, that's what I get for any early morning rant pre-coffee.
Radix Sort is the classic example I always bring up. On machine word size keys, with a separate pointer look-up table(to get final value) it will beat QSort, MergeSort and all the other NlogN sorts by 10-50x. This includes having to walk the data 3-4 times depending on how you want to split the radix to line up with cache sizes.
Friends don't let Friends use Big O to describe absolute performance.
The thing that Big-O knowingly hides is the constant factor. For example, some algorithms are 2N and some are 1000N. For large enough N, this may not matter, but for any reasonably sized N, it does.
In practice, of course that Big-O is not sufficient to describe absolute algorithm performance, as it depends on so many factors such as implementation, computer architecture, relative size of data, etc. But, when looking at an algorithm and knowing its Big-O complexity can often be a useful input to your decision on whether to use it. So I disagree with this Big-O bashing...
Not sure what you're trying to say with this particular statement. Of course it will. NLogN grows way faster than N.
And Radix sort is of course drastically faster than comparison sorts if the word size is smaller than logN regardless of the particularities of the implementation.
People don't and shouldn't use big O to describe absolute performance but it's a great place to start and you can't reach the level of implementation details performance tuning if you can't understand or formalize the basics.
But you do have a point overall.
In practice, performance may be different. Big O assumes a particular 'virtual machine' with a particular word size and a particular time it takes to execute an instruction.
My meta point is the constant factors that Big O throws out usually end up dominating real-world performance.
This does not even start on the reality that memory access really is O(n^(1/3)), addition is O(n) and a lot of other things we assume are O(1) are not.
The point I was trying(and failed) to drive home is that it matters how you walk your N not just what the N is.
If you've got to do a pointer indirection for each N rather than a linear read that can end up dominating the non-constant factors in a lot of cases.
> If you've got to do a pointer indirection for each N rather than a linear read that can end up dominating the non-constant factors in a lot of cases.
Huh? Pointer indirection and cache miss for each element is already non-constant (you have to do it N times) so I don't understand what it means that this "can end up dominating the non-constant factors".
Take the linked-list case for example. Let's say the allocator you have is a small-block allocator, of which most good malloc implementations use for linked-list style allocations.
Since the SBA allocates in chunks your cache locality(driving your "constant-factor" performace) isn't linear any more. It'll change in jumps & starts as you break chunk boundraries. Also another class allocating in-between your linked list allocations(really common) will break up the chunk locality and now your real-world O(N) performance no longer scales linearly.
There's nothing wrong with that sentence. No absolutes.
Radix sort is a linear time sort because the hardware (random access memory) is built to allow constant-time lookup of memory. Radix sort wouldn't be linear time on a sequential access memory system, like tape, but merge sort would still be O(NlgN) on such a system.
Under this assumption, a correctly implemented quicksort also runs in O(N) time.
BigO has nothing to do with performance even though that what we most use it for.
Its simply a classification of growth characteristics, relationship between two functions.