
A beginner's guide to Big O notation (2009) - g4k
https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
======
krat0sprakhar
On the topic of efficient algorithms, I recently read a nice note in an
Algorithms textbook -

> _It would appear that Moore’s law provides a disincentive for developing
> polynomial algorithms. After all, if an algorithm is exponential, why not
> wait it out until Moore’s law makes it feasible? But in reality the exact
> opposite happens: Moore’s law is a huge incentive for developing efficient
> algorithms, because such algorithms are needed in order to take advantage of
> the exponential increase in computer speed._

 _Here is why. If, for example, an O(2n) algorithm for Boolean satisfiability
(SAT) were given an hour to run, it would have solved instances with 25
variables back in 1975, 31 variables on the faster computers available in
1985, 38 variables in 1995, and about 45 variables with today’s machines.
Quite a bit of progress—except that each extra variable requires a year and a
half’s wait, while the appetite of applications (many of which are,
ironically, related to computer design) grows much faster. In contrast, the
size of the instances solved by an O(n) or O(n log n) algorithm would be
multiplied by a factor of about 100 each decade. In the case of an O(n2)
algorithm, the instance size solvable in a fixed time would be multiplied by
about 10 each decade. Even an O(n6) algorithm, polynomial yet unappetizing,
would more than double the size of the instances solved each decade. When it
comes to the growth of the size of problems we can attack with an algorithm,
we have a reversal: exponential algorithms make polynomially slow progress,
while polynomial algorithms advance exponentially fast! For Moore’s law to be
reflected in the world we need efficient algorithms._
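
A rough sanity check of those figures, as a sketch: assume compute grows by about 100x per decade (roughly a doubling every 18 months) and ask what instance size fits in a fixed time budget.

    import math

    # Sketch: largest instance solvable in a fixed time budget if available
    # compute multiplies by ~100x per decade (roughly Moore's law).
    budget_1975 = 2.0 ** 25          # chosen so a 2^n algorithm handles n = 25 in 1975
    for year in (1975, 1985, 1995, 2005):
        budget = budget_1975 * 100 ** ((year - 1975) // 10)
        print(year,
              "2^n:", int(math.log2(budget)),   # gains ~6-7 variables per decade
              "n^2:", int(math.sqrt(budget)),   # multiplies by ~10 per decade
              "n:",   int(budget))              # multiplies by ~100 per decade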

~~~
fenomas
In the quote, O(2n) and O(n6) are meant to be O(2^n) and O(n^6), right?

~~~
krat0sprakhar
Yup, that's correct. Sadly, I can't update the comment now :(

------
mattlutze
Looks like someone finally got it to the front page:

[https://news.ycombinator.com/from?site=rob-bell.net](https://news.ycombinator.com/from?site=rob-bell.net)

~~~
emodendroket
Looks like the asymptotic complexity of getting this article to the frontpage
is pretty high.

------
Lxr
> O(N) describes an algorithm whose performance will grow linearly and in
> direct proportion to the size of the input data set

Nitpick, but this is technically wrong. There needs to be a "no faster than"
somewhere in there. Merge sort is, for example, included in O(2^n) by the
actual definition of big O (vs big theta, which is defined as a tight bound).
So there exist algorithms that are O(N) that don't grow linearly.

f(x) is O(g(x)) just means there is some constant K such that f(x) <= Kg(x)
for all x beyond some value. This has nothing to do with 'worst case
performance'.
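
Spelled out (just restating that definition, with merge sort's n log n cost as a deliberately loose example):

    \[
      f(x) \in O(g(x)) \iff \exists\, K,\ x_0 \ \text{such that}\ f(x) \le K\, g(x) \ \text{for all } x \ge x_0
    \]
    % e.g. n \log_2 n \le 1 \cdot 2^n for every n \ge 1, so merge sort's
    % n \log n running time is (unhelpfully, but correctly) in O(2^n).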

~~~
umanwizard
You'll be a lot happier if you mentally replace O(n) with Theta(n) every time
you see it, rather than feeling the need to nitpick when you know that that's
what people mean when they say O(n), 99% of the time.

~~~
marcosdumay
Problem is that Theta(n) is an almost useless notation.

Thus, most of the time when people say O(n) in practice[1], they mean O(n). But most of the time when people define O(n), they cite the Theta(n) definition.

It isn't hard to put an "up to", "no more than" or "or less" in the definition, and it does not make it threatening or hard to read.

[1] There's probably a small corner in Hell reserved for people who mean O(n) _on average_ but refuse both to say that qualifier and to care about what data distribution created that average.

~~~
Lxr
There seems to be a misconception here that this has something to do with
best/average/worst case input. "f(x) is O(g(x))" is a purely mathematical
statement about two functions, and means that f(x) is smaller than or equal to
(K times) g(x) for all x beyond some threshold. I.e. O(g(x)) is a set of
functions that includes f(x).

When we say "quicksort is O(g(n))" for some g we also need to specify, or
assume, the distribution of the input (i.e. best/average/worst case), and we
are free to assume whatever distribution we like. Then, assuming that
distribution, the running time of the algorithm is just a function of the
input size, and we can use the above definition to make a statement about that
function (i.e. what complexity class it is in).

A complete statement looks like "quicksort is O(n^2) in the worst case", which
means "for every n, pick the list of length n that takes the longest to sort
(i.e. the worst-case input), and the number of steps executed by quicksort
will be no more than Kn^2 as long as n is large enough, for some fixed K".

The difference between O, Theta and Omega only concerns what functions the set
contains - in particular, Theta(g(x)) is a subset of O(g(x)). _This is a separate issue from whether we are talking about best/average/worst case input._
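
To make the "worst case" part concrete, here's a toy sketch (Python; a first-element pivot is assumed, so an already-sorted list is the worst-case input):

    import random

    def quicksort_compares(xs):
        """Count pivot comparisons for a first-element-pivot quicksort
        (one comparison per non-pivot element per partitioning step)."""
        if len(xs) <= 1:
            return 0
        pivot, rest = xs[0], xs[1:]
        less = [x for x in rest if x < pivot]
        more = [x for x in rest if x >= pivot]
        return len(rest) + quicksort_compares(less) + quicksort_compares(more)

    n = 500
    print("worst case (sorted):  ", quicksort_compares(list(range(n))))              # ~n^2/2
    print("average case (random):", quicksort_compares(random.sample(range(n), n)))  # ~2n ln n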

------
njharman
I like this page [http://bigocheatsheet.com/](http://bigocheatsheet.com/)

Especially the graph, which hammers home how quickly things go wrong.

~~~
rawnlq
That table is so wrong it makes me mad.

~~~
teraflop
In what way? Having skimmed it, I see a couple of minor mistakes but the bulk
of it seems correct.

For example, the chart says you can search a Cartesian tree in O(log n) expected time, but
I don't see how that's possible given that there's no ordering between a
node's children.

Also, it's possible to do mergesort in O(1) space; in fact, it's one of the
exercises in TAOCP, if I recall correctly. But the algorithm is so complicated
that nobody uses it in practice.

------
nayuki
The article is mostly good, but I have one nitpick.

> An example of an O(2^N) function is the recursive calculation of Fibonacci
> numbers

No, the naive recursive evaluation of the Fibonacci sequence has complexity O(1.618^N) (or just O(Fib(N))). It is _unequal_ to O(2^N) because the base of the power is different.
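
Easy to check empirically; a quick sketch that counts calls rather than timing anything:

    def fib_calls(n):
        """Total number of calls made by the naive recursive fib(n)."""
        if n <= 1:
            return 1
        return 1 + fib_calls(n - 1) + fib_calls(n - 2)

    phi = (1 + 5 ** 0.5) / 2                 # ~1.618
    for n in (10, 20, 25, 30):
        calls = fib_calls(n)
        # calls / phi^n settles near a constant, while calls / 2^n shrinks
        # toward 0: the work grows like 1.618^n, not 2^n.
        print(n, calls, round(calls / phi ** n, 3), round(calls / 2 ** n, 6))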

~~~
Scea91
Technically 1.618^N is still O(2^N) but I agree with your point.

~~~
mantasm
I was about to correct you until I realized you used big-O instead of big-theta notation.

Informally we tend to use big-O to mean big-theta which only adds to the
confusion.

~~~
Scea91
Actually we don't use it informally as big-Theta, because big-Theta assumes that the lower and upper bounds are asymptotically the same.

For example, Quicksort is O(n^2) but Omega(n log n); it is neither Theta(n log n) nor Theta(n^2).

You probably meant that informally we just assume that the stated bound is as
tight as possible.

~~~
anseladdams
> _For example, Quicksort is O(n^2) but Omega(n log n); it is neither Theta(n log n) nor Theta(n^2)._

No, this is flat out wrong. Big O is an upper bound on an asymptotic growth
rate. Big Omega is a lower bound on the asymptotic growth rate. Big Theta is a
tight bound on the asymptotic growth rate. These are independent of the average-case, best-case, or worst-case run time of a given algorithm.

Quicksort has an average run time of Theta(n lg n). Equivalently, its average
run time is O(n lg n) and Omega(n lg n). It has a worst case run time of
Theta(n^2). Equivalently, its worst case run time is O(n^2) and Omega(n^2).

> _Actually we don 't use it informally as big-Theta,_

Wrong again.

~~~
SamReidHughes
You can state that quicksort has run time O(n^2). The function you're describing with O(n^2) is f : S -> R, where S is the set of input values, R is the set of real numbers, f(x) is the running time of quicksort on input value x, and n : S -> N gives the _size_ of input value x. The notation O(g(n)) describes the function f in terms of the function n.

That f is in O(g(n)) means there exist constants C and N_0 such that for all x
in S, provided that n(x) >= N_0, it is true that |f(x)| <= |C g(n(x))|.

And that f is in Omega(g(n)) means there exist C > 0 and N_0 such that for all
x in S, provided that n(x) >= N_0, it is true that |f(x)| >= |C g(n(x))|.

------
btilly
Common mistake. When people say O(2^n) they USUALLY mean something closer to
2^O(n).

The reason is that f(n) = O(g(n)) means that for some constant k and integer
N, if N < n then |f(n)| < kg(n). In other words "grows no faster than a
constant times". However when you've got exponential growth the slightest of
things can cause the exponent to be slightly different, or there to be a small
polynomial factor.

That was the case in his example. (The exponent was different.)
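
A couple of concrete examples of the gap (my own, not from the article):

    % 2^{O(n)} is a much coarser class than O(2^n). For instance
    \[
      4^n = 2^{2n} \in 2^{O(n)}, \quad \text{but } 4^n \notin O(2^n)
      \text{ since } 4^n / 2^n = 2^n \text{ is unbounded.}
    \]
    % Likewise n \cdot 2^n \in 2^{O(n)} \setminus O(2^n),
    % while 1.618^n \in O(2^n) \subseteq 2^{O(n)}.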

------
minikites
The best layman's example of O(log N) I've heard is finding a name in the
phone book. You open it around the middle, select which half gets you closer,
and repeat. It's then easy to "get" why it's not a big deal if it's a huge
phone book versus a tiny phone book.
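
Roughly what the phone-book trick looks like in code (a sketch with made-up names; the step count is the point):

    def phone_book_lookup(names, target):
        """Binary search over a sorted list; returns (index or None, pages flipped)."""
        lo, hi, steps = 0, len(names) - 1, 0
        while lo <= hi:
            steps += 1
            mid = (lo + hi) // 2
            if names[mid] == target:
                return mid, steps
            elif names[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return None, steps

    # A 1000x bigger "phone book" only costs ~10 extra halvings (log2(1000) ~ 10).
    small = [f"name{i:07d}" for i in range(1_000)]
    huge = [f"name{i:07d}" for i in range(1_000_000)]
    print(phone_book_lookup(small, "name0000999")[1])   # ~10 steps
    print(phone_book_lookup(huge, "name0999999")[1])    # ~20 steps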

~~~
noobiemcfoob
That seems more like a layman's explanation of a binary search than of O(log N). Binary search just happens to be O(log N).

~~~
minikites
I'm not sure how drawing that distinction helps a layman.

~~~
noobiemcfoob
Clarifying something as an example of a principle instead of the principle
itself seems pretty important, regardless of lay status.

~~~
minikites
Ah, when you phrase it that way I see what you mean, I agree.

------
nathan_long
Nice! Very succinct and clear. I wrote an intro myself, as someone who'd
recently learned the concept. Wordier, but might be helpful if you think like
I do. [http://nathanmlong.com/2015/03/understanding-big-o-notation/](http://nathanmlong.com/2015/03/understanding-big-o-notation/)

------
gameguy43
Super interesting comparing this to the one we have on Interview Cake:
[https://www.interviewcake.com/article/big-o-notation-time-an...](https://www.interviewcake.com/article/big-o-notation-time-and-space-complexity)

I like how Rob Bell's piece has headings with the most common time costs--
sorta makes it easier to see at a glance what you're going to get, and I
imagine gives folks a quick sense of, "Right, I've seen that! Been /wondering/
what that means."

------
curiousDog
I think the nicest way to learn this if you don't have a formal CS education is still to pick up a Discrete Math textbook (like Chapter 3 in Rosen) and then read chapters 3 and 4 in CLRS.

~~~
stcredzero
If something puzzles you, don't be afraid to start drawing yourself a diagram.
O(n^2) vs. O(n) amortized for naive string concatenation vs. growing a buffer
by doubling:

    
    
      x
      xx
      xxx
      xxxx
      xxxxx
      xxxxxx
      xxxxxxx
      xxxxxxxx
    
      x
      xx
      xxxx
      xxxxxxxx
    

This can be a good way of getting an intuitive feel for what's going on. Note that in the 2nd case, you can stack the first two rows (representing cost) into the 3rd row. In fact, you can always stack the execution costs of rows 1 through (n-1) into row n and never exceed the size of row n.
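
The same picture in code, counting character copies rather than timing anything (a sketch):

    def naive_append_copies(n):
        """Character copies needed to build a length-n string by repeated
        concatenation, where each append recopies the existing prefix."""
        copies, length = 0, 0
        for _ in range(n):
            copies += length + 1    # old contents recopied, plus the new char
            length += 1
        return copies               # ~n^2/2 character copies in total

    def doubling_buffer_copies(n):
        """Character copies needed when growing a buffer by doubling, where
        old contents are recopied only on resize."""
        copies, capacity, length = 0, 1, 0
        for _ in range(n):
            if length == capacity:
                copies += length    # recopy everything into the doubled buffer
                capacity *= 2
            copies += 1             # write the new char
            length += 1
        return copies               # <= ~3n: O(1) amortized per append

    for n in (1_000, 10_000, 100_000):
        print(n, naive_append_copies(n), doubling_buffer_copies(n))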

~~~
curiousDog
Looks like you're alluding to the recurrence tree method for solving
recurrences mentioned in CLRS. And you're right, it's a great way to visualize and get an intuitive feel for big-O.

------
emodendroket
It would be sensible to put terms like "constant" and "logarithmic" in there, IMO.

------
kevindeasis
Is there a beginner's guide to proving different time complexities?

------
vvanders
> O(N) describes an algorithm whose performance will grow linearly and in
> direct proportion to the size of the input data set.

Argh, I hate this every time I see Big O notation covered.

Big O != performance. If you have an O(N) algorithm that walks the data set in the order that it's laid out in memory (assuming contiguous data), it _will_ beat your O(NlogN) and sometimes even O(logN).

[edit] Meant to omit NlogN; that's what I get for an early-morning rant pre-coffee.

Radix Sort is the classic example I always bring up. On machine-word-size keys, with a separate pointer look-up table (to get the final value), it will beat QSort, MergeSort and all the other NlogN sorts by 10-50x. This includes having
to walk the data 3-4 times depending on how you want to split the radix to
line up with cache sizes.

Friends don't let Friends use Big O to describe absolute performance.
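
For anyone curious, a minimal LSD radix sort sketch on word-sized integer keys (a toy byte-at-a-time version in Python, not the tuned cache-aware variant described above):

    import random

    def radix_sort(keys, key_bytes=4):
        """LSD radix sort for non-negative integers: one stable pass per byte,
        each pass a straight linear walk over the data."""
        for shift in range(0, 8 * key_bytes, 8):
            buckets = [[] for _ in range(256)]
            for k in keys:
                buckets[(k >> shift) & 0xFF].append(k)
            keys = [k for bucket in buckets for k in bucket]
        return keys

    data = [random.getrandbits(32) for _ in range(100_000)]
    assert radix_sort(data) == sorted(data)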

~~~
decebalus1
> If you have an O(N) algorithm that walks the data set in the order that it's
> laid out in memory (assuming contiguous data), it will beat your O(NlogN)

Not sure what you're trying to say with this particular statement. Of course
it will. NLogN grows way faster than N.

And Radix sort is of course drastically faster than comparison sorts if the
word size is smaller than logN regardless of the particularities of the
implementation.

People don't and shouldn't use big O to describe absolute performance but it's
a great place to start, and you can't reach the level of implementation-detail performance tuning if you can't understand or formalize the basics.

But you do have a point overall.

In practice, performance may be different. Big O assumes a particular 'virtual
machine' with a particular word size and a particular time it takes to execute
an instruction.

~~~
vvanders
Yeah, meant to omit NlogN there, I blame it on lack of coffee.

My meta point is that the constant factors Big O throws out usually end up dominating real-world performance.

~~~
copperx
That's why some Data Structures courses use tilde notation instead (same as big O, but the constant is preserved).
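
For reference, the tilde convention (as used in Sedgewick and Wayne's Algorithms), loosely stated:

    % f(n) ~ g(n)  means  f(n) / g(n) -> 1  as  n -> infinity,
    % so the leading constant survives, e.g.
    \[
      \tfrac{1}{2} n (n-1) \sim \tfrac{1}{2} n^2,
      \qquad \text{whereas O-notation only records } \tfrac{1}{2} n (n-1) \in O(n^2).
    \]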

