
Big-O notation explained by a self-taught programmer - justinlilly
http://justin.abrah.ms/computer-science/big-o-notation-explained.html
======
YZF
The math of Big-O isn't that hard, and the article, while well-intentioned,
misses the point. Big-O is about asymptotic behaviour, and the graphs are
misleading in that regard (well, they're simply wrong, not misleading). There
are algorithms where, if you just look at the Big-O, you'd think one has a
faster run time than the other, but because of the constants that Big-O
discards, that wouldn't be the case for any practical problem size.

What O(N) means is that there is some large enough number such that the
run time (edit: or any function, really) is bounded by a constant times the
input size for any input size larger than that number (see, math, not hard).
That constant may be so large that an O(N^2) algorithm is the better solution
for any practical purpose.
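
In symbols, that's all this says:

    f ∈ O(N)  iff  there exist C > 0 and N0 such that f(n) <= C*n for all n > N0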

EDIT: As an example of this, we can look at two multiplication algorithms,
Karatsuba and Schönhage–Strassen. The latter has better asymptotic
performance, but that only kicks in once the numbers are large enough
(100,000 digits or so). (
[http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen...](http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm)
)

~~~
forsaken
As a data point, I have no idea what you just said.

~~~
kyllo
He's saying that big O only matters for big input sizes, because big O is
specifically about the algorithm's asymptotic performance, meaning its
performance for large values of n. If you have a small value of n, then the
algorithm's other constant-time operations may affect the running time more.

n^2 is less than n^3 right?

But is 2 * n^2 + 100 less than n^3?

Depends on how big n is, right?

Big O notation just assumes that n is big enough to make the answer "yes."
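
You can watch the crossover happen with a few throwaway lines of Python:

    # Compare 2*n^2 + 100 against n^3 as n grows.
    for n in (1, 2, 5, 6, 10, 100):
        f = 2 * n**2 + 100   # the "O(n^2)" cost, constants included
        g = n**3             # the "O(n^3)" cost
        print(n, f, g, "f < g" if f < g else "g <= f")
    # Up to n = 5 the cubic is actually cheaper; from n = 6 on, the
    # quadratic wins forever. Big O only describes that second regime.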

~~~
tsahyt
More or less, yes. What he is saying is that there is a constant hidden in the
big O. Say we have two algorithms A and B whose runtimes can be bounded by
the functions a(n) = 1000n^2 and b(n) = 0.001n^3 respectively. Hence a ∈
O(n^2) and b ∈ O(n^3), so algorithm A is clearly faster asymptotically.
However, suppose we have input sizes of around n = 50000; it actually turns
out that algorithm B is faster, because b(50000) < a(50000). This is because
of the constant that is hidden in the big O. For sufficiently large n,
however (n > 1000000 in this case), algorithm A will always be faster.

A good implementation could check problem sizes beforehand and choose the
algorithm accordingly.
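
Here's a minimal sketch of that in Python (the cutoff and the insertion-sort
fallback are illustrative; real sort implementations do something very
similar for small inputs):

    import random

    CUTOFF = 32  # illustrative; real cutoffs are measured, not guessed

    def insertion_sort(a):
        # O(n^2), but with a very small constant: great for tiny inputs.
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a

    def hybrid_sort(a):
        # Check the problem size beforehand and choose accordingly.
        if len(a) < CUTOFF:
            return insertion_sort(a)
        return sorted(a)  # O(n log n), but a larger constant per element

    print(hybrid_sort([random.random() for _ in range(100)])[:3])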

What I don't agree with is that "big O only matters for big input sizes".
"Big" is not really a well-defined term. The problem here is that "big"
depends entirely on the algorithms. It might also be n > 10. There's nothing
in the definition of the Landau symbols that prevents that.

~~~
kyllo
The definition of a "big" input size is somewhat circular, yes.

"The big O performance of an algorithm only matters for a sufficiently large
value of n."

"Define a sufficiently large value of n."

"Large enough that the big O performance of the algorithm starts to matter
more than the coefficients of n and constant time operations."

So yes, that could be 10 or 10 million, depending on the nature of the
problem, the constants and coefficients of the algorithm, the language, the
hardware, etc. You could have an algorithm that takes factorial time, but
maybe you're only using it in a problem domain where n is always < 10, so
you'll probably never notice or care that it's O(n!).

------
sillysaurus
This sort of contributes to giving self-taught programmers a rather bad name.
The writeup is good, but the idea that Big O is "scary" is just absurd. It's
an elementary concept that every working programmer should be familiar with,
regardless of whether they're self-taught. Algorithms are not "scary". If you
can't reason about algorithms, you may not be a very good programmer yet.

To be clear, I really appreciate the writeup. I just wish it had been framed
better. It should be clear that this is for beginner programmers, regardless
of whether they have a degree or whether they're self-taught.

~~~
cocokr1sp
Ahh yes. Let's berate the OP for being intimidated by a topic and then diving
in and learning it on their own. This will really encourage others to learn on
their own and contribute back.

~~~
sillysaurus
Well, whether we like it or not, self-taught programmers are held to a higher
standard. It doesn't help us to further the stereotype that self-taught
programmers are afraid of the basics, haven't attained a general education in
computer science on their own, or are less reliable than their peers who have
degrees.

Not trying to berate the OP. I'm trying to say I wish OP had framed it better.

------
yid
Unfortunately, there are some misconceptions propagated in this article.
Kudos on the effort, but some statements are just flat-out wrong, such as
this one: "Big-O is all about the approximate worst-case performance". Big-O
has nothing to do with worst case; it is a _bounding_ function. An O(n)
algorithm is also O(n^2), O(2^n), etc. Those are valid bounds on the O(n)
algorithm, just not the smallest.
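
To see that with concrete witnesses (a throwaway check; the function is made
up): take f(n) = 3n + 5, which is O(n).

    def f(n):
        return 3 * n + 5

    # f is O(n):   f(n) <= 4*n  for all n >= 5.
    # f is O(n^2): f(n) <= n*n  for all n >= 5.
    # Both bounds are valid; the first is just the smallest.
    assert all(f(n) <= 4 * n for n in range(5, 10000))
    assert all(f(n) <= n * n for n in range(5, 10000))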

~~~
baddox
Note that there are other bounding functions, like bounds from below (which
still isn't the same thing as worst-case). See
[https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachm...](https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations).

Speaking of worst-case (or best-case, average-case, etc.) scenarios, how does
big O notation relate? The variables inside an O() notation as far as I know
refer only to the size of the input, so when we say that finding a value in an
unsorted list is in O(n), we're referring to the worst-case scenario
(obviously, finding a value in a list when that value is the head of the list
is constant time, and not very interesting). Of course, that's a simplistic
example, but with more complex algorithms like Quicksort, when we say it's in
O(n log n) we're talking about average-case. Is this just because we know that
worst-case performance in Quicksort is _exceedingly_ rare so we don't bother
mentioning that O(n log n) is average-case unless we're studying it more
deeply?

~~~
jfarmer
Here's an answer of mine on Quora you might find useful:
[https://www.quora.com/Algorithms/How-can-I-determine-
whether...](https://www.quora.com/Algorithms/How-can-I-determine-whether-
f-n-%CE%A9-g-n-f-n-theta-g-n-or-f-n-O-g-n-for-the-functions)

There are two things going on.

First, when we talk about Big-O we're not talking about the "worst case
scenario." Big-O gives us an upper bound on the worst case scenario, but the
actual worst case scenario might be better. Big-O means "no worse than" not
"as bad as."

When most people say Big-O they really mean Big-Θ, which does encapsulate the
idea of "asymptotically equivalent, up to a constant." See my answer on Quora
for more technical details.

Second, Big-O and other forms of notation used in the asymptotic analysis of
functions were invented before physical computers existed. They're statements
about pure functions.

When applied to the analysis of algorithms the function we're "really"
analyzing isn't the algorithm. Rather, if we have an algorithm A that takes as
its input a positive integer n, we're really analyzing the function "the
length of time it takes algorithm A to run given input n."

The up-to-a-constant nature of Big-O notation is nice because that constant
can encapsulate things like processor speed, memory access times, and so
forth. This enables us to make intelligent statements about algorithms per se
without reference to the underlying machine on which the algorithm might be
implemented.

Even with ideal functions, this naïve asymptotic analysis has some problems.
For a toy example, imagine a spiky function like this:

    
    
      f(n) = 800*n^2 if n is divisible by 1000000000
      f(n) = 400*n   otherwise

This function is not O(n) but it is O(n^2). The "worst case" behaves like
O(n^2), but for "most inputs" it behaves like O(n). We can't say "f(n) is
asymptotically no worse than n, up to a constant" because for infinitely many
inputs it is worse.
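
A quick numeric check of that, just evaluating the function at the spikes:

    def f(n):
        return 800 * n**2 if n % 1000000000 == 0 else 400 * n

    # Along multiples of 10^9, f(n)/n = 800*n grows without bound, so no
    # constant K gives f(n) <= K*n for all large n: f is not O(n).
    for k in (1, 2, 3):
        n = k * 1000000000
        print(n, f(n) / n)  # 8e11, 1.6e12, 2.4e12, ...
    # But f(n) <= 800*n^2 everywhere, so f is O(n^2).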

Lots of algorithms behave like this in practice because we optimize for common
cases perhaps at the expense of less common cases. "Common" is dictated by how
our algorithm is used.

Taking quicksort as an example, let's call the algorithm Q. We want to measure
its running time given an input of length n. For input x, let's say its
running time is T(x).

Well, there are many x such that len(x) == n, so what does it even mean to say
"its running time given an input of length n?" Are we given a particular input
of length n? A uniformly-selected-but-random input of length n? To the extent
that we can, we want to be making statements about the algorithm per se, not
statements about the algorithm given a particular input.

One way to answer this is to ask "Given an input of length n, what's the most
time my algorithm could take?" In that case we're analyzing the following
function:

    
    
        W(n) = max { T(x) | x is valid input and len(x) == n }
    

On the other hand, maybe we care more about the average case. Perhaps we
randomly pick 1,000 inputs of length n and average the running time. Now we're
talking about something that looks more like a probability distribution than a
discrete thing like "running time" because we've sampled the input space.

And in fact, we could calculate "expected running time given input n" in this
way and graph that. We could then make Big-O like statements about that new
function, which is the kind of thing folks mean when they talk about "average
case."

Hope that helps!

~~~
baddox
It sounds to me like you're mixing up two things: "worst case performance" and
"asymptotic performance." You say

> Big-O is concerned with worst case performance. Colloquially, f∈O(g) means
> that "Eventually, f performs no worse than g."

Aren't these two different concepts? Worst case performance deals with the
worst possible input of any given input size (like pathological inputs to
Quicksort), while asymptotic performance is talking about _sufficiently large
inputs_ (like the vertical lines you've drawn on the graphs).

When I say that merge sorting a set of n elements is in O(n lg n), I'm saying
that there's some value on the n axis beyond which n lg n >= MergeSort(n). But
when I say that Quicksort is O(n^2) in the worst case, it's as if I'm talking
about another function called WorstQuicksort, which when given a set of n
items always takes as long as Quicksort would take to sort the most
pathological set of n items, and there is some value on the n axis beyond
which n^2 >= WorstQuicksort(n).

~~~
jfarmer
Let's be precise. I'm being more precise here in my comment than I was on
Quora.

Big-O and related notations are ways of categorizing functions. O(n^2) for
example is actually a set of functions, which is why I wrote f ∈ O(n^2) rather
than something like f = O(n^2) or f(n) = O(n^2). That is, f is a member of
some set of functions which all satisfy a particular, precisely-defined
property.

To understand that property, first, let's get rid of the idea of "performance"
because asymptotic analysis has nothing to do with "performance" _per se_ and
predates even the first precise definitions of things like "algorithm" or
"computability." The notation itself was invented in the late 19th century.

Instead, let's just talk about "upper bounds." If we have a function it's easy
to talk about upper bounds. For example,

    
    
      f(x) = sin(x)
    

is bounded above by 1, 1.5, 10, 100, 80457, and an infinitude of other numbers
for any real number x. It's bounded below by -1.

Now, in this case, it's easy for us to see that not only is

    
    
      sin(x) <= 1 for all real x
    

but also that

    
    
      max { sin(x) : x is real } = 1
    

So in this sense the upper bound of 1 is tight. 2 is also an upper bound in
the sense that

    
    
      sin(x) <= 2 for all real x
    

but it's not tight. There are other upper bounds which are strictly smaller
than 2, e.g., 1.5. So we can say that "the value of sin(x) for real x is no
greater than 2," but we can't say that it "is" 2.

So, to answer your point before diving deeper, Big-O is about "worst case
performance" in this sense. By itself it doesn't tell us what the worst case
performance is. Instead, it gives us an upper bound on the worst case
performance. It says "the worst case performance is no worse than FOO." The
actual worst case performance might be better.

Big-Θ is the asymptotic equivalent to "this is a tight upper bound."

I'll skip further development of this for now and jump back to the issue of
algorithms. The issue is this: given an algorithm with input of length N, we
want to say something about how long it takes to run.

This means that the function we're analyzing isn't "QuickSort(n)". What would
that even mean? The input of QuickSort is an array of integers and it returns
a sorted array of integers. How can an array of anything be greater than or
equal to n^2? So that's one way in which the CS vernacular equivocates: we're
not really talking about QuickSort, we're talking about some other function:

    
    
      T(n) = the amount of time it takes QuickSort to run given an input of length n
    

We're then talking about bounds on this other function T, asymptotic or
otherwise.

But now we're in a pickle because what does "the amount of time it takes
QuickSort to run given an input of length n" mean? There are many inputs of
length n. If we're talking about just arrays of integers of length n, there
are n! if all we care about is relative ordering and not the actual values in
the array. If we care about the actual values in the array then there are an
infinitude of inputs of length n.

There are a few ways we can handle this. Let's re-define T(n) like so:

    
    
      T(x) = the amount of time it takes QuickSort to run given input x
    

One way is the "worst case" method. This says, ok, look at this function:

    
    
      W(n) = max { T(x) : x is a valid input to QuickSort and len(x) == n }
    

We can now do Big-O, bounds, asymptotic analysis, etc. on W(n). This is what
we mean when we say the worst case is O(n^2). It means W ∈ O(n^2).

Another way is the "average case" method. This says, ok, look at this
function:

    
    
      A(n) = avg { T(x) : x is a valid input to QuickSort and len(x) == n }
    

This is tricky if there are in principle an infinite number of valid inputs of
a given length. There are various ways of handling this issue. For something
like QuickSort we can see that it's really only the ordering that matters,
i.e., for the purposes of QuickSort [1,10,5] is the same operation-wise as
[-50, 80, 0], so there are only n! inputs we really need to check for a given
n.

Yet another way is the "best case" method, which looks at

    
    
      B(n) = min { T(x) : x is a valid input to QuickSort and len(x) == n }
    

So, given an algorithm we can derive these three functions and then answer
Big-O questions about them. We're never answering Big-O questions about the
algorithm _per se_ , although we can get away with equivocating when W(n),
A(n), and B(n) are always equal or it's obvious we only care about one of
them.

For simple examples this is often the case, e.g., calculating the nth
Fibonacci number in the standard iterative way has best, average, and worst
case performance of O(n).
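
If you want to see W(n), A(n), and B(n) for a real algorithm, here's a toy
sketch: a first-element-pivot quicksort with comparisons standing in for
running time, exhaustively checked over all n! orderings (using the
observation above that only the ordering matters):

    from itertools import permutations

    def comparisons(a):
        # Comparisons made by a first-element-pivot quicksort on input a.
        if len(a) <= 1:
            return 0
        pivot, rest = a[0], a[1:]
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return len(rest) + comparisons(left) + comparisons(right)

    for n in range(2, 8):
        costs = [comparisons(list(p)) for p in permutations(range(n))]
        W = max(costs)               # worst case: grows like n^2
        A = sum(costs) / len(costs)  # average case: grows like n log n
        B = min(costs)               # best case
        print(n, W, A, B)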

To make matters worse, most people say Big-O but mean Big-Θ, or at the very
least aren't clear when they mean one or the other. So, when one says "worst
case performance" and we have W(n), A(n), and B(n) all being the same, it can
be particularly confusing.

Depending on the algorithm in question, it might be understood that we care
about one of these more than the others. For example, if worst case inputs
are particularly pathological we might talk as if we mean the performance of
the algorithm per se but really be talking about A(n). However, if "bad"
inputs are common we might really be talking about W(n).

~~~
anonymoushn
f∈Θ(g) is equivalent to f∈O(g)∧g∈O(f). This isn't really a "tight upper
bound." For your spikey function f above, we have f∈O(n^2) and f∈Ω(n). These
bounds are tight in the sense that there are no strictly tighter bounds. That
is, there exists no function g with f∈O(g)∧g∈o(n^2) (there is no function
asymptotically smaller than n^2 that bounds f from above). So for this f one
could reasonably consider n^2 to be a "tight upper bound", although one would
need to pick a spikey bounding function to say anything involving f and Θ.

The problem of having an infinite number of inputs of size n is usually not a
problem, because there are finitely many bit strings of each length. If an
algorithm uses subroutines that act on input objects of unbounded length (like
comparing arbitrary-precision integers), and you are only interested in the
bound for the number of subroutine calls, then there might be some trouble
with the notion of an average case across a fixed input size. This is a bit
silly though; it's more a way to define "fixed input size" into describing
"unboundedly large input size" than something I would actually want to do for
some useful purpose.

~~~
jfarmer
Er, well, I was using the function as an example of something which is
f∈O(n^2) but not f∈Θ(n^2) under the assumption that when the OP reads the
symbols "f∈O(n^2)" he's really thinking something more like f∈Θ(n^2).

------
tbrownaw
_O(N) is read "Order of N" because the O function is also known as the Order
function. I think this is because we're doing approximation, which deals in
"orders of magnitude"._

It's a different meaning of "order", one that has to do with the shape of the
size-vs-time curve. It's the same meaning as the order of a polynomial ("x"
is linear or 1st order, "x^2" is quadratic or 2nd order, etc.).

 _but Big-O is all about the approximate worst-case performance of doing
something._

It can be, but I think it's more commonly taken to mean the average case. One
example is quicksort, which is O(n*log(n)) on average but can be quadratic
(n^2) in the worst case.

~~~
justinlilly
Thanks for your comment. I must admit that my lack of mathematical background
makes your clarification difficult to understand. I've not heard the term
"order of a polynomial". If you wrote up a post and emailed me at
justin@abrah.ms explaining the concept, I'd happily link it in this article.
:)

~~~
Ixiaus
I'm self-taught too. Pick up Knuth's books and start going through them; stop
and learn the math if you don't understand it. His first book in particular
begins with some mathematical foundations that will dramatically change how
you reason about your programs.

This stuff isn't scary. It's thick but learnable and you can even develop an
appreciation for its elegance.

------
alanctgardner2
The post is kind of misleading because it confuses "order of magnitude" with
the order of a polynomial. To make this very clear:

You have two functions, x^4 and x^3. You can multiply x^3 by any constant
multiple a (any number, regardless of size), and as x gets sufficiently large
x^4 will be bigger than a * x^3. This is the point of evaluating asymptotic
performance: as your input data approaches an infinite size, only the highest
order term in a polynomial really matters. For example, x^2 + x + 1 will
approach x^2 for sufficiently large x - the lower order terms (x^1 and x^0)
don't really matter for a big x.
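
You can watch the lower-order terms fade with a quick ratio check:

    # (x^2 + x + 1) / x^2 -> 1 as x grows: the lower-order terms vanish.
    for x in (10, 100, 1000, 10000):
        print(x, (x**2 + x + 1) / x**2)
    # 1.11, 1.0101, 1.001001, 1.00010001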

Technically, big-O refers to a bounding function: x^2 is O(x^2) AND O(x^3),
etc. because x^2 is less than or equal to x^2 and x^3 as you approach
infinity. For convenience we're usually only interested in the best fit; it's
useless to say your algorithm is faster than O(x^5), but it's interesting if
it's O(n).

Finally, we also have small-oh notation, which is a lower bound. If your
algorithm is never faster, on an ideal input set, than x^2, it's also o(x^2).
Note that the same algorithm is also o(n) and o(1), because any algorithm
which is slower than (or equal to) x^2 must necessarily be slower than x or 1
(constant-time).

edit: It's worth pointing out I only talked about polynomials because it's
pretty intuitive. You can extend this notation to any class of functions -
exponentials, logarithms, etc. but you just have to know that log(x) is O(x)
and o(1), etc.

~~~
repsilat
> Finally, we also have small-oh notation, which is a lower bound.

Little-oh is a strict upper-bound. If `f` is `o(g)` then

    
    
      lim_(x -> inf) f(x)/g(x) = 0.
    

Or something. It has been a while. If you want an asymptotic lower-bound you
want big Omega (non-strict) or little omega (strict).

------
ambiate
I find Big-O most applicable to SQL queries. A hard drive/memory bottleneck
exists in every database query, and programmers will happily ignore bad SQL
and chalk it up to "database bottleneck, etc." The truth usually goes along
the lines of: "I do not understand temporary tables or views. I just SELECT.
I opened a 30,000 row cursor, then a 1,000,000 row cursor, and finally
another 500,000 row cursor and got the data (while storing all of the fetches
in leaking arrays)!" Usually on keys without indices, or tables without
primary keys.

    SELECT a, b, c FROM abc  (30,000 rows)
    BEGIN
      SELECT e, d, f FROM edf WHERE e = a  (30k * 1m)
      BEGIN
        SELECT x, y, z FROM xyz WHERE d = x  (30k * 1m * 500k scanned)
        BEGIN
          process()
        END
      END
    END

Customers and managers alike can respect lowering the growth rate of a query.
No mathematics required. Simply: "Your report now runs in 30 seconds instead
of 20 minutes." Anyone can compute that!

The mathematics of Big-O gets annoying for average-case scenarios. For
instance:

    def fun(abc):
        l = []
        for x in abc:
            if x % 2 == 0:
                for y in abc:
                    l.append(y)
        return l

(Worst case, every element of abc is even and this is O(n^2); best case, none
are and it's O(n); the average depends entirely on the distribution of
inputs.)

------
shoyer
This is a nice explanation, but I couldn't help but notice that the estimate
of the number of gumballs in the pictured machine seems closer to 1000 than
100 (contrary to the claim in the article).

You can actually see about 100 gumballs in the picture, so there must be far
more hidden behind them -- my guess is closer to 500, which is about twice as
close (in order-of-magnitude) to 1000 as to 100.

~~~
justinlilly
I've updated the wording so as not to make such strong claims. :) Thanks!

------
cliveowen
I don't know the first thing about math, but even I find the classical
definition (i.e. the one found in the CLRS book) pretty straightforward:
given an input big enough, the running time will be at most that of a
multiple of a function g(n) if f(n) = O(g(n)), where f is the function that
describes the algorithm's running time.

~~~
yid
> at most that of a function multiple

I think you meant "constant" multiple of g(n)

~~~
cliveowen
I meant a multiple of a function g(n), corrected ;)

------
fmax30
I don't really know why Big-O notation is so common, even though Big O is
only an upper bound. For me it is more practical and logical to use Big Θ
(Theta) notation, as it provides a tighter bound, which is more
understandable. Big O is also very misleading to newcomers, who are usually
confused when they see something like O(n) = O(n^2), which is perfectly
valid, as Big O notation is only an upper bound, albeit a loose one in that
case.

For all we care, we can write the Big O of

    for (i = 0; i < n; i++) { cout << "I am a constant time operation"; }

as O(n!), and it won't be mathematically wrong, but again it would be very
misleading and loose :). So my advice to everyone is to use Big Θ notation.

f(x) = Θ(g(x)) when f(x) = O(g(x)) and f(x) = Ω(g(x)).

For those that don't know, Big Ω(g(x)) (read "Big Omega") is a lower bound.
In English, that means your loop will execute/iterate at least on the order
of g(x) times.

Now, before people get any more confused: Big Θ(g(x)) is a tight bound, which
means that your code/loop will run at least C1 * g(x) times and at most
C2 * g(x) times, where C1 and C2 are two constants.

If anyone is interested they should really read CLRS. It has an excellent
chapter on calculating and explaining the time complexities.

~~~
oskarkv
Of course O(n) is not equal to O(n^2). O(n) is a set, and O(n^2) is a
different set.

~~~
fmax30
Big O is an upper bound. What I meant was that if a piece of code is O(n),
then it is also O(n^2). With emphasis on the upper bound: f(x) = O(g(x)) if
f(x) <= c * g(x).

So my point is: if f(N) = O(N), then f(N) is also O(N^2), since
f(N) <= C * N^2.

------
bps4484
This is definitely a good starting point. If you wanted to expand it, I would
recommend diving into the relationship between the real time an algorithm
takes to execute and the order of its growth function, and into how constants
and lower-order terms become unimportant in order notation (this is best
shown with graphs, which you've already introduced).

One thing I would stay away from is the talk about "orders of magnitude",
because the order of a function and an order of magnitude are very different
topics, and it could cause a reader to draw bad conclusions like "one
algorithm takes 10 seconds to run, one takes 100, so they must have different
big-O notation". I understand why the analogy is made (you're estimating
things from a high level), but I think it could cause confusion about the
fundamentals.

------
Hannan
Nice writeup!

That said, I got ~500 gumballs in the machine ((container diameter / gumball
diameter)^3 * .64) so both guesses of 100 and 1000 should be within an order
of magnitude. ;)

~~~
shardling
Aside: if an order of magnitude is a factor of 10^1, half an order of
magnitude is a factor of 10^0.5 ≈ 3.16.

So rounding 500 to the nearest order of magnitude gives 1000; the geometric
midpoint between 100 and 1000 is about 316.

------
calhoun137
One of the nice things about big-O notation from a mathematical point of view
is that it allows you to avoid dealing with limits.

As everyone knows, calculus is based on limits, but when handled rigorously
limits have a lot of subtle pitfalls that take a lot of work to deal with. For
example, given a function f(x,y), if we want to take one limit after another,
the result might depend on the order in which the limits are taken. In other
words, taking limits is not commutative unless certain conditions are
satisfied.

None other than the great Donald Knuth went so far as to claim that all of
calculus could be taught without limits! Quoting from [1]:

"Students will be motivated to use O notation for two important reasons.
First, it significantly simplifies calculations because it allows us to be
sloppy — but in a satisfactorily controlled way. Second, it appears in the
power series calculations of symbolic algebra systems like Maple and
Mathematica, which today’s students will surely be using.

For more than 20 years I have dreamed of writing a calculus text entitled O
Calculus, in which the subject would be taught along the lines sketched
above."

[1] [http://micromath.wordpress.com/2008/04/14/donald-knuth-
calcu...](http://micromath.wordpress.com/2008/04/14/donald-knuth-calculus-via-
o-notation/)

------
nhamann
This article has way too many words and not enough math. There is, in fact,
nothing scary about big O notation once you dissect it, and it's a shame that
so many people seem to think otherwise.

Here's the definition: if f and g are functions (let's say real-valued
functions defined on the positive reals), then we say that f is _big O of g_,
written f = O(g), if there exist a real number y and a real number K > 0 such
that

    
    
      f(x) <= K * g(x)
    

for every x > y.

If that makes sense, then done.

Else:

The first thing to do when meeting any mathematical definition that you don't
understand is to throw away parts of the definition until you do understand
it, then add them back in one by one. In this case, let's forget about the
constant.

 _New definition_ : For functions f, g, f is _blorb of g_ , denoted f =
blorb(g), if there is a y such that

    
    
      f(x) <= g(x)
    

for every x > y.

"f is blorb of g" actually just means that there comes a point after which g
is never smaller than f. This gives us the first ingredient of big O: we are
concerned only with asymptotic behavior. f could take on immense values for
small x and still be O(g) as long as f eventually becomes always smaller than
g.

The reason for caring about asymptotic behavior is that we often don't care
about the time complexity of an algorithm for very small problem sizes. Even
the traveling salesman problem is solvable on a Raspberry Pi for very small
problem sizes.
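
As a toy illustration (my own sketch; n = 9 so it actually finishes):

    from itertools import permutations
    import math, random

    def brute_force_tsp(points):
        # Try every tour starting at city 0: (n-1)! candidates.
        n = len(points)
        best = float("inf")
        for order in permutations(range(1, n)):
            tour = (0,) + order + (0,)
            length = sum(math.dist(points[tour[i]], points[tour[i + 1]])
                         for i in range(n))
            best = min(best, length)
        return best

    random.seed(0)
    cities = [(random.random(), random.random()) for _ in range(9)]
    print(brute_force_tsp(cities))  # 8! = 40320 tours: instant.
    # At n = 15 that's ~87 billion tours: hopeless on any Pi.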

Okay, I hope we understand the above definition. Now we add the constant back
into the fold and see if we can make sense of it. From what I can see, the
constant is there for computer scientists who want to paint with broader
strokes. There can be a huge practical difference between f1(n) = 2n and f2(n)
= 2000n (the difference between a computation taking a day and taking 3
years), but they're both O(n) because complexity theorists are more concerned
with O(n^2) versus O(2^n) than they are with O(n) versus O(5n). (Also could be
because in practice algorithms with wildly varying constant factors out in
front are rarely seen?)

For an alternative to big O notation, you should check out Sedgewick and
Wayne's Algorithms, 4th ed. They use something they call "tilde notation"
which preserves the leading constant factor. (See:
[http://introcs.cs.princeton.edu/java/41analysis/](http://introcs.cs.princeton.edu/java/41analysis/))

~~~
algorias
> Also could be because in practice algorithms with wildly varying constant
> factors out in front are rarely seen?

The main reason is that you want a result that does not depend on small
implementation details, i.e. one that is consistent across programming
languages and CPU architectures.

Things as simple as larger cache size or a slightly better hashing function in
a dict can increase the running speed of a program by a constant factor.

------
oskarkv
First of all, O(g(n)) is a set. It is the set of functions f(n) such that
there exist positive constants n0 and C with C*g(n) > f(n) when n > n0.

Second, talking about O(g(n)) does not imply that the time complexity being
discussed is the worst-case (or any other case) time complexity. One could,
for example, say that algorithm A's best-case time complexity is in O(n), and
its worst-case time complexity is in O(n^2).

------
stormbrew
I think the difficult parts for someone with no math background aren't so
much the ones he outlines here, which have fairly obvious causes that follow
directly from the code, but the various logarithmic complexities, which
require a bit more reasoning. Certainly that's what always tripped me up
before I put some effort into understanding it more.
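
For what it's worth, the intuition pump that finally worked for me is binary
search: every step halves the remaining range, so it takes about log2(n)
steps. A minimal version with a step counter (my own toy, not from the
article):

    def binary_search(a, target):
        # Each pass halves [lo, hi), so the loop runs ~log2(len(a)) times.
        lo, hi, steps = 0, len(a), 0
        while lo < hi:
            steps += 1
            mid = (lo + hi) // 2
            if a[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo, steps

    for n in (1000, 1000000, 1000000000):
        print(n, binary_search(range(n), n - 1)[1])  # ~10, ~20, ~30 steps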

------
justinlilly
Not sure if others are interested, but I'm considering writing a book on CS
topics with self-taught programmers in mind. If that's interesting to you,
check out [https://leanpub.com/computer-science-for-self-taught-
program...](https://leanpub.com/computer-science-for-self-taught-programmers/)

------
tryitnow
I'm a newbie, and I enjoy articles like this as a starting point to frame the
overall concept.

Like most simplifications, it's not entirely correct.

Which is why I like the comments section of HN - where I can learn more about
the concept by observing how a wide variety of people explain it (and in the
process correct the errors of the original article).

------
datalus
I always get big and little oh mixed up.

------
tomrod
I don't know if I could explain it much better myself. Kudos.

