
WTF Is Big O Notation? - soheilpro
https://rob.conery.io/2019/03/25/wtf-is-big-o-notation
======
roel_v
(Warning: nerd sniping incoming)

The main problem with this article is that it doesn't stress (or even mention,
as far as I can tell) that Big O describes performance characteristics _at the
margins_.
Meaning, it's about a generalized operation on a very large data set. Your
fancy hashing algorithm might be slower than just doing a linear scan over a
10-element table, not to mention the maintenance of your tree structure. In
the article, this is most obvious in the following sentence:

"Let’s say we have 1000 records in our film table. To find “Academy Dinosaur”
our database will need to do 1000 operations (comparing the title in each
row)."

No you don't. If your table is sorted, you'll need only 1 operation to find
'Academy Dinosaur', because it's your first element. What _does_ happen is
that _on average_ , to find any title, you will need n / 2 operations (so not
1000, but 500, in this example - this part is plain wrong as it's written). Of
course, n / 2 is linear in n, so the complexity stays the same. But my point
is that to understand big O notation for real world impact, you _first_ need
to understand the concepts the article mentions, _and then_ you also need to
be able to accurately assess the lower-order terms that are left out, and
estimate them.

(preempting counter sniping: this is assuming that titles are unique)

(to put this in other words, an algorithm or data structure operation
described as being 'O(n)' could really be 'O(n) + x' where x is a constant;
but we leave that out because at the margin it doesn't matter (i.e. it is
dwarfed by the O(n) term). But when n is small and x is large, x becomes
dominant, so assessing the real-world effects of the complexity needs to keep
the expected magnitude of n in mind.)
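The average-case claim is easy to check with a minimal Python sketch (made-up
titles, assuming uniqueness as stated above):

```python
def linear_search(rows, target):
    """Left-to-right scan; returns (index, number of comparisons made)."""
    for i, row in enumerate(rows):
        if row == target:
            return i, i + 1
    return -1, len(rows)

rows = [f"title-{i:04d}" for i in range(1000)]  # 1000 unique, sorted titles

# Best case: the target is the first row -- one comparison, not 1000.
_, best = linear_search(rows, "title-0000")

# Averaged over every possible target: (n + 1) / 2 comparisons, ~500.
avg = sum(linear_search(rows, t)[1] for t in rows) / len(rows)
print(best, avg)  # 1 500.5
```

The n / 2 average is still linear in n, which is why the big-O class is
unchanged even though the constant factor halves.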

~~~
c3534l
> What does happen is that on average, to find any title, you will need n / 2
> operations (so not 1000, but 500, in this example - this part is plain wrong
> as it's written).

You're conflating theta and omega notation for big O. Big O is the worst case
scenario. The average or even typical operation doesn't matter. If the
algorithm is the most complex when sorting a list of items that were already
sorted in descending order, then that is the use-case that is considered when
figuring out Big O. If you want average, use Big theta.

~~~
teraflop
No, this is also inaccurate. Big-theta and average-case complexity are
entirely different concepts.

Big-O/theta/omega are all ways of _categorizing functions_. Those functions
typically take as their argument some measure of the "size" of the input
(which may be one variable or multiple). The output of the function could be:

* worst-case performance ("largest possible running time for all inputs of size N")

* average-case performance ("expected running time for an input of size N, chosen from some well-defined distribution")

* amortized performance ("largest possible running time among all possible sequences of N inputs, divided by N")

(We could also apply any of these definitions to something other than time,
such as memory or I/O operations.)

For _any_ of these definitions, we can talk about big-O, or big-theta, or big-
omega in a well-defined way.

As an example, the worst-case time complexity of bubble sort is Ɵ(n^2). The
average-case time complexity (assuming an input consisting of distinct
elements, uniformly randomly permuted) is also Ɵ(n^2). Nevertheless, the best-
case performance is Ɵ(n).
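To make those three cases concrete, here is a minimal bubble sort with the
standard early-exit flag, counting comparisons (my own sketch, not from the
parent comment):

```python
def bubble_sort(a):
    """Bubble sort with the standard early-exit flag; returns (sorted, comparisons)."""
    a = list(a)
    comparisons = 0
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:  # one clean pass over sorted input: stop early
            break
    return a, comparisons

n = 100
_, best = bubble_sort(range(n))          # already sorted: one pass, n-1 comparisons
_, worst = bubble_sort(range(n, 0, -1))  # reverse sorted: n(n-1)/2 comparisons
print(best, worst)  # 99 4950
```

The Ɵ(n) best case depends on that early-exit flag; a bubble sort without it
is Ɵ(n^2) on every input.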

------
davismwfl
That was a good write-up, and it will help people. I come from the self-taught
side, and find that it is critical to know these things when you work in
certain areas of a code base and once you reach a certain level. But I don't
expect everyone to need the same level of understanding.

I also find interviewers asking detailed questions about Big O and specific
different algorithms just idiotic, especially the esoteric ones I've been
asked in the past and/or heard asked. When I interview someone, all I want to
know is that you know there is a difference and know to look when it matters.

My method for learning whether a candidate understands all this is to have
them choose data structures for different problem sets I give them, and then
ask for the pros and cons. In my experience, the more questions they ask about
usage, the more they understand. As a hiring manager, this is just far more
informative: you learn whether they understand what they are saying, versus
regurgitating something they memorized from a book or website.

~~~
labster
Having for the most part avoided the computer science department during my
earth science education, this is helpful.

I have sort of an intrinsic idea of what's going on -- self-joins can easily
compare a data set with every other member of the data set, and hit O(N²). And
that if your N is 5, the DB will probably spend more time parsing SQL than
executing your query, so there's no point in optimizing. So I don't understand
or much care what the formal definition is, but it's a useful tool for
thinking about how slow your code can be.
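That self-join intuition can be sketched in Python (made-up rows; a real
database would use a hash or index on the join key):

```python
from collections import defaultdict

# Made-up rows: (name, value). A self-join matching rows on equal values,
# done the naive nested-loop way, compares every row with every row: O(N^2).
rows = [("a", 1), ("b", 2), ("c", 2), ("d", 3)]

pairs = [(r1[0], r2[0]) for r1 in rows for r2 in rows
         if r1[0] != r2[0] and r1[1] == r2[1]]
comparisons = len(rows) ** 2  # 16 row-vs-row comparisons for 4 rows

# With a hash on the join key, each row is bucketed once: O(N) expected.
by_val = defaultdict(list)
for name, val in rows:
    by_val[val].append(name)
hash_pairs = [(a, b) for names in by_val.values()
              for a in names for b in names if a != b]

print(sorted(pairs) == sorted(hash_pairs), comparisons)  # True 16
```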

------
gpm
This seems like a good time to bring up one of my pet peeves about big O
notation.

Every theoretical model I've ever seen says that indexing into n bits of
memory takes O(1) time. That's obviously impossible:

- The pointer you need to read is log(n) bits.

- The physical memory is at best O(n^(1/3)) distance away from the CPU, and
thus takes that much time to get back to you. In reality it's probably
O(n^(1/2)) because we build on flat planes (once you start talking about
petabytes of data anyways).

Maybe this doesn't matter in practice, because the constants associated with
these are small enough (but are they? How many memory/disk-bound applications
are there? How much extra performance could we squeeze out by using 16/32-bit
pointers for small arrays?). It certainly doesn't matter in theory, where
things work like we say they do regardless of the physical reality. But it
annoys me that it's so obviously wrong.
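The first point is easy to make concrete: the width of an index grows with the
size of the memory it addresses (a small illustrative sketch, my own):

```python
import math

# Bits needed to address n distinct memory locations: ceil(log2 n).
# "O(1) indexing" quietly assumes a pointer whose width grows with n.
def index_bits(n):
    return max(1, math.ceil(math.log2(n)))

for n in (256, 65_536, 2**32, 2**48):
    print(f"{n:>16} locations -> {index_bits(n)}-bit index")
```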

~~~
mwfunk
Part of it is that there's very casual, conversational usages of Big O vs.
much more formal usages when trying to actually quantify performance.

Conversational Big O tends to include a lot of spherical cows
([https://en.wikipedia.org/wiki/Spherical_cow](https://en.wikipedia.org/wiki/Spherical_cow)),
because when Big O comes up in conversation, people are more likely to be
talking in generalities about high level designs. More detail could be
counterproductive in that context.

When someone is actually trying to quantify and predict the performance of an
algorithm, the other extreme becomes desirable: the more detailed and specific
the function, the better.

~~~
gpm
Ya, I'm directly addressing non-conversational use of Big O notation here,
including the areas of academia I'm familiar with and know people in.

------
derekp7
My experience, as someone who isn't a professional full-time developer
(systems engineer / devops focused) but who personally geeks out on CS theory,
is that it's fairly rare for the developers I work with to think about Big O
when writing their code.

A prime example was a piece of code that has to process incoming results that
get stuck in a transactional database table. Instead of selecting, then
processing, the records that the module cares about, it does the selection
(which typically returns 1 - 10 records), sorts them, processes the record
from the top of the list, then throws the rest of the initial select results
away. It then goes back to the DB and does the same select, and repeats the
process.

End result: if the queue gets backed up to 100, or 1000 records, the process
never catches up. You have to temporarily change the status of the inbound
records to something else, then put maybe 10 at a time back to pending.
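A hedged sketch of the pattern (hypothetical queue contents, not the
commenter's actual system), counting how many rows each approach pulls back
from the database:

```python
def process_one_at_a_time(queue):
    """Re-select everything pending per record processed: O(n^2) rows fetched."""
    fetched = 0
    while queue:
        batch = sorted(queue)    # "select all pending", then sort
        fetched += len(batch)
        queue.remove(batch[0])   # process the top record, discard the rest
    return fetched

def process_batch(queue):
    """Select once and process the whole batch: O(n) rows fetched."""
    batch = sorted(queue)
    for record in batch:
        pass                     # process every selected record
    return len(batch)

n = 1000
naive = process_one_at_a_time(list(range(n)))
batched = process_batch(list(range(n)))
print(naive, batched)  # 500500 1000
```

At n = 1000 the re-select version fetches n(n+1)/2 = 500,500 rows, which is
why the queue never catches up once it backs up.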

There are countless other times I've seen similar types of issues where on
small data sets the code runs fine, but the performance degrades to n^2, or
even worse degrades in a factorial manner. And the worst of it is that if I
talk to the developers using common CS notation (such as: your algorithm
should have logarithmic or at worst linear degradation as the input queue
grows, and it is behaving quadratically), their eyes glaze over.

~~~
eindiran
That's not my experience as a software engineer at all. In fact, I would say
that I see people worrying about writing algorithms that are asymptotically
optimal (well before that should ever be a consideration) 10x as often as I
see people failing to consider the efficiency of the algorithms they write.

~~~
derekp7
Do you think this is due to a difference between working at a place that has
software requiring a specific non-computer related skill set (such as medical
diagnostics software), vs being at a place that hires primarily CS grads for
regular software? Other developers that I've worked around at previous jobs
were in manufacturing (Informix 4GL devs), and an IT services company, among
others. The one common thread I've seen though is they tend to be impressed by
what I can do in C, for example, or helping them get to the root of an issue
they have (I will typically reverse engineer what they are doing from the
outside looking in, using things like strace, then writing up an analysis of
what I see from the systems side).

Another recent example is a case where the app needed to start a number of
sub-processes. The dev used the equivalent of a system() call, which spawned a
bash shell, to run a ps -ef |grep process_name |grep -v grep |grep -v vi |grep
-v vim |grep -v less to see if the process was running, then did another
system() call to start the sub-process. The whole string would be repeated for
each sub process, and then they wondered why it took 8 minutes to start all
their sub interfaces when they deployed to a larger customer that needed 700
interfaces started (instead of the typical 20 - 50). Oh, and after they would
start an interface, the code would do the "ps..." commands on all the other
interfaces to update an internal table on their status.
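A sketch of the fix under stated assumptions (the process names are invented):
snapshot the process table once and do O(1) membership checks, instead of one
grep pipeline per interface:

```python
# Invented process names: pretend the even-numbered interfaces are running.
running = {f"iface-{i:03d}" for i in range(0, 700, 2)}   # one "ps" snapshot
wanted = [f"iface-{i:03d}" for i in range(700)]

def statuses(wanted, snapshot):
    """One snapshot + O(1) set lookups: O(n) total, instead of n shell
    pipelines that each rescan the whole process table (O(n^2))."""
    return {name: name in snapshot for name in wanted}

status = statuses(wanted, running)
print(sum(status.values()))  # 350 of the 700 interfaces are running
```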

~~~
eindiran
That might very well be the difference. My coworkers are almost exclusively
people who studied CS, and several wrote C professionally at some point.

------
ykevinator
It's a great post, and everyone should stop flexing. Learn basic complexity if
you don't know it; it pays huge dividends.

------
bogomipz
>"My rule of thumb here is that if I have to use a loop within a loop, that’s
O(n^2)."

This might potentially do someone a disservice. The whole point of analysis is
just that - to "analyze" the runtime and not resort to "rules of thumb."

The presence of nested for loops itself is not a good reason to declare
something quadratic. Consider the following which is most certainly not
O(N^2):

    for (int i = 0; i < 1000; i++)
        for (int j = n; j >= 1; j = j / 2)
            // do something
EDIT: initialized j to the val of n on the inner loop per the comments below.
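Taking the outer bound as n, as the surrounding comments intend, a runnable
Python version of the snippet (my own translation) makes the N log N shape
visible by counting iterations:

```python
import math

# Outer loop runs n times; the inner loop halves j each step, so it runs
# floor(log2 n) + 1 times. Total work is ~n * log n, not n^2.
def count_ops(n):
    ops = 0
    for i in range(n):
        j = n
        while j >= 1:
            ops += 1     # the constant-time "do something"
            j //= 2
    return ops

n = 1024
print(count_ops(n), n * (int(math.log2(n)) + 1))  # 11264 11264
```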

~~~
anon946
Well, as written, it is 'O(1)'. Or did you mean to put N instead of 1000?

~~~
bogomipz
The inner loop is log(N), since it halves the value of j after every
iteration. The "do something" was meant to stand for a constant-time
operation, so the whole thing is N log N. But my point was really that if you
told an interviewer that it was
O(N^2) simply because it contained nested for loops it would be a very clear
sign that you didn't fully grasp Big O.

~~~
enedil
Nah, every algorithm that is O(N log N) is also O(N^2). That would be a sign
that you as an interviewer didn't fully grasp Big O.

~~~
bogomipz
Nah, "f(n) is said to be in O(g(n))" formally speaking, but practically
speaking an algorithm in some production code that runs in O(n log n) and one
that runs in O(n^2) are not the same thing at all.

That would be a sign that you as a candidate are being needlessly pedantic and
usually a red flag.

~~~
enedil
You're confusing O with Theta.

~~~
bogomipz
No, I am most certainly not. Nothing I said in the comment above indicated I
was referring to both a lower AND an upper bound, which is Big Theta.

Honestly, it sounds as if you might not understand the difference between
Theta and O. I understand that a function that is O(N^2) grows no faster than
O(N^3), asymptotically speaking; however, the intention of my original comment
and example is quite clear about their context.

Please stop, you are adding nothing to the discussion.

~~~
vecter
You most certainly are confusing O with Theta. O is just an asymptotic upper
bound. Of course a function that’s bounded by n lg n is also bounded by n^2.

~~~
bogomipz
No I am not. The other commenter inexplicably and needlessly decided to
introduce Theta and assert that I was confused.

Maybe you might reread my original comment. This whole side discussion is of
no value to my original comment which has a very clear and very narrow
context.

I am not sure why the both of you want to belabor some ancillary talking point
that you yourselves decided to introduce. It is of exactly no value to the
context of my original comment and not in the least bit productive.

------
mbell
> I don’t want to turn this into a Redis commercial, but I will say that it
> (and systems like it) have a lot to offer when you start thinking about
> things in terms of time complexity, which you should! It’s not premature
> optimization to think about Big O upfront, it’s programming and I don’t mean
> to sound snotty about that! If you can clip an O(n) operation down to O(log
> n) then you should, don’t you think?

The thing is, I don't care about 'time complexity', I care about performance.
Big O can serve as a useful datapoint for what the performance may be, but
it's only that: one data point. E.g. it's not uncommon to find that a 'worse'
algorithm/data structure in Big O terms will outperform a 'better' one
because the 'worse' one has better cache locality. So no, I don't think you
should use an O(log n) operation in place of an O(n) operation just because of
Big O, what matters is which one is faster.

~~~
robconery
OP here - Big O notation is simply shorthand math. When you're discussing
things in this way, time complexity and performance are the same thing. When
you care about resource usage (memory etc) that's _space complexity_ , which
is different. Either way, they're good things to understand.

>So no I don't think you should use an O(log n) operation in place of an O(n)
operation just because of Big O, what matters is which one is faster.

Mathematically the log n is _always_ faster :). Realistically... well that
would be a tough one to prove, even with caching, but I say go for it.

~~~
mbell
> time complexity and performance are the same thing

By definition, they are not.

> Mathematically the log n is always faster :)

No, it's not. Time complexity only gives you an asymptotic bound on the number
of 'operations', it tells you nothing about what the actual run time will be.

~~~
robconery
I believe we're talking past each other. Big O has _nothing_ to do with
_actual run time_. It doesn't care about the number of inputs you have, just
that you have them.

Mathematically, if n=1000 then log n is 10. 10 operations vs. 1000 is,
theoretically and time complexity wise, faster.

Our disconnect is "actual" vs. "theoretical" and I want to stress again that
Big O is purely theoretical. It's a technical adjective.

------
js2
Formal definition from my ancient copy of Sedgewick:

 _A function g(N) is said to be O(f(N)) if there exist constants c₀ and N₀
such that g(N) is less than c₀f(N) for all N > N₀._

Continuing: "Informally, this encapsulates the notion of “is proportional to”
and frees the analyst from considering the details of particular machine
characteristics. Furthermore, the statement that the running time of an
algorithm is _O(f(N))_ is independent of the algorithm's input. Since we're
interested in studying the _algorithm_ , not the input or the implementation,
the _O_ -notation is a useful way to state upper bounds on running time which
are independent of both inputs and implementation details."
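A quick sanity check of that definition in Python; the particular g, c₀, and
N₀ here are my own illustration, not Sedgewick's:

```python
# g(N) = 2N + 10 is O(N): pick c0 = 3 and N0 = 10, and then
# 2N + 10 < 3N holds whenever N > 10.
g = lambda N: 2 * N + 10
f = lambda N: N
c0, N0 = 3, 10

holds = all(g(N) < c0 * f(N) for N in range(N0 + 1, 100_000))
print(holds)  # True
```

Note that the bound is allowed to fail for small N (at N = 5, 20 < 15 is
false); the constants only have to work past N₀.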

I'll also repeat this bit from the end of the analysis chapter:

Perspective:

Many of the algorithms in this book have been subjected to detailed
mathematical analysis and performance studies far too complex to be discussed
here. Indeed, it is on the basis of such studies that we are able to recommend
many of the algorithms we discuss.

Not all algorithms are worthy of such intense scrutiny; indeed during the
design process, it is preferable to work with approximate performance
indicators to guide the design process without extraneous detail. As the
design becomes more refined, so must the analysis, and more sophisticated
mathematical tools need to be applied. Often, the design process leads to
detailed complexity studies that lead to "theoretical" algorithms rather far
from any particular application. It is a common mistake to assume that rough
analyses from complexity studies will translate immediately to efficient
practical algorithms: this often leads to unpleasant surprises. On the other
hand, computational complexity is a powerful tool for suggesting departures in
design upon which important new methods can be based.

One should not use an algorithm without some indication of how it will
perform. [...]

------
yxmayt
A precise way of being imprecise.

------
User23
The formal definition is a fairly gentle introduction to logical math. Here is
a decent enough write-up: [https://justin.abrah.ms/computer-science/understanding-big-o-formal-definition.html](https://justin.abrah.ms/computer-science/understanding-big-o-formal-definition.html)

------
toadi
I mostly don't think about it too much. If I have a piece of code that seems
slow, I make it less complicated and usually look up the big-O. The reason is
that I've only had real issues with this a few times in my 20-year career.

~~~
mixmastamyk
True, after a few years of experience you recognize the pitfalls to avoid. I
remember figuring this out on my own as a kid writing a chat client, years
before learning O notation. As the number of clients reached ten, performance
dropped, and I quickly realized I needed to handle the situation more
efficiently, and did.

It doesn't provide the smug satisfaction of using the academic
notation/jargon found in tech interviews and HN threads, however.

------
auraham
I really like the post; I think it is very intuitive. However, it was a bit
confusing at the end. Consider these scenarios: (1) looking for an item in
your cart using a hash, and (2) looking for an item in the whole database
using a hash. From the article, I assume that (1) is O(1) because of the hash,
but that (2) is O(log n). Maybe I am confusing a hash and an index. If so,
what is the difference?

------
gaze
Oh ffs another article about Big-O without the actual definition anywhere in
it. Put it at the end after motivating it, put it at the beginning and then
explain it, I don’t care. Just actually give it at some point.

------
ratling
This is legitimately a good post.

