
Big O Notation: Using not-boring math to measure code's efficiency - gameguy43
https://www.interviewcake.com/big-o-notation-time-and-space-complexity
======
TheNewAndy
I don't think the attitude of being allowed to be hand-wavey is a good one for
this sort of thing. Big O notation has very precise semantics, and it is very
easily misused.

Many of the examples are wrong. A Python "print" statement is (for the built-
in types at least) O(n), so printing the first item from a list must be at
least O(n).

Fundamentally, looking at big O notation as a way of measuring runtime is
misguided. It is a way of describing a class of functions. But there is
typically a level of indirection between the function that is being talked
about and the one the analysis is actually applied to. For the "return the
first item in the list" example, the function being analysed is a function
from Int to Real: it takes an input size, iterates through all inputs of that
size, runs the "return the first item in the list" function on each, and
measures the time it takes. The maximum time taken is then the Real output.
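
As a rough sketch in Python (my own, with a hypothetical `inputs_of_size`
generator, not anything from the article):

```python
import time

def first_item(lst):
    return lst[0]

# The "intermediate function" from Int to Real: given a size n, run the
# code under analysis on every input of that size (via the hypothetical
# inputs_of_size generator) and return the worst time observed.
def worst_case_time(n, inputs_of_size):
    worst = 0.0
    for lst in inputs_of_size(n):
        start = time.perf_counter()
        first_item(lst)
        worst = max(worst, time.perf_counter() - start)
    return worst
```

Big O describes the asymptotic behaviour of `worst_case_time`, not of
`first_item` itself.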

That intermediate function is important, and needs to be specified properly.
If it isn't, then you get confusion. The section "N could be the actual input,
or the size of the input" is an example of this confusion. Here, the
intermediate function which is being analysed is much simpler (but it is still
present).

I should probably also add, that I am not by any means an expert here, and I
wouldn't be surprised if I've made sloppy mistakes here. I'll just make the
pre-claim that this only further proves my point that being hand-wavey is not
a good thing :-)

~~~
scott_s
When doing complexity analysis, one should state what the base operation is.
For example, when looking at sorting algorithms, we typically use the
operation "is _x_ less than _y_?", or something similar to it. What the
analysis then tells us is how many comparisons we make.
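
For instance (a quick sketch of my own, counting comparisons in an insertion
sort rather than timing it):

```python
# Count the base operation "is x less than y?" in an insertion sort.
# This count is what the O(n^2) worst-case bound is actually about.
def insertion_sort_counting(lst):
    comparisons = 0
    for i in range(1, len(lst)):
        j = i
        while j > 0:
            comparisons += 1                    # one "x < y?" operation
            if lst[j] < lst[j - 1]:
                lst[j], lst[j - 1] = lst[j - 1], lst[j]
                j -= 1
            else:
                break
    return comparisons
```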

The author does not state what his operations are. With regard to the
examples, he implies that the operation he's counting is "print". So his
examples are correct.

You are right, though, that we can get in trouble if it turns out that our
base operation - which we assume is constant time - is actually not constant
time. When looking at algorithms that deal with very large numbers, for
example, we can't consider multiplication to be constant time.

So, with regard to the complexity of print, you are correct that it is linear
- O( _n_ ) - in the _item being printed_. But I believe you are confusing the
quantity "size of the list" and the quantity "size of an item in the list". If
we have a list of size _n_ , and that list contains items whose largest size
is _m_ , then we would say the cost of that first loop is O( _n_ * _m_ ). If
we can just assume that the size of an item is not large enough to matter -
that is, _m_ is actually a constant - then we can safely say that the cost is
O( _n_ ).
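
In code terms (a sketch, assuming Python-like strings):

```python
# n items, each a string of length at most m: the loop touches O(n * m)
# characters in total, which collapses to O(n) only if m is bounded by
# a constant.
def print_all(items):
    for s in items:     # n iterations...
        print(s)        # ...each doing O(m) work to emit s
```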

~~~
gameguy43
Agreed. I'm trying to strike a balance here between painting the full picture
and being approachable for people who are scared of math. These notes are
super helpful though--we can probably nod briefly to this complexity without
getting /too/ stuck in the weeds. Really appreciate the thoughts!

------
kazinator
An O claim can always be pessimistic. A linear time algorithm in N is not only
O(N) but also O(N*N) and O(2^N): it is at least as good as all of these. If we
are sure that the algorithm is neither asymptotically faster nor slower than
N, then we can say Theta(N), the tight bound.

[http://stackoverflow.com/questions/471199/what-is-the-difference-between-%CE%98n-and-on](http://stackoverflow.com/questions/471199/what-is-the-difference-between-%CE%98n-and-on)

(Often, we are sure but we still use O.)
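
A small worked example of my own (not from the linked answer): take
f(N) = 3N + 5.

```latex
% f(N) = 3N + 5
%   f(N) <= 8N     for all N >= 1   =>  f is O(N)
%   f(N) <= 8N^2   for all N >= 1   =>  f is O(N^2)   (pessimistic, but true)
%   f(N) >= 3N     for all N >= 1   =>  f is Omega(N)
% O(N) and Omega(N) together give the tight bound:
f(N) = 3N + 5 \in \Theta(N)
```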

~~~
dllthomas
A start-up I applied to once asked me to take a test, and one of the problems
was "Implement a sorting algorithm that is not O(n^2)." I thought about
snarkily adding extra walks through the data, but decided instead to explain
the problem with the question and answer the question they were trying to ask.

------
nkurz
While these are probably the answers that an interviewer wants, I cringe every
time I see someone say that Big O tells you "how long an algorithm takes to
run". It's clear the author understands the limitations, but I worry that
encouraging people to equate algorithmic complexity with run time in the real
world does more harm than good. Big O is fine for theoretical analysis, and at
one point it was a reasonable approximation for performance, but with current
superscalar SIMD processors with several layers of cache, the gap between
theory and practice is getting pretty wide.

Big O tells you (to within some smaller order terms) how many more operations
an algorithm needs to perform if you double the size of the input from really
big to really, really big. This corresponds to an increase in run time at
smaller scales only if all else remains equal, which it usually doesn't. And
even then, it only tells you the increase for that same algorithm; it doesn't
tell you how the actual run time compares to that of other algorithms with
different constants.
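
A crude illustration of my own in Python: two loops doing the same Theta(n)
number of additions, but with different memory access patterns.

```python
import random
import time

n = 2_000_000
data = list(range(n))
orders = {
    "sequential": list(range(n)),
    "shuffled": random.sample(range(n), n),   # cache-hostile access order
}

for name, order in orders.items():
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    print(name, round(time.perf_counter() - start, 3))
# Both loops are Theta(n), but on typical hardware the shuffled one is
# noticeably slower because of cache misses -- something Big O ignores.
```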

My hope is that someday Big O will be reserved for theory and programmers will
preach something more like the roofline model:
[http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-134.pdf](http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-134.pdf)

------
ggchappell
This is very nicely done.

I do think it has one major omission -- something that is unfortunately
omitted all too often. This is that, just as we get to decide what "n" means,
we also get to decide what operations we are counting. Furthermore, our
choices _matter_.

For example, if we are dealing with data accessed via a slow connection, or
just about any kind of distributed system, operations that use the slow
connection will be much slower than those that do not. So often we will count
only those. O(n^2) doesn't give the full picture, if we only use O(n) slow
operations.

For another example, if we are doing computations using large integers, then
it usually does not make sense to count arithmetic operations as single
operations. In such circumstances, a large-integer multiply can take a lot
more time than a pointer dereference.
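
This is easy to see even from Python (a quick sketch of my own):

```python
import time

# Big-integer multiplication is not constant time: doubling the number
# of bits more than doubles the cost of a single multiply.
for bits in (100_000, 200_000, 400_000):
    a = 1 << bits
    start = time.perf_counter()
    for _ in range(100):
        _ = a * a
    print(bits, "bits:", round(time.perf_counter() - start, 3), "s")
```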

------
shloub
Says it studies the complexity wrt the size of the input, but doesn't say that
the size of n is O(log n) when it comes to the example :(

~~~
gameguy43
Hm, sounds like a bug. Which example are you referring to?

~~~
jackpirate
The size of an integer n is O(log n).

~~~
gameguy43
Ah, gotcha. I'll patch that up. Thanks!

~~~
shloub
I was referring to the print s*n example (O(n) being exponential wrt the size
of the input). You're welcome ;)
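
Concretely (a quick sketch):

```python
# The work in print("a" * n) is O(n), but writing n down as input only
# takes about log10(n) digits -- so the work is exponential in the size
# of the input.
n = 1_000_000
print(len(str(n)))     # 7 characters of input
print(len("a" * n))    # 1000000 characters of work
```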

------
cortesoft
Not-boring math? Isn't all math not boring?

~~~
marcosdumay
That depends entirely on how deep you look.

If you only look at the surface, and never care to think about what it's good
for, math can be quite boring.

------
joshvm
I think a log n example would be useful, e.g. a basic divide and conquer.
Given that many algorithms strive for logarithmic complexity, it seems a shame
to leave this out.

Also, how about a list of complexities in order of growth rate?
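
For instance, binary search is the classic O(log n) divide and conquer (a
quick sketch):

```python
# Each step halves the search space, so at most ~log2(n) comparisons.
def binary_search(sorted_list, target):
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1   # not found
```

And for the growth-order list, the usual ordering is: O(1) < O(log n) < O(n)
< O(n log n) < O(n^2) < O(2^n) < O(n!).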

~~~
gameguy43
Agreed. I'm working on an "advanced" follow-up that digs into log n and a
couple other things!

------
Bahamut
"Big O notation is like math except it's an awesome, not-boring kind of math"

Except big O notation _is_ math, and its use for complexity analysis is one
application of it - this is a completely off-base statement.

------
mdxn
I'll ignore the discussion of what counts as a step, what "n" is, or how it
doesn't necessarily reflect the real world. I still feel like this article
focuses on the wrong thing. It conflates Landau notation (big O) with
complexity/efficiency, but never explains what complexity is. It then proceeds
to link this to the efficiency of code under certain circumstances. While this
work gives a couple of examples of how to determine the "big O" of simple
algorithms, it ignores its meaning and context. This is a huge mistake.

First off, big O notation wasn't created with complexity theory or algorithms
in mind. If I recall correctly, it has its roots in analytic number theory.
The notation was later borrowed as computational complexity was formalized. What
happened is that people needed a way to describe the hardness or "logical
sophistication" of problems and tasks. Looking into the asymptotic behavior of
a complexity measure was just one means that they used. There are plenty of
other ways to do this. The article reads like big O exists for the purpose of
complexity analysis of algorithms. I doubt the author intended this, but they
should be a bit more careful.

Time and space aren't the only resources of interest. Sometimes you need to
quantify how much information you have or how much entropy there is. If you
want to understand "what is basically happening", this is something you should
pay attention to. Are we handling more information than we have to? What do we
need and not need? Circuit sizes, sophistication of proofs, setups of
databases, and the complexity of communications/protocols also play a role.

Even ignoring all this, the article is missing an elementary half of the
picture: the structure of the input. It's not just about code and its design.
Data structures matter, too. How information is organized, how it is encoded,
and how it can be manipulated (computed on) makes an absolutely huge difference.
Are things sorted or unsorted? What kind of list or tree structure is your
code working with? From what I understand, this is the whole point of a
significant chunk of the interview questions candidates are presented with.
It's ridiculous to omit this when introducing algorithmic complexity.
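
Even a trivial sketch shows the difference the input's organization makes:

```python
# Same data, same question, different organization, different complexity.
names_list = ["ada", "grace", "alan", "edsger"]   # plain list
names_set = set(names_list)                       # hash-based set

print("edsger" in names_list)   # O(n): scans item by item
print("edsger" in names_set)    # O(1) on average: one hash lookup
```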

The author claims that Big O notation is an "awesome, not-boring kind of math
where you get to wave your hands through the details and just focus on what's
basically happening". They are hand-waving away not just the math, but most of
the context as well. It doesn't even directly
explain why big O notation models code efficiency in a fruitful way. It just
mentions run time and memory usage superficially. In my opinion, this needs to
be significantly revised or rewritten.

~~~
scott_s
_The article reads like big O exists for the purpose of complexity analysis of
algorithms. I doubt the author intended this, but they should be a bit more
careful._

I see no problem with this. My first taste of algorithm analysis was in a
freshman year course, where we were taught the phrase "big oh" and how to
analyze algorithms in quite a similar fashion to this article. The idea was
to give us the briefest of introductions to the topic, just enough so we could
talk intelligently about the elementary data structures we were being
introduced to (vectors, lists, trees). We didn't get the real deal until
senior year.

"Education is a series of small lies", is what my intro to computer
engineering professor said when he said that, actually, _some_ circuits are
ternary, not binary.

Basically, the author is trying to _introduce_ a concept to the reader. I
don't fault him for not heaping the entirety of an area into an introduction.

~~~
mdxn
I get the gist of what you are saying and agree with some of it. I still think
the article isn't sufficient, at all, for even an elementary introduction to
anything. It's incredibly incomplete. An introductory CS course is going to
present asymptotic complexity analysis (written using Landau notation) as one
of the TOOLS for thinking about the efficiency of code. It will hopefully not
just hop to big O without setting up the context. The author can fix this by
just reworking the preface. It needs to answer the question: "why are we doing
this and how will this accomplish anything?"

If you fail to frame the context and the semantics well enough, you'll end up
with people running around saying that an algorithm is "big O of blah." AGH

The introduction to the concept is articulated by the following comparison:

'We want to measure code efficiency. Big O notation describes this. Here are
some examples.'

vs.

'We want to measure code efficiency. We can do this by thinking about its
usage of resources (complexity). for example, total run time and memory usage.
One of the ways to compare the efficiencies of algorithms is to compare their
asymptotic behaviors. We can express this using big O notation. Here are some
examples.'

-----

The author talks about how big O notation is awesome because you can drop the
constants. It is not explained why this is awesome or useful. Interestingly,
the article later says that this isn't always good because the constants
sometimes matter. What is the reader supposed to understand here?

There's no followup as to what to think about after you determine the "big O"
of an algorithm. Either he forgot to write this or figured that it was obvious
at that point. Considering that he didn't sufficiently explain the concept's
motivation either, I don't see how an uneducated reader will get a clear
picture.

------
triplenineteen
> But to be more specific we could say this is worst case O(n) and best case
> O(1) runtime.

Those should really say Θ(n) and Θ(1).

~~~
marcosdumay
That first one should be a big O.

~~~
triplenineteen
Hmm. Big O already encompasses "at worst".

Theta is the notation for a single case, be it the worst case, the best case,
one run of a randomized algo, etc.

At least that is my understanding.

~~~
ephemeralgomi
Big O is an upper bound and big Omega is a lower bound (little o is a _strict_
upper bound). Big Theta means both an upper and a lower bound.
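
For reference, the standard definitions (paraphrased; any algorithms text has
them):

```latex
% f is O(g):     exists c > 0 and n_0 such that f(n) <= c*g(n) for all n >= n_0
% f is Omega(g): exists c > 0 and n_0 such that f(n) >= c*g(n) for all n >= n_0
% f is o(g):     for every c > 0 there is an n_0 with f(n) < c*g(n) for all n >= n_0
f \in \Theta(g) \iff f \in O(g) \text{ and } f \in \Omega(g)
```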

