
Big-O Misconceptions - denzil_correa
http://ssp.impulsetrain.com/2012/10/15/big-o-misconceptions/
======
btilly
This is a worthy attempt, but the horse has already left the barn. Programmers
everywhere understand big-O to mean what big-Theta should mean. And since O is
easier to type than Θ and Θ is the more useful concept, this will not change.

Given that language is defined by usage, you need to figure out what usage the
other person likely has, and work with that. I will happily use the correct
notation with people who understand it, but don't bother complaining about
incorrect use of the notation from people who will never have reason to care.

~~~
B-Con
I'm confused by what the article is saying about the meaning of Big-O... FTA:

> Misconception 4: Big-O Is About Worst Case

But it is. Big-O is an _upper bound_ which is by definition a maximum (aka,
"worst") case. "f = O(n)" means that at worst "f" will perform linear work,
not that it is expected to. He gave this definition himself in the very
beginning.

Then he states Misconception #4 and says that Big-O is not about the worst
case. He doesn't sound like he's implying that programmers casually use Big-O
this way; he seems to be saying this is how the real definition actually is.
But it isn't.

~~~
tylerneylon
Upper bound != worst case, though I can see why the language is confusing.

When we have a timing function f(n) we pretend f has one input, which is n =
the size of the input to the algorithm. In reality, it's actually more like
f(n, m) where m = a parameter specifying the _exact_ input to the algorithm.
(Example: sorting algorithms have many possible input lists of length n.)

So when we write f(n) = O(g(n)), we are trying to make two simplifications:
(1) Simplify away the actual input, replaced by only the _size_ of the input;
and (2) simplify the actual function f to something that describes f but is
much simpler.

I will make up an artificial example that is not perfect but is useful to
explain the ideas. Suppose for any input size n we have parameters m in the
range [-n-sin(n), n+sin(n)] and the exact timing function

f(n, m) = n + sin(n) + m

Then the first simplification is to determine how to ignore m. We could pick
out the best m for n (the smallest f(n, m) for a fixed n), pick out the worst
m, or take the average over all m. This is best-case, average-case, and worst-
case complexity. For this function, we have:

f(n) = 0 [best case]

f(n) = n + sin(n) [avg case]

f(n) = 2(n + sin(n)) [worst case]

Then we could get an upper bound on _any_ of these.

f(n) = O(1) [best case]

f(n) = O(n) [avg and worst case; not always the same, but they are in this example]

If we wanted a lower bound, we could say that

f(n) = Omega(n) [avg & worst case]

because f(n) >= n - 1 for all n (in general there could be more room for
flexibility on the right-hand side, but it's not needed in this example).

The point here is that step 1 chooses a simplification across input instances,
which we call worst case, average case, etc.; and step 2 chooses a
simplification in how to write down the function f(n), which is what we call
an upper bound for big-O, or, for theta, both an upper and a lower bound.
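The two-step picture can be sketched numerically. This is a toy Python
illustration of the artificial f(n, m) example in this comment; the function
names are made up for the sketch:

```python
import math

def f(n, m):
    # exact timing function from the example above: it depends on the
    # input size n *and* on m, which picks the exact input instance
    return n + math.sin(n) + m

def case_costs(n, samples=10_001):
    # m ranges over [-(n + sin n), n + sin n]; sample it evenly
    b = n + math.sin(n)
    ms = [-b + 2 * b * k / (samples - 1) for k in range(samples)]
    costs = [f(n, m) for m in ms]
    # step 1: collapse the instance parameter m three different ways
    return min(costs), sum(costs) / len(costs), max(costs)
```

case_costs(100) returns roughly (0, n + sin(n), 2(n + sin(n))), matching the
best, average, and worst cases above; step 2, choosing O(1), O(n), etc., is
then a statement about each of these one-variable functions.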

~~~
B-Con
Yes. Big-O is the worst case for the _input_, not for the algorithm. If the
input is the expected average, then its Big-O is the upper bound/worst case
for the average. But if the input is the worst case, or we consider the
algorithm over all inputs, its Big-O is an upper bound/worst case for the
entire algorithm.

In other words, the original article's point should have been about
vocabulary. Big-O is about worst-case, but sometimes it's implied that the
Big-O analysis is about the algorithm's average case, not the algorithm's
worst case.

------
buro9
The biggest misconception I ever came across was during a job interview in
which I was the interviewee.

After explaining skip lists and sketching out the Big-O complexities for best
and worst case on sorted and unsorted data... the interviewer turned to me and
asked why I wouldn't use a skip list (alarm bells should already have been
ringing: the question wasn't "in which instances might you not use one").

My answer was that each problem should be understood in terms of the data and
task before choosing a data structure and algorithm. That answer was rejected.

The interviewer's answer: "No, you shouldn't use skip lists because the memory
of all those extra pointers is too much for the JVM."

The biggest misconception I've come across is that Big-O says anything at all
about the language you're coding in and your implementation.

I did point this out, but was shot down pretty fast.

The company in that scenario was AWS, the position was architect level. I
didn't take the job. Interestingly, they offered me a different job from the
one I had interviewed for, devoid of any product dev at all, on the basis that
I was "too technical" for any role that included product.

Big-O doesn't care for your implementation.

~~~
jiggy2011
Weird, I don't quite understand how the pointers would be "too much for the
JVM". I suppose this will depend on how many layers your skip list has and how
big it is etc.

Is there some reason (related to GC or whatever?) that it is bad in particular
to use pointer heavy data structures on the JVM?

~~~
buro9
I questioned the same thing, but he answered with "the list might have
millions of items".

Java isn't my strong point, but I struggled to think how it might be an issue.
An interview isn't the place to risk making the interviewer look foolish, so I
just let it go.

If I were working with him I'd say "Show me", as I'm still curious whether it
makes any difference. That wasn't the point, though: Big-O still doesn't care
about the implementation.

~~~
dbaupp
Java isn't my strong point either, so this is more of a question than an
answer. But Java is built around objects (and so everything is a pointer), so
the JVM is designed to work with pointer-heavy code, I would think.

(Or is that incorrect?)

~~~
jiggy2011
Java does have primitive types like int, char, etc., which I believe are
stack-allocated. Java programmers do like to "box" these into classes like
Integer, though.

What I find confusing about the answer is that it implies that there is some
reason that a skip list (compared to another structure) would be particularly
inefficient in the JVM (compared to say native C++).

Basically, a skip list costs you some extra memory by storing extra pointers
that let you "skip" ahead in the list, but as you add layers, each layer has
fewer pointers than the one below it.

A skip list might be a bad idea if you are working on a severely
memory-constrained system where saving memory matters more than lookup speed.
It would also, of course, be a bad idea if another structure is better suited
to what you are doing (a B-tree or whatever).

But saying "Skip lists are bad because there are too many pointers for Java"
just seems like a big _wtf_ to me, unless there is something I am missing
here.

~~~
Espressosaurus
While he worded it poorly, he is correct. Kinda. In a skip list with millions
of items, you're going to have something like n log(n) SkipListObjects. The
references are virtually free. The objects are not.

On a typical JVM implementation, each object takes ~100 bytes. IIRC even an
empty ArrayList will cost you something like 60-80 bytes.

If the extra log(n) of space complexity is going to make a difference, then a
skip list might not make sense.

But honestly, if you've got millions of objects, you might want to consider a
different language/environment. Java is pretty fast these days, but it will
still chew up considerably more memory than most native solutions.

~~~
btilly
The factor isn't log(n), it is a constant. Depending on the implementation,
that constant is generally 2. (That's because each level is half the size of
the previous. So n + n/2 + n/4 + ... = 2n.)
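That series is easy to sanity-check. Here is a deterministic Python sketch
(the function name is made up, and it ignores the randomization a real skip
list uses, taking each level to be exactly half the one below):

```python
def skiplist_pointer_count(n):
    # level 0 holds n forward pointers, level 1 about n/2, level 2
    # about n/4, and so on: n + n/2 + n/4 + ... < 2n
    total, level_size = 0, n
    while level_size > 0:
        total += level_size
        level_size //= 2
    return total
```

For a million items this comes out just under 2,000,000 pointers: a constant
factor of about 2, not a log(n) factor.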

That said, I could well believe that the constant associated with a SkipList
is worse than the constants with other data structures. What works well in C++
does not always translate directly to other languages.

~~~
DanWaterworth
The alternative would be some sort of tree. The obvious choice, some sort of
balanced binary search tree, also has two pointers per item, though half of
them don't point to anything.

------
flebron
I wrote this as a comment there, but I'll repaste it here since it's not
showing up.

-----------

One other thing I'd like to add: when considering asymptotic analysis, one
needs to take into account both the cost model (i.e., what is assumed about
the cost of each operation) and the input size (i.e., what the variables
mean).

So for instance, if I tell you that the following program to find a number's
divisors is O(n) operations, I wouldn't necessarily be wrong:

    
    
      /* one pass over 1..n: the loop body runs exactly n times */
      for (int i = 1; i <= n; ++i)
          if (n % i == 0)
              printf("%d\n", i);
    

After all, it's a single loop, and it runs exactly n times. So why isn't
factoring a solved problem? By this reasoning we can factor numbers "in
polynomial time": just check the printed list!

Factoring, as a problem, is measured using bit complexity. That is, the input
size is the number of bits needed to store the number I want to factor. So my
program is actually O(2^m) operations, where m is the input size. And so this
isn't factoring "in linear time", I just misunderstood what the input size
was.
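The distinction can be sketched like this (a toy Python version of the loop
above; the helper name is made up, and loop iterations stand in for
"operations"):

```python
def divisors_and_ops(n):
    # the O(n)-operations loop above: one iteration per candidate i
    divisors, ops = [], 0
    for i in range(1, n + 1):
        ops += 1
        if n % i == 0:
            divisors.append(i)
    return divisors, ops

n = 2 ** 20                      # a 21-bit number, so m = 21
_, ops = divisors_and_ops(n)
# linear in the *value* n, but exponential in the *bit length* m
assert ops == n == 2 ** (n.bit_length() - 1)
```

Doubling the bit length of the input squares the iteration count, which is
exactly why "O(n) operations" here is not polynomial-time factoring.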

Likewise, one can do analyses where the operation cost (the computational
model) differs from the uniform model. If I am doing crypto with GMP, it would
not be smart to say that adding two GMP integers is O(1) operations, as the
standard uniform model assumes for integer addition. We can certainly assume
that, and call it a basic operation, but then the number of operations and the
time taken will diverge wildly. So if one wants operation-counting analyses to
give a hint at runtime, one should instead use an analysis where the cost of
adding, say, the numbers n and m is O(log n + log m) operations. It is
important to remember that the analyst defines which operations are basic, and
one can analyze the same algorithm under different cost models to get
different information about it.

So asymptotic analyses of algorithms depend on both the input size, and the
cost model. My divisors finding algorithm, for instance, is best measured
using the logarithmic model of computation, and logarithmic input size.

As a trick, it's fun to try to construct two functions f:N -> N, g:N -> N,
such that neither f is O(g), nor g is O(f). Bonus points if they're both
monotonically increasing. :) It's useful to dispel the notion that functions
are given a total order by big-O notation.

------
gsibble
The biggest misconception of Big-O is that it actually matters at most start-
ups. I get asked about Big-O all the time. My response is, "You haven't
launched. You have zero traction. You have no users yet. Stop worrying about
Big-O and just build the damn product."

~~~
jrockway
This attitude is the reason why the most iconic image related to Twitter is
the fail whale.

~~~
recursive
Better to be twitter with a fail whale than app.ly-ify'r with no users, but
awesome scalability.

------
tylerneylon
For folks who genuinely love math (I do!), there's a great but little-known
book by Hardy called Orders of Infinity that can help readers grok the
possibilities of big-oh notation. It's not about algorithms, but about getting
a solid understanding of the many ways functions can behave as n goes to
infinity. As a bonus, it's a short (< 100 page) book.

If you're more of a CS person than a mathy, I recommend Sipser's Introduction
to the Theory of Computation. This one is written in friendly yet precise
terms, and I found it a pleasure to read.

------
ambrop7
Fun fact: there exist pairs of positive monotonic functions f(x) and g(x) such
that neither f(x)=O(g(x)) nor g(x)=O(f(x)), i.e. the functions are not
comparable asymptotically.

Example:

f(x)=gamma(floor(x)+1) (=floor(x)!)

g(x)=gamma(floor(x-0.5)+1.5)

There is no constant M > 0 such that f(x) < M*g(x) from some point on, and
likewise with f and g swapped. The two functions periodically cross each other
in such a way that neither is asymptotically bounded by the other.
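Those crossings can be checked numerically with a quick Python sketch (using
math.gamma; the sqrt(x) growth claim for the ratios is an asymptotic
approximation, not something the code proves):

```python
import math

def f(x):
    return math.gamma(math.floor(x) + 1)        # = floor(x)!

def g(x):
    return math.gamma(math.floor(x - 0.5) + 1.5)

# At integer points f is ahead of g; at half-integer points g is ahead
# of f, and both leads grow without bound (roughly like sqrt(x)), so no
# constant M gives f(x) < M*g(x) for all large x, or the other way around.
r_f = f(100) / g(100)        # f ahead here, ratio about 10
r_g = g(100.5) / f(100.5)    # g ahead here, ratio about 10
```

Evaluating the same ratios at larger x shows them growing steadily, which is
exactly the "periodically crossing, neither bounds the other" behavior.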

------
oskarkv
_The correct statement about lower bounds is this: "In the worst case, any
comparison based sorting algorithm must make Ω(n log(n)) comparisons."_

You are making the same mistake as the people who hold Misconception 1. "The
algorithm must make <a set of functions> comparisons" is a nonsensical
statement.

------
adrianr
Misconception 4 should be the first.

------
bnegreve
I find this too informal and ultimately confusing.

> Misconception 1: “The Equals Sign Means Equality”

The equal sign does mean equals.

> The left-hand-side is a function, the right-hand-side is a … what, exactly?
> There is no help to be found in the definition. It just says “we write”
> without concerning itself with the fact that what “we write” is total
> nonsense.

This is not clear at all!

> If you take it at face value, you can deduce that since 5n and 3n are both
> equal to O(n), then 3n must be equal to 5n and so 3=5.

Yes, well, f(x) = f(y) does not imply x = y. I think people know this...

... and so on

~~~
esrauch
> This is not clear at all!

He is using 'type check' to point out the flaw. x = y means x and y are the
same thing. If x is a function and y is an integer, then it can't possibly
hold that x = y.

Consider x = O(n). The implication would be that whatever type of thing O(n)
is, x also is. This is trivially untrue.

If x = y, and x is a function, then x(k) = y(k). It would be ridiculous to say
that x(k) = O(n)(k).

> Yes, well, f(x) = f(y) does not imply x = y. I think people know this...

I don't understand why you are mentioning that. The issue is that if x = f(z)
and y = f(z), then x = y. You can't get two different values by applying the
same function to the same value.

When we say f "is" O(n), the "is" is not an equality statement at all; it is
some other _relation_, exactly as you mention people understand. It is clearly
a relation that is not transitive in the way equality is.

In this case the relationship is most easily expressed with existing operators
by letting O(n) be a set of functions and reading the "is" in that sentence as
the "element-of" set operator, a case where the transitive property doesn't
hold. Using the = symbol in that context is just bizarre, when = is otherwise
always used to mean equality, a relation with special properties that don't
apply here.
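The set view is easy to make concrete. Below is a crude finite membership
check in Python (a sketch with a fixed witness constant and a bounded range,
not a real proof; the names are made up):

```python
def in_big_O(f, g, C=1000, N=10):
    # crude finite check of "there exist C, N with f(n) <= C*g(n)
    # for all n >= N", using a fixed witness C and a bounded range
    return all(f(n) <= C * g(n) for n in range(N, 10_000))

lin = lambda n: n
assert in_big_O(lambda n: 3 * n, lin)      # 3n "is" O(n)
assert in_big_O(lambda n: 5 * n, lin)      # so "is" 5n...
# ...membership in the same set does not make 3n and 5n equal
assert not in_big_O(lambda n: n * n, lin)  # n^2 is not in O(n)
```

Read this way, "3n = O(n) and 5n = O(n)" is just two elements of one set, and
the "3 = 5" paradox never gets off the ground.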

------
maeon3
The equals sign in big-O notation makes about as much sense as the following
mathematical expression:

Pigeons = arraylist(1, ..., n)

It is an exercise in nonsense; it helps young computer scientists learn to
unravel nonsense in the world by making it part of the curriculum.

Sure, big-O notation can be useful, but using equations to describe it is like
trying to type with boxing gloves.

