
At a high level it's not really that much more complex (conceptually) than counting iterations and trying to find closed-form expressions for the total number of iterations in the best/worst/average cases. For example, something that requires one loop iterating N times over N data points is O(N); something that requires two nested loops, each iterating N times, to process the same N data points is O(N^2), because the inner loop body executes N^2 times, which is clearly worse than two loops in succession, i.e. 2N iterations, which is still just O(N). The rest is mostly working out clever schemes to quantify more complex iteration (or recursion) patterns, or to find ways of accounting for infrequent expensive operations, like re-hashing ("amortization" basically averages them out across all the intervening steps), among other twists.
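To make that concrete, here's a rough Python sketch (the function names are just made up for illustration) that counts the iterations directly and checks them against the closed forms N and N^2:

  # Count loop iterations explicitly and compare to the closed forms.
  def linear_iters(data):
      count = 0
      for _ in data:              # one loop over N items -> N iterations, O(N)
          count += 1
      return count

  def quadratic_iters(data):
      count = 0
      for _ in data:              # outer loop runs N times
          for _ in data:          # inner loop runs N times per outer pass
              count += 1          # so the body runs N * N times -> O(N^2)
      return count

  for n in (10, 100, 1000):
      data = list(range(n))
      assert linear_iters(data) == n
      assert quadratic_iters(data) == n * n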



On the other hand, things like "n log n is a lower bound for comparison sorts" can't easily be reduced to counting your for-loops.


There is a (fairly) simple proof of that, however. You may well be aware of this already, but in case someone hasn't seen it before:

  Each comparison of objects provides at most one bit of information.
  So k comparisons provide at most k bits, i.e. they can distinguish at most 2^k possibilities.
  Therefore, to distinguish between n possibilities, we need at least log_2(n) comparisons.
  There are n! ways of ordering a list of length n, so sorting must distinguish between n! possibilities.
  log(n!) ~ n * log(n)
  So to sort a list of length n, we need on the order of n * log(n) comparisons, i.e. an Omega(n * log(n)) lower bound.
It certainly requires more maths than just counting 'for loops', but I find the explanation surprisingly elegant.
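If you want to sanity-check it numerically, here's a small Python sketch (mine, not part of the argument above) comparing ceil(log2(n!)), the minimum number of comparisons any comparison sort needs in the worst case, with n * log2(n):

  import math

  def min_comparisons(n):
      # log2(n!) as a sum, to avoid computing the huge factorial itself
      return math.ceil(sum(math.log2(k) for k in range(2, n + 1)))

  for n in (8, 64, 1024):
      print(n, min_comparisons(n), round(n * math.log2(n)))
      # same order of magnitude; the ratio tends to 1 as n grows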


Information-theoretic lower bounds are neat. Here's an even simpler example. In the decision tree model of computation (where decision nodes have constant degree), consider the problem of finding the value corresponding to a key in a dictionary with n elements. Each leaf is labeled by a value, and in the general case where all values are distinct there are n leaves. So, the tree must have at least one path of length Omega(log n). Note that this is true even when the key space is a contiguous interval like [0..n], where it's just an array indexing problem. That's why the RAM model of computation is strictly more powerful than the decision tree model.
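To make the contrast concrete, a rough Python sketch (my example, not the parent's): an explicit binary search, where each loop iteration costs one comparison and there are about log2(n) of them, versus direct array indexing, which the RAM model does in a single step with no comparisons at all:

  def lookup_decision_tree(keys, values, key):
      # binary search: one (three-way) comparison per iteration, ~log2(n) total
      lo, hi, comparisons = 0, len(keys) - 1, 0
      while lo <= hi:
          mid = (lo + hi) // 2
          comparisons += 1
          if keys[mid] == key:
              return values[mid], comparisons
          elif keys[mid] < key:
              lo = mid + 1
          else:
              hi = mid - 1
      return None, comparisons

  def lookup_ram(values, key):
      # keys are exactly 0..n-1, so the key *is* the index: one step, no comparisons
      return values[key]

  n = 1024
  keys = list(range(n))
  values = [k * k for k in keys]
  result, comparisons = lookup_decision_tree(keys, values, 37)
  assert result == lookup_ram(values, 37) == 37 * 37
  assert comparisons <= 11                      # about log2(1024) + 1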

> log(n!) ~ n * log(n)

Random aside: while this is usually proven with Stirling's approximation, there is a much simpler proof. For the upper bound, n! <= n^n because 1 <= n, ..., n <= n. For the lower bound, n! >= (n/2)^(n/2) because you can drop the first n/2 terms without increasing the product and estimate the last n/2 terms as being >= n/2. That gives the double inequality (n/2)^(n/2) <= n! <= n^n. Take logs and you get (n/2) log(n/2) <= log(n!) <= n log(n). What makes this proof tick is that while (n/2)^(n/2) is a really shitty lower bound on n!, the shittiness evaporates once you take logs.
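A quick numerical check of that sandwich (my sketch, using natural logs; math.lgamma(n + 1) is ln(n!)):

  import math

  for n in (10, 100, 1000, 10000):
      log_fact = math.lgamma(n + 1)             # ln(n!)
      lower = (n / 2) * math.log(n / 2)         # from (n/2)^(n/2) <= n!
      upper = n * math.log(n)                   # from n! <= n^n
      assert lower <= log_fact <= upper
      print(n, round(log_fact / upper, 3))      # ratio tends to 1, per Stirling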



