
What does O(log n) mean, exactly? - martindale
http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly/2307314#2307314
======
anonymouz
The example in the question seems to demonstrate a common misunderstanding
with regards to Big O notation. It measures the time complexity of an
algorithm in terms of its _input size_. But the n in the OP's example is not
the input size, it is the input value.

In the actual implementation, the input size is of course constant (being an
int), but if you look at the intended algorithm (I have renamed the variable
to b on purpose, to avoid confusion with the n usually used in Big O
notation), it is something like:

* Take a number b.

* Loop from 0 to b-1 and print the number of the current iteration.

As far as this algorithm goes, the input size here is not b, but of order
log(b) (number of digits to represent b), while the algorithm's runtime is
proportional to b. That means this algorithm has exponential runtime.
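To make the distinction concrete, here is a minimal sketch (Python, purely illustrative; count_up_to is a made-up stand-in for the OP's function). The work grows with the input *value* b, which is exponential in the number of bits needed to write b down:

```python
def count_up_to(b):
    """Do b units of work, standing in for the loop that prints 0..b-1."""
    steps = 0
    for i in range(b):
        steps += 1  # stand-in for printing i
    return steps

# The input *size* is the bit-length of b, not b itself.
for bits in (4, 8, 16):
    b = 2 ** bits - 1
    print(bits, count_up_to(b))  # work grows exponentially in the bit count
```

Adding one bit to the input roughly doubles the work, which is what "exponential in the input size" means here.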

~~~
shawnz
OP's example accepts a parameter, and that parameter is the input size, not
any of the input values. In fact, in the example, there _are_ no input values;
the function is just doing busy work -- although I suppose it's also possible
to say that i (0,1,2,3,...,n) are the values, since that's what he's printing.
However, as you said, the individual input values are irrelevant to the
complexity.

Consider this slightly modified function:

    
    
        f(char *phonebook[], int phonebookSize) {
          int i;
          for (i = 0; i < phonebookSize; ++i)
            printf("%s\n", phonebook[i]);
        }
    

Here it's clear that phonebookSize (n) is the input size, not an input value
-- however the function is essentially the same otherwise.

EDIT: Sorry, I understand now. You are saying that the way he's defined his
function is what determines the input size. Since it accepts one integer, the
complexity could be described as O(2^n) (where n is the size of the integer in
bits, or digits, or whatever). I had made the assumption that my updated
function is what OP's intention was.

~~~
orionblastar
I'm confused, wouldn't this be more of an O(n) instead of an O(log n) issue?
If not, then what is the difference? No pointers being used, not even a data
structure, just an array of strings. I expected at least a linked list data
structure for a phone book. Like sorted in alphabetic order or something.

~~~
shawnz
Correct, OP's example contained an O(n) function:

> For example, the following function is O(n) because the algorithm grows in
> proportion to its input n: [...]

OP's question was whether there exists a similarly simple example of O(log n),
which is not related to what anonymouz was commenting on. (or I'm
misunderstanding?)

------
Joeboy
I sometimes wonder, what sort of people actually need to know about Big-O
notation? In about 15 years of working in IT I don't recall anybody ever using
it in a work context. Generally something like "this strategy will become very
slow when we have a lot of users" seems adequate. Admittedly I'm only a humble
web developer. Do you regularly talk in Big-O at work? What is your job?

Edit: Despite being only a humble web developer I understand that fast
algorithms are better than slow ones. I'm wondering how Big O _notation_ is
used in the real world.

~~~
gurkendoktor
But what do you do once something becomes very slow? A lookup through a
database index, for example, is a beautiful example of O(log n) :)

~~~
Joeboy
But in that case, knowing Big-O notation is a retrospective explanation of
something any quarter-decent developer knows already. You don't need to know
Big-O in order to know how/when to create a database index.

~~~
greenyoda
It doesn't have to be retrospective. A knowledge of computational complexity
can be used to make predictions about the performance of a system (even one
you haven't implemented yet).

What if your manager asked you for an estimate of how long it would take to
load a million new records into the database and index them? It would be nice
to know whether you'd have to take the system down for an hour or for a day or
for a week.

What if the performance of your code was slow and you wanted to know whether
it could possibly be improved, and by how much? Knowing the theoretical
limitations of the algorithm you were running (e.g., comparison-based sorting
takes at least on the order of n log n comparisons) could tell you what kind
of optimizations you could reasonably expect to make, or whether you needed
to invest in faster hardware.
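As a sketch of that kind of back-of-the-envelope estimate (Python, with made-up numbers; predict_seconds is a hypothetical helper, and it assumes an n log n cost model whose constant factor stays stable):

```python
import math

def predict_seconds(t_small, n_small, n_big):
    """Extrapolate a measured O(n log n) operation from n_small to n_big.

    Hypothetical helper: assumes the constant factor stays the same,
    which caches, swapping, etc. can easily violate in practice.
    """
    scale = (n_big * math.log(n_big)) / (n_small * math.log(n_small))
    return t_small * scale

# Made-up measurement: loading and indexing 10,000 records took 2 seconds.
print(predict_seconds(2.0, 10_000, 1_000_000))  # about 300 seconds
```

With these made-up numbers, 100x the records costs about 150x the time -- rough, but enough to tell a minutes-scale job from a days-scale one.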

~~~
Joeboy
> What if your manager asked you for an estimate of how long it would take to
> load a million new records into the database and index them? It would be
> nice to know whether you'd have to take the system down for an hour or for a
> day or for a week.

I don't think I really buy this as a practical use case. Unless I was running
on a system I understood very well (e.g. an embedded platform) I don't think I'd
really trust estimates calculated using Big-O to be reliable on atypical
workloads. I'd pretty much expect unforeseen caches / optimizations / swapping
to disk to confound my estimates somehow. If at all possible I'd test it. If
testing it takes too long to be practical then there's a serious problem with
the system. I can see cases where testing is impossible for good, practical
reasons, just not enough of them to justify the obsession with Big-O.

------
eieio
The answer provides an excellent definition and some great examples, but I
actually prefer the second answer. I think that when it comes to time
complexities, developing an intuitive mapping of solutions to time
complexities is important.

Developing a feel for how log(n) tends to emerge in common programming
techniques is much more practical than specifically checking your code for
situations where "the choice of the next element on which to perform some
action is one of several possibilities, and only one will need to be chosen."

That being said, I'm not the OP and it's very possible that the phone book
examples are more helpful in developing that understanding for other people.

~~~
sirclueless
I think the first answer's intuition is actually better. It's stated a little
opaquely, so maybe it's not actually useful as a teaching tool until you
already understand why binary-search type algorithms that rule out a
significant fraction of the input at each step are O(log n).

Basically, phrasing big-O as a property of the algorithm rather than the tree-
structure of the solution space is more helpful in practice. I think it's
quite common to have some problem in front of you and say "OK, so I can see
immediately there's a naive solution that's linear, is there a smarter way?"
And the answer nearly always comes (for me at least) when I try to answer the
question, "Can I find some way to make smarter decisions at each step?" If the
answer is "yes" or "maybe", you might be able to get to O(log n), but quite
often the answer is "no" because you're streaming random data or something, or
you have some logical reason why you have to look at every single input
object, and you can immediately rule out better solutions and just get to work
on the linear one.

That insight rarely comes to me when I ask "Is there any way to think of the
input as a balanced binary search tree?" If there actually is such a tree
structure, you're bound to find it anyways by asking "What smart decision am I
making at each level?"

~~~
eieio
I agree that I rarely find myself asking whether I can represent input as a
binary search tree, I just find the binary search tree image a very effective
method of demonstrating how logarithms play into this at all. It's an easy
example of log n complexity.

That being said I'm not sure how much my speculation is worth here. I'm very
comfortable with the idea of log n based complexity and don't remember how it
was taught to me in the first place. Given that OP accepted the answer and
that it has significantly more votes, it seems that it's helpful for most
people.

------
glurgh
The entire first few chapters of CLR/The White book are dedicated to
explaining this in great, patient detail.

<http://en.wikipedia.org/wiki/Introduction_to_Algorithms>

Just pick it up.

------
lkozma
I find both the questions and answers confusing, I think this is a case when a
little more "formal" understanding would help before jumping into "intuition"
and big picture explanations.

If the question is really what O(log n) means, then the OP should first try to
understand the definitions of Big-Oh notation, asymptotic growth rate, etc. At
this stage it is premature to talk about running times and algorithms, these
are simply statements about functions. Maybe the OP is already past this
stage, or at least has a rough understanding of these concepts.

If the question is where does O(log n) arise in practice, then one can explain
binary search, binary trees, the representation of a number in a certain base,
harmonic series, etc.
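For the base-representation case, a sketch (Python, illustrative): the number of digits of n in base b is the number of times you can divide n by b before hitting zero, which is O(log n).

```python
def num_digits(n, base=10):
    """Count the digits of a positive integer n in the given base
    by repeated division -- the loop runs about log_base(n) times."""
    digits = 0
    while n > 0:
        n //= base
        digits += 1
    return digits

print(num_digits(1_000_000))     # 7 decimal digits
print(num_digits(1_000_000, 2))  # 20 binary digits: 2**19 < 10**6 < 2**20
```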

------
wavesounds
Did anyone make sure this guy knows that log in computer science is usually
base 2? We need to remember that in most places, simply writing 'log' is
assumed to mean base 10.

Log base 2 is just how many times you can recursively split something in
half, which is not that complicated a concept. However, his log example is
base 10, and if he was trying to figure these things out in base 10 it would
be super confusing and not make any sense.

~~~
geoffschmidt
Well, it's like this -- log_2 and log_10 (and natural log) differ only by a
constant multiplicative factor, so they're all the same asymptotically, and it
doesn't matter which one you use inside the O(). Pretty neat right?

~~~
valleyer
This is correct, but in case it makes more sense this way (it does to me):

log (base X) Y === log (base e) Y / log (base e) X

So the difference between log (base 2) and log (base 10) is just the
difference between being divided by log (base e) 2 or log (base e) 10. Since
those are both constant factors, big-O notation doesn’t care.
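A quick numerical check of the change-of-base identity (Python, illustrative):

```python
import math

n = 1_000_000
log2_direct = math.log2(n)
log2_via_ln = math.log(n) / math.log(2)
log2_via_log10 = math.log10(n) / math.log10(2)

# All three agree: switching log bases only rescales by a constant factor,
# which is exactly what big-O notation ignores.
print(log2_direct, log2_via_ln, log2_via_log10)
```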

~~~
tsahyt
It's actually the same statement, just with the actual math written out. But
yes, it's very easy to see it this way.

------
jcampbell1
Why is O(N * N!) not written as O(N!)? It seems that N * N! < (N+1)! and we
don't care about constant factors.

~~~
SamReidHughes
O(N!) is not the same thing as O((N+1)!). Big O only ignores constant
factors, and N * N! / N! = N is unbounded, so O(N * N!) is strictly bigger
than O(N!) even though N * N! < (N+1)!.

O(f(N)) = O(f(N+1)) is true for exponential functions and less quickly
growing functions, but not for things past that. (And it's not true for
sufficiently quickly decreasing functions either, in the other direction,
e.g. where f(N) = 1/N!.)
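A quick numerical illustration (Python): the ratio f(N+1)/f(N) is a constant for 2^N but unbounded for N!, which is why the factorial case escapes big-O's tolerance for constant factors.

```python
from math import factorial

# f(N+1)/f(N): a constant (2) for 2**N, but N+1 (unbounded) for N!
for n in (5, 10, 20):
    print(2 ** (n + 1) // 2 ** n, factorial(n + 1) // factorial(n))
```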

------
loser777
Why doesn't anyone actually mention what Big O (of anything) actually means?
Instead of jumping directly to a CS-related example, shouldn't the definition
be made clear first?

<http://en.wikipedia.org/wiki/Big_O_notation>

~~~
eieio
Nobody explains Big O because the OP demonstrates that he understands the idea
of Big O by giving the complexity of a piece of code and explaining how the
complexity would change if the code changes.

OP is confused with the idea of O(log n), not of Big O in general.

------
mseebach
It took forever for the coin to drop for me on log n, and it seems it hasn't
dropped for the accepted SO answer either.

log n is the height of a tree when the log base and the branching factor of
the tree are the same.

Hash lookups are log n, because the hash table implementation is typically a
tree.

~~~
greenyoda
A binary search tree is O(log n), but that's not what's typically referred to
as "hashing".

Hash lookups can be close to O(1) (constant time) if you use a hash function
that calculates a close-to-unique index into a table based on the key's value
(provided the table isn't too full, so there are few collisions). This kind of
data structure is typically used for looking up identifiers (variable and
function names) in a compiler.

More information here: <https://en.wikipedia.org/wiki/Hash_table>
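A toy sketch of the separate-chaining scheme described above (Python; ChainedHashTable is a made-up name, not a real library class):

```python
class ChainedHashTable:
    """Toy hash table with separate chaining. With a decent hash function
    and a table that isn't too full, each bucket stays short, so lookups
    take O(1) time on average (O(n) worst case if everything collides)."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:            # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:         # scan only this key's (short) bucket
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("main", "function")
print(table.get("main"))
```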

------
just2n
I found CLRS's explanation of divide and conquer gives a very intuitive
understanding of what a logarithmic time algorithm is doing and why it runs in
logarithmic time.

A little bit of nitpicking on the accepted answer: the list seems to imply
that these are "the" possible runtimes an algorithm may exhibit. But there are
many cases where algorithms run in times that lie between these. Perhaps
related are log-star and O(log log N) algorithms. These are even more
interesting to me than logarithmic ones.

------
jgross206
Not sure of the universality of this, but my intuitive way of understanding
O(log n) is as follows:

An algorithm is O(log n) if you can reduce the solution space (the set of all
possible solutions) by a fraction (e.g. divide it in half or divide it into
tenths) using a constant number of operations.

For example, binary search is O(log n) because you can cut the solution space
in half (regardless of how big it is) by using a constant number of
operations.
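A sketch of that intuition in code (Python, illustrative): binary search discards half the remaining solution space per step using a constant amount of work, so the step count stays tiny even for large inputs.

```python
def binary_search(items, target):
    """Find target in a sorted list, counting the halving steps taken."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        elif items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1, steps

items = list(range(1_000_000))
idx, steps = binary_search(items, 314159)
print(idx, steps)  # found in at most ~20 steps, since 2**20 > 1,000,000
```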

------
andrewparker
The use of a phonebook analogy is clever, but I think the answerer should
modify his answer to cover the O() running time of a trie data structure. A
phonebook is essentially a trie. He talks about the "divide and conquer"
approach to navigating a phonebook, but that's not how a computer would store
the names. The way a computer navigates a phonebook would be interesting to
read about.

~~~
yuliyp
A phonebook as trie would only apply if you had tabs for each letter all the
way down. It's a vector of pages.

------
avaku
Maybe after this discussion PG will write an article saying that going to
uni is not such a bad idea after all :)

------
jahitr
An important fact to take into account about Big-O is that it represents an
upper bound on time or space complexity.

------
zura
First associations for me: sympathy, elegance (without even looking at the
code), the thought that the data should be prepared (most likely sorted), and
the thought that this might be one particular operation, and [data structure]
might imply that other operations could have different O's.

Well, and finally I'll actually look at it (click on the link).

------
shoo
It describes those infinite sequences of real numbers that grow no faster
than logarithmically, ignoring constant factors and anything the sequence
might do for a finite number of terms at the start.

Assume f : N -> R. By "f in O(log n)" we mean there exist constants c > 0 in
R and m in N such that for all n in N, if n >= m then f(n) <= c log(n).

------
dschiptsov
Order of growth. A shape of a function. Execution time (or memory usage) of a
procedure as a function of the size of its input.

------
hayksaakian
The phone book example was genius. I wish they had taught _that_ in school.

~~~
BHSPitMonkey
They usually do, if "school" is a computer science (or surely even just
mathematics) program.

------
orionblastar
It is designed to estimate the amount of time it takes to run an algorithm
using a data structure like a binary tree (O(log n)) instead of a different
one like a linked list (O(n)). I am trying to learn it myself, so bear with
me if I make any mistakes.

They throw around words like "polynomial", and I've asked people with comp
sci degrees to define what they are. I get back replies like "We learned it
as a polynomial and never got a definition of it." OK look, here is what it
means, and some people with a comp sci PhD cannot even tell me: poly means
many, and nomial means numbers or formulas, or rather, in computer science,
an algorithm. You have to study how algorithms work. I asked this on Hacker
News and many tried to bury the story because most couldn't even answer it,
and thanks to those who did. I am glad to see someone else bring it up and
hope it sheds more light on it.

Consider this: you really cannot get a good estimate of how long any
algorithm will run, because you have other factors in play. You have
multicore CPUs now, and some of the work is offloaded to GPUs as well (like
Bitcoin mining programs using GPUs; now algorithms can use GPUs for some of
the processing). Plus you have caches on processors that you didn't have in
1971. Not only that, but the laws of thermodynamics apply to computers and
electronics, and many people forget that: if your CPU is overheating it
could slow down, and if you have a variable fan it could cool the CPU down
so it runs faster for a bit, then heats up and slows down again. Most
operating systems are multitasking, so the program that runs the algorithm
might take longer if an Apache HTTPD server with heavy traffic is in use on
the same machine. Not only that, but Anonymous might have sent a DDoS attack
to the server the algorithm is running on. I hope those with a PhD in
computer science figure in those factors when they explore this issue.

~~~
tincholio
>It is designed to estimate the amount of time it takes to run an algorithm
using a data structure like a linked list instead of a different one like a
binary tree.

Wat?? Excuse me, but it seems you have no idea what you're talking about.

>Consider this, that you really cannot get a good estimate of how long any
algorithm can run because you got other factors in play

That has absolutely nothing to do with the complexity of the algorithm. Also,
it _can_ be estimated, too.

I wonder if you're trolling or just clueless

~~~
orionblastar
Well I am trying to understand it better. Instead of explaining it to me
better, I get downvoted instead.

Please explain it further and tell me how you can estimate the time when a
DDoS attack is being done on the system, or the CPU is overheating, I'd really
like to know how you estimate that. Apparently I'm clueless and in need of
many clues.

~~~
nitrogen
Big O notation is an expression of a theoretical concept that applies to the
algorithm itself. It is independent of any hardware that might be used to
implement an algorithm.

Specifically, asymptotic complexity is an expression of the number of abstract
operations or units of memory an algorithm will require. The meaning of
"operation" or "unit of memory" depends on the algorithm. In the case of
comparison sorting algorithms, for example, the operation in question is the
number of comparisons.

Even if the CPU time allotted to a running algorithm changes due to system
load or overheating, the algorithm still performs the same number of
operations. Actual running time under system load can be calculated, but this
is unrelated to big-O notation or complexity.
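A sketch of "counting abstract operations" (Python, illustrative): instrumenting a simple insertion sort to count comparisons. The count depends only on the algorithm and its input, not on CPU load, temperature, or anything else about the machine.

```python
def insertion_sort(items):
    """Sort a copy of items, returning (sorted_list, comparison_count).

    The comparison count is a property of the algorithm and the input;
    it is identical no matter how loaded or hot the machine is."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return a, comparisons

print(insertion_sort([3, 1, 2])[1])
print(insertion_sort(list(range(10, 0, -1)))[1])  # worst case: 9+8+...+1 = 45
```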

--------

Regarding your explanation, it is excessively wordy and uses awkward phrasing.
Your subsequent complaint about the terminology used in responses to your
previous questions about P=NP demonstrates a serious lack of prerequisite
knowledge. You learn the definition of "polynomial" in algebra; teaching you
algebra is far beyond the scope of a discussion on HN.

As someone who is primarily self taught but also finished most of a college CS
education, I can understand your frustration. In order to understand one thing
you need to learn three other things, and there's no map that leads you from
what you know now to what you want to know. Until someone makes that map, or
you know enough to make up that map as you read Wikipedia, you'll have to
learn things in the order they're taught in school.

So here's what you should do if you want to understand what people are talking
about on HN:

1. (Re)learn basic algebra from start to finish. Use Khan Academy. You
absolutely need this to understand what a polynomial is.

2. Accept Wikipedia as a source of information. Use a dictionary and other
Wikipedia articles to look up terms you don't understand.

3. Related to #2, try to study cognitive biases, critical thinking, and
logical fallacies. Studying these concepts will help your brain process
information, such as difficult to understand Wikipedia articles. Check out
lesswrong.com.

4. Study basic CS. Look for an Introduction to Computer Science course on
Coursera or elsewhere.

5. Study algorithms. Take an algorithms course on Coursera or elsewhere.

~~~
orionblastar
I studied computer science at the University of Missouri Rolla. My instructor
Dr. Pyron said that this stuff wasn't needed to learn computer science and
that most of it was hokey and kind of hinky. But he also told our freshman
class that "Ethics don't matter". So I think I was robbed of that
opportunity to learn it. BTW, I feel that "Ethics do matter", and I told him
my opinion, and he said it wouldn't work in the computer business.

I had a TA for College Algebra who couldn't teach it, and the professor was
nowhere to be found. They made Catch-22 Jokes that he was like Major Major you
could only see him if he wasn't in his office. It made me feel better but did
not help me learn. I was in the Delta Tau Delta fraternity and they helped me
out, but claimed the TA had messed up some of the problems that could only be
solved with Calculus which I didn't learn yet. To make matters worse I got
caught up in the party college thing and underage drinking. I quit eventually.
But at least I didn't get forced to meet "Alice" like some unfortunate souls.
I still have my Intro to Pascal and Pascal with Data Structures books in my
basement, I should re-read them and try stuff with Free Pascal.

In 2005 I had a car accident and almost died, was in a coma, and lost some
memories as a result. I may just have to relearn everything all over again.
Since I am on disability, free online classes and sources are my only hope.
Thanks.

