Hacker News new | comments | show | ask | jobs | submit login

The example in the question seems to demonstrate a common misunderstanding with regards to Big O notation. It measures the time complexity of an algorithm in terms of its input size. But the n in the OP's example is not the input size, it is the input value.

In the actual implementation, the input size is of course constant (being an int), but if you look at the intended algorithm it is (I have renamed the variable into b on purpose, to avoid confusion with the n usually used in the Big O notation) we have something like:

* Take a number b.

* Loop from 0 to b-1 and print the number of the current iteration.

As far as this algorithm goes, the input size here is not b, but of order log(b) (number of digits to represent b), while the algorithm's runtime is proportional to b. That means this algorithm has exponential runtime.

Your first paragraph is also a slight misunderstanding of big O notation. In fact, the notation allows to make general comparisons between the growth rates of arbitrary functions, not necessarily referring to time complexities of algorithms (although that is a frequent use of big-O notation). For example, sometimes you use big O to express the amount of space an algorithm uses, sometimes you just count some objects with a certain property, etc.

When it comes to time complexity, one always has to clarify what is the computational model assumed. What are the allowed operations? What are we counting? Are we counting the number of comparisons between arbitrary (comparable) elements? Are we counting operations on bits? Are we counting arithmetic operations on numbers that fit in a register of some idealized computer? Of course if the model is not agreed upon, the answers can differ, as in your example.

Once the model is clear, and you want to express the running time of an algorithm, the result can in fact depend on both input size and input value (although this distinction makes more sense in certain models than in others). An example is the Ford-Fulkerson algorithm for max flow, whose running time (in the RAM model) depends both on the size of the network and the value of the max flow.

> In fact, the notation allows to make general comparisons between the growth rates of arbitrary functions, not necessarily referring to time complexities of algorithms (although that is a frequent use of big-O notation).

True, I was simplifying by restricting to the context the OP was asking about (time complexity of an algorithm).

Yes, I didn't mean to sound pedantic - just meant that there is no single way to measure time complexity of algorithms - when we talk about sorting, often we assume the comparison model: only comparisons between elements count, and we can't do any arithmetic on the elements - then we have O(n log n). In another model we assume the elements are numbers of reasonable size and we can operate on them in constant time (a somewhat realistic model of a CPU) - then we can sort in O(n).

That's why I think it is good practice to separate the understanding of the mathematical concept from the study of algorithms and just be clear about what we are counting.

I get your point now, and it is indeed an excellent one to make.

In my opinion the misunderstanding arises due to the lack of understanding of what log n means, and not so much O notation.

Intuitively, logarithms are much easier to understand in the context of exponentials. For example, if there is "something" (be it a function, space, money, etc.) which starts at value "c" and doubles with each iteration: c, 2c, 4c, 8c,..., that is what we call exponential growth and iteration n would be represented by the following notation: c * 2^n.

Where does the logarithm come in? It is defined in the following way:

   if a^b = c, then log_a c = b   (note: read as log of c in base a equals to b)
so an exponential relationship can always be written in terms of logarithms.

Now, if the answer that we are looking for is not the actual value, but how many iterations did it take to get there ("b" above), that is the direct result of applying the logarithm to the function.

How does it all apply to functions execution time and Big O notation? Well, if we have a space N that we are searching over via a function F, and this function does so iteratively by dividing search space N into 2 at each iteration and discarding the wrong half, we will have the following sequence representing the space to search over at each iteration: N, N/2, N/4, ..., N * (1/2)^n.

If the question is, how long will it take this function to run? The answer is that starting at space N, the function will run until it has narrowed down the space to 1, so the search is over and it found (or not) what it was looking for. So the question is really, how many iterations will the function need to do? And as explained before this is exactly the definition of logarithm.

Therefore: ('~' defined as 'proportional to')

   time to run ~ space to search ~ # of iterations ~ log_2 (N*(1/2)^n) = constant + log (n)
In general, intuitively (not always true though) if there is a function that at each iteration is dividing the size of the work to be done by a constant factor (2,3,4,...) the run time, that is the number of iterations, will be O(log n).

OP's example accepts a parameter, and that parameter is the input size, not any of the input values. In fact, in the example, there are no input values, the function is just doing busy work -- although, I suppose it's also possible to say that i (0,1,2,3,...,n) are the values, since that's what he's printing. However, as you said, the individual input values are irrelevant to the complexity.

Consider this slightly modified function:

    f(string[] phonebook, int phonebookSize) {
      int i;
      for (i = 0; i < phonebookSize; ++i)
Here it's clear that phonebookSize (n) is the input size, not an input value -- however the function is essentially the same otherwise.

EDIT: Sorry, I understand now. You are saying that the way he's defined his function is what determines the input size. Since it accepts one integer, the complexity could be described as O(2^n) (where n is the size of the integer in bits, or digits, or whatever). I had made the assumption that my updated function is what OP's intention was.

I'm confused, wouldn't this be more of an O(n) instead of an O(log n) issue? If not, then what is the difference? No pointers being used, not even a data structure, just an array of strings. I expected at least a linked list data structure for a phone book. Like sorted in alphabetic order or something.

Correct, OP's example contained an O(n) function:

> For example, the following function is O(n) because the algorithm grows in proportion to its input n: [...]

OP's question was whether there exists a similarly simple example of O(log n), which is not related to what anonymouz was commenting on. (or I'm misunderstanding?)

The value n often stands for the input size measured in bits, but not always. As long as you make it clear what n means, there is nothing wrong with using O-notation for that n. O-notation is just a way to denote a set of functions. This notation is also commonly used in mathematics and physics, where it doesn't even have anything to do with algorithms.

Of course, but the OP is not asking about big-O-notation in mathematics in general. He is specifically asking about what it means when applied to time complexity of algorithms.

I've always read O(n) as "The worst-case (time|space) complexity in respect to n" where n is typically the input size, but could be any defined value (in this case, the value of b).

I'm confused by this.

His function f(x) is of linear time-complexity (with respect to the value of x). Assume it takes time c1 to print a number, this I will create a function g(x) = c1*x + k, where k is a constant representing the overhead. Then f(x) < g(x) and g(x) element of the set O(x).

I am not sure how that makes his function run in exponential time. Am I misunderstanding something?

The function runs in linear time relative to the value of x. However, time complexity refers to the size of the input, not the value. For example, binary search's O(log n) time complexity refers to the number of elements n in a list(n varies with the list's size). x + 1 will always increase the value of x, but not necessarily the size. The size of x is the number of digits in x. f(x) runs once if x=1, ten times if x=10, one hundred times if x=100, etc. So each time x's size increases by 1, f's runtime increases by a factor of 10.

It's a very literal definition of time complexity and it's counterintuitive, but ultimately it does make sense.


It's the difference between a function taking a list and a function taking a number.

A list's size depends on the number of elements in the list. Adding an element to a list increases the size of the list.

A number's size depends on the number of digits required to express the number. Number++ will always increase the numbers value, but not necessarily the number's size. anonymouz's point is that we should describe f(x)'s runtime in terms of the size of x, not the value of x. While I agree that the natural interpretation is to use x's value when x is a number, and x's size is going to be fixed anyway, f(x) is exponential in relation to x's size and linear in relation to x's value.

Put another way: 4 and 5 have different values, but the same size: we're expressing both with 1 digit. 4 and 10 have different values and different sizes, as we require 2 digits to express 10 and only 1 to express 4. Of course we'd probably be dealing with binary, but it's the same idea.

It's really a question of semantics/how you interpret OP's example and the natural interpretation leads to the function being O(n). However anonymouz makes an interesting point that I hadn't considered before.

But the size of the number is such a programming-centered view of things. In my opinion it doesn't at all reflect the mathematics behind it. It's completely dependent on the representation of the number, which could really result in anything.

The only thing that really matters is the size of the set of numbers that you do your computation on.

What's really going on here (not in code but conceptually) is that you have two steps. First you take a number `n` and basically convert it into a generator for the set of numbers `{0, ..., n-1}`. We can assume that to be a O(1) step. Then in the second step you apply a O(1) function for every number in the set which is clearly O(n). So you end up with O(1) + O(n) which boils down to O(n).

So we have functions

`g(n) -> generator for {0, ..., n-1}`

`f(x, funct) -> funct(n) for n in x`

and we combine them into

`p(n) -> f(g(n), print)`

Clearly, the input size of the arguments to `p` is no longer meaningful for the total runtime. You have to assume it is a set of numbers to make any sense.

So in my opinion while anonymouz is technically correct, it does not make any sense to see it that way. BigO notation/analysis is there to gain some understanding about your algorithm complexity and not to link the amount of digits in a number to the complexity of your algorithm.

>So in my opinion while anonymouz is technically correct, it does not make any sense to see it that way.

I absolutely agree, and anonymouz's approach is not how I would approach time complexity. I thought it was an interesting point and so I decided to explain where he was coming from, but practically I don't think it makes any sense.

Consider a function that prints a list of the numbers 1 through N. This function is linear in the size of its input, because we're talking about the size of a list.

But now, make a function that prints the first N numbers. This function is exponential in the size of its input, because we're talking about the size of a number.

We can optimize this second function by replacing our binary representation of a number with a unary ("tallying") representation. Now the size of a number grows linearly with its value, and so our function is linear instead of exponential. Nice win!

anonymouz's interpretation is bizarre and definitely not how I'd think about the function, I'm just trying to explain where he's coming from.

I don't plan on using the interpretation in the future but it was confusing to me too and I thought it was an interesting thought so I figured I'd attempt to explain.

> anonymouz's interpretation is bizarre and definitely not how I'd think about the function.

It is usually understood that one measures the time complexity of an algorithm in the size of the input (only if the input is an integer, would it even make sense to measure it in the value of the input). So you may of course use whatever definition you like, but be careful that statements about the time complexity of certain things are often made without referring to the exact definitions used, and you may then misunderstand them.

For example, when people say "We don't know a polynomial-time algorithm to factor integers" they mean polynomial time in the size of the input (i.e., number of digits/bits of the integer). The problem would trivially be polynomial time if you allowed it to be polynomial in the integer itself.

Sorry, I deleted my post before I saw your response.

Surely the n in the OP's example is the size of the input. The input is the phone book (or a given page in one case, etc) and n is the number of entries.

Applications are open for YC Summer 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact