The math of Big-O isn't that hard, and the article, while well-intentioned, misses the point. Big-O is about asymptotic behaviour, and the graphs are misleading in that regard (well, they're simply wrong, not misleading). There are algorithms where, if you just looked at the Big-O, you'd think one has a faster run time than the other, but because of the constants that Big-O discards, that wouldn't be the case for any practical problem size.
What O(N) means is that there is some large enough number beyond which the run-time (edit: or any function, really) is bounded by a constant times the input size, for every input size larger than that number (see, math, not hard). That constant may be so large that an O(N^2) algorithm is a better solution for any practical purpose.
EDIT: As an example of this, we can look at two multiplication algorithms, Karatsuba and Schönhage–Strassen. The latter has better asymptotic performance, but that only really kicks in once you have large enough numbers (100,000 digits or so). ( http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen... )
He's saying that big O only matters for big input sizes, because big O is specifically about the algorithm's asymptotic performance, which means its performance for a large value of n. If you have a small value of n then the other constant time operations in the algorithm may affect the running time more.
n^2 is less than n^3 right?
But is 2 * n^2 + 100 less than n^3?
Depends on how big n is, right?
Big O notation just assumes that n is big enough to make the answer "yes."
More or less, yes. What he is saying is that there is a constant hidden in the big O. Say we have two algorithms A and B with run times that can be bounded by the functions a(n) = 1000n^2 and b(n) = 0.001n^3 respectively. Hence a ∈ O(n^2) and b ∈ O(n^3), so algorithm A is clearly faster asymptotically. However, for input sizes around n = 50000 it actually turns out that algorithm B is faster, because b(50000) < a(50000). This is because of the constant that is hidden in the big O. For sufficiently large n, however (n > 1000000 in this case), algorithm A will always be faster.
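To see that numerically, here's a quick Python sketch (the cost functions are the hypothetical ones from above, counting abstract "steps", not real measurements):

    # Cost models from the comment above (hypothetical unit costs).
    def a(n):
        return 1000 * n**2   # the O(n^2) algorithm, big constant

    def b(n):
        return 0.001 * n**3  # the O(n^3) algorithm, tiny constant

    for n in (1_000, 50_000, 500_000, 2_000_000):
        print(n, "A" if a(n) < b(n) else "B", "is cheaper")
    # B wins up to the crossover at n = 1,000,000; beyond that A always wins.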
A good implementation could check problem sizes beforehand and choose the algorithm accordingly.
What I don't agree with is that "big O only matters for big input sizes". "Big" is not really a well-defined term. The problem here is that "big" depends entirely on the algorithms; it might also be n > 10. There's nothing in the definition of the Landau symbols that prevents that.
The definition of a "big" input size is somewhat circular, yes.
"The big O performance of an algorithm only matters for a sufficiently large value of n."
"Define a sufficiently large value of n."
"Large enough that the big O performance of the algorithm starts to matter more than the coefficients of n and constant time operations."
So yes, that could be 10 or 10 million depending on the nature of the problem, the constants and coefficients of the algorithm, the language, the hardware, and so on. You could have an algorithm that takes factorial time, but if you're only using it in a problem domain where n is always < 10, you'll probably never notice or care that it's O(n!).
*That constant may be so large that an O(N^2) may be a better solution for any practical purpose.*
Another example of this is that a good system/library sorting function will sometimes use an asymptotically fast algorithm like quicksort or mergesort only for arrays over a certain size, but will switch over to an O(n^2) algorithm like insertion sort for smaller arrays, because it has smaller constants and coefficients. (But sometimes this is overkill, because if n is small, you probably aren't going to notice the speed difference anyway.)
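Roughly, that hybrid strategy looks like this (a sketch, not any particular library's code; the cutoff of 32 is an arbitrary placeholder that real libraries tune empirically):

    import random

    CUTOFF = 32  # arbitrary; below this size the quadratic sort's small constants win

    def insertion_sort(a, lo, hi):
        # O(n^2) in the worst case, but tiny constants on a small slice a[lo..hi]
        for i in range(lo + 1, hi + 1):
            x = a[i]
            j = i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def hybrid_sort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            insertion_sort(a, lo, hi)   # small subarray: fall back to insertion sort
            return
        mid = (lo + hi) // 2
        hybrid_sort(a, lo, mid)
        hybrid_sort(a, mid + 1, hi)
        # standard O(n) merge of the two sorted halves
        merged, i, j = [], lo, mid + 1
        while i <= mid and j <= hi:
            if a[i] <= a[j]:
                merged.append(a[i]); i += 1
            else:
                merged.append(a[j]); j += 1
        merged.extend(a[i:mid + 1])
        merged.extend(a[j:hi + 1])
        a[lo:hi + 1] = merged

    xs = [random.randrange(1000) for _ in range(500)]
    hybrid_sort(xs)
    assert xs == sorted(xs)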
So yes, Big O is all about asymptotic behavior, which basically means, the rate of growth as n approaches infinity. Because a large value of n is when the choice of algorithm starts to really matter for performance.
His tilde notation is also more informative than the big-O one. The canonical example is Quicksort: if you look at its worst-case big-O time complexity, it's O(n^2), yet for all practical cases it's ~n log n.
This sort of contributes to giving self-taught programmers a rather bad name. The writeup is good, but the idea that Big O is "scary" is just absurd. It's an elementary concept that every working programmer should be familiar with, regardless of whether they're self-taught. Algorithms are not "scary". If you can't reason about algorithms, you may not be a very good programmer yet.
To be clear, I really appreciate the writeup. I just wish it had been framed better. It should be clear that this is for beginner programmers, regardless of whether they have a degree or whether they're self-taught.
Ahh yes. Let's berate the OP for being intimidated by a topic and then diving in and learning it on their own. This will really encourage others to learn on their own and contribute back.
Well, whether we like it or not, self-taught programmers are held to a higher standard. It doesn't help us to further the stereotype that self-taught programmers are afraid of the basics, haven't attained a general education in computer science on their own, or are less reliable than their peers who have degrees.
Not trying to berate the OP. I'm trying to say I wish OP had framed it better.
Being familiar with big O notation is not the same thing as being familiar with a few basic complexity classes. I wouldn't be surprised or disappointed if a self-taught programmer was intimidated by big O notation, but I would be surprised if they were unfamiliar with the concept that hash map lookups are much faster than array searches, or that searching a sorted array is much faster than an unsorted array. You seem to be using "big O notation" to refer to fundamental competency about basic data structures. It's even quite possible and understandable for a self-taught programmer to, over time, figure out that certain big O classes refer to certain algorithms while still not understanding the meaning of the mathematical notation.
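That intuition is easy to check empirically, for what it's worth. A quick, machine-dependent sketch (illustrative only) comparing an O(n) list scan with an O(1) expected hash lookup:

    import timeit

    n = 100_000
    haystack_list = list(range(n))
    haystack_set = set(haystack_list)
    needle = n - 1  # worst case for the list: the scan walks the whole thing

    t_list = timeit.timeit(lambda: needle in haystack_list, number=1_000)
    t_set = timeit.timeit(lambda: needle in haystack_set, number=1_000)
    print("list scan: %.4fs   set lookup: %.4fs" % (t_list, t_set))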
*but the idea that Big O is "scary" is just absurd.*
To you. I never studied things like these in school, and getting to the point of "Oh, that's what that means" was long and arduous when it came to a lot of scientific literature.
Plain English, it seems, isn't in the tool set for a lot of very smart people who, coincidentally, feel that it's their duty to Explain All the Things. It's unfortunate that so many of them are deluded by the "x should be elementary" mindset or the like without regard to the language they use or the specificities of notation (Big-O in this case).
It's unfortunate you've read my post as a claim to avoid actually learning these.
I was pointing out that the "x is elementary" attitude doesn't help the process and I thought (from the video at least) some empathy in communication is warranted. It's ridiculous to expect clarity when no such thing exists in the language used to describe an idea in the first place. The OP went to the trouble of making such a post. Maybe it wasn't perfect, but it gets the ball rolling.
Whatever inadequacies exist can be corrected with feedback, and I for one would like to see more people engage in humane explanations of things pertaining to their expertise.
Big-O is "technical". Understanding of it comes with clear explanations.
Nowhere did I claim anything contrary to: "The correct way to "understand these things" is to learn the math, the theory, and do the exercises." What I claimed was that language can soften the barrier to learning these, as I imagine it did for you.
Well, I must still be misunderstanding you because I spent 30 minutes writing a whole comment explaining why Math is important then got to the bottom of your comment and realized we both believe the same thing (that these subjects are important).
What I will contend is that pity parties about scary topics are unhelpful and rigor is important - more so for self-taught people. Like you, I advocate humane teaching. Humane teaching, to me however, is more about adapting to learning styles while maintaining the rigor and difficulty of the material (without watering it down) - this article watered it down.
Clear and humane explanations are out there; this article was not one of them. The intention was noble and I respect them for that but I agree with the top commenter in that the pity party needs to end and more self-educated individuals need to be role models for those that do find it scary so that we can all (as in self-educated people) strive to understand difficult concepts instead of "being okay" with not fully understanding it or the language it was meant to be understood in.
Much as autodidactic scholars hold themselves up to very high standards when reading about an author, they read the author's works in the original language they were written in - not in its translations (this is slight speculation because I don't know any autodidactic scholars personally, but I have read some of their articles).
Watering down is unacceptable. No argument here. What I do appreciate though is that advanced topics can be made reachable with a step stool, at least at first, before the full rung up the ladder.
That "being okay" with not fully understanding a concept grates me to no end too.
It's honestly incredibly condescending. That said, there are ways to be more clear without being condescending and without accepting that "okay" is good enough.
Take math, for example. I've lost count of how many people I've run into who hate calculus and the like because "it's so hard!" This tells me that the approach to teaching it was all wrong. I hated math too, because it was big and scary, until I came across an awesome teacher who actually sat down with me and went over the basics with very careful attention to the language she used. There are approaches to teaching like this online, I'm sure, but they're very few and far between.
As an aside, you can't be self-taught in psychology. It's like being a self-taught doctor: Being educated in psychology is predicated upon you having a degree in psychology, because it's a very certification-heavy field.
I would actually advise you to not brag about being "self-taught in psychology" because it's a very strong indicator that you don't know what you're talking about.
So...you're claiming that because something is certification-heavy I can't be self-taught!?!?!?! Jungian psychology, Freudian psychology, alchemical symbolism: ALL disciplines of personal understanding and of understanding interpersonal relationships that have spanned hundreds of books on my bookshelf and years' worth of self-application to become a happier, more effective human being. I guess none of that was worth it because I don't know what I'm talking about!
Oy I might as well stop reading books then because claiming I'm self-taught in anything will make me look bad! No offense, but your comment made you look like you don't know what you are talking about. How can you seriously say to someone they shouldn't call themselves self-taught/educated in anything based on your criteria?
What the fuck would you call self-education then if reading books, becoming more intelligent, applying it to your life, and improving quality of said life isn't self-education? Self-education is something everyone does, even people that have been through a formal education. The difference is that they are choosing their subjects of study instead of having them chosen.
By the way, your analogy would be stronger if I laid claim to "psychiatry" rather than "psychology" - psychiatry is more akin to being a doctor; I do not ever claim to practice what I know on other people just as people that love studying physiology and medical text-books don't practice on people!
Oh, also, I was not bragging, I was qualifying myself for the commenter as someone who is self-taught so I wouldn't appear to be someone that doesn't know what they are talking about.
I think the "scary" part is the notation. Any competent programmer, whether schooled or self-taught, is familiar with the concepts if not the notation.
Thanks for the comment. I don't feel as though self-taught programmers have a bad name. I feel like they can lack some skills because their importance isn't lauded in their social circles. It wasn't until 5 years into my professional career that I was fortunate enough to work with someone with a computer science background, so topics like Big-O never even came up.
I think computer science has a bad rap (within a subset of the target demographic of this article) for being useless brain-teasers used only in interviews. Through the bits I've managed to pick up, I feel like these topics contribute to a better understanding of programming on a broader spectrum.
While you may not see a body of knowledge that many people expect you to know (that you don't know) as scary, I can assure you that many people do. There are many similar topics that induce some amount of fear in me still: issues around multi-threading and race conditions, cryptography, deep understanding of networking stacks, and compilers to name a few.
I'd also like to point out some of the phrasing you've chosen. 'the idea that Big O is "scary" is just absurd' -> 'If you can't reason about algorithms, you may not be a very good programmer'. This is the source of fear among self-taught programmers (and the source of my own fear in the above examples). We all want to be good at what we do, so let's try to lift each other up! :)
I understand where you're coming from, but unfortunately employers read articles like these and use them as justification to solidify their notion that self-taught programmers are less knowledgeable or less reliable than their peers with degrees, and hence should be paid less or not be hired at all. I've experienced it firsthand. You may argue "that's not a place you'd want to work at anyway," but unfortunately in a down economy one does not always have the luxury of rejecting work on principle. If you've ever looked into the eyes of your cat dying of cancer and felt ashamed that you didn't earn enough money to bring him the proper care to extend his life, then you'd possibly understand that prejudice against self-taught programmers who didn't have the opportunity to attend a university can be a real problem. Playing into the stereotype that we're all afraid to learn and happened to get lucky in getting a job isn't helpful.
The work you're doing is wonderful. The problem I had with it is that a simple modification to it (not phrasing it in a condescending way toward self-taught programmers) would've made it so much better.
You keep repeating this, but that doesn't make it true.
Where I've worked and hired people -- in San Francisco and Silicon Valley -- there is little emphasis placed on formal education. Virtually none. Some companies have a reputation for liking degrees (Google, for example), but they are the exception.
I don't think a self-taught programmer who takes his craft seriously and learns not just practical how-to but also data structures and algorithms is at ANY disadvantage in today's job market.
Well, you work in SF / SV. There's a whole Earth outside of those places. And those places are largely prejudiced against people who don't have degrees. I've experienced it firsthand.
"Well, move!" Except it's not that easy when you have family.
Unfortunately, there are some misconceptions propagated in this article. Kudos on the effort, but some statements are just flat-out wrong, such as this one: "Big-O is all about the approximate worst-case performance". Big-O has nothing to do with worst-case; it is a bounding function. An O(n) algorithm is also O(n^2), O(2^n), etc. Those are valid bounds on the O(n) algorithm, just not the smallest.
Speaking of worst-case (or best-case, average-case, etc.) scenarios, how does big O notation relate? The variables inside an O() notation as far as I know refer only to the size of the input, so when we say that finding a value in an unsorted list is in O(n), we're referring to the worst-case scenario (obviously, finding a value in a list when that value is the head of the list is constant time, and not very interesting). Of course, that's a simplistic example, but with more complex algorithms like Quicksort, when we say it's in O(n log n) we're talking about average-case. Is this just because we know that worst-case performance in Quicksort is exceedingly rare so we don't bother mentioning that O(n log n) is average-case unless we're studying it more deeply?
Big-O (or Ω or Θ) notation is orthogonal to notions of best-case or worst-case, which are merely constraints on the set of inputs we're considering. If, for example, you ask "what is the upper bound on best-case running time for linear search," then what you're actually asking is "given an input sequence that begins with the item being searched for, what is the upper bound on the running time." I think part of the confusion stems from the fact that when people supply bounds on best- or worst-case running times for deterministic algorithms, they often use Big-O notation when they're actually giving an asymptotically tight bound. So in the case of linear search's best case, O(1) is an upper bound, but the lower bound is of the same order, i.e. Ω(1). It's not incorrect, but it can be misleading.
In the case of Quicksort, I think the answer to your question is yeah, pretty much. I mean, cases where you're dealing with nearly-sorted input are common in some domains and in that case you should be very much aware of the worst-case behavior of Quicksort, but otherwise I think the discussion from CLRS puts things in perspective: even if you had the bad luck of only ever receiving input sequences such that every partitioning led to a 99-to-1 split, then your running time is still going to be O(n lg n), since your recursion tree is going to be at most log-base-100/99(n) levels tall, which is the same as lg n / (lg 100/99), the denominator there being a constant factor that disappears in an asymptotic context.
The average case for finding a value in an unsorted list is also O(n). Assuming the values you want are randomly distributed, on average you have to look at half the list. Naively this sounds like O(n/2), but O(n/2) is actually the same as O(n) because you strip out the constant factor of 1/2.
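A quick sanity check of that n/2 figure (a sketch; it assumes the target is equally likely to sit at any position):

    import random

    n = 1_000
    trials = 100_000
    total_comparisons = 0
    for _ in range(trials):
        position = random.randrange(n)     # target equally likely at any index
        total_comparisons += position + 1  # a linear scan looks at this many items
    print(total_comparisons / trials)      # comes out near n/2, i.e. about 500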
Of course. But it's also the worst case. I'm really just wondering if there's a "default" scenario that's being referred to when we just say "f(n) is in O(n)" or does it depend on context?
It depends on context. Generally if people fail to specify, they mean average case, as in "Quicksort takes O(n*lg n) time". There may even be some amortization shenanigans thrown in there, as in "list.append takes O(1) time".
Quicksort is the exception, in my experience big-O usually refers to worst case running time as opposed to average case or expected running time. CLR(S) [1] sticks to worst case because 1) worst case is a guarantee; 2) the worst case can be frequent; 3) the average case is often roughly as bad as the worst case. This is a widely used textbook, so I generally assume a lot people follow its conventions.
It's also not obvious what "average case" means. For example, if we're talking about the asymptotic behavior of the function
A(n) = avg { RunningTime(Algo, x) : x is valid input to Algo and len(x) = n }
are we assuming that the inputs are drawn uniformly and therefore have equal weight or are we drawing them from a non-uniform distribution? What if some inputs never occur in practice?
Worst case and best case are totally unambiguous and don't need any additional clarification. There are just so many knobs one can adjust when defining "average".
First, when we talk about Big-O we're not talking about the "worst case scenario." Big-O gives us an upper bound on the worst case scenario, but the actual worst case scenario might be better. Big-O means "no worse than" not "as bad as."
When most people say Big-O they really mean Big-Θ, which does encapsulate the idea of "asymptotically equivalent, up to a constant." See my answer on Quora for more technical details.
Second, Big-O and other forms of notation used in the asymptotic analysis of functions were invented before physical computers existed. They're statements about pure functions.
When applied to the analysis of algorithms the function we're "really" analyzing isn't the algorithm. Rather, if we have an algorithm A that takes as its input a positive integer n, we're really analyzing the function "the length of time it takes algorithm A to run given input n."
The up-to-a-constant nature of Big-O notation is nice because that constant can encapsulate things like processor speed, memory access times, and so forth. This enables us to make intelligent statements about algorithms per se without reference to the underlying machine on which the algorithm might be implemented.
Even with ideal functions, this naïve asymptotic analysis has some problems. For a toy example, imagine a spiky function like this:
f(n) = 800*n^2 if n is divisible by 1000000000
f(n) = 400*n otherwise
This function is not O(n) but it is O(n^2). The "worst case" behaves like n^2, but for "most inputs" it behaves like n. We can't say "f(n) is asymptotically no worse than n, up to a constant," because for infinitely many inputs it is worse than that.
Lots of algorithms behave like this in practice because we optimize for common cases perhaps at the expense of less common cases. "Common" is dictated by how our algorithm is used.
Taking quicksort as an example, let's call the algorithm Q. We want to measure its running time given an input of length n. For input x, let's say its running time is T(x).
Well, there are many x such that len(x) == n, so what does it even mean to say "its running time given an input of length n?" Are we given a particular input of length n? A uniformly-selected-but-random input of length n? To the extent that we can, we want to be making statements about the algorithm per se, not statements about the algorithm given a particular input.
One way to answer this is to ask "Given an input of length n, what's the most time my algorithm could take?" In that case we're analyzing the following function:
W(n) = max { T(x) | x is valid input and len(x) == n }
On the other hand, maybe we care more about the average case. Perhaps we randomly pick 1,000 inputs of length n and average the running time. Now we're talking about something that looks more like a probability distribution than a discrete thing like "running time" because we've sampled the input space.
And in fact, we could calculate "expected running time given input n" in this way and graph that. We could then make Big-O like statements about that new function, which is the kind of thing folks mean when they talk about "average case."
It sounds to me like you're mixing up two things: "worst case performance" and "asymptotic performance." You say
> Big-O is concerned with worst case performance. Colloquially, f∈O(g) means that "Eventually, f performs no worse than g."
Aren't these two different concepts? Worst case performance deals with the worst possible input of any given input size (like pathological inputs to Quicksort), while asymptotic performance is talking about sufficiently large inputs (like the vertical lines you've drawn on the graphs).
When I say that merge sorting a set of n elements is in O(n lg n), I'm saying that there's some value on the n axis beyond which n lg n >= MergeSort(n). But when I say that Quicksort is O(n^2) in the worst case, it's as if I'm talking about another function called WorstQuicksort which, when given a set of n items, always takes as long as Quicksort would take to sort the most pathological set of n items, and there is some value on the n axis beyond which n^2 >= WorstQuicksort(n).
Asymptotic analysis only makes sense in the context of functions whose inputs are real numbers (possibly integers) and whose outputs are real numbers. When we want to do asymptotic analysis of algorithms we need to derive functions amenable to this analysis and for a given algorithm there is more than one derived function we might care to look at.
But in any Big-O situation the person is always, always, always talking about a function from the real line to the real line, with perhaps many layers of confusion and equivocation in between. You should be able to suss out what function they're "really" talking about.
Phrases like "worst case", "average case", and "best case" are shorthand for three of the most common derived functions.
If you want, think of a function T which takes as its input an algorithm and an input to that algorithm and returns its running time.
T(QuickSort)(x) = running time of QuickSort given x as its input
then
W(QuickSort)(n) = max { T(QuickSort)(x) : length(x) == n }
We're then in the business of doing asymptotic analysis on W(QuickSort), A(QuickSort), and B(QuickSort).
Let's be precise. I'm being more precise here in my comment than I was on Quora.
Big-O and related notations are ways of categorizing functions. O(n^2) for example is actually a set of functions, which is why I wrote f ∈ O(n^2) rather than something like f = O(n^2) or f(n) = O(n^2). That is, f is a member of some set of functions which all satisfy a particular, precisely-defined property.
To understand that property, first, let's get rid of the idea of "performance" because asymptotic analysis has nothing to do with "performance" per se and predates even the first precise definitions of things like "algorithm" or "computability." The notation itself was invented in the late 19th century.
Instead, let's just talk about "upper bounds." If we have a function it's easy to talk about upper bounds. For example,
f(x) = sin(x)
is bounded above by 1, 1.5, 10, 100, 80457, and an infinitude of other numbers for any real number x. It's bounded below by -1.
Now, in this case, it's easy for us to see that not only is
sin(x) <= 1 for all real x
but also that
max { sin(x) : x is real } = 1
So in this sense the upper bound of 1 is strict. 2 is also an upper bound in the sense that
sin(x) <= 2 for all real x
but it's not strict. There are other upper bounds which are strictly smaller than 2, e.g., 1.5. So, we can say that "the value of sin(x) for real x is no greater than 2," but we can't say that it "is" 2.
So, to answer your point before diving deeper, Big-O is about "worst case performance" in this sense. By itself it doesn't tell us what the worst case performance is. Instead, it gives us an upper bound on the worst case performance. It says "the worst case performance is no worse than FOO." The actual worst case performance might be better.
Big-Θ is the asymptotic equivalent to "this is a tight upper bound."
I'll skip further development of this for now and jump back to the issue of algorithms. The issue is this: given an algorithm with input of length N, we want to say something about how long it takes to run.
This means that the function we're analyzing isn't "QuickSort(n)". What does that even mean? The input of QuickSort is an array of integers and it returns a sorted array of integers. How can an array of anything be greater than or equal to n^2? So that's one way in which the CS vernacular equivocates -- we're not really talking about QuickSort we're talking about some other function:
T(n) = the amount of time it takes QuickSort to run given an input of length n
We're then talking about bounds on this other function T, asymptotic or otherwise.
But now we're in a pickle because what does "the amount of time it takes QuickSort to run given an input of length n" mean? There are many inputs of length n. If we're talking about just arrays of integers of length n, there are n! if all we care about is relative ordering and not the actual values in the array. If we care about the actual values in the array then there are an infinitude of inputs of length n.
There are a few ways we can handle this. Let's redefine T like so:
T(x) = the amount of time it takes QuickSort to run given input x
One way is the "worst case" method. This says, ok, look at this function:
W(n) = max { T(x) : x is a valid input to QuickSort and len(x) == n }
We can now do Big-O, bounds, asymptotic analysis, etc. on W(n). This is what we mean when we say the worst case is O(n^2). It means W ∈ O(n^2).
Another way is the "average case" method. This says, ok, look at this function:
A(n) = avg { T(x) : x is a valid input to QuickSort and len(x) == n }
This is tricky if there are in principle an infinite number of valid inputs of a given length. There are various ways of handling this issue. For something like QuickSort we can see that it's really only the ordering that matters, i.e., for the purposes of QuickSort [1,10,5] is the same operation-wise as [-50, 80, 0], so there are only n! inputs we really need to check for a given n.
Yet another way is the "best case" method, which looks at
B(n) = min { T(x) : x is a valid input to QuickSort and len(x) == n }
So, given an algorithm we can derive these three functions and then answer Big-O questions about them. We're never answering Big-O questions about the algorithm per se, although we can get away with equivocating when W(n), A(n), and B(n) are always equal or it's obvious we only care about one of them.
For simple examples this is often the case, e.g., calculating the nth Fibonacci number in the standard iterative way has best, average, and worst case performance of O(n).
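For tiny inputs you can actually compute W(n), A(n), and B(n) by brute force. Here's a sketch that counts comparisons (standing in for running time) in a naive first-element-pivot quicksort over all n! orderings, which is only feasible for very small n:

    from itertools import permutations

    def quicksort_comparisons(xs):
        # naive quicksort, first element as pivot; returns the comparison count
        if len(xs) <= 1:
            return 0
        pivot, left, right, comps = xs[0], [], [], 0
        for x in xs[1:]:
            comps += 1
            (left if x < pivot else right).append(x)
        return comps + quicksort_comparisons(left) + quicksort_comparisons(right)

    for n in range(2, 8):
        counts = [quicksort_comparisons(list(p)) for p in permutations(range(n))]
        print("n=%d  W(n)=%d  A(n)=%.2f  B(n)=%d"
              % (n, max(counts), sum(counts) / len(counts), min(counts)))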
To make matters worse, most people say Big-O but mean Big-Θ, or at the very least aren't clear when they mean one or the other. So, when one says "worst case performance" and we have W(n), A(n), and B(n) all being the same, it can be particularly confusing.
Depending on the algorithm in question, it might be understood that we care about one of these more than the others. For example, if worst-case inputs are particularly pathological we might talk as if we mean the performance of the algorithm per se but really be talking about A(n). However, if "bad" inputs are common we might really be talking about W(n).
f∈Θ(g) is equivalent to f∈O(g)∧g∈O(f). This isn't really a "tight upper bound." For your spikey function f above, we have f∈O(n^2) and f∈Ω(n). These bounds are tight in the sense that there are no strictly tighter bounds. That is, there exists no function g with f∈O(g)∧g∈o(n^2) (there is no function asymptotically smaller than n^2 that bounds f from above). So for this f one could reasonably consider n^2 to be a "tight upper bound", although one would need to pick a spikey bounding function to say anything involving f and Θ.
The problem of having an infinite number of inputs of size n is usually not a problem, because there are finitely many bit strings of each length. If an algorithm uses subroutines that act on input objects of unbounded length (like comparing arbitrary-precision integers), and you are only interested in the bound for the number of subroutine calls, then there might be some trouble with the notion of an average case across a fixed input size. This is a bit silly though; it's more a way to define "fixed input size" into describing "unboundedly large input size" than something I would actually want to do for some useful purpose.
Er, well, I was using the function as an example of something which is f∈O(n^2) but not f∈Θ(n^2) under the assumption that when the OP reads the symbols "f∈O(n^2)" he's really thinking something more like f∈Θ(n^2).
Right, yes, that's the main thing to understand and it's also the thing that most explanations obscure. I tried to be clear about that, but wasn't clear enough. See my follow-up comment.
What you said is not quite correct either!
Big O tells you about the order of growth of a function, i.e. its growth rate; that's it. It gives a large-scale view of the function, as seen from 1000 feet above the ground.
A bounding function is what we have when we know the exact function and its end bounds for an algorithm; we know the extremes, C1 and C2. The function falls within that boundary and always stays inside that box; C1 and C2 are two lines, making a band.
- Worst case, best case, and average case are different input cases, under which the algorithm grows in different ways. We use Big O notation to classify them using simpler functions, based on their order of growth.
And unless we know the exact function, it's not possible to predict the behaviour for arbitrary inputs.
So don't call Big O a bounding function unless we know C1 and C2 (the end bounds).
- Big O is a notation that characterizes functions according to their growth rates.
- Different functions with the same growth rate may be represented using the same O notation.
- The letter O is used because the growth rate of a function is also referred to as order of the function. A description of a function in terms of big O notation usually only provides an upper bound on the growth rate of the function.
Nothing you've said contradicts anything I wrote. I mentioned bounding functions -- you've been a bit more pedantic, but essentially you've said nothing new.
While the mathematics uses a bounding function, I am not sure how vital this is to a self-taught programmer who is just trying to get insight into some algorithm.
O(N) is read "Order of N" because the O function is also known as the Order function. I think this is because we're doing approximation, which deals in "orders of magnitude".
It's a different meaning of "order", one that has to do with the shape of the size-vs-time curve. It's the same meaning as the order of a polynomial ("x" is linear or 1st order, "x^2" is quadratic or 2nd order, etc.).
*but Big-O is all about the approximate worst-case performance of doing something.*
It can be, but I think it's more commonly taken to mean the average case. One example is quicksort, which is O(n*log(n)) on average but can be quadratic (n^2) in the worst case.
More importantly, although "O" is a function, one doesn't usually think about it that way. If one were, I think it's important to be precise. Here's the best crack I can take at it, though.
Let ℜ be the set of all real numbers. Let F(ℜ) be the set of all functions f: ℜ → ℜ. Let X be a set and 𝒫(X) be the power set of X, i.e., the set of all subsets of X.
Then "O" is a function O: F(ℜ) → 𝒫(F(ℜ)) such that
O(g) = { f ∈ F(ℜ) : limsup_{x→∞} f(x)/g(x) < ∞ }
That is, O takes as its input a function from the reals to the reals and returns a set of all functions which are in its "Big-O" class.
That seems really, really pedantic and not particularly illuminating.
Thanks for your comment. I must admit that my lack of mathematical background makes your clarification difficult to understand. I've not heard the term "order of a polynomial". If you wrote up a post and emailed me at justin@abrah.ms explaining the concept, I'd happily link it in this article. :)
I'm self-taught too. Pick up Knuth's books and start going through them; stop and learn the math if you don't understand it. His first book in particular begins with some mathematical foundations that will dramatically change how you reason about your programs.
This stuff isn't scary. It's thick but learnable and you can even develop an appreciation for its elegance.
And when talking about the running time of an algorithm with O-notation, concerns about whether or not an algorithm is faster by a constant factor (for example three times faster, an order of magnitude faster, two orders of magnitude faster) than another algorithm are not captured by the O-notation, by convention (they are abstracted out, so to speak). If you find that an algorithm is O(2n), you simplify it to O(n); the coefficient of n does not matter.
The post is kind of misleading because it confuses "order of magnitude" with the order of a polynomial. To make this very clear:
You have two functions, x^4 and x^3. You can multiply x^3 by any constant multiple a (any number, regardless of size), and as x gets sufficiently large x^4 will be bigger than a * x^3. This is the point of evaluating asymptotic performance: as your input data approaches an infinite size, only the highest order term in a polynomial really matters. For example, x^2 + x + 1 will approach x^2 for sufficiently large x - the lower order terms (x^1 and x^0) don't really matter for a big x.
Technically, big-O refers to a bounding function: x^2 is O(x^2) AND O(x^3), etc., because x^2 is less than or equal to x^2 and to x^3 as you approach infinity. For convenience we're usually only interested in the best fit; it's not very useful to say your algorithm is O(x^5) when the interesting fact is that it's O(x).
Finally, we also have big-Ω (Omega) notation, which is a lower bound. If your algorithm is never faster than x^2, even on an ideal input set, it's Ω(x^2). Note that the same algorithm is also Ω(x) and Ω(1), because any algorithm that needs at least on the order of x^2 steps necessarily needs at least on the order of x steps, and at least a constant number of steps.
edit: It's worth pointing out I only talked about polynomials because they're pretty intuitive. You can extend this notation to any class of functions (exponentials, logarithms, etc.), but you just have to know that, for example, log(x) is O(x) and Ω(1).
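A numeric way to see the point about constants and the highest-order term (a small sketch): for any constant a, x^4 overtakes a * x^3 as soon as x exceeds a.

    def crossover(a):
        # smallest integer x at which x**4 exceeds a * x**3 (which is x = a + 1)
        x = 1
        while x**4 <= a * x**3:
            x += 1
        return x

    for a in (2, 10, 1_000, 100_000):
        print("x^4 > %d * x^3 for every x >= %d" % (a, crossover(a)))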
I find Big-O is mostly applicable to SQL queries. A hard drive/memory bottleneck exists in all database queries. Programmers will easily ignore bad SQL and chalk it up to 'database bottleneck etc'. The truth usually goes along the lines of 'I do not understand temporary tables or views. I just SELECT. I opened a 30,000 row cursor, then, a 1,000,000 row cursor, and finally another 500,000 row cursor and got the data (while storing all of the fetches in leaking arrays)!' Usually on keys without indices or tables without primary keys.
SELECT a, b, c FROM abc                      (30,000 rows)
BEGIN
    SELECT e, d, f FROM edf WHERE e = a      (30k * 1m)
    BEGIN
        SELECT x, y, z FROM xyz WHERE d = x  (30k*1m*500k scanned)
        BEGIN
            process()
        END
    END
END
Customers and managers alike can respect lowering the growth rate of a query. No mathematics required. Simply: "Your report now runs in 30 seconds instead of 20 minutes." Anyone can compute that!
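The same growth-rate fix shows up outside SQL too. Here's a rough Python analogy (hypothetical, scaled-down data, not the tables above) of replacing a nested-cursor "join" with a hash lookup, which is essentially what an index lets the database do:

    # Hypothetical rows: (key, value) pairs standing in for two small tables.
    abc = [(i, "a%d" % i) for i in range(1_000)]
    edf = [(i % 1_000, "e%d" % i) for i in range(5_000)]

    def nested_loop_join(abc, edf):
        # the cursor-in-a-cursor pattern: O(len(abc) * len(edf)) work
        out = []
        for a_key, a_val in abc:
            for e_key, e_val in edf:
                if e_key == a_key:
                    out.append((a_val, e_val))
        return out

    def hash_join(abc, edf):
        # build a lookup once, then probe it: O(len(abc) + len(edf)) expected work
        index = {}
        for a_key, a_val in abc:
            index.setdefault(a_key, []).append(a_val)
        out = []
        for e_key, e_val in edf:
            for a_val in index.get(e_key, ()):
                out.append((a_val, e_val))
        return out

    assert sorted(nested_loop_join(abc, edf)) == sorted(hash_join(abc, edf))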
The mathematics of Big-O gets annoying for average-case scenarios. Consider something like:
def fun(abc):
    l = []
    for x in abc:
        if x % 2 == 0:   # the inner loop only runs for even elements,
            for y in abc: # so the cost depends on the contents, not just the length
                l.append(y)
    return l
This is a nice explanation, but I couldn't help but notice that the estimate of the number of gumballs in the pictured machine seems closer to 1000 than 100 (contrary to the claim in the article).
You can actually see about 100 gumballs in the picture, so there must be far more hidden behind them -- my guess is closer to 500, which is about twice as close (in order-of-magnitude) to 1000 as to 100.
I don't know the first thing about math, but even I find that the classical definition (i.e. the one found in the CLRS book) is pretty straightforward: given a big enough input, the running time will be at most a constant multiple of g(n) if f(n) = O(g(n)), where f is the function that describes the algorithm's running time.
I don't really know why Big-O notation is so common, even though Big O is only an upper bound.
For me it is more practical and logical to use the Big Θ (Theta) notation, as it provides a tighter bound, which is more understandable.
Also, Big O is very misleading to newcomers, as they are usually confused when they see something like
O(n) = O(n^2)
which is perfectly valid, as the Big O notation is only an upper bound, albeit a loose one here.
For all we care, we can give the Big O of
for ( i = 0 ; i < n ; i++)
{ cout<<"I am constant time operation"; }
as O(n!); it won't be mathematically wrong, but again it would be very misleading and loose :).
So my advice to everyone is to use the Big Θ notation:
f(x) = Θ(g(x)) when f(x) = O(g(x)) and f(x) = Ω(g(x)).
For those that don't know what Big Ω(g(x)) (read Big Omega) is: it is a lower bound. In English, that means your loop will execute/iterate at least on the order of g(x) times.
Now, before people get any more confused: Big Θ(g(x)) is a tight bound. That means your code/loop will run at least C1 * g(x) and at most C2 * g(x) times,
where C1 and C2 are two constants.
If anyone is interested they should really read CLRS. It has an excellent chapter on calculating and explaining the time complexities.
Big O is an upper bound; what I meant was that if a piece of code is O(n) then it is also O(n^2).
With emphasis on the upper bound:
f(x) = O(g(x)) if f(x) <= c*g(x) for some constant c and all sufficiently large x.
So my point is, if f(N) = O(N) then f(N) is also O(N^2),
since f(N) <= C*N^2.
This is definitely a good starting point. If you wanted to expand it, I would recommend diving into the relationship between the real time an algorithm takes to execute and the order of the function, and into why constants and lower-order terms are unimportant in order notation (this is best shown with graphs, which you've already introduced).
One thing I would stay away from is the talk about "orders of magnitude", because the order of a function and an order of magnitude are very different topics, and it could cause a reader to draw bad conclusions like "one algorithm takes 10 seconds to run, one takes 100, so they must have different big-O notation". I understand why the analogy is made (you're estimating things from a high level), but I think it could cause confusion about the fundamentals.
That said, I got ~500 gumballs in the machine ((container diameter / gumball diameter)^3 * .64) so both guesses of 100 and 1000 should be within an order of magnitude. ;)
One of the nice things about big-O notation from a mathematical point of view is that it allows you to avoid dealing with limits.
As everyone knows, calculus is based on limits, but when handled rigorously limits have a lot of subtle pitfalls that take a lot of work to deal with. For example, given a function f(x,y), if we want to take one limit after another, the result might depend on the order in which the limits are taken. In other words, taking limits is not commutative unless certain conditions are satisfied.
None other than the great Donald Knuth went so far as to claim that all of calculus could be taught without limits! Quoting from [1]:
"Students will be motivated to use O notation for two important reasons. First, it significantly simplifies calculations because it allows us to be sloppy — but in a satisfactorily controlled way. Second, it appears in the power series calculations of symbolic algebra systems like Maple and Mathematica, which today’s students will surely be using.
For more than 20 years I have dreamed of writing a calculus text entitled O Calculus, in which the subject would be taught along the lines sketched above."
This article has way too many words and not enough math. There is, in fact, nothing scary about big O notation once you dissect it, and it's a shame that so many people seem to think otherwise.
Here's the definition: if f and g are functions (let's say real-valued functions defined on the positive reals), then we say that f is big O of g, written f = O(g), if there exists a real number y and a real number K, K > 0, such that
f(x) <= K * g(x)
for every x > y.
If that makes sense, then done.
Else:
The first thing to do when meeting any mathematical definition that you don't understand is to throw away parts of the definition until you do understand it, then add them back in one by one. In this case, let's forget about the constant.
New definition: For functions f, g, f is blorb of g, denoted f = blorb(g), if there is a y such that
f(x) <= g(x)
for every x > y.
"f is blorb of g" actually just means that there comes a point after which g is never smaller than f. This gives us the first ingredient of big O: we are concerned only with asymptotic behavior. f could take on immense values for small x and still be O(g) as long as f eventually becomes always smaller than g.
The reason for caring about asymptotic behavior is that we often don't care about the time complexity of an algorithm for very small problem sizes. Even the traveling salesman problem is solvable on a Raspberry Pi for very small problem sizes.
Okay, I hope we understand the above definition. Now we add the constant back into the fold and see if we can make sense of it. From what I can see, the constant is there for computer scientists who want to paint with broader strokes. There can be a huge practical difference between f1(n) = 2n and f2(n) = 2000n (the difference between a computation taking a day and taking 3 years), but they're both O(n) because complexity theorists are more concerned with O(n^2) versus O(2^n) than they are with O(n) versus O(5n). (Also could be because in practice algorithms with wildly varying constant factors out in front are rarely seen?)
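If it helps, the definition is easy to poke at numerically. A small sketch: take f(x) = 2x^2 + 100 and g(x) = x^2; then K = 3 and y = 10 are witnesses that f = O(g), since 2x^2 + 100 <= 3x^2 whenever x^2 >= 100.

    def f(x):
        return 2 * x**2 + 100

    def g(x):
        return x**2

    K, y = 3, 10  # the constant and the threshold from the definition above
    assert all(f(x) <= K * g(x) for x in range(y + 1, 100_000))
    print("checked: f(x) <= K*g(x) for every integer x with y < x < 100000")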
For an alternative to big O notation, you should check out Sedgewick and Wayne's Algorithms, 4th ed. They use something they call "tilde notation" which preserves the leading constant factor. (See: http://introcs.cs.princeton.edu/java/41analysis/)
> Also could be because in practice algorithms with wildly varying constant factors out in front are rarely seen?
The main reason is that you want a result that does not depend on small implementation details, i.e. one that is consistent across programming languages and CPU architectures.
Things as simple as larger cache size or a slightly better hashing function in a dict can increase the running speed of a program by a constant factor.
First of all, O(g(n)) is a set. It is the set of functions f(n) for which there exist positive constants n0 and C such that C*g(n) > f(n) whenever n > n0.
Second, talking about O(g(n)) does not imply that the time complexity being discussed is the worst-case (or any other case) time complexity. One could, for example, say that algorithm A's best-case time complexity is in O(n), and its worst-case time complexity is in O(n^2).
I think the difficult parts for someone with no math background aren't so much the ones he outlines here, which have fairly obvious causes that follow directly from the code, but the various logarithmic complexities, which require a bit more reasoning. Certainly that's what always tripped me up before I put some effort into understanding it.
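For what it's worth, binary search is the usual way into the logarithms. A minimal sketch: the halving of the remaining range on every iteration is exactly where the log n comes from.

    def binary_search(sorted_list, target):
        lo, hi = 0, len(sorted_list) - 1
        steps = 0
        while lo <= hi:
            steps += 1                    # each step halves the remaining range,
            mid = (lo + hi) // 2          # so steps <= log2(n) + 1  ->  O(log n)
            if sorted_list[mid] == target:
                return mid, steps
            elif sorted_list[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1, steps

    # 1,000,000 elements, but at most ~20 steps to find (or rule out) any value.
    print(binary_search(list(range(1_000_000)), 765_432))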
I'm a newbie, and I enjoy articles like this as a starting point to frame the overall concept.
Like most simplifications, it's not entirely correct.
Which is why I like the comments section of HN - where I can learn more about the concept by observing how a wide variety of people explain it (and in the process correct the errors of the original article).