Joeboy 410 days ago | link | parent

I sometimes wonder, what sort of people actually need to know about Big-O notation? In about 15 years of working in IT I don't recall anybody ever using it in a work context. Generally something like "this strategy will become very slow when we have a lot of users" seems adequate. Admittedly I'm only a humble web developer. Do you regularly talk in Big-O at work? What is your job?

Edit: Despite being only a humble web developer I understand that fast algorithms are better than slow ones. I'm wondering how Big O notation is used in the real world.



0x0 410 days ago | link

Even a simple PHP script that needs to work with a large data set can quickly run into problems if you don't pay attention to algorithm complexity.

Consider this simple script that first creates a dataset of strings "x0", "x2", "x4", "x6", etc. (500000 entries).

The task is to count from 0 to 1000 to see whether the string "x"+NUMBER is present. Using a linear search with array_search() is super slow, but if you switch the storage from array values to array keys, and use a hash lookup via array_key_exists(), suddenly it flies:

% time php -dmemory_limit=512M -r '$entries = array(); for($i=0;$i<1000000;$i+=2){ $entries[] = "x".$i; }; $hits = 0; for ($i=0;$i<1000;$i++) { if(array_search("x".$i,$entries)!==false) $hits++; } var_dump($hits);'
int(500)
php -dmemory_limit=512M -r   11.25s user 0.09s system 99% cpu 11.362 total

So, 11.362 seconds for a linear algorithm.

% time php -dmemory_limit=512M -r '$entries = array(); for($i=0;$i<1000000;$i+=2){ $entries["x".$i]=true; }; $hits = 0; for ($i=0;$i<1000;$i++) { if(array_key_exists("x".$i,$entries)) $hits++; } var_dump($hits);'
int(500)
php -dmemory_limit=512M -r   0.35s user 0.06s system 99% cpu 0.419 total

And 0.419 seconds for the hash one.

If you increase the workload to counting from 0 to 10000, it gets even worse in the linear case, while the hash key lookup finishes in about the same time as before!

% time php -dmemory_limit=512M -r '$entries = array(); for($i=0;$i<1000000;$i+=2){ $entries[] = "x".$i; }; $hits = 0; for ($i=0;$i<10000;$i++) { if(array_search("x".$i,$entries)!==false) $hits++; } var_dump($hits);'
int(5000)
php -dmemory_limit=512M -r   115.96s user 0.29s system 99% cpu 1:56.52 total

So, 1 minute 56.52 seconds for a linear algorithm.

% time php -dmemory_limit=512M -r '$entries = array(); for($i=0;$i<1000000;$i+=2){ $entries["x".$i]=true; }; $hits = 0; for ($i=0;$i<10000;$i++) { if(array_key_exists("x".$i,$entries)) $hits++; } var_dump($hits);'
int(5000)
php -dmemory_limit=512M -r   0.37s user 0.06s system 99% cpu 0.440 total

And 0.440 seconds for the hash one.

-----

mh- 410 days ago | link

keep in mind that since PHP must preallocate the array() to an arbitrary 'sane' size, it has to 'resize' (create a new one, copy it over, point reference, destroy old one) a large number of times here.

therefore, most of the time spent here will be malloc'ing (mmap, memcpy). the rest will be spent executing the php interpreter, compiling the script, and cleanup.

the hash search and comparisons are so much faster than those operations (in PHP, especially) that this isn't really a valid way to test your thesis.

-----

0x0 410 days ago | link

The initial array creation + startup/parse/compile/teardown should be quite constant between runs (probably most of the 0.4 seconds baseline). I think the 11sec->2min vs 0.41sec->0.44sec time increase illustrates the algorithmic complexity quite well, no?

-----

ahomescu1 410 days ago | link

> keep in mind that since PHP must preallocate the array() to an arbitrary 'sane' size, it has to 'resize' (create a new one, copy it over, point reference, destroy old one) a large number of times here.

Those allocations only happen once in a while (because at each reallocation, the size of the array doubles), so the cost is amortized. I think you only get about lg2(1000) ≈ 10 allocations in total for an array of 1000 elements.
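
A toy sketch of that doubling argument (illustrative Python, not PHP's actual allocator; the capacity-doubling rule is the assumption here):

    # append 1000 elements to an array whose capacity doubles when full
    capacity, reallocations = 1, 0
    for n in range(1, 1001):
        if n > capacity:
            capacity *= 2          # "resize": allocate new block, copy, free old
            reallocations += 1
    print(reallocations)           # 10 -- i.e. about lg2(1000)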

-----

mh- 410 days ago | link

fair point, but they're rather costly due to the amount of memcpy'ing, in my experience.

you've probably seen/know this, given that you know PHP internals :), but for others: http://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-rea...

-----

benhoyt 410 days ago | link

Yes, regularly. I'm a backend web developer who works on a fairly large-scale website, and program mostly in Python. It's pretty important to know that code like this (which we have in our photo search code):

   for image in images:
       if image.hotel_id in hotel_ids:
           ...
will be O(N) if hotel_ids is a set or dict (with O(1) lookup), but O(N²) if hotel_ids is a list (with O(N) lookup). We're not talking about micro-optimizations here -- when N is of moderate size, it can mean the difference in render time between 30 milliseconds and 3 seconds.
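
Here's a self-contained toy version of that difference (made-up data; absolute timings will vary by machine):

    import time

    images = list(range(5000))           # stand-ins for image.hotel_id values
    hotel_ids_list = list(images)
    hotel_ids_set = set(images)

    for name, container in (("list", hotel_ids_list), ("set", hotel_ids_set)):
        start = time.perf_counter()
        hits = sum(1 for image in images if image in container)
        print(name, time.perf_counter() - start)
    # "in list" scans ~N/2 elements per lookup: O(N^2) overall
    # "in set" is a hash lookup per element: O(N) overall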

When developers don't have a feel for this, it really slows them down -- both figuratively and literally.

FYI, I wrote a blog entry about how our photo search engine works here: http://tech.oyster.com/how-our-photo-search-engine-really-wo...

-----

Joeboy 410 days ago | link

> When developers don't have a feel for this, it really slows them down -- both figuratively and literally.

Of course, I'm just not clear how Big-O helps with that. To me, the required insights seem to be:

* Things are slower when the computer has to do more work

* A feel for what strategies result in the computer doing more work

I can't think of many cases when saying "This algorithm is O(N^2)" is significantly more useful than saying "This algorithm is going to be really slow".

-----

mseebach 410 days ago | link

It's like grammar: You don't sit around articulating "this is the subject, this is the object, and I need the third person singular over there" when you write - but knowing the mechanics of grammar allows you to write texts with correct grammar more or less intuitively, and it's easy to reach back and pull out a rule when you think something sounds weird.

-----

zorked 410 days ago | link

At least where I work, the definition of "slow" doesn't cross complexity boundaries. An O(n) algorithm that takes 100 seconds to run is "slow" compared to an O(n) algorithm that takes 50 seconds. But if something crosses a complexity boundary and is now O(n^2), we say that instead of calling it "slow", because it's often far more tragic than just taking a little extra time to run.

-----

kryptiskt 410 days ago | link

> I can't think of many cases when saying "This algorithm is O(N^2)" is significantly more useful than saying "This algorithm is going to be really slow".

But to know that it will be slow you have to know how it scales with the input size. Because the list example may work just fine when the website has little data (oftentimes an O(N^2)-algorithm may even be faster than an O(N)-algorithm on small data) and hit you when you scale up. And the O-notation gives you a tool to estimate how the work scales. Any alternative to the O-notation is either more complex (look up Analytic Combinatorics http://ac.cs.princeton.edu/home/) or just an informal treatment of O-notation.

-----

jstanley 410 days ago | link

But how can you know how slow it is going to be under a production workload? It's very useful to be able to time some code on a small input set, then extrapolate based on knowledge of time complexity to find out roughly how long the full input will take. If you don't know the time complexity, this is impossible.
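
A back-of-the-envelope version of that extrapolation (run_job here is a hypothetical stand-in for the real workload, and this assumes clean O(n^2) scaling with no caching or swapping surprises):

    import time

    def run_job(n):                      # hypothetical O(n^2) workload
        total = 0
        for i in range(n):
            for j in range(n):
                total += i ^ j
        return total

    small_n, prod_n = 1_000, 100_000
    start = time.perf_counter()
    run_job(small_n)
    t_small = time.perf_counter() - start
    # O(n^2) => time scales with the square of the input-size ratio
    estimate = t_small * (prod_n / small_n) ** 2
    print("estimated time for n=%d: %.0fs" % (prod_n, estimate))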

-----

Joeboy 410 days ago | link

> But how can you know how slow it is going to be under a production workload?

In a web dev context, load testing. I appreciate that that's not so easy if your program takes a week to run.

> It's very useful to be able to time some code on a small input set, then extrapolate based on knowledge of time complexity to find out roughly how long the full input will take. If you don't know the time complexity, this is impossible.

I suppose my confusion is that with code I've written, I think I generally understand the time complexity without having to use Big-O notation. I can sort of see that there are times you might want to tell a colleague something like "careful, this code I wrote is O(n^3)", but I'm wondering what kind of job entails doing that on a regular basis.

-----

textminer 410 days ago | link

I work in machine learning. I regularly get jazzed when I figure out a linear solution to a problem (e.g., using a min heap to grab the top k elements from a similarity computation in a recommendation system instead of first sorting), and will remark on it to anyone I'm working with. In more mundane settings, I'll also regularly use, say, hash tables to store data for convenient constant-time lookup instead of naively scanning through a list each time I need to find an index.
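
For the curious, the min-heap trick is a one-liner with Python's heapq (toy data, just to show the shape of it):

    import heapq, random

    scores = [random.random() for _ in range(1_000_000)]  # similarity scores

    top_k = heapq.nlargest(10, scores)   # O(n log k): one pass with a size-k min heap
    # versus the naive O(n log n) approach that sorts everything first:
    top_k_sorted = sorted(scores, reverse=True)[:10]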

I think I can safely say that asymptotic time and space concerns of a problem firmly guide much of my choices in data structure and algorithm use. Though this is maybe followed closely by aesthetic pleasure, novelty of trying something new and neat, or clarity in the functional organization/readability/maintainability of the code.

-----

Udo 410 days ago | link

I would argue that having a basic Big-O intuition is important for every developer. Regardless of the work you're doing, you want to write efficient code. And you want to know how something scales.

I'm also mainly a web developer. In my head, I'm thinking about Big-O a lot - not that I'd go and formally examine every bit of code, but keeping O behavior in mind leads to certain almost subconscious decisions that are going to be good for your software.

Being Big-O-aware makes sense especially in web development where latency minimization and stable scaling behavior are important. Keeping Big O in mind influences a lot of choices, from using certain language constructs to the way your database tables are modeled.

Thinking about Big O is part of being aware what your code actually does.

-----

Joeboy 410 days ago | link

> Thinking about Big O is part of being aware what your code actually does.

I don't get this at all. Surely Big-O is an abstract way of describing a property of your code to somebody who doesn't want to read it. If Big-O gives you a better understanding of what your code does than writing it did, that suggests you wrote it by applying some kind of genetic algorithm and you should probably go back and look at it again until you understand it.

-----

Udo 410 days ago | link

Maybe I didn't express this very clearly. I argue that it's paramount to understand what your code actually does. As simple as this sounds, in my experience a lot of developers just don't care. I'm not talking about applying some abstract CS concepts, I'm talking about getting a feeling for the amount of computational effort and memory consumption your code causes.

Once you understand your code's behaviors, you understand its Big O. It's not necessary to be explicit about it, but this is one of the rare cases where theoretical notation and engineering intuition do map very nicely. If you can say things like "my code's CPU cycles go up linearly with variable n", that's the equivalent of understanding Big O.

So I'm not advocating blindly "learning" about Big O, I'm trying to say that once you get a good sense of what your code is doing you get Big O understanding automatically.

> Surely Big-O is an abstract way of describing a property of your code to somebody who doesn't want to read it.

It can be, but that's not useful. I think of it as a micro vocabulary that comes naturally from understanding code. It's really not an abstract concept, it's a very concrete attribute a section of code has.

-----

marvin 410 days ago | link

If you're working with a dumb API/library (e.g. SharePoint) where the correct default search algorithms aren't very easily accessible, it's easy to fall into the trap of using linear-time search instead of a hashed or binary search strategy, for instance. Things will look fine until your system has been used for a while, then everything will gradually slow down to a crawl.
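
The same trap in miniature (a Python stand-in, not actual SharePoint API code -- the point is just linear scan vs. binary search over sorted data):

    import bisect

    ids = list(range(0, 2_000_000, 2))   # sorted item ids, as an index would keep them

    def linear_contains(x):              # O(n): what a naive API loop does
        return x in ids

    def binary_contains(x):              # O(log n): binary search on the sorted list
        i = bisect.bisect_left(ids, x)
        return i < len(ids) and ids[i] == x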

So yes, I consider knowledge of algorithms and run-time analysis an essential software development skill. (I ran into this after only 6 months on the job, so obviously our experiences differ. I'm a consulting SharePoint developer for small businesses.)

-----

lkozma 410 days ago | link

Big-O is just a formal notation for comparing growth rates of functions. You are right of course, that often one might get away without the precise formalism and rely only on a gut feeling - but there are limitations on how many things we can compare mentally.

A bit tongue in cheek one could make the following analogous statement: does anyone use greater-than or less-than or equals signs at work, as in something being "> 5" or "< 10"? In my experience "this quantity is much larger than that other quantity" or "these two quantities are roughly the same" is perfectly fine.

-----

dkersten 410 days ago | link

I worry about Big-O almost every day. I run an analytics startup and scaling to huge datasets is important. Sure, I could throw hardware at the problem, but even that has its problems and bottlenecks and would also inflate cost. Instead, I carefully handpick the algorithms used (where possible) to have reasonable time/space complexities for the tasks I want to perform. Not everything needs to be carefully tuned, of course, but for the bits that do, Big-O is a useful tool.

Before this, I've used Big-O when writing some hobby game engine code to make sure that my algorithms scaled nicely as number of game objects increased.

I've also used Big-O to compare algorithms in back-end web development. Usually I didn't care, but there were a few cases where it mattered (one specific case I remember was sending out notification emails based on complex criteria, where the operation I was counting for my O(n) analysis was the number of database queries).

-----

gurkendoktor 410 days ago | link

But what do you do once something becomes very slow? Creating a database index, for example, is a beautiful example of O(log n) :)
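
You can watch that happen with sqlite3 from Python (a rough sketch; exact timings will vary):

    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     ((i, "user%d@example.com" % i) for i in range(500000)))

    def lookup():
        start = time.perf_counter()
        conn.execute("SELECT id FROM users WHERE email = ?",
                     ("user499999@example.com",)).fetchone()
        return time.perf_counter() - start

    print("no index:  ", lookup())    # full table scan: O(n)
    conn.execute("CREATE INDEX idx_email ON users(email)")
    print("with index:", lookup())    # B-tree lookup: O(log n)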

-----

Joeboy 410 days ago | link

But in that case, knowing Big-O notation is a retrospective explanation of something any quarter-decent developer knows already. You don't need to know Big-O in order to know how/when to create a database index.

-----

greenyoda 410 days ago | link

It doesn't have to be retrospective. A knowledge of computational complexity can be used to make predictions about the performance of a system (even one you haven't implemented yet).

What if your manager asked you for an estimate of how long it would take to load a million new records into the database and index them? It would be nice to know whether you'd have to take the system down for an hour or for a day or for a week.

What if the performance of your code was slow and you wanted to know whether it could possibly be improved, and by how much? Knowing the theoretical limitations of the algorithm you were running (e.g., comparison-based sorting takes at least O(n log n)) could tell you what kind of optimizations you could reasonably expect to make, or whether you needed to invest in faster hardware.

-----

Joeboy 410 days ago | link

> What if your manager asked you for an estimate of how long it would take to load a million new records into the database and index them? It would be nice to know whether you'd have to take the system down for an hour or for a day or for a week.

I don't think I really buy this as a practical use case. Unless I was running on a system I understood very well (e.g. an embedded platform), I don't think I'd trust estimates calculated using Big-O to be reliable on atypical workloads. I'd pretty much expect unforeseen caches / optimizations / swapping to disk to confound my estimates somehow. If at all possible I'd test it. If testing takes too long to be practical, then there's a serious problem with the system. I can see cases where testing is impossible for good, practical reasons, just not enough of them to justify the obsession with Big-O.

-----

greenyoda 410 days ago | link

Wouldn't inserting a single record in the index be O(log n) (if n is the number of records in the index), and creating the entire index be O(n log n)?
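
(The sum works out: inserting record i costs about log2(i), so building everything costs log2(1) + log2(2) + ... + log2(n) = log2(n!) ≈ n·log2(n). A quick numeric sanity check in Python -- only the growth rate matters, not the constants:)

    import math

    n = 1_000_000
    build = sum(math.log2(i) for i in range(1, n + 1))  # sum of per-insert costs
    print(build)               # ~1.85e7
    print(n * math.log2(n))    # ~1.99e7 -- same order of magnitude: O(n log n)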

-----



