

Ask HN: When was the last time you used big-o in a web app? - methodin

Go into detail about the last time you've found it necessary to analyze a particular algorithm for either speed or memory using the typical tools taught by CS programs in your web application. Was it useful? Did it work according to plan?
======
perlgeek
Whenever I write a loop, I keep the big-O in the back of my mind. Whenever I
linearly scan a data structure for a value, I ask myself whether I couldn't use a hash
lookup instead. When loops are nested, my alarm bells start to ring, and I
estimate the size of each list I loop over, and try to think of ways to reduce
the number of iterations.
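
For example, the kind of substitution I mean, as a minimal sketch in Python (the
data and names are made up):

    users = [{"email": "a@example.com"}, {"email": "b@example.com"}]  # sample data
    email = "b@example.com"

    # O(n) per lookup: linear scan through the list
    found = next((u for u in users if u["email"] == email), None)

    # O(1) average per lookup: build a hash index once, reuse it for every lookup
    by_email = {u["email"]: u for u in users}
    found = by_email.get(email)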

So I always track my O()s, even if it doesn't boil down to a rigorous
analysis.

~~~
methodin
I imagine the intuitive, unconscious use of big-O is actually more beneficial
as you can immediately identify problem areas without having to bust out paper
for complex analysis.

------
jhuckestein
You don't "use" big-O, it's just a way of writing down different classes of
(mathematical) functions; hence, "big-O-notation".

When writing software, the engineer should at all times know a) the complexity
of the code they are trying to write, b) the memory complexity of the program,
and, in many cases (slightly off-topic), c) the approximate times it takes to
do certain simple tasks (memory access, simple disk/network IO, database
queries, etc.).

This is useful because it protects you from writing programs that "perform"
unexpectedly badly (I'm confused why anyone would expect them to "perform"
without thinking about it).

It may sound like a lot of work, but thankfully most things you program are
really constructed of simpler things that have well-known complexities (simple
loops, sorting, hashing, stuff like that).

At our company (pretty straight-forward social web-app), we have surprisingly
many pieces of code where the trivial implementation wouldn't work. We have a
graph of different users and relations and want to maintain a transitive
closure for each type of edge while also maintaining a bunch of indexes.

Even on the client side we have simple things, such as an autocomplete box for
usernames. With many users, we can't just keep an array of all existing users
and iterate through it on every keystroke. At that point you have so many
options to choose from (e.g. keep an index on the server and request the
filtered list after a short timeout, or build a simple index (a search tree) on
the client and use that for searching), and you couldn't make a decision
without knowing about runtime complexity, constant factors and memory
usage.
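
A rough sketch of the client-side index option, in Python for brevity (the real
thing would be JavaScript, and the data is made up):

    import bisect

    usernames = sorted(["albert", "alice", "bob", "carol"])  # built once, O(n log n)

    def complete(prefix, limit=10):
        # O(log n) to find the first candidate, then walk matches in order
        i = bisect.bisect_left(usernames, prefix)
        matches = []
        while i < len(usernames) and usernames[i].startswith(prefix) and len(matches) < limit:
            matches.append(usernames[i])
            i += 1
        return matches

    print(complete("al"))   # ['albert', 'alice']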

------
ColinWright
I have no idea what tools are taught in CS programs, but three of my recent
web apps used big-O analysis extensively.

I can't tell you what they are, because they are all projects that might soon
be opened, and while they're not exactly in stealth mode, they are good ideas
that I can only work on slowly. I need to get enough of an advantage to be
able to stay ahead of those with more web implementation experience.

However, one involves analysing the cross-connections between hundreds of
users, and hundreds of small units of data they each have. The naive algorithm
is O(n^3 m^2) and is too slow for more than 30 users or 100 units of data. The
analysis showed that early, so the architecture of the system was changed to
cause the data to "flow" around the system and aggregate closer to where it was
needed.

Another application uses proximity detection on potentially thousands of
points. Again, the naive algorithm worked for small numbers of users (I use
the term loosely - they're more "agents" than users), and again, big-O
analysis showed the critical points in the system.
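
For illustration only (this is not code from either project), the usual way to
avoid the naive O(n^2) pairwise check is to bucket points into a grid first; a
sketch in Python:

    from collections import defaultdict

    def near_pairs(points, radius):
        # Bucket points into grid cells of size `radius`, then compare each point
        # only against points in its own and neighbouring cells, not all n points.
        grid = defaultdict(list)
        for p in points:
            grid[(p[0] // radius, p[1] // radius)].append(p)
        pairs = []
        for (cx, cy), bucket in grid.items():
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for q in grid.get((cx + dx, cy + dy), []):
                        for p in bucket:
                            if p < q and (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2:
                                pairs.append((p, q))
        return pairs

    print(near_pairs([(0, 0), (1, 1), (10, 10)], 2))   # [((0, 0), (1, 1))]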

I'm sure that most web development consists of gluing libraries together,
stuffing databases, and populating forms or similar. Interesting stuff that I
do uses algorithmic analyses.

~~~
ww520
The first problem sounds like the join operation in RDBMS. It might be useful
to look into the various join algorithms, as there's extensive research in
that area.

~~~
ColinWright
You're right, it sounds like that. But it wasn't. I can't go into more detail,
but I and my colleagues have some experience in identifying things that are
best simply handed off to a "proper" database. This wasn't one of them.

------
billybob
I actually learned about Big O from a friend with a computer science
background (I don't have one) when I was working on some jQuery show/hide code
a few years ago. The idea was: you have a bunch of products on a page, and as
you check and uncheck features you want, only the products matching your
criteria are shown.

Originally, my logic was something like "for each checkbox that's checked or
unchecked.... for each product that may or may not have that feature..." The
nested for loop was fine with a small number of checkboxes and products, but
as I got more products and more checkboxes, it began to slow down. My friend
pointed out that it was Big O of N squared and explained what that meant. It
was eye-opening.

I rewrote it to first compile a list of checked features (one pass through the
checkboxes) and then show/hide the products (one pass through the products),
which scales linearly.
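
Roughly what the rewrite looked like, sketched here in Python rather than the
original jQuery (the data is made up):

    products = [
        {"name": "Widget", "features": {"red", "large"}},
        {"name": "Gadget", "features": {"blue", "large"}},
    ]
    checkboxes = {"red": True, "blue": False, "large": True}

    # One pass over the checkboxes to collect the checked features...
    wanted = {f for f, checked in checkboxes.items() if checked}

    # ...then one pass over the products to decide visibility:
    # roughly O(n + m) instead of the nested O(n * m) version.
    for p in products:
        p["visible"] = wanted <= p["features"]

    print([p["name"] for p in products if p["visible"]])   # ['Widget']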

I can't say that I think about Big O every time I write an algorithm, but I do
have a sense that nested loops are dangerous.

~~~
Shenglong
How many checkboxes were there, that you actually noticed the delay from an N^2 loop?

~~~
ww520
O(N^2) can cause slowdown very fast. 1000^2 is a million operations. 10000^2
is 100 million operations. If each operation is 100 instructions, that's about
10 billion instructions, which would cause noticeable pauses even on a modern
multi-GHz CPU.

------
Me1000
I had a major hand in writing the CPTableView class in Cappuccino. The
tableview is designed to handle hundreds of thousands of rows without slowing
down. This is accomplished both through very aggressive caching, and the smart
use of algorithms. If you look at the source code, you can actually see where
we document the time complexity.

A standard tableview (static row heights) is laid out in constant time,
whereas one with variable row heights is laid out in logarithmic time.

An early implementation provided variable row heights in linear time (or
perhaps n^2, I don't recall)... this caused scrolling to be absolutely
terrible and unusable if you had more than a couple dozen rows.

With constant time you can, theoretically, have any number of rows in your
tableview and it will remain just as performant as if you only had one.

If you're building real software and throw out the concept of time complexity,
then you're setting yourself up for failure in the long run.

~~~
Me1000
Quick follow up to clear up a couple questions that came up:

We only lay out visible rows, so that layout is done in linear time in the
number of visible rows, but calculating the position of those rows uses the
time complexities I described above.

For variable row heights we cache a few values for every single row up front;
that is also linear.

The time complexity I'm describing in my original post refers to calculating
the position of those views as you scroll, draw, and lay out (not the initial
setup)... which is what is most expensive.
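
A sketch of the difference, in Python rather than the actual Objective-J (the
heights are made up):

    import bisect

    ROW_HEIGHT = 24                      # static case
    heights = [18, 40, 24, 32, 24]       # variable case

    # prefix sums of the variable heights, cached once up front (linear)
    offsets = [0]
    for h in heights:
        offsets.append(offsets[-1] + h)

    def row_at_static(y):
        # static row heights: O(1)
        return y // ROW_HEIGHT

    def row_at_variable(y):
        # variable row heights: binary search the cached offsets, O(log n)
        return bisect.bisect_right(offsets, y) - 1

    print(row_at_static(100), row_at_variable(100))   # 4 3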

------
PaulHoule
Web apps stick to O(N) (iterate through a list) and O(N log N) (sort, some
operations on indices).

If something is O(N^2) or worse, it "doesn't scale" and you won't usually see
it done in a web app. That is, there's a whole shadow universe of web apps
that you've never seen because the resource demands are too high.

~~~
bunchesofdonald
I disagree as well. Certainly most code that runs when you actually view a
page in a web app is O(N), but there are many web apps that do backend
computing that is much more complicated. Take Google and Netflix as examples.

~~~
PaulHoule
You'd better believe that Google and Netflix have scientists who work hard
every day trying to "bend the curve" towards O(N) in all of those
applications.

------
derwiki
I built a recommendation engine: <http://github.com/causes/suggestomatic>

Because of the scale of data I'm using, I'm acutely aware of big-O. I guess
what's most interesting is that gcc is able to take some of those "slow"
operations and make them super fast (if it can fit them in a register) and
other times I get the full O(n) effect. Valgrind has been a very useful tool,
and can empirically show you where your program is going to blow up (if you
doubt your big-O analysis).

------
th0ma5
In all of my work with Google App Engine, I thought about complexity in loops
all of the time. If the problems were way more mathematical or even more
complex then sure I'd have to break out a textbook or something, but mostly
this was the first-day algo class sort of stuff about growth over time, how
deep does the tree-recursion go, etc. Mostly I just thought about this stuff
in my head without even specifically conceptualizing it as a big-O thing, but
the principle is the same, just way simpler, more off-the-cuff, and in the
middle of coding.

------
mindcrime
_Go into detail about the last time you've found it necessary to analyze a
particular algorithm for either speed or memory using the typical tools taught
by CS programs in your web application._

I agree with the folks who mentioned doing a sort of naive, intuitive "big O"
analysis as you work, as opposed to a formal analysis. I was recently working
on some social graph stuff, and I started down the path of doing an "all pairs
shortest paths" determination with Floyd-Warshall (which is O(|V|^3)
complexity).. needless to say, it only took a few minutes to realize that this
won't work for any real world size datasets and would require a different
approach. If nothing else, you quickly find that the naive approach doesn't
work for any meaningful amount of data, as just building an adjacency matrix
for a few hundred entries, if the datatype is an int (which is what I was
using at the time) will blow past the heap size limit on a 32bit JVM.

I'm actually still researching and considering exactly what I'm going to do. I
don't necessarily _need_ every shortest path in the system, so the first cut
at optimizing this will probably be to cook up something that only calculates
a portion of the paths. Another approach might be to look into a parallelized
algorithm that can do this on a cluster. Beyond that, I've been poking around
trying to see if there's a way to apply a sort of incremental evaluation
approach to the problem, where I build the initial matrix of path distances,
and then incrementally recompute just the part I care about, when something
changes.
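
For instance, a sketch of the "portion of the paths" idea (not what I've
settled on): run a per-source BFS over an adjacency list, only for the sources
I actually care about, instead of building the full |V| x |V| matrix:

    from collections import deque

    def distances_from(source, adjacency):
        # Single-source shortest paths (unweighted) by BFS: O(V + E) per source,
        # and no V x V matrix in memory.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        return dist

    graph = {"a": ["b"], "b": ["c"], "c": []}
    print(distances_from("a", graph))   # {'a': 0, 'b': 1, 'c': 2}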

Anyway, at the moment, this is the largest extent to which I've actually had
to spend some time thinking about the kind of stuff that I studied in some of
my CS classes... I've actually got all my algorithms books off the shelf and am
digging in as I research this. :-)

------
rledge21
All the time. I think about this stuff for every new feature that needs to be
implemented server-side. Our system is very read-heavy (posts need to be
circulated into different feeds for users), and all the MySQL joins were
starting to slow things down. We ended up using Redis sets to store post_ids
for specific feeds, which sped things up considerably.
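
Roughly the shape of it, as a simplified sketch with redis-py (the key names
and the fan-out logic are made up; it assumes a local Redis server):

    import redis

    r = redis.Redis()

    # When a post is circulated, add its id to each target feed's set
    r.sadd("feed:alice", 101, 102)
    r.sadd("feed:bob", 102, 103)

    # Reading a feed is then a single set fetch instead of a join
    post_ids = r.smembers("feed:alice")

    # And things like "posts in both feeds" become cheap set intersections
    shared = r.sinter("feed:alice", "feed:bob")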

------
hamidpalo
While I never sit down and write out a formal proof, I always think about it
and always check the implementations of standard library functions that work
on collections before using them.

The real value of Big-O isn't knowing how to prove that red-black trees do
amortized O(1) rebalancing per insert, but rather understanding the tools used
and the code written better.

------
roel_v
What is special about a web app in this context? When programming,
shouldn't/doesn't one always estimate the complexity of algorithms or select
data structures based on the anticipated usage patterns?

~~~
methodin
A majority of web apps are forms, data storage and display. I was more
interested to see what sort of problems people do solve with Big-O, considering
it's not something you typically have to use when working with the majority of
web apps and simple push/pull sites.

------
snikolic
I can't remember the last time I did a formal analysis, but I naturally think
about this stuff whenever I work with sets/lists, loops, recursion, string
manipulations, or non-trivial standard library implementations... which is
basically whenever I sit at a keyboard.

------
ohashi
Last time I used it (I can still see it on my whiteboard) was when designing a
spam filtering process. I wanted to get rid of the most spam as
cost-efficiently as possible. So that meant figuring out the cost of each
method (Big-O) and then figuring out approximately how much spam each step
would catch. Thus, I could run the most expensive processes last, on the spam
which still hadn't been caught.
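
Something like this, sketched in Python (the filter names, costs and catch
rates are made up):

    # (name, cost per message, fraction of remaining spam it catches)
    filters = [
        ("dns_blacklist", 0.1, 0.60),
        ("keyword_rules", 0.5, 0.25),
        ("bayesian",      2.0, 0.80),
        ("remote_api",    9.0, 0.90),
    ]

    # Run the cheapest checks first so most spam never reaches the expensive
    # ones; only the survivors pay the higher cost.
    pipeline = sorted(filters, key=lambda f: f[1])

    remaining, total_cost = 1.0, 0.0
    for name, cost, catch_rate in pipeline:
        total_cost += cost * remaining        # only remaining mail is checked
        remaining *= 1 - catch_rate
    print("expected cost per message: %.2f, spam left: %.3f" % (total_cost, remaining))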

------
ww520
I haven't had the chance to deal with big-O issues in frontend web apps, where
most operations are short and data sets are small. For backend processing,
looping and large data processing are always considered with efficiency in
mind.

The most memorable big O analysis that actually led to fixing a performance
bug was when I wrote the caching layer of a file system using bubble sort for
sorting the dirty blocks before flushing. It slowed down noticeably, like
seconds of pause with 100% CPU, when the cache size exceeded 10K blocks.
Replacing it with heap sort made the problem go away. O(n log n) really is
much more scalable than O(n^2).
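
Not the original C, but the same idea sketched in Python:

    import heapq

    # dirty blocks keyed by block number; the flush wants them in order
    dirty = [42, 7, 19, 3, 88]

    # Bubble sort is O(n^2): with 10K+ blocks that's tens of millions of
    # comparisons on every flush.  A heap gives the same ordering in O(n log n).
    heapq.heapify(dirty)                                  # O(n)
    flush_order = [heapq.heappop(dirty) for _ in range(len(dirty))]

    print(flush_order)   # [3, 7, 19, 42, 88]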

------
nicholaides
If you solve problems that deal with a lot of data, or make a lot of
calculations, you'll probably care about big O.

Last time I really had to consider big O in a web app was 2009. Of course, it
had nothing to do with the fact that it was a web app and not desktop or
otherwise. It had everything to do with the problem I was trying to solve.

------
baddox
It depends what you mean. If you mean using asymptotic analysis on a "new"
algorithm to figure out its running time (master theorem, recurrence
relations, etc.), then probably never.

If you mean using your knowledge of computational complexity to inform your
choice of algorithms and data structures, then all the freaking time.

------
sushrutbidwai
As recently as yesterday. We use algorithms like the knapsack problem,
shortest path, minimum spanning tree, and a lot of predictive-analysis
algorithms. I have used three levels of analysis: memory complexity, CPU
complexity, and database-call complexity, including cache hit/miss.

------
tlrobinson
The last time I did a pen-and-paper "big-O" analysis was probably in college
or a job interview, but that's not really the point. It's much more important
to have an intuition to guide you.

------
ramses0
Last week, I found an n^2 (actually n*m) loop in some graph drawing code.
Reworked it to be 2n (actually n+m). Everything went better than expected.

------
wulczer
The last time I had to rewrite an SQL query because it was too slow (5 days
ago) I definitely used the knowledge taught to me in my CS course.

