
Accidentally Quadratic - dmit
http://accidentallyquadratic.tumblr.com
======
kentonv
Ruby (or maybe specifically Gems) startup is still quadratic in another -- IMO
worse -- way: every require() that misses the cache searches for the requested
file in _every_ dependency package until found. It seems that instead of
having the file name itself include the precise package name, Ruby treats the
packages as a search path that unions a bunch of separate trees into a single
logical file tree.

The result is that Ruby does O(n^2) stat() calls at startup, where n is the
number of dependency packages. Most of these operations are for files that
don't exist, which means they'll miss the filesystem cache and perform
absolutely atrociously on top of FUSE, NFS, etc. A typical Ruby app sitting on
a loopback FUSE device (i.e. one just mirroring the local filesystem, no
network involved) will take _minutes_ to start up, even with aggressive
caching enabled. :(
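Conceptually the lookup behaves something like this sketch (hypothetical C++,
not Ruby's actual implementation; the names are made up):

        // One stat() per load-path entry until the file is found. With n
        // gems on the path, each cache-missing require() costs up to n
        // stat()s, so requiring files across all n gems is O(n^2) stat()s.
        #include <string>
        #include <vector>
        #include <sys/stat.h>

        bool resolve(const std::vector<std::string>& load_path,
                     const std::string& name, std::string& found) {
            for (const std::string& dir : load_path) {    // n entries
                std::string candidate = dir + "/" + name + ".rb";
                struct stat st;
                if (stat(candidate.c_str(), &st) == 0) {  // one syscall
                    found = candidate;
                    return true;
                }
            }
            return false;  // n stat()s wasted on files that don't exist
        }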

~~~
mnarayan01
Well, more like O(m * n) calls, where m << n, but that is a fair point.

~~~
nirvdrum
In this case, the constant really can't be ignored since file system access is
so slow. Ruby scans for .rb and .so files (or .jar on JRuby).

Some other fun: RubyGems always appends to the $LOAD_PATH, so whenever you
require from a gem that hasn't been loaded yet, you're guaranteed to scan the
entire $LOAD_PATH on the off-chance there's a conflicting partial path. And
you can't cache the contents of the $LOAD_PATH entries unless you have a
filesystem watcher, because the contents of the path can change and Ruby
allows that.

------
pcwalton
My favorite one of these in JavaScript is building an array by unshifting onto
it:

    
    
        var array = [];
        for (...) {
            array.unshift(whatever);
        }
    

jQuery (at least at one point) did this all over the place, with disastrous
results. IIRC, V8 in fact has special optimizations that allow arrays to grow
backwards if there's room in the heap, just because this pattern is so common
in JavaScript :(

(Solving this in your code is usually trivial, by the way: just replace
unshift with push and use Array.reverse at the end.)
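For concreteness, the same fix sketched in C++ (a hypothetical example, not
jQuery's code):

        #include <algorithm>
        #include <vector>

        std::vector<int> build(int n) {
            std::vector<int> array;
            for (int i = 0; i < n; i++)
                array.push_back(i);   // amortized O(1) per append
            std::reverse(array.begin(), array.end());  // one O(n) pass
            return array;             // O(n) total instead of O(n^2)
        }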

~~~
TazeTSchnitzel
Of course, it depends on the language. If you're using a language with linked
lists, appending is the _wrong_ way to do it. :)
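For example, prepending to a singly linked list is O(1), while appending
without a tail pointer means an O(n) walk every time (a C++ sketch using
std::forward_list):

        #include <forward_list>

        std::forward_list<int> build(int n) {
            std::forward_list<int> list;
            for (int i = 0; i < n; i++)
                list.push_front(i);  // O(1) prepend; appending here would
                                     // require an O(n) walk to the tail
            return list;
        }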

On a different note: it's nice to see this here, it validates my choice to
.reverse() and .push() when using an array as a queue today.

~~~
wfunction
Linked lists are only useful in very few situations and this is not one of
them.

~~~
Niksko
I've been terrified of linked structures since I saw a video (I think it was
here) from a C++ conference talking about how horrible their performance is.

------
mikeash
A fun one in C is:

    
    
        for(int i = 0; i < strlen(s); i++)
            doSomething(s[i]);
    

strlen() searches for the terminating NUL byte and is thus O(n), which makes
the loop quadratic.

What makes this especially fun is that strlen is part of the standard library
and has defined semantics. That means the compiler is free to hoist the strlen
call out of the loop if it can prove that doing so wouldn't alter the
standards-specified behavior, like if you never modify the contents of the
string. As a result, your asymptotic performance will depend on which compiler
you're using and even which optimization level.
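The usual manual fix is to hoist the length computation out of the loop
yourself (a minimal sketch; doSomething is assumed from the snippet above):

        #include <string.h>

        void doSomething(char c);   // assumed, as in the loop above

        void process(const char* s) {
            size_t len = strlen(s);           // one O(n) scan
            for (size_t i = 0; i < len; i++)
                doSomething(s[i]);            // whole loop is now O(n)
        }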

~~~
nightcracker
The effect you're describing can happen to any inlined function.

------
ggchappell
Fun stuff.

My favorite example goes a long way back. The string garbage collection in
Applesoft BASIC (Microsoft BASIC for the Apple II, introduced in 1977) had a
loop that was written backwards. The result worked correctly, but it was
accidentally quadratic.

Variables were stored low in memory. A string variable held a pointer to its
value, which was stored in high memory. When high memory filled up, GC ran on
the string values, pushing the ones that were still used up to the top of
memory. The main loop for this GC routine should have run from the top of
string space to the bottom, copying each string value that was still used to
its final location higher in memory. But, alas, it ran from the bottom of
string space to the top, so each iteration copied up _all_ used string values
that had been found so far. Thus, if there were n used string values, the GC
routine would do O(n^2) copy operations, instead of the O(n) copies that the
correctly written routine would have done.

So code like the following -- with K set to some appropriate value -- would,
at some point, pause for a _long_ time indeed.

    
    
        10 DIM A$(K): REM STRING ARRAY; K SHOULD BE A LARGE-ISH NUMBER
        20 FOR I = 0 TO K
        30 A$(I) = "X": REM EVENTUALLY UNUSED VALUE; GC WILL DELETE
        40 A$(I) = "Y": REM USED VALUE
        50 NEXT I
    

By "a long time" I mean rather more than an hour (maybe -- my memory grows
dim).

------
HeavenFox
Kind of tangent, but this is the reason I find a formal CS degree important,
despite the "hacker school movement" that tells you otherwise. Sure, you may
never need the stuff you learned in Algorithms or Computer Architecture, but
having gone through these classes you sort of develop a knack for recognizing
this kind of bad code: you instantly know the time/space complexity of your
code, and what the best-known complexity might be.

Sure, premature optimization is the root of all evil, but there's a huge
difference between not knowing there's a faster solution and actively choosing
not to use the faster solution because it's harder to read/harder to
maintain/more bug-prone/etc.

~~~
eck
This was a good one:

[http://blog.willbenton.com/2008/11/rent-a-coder-hilarity/](http://blog.willbenton.com/2008/11/rent-a-coder-hilarity/)

~~~
octatoan
Post it, will you? :)

------
AgentIcarus
So. Having just messed up an interview at a large tech company by having
trouble deriving an O(n) solution on the whiteboard, anyone got any good tips
on how to take algorithms / data structures to the next level and look good at
this sort of thing?

~~~
taeric
It seems the holy grail is almost always linear, especially in interviews. The
main thing to look for is whether you find yourself redoing work you have
already done.

Above that, just know the general idea behind different data structures.
HashTables and binary trees will probably have you covered. More data
structures can't hurt, though. Tries, BTrees, etc.

Though I can't think of a single time I have used some of the more advanced
items. Linear searches with sentinel values are my personal favorite
optimization that I will likely never directly code.

~~~
AgentIcarus
That's a really good point, thanks taeric.

Anything less than O(n) obviously means not needing to look at every element
of the input, i.e. it's already sorted or similar. For most other interview
problems that seems like a reasonable lower bound in the absence of more
detailed analysis. I guess the recruiter's advice to go practice on TopCoder
wasn't just copy-paste.

------
jws
How timely. I just ran into one today using the duktape† JavaScript engine and
the hogan‡ mustache template expansion code. If you build a ~4MB output, and
deep in your library's bowels this runs…

    
    
        output += tiny_string;
    

… a million or so times in JavaScript (meaning a new string is allocated and
copied every time), then you end up with the impression that you have somehow
written an infinite loop, when it's really just a quadratic function.

Switching the hogan buffer appending code to…

    
    
        chunks.push(tiny_string);
    

… and ending with a …

    
    
        output = chunks.join('');
    

… gets back down into the milliseconds range instead of the "so long I have no
idea if it would ever complete; left it running while I developed a workaround
and it didn't finish" range.


† [http://duktape.org/index.html](http://duktape.org/index.html)

‡ [http://twitter.github.io/hogan.js/](http://twitter.github.io/hogan.js/)

~~~
smilekzs
No.

[http://stackoverflow.com/questions/7299010/why-is-string-con...](http://stackoverflow.com/questions/7299010/why-is-string-concatenation-faster-than-array-join)

~~~
jws
Yes.

I'm not using the V8 engine. Tiny embedded engine, no superpowers other than
ES5 correctness, small size, and reasonable performance.

Of course, V8 and its multimillion-dollar rivals probably only have this
superpower because programmers kept writing quadratic code.

~~~
pcwalton
> Of course, V8 and its multimillion-dollar rivals probably only have this
> superpower because programmers kept writing quadratic code.

…and because SunSpider does it.

[https://github.com/WebKit/webkit/blob/master/PerformanceTest...](https://github.com/WebKit/webkit/blob/master/PerformanceTests/SunSpider/tests/sunspider-1.0.2/string-base64.js)

------
detrino
From C++: calling vector<T>::reserve in a loop can lead to quadratic behavior,
since reserve may allocate exactly the capacity requested, defeating
push_back's geometric growth, so each iteration can reallocate and copy the
whole vector. For example:

    
    
        std::vector<int> v;
        while (...) {
          v.reserve(v.size() + 3);
          v.push_back(0);
          v.push_back(1);
          v.push_back(2);
        }
    

Rust has both reserve and reserve_exact to overcome this gotcha.
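A sketch of the fix, assuming the total count is knowable up front: reserve
once before the loop (or not at all, since push_back already grows the buffer
geometrically):

        #include <vector>

        std::vector<int> fill(int iterations) {
            std::vector<int> v;
            v.reserve(iterations * 3);  // one allocation up front...
            for (int i = 0; i < iterations; i++) {
                v.push_back(0);         // ...so push_back never reallocates
                v.push_back(1);
                v.push_back(2);
            }
            return v;
        }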

------
lmm
My "favourite" example of this was a previous company that used TeamCity for
CI builds. At the time (no idea if it's still true) TeamCity knew when a maven
project depended on another project, directly or transitively - but it didn't
distinguish between the two cases.

So if you committed a change to the low-level "core" library, it would rebuild
every project - and then every project except for the two that depended
directly on core. And then every project but those three, and so on. It took
days.

------
chadaustin
With a naive mark-and-sweep GC and a policy that runs the GC every N
allocations minus frees, it's very easy to write an N^2 algorithm:

    
    
      r = []
      for i in range(1000000):
        r.append({})
    

IIRC, Python has (or used to have) the above property.

~~~
xxxyy
Python uses reference counting to save on GC time. This is one of the reasons
why it has a GIL.

~~~
stormbrew
There's nothing unique to refcounting garbage collection that implies you need
or should use a GIL. Most naive tracing collectors also have one so they can
stop the world and trace every thread's stack at the same time. In both cases
there are workarounds that reduce throughput as a tradeoff for decreasing
latency (and the two mechanisms start to converge into similar algorithms,
incidentally).

~~~
foobar2020
There is: you either have to use atomic operations on all the reference
counters, which is costly, and not available on all platforms, or you can have
only one thread active at a time. This is the most classic problem: "x == 0";
parallel execution of "x += 1" and "x += 1"; now "x == 1" or "x == 2".
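A minimal sketch of that race in C++ (the plain counter is formally a data
race, i.e. undefined behavior; std::atomic is the costly-but-correct version):

        #include <atomic>
        #include <thread>

        int plain_count = 0;               // unsynchronized increments race
        std::atomic<int> atomic_count{0};  // each ++ is an interlocked op

        int main() {
            auto bump = [] {
                for (int i = 0; i < 1000000; i++) {
                    plain_count++;   // updates can be lost (data race)
                    atomic_count++;  // never loses an update
                }
            };
            std::thread t1(bump), t2(bump);
            t1.join();
            t2.join();
            // plain_count often ends up < 2000000; atomic_count == 2000000
        }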

I don't see why tracing collectors need GIL in particular to stop all threads,
i.e. why should only one thread be running at a time.

~~~
stormbrew
I'm not sure why you seem to feel your first paragraph here is a disagreement
with what I said. As you point out, there is a well known way to have
refcounting act correctly in the face of multiple threads. There are other
ways as well involving per-thread shadow counts and such, but that is the most
basic way to (as I said) trade throughput for latency.

Re. your second paragraph, it is again not inherent to the algorithm, but it
does greatly simplify things. In particular, pretty much all tracing
algorithms require at least some kind of stop the world event, though some can
make this event very limited in duration.

It is much much easier (and results in greater throughput) to stop the world
if only one thread is running at a time, because you just do your collection
when that one thread triggers a heap mutation.

If you have multiple threads running at once you need to coordinate when you
stop the world, which means waiting for the threads to all wind up in a state
where they can be stopped (which gets a lot harder if, say, one of the threads
never allocates because it's in a tight loop). If it takes too long to do this
coordination, your heap might explode on you.

Again, what I'm saying is that a GIL is not an inherent feature of either
garbage collection mechanism. This is really pretty obvious when you consider
that there's an abundance of implementations of all combinations of (RC,
tracing) x (GIL, no GIL). Python's and Ruby's main implementations both chose
a GIL for simplicity's sake (and an at-the-time safe assumption that
single-core performance was more important than multi-core), but most C++
refcounting code uses interlocked counters, and the JVM has obviously never
used a GIL.

------
willvarfar
I recall a story about one of these in Haskell's Cabal, but I can't find it on
google now.

~~~
teraflop
I think you're thinking of this bugfix:
[https://github.com/nominolo/HTTP/commit/b9bd0a08fa09c6403f91...](https://github.com/nominolo/HTTP/commit/b9bd0a08fa09c6403f91422e3b23f08d339612eb)

HN discussion:
[https://news.ycombinator.com/item?id=6912474](https://news.ycombinator.com/item?id=6912474)

------
eck
I once found this bug, where a destructor was taking forever:

    
    
        hash_set<char*> hash;
        // <fill up/use hash>
        while(!hash.empty()) {
          free(*hash.begin());
          hash.erase(hash.begin());
        }
    

Hash tables are not meant to be used like that.
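The fix is a single pass over the table followed by one clear (sketched here
with std::unordered_set rather than the old SGI hash_set):

        #include <cstdlib>
        #include <unordered_set>

        void destroy(std::unordered_set<char*>& hash) {
            for (char* p : hash)  // one O(n) traversal, no repeated begin()
                std::free(p);
            hash.clear();         // one O(n) erase of everything
        }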

~~~
xyzzyz
If this was quadratic, then the hash table was not implemented correctly.
Erase in a hash table should be amortized O(1).

~~~
eck
It's not erase(), it's begin(). It's going to do a linear scan of the hash
table from the beginning to find the first non-empty cell.

~~~
xyzzyz
Its expected cost will likely still be linear - since values are distributed
among cells uniformly, and the ratio of empty cells to all cells is bounded
from above (otherwise rebucketing occurs), the expected position of the first
nonempty cell is also close to the beginning. Calculating it is a good
exercise; helpful Google keywords are "order statistics of discrete uniform
distribution".

~~~
alexbecker
No, because you're erasing the cells in order. So the first nonempty cell will
get progressively further from the beginning.

------
shiggerino
What exactly am I looking at here?

~~~
johnloeber
Discussions of systems-level applications that turn out to operate in
quadratic (O(n^2)) time. It's a poke at algorithmic inefficiency.

~~~
shiggerino
Ah, should have read past the second paragraph of the first post, my bad.

It's an interesting read, but at first glance it doesn't look all that
different from a lot of other programming blogs out there. A brief
one-sentence description of the blog's scope under the title would be nice.

~~~
jrochkind1
I believe 'Accidentally quadratic.' is a brief one-sentence description of the
blog's scope.

~~~
shiggerino
Lots of things have names related to topics in computer science, like Y
Combinator or Stack Overflow. Neither of those names represents the respective
company's entire scope. Y Combinator has the courtesy toward the uninitiated
to actually write "Y Combinator created a new model for funding early stage
startups." in the first place you lay eyes on when you visit the page.

------
jheriko
I seem to remember that the (gotcha-laden, and bizarre) link-order requirement
for gcc reveals that it has a problem like this, where it looks for symbols in
order, although I can't be sure without looking at the code (and I can't be
bothered).

The obvious gotcha-free solution is linear but requires two passes (I'm not
sure that's a bad thing)...

------
Dewie
Nice concept to go along with things like Accidentally Turing Complete.

[http://beza1e1.tuxen.de/articles/accidentally_turing_complet...](http://beza1e1.tuxen.de/articles/accidentally_turing_complete.html)

~~~
cdr
Interesting that it stops at Pokemon Yellow - maybe that's all that was known
at the time. It seems like any video game where you can corrupt memory is
potentially "Turing complete" - which covers many of the old console games, as
the speedrunning community has discovered.

