
O(n^2), again, now in Windows Management Instrumentation - harikb
https://randomascii.wordpress.com/2019/12/08/on2-again-now-in-wmi/
======
harikb
A gem

> Dawson’s first law of computing: O(n^2) is the sweet spot of badly scaling
> algorithms: fast enough to make it into production, but slow enough to make
> things fall down once it gets there

~~~
rcthompson
I think the other reason O(n^2) is a "sweet spot" is that it often arises from
one O(n) algorithm calling another O(n) algorithm in each iteration, resulting
in O(n^2) overall. Very often it's ultimately because the inner O(n) algorithm
_should_ have been implemented as O(1), but nobody bothered because it was
never intended to be called in a loop.
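
A sketch of the shape this usually takes (Python; the dedupe function is a
hypothetical example): the inner membership test is itself a linear scan, so
the loop goes quadratic, even though a set would make each check O(1):

    def dedupe_quadratic(items):
        seen = []
        out = []
        for x in items:
            if x not in seen:  # O(n) list scan inside an O(n) loop -> O(n^2)
                seen.append(x)
                out.append(x)
        return out

    def dedupe_linear(items):
        seen = set()
        out = []
        for x in items:
            if x not in seen:  # O(1) average-case set lookup -> O(n) overall
                seen.add(x)
                out.append(x)
        return out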

~~~
crankylinuxuser
This is absolutely the better analysis. Many times, especially in agile-world,
you get away with doing things as fast and as cheaply as is reasonable. That
usually means O(n), which is perfectly acceptable for a single call... But when
an O(n) calls another O(n) is where you run into trouble.

But at the beginning, nobody was planning for that call to be fast - the
implicit requirements led it that way.

In some ways, being able to identify and detect resource usage is what is nice
about waterfall. Identification of critical API calls and respective timings
would be integral to continue building the GUI elements. But we all know how
waterfall is poo-pooed these days.

~~~
pjc50
> Identification of critical API calls and respective timings would be
> integral to continue building the GUI elements. But we all know how
> waterfall is poo-pooed these days.

If you've worked out which API calls are happening and how often, you've
already written most of the application. It's just that it might be on paper
or in pseudocode.

Waterfall was abandoned because getting to that level of detail takes far too
long for management to accept and it's easier - sometimes orders of magnitude
faster - to write the thing as actual code, try it, and then see what needs
improving. And see what was wrong with the requirements.

~~~
retrovm
That presupposes that you can start with a big pile of bad APIs and somehow
incrementally approach something worth having. That's not consistent with my
experience. In my experience the innermost APIs, the ones that get written
down first, are effectively cast in stone and dictate the quality of the whole
product, forever. The first draft of a system is the one that should have all
its APIs worked out with a pencil before any code is written. You get these
accidental complexity explosions precisely because someone thought it was easy
and obvious to write a function returning std::vector<string> (or whatever)
and didn't consider from the caller's point of view that making the function
take an output iterator might be less complex.
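
The same tension, sketched in Python rather than C++ (names are made up):
returning a fully built list decides for every caller that the results must be
materialized in memory, while a generator leaves that choice to the caller.

    def read_records_eager(path):
        # The "returns std::vector" shape: every caller pays for the full list.
        with open(path) as f:
            return [line.rstrip("\n") for line in f]

    def read_records_lazy(path):
        # The "takes an output iterator" shape: the caller decides what to build.
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")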

~~~
jacobolus
You are right. It is typically more effective to write one implementation as a
hacky prototype, play with it a bit, throw it away, and then rewrite a
production implementation from scratch, so that the mistakes of a design made
by someone inexperienced don’t get baked in forever.

Unfortunately there are cultural/psychological factors which often preclude or
discourage this method, even if it would save time and produce better code in
the medium term than either exhaustive planning up front uninformed by
practice OR just iterating the initial broken version to continually meet new
requirements.

~~~
BlueTemplar
Lol, ninjaed, see above.

------
ww520
Ah, this brings back the memory of my O(n^2) fiasco. I wrote a low level file
system storage driver in the past. In the caching layer, there's a need to
sort by LRU time for cache eviction. It's a minor thing not run often and I
wanted to move quickly. Also since it's low level kernel mode code, I wanted
the code to be simple and correct. The number of cache entries was not big.
Bubble sort was adequate for small N, to get the job done quickly and simple
enough to ensure correctness. Plus I could replace it later if needed. So it's
done and forgotten.

Things were working fine and performance was good. Then one day Windows
Explorer suddenly hung with 100% CPU for a couple of seconds. This was one of
the worst kinds of bugs. There's no crash to pinpoint the problem. Things still
work most of the time, just slowed down intermittently. Luckily I was able to
catch a slowdown and deliberately crash the process in time. The call trace
stopped in the bubble sort function. I immediately kicked myself - it's the
classic case of O(n^2) blowup. The cache entries had scaled up to a couple
thousand items and the exponential O(n^2) blowup to tens of millions of
iterations was having a real impact. I switched to merge sort and performance
was back to normal.

Edit: I picked merge sort because its worst case is O(n log n), unlike quick
sort, whose worst case is O(n^2). Once burnt, I needed to be extra careful with
edge cases.
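
Back-of-the-envelope, with hypothetical entry counts:

    import math

    for n in (100, 2000, 5000):
        print(f"n={n}: bubble ~{n * n:,} ops, merge ~{int(n * math.log2(n)):,} ops")

At a hundred entries bubble sort is harmless; at a few thousand it's tens of
millions of operations per eviction pass.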

~~~
vecter
n^2 is polynomial, not exponential. If it was truly exponential (which a sort
should never be unless you've decided to solve an arbitrary 3-SAT problem
while sorting a list), then it would've blown up in your face much sooner.

~~~
chx
> which a sort should never be

There's certainly a most awesome sort algorithm which is exponential...

[https://www.dangermouse.net/esoteric/bogobogosort.html](https://www.dangermouse.net/esoteric/bogobogosort.html)

The author isn't even sure what the complexity is, but it's something like
O(n!^(n-k)) or O(n*(n!)^n).

~~~
vecter
I'm aware of bogosort and its crazy variants, but let's not be ridiculous
here. You could make any "sorting" algorithm arbitrarily bad if you'd like by
doing such silly but useless things.

~~~
samatman
I propose Boltzmann sort: wait around until the heat death of the universe,
and a Boltzmann brain will emerge from the cosmos, and sort your values while
contemplating infinity.

~~~
thewakalix
Ah, a fully general O(1) algorithm. Amazing.

------
rwmj
I have my own O(n^2) blow-up story. Back in the day I wrote some Google
AdWords analysis software in a functional language. Being written in a
functional language meant that using association lists (linked lists for
key/value) was very natural. Anyway, these were only used to load the analysis
inputs, which were written in an Excel spreadsheet (the customer insisted on
this!), exported to a CSV, and loaded into the software. The spreadsheet was
only perhaps 5 rows so, whatever.

One day I got a call from the customer that his analysis was taking 6 hours to
run, for a program that should have finished in a fraction of a second. It
turned out the customer had tried to load a 30,000 row spreadsheet of input.
The program would on every loop iterate over the input association list,
resulting in classic O(n^2) performance overall.

After changing all the places to use a hash table it again ran in sub-second,
although the code was nowhere near as elegant.
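
For the curious, the difference in miniature (Python standing in for the
functional language; assoc_get is a hypothetical helper):

    def assoc_get(alist, key):
        # Association list: walk the (key, value) pairs until a match -- O(n).
        for k, v in alist:
            if k == key:
                return v
        raise KeyError(key)

    # One O(n) lookup per input row over n rows is O(n^2);
    # a hash table makes each lookup O(1) on average:
    # table = dict(alist); table[key]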

~~~
masklinn
Why would the code be "nowhere near as elegant"? Lack of a functional /
persistent map?

Because I'd expect all maps to provide roughly similar interfaces whether
they're assoc lists, hashmaps, btrees, HAMT, …: iterate all entries, presence
of a key, get value for key, insert (key, value), remove key (and value),
possibly some other niceties on top (e.g. update value in-place, merge maps,
…) but those are extras.

~~~
cogman10
The one benefit of n^2 is that it is really easy to do without any additional
memory or data-structure requirements.

To make an n^2 algorithm n (or n log n), you pretty much always need to add in
some constant-time or logarithmic data structure. That requires tracking that
structure, usually generating a good key, etc.

I'm not saying that's really all that extreme. It just means your previous
5-line algorithm will often "bloat" to 10 or more lines with a new concept to
track. This is where I can see some saying "not elegant".

~~~
masklinn
That still makes no sense. _The GP is literally just talking about switching
from an association list to a hashmap_:

> Being written in a functional language meant that using association lists
> (linked lists for key/value) was very natural. […] After changing all the
> places to use a hash table it again ran in sub-second, although the code was
> nowhere near as elegant.

You're not tracking more things (just tracking a hashmap instead of a list),
you're not generating anything different, you don't have any new concept to
track.

All I'm asking is where the apparently significant loss of elegance would come
from.

~~~
monoideism
> apparently significant loss of elegance would come from.

In a word, you're losing a persistent data structure[1]. And this can go
beyond just loss of elegance. Sometimes, functional programs will depend on
immutability of data structures for performance optimizations and even for
functionality (like keeping a version history).

1. [https://en.wikipedia.org/wiki/Persistent_data_structure](https://en.wikipedia.org/wiki/Persistent_data_structure)
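
A toy illustration of the property (Python tuples standing in for a real
persistent map): adding to a cons-style association list leaves every older
version intact and shareable, while mutating a dict destroys the old version.

    # Each node is ((key, value), rest_of_list).
    v1 = (("a", 1), (("b", 2), None))
    v2 = (("c", 3), v1)   # v1 is still valid and unchanged: free version history

    # The mutable hash map has no such property:
    d = {"a": 1, "b": 2}
    d["c"] = 3            # the previous version of d is simply gone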

~~~
jimbo1qaz
When was this story from? From my limited knowledge, I think persistent
hashmaps have been available since Clojure 1.0 (2009) or before? (hash-map,
[https://clojure.github.io/clojure/clojure.core-
api.html#cloj...](https://clojure.github.io/clojure/clojure.core-
api.html#clojure.core/hash-map)).

~~~
monoideism
He was working in OCaml, which even now isn't fortunate enough to have a
persistent O(1) hashmap in the standard library like Clojure (and Scala, I
think) does.

It's true that there are a few 3rd-party implementations that I've noticed
before, but I'm not sure how robust and well-tested they are. I'd hesitate to
use them, or something I wrote, in production without careful review.

In any case, persistent HAMTs are a relatively new phenomenon that are just
starting to catch on, and this story is described as being "back in the day".
It's unlikely this option was available to him even if he were willing to
write an implementation himself or use an untested 3rd-party implementation.

Maybe one will be added eventually to one of the OCaml standard libraries,
which would be great.

------
jedberg
The thing that impresses me most about this writeup is how much
instrumentation there is on Windows now. I stopped using Windows about 15
years ago, but I don't think this kind of analysis was possible back then.

Or maybe I just didn't know the tools well enough?

~~~
AaronFriel
15 years ago? You might have just missed it. Event Tracing for Windows gained
a lot of functionality in Windows Vista, which some[1] have described as the
most forward-looking and instrumental release in Windows history. Vista had a
lot of flaws, but it introduced almost everything fundamental to Windows 7
through Windows 10, save perhaps the more recent developments in
virtualization and containerization: from differential application-consistent
full backups, to transparent full-disk encryption, to the use of a TPM to
verify boot integrity.

Vista was a massive release, but also much maligned, so maybe you didn't miss
much leaving when you did. The tooling has certainly gotten a lot better since
then, and so has Windows.

Rather more recently, Windows has gained DTrace support:
[https://techcommunity.microsoft.com/t5/Windows-Kernel-
Intern...](https://techcommunity.microsoft.com/t5/Windows-Kernel-
Internals/DTrace-on-Windows/ba-p/362902)

[1] - See this self-described eulogy by @SwiftOnSecurity:
[https://twitter.com/SwiftOnSecurity/status/85185740489147187...](https://twitter.com/SwiftOnSecurity/status/851857404891471872)

~~~
sedatk
Vista's importance in Windows history can't be overstated. It's regarded as a
failure in the end, but it provided a great baseline for the releases that
followed it. Windows 10 has a much better networking stack, update mechanism,
device driver model, GUI, etc., all thanks to Vista.

------
hnick
At work we had a daily script that took about 30 minutes to run, and about 3
hours on the weekend due to more data. One day we got a report that a
downstream service was missing its SLA because the script wasn't finishing.
I'd come onto the team late, so I wasn't aware this was abnormal.

It turns out it was now taking about 1-2 hours daily and 6-12 hours on the
weekend depending on data size. This had been going on for months but
gradually getting worse as the data grew, to the point it was unbearable so
finally reported.

A senior programmer had removed the shell call to sort on an indexed text file
and written their own ad-hoc sorter using every fresh programmer's favourite
(you guessed it): bubble sort. To make things worse, this was Perl, which has
a perfectly functional built-in sort if you really have to do it that way. I
still have no idea why this was done; I don't think asking would have been
productive in that place at that time.

------
peter_d_sherman
Excerpt: "I decided that the interactions were too complex to be worth
analyzing in detail, especially without any thread names to give clues about
what the 25 different threads in svchost.exe (3024) were doing."

Future OS/Compiler Programming Note: It would be nice if threads could be
individually named, with those names shown (and the threads slowed, stopped,
and debugged) in whatever tool implements Task Manager-like functionality...

~~~
retrovm
POSIX threads can be individually named. They inherit the name of their
creator if you don't set one.

~~~
adzm
They can be named in Windows as well, and this author has made pleas for
developers to do so.

~~~
jandrese
The caveat being the developers who are careful enough to name their threads
and implement good debug hooks aren't the ones you get called on to fix their
code.

------
gameswithgo
In the world of database-backed web apps, a related pattern is asking the
database for some rows from a table, and then asking the database for some
more rows based on each row you just got, rather than joining it all together
into one query.

Makes it into production but falls down eventually!
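
The shape of the anti-pattern, sketched with Python's sqlite3 and hypothetical
users/orders tables:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    """)

    # N+1: one query for the parent rows, then one more query per row.
    for uid, name in conn.execute("SELECT id, name FROM users").fetchall():
        conn.execute("SELECT total FROM orders WHERE user_id = ?", (uid,)).fetchall()

    # Versus a single round trip that lets the database do the join:
    rows = conn.execute("""
        SELECT u.id, u.name, o.total
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
    """).fetchall()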

~~~
toast0
I've actually had a lot of success going the other way. A database query with
a join takes down production, so do a client-side join: query one gets a bunch
of ids from table A; query two gets a bunch of rows from table B, often as
"get one row from table B" UNION "get next row from table B", etc.

You can try using IN on the second query, but usually if that was going to
work in a reasonable amount of time, your join would have also worked in a
reasonable amount of time.

The real problem people run into with the client-side join is making it query
one: get a bunch of ids from table A; queries two through N: get one row from
B, with each query requiring a round trip. Even a pretty small client-to-server
roundtrip of 1 ms gets nasty quick with repeated queries.
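
Roughly, the two-query version (Python + sqlite3, same hypothetical
users/orders schema as the sketch above; assumes the id list is non-empty):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
        INSERT INTO users VALUES (1, 'a'), (2, 'b');
    """)

    # Query one: just the ids from table A.
    ids = [row[0] for row in conn.execute("SELECT id FROM users")]

    # Query two: one batched fetch from table B (not one round trip per id)...
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT user_id, total FROM orders WHERE user_id IN ({marks})",
        ids).fetchall()

    # ...then stitch the join together client-side.
    orders_by_user = {}
    for uid, total in rows:
        orders_by_user.setdefault(uid, []).append(total)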

~~~
ht85
After going through every pattern this is also my favorite one.

While it can add some perceivable latency if you have many levels of depth, it
is usually a lot lighter on CPU and memory than one big query.

The reason I really like it is because it is very easy to strongly type
results coming from a single table, and processing the data in your
application code allows you to keep the typings through the process of
stitching everything back together.

------
rcthompson
Probably the most important thing I learned about algorithmic complexity
analysis in grad school was that N is not constant, but rather it grows over
time (probably at an exponential rate). That is, as our computing power
increases, we continue to aspire to analyze larger and larger data sets. So
anything beyond O(n*log(n)) that works right now is more or less guaranteed to
blow up in the not-so-distant future, if it remains in use that long.

------
ihaveajob
This reminds me I once wrote a hyper-exponential graph processing algorithm,
in the range of O(2^2^n), and I still believe it was the right choice because:
1) It was much easier to understand than a faster alternative, and 2) I could
mathematically guarantee that the largest value of n was something like 8 or
9, so you could argue that it was in fact a constant time operation.

~~~
vecter
Could you explain more? As the other commenter pointed out, 2^2^9 = 2^512
which is an astronomically large number (far more than the number of atoms of
ordinary matter in the observable universe). Something doesn't seem right.

~~~
mathgladiator
Why couldn't it be (2^2)^9 = 4^9 = 262144 ?

~~~
vecter
Why wouldn't he just write O(4^n) then? That'd be like saying "I counted the
cats in the room and there were 2+2 cats".

------
lousken
seems like building Chrome is a royal pain in the ass, so many bugs found in
Windows because of that :)

~~~
techntoke
It's actually pretty easy, and there are package manifests in Arch that show
you how.

~~~
ComputerGuru
That’s just because someone went through that pain for you and scripted and
packaged the entire procedure and rolled up most of the dependencies and their
build systems. I’ve done it from the official build directions without
containers or prebuilt dependencies and “royal pain in the ass” is an
understatement.

~~~
techntoke
Firefox is a nightmare to package that way.

~~~
ComputerGuru
That’s one nice side effect from the push to convert dependencies/components
to pure rust, one at a time. Even if it doesn’t go all the way, each step
hugely simplifies the build process and minimizes the list of (recursive)
dependencies.

------
JonathonW
Honestly, O(n^2) is probably okay for a diagnostic tool that's only meant to
be run occasionally-- I'd like to know why Google's IT department felt the
need to schedule it to run so frequently.

~~~
loeg
It's sort of unclear why 3 billion bytes needs to be processed in an O(N^2)
algorithm in the first place, but seems especially egregious in that it also
blocks other processes from running.

One other aspect of the issue is that it's unclear why that database is now so
large. It seemed like different machines had different sized databases —
perhaps one angle of the solution is trimming and vacuuming.

~~~
brucedawson
Well, it doesn't block all other processes, but it sure did block a lot of
them.

I agree that finding out why the repository is huge seems worthwhile. I think
it's been growing lately which means it might eventually get to an
unsustainable size. As far as I can tell Microsoft doesn't ship any tools to
make repo-size analysis easy. An open-source tool was suggested in one of the
comments on my blog.

~~~
acqq
> I agree that finding out why the repository is huge seems worthwhile. I
> think it's been growing lately which means it might eventually get to an
> unsustainable size.

Exactly. It's 1.9 GB on your machine, whereas on a plain home-use computer
it's less than 50 MB, i.e. 40 _times_ smaller. Something produces all that
data there. What is it, and what exactly is being stored?

------
aneutron
Reminds me of the time we found out a team was running O(n^2) code, but in
terms of SQL queries - that is to say, O(n^2) queries, not O(n^2) comparisons
or anything - and they were trying to blame the other team when a client sent
a request that never ended.

And they tried blaming that team for migrating the data to a new server with
an SSD - a migration that shaved 20 minutes off their processing time.

And if you're wondering, they refused to fix it because "it would need too
many sprints" and "maybe we'll talk about it in a workshop".

It's still not fixed.

~~~
72deluxe
Funny that you used the word "team" when the outcome is anything but teamwork.

------
fourseventy
I love this guy's blog. Been reading it for a while. He is a beast.

------
radicalbyte
Ten years ago the hash-map implementation in Internet Explorer was also O(n^2).

I had to inject javascript into some vendor code to avoid this after our
production environment died. I ended up replacing the underlying hash-map with
multiple smaller maps, selected by a hash of the items. So I'd have 50 maps of
1000 items instead of one map of 50000 items.
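
Something like this, in Python terms (the original was JavaScript working
around IE's hash-map; all names are made up):

    NUM_SHARDS = 50

    class ShardedMap:
        """Spread keys across many small maps so no single map grows huge."""

        def __init__(self):
            self._shards = [{} for _ in range(NUM_SHARDS)]

        def _shard(self, key):
            return self._shards[hash(key) % NUM_SHARDS]

        def get(self, key, default=None):
            return self._shard(key).get(key, default)

        def put(self, key, value):
            self._shard(key)[key] = value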

~~~
Etheryte
I'm interested to know, what practical application did you load 50.000 items
into the frontend for 10 years ago?

~~~
masklinn
10 years ago is 2009. Large web applications had been breaking out of
intranets (where they'd lived for quite some time) for years at that point.
GMail was launched in 2004, Google Maps in _2005_ (that's also the year "AJAX"
was coined), and the JS library war was done and over with (jquery, prototype,
mochikit, mootools, dojo, YUI, … were all released between 2005 and 2006). IE7
was 3 years old, Google Chrome was reaching its first year, and IE8 had just
been released.

~~~
Etheryte
You seem to be missing the crux of the issue I was trying to address — even
today, loading 50.000 items into a frontend application would be a very, very
niche edge case. 10 years ago even more so. Tacking on arbitrary reference
points from Wikipedia doesn't change any of that.

~~~
masklinn
> You seem to be missing the crux of the issue I was trying to address — even
> today, loading 50.000 items into a frontend application would be a very,
> very niche edge case.

It's a medium-sized inventory; if you need fast / offline access, then loading
it to the client makes a lot of sense. I'm sure there are plenty of other
things of which you can easily reach 50k, and that you'd want to index or
cross-reference somehow.

------
wakatime
That's why latency is always measured in p99:

[https://stackoverflow.com/questions/12808934/what-
is-p99-lat...](https://stackoverflow.com/questions/12808934/what-
is-p99-latency)
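
For example (Python, made-up numbers): the mean hides the 1% of requests that
hit a slow path; p99 does not.

    import statistics

    latencies_ms = [10] * 990 + [4000] * 10  # 1% of requests hit the O(n^2) path
    print(statistics.mean(latencies_ms))                  # ~50 ms: looks healthy
    print(statistics.quantiles(latencies_ms, n=100)[98])  # p99 ~4000 ms: the real story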

------
mark-r
Any time I hear of an O(n^2) problem I remember Shlemiel the painter.

[https://www.joelonsoftware.com/2001/12/11/back-to-
basics/](https://www.joelonsoftware.com/2001/12/11/back-to-basics/)
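
The code version of Shlemiel, in Python terms (CPython happens to patch over
this exact case sometimes, but the pattern is the trap): each concatenation
re-copies everything written so far, just like Shlemiel walking back to the
paint bucket.

    pieces = ["x"] * 100_000

    # Shlemiel: every + can copy the whole prefix -- O(n^2) in general.
    s = ""
    for piece in pieces:
        s = s + piece

    # Keep the bucket with you: one O(n) pass.
    s = "".join(pieces)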

------
codetrotter
I have over 100,000 images on my iPhone, and some apps used to be really
really really slow when I selected "browse album" in them.

One of the apps that used to be maddeningly slow was Instagram, but Instagram
has gotten better lately. No idea why though.

------
equalunique
(not) discussed previously:
[https://news.ycombinator.com/item?id=21740589](https://news.ycombinator.com/item?id=21740589)

------
tinus_hn
Then again, your disk space, processing power, and memory are cheap for
Microsoft.

------
zackmorris
If you're building a library or framework, then just assume that worst-case
performance will be encountered by your users, because you can't predict their
use cases.

So for example, I almost always use associative arrays (maps) instead of
lists. I actually really wish there were a MAPP language, because I view the
map as a potentially better abstraction than the list in LISP, but I digress.

I also tend to use atomic operations instead of locks. Some good starting
points for that are understanding how compare-and-swap (CAS) works, and how
functional programming with higher-order functions and immutable variables
works, because that mindset greatly simplifies threading: no shared mutable
state, as in the Actor model. Lazy evaluation is another good one. Also,
vector languages like GNU Octave and MATLAB are good because they favor a
level of abstraction above the bare-hands programming of C-style languages
like C++ and Javascript, so you tend to see that most algorithms are
embarrassingly parallel at some level (especially the things we tend to think
of as computationally expensive, like multimedia processing).

Also (this may be controversial), but I think that poor performance can be
politically motivated. For example, I run Safari with Javascript disabled so I
can have thousands of tabs open (it's disabled as I write this). But when I
disable Javascript in Chrome, performance grinds to a halt. You can try it
right now on the Mac by force quitting Chrome with a bunch of tabs open and
relaunching it from Terminal.app with:

    
    
      open -a "Google Chrome" --args --disable-javascript
    

Or manually with:

[https://www.computerhope.com/issues/ch000891.htm](https://www.computerhope.com/issues/ch000891.htm)

I don't know what causes it, but my guess is that Google either wrote some of
the loops under the assumption that Javascript would always be on, or they had
a blind spot in their implementation because so much of their business model
depends on ads having dynamic behavior.

So when you're in a meeting and someone shouts down your concern about
edge-case performance because they don't see it as a priority, graciously
humor them and then write your code the right way, because you know it doesn't
take any longer than doing it the wrong way. You might catch some flak during
code review, so have a good excuse handy - something about trying it the other
way but running into problems. Often I'll write the easy imperative solution
in a comment above the simple functional solution, or even put both solutions
under a preprocessor directive or feature flag, to leave it up to the team
lead/project manager and have our keisters covered if/when something melts
down.

------
kortilla
Sounds like they’ll have to increase their leetcode requirements in the
interviewing process. This is one of the “obvious fundamentals” that leetcode
challenges screen for, right? /s

------
stevefan1999
Well, now this shows how polynomial time isn't necessarily any better than
O(2^n), i.e. exponential time, in reality. Once k grows bigger in O(n^k), our
modern machines will still struggle to run it.

~~~
loeg
O(N^2) is strictly better than O(2^N). Per the article, this algorithm fell
over at 3 billion bytes of data, but wasn't especially noticeable at 300
million bytes of data.

~~~
saagarjha
Better from a theoretical perspective. Depending on n and any constant
factors, one or the other may end up winning in reality.

~~~
loeg
In practice, not really either. Unless "N" isn't actually variable and is
basically zero.

------
vortico
Batch scripts themselves are O(n^2) because the shell closes and reopens them
after each line (even comments and blank lines), and the scan to get to line
`n` takes O(n) time and disk bandwidth.
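
If so, the cost is the classic arithmetic series: executing each of n lines
re-reads 1 + 2 + ... + n = n(n+1)/2 lines in total. For example:

    n = 100_000                  # lines in the batch file
    print(sum(range(1, n + 1)))  # 5,000,050,000 line reads -> O(n^2)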

~~~
concerned_user
That is just plain incorrect; local variables would not work in scripts if
that were the case.

~~~
vortico
I definitely remember both reading about and experiencing this years ago, but
I can't seem to find a source and it's driving me crazy (will search more
later). I imagine variables would persist in the shell's memory. I'm talking
about an implementation detail of the batch shell script parser/runner.

I recall that some people would put GOTO statements to jump across large
comment blocks in the days when reading a few KB was noticeable.

~~~
concerned_user
It can be correct if you are talking about subprocesses/subshells of another
batch script/command - then yes, a separate shell process will be spawned on
each iteration.

~~~
vortico
No, I recall reading that the batch processor closes/re-opens the file, scans
for n carriage returns where n is the current line number, executes that line,
and repeats. These are the sources I found.

[http://xset.tripod.com/tip3.htm](http://xset.tripod.com/tip3.htm)
"COMMAND.COM reads and executes batch files one line at a time; that means
that it reads one line, execute it and rereads the file from the beginning to
the next line." Perhaps this is a really old version of COMMAND.COM? Perhaps
it's poorly stated but actually meant that it "rereads the file from the
beginning _of the next line_ to the next line".

[https://docs.microsoft.com/en-
us/windows/win32/fileio/local-...](https://docs.microsoft.com/en-
us/windows/win32/fileio/local-caching) "Command processors read and execute a
batch file one line at a time. For each line, the command processor opens the
file, searches to the beginning of the line, reads as much as it needs, closes
the file, then executes the line." This also seems to agree with my original
claim.

I tested this with a batch script generated by

    
    
        print("@echo off")
        for i in range(1000):
            for j in range(1000):
                print("rem hello world")
            print(f"echo {i}")
    

and ran it using COMMAND.COM in a Windows 95 VM. It appears to run in linear
time. Either COMMAND.COM was fixed, or both sources are incorrect.

~~~
ygra
Seeking to a file offset is a constant-time operation (assuming no
fragmentation), so re-opening the file and skipping to where you left off
isn't that bad.

And you can certainly write self-modifying batch files, but only appending at
the end is safe, not changing lines before the one you're currently executing.

~~~
vortico
> re-opening the file and skipping to where you left off

That's what Windows does now. But I think in the past it wasn't that way (as
suggested by the sources). In my tests it appears that the command processor
caches the byte offsets of each line, so it's possible to GOTO any line in the
past without rescanning the whole file.

