Ah, this brings back the memory of my O(n^2) fiasco. Years ago I wrote a low-level file system storage driver. In the caching layer, there was a need to sort by LRU time for cache eviction. It was a minor thing that didn't run often, and I wanted to move quickly. Also, since it was low-level kernel-mode code, I wanted the code to be simple and correct. The number of cache entries was not big. Bubble sort was adequate for small N: quick to write and simple enough to ensure correctness. Plus I could replace it later if needed. So it was done and forgotten.
Things were working fine and performance was good. Then one day Windows Explorer suddenly hung with 100% CPU for a couple of seconds. This was one of the worst kinds of bugs. There's no crash to pinpoint the problem. Things still work most of the time, just slow down intermittently. Luckily I was able to catch a slowdown and deliberately crash the process in time. The call trace stopped in the bubble sort function. I immediately kicked myself - it's the classic case of O(n^2) blowup. The cache entries had scaled up to a couple thousand items, and the exponential O(n^2) blowup to tens of millions of iterations was having a real impact. I switched to merge sort and performance was back to normal.
Edit: I picked merge sort because its worst case was O(n log n), unlike quick sort, whose worst case was O(n^2). Once burnt, I needed to be extra careful with edge cases.
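For a rough sense of the numbers involved (the entry counts below are illustrative guesses, not the actual cache sizes):

import math

# Comparison counts: bubble sort does roughly n^2/2, an n log n sort roughly n*log2(n).
for n in (100, 1_000, 5_000):
    bubble = n * (n - 1) // 2
    nlogn = int(n * math.log2(n))
    print(f"n={n:5}: ~{bubble:,} comparisons (bubble) vs ~{nlogn:,} (n log n)")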
I got the honor of filing a bug against Together's design tool. We had a project that was a monorepo with a few dozen distinct compilation units in it, and some had complained that the more modules you added to the project, the longer it would take to open. One person was up to half an hour and didn't even have the whole thing.
I had a slow week babysitting something else so I just let it chew away in the background, while doing other things like documentation and requirements work, noting down the time, adding one more module and running it again. The last one I was willing to do took 18 hours to load. It was still running when I got back to the office the following day.
To this day, I can't recall ever seeing any production performance bug with greater than n cubed complexity. This bug progressed at n to the fifth. Truly, a thing of singular beauty.
Thankfully they had a workaround that dropped it to something like n^2 (30 minutes became less than 5) and a bug fix not too long after.
n^2 is polynomial, not exponential. If it was truly exponential (which a sort should never be unless you've decided to solve an arbitrary 3-SAT problem while sorting a list), then it would've blown up in your face much sooner.
I'm aware of bogosort and its crazy variants, but let's not be ridiculous here. You could make any "sorting" algorithm arbitrarily bad if you'd like by doing such silly but useless things.
I propose Boltzmann sort: wait around until the heat death of the universe, and a Boltzmann brain will emerge from the cosmos, and sort your values while contemplating infinity.
You're correct, n^2 is polynomial, not exponential. It was just an exaggerated figure of speech. BTW, O(n) and O(n log n) are also polynomial; they just don't have the same punch to them.
Combinatorial? Seriously? The difference between exponential and combinatorial is essentially purely theoretical when it comes to computational complexity. In practice, even the difference between n and n log n is barely relevant, and exponential and combinatorial are just the exponentials thereof. Nobody ever, ever deals with processes large enough that the difference matters and the process still completes. Even a tiny scaling factor in the exponent and a moderate base mean that the point at which an unscaled combinatorial overtakes it requires so many steps that you might as well be waiting for the heat death of the universe.
But even without any scaling factors, don't underestimate the size of the numbers involved. If you had two threads, one simply counting up to some n! upper bound and the other up to 10^n, how large do you think the problem would have to be before the "slow" combinatorial one is actually slower than the exponential one, and how long would that take? Hint: at 4 GHz and 1 op/cycle... let's just say I'm not holding my breath that our species will still be around by then. And that's before adding even a tiny scaling factor to the mix...
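If anyone wants to check that crossover concretely, here's a quick sketch (the 4 GHz / 1 op-per-cycle figures come from the comment above; the rest is just illustrative Python):

from math import factorial

# Find the smallest n where n! overtakes 10^n ...
n = 1
while factorial(n) <= 10 ** n:
    n += 1
print("n! first exceeds 10^n at n =", n)   # prints 25

# ... and how long merely counting to 10^n would take at 4 GHz, 1 op per cycle.
seconds = 10 ** n / 4e9
print("counting to 10^n would take about %.0e years" % (seconds / 3.15e7))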
I mean, if you're trying to estimate computational complexity at least, this nuance seems pointless.
But quadratic vs. exponential really matters, with plausible parameters.
With n = 26, the combinatorial process takes 4 times as long: 4 x 10^26 "counts". Google's SHA-1 collision attack two years ago used roughly 10^19 computations. So we're off by a factor of 10^7.
I'm curious if the rise of quantum computers will make the differences between exponential and combinatorial meaningful in a practical sense.
I actually thought the most likely intended meaning was that an algorithm in O(a^n) must take asymptotically longer than one in O(b^n) as n goes to infinity (if a > b), which isn't true. (For example, when a > b, then every algorithm in O(b^n) is also in O(a^n), but obviously no algorithm can asymptotically require more time than itself.)
I once coded a bubble sort in an application where I expected the worst case to be n=4. Later that assumption was invalidated, but I had long ago moved on to something else. I hope the person who inherited my code doesn't still hate me.
Bubble sort is popular because it's provably optimal... under extremely restrictive circumstances: https://stackoverflow.com/a/3274203
Those circumstances stopped being semi-relevant when even the cheapest computers started using (floppy) disk drives instead of tape drives, but somehow bubble sort is still being taught.
It's probably because the implementation of the algorithm is the simplest: you don't need anything beyond the existing data structure. You don't need to merge or create new arrays. You just repeatedly loop and swap two elements when needed. Super easy by all metrics.
Bubble sort is the first one I learned, so I remember it the best. I never spent any time comparing any of the O(n^2) sorts so it never occurred to me to try a different one.
This was quite a while ago, and the library sort was QSort which had a lot of overhead due to its calling convention. Today I'd have no problem using std::sort instead.
Sure, and the wikipedia page makes mention of that. But it's not super relevant to the point I'm making about the worst-case complexity of quick sort (which is often misunderstood to be O(n^2)).
Would it be correct to say it's a quadratic complexity causing the run time to grow exponentially? Or is the word exponential still wrong here? My mathematics terminology is rather... dusty.
That is wrong, because quadratic means that the variable is in the base, while exponential means that the variable is in the exponent (just as the name says).
That makes a lot more sense! Actually rings a bell from when I ran into the same wrongly-remembered wording in uni.
I'm not sure why, but getting back to the right terminology/vocabulary just to get back into the basics of mathematics is a lot harder for me than it was 20 years ago.
If you casually want to get into it without the commitment, I've found that even passively watching the Tim Roughgarden videos from Coursera (they can probably be found crossposted to YouTube by third parties) is stimulating.
If you seek to acquire formality and rigor, I can't help you there; I figure shortcuts will be nonexistent ;)
Don't fret it. Imo people are being unnecessarily curt when trying to flex their pedantry here, which we all love to do, myself included.
Of course, poly still means there is an exponent in the time complexity.
It's really nothing beyond the main takeaway that the usual norm for describing something as having "exponential" growth is that the number of computations multiplies by a constant factor (an order of magnitude, for base 10) every time you add but a single item to the list of inputs.
Yes, radix sort is faster. But at that point I was looking for stability and simplicity over all else. Merge sort was stable and simple to implement, with decent performance.
But merge sort is not in-place (at least not in any remotely practical variant); heap sort is simple, in-place and O(n log n), although it is not a stable sort like merge sort is.
> Dawson’s first law of computing: O(n^2) is the sweet spot of badly scaling algorithms: fast enough to make it into production, but slow enough to make things fall down once it gets there
I think the other reason O(n^2) is a "sweet spot" is that it often arises from one O(n) algorithm calling another O(n) algorithm in each iteration, resulting in O(n^2) overall. Very often it's ultimately because the inner O(n) algorithm should have been implemented as O(1), but nobody bothered because it was never intended to be called in a loop.
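A minimal sketch of that pattern (all the names here are made up for illustration):

# An O(n) helper that's fine on its own...
def find_user(users, user_id):
    for u in users:                      # linear scan
        if u["id"] == user_id:
            return u

# ...becomes O(n^2) once someone calls it once per item in a loop.
def enrich_orders_slow(orders, users):
    return [{**o, "user": find_user(users, o["user_id"])} for o in orders]

# Building an index first makes the inner lookup O(1), so the whole thing is O(n).
def enrich_orders_fast(orders, users):
    by_id = {u["id"]: u for u in users}
    return [{**o, "user": by_id.get(o["user_id"])} for o in orders]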
This is absolutely the better analysis. Many times, especially in the agile world, you do things as fast and as reasonably as you can. And that usually means O(n), as it is perfectly acceptable for a single call... But when an O(n) calls another O(n) is where you run into trouble.
But at the beginning, nobody was planning for that call to be fast - implicit requirements led it that way.
In some ways, being able to identify and detect resource usage is what is nice about waterfall. Identification of critical API calls and respective timings would be integral to continue building the GUI elements. But we all know how waterfall is poo-pooed these days.
> Identification of critical API calls and respective timings would be integral to continue building the GUI elements. But we all know how waterfall is poo-pooed these days.
If you've worked out which API calls are happening and how often, you've already written most of the application. It's just that it might be on paper or in pseudocode.
Waterfall was abandoned because getting to that level of detail takes far too long for management to accept and it's easier - sometimes orders of magnitude faster - to write the thing as actual code, try it, and then see what needs improving. And see what was wrong with the requirements.
That presupposes that you can start with a big pile of bad APIs and somehow incrementally approach something worth having. That's not consistent with my experience. In my experience the innermost APIs, the ones that get written down first, are effectively cast in stone and dictate the quality of the whole product, forever. The first draft of a system is the one that should have all its APIs worked out with a pencil before any code is written. You get these accidental complexity explosions precisely because someone thought it was easy and obvious to write a function returning std::vector<string> (or whatever) and didn't consider from the caller's point of view that making the function take an output iterator might be less complex.
You are right. It is typically more effective to write one implementation as a hacky prototype, play with it a bit, throw it away, and then rewrite a production implementation from scratch, so that the mistakes of a design made by someone inexperienced don’t get baked in forever.
Unfortunately there are cultural/psychological factors which often preclude or discourage this method, even if it would save time and produce better code in the medium term than either exhaustive planning up front uninformed by practice OR just iterating the initial broken version to continually meet new requirements.
I was taught waterfall a couple of years ago. We were supposed to do our project using waterfall (but at least with weekly meetings with the "customers").
My suggestion of building a first test version to, amongst other things, at least get our feet wet with the programming language (which most of us had hardly any experience with), then throw it completely away and restart "from scratch" (but with a better idea of what we were supposed to do), was rejected.
Waterfall refers more to the fact that further work (development, testing, or deployment) has to wait for someone else to finish something else first. Multiply that by how many “steps” something has to go through before being released. This implies everyone is taking work in large chunks instead of small iterations, and that too much is being planned too early instead of learning as you go.
Taking things iteratively, small iterations, delaying decisions, learning as you go — that’s what agile development is. And all the stuff you mention above is possible when developing with agility.
I feel like the concept of an MVP is often abused. It shouldn't be a thing that somehow works with bubblegum and luck; it should still be "solid". The point ought to be to leave out features, not quality.
When you leave out features and quality, you have a proof of concept.
Those are valuable things to build, IMO. Just make sure you leave out enough features that it cannot be pushed into production once someone else sees it running.
Right, if we consider O(n) "ideal" (for many domains, this is true), we can consider processing per item as a more convenient metric. If you go above, say, O(log n) processing per item, you're in trouble proportionate to the unintended success of this portion of the code. It's very easy to write a sloppy utility with a nice API, only for it to be 'the perfect fit' for someone else's tight deadline, become an idiom in the code, and finally see a 10,000-entry input much later.
That's a good point. It explains why you can very often unintentionally end up with an O(n^2) algorithm in the first place, and Dawson's law then explains why no one bothers to fix it until it's too late.
I often see O(n^2) algorithms that can be reduced to O(2n) at the very least. One of the best things I gained from school was the red flag that fires off in my mind any time I see a loop nested in a loop.
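To make the "loop nested in a loop" red flag concrete, here's a toy example of trading the nested loop for two sequential O(n) passes (which is presumably what "O(2n)" is getting at):

from collections import Counter

# Nested loop: O(n^2) comparisons.
def has_duplicates_quadratic(items):
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

# Two sequential passes: count, then check. O(n) time, O(n) extra memory.
def has_duplicates_linear(items):
    counts = Counter(items)                      # pass 1
    return any(c > 1 for c in counts.values())   # pass 2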
A little experience gets you that same red flag instinct in a hurry, too.
[EDIT] it also makes you hesitate & second-guess and worry and experiment a bunch when contemplating using a recursive algorithm, which is deeply counterproductive in interviews where the expected behavior is so often "apply recursion, instantly and without hesitation" :-)
Unfortunately interviews seem to be all about dogma, and if you even mention performance you will get dinged for premature optimization.
It’s like a postmodern religion where premature optimization is the cardinal sin, and you are supposed to burn as many cycles as possible to demonstrate that you are of good faith
On the flipside, I've met different sorts of interviewers that ding someone for writing something intentionally unoptimized (in the spirit of avoiding premature optimization, for algorithmic clarity), because they have the rote knowledge that "x is faster". Dogma on every side of the fence. (Also, continuing proof that technical interviews will never be objective evaluations of skill.)
Yet, we understand exactly what he means. Sometimes it's useful to show an un-reduced intermediate step to facilitate understanding, even if you could reduce it further.
Eh, you're technically correct but when used as a point of relative comparison to an O(n²) algorithm I think there's some merit to it, because it actually gives some sense of constant overhead within the context.
The best kind of correct. Calling it O(n) then is the best way to express the performance improvement - I think that's the whole point of Big-O notation.
However I agree that for smaller n one should not underestimate the constant factor impact.
Actually, O(2n) did not convey this to me at all; all it did was cause confusion. If this ("turn it into two not nested loops") was indeed intended, I'd suggest to spell it out, not abuse notation
> If this ("turn it into two not nested loops") was indeed intended, I'd suggest to spell it out, not abuse notation
O(2n) is not an abuse of notation. It's just as well defined as O(n). The fact that O(2n) is a subset of O(n) is a theorem, not part of the definition of the notation.
Interesting, I hear people spelling out constants all the time when using the O notation to put emphasis where they want. I guess that's not really as universal as I thought, but I still think it's a good way to express.
It can cause more confusion though, if say you write O(kn), but the oh-notation hides a factor k^3, then it would be better to just write O(n). Or write O_k(n) to signify a hidden dependency on k. Or best of all, just write out k^4 n without any ohs at all.
What if the constant were massive? Then the coefficient might not be significant in practise. O(2n) might be worse than O(n+99999999), they both reduce to O(n) though. It is a classification.
Personally I would prefer to use different semantics to express that. It seems big O notation often carries a separate meaning in parlance.
I thought big O notation was about classifying function growth irrespective of the coefficient/constant. Does it not grow at all with respect to n? great you are O(1). Does it grow linearly? cool you are O(n). Logarithmically? Quadratically? In my mind, this kind of analysis aligns with big O notation.
If constant factors are significant in practice and the size of the input is not significant, then big O notation is not the right tool for the analysis, and that's perfectly fine. For instance, we generally don't talk about big O notation when comparing the performance of SSDs and spinning HDs.
Big O notation is for describing how the performance of an algorithm changes as the size of its input changes. If the size of the input is not a significant concern, then it's totally fine to not use big O notation for analyzing the problem.
In reality, O(n) and O(2n) can be quite different for small n. As can O(n)+k0 and O(2n)+k1. Or worse, O(n^2)+k2, where sufficiently large k0 and k1 make the quadratic system better because its k2 constant is so much smaller. Setup time matters.
Nowadays, you rarely have enough elements that the asymptotic behavior is the defining performance characteristic.
O notation is asymptotic by definition. The sentence "O(n) for small n" is completely meaningless. O(n) can never be different from O(2n) because they refer to the exact same set of algorithms.
Theoretically yes but practically no. Theoretically Big O notation is defined as a limit that goes to infinity but in practice there's no such thing as infinity and saying that an algorithm's runtime is in O(n) implies certain things about its behaviour for finite n, that's kind of the point. And it's perfectly possible for an algorithm to be defined piecewise on the input size and so saying it has runtime O(n) for small n is a statement that has practical significance.
Sometimes it was correct to choose O(n) instead of O(1). If you know you're only going to have very few elements, linear search on a vector can be faster than a hashtable lookup or rb-tree traversal, and use less memory for the data structure. Later on some clown decides to expose your private function for their own convenience and starts calling it with thousands of elements.
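If in doubt, it's cheap to measure where the crossover actually sits for your data rather than assume; a throwaway benchmark along these lines (sizes are arbitrary) settles it:

import timeit

for n in (4, 16, 64, 256):
    vec = list(range(n))
    table = set(vec)
    needle = n - 1                        # worst case for the linear scan
    linear = timeit.timeit(lambda: needle in vec, number=100_000)
    hashed = timeit.timeit(lambda: needle in table, number=100_000)
    print(f"n={n:4}: linear {linear:.4f}s  hash {hashed:.4f}s")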
You're not wrong, but IMHO the proper place for that sort of optimization is inside the dictionary class/module/whatever. It knows why the optimization is there and the conditions under which it makes sense.
For application code it's probably better to express clearly what you're trying to do, in this case keyed lookup. Throwing in a linear search at that level isn't just dangerous, it's potentially confusing to people reading the code later.
Depends. Sometimes the key isn't easily hashable or comparable. For example, I know of a few cases where the key looks something like "if (a > b and c < d) or (a < b and d == b)".
Now, that doesn't mean you can't hash a, b, c, and d. It just means that the logic around doing the lookups is nontrivial.
When you are talking about < 100 elements, sometimes the consideration is "Welp, this is < 100, so lets just n^2 it".
I'm confused what the variables are in this example—what's already in the container vs. what is being looked up. If we conceptually don't even have a proper mapping to begin with, then it doesn't make sense to ask whether it's hashable or comparable or not in the first place. If we do—then what is the "key" in this example?
Let me preface this by saying I work in finance and rules are weird.
This comes up frequently when looking for fuzzy "duplicates".
For example, sometimes you get data from 2 sources, one will send it through with 0.01 precision. Another will send it through with 0.001 precision. You can't hash or compare that.
Things get more tricky when different financial institutions use different terms for the same entity.
That doesn't mean you can't avoid n^2 duplicate searches. It just means you have to be smart about how you group things.
Right, so you have an application where you don't genuinely have a dictionary -- e.g., inserting items in a different order can result in a very different outcome even when all the keys are distinct, or even worse, inserting two keys and then removing a third can give entirely different results depending on the order in which you inserted the first two. (Or other arbitrary things depending on your application.) That can of course come up; it's just been quite confusing for me to see a fuzzy search scenario like this presented as a counterexample when (at least to me) the discussion seemed to be about proper mappings.
A useful heuristic from my CS professor: always treat every tab (indentation / complexity) as another cart you're hitching to your horse. Add enough carts (complexity) and you will soon be asking yourself why your horse isn't a nuclear reactor.
Programming languages have native support for O(n) algorithms in the form of for loops. You can compose as many of those as you want to get a huge polynomial, but you have to go out of your way and do something a bit weird to get an exponential run time.
I don't quite understand your code, but if I'm guessing correctly that n is an integer and "for i in n" means in human terms "for all non-negative integers less than n",
that is not exponential, that is actually factorial, which is superexponential. Uh, or close to factorial, I'm not sure exactly. Testing it in python right now, incrementing a counter each time whoops is called, I'm seeing a relationship that looks like
whoops(n) == n * whoops(n-1) + 1
That is interesting.
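The original snippet isn't visible in this thread, so as a guess at its shape, something like this reproduces that recurrence:

calls = 0

def whoops(n):
    # Calls itself n times with n - 1, so the total call count satisfies
    # calls(n) = n * calls(n - 1) + 1, which grows roughly like e * n!.
    global calls
    calls += 1
    if n <= 0:
        return
    for i in range(n):
        whoops(n - 1)

for n in range(1, 8):
    calls = 0
    whoops(n)
    print(n, calls)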
I don't think that's a real insight. You could equally say that O(n^3) should be a sweet spot because you can nest a loop 3 deep, and so on for n^4, n^5 and so on. But we don't see these cases making it out into the wild because they're caught in testing.
Or if you've already made allowances for the memory usage, you can also check to see if you can write your O(n^2) nested loops as O(n) + O(n) depending on the shape of your data.
I have my own O(n^2) blow-up story. Back in the day I wrote some Google AdWords analysis software in a functional language. Being written in a functional language meant that using association lists (linked lists for key/value) was very natural. Anyway these were only used to load the analysis inputs which was written in an Excel spreadsheet (the customer insisted on this!), exported to a CSV and loaded into the software. The spreadsheet was only perhaps 5 rows so, whatever.
One day I got a call from the customer that his analysis was taking 6 hours to run, for a program that should have finished in a fraction of a second. It turned out the customer had tried to load a 30,000 row spreadsheet of input. The program would on every loop iterate over the input association list, resulting in classic O(n^2) performance overall.
After changing all the places to use a hash table it again ran in sub-second, although the code was nowhere near as elegant.
Why would the code be "nowhere near as elegant"? Lack of a functional / persistent map?
Because I'd expect all maps to provide roughly similar interfaces whether they're assoc lists, hashmaps, btrees, HAMT, …: iterate all entries, presence of a key, get value for key, insert (key, value), remove key (and value), possibly some other niceties on top (e.g. update value in-place, merge maps, …) but those are extras.
To answer this I had to remember how to use CVS again, because the code was from 2004... It turns out that (for reasons I don't recall now) I modified the code to limit what could appear in the spreadsheet and use an alternate algorithm only when there were more than 500 lines in the spreadsheet. The final code is below. Unfortunately I lost the actual change, but you can work it out:
let max_map_regexp = 500 (* lines - see below *)

(* Format of the map_file. *)
type map_file_t =
  | Map of ((Pcre.regexp * Pcre.regexp *
             Pcre.regexp * Pcre.regexp) *
            (float option * bool * Inp.params)) list
  | Hash of (string * string * string * string,
             float option * bool * Inp.params) Hashtbl.t

let run map_filename =
  (* Parse the map file. *)
  let csv = Csv.load map_filename in
  if Csv.lines csv < 2 then
    failwith (map_filename ^
              ": bid file should contain headings and at least one row of data");
  let csv' = List.tl csv in (* Ignore headings. *)
  ...
  (* If the map file is > max_map_regexp lines long then regular
   * expressions are banned and a more efficient hash table format
   * is used for lookups.  Otherwise the program takes far too long
   * to run.
   *)
  let map_file =
    if List.length map_file <= max_map_regexp then (
      Map (List.map (
        fun ((c_name, ag_name, kw_text, kw_type), a) ->
          (compile c_name, compile ag_name, compile kw_text,
           compile kw_type), a
      ) map_file)
    ) else (
      (* Regexps banned from large map files. *)
      Hash (
        let h = Hashtbl.create (List.length map_file) in
        List.iter (
          fun ((c_name, ag_name, kw_text, kw_type), a) ->
            Hashtbl.add h (c_name, ag_name, kw_text,
                           String.lowercase kw_type) a
        ) map_file;
        h
      )
    ) in
The one benefit of n^2 is it is really easy to do without any additional memory or data structure requirements.
To make an n^2 algorithm n (or n log n), you pretty much always need to add in some constant time or logarithmic data structure. That requires tracking that structure, usually generating a good key, etc.
I'm not saying that's really all that extreme. It just means your previous 5 line algorithm will often "bloat" to 10 or more lines with a new concept to track. This is where I see some saying "not elegant".
That still makes no sense. The GP is literally just talking about switching from an association list to a hashmap:
> Being written in a functional language meant that using association lists (linked lists for key/value) was very natural. […] After changing all the places to use a hash table it again ran in sub-second, although the code was nowhere near as elegant.
You're not tracking more things (just tracking a hashmap instead of a list), you're not generating anything different, you don't have any new concept to track.
All I'm asking is where the apparently significant loss of elegance would come from.
> apparently significant loss of elegance would come from.
In a word, you're losing a persistent data structure[1]. And this can go beyond just loss of elegance. Sometimes, functional programs will depend on immutability of data structures for performance optimizations and even for functionality (like keeping a version history).
He was working in OCaml, which even now isn't fortunate to have a persistent O(1) hashmap like Clojure (and Scala, I think) does in the standard library.
It's true that there are a few 3rd party implementations that I've noticed before, but not sure how robust and well-tested these are. I'd hesitate to use them or something I wrote in production without careful review.
In any case, persistent HAMTs are a relatively new phenomenon that are just starting to catch on, and this story is described as being "back in the day". It's unlikely this option was available to him even if he were willing to write an implementation himself or use an untested 3rd-party implementation.
Maybe one will be added eventually to one of the OCaml standard libraries, which would be great.
I am not OP but I definitely have been in similar circumstances where an algorithm became slightly more complex when adding a hash map to cover performance issues with a list.
A trivial example, I recall about 1 month ago calling it out on a Javascript code review where someone was doing a `someList.find(x => x === "something")` inside a loop creating O(n^2) complexity. Rather than change the entire codebase wherever `someList` is used into a new type it is often easier just to suggest we build a Set (linear on length of `someList`) before the loop and use that for lookups within the loop (constant).
Of course, I am talking about circumstances clearly outside of your original model of OPs description. I am tracking more things (the new Set). However, I realize that to give the full context as to why the type of `someList` from my trivial example _couldn't_ be changed easily and why creating a new Set was the most prudent option would require a comment even longer than this essay. So I give OP the benefit of the doubt that he was in a similar situation where using the hashmap everywhere was either difficult or impossible.
Not OP, but I'm guessing it's because linked lists are recursive and hashtables are not, so in a functional programming language you can do lots of clever things in a recursive way.
I don't understand how or why. What's the "parsing stage"? You have an associative array, a key, a value, and a function to associate the k, v pair to the array. You're moving from
let m' = (k, v) : m
to
let m' = insert k v m
What there would make the code "nowhere near as elegant"?
It's a function of the language. If you're working a Lisp, association lists are very much the "default" for mappings, and the style of code built around them is very clear as a result (in most lisp-like languages, at least). For example, there's no special constructor for an alist: you just create a list, which can be done using the lisp syntax itself. Whereas for something like a hash table, you likely have to call a constructor or something. It's all just a consequence of lisp's syntax and structure being largely built around list manipulation.
In contrast, languages like Perl or Python have literals and syntax support for both lists and mapping-type data structures (hashes in Perl, dicts in Python), so using either one is roughly equally elegant, and you are more free to choose one or the other based on performance concerns.
But well, in Haskell association lists are also the default, as they don't impose any strictness and are very fast to iterate. So that is what you get from libraries.
If you need a map, you will have to first construct it, then run your code. But of course, if you are the one creating the lists, that changes very little.
The thing that impresses me most about this writeup is how much instrumentation there is on Windows now. I stopped using Windows about 15 years ago, but I don't think this kind of analysis was possible back then.
Or maybe I just didn't know the tools well enough?
15 years ago? You might have just missed it. Event Tracing for Windows gained a lot of functionality in Windows Vista, which some[1] have described as the most forward-looking and instrumental release in Windows history. Vista had a lot of flaws, but it introduced almost everything fundamental to Windows 7 to Windows 10, save perhaps the more recent developments in virtualization and containerization. From differential application-consistent full backups, to transparent full disk encryption, to the use of a TPM to verify boot integrity.
Vista was a massive release, but also much maligned so maybe you didn't miss much leaving when you did. The tooling has certainly gotten a lot better since then, and so has Windows.
Vista's importance in Windows history can't be overstated. It's regarded as a failure in the end but it provided a great baseline for the releases following it. Windows 10 has a much better networking stack, update mechanism, device driver model, GUI etc, all thanks to Vista.
Same here. WMI and perf counters have been around a long time though.
I recently used Win10 for work and was amazed at the combination of awe-inspiring tech and plain awe-full complexity. Just the control panel goes four layers deep attempting to make things easier, but in practice it's several times harder to find options than it was in Win2k.
If there were a distribution with all the corporate goals stripped out I'd jump on it.
At work we had a daily script that took about 30 minutes to run, and about 3 hours on the weekend due to more data. One day we got a report that a downstream service was missing SLA because the script wasn't finishing. I'd come onto the team late so wasn't aware this was abnormal.
It turns out it was now taking about 1-2 hours daily and 6-12 hours on the weekend depending on data size. This had been going on for months but gradually getting worse as the data grew, to the point it was unbearable so finally reported.
A senior programmer had removed the shell call to sort on an indexed text file and written their own ad-hoc sorter using every fresh programmer's favourite: (you guessed it) bubble sort. To make things worse, this was Perl, which has a perfectly functional sort built in if you really have to do it that way. I still have no idea why this was done; I don't think asking would have been productive in that place at that time.
Excerpt: "I decided that the interactions were too complex to be worth analyzing in detail, especially without any thread names to give clues about what the 25 different threads in svchost.exe (3024) were doing."
Future OS/Compiler Programming Note: It would be nice if threads could be individually named and those thread names shown/slowed/stopped/debugged in whatever tool implements Task Manager like functionality...
Windows had a janky way of naming threads before 10, but it has a supported API now.
The problem is that svchost is a host that runs a thread pool with various services being invoked as needed. You'd have to rename each thread every time you ran a function for a service.
That's probably more doable now with the supported API. The previous way involved raising an exception, so I can see why it wasn't done.
Seems like Windows has a more systemic testing problem though. We used to stress-test the crap out of various NT components back in the day. Don't know if the elimination of SDETs means there's no one doing that other than the NT perf team, which never did targeted stress.
This was well known, and many applications actually used that. Of course it was very ugly.
I don't think it would have been a problem to also rename a given thread multiple times, but not sure.
Funny thing, windbg apparently uses a fixed-size buffer to store the old-style thread names and it must use strncpy to copy them in because if your thread names are long enough then windbg displays the first ~15 characters and then shows garbage, because strncpy doesn't guarantee null-termination.
The "thread names" clause was a hyperlink to an article I wrote a few years listing all of the problems with thread naming on Windows. Microsoft took the suggestions seriously and basically landed all of the features and fixes that I requested, so now when I'm profiling or debugging Chrome the threads all have nice names.
I just kinda like trolling Microsoft for not using their own thread naming API very much.
I appreciate your efforts then. It's been a while since I was doing Windows development, but I remember staring at a bunch of hex values for thread IDs and despairing.
The caveat being the developers who are careful enough to name their threads and implement good debug hooks aren't the ones you get called on to fix their code.
As an aside, POSIX/Linux thread names have a limit of 16 bytes, which means 15 characters plus NULL. Speaking from experience, this can and will bite you when you need the names the most, if e.g. they are long, have just one or a handful of common prefixes and their creation is not entirely under your control.
That would help, but I think the root problem here (of the debugging difficulty, not the actual perf problem) is the Windows approach of having lots of unrelated services running together in a single process (svchost)
Since Windows 10 1703/1709, if you have at least 4 GB of RAM or so, every service will run in its own svchost (for the most part; there may be some exceptions). There's some registry key to tweak how much RAM is needed to trigger this.
In the world of database-backed web apps, a related pattern is asking the database for some rows from a table, and then asking the database for some more rows based on each row you just got, rather than joining it all together into one query.
Makes it into production but falls down eventually!
There are sometimes reasons why you don't want to join, such as large parent rows whose data gets repeated a lot if there are a lot of child relations.
But that should be two queries. One for all the parents, and one to get the children with a sub-select (or at worst a manually generated parent id list), and then you can join them manually (if actually needed and you can't just use the data sets as they are).
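A rough sketch of that two-query shape, using sqlite3 and made-up table names:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
""")

# Query 1: the parent rows, fetched once.
parents = conn.execute(
    "SELECT id, name FROM parents WHERE name LIKE 'A%'").fetchall()

# Query 2: all of their children in one go, via a sub-select,
# not one query per parent row.
children = conn.execute("""
    SELECT parent_id, id, name FROM children
    WHERE parent_id IN (SELECT id FROM parents WHERE name LIKE 'A%')
""").fetchall()

# "Join" them manually in application code.
children_by_parent = {}
for parent_id, child_id, child_name in children:
    children_by_parent.setdefault(parent_id, []).append((child_id, child_name))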
You are right; APIs often use O(n) algorithms in an individual API call that is meant to execute once.
What happens is that API users start executing that endpoint N times, and here it becomes O(n^2).
People should remember that APIs are for CRUD calls. If you want batch reports, call a separate reporting endpoint that processes data in large batches.
I've actually had a lot of success going the other way. A database query with a join takes down production, so do a client side join, query one: get a bunch of ids from table A, query two: get a bunch of rows from table B, often as get one row from table B UNION get next row from table B etc.
You can try using IN on the second query, but usually if that was going to work in a reasonable amount of time, your join would have also worked in a reasonable amount of time.
The real problem people run into with the client side join is making it query one get a bunch of ids from table A, query two through N, get one row from B, with each query requiring a round trip. Even a pretty small client to server roundtrip of 1 ms gets nasty quick with repeated queries.
After going through every pattern this is also my favorite one.
While it can add some perceivable latency if you have many levels of depth, it is usually a lot lighter on CPU and memory than one big query.
The reason I really like it is because it is very easy to strongly type results coming from a single table, and processing the data in your application code allows you to keep the typings through the process of stitching everything back together.
> You can try using IN on the second query, but usually if that was going to work in a reasonable amount of time, your join would have also worked in a reasonable amount of time.
Depends on whether your database is reasonable or not.
Sometimes "foo IN (1,2,3,4,5)" will do five index lookups, while a JOIN will check every single "foo" in the table.
For bonus points it will also convert "IN ([independent subquery])" into the same JOIN. So just by reminding the database that a thing inside parentheses gets evaluated first, you can make it go a thousand times faster. Not a single other change to the query.
I suspect you're just working around an issue with the query or the DB itself.
Previous company had to do something like that because of an Oracle perf bug way back (outer join issue I believe?), but they fixed it eventually and it was all deleted.
I had this on Magento 2.3.3 for a "custom stock" module that was running loops to grab data for a product and its child configurable products. As Magento is written in a modular way, asking a subcomponent for data can involve a database lookup so it is really slow to use and painful to debug.
Anyway, it was running 29000 database queries as it was repeatedly using an "in" clause of 1 item across hundreds of queries, instead of 1 query with hundreds of items in the "in" clause.
I replaced them with 1 query that fetched all it needed instead of the 29000 queries (and 8000+ which were duplicates).
Probably the most important thing I learned about algorithmic complexity analysis in grad school was that N is not constant, but rather it grows over time (probably at an exponential rate). That is, as our computing power increases, we continue to aspire to analyze larger and larger data sets. So anything beyond O(n*log(n)) that works right now is more or less guaranteed to blow up in the not-so-distant future, if it remains in use that long.
This reminds me I once wrote a hyper-exponential graph processing algorithm, in the range of O(2^2^n), and I still believe it was the right choice because: 1) It was much easier to understand than a faster alternative, and 2) I could mathematically guarantee that the largest value of n was something like 8 or 9, so you could argue that it was in fact a constant time operation.
Could you explain more? As the other commenter pointed out, 2^2^9 = 2^512 which is an astronomically large number (far more than the number of atoms of ordinary matter in the observable universe). Something doesn't seem right.
That's certainly not the standard. If anything, exponentiation is usually performed right to left, with some exceptions [0]. However, the fact that we even have carets in this conversation isn't because the GP wanted to adopt a left-to-right convention, but due to limitations in the richness of HN's text editor.
If you write a tower of 2^2^9 on a whiteboard and ask 1000 mathematicians and computer scientists to evaluate it, I'm sure 999 of the 1000 would evaluate it as 2^(2^9).
I'm a mathematician and a computer scientist, so I must be one in a thousand. The link even indicates that ambiguity exists in the wild, and I think clear notation would help.
In this case, the absurdity of the number suggests that the more realistic reading was the intended one.
Don't you think it's more likely that the original poster made a mistake and that a graph algorithm isn't actually O(2^2^n)? I can't name a single meaningful algorithm that has that time complexity.
That's just because someone went through that pain for you and scripted and packaged the entire procedure and rolled up most of the dependencies and their build systems. I've done it from the official build directions without containers or prebuilt dependencies, and "royal pain in the ass" is an understatement.
That’s one nice side effect from the push to convert dependencies/components to pure rust, one at a time. Even if it doesn’t go all the way, each step hugely simplifies the build process and minimizes the list of (recursive) dependencies.
Honestly, O(n^2) is probably okay for a diagnostic tool that's only meant to be run occasionally-- I'd like to know why Google's IT department felt the need to schedule it to run so frequently.
That could be underestimating the truly catastrophic nature of O(n^2). It blows up very, very fast: from milliseconds to seconds to days to decades. Pretty much anything that grows at all must not be subjected to the terrible power of O(n^2).
The problem is that available disk space is growing exponentially over the years with a current doubling time of around 18 months for flash drives. And therefore repository size has been as well. What today are 1 GB repositories fairly predictably will be 10+ GB repositories in 5 years. And 100+ GB in 10 years. Which means that the period of locking your computer predictably will go from minutes to hours to days in the same time period.
What is kinda OK today is going to definitely not be OK going forward. This algorithm has to get fixed.
It's sort of unclear why 3 billion bytes needs to be processed in an O(N^2) algorithm in the first place, but seems especially egregious in that it also blocks other processes from running.
One other aspect of the issue is that it's unclear why that database is now so large. It seemed like different machines had different sized databases — perhaps one angle of the solution is trimming and vacuuming.
Well, it doesn't block all other processes, but it sure did block a lot of them.
I agree that finding out why the repository is huge seems worthwhile. I think it's been growing lately which means it might eventually get to an unsustainable size. As far as I can tell Microsoft doesn't ship any tools to make repo-size analysis easy. An open-source tool was suggested in one of the comments on my blog.
> I agree that finding out why the repository is huge seems worthwhile. I think it's been growing lately which means it might eventually get to an unsustainable size.
Exactly. It's 1.9 GB on your machine, whereas on a plain home-use computer it's less than 50 MB, i.e. 40 times smaller. Something is producing all that data there; what is it, and what is being stored?
WMI is not a diagnostic tool. It's an abstraction/API, and lots of developers unfortunately build over it rather than directly against the underlying Win32 APIs.
He used that to manually reproduce, but the original behaviour wasn't caused by manually calling the winmgmt executable with the /verifyrepository switch.
Instead the repo was being verified as a side effect of another WMI operation.
It sure sounded like they had a scheduled task to run it manually every hour. "Until then our IT department has promised to stop running the /verifyrepository command every hour, which should avoid the worst symptoms."
That’s just a manifestation of the issue and not necessarily the only symptom. Dawson used it to explicitly illustrate the wbem size to slowdown effect.
The reason I’m hating on developers that use WMI when alternatives are available is because WMI is dog slow and everyone that does low-level Windows development knows it.
That's the part that seemed insane to me. Dial it back so it only runs at 4AM every day and nobody would have noticed. Running it every hour seems like outright paranoia by the IT staff.
Reminds me of the time we found out a team was running O(n^2) code, but in terms of SQL queries (that is to say, O(n^2) queries, not O(n^2) comparisons or anything), and they were trying to blame the other team when a client sent a request that never ended.
And they tried blaming them for migrating data to a new server with an SSD that shaved 20 minutes from their processing time.
And if you're wondering, they refused to fix it because "it would need too many sprints" and "maybe we'll talk about it in a workshop".
Ten years ago the hash-map implementation in Internet Explorer was also O(n^2).
I had to inject javascript into some vendor code to avoid this after our production environment died. I ended up replacing the underlying hash-map into multiple smaller maps based on a hash of the items. So I'd have 50 maps of 1000 items instead of one map of 50000 items.
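The workaround was roughly this shape (sketched here in Python, where dicts don't actually need it; the real thing was JavaScript injected around the vendor code):

class ShardedMap:
    """Split one big map into many small ones, keyed by a hash of the item."""

    def __init__(self, shard_count=50):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __getitem__(self, key):
        return self._shard(key)[key]

    def __contains__(self, key):
        return key in self._shard(key)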
10 years ago is 2009. Large web applications had been breaking out of intranets (where they'd lived for quite some time at that point) for years at that point. GMail was launched in 2004, Google Maps in 2005 (that's also the year "AJAX" was coined) the JS library war was done and over with (jquery, prototype, mochikit, mootools, dojo, YUI, … were all released between 2005 and 2006). IE7 was 3 years old, Google Chrome was reaching its first year, IE8 was just released.
You seem to be missing the crux of the issue I was trying to address — even today, loading 50.000 items into a frontend application would be a very, very niche edge case. 10 years ago even more so. Tacking on arbitrary reference points from Wikipedia doesn't change any of that.
> You seem to be missing the crux of the issue I was trying to address — even today, loading 50.000 items into a frontend application would be a very, very niche edge case.
It's a medium-sized inventory, if you need fast / offline access then loading it to the client makes a lot of sense. I'm sure there are plenty other things of which you can easily reach 50k, and that you'd want to index or cross-reference somehow.
I can think of plenty of tasks where a 2009 frontend will potentially be exposed to workloads in the 10k-100k elements range:
* Editing objects in a 3D scene(e.g. a AAA open-world game or a CGI film)
* Plotting events in a long-running log on a graph
* Scraping and presenting data from web sources
The common thread here is that you have most of your data in application memory but not necessarily in a formal database system, and so you shoulder the full burden of managing it properly. In most cases the solution is to define a real backend and filter it there, paginate or otherwise reduce the amount that gets presented because data at that scale won't be usable by humans to begin with. But sometimes you do have a reason to explicitly want a "big list of everything," and equally as often you end up with the big list of everything just by accident. It just comes with the territory of report generation tasks.
50k items seems actually quite small. Front loading data can make your app look a lot snappier at the cost of a slightly longer load time. It may be worth it in many situations.
It was a data visualization tool which let users make dashboards based on your data-warehouse. You could add filters to let users "slice-and-dice" the data in the dashboard.
Needless to say one of the measures had a large number of parameters. The dashboard in question allowed users to see the change in measurements of [X] over time where [X] was one of 50k poisons that the Environment Agency were measuring.
It was a pretty reasonable question and - in browsers with O(n) hash lookup in their hashmaps - it performed excellently.
Almost 10 years ago (2010) I built a virtual table scroller in javascript for editing a spreadsheet of 100 columns by 10,000 rows in the browser.
I learned that just replacing the whole page's innerHTML with a concatenated string on each frame/mousewheel event is way faster than updating the cells' content and position individually, and results in butter-smooth scrolling.
For some context: at that time I was working on a Windows Forms ERP tool whose planning module would regularly load and display several million items. It performed surprisingly well considering what it was doing.
We also regularly had grid containing millions of items (or snapshots thereof).
If you're building a library or framework, then just assume that worst-case performance will be encountered by your users, because you can't predict their use cases.
So for example, I almost always use associative arrays (maps) instead of lists. I actually really wish there were a MAPP language, because I view the map as a potentially better abstraction than the list in LISP, but I digress.
I also tend to use atomic operations instead of locks. Some good starting points for that are understanding how compare-and-swap (CAS) works, and also how functional programming with higher-order functions and immutable variables works, because that mindset greatly simplifies threading with no shared mutable state, as in the Actor model. Lazy evaluation is another good one. Also, vector languages like GNU Octave and MATLAB are good because they favor a level of abstraction above the bare-hands programming of C-style languages like C++ and Javascript, so you tend to see that most algorithms are embarrassingly parallel at some level (especially the things we tend to think of as computationally expensive, like multimedia processing).
Also (this may be controversial) but I think that poor performance can be politically motivated. For example, I run Safari with Javascript disabled so I can have thousands of tabs open (it's disabled as I write this). But when I disable Javascript in Chrome, performance grinds to a halt. You can try it right now on the Mac by force quitting Chrome with a bunch of tabs open and relaunching it from Terminal.app with:
open -a "Google Chrome" --args --disable-javascript
I don't know what causes it, but my guess is that Google either wrote some of the loops under the assumption that Javascript would always be on, or they had a blind spot in their implementation because so much of their business model depends on ads having dynamic behavior.
So when you're in a meeting and someone shouts down your concern about edge case performance because they don't see that as a priority, graciously humor them and then write your code the right way because you know that it doesn't take any longer than doing it the wrong way. You might catch some flack during code review so have a good excuse handy, something about trying it the other way but running into problems. Often I'll write the easy imperative solution in a comment above the simple functional solution or even put both solutions under a preprocessor directive or feature flag to leave it up to the team lead/project manager and have our keisters covered if/when something melts down.
Sounds like they’ll have to increase their leetcode requirements in the interviewing process. This is one of the “obvious fundamentals” that leetcode challenges screen for, right? /s
Well, now this shows how polynomial time isn't necessarily any better than O(2^n), i.e. exponential time, in reality. Once k grows big in O(n^k), our modern machines will still struggle to run it.
The point where polynomial time becomes impractical is far beyond the point where exponential time becomes impractical. For an exponential algorithm, your smartphone can handle n = 30 instantly, but for n = 100 the fastest supercomputer on earth will take forever. Not true for polynomial.
O(N^2) is strictly better than O(2^N). Per the article, this algorithm fell over at 3 billion bytes of data, but wasn't especially noticeable at 300 million bytes of data.
Not really. Unless n is huge, there is an enormous difference between that and an exponential complexity like O(2^n). Many algorithms are O(N^2) and are perfect for their use case.
Look at the difference in order of growth. Just to give a very simplified comparison, your individual processor core can run about 10^9 simple instructions in a second.
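Plugging that 10^9 instructions/second figure in makes the gap obvious:

# Time to do n^2 vs 2^n "simple instructions" at 10^9 per second.
ops_per_second = 1e9

for n in (10, 20, 30, 40, 50):
    print(f"n={n:2}:  n^2 -> {n**2 / ops_per_second:.1e} s   "
          f"2^n -> {2**n / ops_per_second:.1e} s")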
Batch scripts themselves are O(n^2) because the shell closes and reopens them after each line (even comments or even blank lines), and the scan to get to line `n` takes O(n) time and disk bandwidth.
I definitely remember both reading about and experiencing this years ago, but I can't seem to find a source and it's driving me crazy (will search more later). I imagine variables would persist in the shell's memory. I'm talking about an implementation detail of the batch shell script parser/runner.
I recall that some people would put GOTO statements to jump across large comment blocks in the days where reading a few KB was noticeable.
It can be correct if you are talking about subprocesses/subshells: if a batch script invokes another batch script/command in a loop, then yes, a separate shell process will be spawned in each iteration.
No, I recall reading that the batch processor closes/re-opens the file, scans for n carriage returns where n is the current line number, executes that line, and repeats. These are the sources I found.
http://xset.tripod.com/tip3.htm "COMMAND.COM reads and executes batch files one line at a time; that means that it reads one line, execute it and rereads the file from the beginning to the next line." Perhaps this is a really old version of COMMAND.COM? Perhaps it's poorly stated but actually meant that it "rereads the file from the beginning of the next line to the next line".
https://docs.microsoft.com/en-us/windows/win32/fileio/local-... "Command processors read and execute a batch file one line at a time. For each line, the command processor opens the file, searches to the beginning of the line, reads as much as it needs, closes the file, then executes the line." This also seems to agree with my original claim.
I tested this with a batch script generated by
print("@echo off")
for i in range(1000):
for j in range(1000):
print("rem hello world")
print(f"echo {i}")
and ran it using COMMAND.COM in a Windows 95 VM. It appears to run in linear time. Either COMMAND.COM was fixed, or both sources are incorrect.
Seeking to a file offset is a constant-time operation (assuming no fragmentation), so re-opening the file and skipping to where you left off isn't that bad.
And you can certainly write self-modifying batch files, but only appending at the end is safe, not changing lines before you're currently executing.
>re-opening the file and skipping to where you left off
That's what Windows does now. But I think in the past it wasn't that way (as suggested by the sources). In my tests it appears that the command processor caches the byte offsets of each line, so it's possible to GOTO any line in the past without rescanning the whole file.