This said, the article feels quite arrogant. As other commenters say, the idea behind the optimization is nothing new, just some basic concepts well applied. That's still a job well done, but it probably doesn't justify the tone (or at least the tone I perceive) of the article.
The author is misguided if he thinks that an algorithm being called 'optimal' means it will give the best result in his particular use case. It usually means that the algorithm has optimal asymptotic runtime. In particular, asymptotic analysis completely ignores caching issues.
That said, the technical details are still worth reading if you have the time. I also agree with the author that CS programs are somewhat losing touch with reality.
And he's not putting himself above falling into that trap either:
> So why are you, and I, still doing it wrong?
Also, this is an article that was published in an ACM magazine, under the header of "The Bike Shed" - which I would assume is a "provocative debate"-type of column. Keep both that target audience and intent behind the article in mind when evaluating the overall tone of it.
> (The Imgur previously featured in this story doesn’t credit him at all, which is some bogus, Fat Jew-style shit.)
Being unaware that "Fat Jew" was the handle of someone who got internet famous but was caught stealing other people's jokes, I was really puzzled for a second what could have warranted that random anti-Semitic statement.
Usually, I've seen reliance on swap space met with derision. I'd only expect page faults to really happen when using a memory-mapped file. Again on the note of caches, I'd expect more fanfare to be made about the TLB when accessing a different page than about actual page misses.
Also, he’s not using swap as such; he’s using memory-mapped IO. If you have to hit the disk, you have to hit the disk, so you might as well try to do it in an OS-aware fashion.
TFA's context is a larger-than-memory data structure mapped from disk, where heap block size = page size.
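For anyone who hasn't played with it, here's a minimal sketch of that style of memory-mapped IO (hypothetical file name and index, not Varnish's actual code): map the backing file once, and let the kernel fault page-sized heap blocks in and out on demand.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical backing file; in Varnish this would be the storage file. */
    int fd = open("cache.dat", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    long pagesz = sysconf(_SC_PAGESIZE);

    /* Map the whole file; the kernel pages chunks in and out on demand,
     * so touching a node that isn't resident costs a page fault. */
    unsigned char *base = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* With heap blocks laid out one per page, block i starts here: */
    size_t i = 42;                                   /* arbitrary example */
    unsigned char *block = base + i * (size_t)pagesz;
    (void)block;

    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```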
Agreed. Its tone is very off-putting. From the article:
> Not just wrong as in not perfect, but wrong as in wasting half, or more, of your performance.
I wish articles would move away from this use of 'wrong'. Neither of those definitions is what I understand when I read 'wrong'. If you mean sub-optimal, then use that word instead. It feels like much of the rest of the article is built around this misused terminology.
The only thing I don't understand is why his data structure for finding the least-recently-used item wasn't just a linked list - where each time you touch an object you move it to one end of the list, and when you want to discard an object you remove from the other end. That'd be o(1) and would cause, at most, one page fault per update and none for the add and remove cases (since the two ends would surely be paged "in"). And it'd be a lot easier than his fancy thing. I guess I missed something.
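For reference, a minimal sketch of the list-based LRU described above (hypothetical node type; a real cache would also need a hash table to find a node by key in constant time):

```c
#include <stddef.h>

struct lru_node {
    struct lru_node *prev, *next;
    /* ... payload ... */
};

/* Circular sentinel: head.next is the most-recently-used end,
 * head.prev is the least-recently-used end. */
struct lru_node head = { &head, &head };

void unlink_node(struct lru_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* On every access: constant-time move to the "hot" end. */
void touch(struct lru_node *n)
{
    unlink_node(n);
    n->next = head.next;
    n->prev = &head;
    head.next->prev = n;
    head.next = n;
}

/* On eviction: constant-time pop from the "cold" end. */
struct lru_node *evict(void)
{
    struct lru_node *n = head.prev;
    if (n == &head)
        return NULL;            /* cache is empty */
    unlink_node(n);
    return n;
}
```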
It might be worth pointing out that NVMe or Intel Optane would solve his problem (1) without his fancy data structure too. Some might go as far as to say that while his fancy data structure is outdated now, the stuff that the CS departments teach became correct again. Maybe St Andrews had the right idea all along.
1) Or huge-pages or NUMA node pinning, to some extent.
I counter with similarly anecdotal evidence from 2012. :)
While we had a normal "Algorithms and Datastructures" course that taught the classical theory, there was also a separate course about "Algorithm Engineering" which was (among other things) about performance in the presence of the memory hierarchy. The upshot is that there are different cost models that you have to use in the analysis of your algorithms (e.g., IO complexity), but the techniques that you use for the analysis are the same.
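For the curious, the canonical example of such a cost model is the external-memory (I/O) model, where you count block transfers instead of instructions; the classic scanning and sorting bounds look like this (N items, block size B, internal memory M):

```latex
\[
  \mathrm{scan}(N) = \Theta\!\left(\frac{N}{B}\right),
  \qquad
  \mathrm{sort}(N) = \Theta\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right)
\]
```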
To be honest, I think the system is working as intended here. Undergraduate courses are not really designed to push people to the forefront of current research. They're intended to give you the basic skills you need to catch up to modern developments yourself.
I think what's happening here is that people remember "optimality proofs" from their algorithms courses and take that to mean that there is nothing more to be done (that's the sentiment I got from this blog post anyway). The problem with this is that in this context "optimality" referred to a technical definition - optimal with respect to some specific simplified model of computation. The result is probably not optimal with respect to a more complicated model!
Because he is not talking about expiring the Least Recently Used item, but about expiring the items whose "cache time" has run out. That is, when you add a new item to the cache, you add it with some lifetime (e.g. an integer). Then you have some clock (another integer), and what Varnish wants to do is expire all items whose lifetime is lower than the current clock.
With a (doubly-)linked list you would have to traverse the list on insertion to place the item in the correct spot...
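To make the contrast concrete, here's a rough sketch of the textbook structure the article starts from: a binary min-heap keyed on expiry time, where insertion is an O(log n) sift-up rather than an O(n) walk down a sorted list, and expiry pops while the minimum key is at or below the clock. (Hypothetical types and sizes; the article's complaint is about how this classic array layout behaves once it spans many VM pages, not about the idea itself.)

```c
#include <stddef.h>

#define MAXOBJ 1024             /* no overflow check, for brevity */

struct entry { long expiry; void *obj; };

static struct entry heap[MAXOBJ];
static size_t nheap;

void heap_insert(long expiry, void *obj)
{
    size_t i = nheap++;
    /* Sift up: O(log n) comparisons instead of an O(n) list traversal. */
    while (i > 0 && heap[(i - 1) / 2].expiry > expiry) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i].expiry = expiry;
    heap[i].obj = obj;
}

/* Expire everything whose deadline is at or below the current clock. */
void expire_older_than(long clock, void (*kill)(void *))
{
    while (nheap > 0 && heap[0].expiry <= clock) {
        kill(heap[0].obj);
        /* Move the last element to the root and sift it down. */
        struct entry last = heap[--nheap];
        size_t i = 0;
        for (;;) {
            size_t c = 2 * i + 1;
            if (c >= nheap)
                break;
            if (c + 1 < nheap && heap[c + 1].expiry < heap[c].expiry)
                c++;
            if (heap[c].expiry >= last.expiry)
                break;
            heap[i] = heap[c];
            i = c;
        }
        heap[i] = last;
    }
}
```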
Nitpick, but your list-based LRU cache’s operations wouldn’t be in o(1) but rather O(1) – little oh means strictly slower growth, in this case sub-constant. Probably just a typo though. And it could be further optimised for locality by using a cyclic buffer instead of a linked list (-> no pointer chasing + sequential access), but see kilburn’s answer (the structure in the paper solves a different problem)
I disagree. A CS algorithms course is about the logical/mathematical understanding of algorithms. If you want to understand hardware implications, you are generally taught that in a computer architecture class. Blaming the algorithms class for a lack of consideration of real-world hardware issues is like blaming a computability class because the Turing machine doesn't factor in hardware issues. You abstract it out to make general logical observations.
I’m not arguing against teaching classical algorithm design and analysis, they’re both incredibly important imho. But we should stop ignoring the shortcomings of the models used, hand-waving them away with “the rest is just a bit of engineering”. It’s not.
(You may have heard the terms "cache aware" and "cache oblivious" used to describe algorithms. These terms come out of theoretical CS, describing algorithms with good performance on these "RAM with cache" machines when the cache size is known or unknown ahead of time, respectively.)
The rough gist is that if the only thing you need to be able to do is insert and pop-min, then you can buffer up roughly a page's worth of incomplete work for each page, getting amortized performance that tracks the optimal number of cache transfers at each level of the cache.
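If it helps to see the flavour, here's a one-level toy version of that buffering idea (the real cache-oblivious priority queues apply it recursively at every level of the hierarchy; all names and sizes here are made up): inserts land in a small buffer, and the large sorted run is only rewritten once per batch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF 512                       /* roughly a page of keys */

/* One-level caricature of the buffering idea: inserts land in a small,
 * cache-resident buffer; the big sorted run is only rewritten (and its
 * pages touched) once per batch instead of once per insert. */
struct pq {
    long   buf[BUF];
    size_t nbuf;
    long  *run;                       /* sorted ascending, heap-allocated */
    size_t nrun;
};

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sort the buffer and merge it into the sorted run: sequential,
 * page-friendly work, amortized over BUF inserts. */
static void flush(struct pq *q)
{
    qsort(q->buf, q->nbuf, sizeof(long), cmp_long);
    long *merged = malloc((q->nrun + q->nbuf) * sizeof(long));
    size_t i = 0, j = 0, k = 0;
    while (i < q->nrun && j < q->nbuf)
        merged[k++] = q->run[i] <= q->buf[j] ? q->run[i++] : q->buf[j++];
    while (i < q->nrun) merged[k++] = q->run[i++];
    while (j < q->nbuf) merged[k++] = q->buf[j++];
    free(q->run);
    q->run = merged;
    q->nrun = k;
    q->nbuf = 0;
}

void pq_insert(struct pq *q, long key)
{
    q->buf[q->nbuf++] = key;
    if (q->nbuf == BUF)
        flush(q);
}

long pq_pop_min(struct pq *q)
{
    flush(q);                         /* settle pending work lazily */
    assert(q->nrun > 0);
    long min = q->run[0];
    memmove(q->run, q->run + 1, --q->nrun * sizeof(long));
    return min;
}
```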
Further, in the paper above you are introduced (in the related work) to the citation:
"A. LaMarca and R. E. Ladner. The influence of caches on the performance of heaps. Journal of Experimental Algorithmics, 1, 1996."
which describes the author's "B-Heap" (at least, it describes a b-ary heap where you pick b to match your cache line/page size).
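Something in that spirit, with an illustrative fanout rather than their actual numbers: pick the arity so that all of a node's children share one cache line (or one page, for the disk-backed case), which makes the min-child scan in sift-down a single fetch.

```c
#include <stddef.h>

#define D 8   /* e.g. 8 keys of 8 bytes = one 64-byte cache line; illustrative */

static size_t parent(size_t i)      { return (i - 1) / D; }
static size_t first_child(size_t i) { return D * i + 1; }

/* Sift-up for an array-backed d-ary min-heap. */
void sift_up(long *heap, size_t i)
{
    long key = heap[i];
    while (i > 0 && heap[parent(i)] > key) {
        heap[i] = heap[parent(i)];
        i = parent(i);
    }
    heap[i] = key;
}

/* Sift-down: the scan over the D children is the cache-friendly part,
 * since they are contiguous and cost one line/page fetch instead of D. */
void sift_down(long *heap, size_t n, size_t i)
{
    long key = heap[i];
    for (;;) {
        size_t c = first_child(i);
        if (c >= n)
            break;
        size_t end = c + D < n ? c + D : n;
        size_t best = c;
        for (size_t j = c + 1; j < end; j++)
            if (heap[j] < heap[best])
                best = j;
        if (heap[best] >= key)
            break;
        heap[i] = heap[best];
        i = best;
    }
    heap[i] = key;
}
```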
I'm guessing the problem isn't so much that academics don't understand these things as that practicing programmers don't always read very much.
I wonder how legitimate this concern is from an environmental angle. On one hand, a lot of code in the world probably has much worse performance ratios than the one described in the article -- using Python is enough to cost an order of magnitude. On the other hand, data centers are using more renewable energy sources and reducing their carbon output.
Is this only a wasted-money problem? How worthwhile is it to write "green code"? Back-of-the-envelope calculations on how much developer comfort costs would be interesting.
* Once we leave a VM page through the bottom, it is important for performance that both child nodes live in the same VM page, because we are going to compare them both with their parent.
* Because of this, the tree fails to expand for one generation every time it enters a new VM page in order to use the first two elements in the page productively.
Is the point just that you want to pack 8 elements into each page, but if the tree expands the first generation of a page, then you end up only being able to pack in 6?
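One possible reading of those two bullets (this is my own sketch of a layout consistent with the quotes, not necessarily the exact arithmetic in Varnish's binheap.c): the page root keeps only one child inside the page, so an 8-slot page holds generations of 1 + 1 + 2 + 4 = 8 nodes and is packed exactly, whereas letting the first generation expand normally would give 1 + 2 + 4 = 7 and waste a slot.

```c
#include <assert.h>

/* Hypothetical within-page child arithmetic for that layout:
 *   slot 0          -> single child at slot 1 ("fails to expand")
 *   slot s in 1..3  -> children at slots 2s and 2s+1
 *   slots 4..7      -> children live at the top of other pages
 */
static unsigned page_first_child(unsigned slot)
{
    assert(slot <= 3);
    return slot == 0 ? 1 : 2 * slot;
}

int main(void)
{
    /* 1 + 1 + 2 + 4 = 8: the page is full, and every sibling pair sits in
     * a single page, so comparing both children with their parent touches
     * at most two pages. */
    assert(page_first_child(0) == 1);
    assert(page_first_child(1) == 2);
    assert(page_first_child(3) == 6);
    return 0;
}
```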