Let's begin with Squid. I can't speak to the performance of Squid. Never measured it. Don't care. I do care about how it is described in this article.
We're told that you tell Squid how much RAM it may use, and how much disk, and it honors those constraints.
Then we're told that, as a result, Squid page thrashes and so it performs badly.
A well-written program described that way page thrashes because it is running on a poorly administered system. The entire point of letting you tell a program like Squid how much RAM to use is that you then run Squid on a machine where you can ensure it will never need to page.
Some programs (and this was as true in the very earliest days of virtual memory as it is today) are purposefully designed so that they will run correctly even if they page, but will run very fast if they do not page. Server software is often in this category because server software often runs on dedicated hardware where the hardware budget is large enough.
If you look at a server program (like Squid) running on some dedicated server and find that it is paging a significant amount, you don't just decide "that program is poorly written" --- you must also consider the possibility that the machine is poorly configured.
When writing a program, you have a choice: manage your own working set and write assuming you won't be page-thrashed, or punt it to the underlying OS. Which is better? The answer really depends on what you know about the memory-use patterns of your program. If you know nothing, consider leaving it to the OS. If you know the OS's paging policies are a good fit, leave it to the OS. If you can beat the OS's policies and can count on not being page thrashed (and the performance is worth the work) --- then write like it's 1975, for goodness' sake.
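There's also a middle ground between those two extremes: let the OS do the paging, but hint it about your access pattern. A minimal sketch in Python, not anyone's production code --- the `madvise` call assumes Linux and Python 3.8+, and the function and file are made up for illustration:

```python
import mmap
import os
import tempfile

def sum_file_bytes(path):
    # Hypothetical example: instead of hand-rolling a cache, map the file
    # and tell the kernel what the access pattern will be, then let its
    # paging policy do the work (the "leave it to the OS, but hint it" option).
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # MADV_SEQUENTIAL asks for aggressive readahead; you'd use
            # MADV_RANDOM instead if accesses were scattered.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            return sum(m[:])  # slice yields bytes; sum of byte values

# Tiny demo file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([1, 2, 3, 4]))
result = sum_file_bytes(tmp.name)
os.unlink(tmp.name)
print(result)  # prints 10
```

The point isn't the `sum`, it's that the program stated its policy in one line and left the actual page management to the kernel.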
This essay is written by Poul-Henning Kamp (http://en.wikipedia.org/wiki/Poul-Henning_Kamp), so he has enough experience with OS-level code to comment on this particular area. I guess it can depend on what kind of problem you are trying to solve. In the case of Varnish vs. Squid, his approach worked better.
Benchmarks are freaking hard to perform and interpret, and, sorry, I don't buy the reputation of "lots of people" either. Not saying you're wrong, just that you haven't convinced me.
This is why databases have bufferpools. The database knows best which of the information stored in memory is useful. Compare the cache hit rate of a database bufferpool with that of an OS cache on the same workload: the OS's is abysmal. The OS assumes that if I fetched this page, then I'll probably want the next five pages, and pulls 'em in. That's great for sequential access, because it's likely that you WILL need those pages, but for random access it's horrid.
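For what it's worth, that readahead heuristic can be switched off per file descriptor; this is one of the knobs a database-style program reaches for. A hedged sketch in Python, assuming a system that exposes POSIX `posix_fadvise` (Linux does):

```python
import os
import tempfile

# Create a throwaway file standing in for a database's data file.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)

# A random-access reader tells the kernel not to read ahead, since the
# next page it fetches is unlikely to be the sequentially-next one on disk.
if hasattr(os, "posix_fadvise"):  # not available on every platform
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)

os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 1)
os.close(fd)
os.unlink(path)
print(data)  # prints b'x'
```

This only tunes the kernel's policy; a real bufferpool goes further and bypasses the page cache entirely (e.g. `O_DIRECT`) so its own eviction decisions are the only ones that matter.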
And that's just disk. In the case of Redis, the last thing you want is the OS deciding which data is needed and which isn't. In the worst case, the data you need is spread across many pages and you can't really swap anything out. Systems that implement their own memory management on top of the OS are typically ones that know the hot spots in their data in the first place.
"How do we cope? Avoid memory operations if at all possible."
As you might imagine, a caching proxy server is extremely well suited for this, as it's basically just shuttling data directly out of its store and over the socket.
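One concrete way to do that shuttling without touching the bytes in userspace at all is sendfile(2), which moves data kernel-side from the file's page cache into the socket buffer. A sketch, not Varnish's actual code --- it assumes Linux, and the socketpair stands in for a real client connection:

```python
import os
import socket
import tempfile

# A cached response sitting in the proxy's store (a plain file here).
payload = b"HTTP/1.1 200 OK\r\n\r\nhello"
fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.lseek(fd, 0, os.SEEK_SET)

# Stand-in for a connected client; in a real proxy this is a TCP socket.
server, client = socket.socketpair()

# sendfile copies file -> socket inside the kernel: no read() into a
# userspace buffer, no write() back out, no extra memory operations.
os.sendfile(server.fileno(), fd, 0, len(payload))
received = client.recv(1024)

os.close(fd)
os.unlink(path)
server.close()
client.close()
print(received == payload)  # prints True
```

That's the "avoid memory operations" advice made literal: the proxy never holds the body in its own buffers at all.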