Hacker News
What's wrong with 1975 programming? (varnish-cache.org)
41 points by wingo 2256 days ago | 12 comments

This rant is a mixed bag of decent advice and dismal advice.

Let's begin with Squid. I can't speak to the performance of Squid. Never measured it. Don't care. I do care about how it is described in this article.

We're told that you tell Squid how much RAM it may use, and how much disk, and it honors those constraints.

Then we're told that, as a result, Squid page thrashes and so it performs badly.

A well-written program described that way page thrashes because it is running on a poorly administered system. The entire point of writing a program like Squid so that you can say how much RAM it may use is that you then run Squid on a machine where you can ensure that it will never need to page.

Some programs (and this was as true in the very earliest days of virtual memory as it is today) are purposefully designed so that they will run correctly even if they page, but will run very fast if they do not page. Server software is often in this category because server software often runs on dedicated hardware where the hardware budget is large enough.

If you look at a server program (like Squid) running on some dedicated server and find that it is paging a significant amount, you don't just decide "that program is poorly written" --- you must consider the possibility that the machine is poorly configured.

When writing a program, you have a choice: manage your own working set and write assuming you won't be page-thrashed, or punt it to the underlying OS. Which is better? The answer really depends on what you know about the memory-use patterns of your program. If you know nothing, consider leaving it to the OS. If you know the OS's paging policies are a good fit, leave it to the OS. If you can beat the OS's policies and count on not being page thrashed (and performance is worth the work) --- then write like it's 1975, for goodness' sake.
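
For what it's worth, here is a minimal sketch of the "manage your own working set" option: a user-space cache that evicts least-recently-used entries to stay under a fixed byte budget. This is not Squid's actual code. The class name `BoundedCache` and its methods are hypothetical, and it assumes the machine has enough RAM that the budget never gets paged out.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical user-space cache that stays within a fixed byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes      # the "how much RAM may I use" knob
        self.used = 0                   # bytes currently held
        self._items = OrderedDict()     # oldest (least recently used) first

    def get(self, key):
        value = self._items.get(key)
        if value is not None:
            # Mark as most recently used.
            self._items.move_to_end(key)
        return value

    def put(self, key, value):
        if len(value) > self.max_bytes:
            return False                # can never fit, refuse it
        old = self._items.pop(key, None)
        if old is not None:
            self.used -= len(old)
        # Evict least recently used entries until the new value fits.
        while self.used + len(value) > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
        self._items[key] = value
        self.used += len(value)
        return True
```

The "punt it to the OS" option would drop the eviction loop entirely and let the kernel decide which pages stay resident.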

I'm not saying that you are wrong about generic cases, but in the case of Squid I think the author has done a proper analysis before deciding on an architecture.

This essay is written by Poul-Henning Kamp ( http://en.wikipedia.org/wiki/Poul-Henning_Kamp ) so he has enough experience with OS level code to comment on this particular area. I guess it can depend on what kind of problem you are trying to solve. In the case of Varnish vs Squid, his approach worked better.

Proof by reputation doesn't cut it here. He needs to demonstrate either that it is impossible to reasonably configure Squid to do very little paging or, in the alternative, that a non-paging instance of Squid performs unimpressively compared to Varnish -- at least to support his stated thesis. This is not to say either Squid or Varnish is better, only that his criticisms of Squid appear to be unfair or naive or very poorly stated on their face.

Lots of people have run benchmarks that show exactly that: with a non-paging, in-memory working set, Varnish clobbers Squid. The difference is even more noticeable when the working set is larger than memory. Most of those benchmarks are micro-benchmarks that can be criticized, but to the extent that we have data, Varnish is much faster than Squid. It's not just PHK's reputation.

In all seriousness, can you please point me, ideally, to a peer-reviewed comparison from a decent forum, or at least to what you consider some of the best examples outside of reviewed publications?

Benchmarks are freaking hard to perform and interpret and, sorry, I don't buy the reputation of "lots of people" either. Not saying you're wrong, just that you haven't convinced me.

The fact of the matter is that OS caches are shitty for anything other than what the OS does.

This is why databases have buffer pools. The database knows best what information stored in memory is useful. Compare the cache hit rate of a database buffer pool with that of an OS cache: the OS cache's is abysmal. The OS makes the assumption that, if I got this page, then I'll probably want the next 5 pages, and pulls 'em in. That's great for sequential access because it's likely that you WILL need those pages, but for random access it's horrid.

And that's just disk. In the case of redis, the last thing you want is the OS determining what data is needed and what isn't. In the worst case, the data you need is spread across multiple pages and you can't really swap anything out. Systems that implement their own memory management on top of the OS are typically ones that know the hot spots in the data in the first place.
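
To make the read-ahead point concrete, here is a rough sketch of how a program on a POSIX system can at least tell the kernel that its access pattern is random. It uses `os.posix_fadvise` with `POSIX_FADV_RANDOM`, which is only a hint and is not available on every platform. The function name `read_random_pages` and the 4096-byte page size are assumptions for illustration, not anything from a real database.

```python
import os


def read_random_pages(path, offsets, page_size=4096):
    """Read fixed-size pages at arbitrary offsets, hinting random access."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # Hint only: ask the kernel not to read ahead on this file.
                # Offset 0 and length 0 mean "the whole file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
            except OSError:
                pass  # the hint is optional, carry on without it
        # pread reads at an explicit offset without moving a file cursor.
        return [os.pread(fd, page_size, off) for off in offsets]
    finally:
        os.close(fd)
```

A real buffer pool goes further than a hint and keeps its own page cache with its own replacement policy, which is the point being made above.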

http://antirez.com/post/redis-virtual-memory-story.html is an interesting contrast to this story.

His summary advice:

"How do we cope? Avoid memory operations if at all possible."


No useless copying -- fewer page faults. Fewer page faults -- less thrashing.

This advice is best taken for server software where allocating memory and copying data would be considered a very expensive part of the pipeline. In the vast majority of scenarios, any I/O wait or mild to serious computation that has to be done using this data will vastly outweigh the major advantage of memory-mapped files: not having to allocate and copy.

As you might imagine, a caching proxy server is extremely well suited for this, as it's basically just shuttling data directly out of its store and over the socket.
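
A minimal sketch of the "no allocate, no copy" idea with a memory-mapped file: map the stored object read-only and hand a slice of the mapping straight to whatever writes it out. The function name `send_range` and the `sink` callback are hypothetical. In a proxy, `sink` would be something like a socket's `sendall`. This is not how Varnish is implemented, just the general technique.

```python
import mmap


def send_range(path, start, end, sink):
    """Pass bytes [start, end) of a file to sink without an intermediate copy."""
    with open(path, "rb") as f:
        # Map the whole file read-only; pages are faulted in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                # Slicing a memoryview gives another view, not a new bytes object.
                with view[start:end] as chunk:
                    return sink(chunk)
```

The nested `with` blocks release the views before the mapping is closed, so `sink` must not hold on to `chunk` after it returns.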

In the first part, he says, "don't bother doing manual memory management, you'll be fighting with virtual memory, that's programming like it's 1975". In the second part he says, "do a different sort of manual memory management to avoid contention across processors". How long until the latter advice is "programming like it's 2006"? :)

Thanks. I was wondering where the title of that other one came from, even though I remember what the rant was about.
