

What's wrong with 1975 programming? - wingo
http://www.varnish-cache.org/trac/wiki/ArchitectNotes

======
dasht
This rant is a mixed bag of decent advice and dismal advice.

Let's begin with Squid. I can't speak to the performance of Squid. Never
measured it. Don't care. I do care about how it is described in this article.

We're told that you tell Squid how much RAM it may use, and how much disk, and
it honors those constraints.

Then we're told that, as a result, Squid page thrashes and so it performs
badly.

A well-written program described that way page thrashes _because it is running
on a poorly administered system_. The entire point of letting you tell a
program like Squid how much RAM to use is that you then run Squid on a machine
where you can ensure it will never need to page.

Some programs (and this was as true in the very earliest days of virtual
memory as it is today) are purposefully designed so that they will run
correctly even if they page, but will run very fast if they do not page.
Server software is often in this category because server software often runs
on dedicated hardware where the hardware budget is large enough.

If you look at a server program (like squid) running on some dedicated server
and find that it is paging a significant amount, you don't just decide "that
program is poorly written" --- you must consider the possibility that the
machine is poorly configured.

When writing a program, you've a choice: manage your own working set and write
assuming you won't be page-thrashed, or punt it to the underlying OS. Which is
better? The answer really depends on what you know about the memory-use
patterns of your program. If you know nothing, consider leaving it to the OS.
If you know the OS's paging policies are a good fit, leave it to the OS. If
you can beat the OS's policies and count on not being page-thrashed (and the
performance is worth the work) --- then write like it's 1975 for goodness
sake.

~~~
qw
I'm not saying that you are wrong about generic cases, but in the case of
Squid I think the author has done proper analysis before deciding on an
architecture.

This essay is written by Poul-Henning Kamp (
<http://en.wikipedia.org/wiki/Poul-Henning_Kamp> ) so he has enough experience
with OS level code to comment on this particular area. I guess it can depend
on what kind of problem you are trying to solve. In the case of Varnish vs
Squid, his approach worked better.

~~~
dasht
Proof by reputation doesn't cut it here. He needs to demonstrate either that
it is impossible to reasonably configure Squid to do very little paging or, in
the alternative, that a non-paging instance of Squid performs unimpressively
compared to Varnish -- at least to support his stated thesis. This is not to
say either Squid or Varnish is better, only that his criticisms of Squid
appear to be unfair or naive or very poorly stated on their face.

~~~
SpikeGronim
Lots of people have run benchmarks that show exactly that - non-paging, in
memory working set Varnish clobbers Squid. The difference is even more
noticeable when the working set is larger than memory. Most of those
benchmarks are micro-benchmarks that can be criticized. To the extent that we
have data, Varnish is much faster than Squid. It's not just PHK's reputation.

~~~
dasht
In all seriousness, can you please point me ideally to a peer reviewed
comparison from a decent forum or at least what you consider to be some of the
best examples outside of the reviewed publication stuff?

Benchmarks are freaking hard to perform and interpret and, sorry, I don't buy
the reputation of "lots of people" either. Not saying you're wrong, just that
you haven't convinced.

------
lusis
The fact of the matter is that OS caches are shitty for anything other than
what the OS does.

This is why databases have bufferpools. The database knows best which
information stored in memory is useful. Compare the cache hit rate of an OS
cache to that of a database bufferpool on the same workload: the OS cache's is
abysmal. The OS assumes that if I got this page, I'll probably want the next 5
pages, and pulls them in. That's great for sequential access, because it's
likely that you WILL need those pages, but for random access it's horrid.

And that's just disk. In the case of redis, the last thing you want is the OS
deciding which data is needed and which isn't. In the worst case, the data you
need is spread across multiple pages and you can't really swap anything out.
Systems that implement their own memory management on top of the OS are
typically ones that know the hot spots in the data in the first place.

------
jefffoster
<http://antirez.com/post/redis-virtual-memory-story.html> is an interesting
contrast to this story.

------
ctdonath
His summary advice:

"How do we cope? Avoid memory operations if at all possible."

WTF?

~~~
adobriyan
No useless copying -- fewer page faults. Fewer page faults -- less thrashing.

------
rbranson
This advice is best taken for server software where allocating memory and
copying data would be considered a very expensive part of the pipeline. In the
vast majority of scenarios, any I/O wait or mild to serious computation that
has to be done using this data will vastly outweigh the major advantage of
memory-mapped files: not having to allocate and copy.

As you might imagine, a caching proxy server is extremely well suited for
this, as it's basically just shuttling data directly out of its store and
over the socket.

------
krosaen
in the first part, he says, "don't bother doing manual memory management,
you'll be fighting with virtual memory, that's programming like it's 1975". in
the second part he says, "do a different sort of manual memory management to
avoid contention across processors". how long until the latter advice is
"programming like it's 2006" :)

------
wccrawford
Thanks. I was wondering where the title from that other one came from, even
though I remember what the rant was about.

