

So what's wrong with 1975 programming? (2006) - mooreds
https://www.varnish-cache.org/trac/wiki/ArchitectNotes

======
rodgerd
All that slagging off of Squid looks pretty stupid now that Varnish are
implementing... their own OS-independent disk and memory management! (see
[https://www.varnish-software.com/blog/introducing-varnish-ma...](https://www.varnish-software.com/blog/introducing-varnish-massive-storage-engine)).

Maybe those Squid developers weren't such knuckle-dragging morons after all.

~~~
BlackAura
Unwarranted hostility aside...

The fact that Varnish changed over the years neither invalidates this article,
nor vindicates Squid's design.

On any remotely modern system (say, 2006 or later), Squid's design is absurd.
The critique in this article is spot on. Squid basically pretends that the
operating system's virtual memory system and disk cache simply don't exist,
and spends its time working against them. This causes exactly the kinds of
problems detailed in the article.

Of course, that's because Squid is not Varnish. Squid was designed a long time
ago, with maximum portability in mind, and intended to run on operating
systems with very poor VM and disk cache systems. With that in mind, Squid's
design makes sense. It just doesn't make sense on newer systems.

In Varnish, all of this work was delegated to the operating system. This works
very well. It's certainly a lot simpler than Squid, in addition to being a lot
faster.

At least, that holds as long as most of your hot data fits in the disk
cache. The infrequently used parts, which may well be much larger than the
frequently used parts, can be evicted to disk by the OS, and although
reading them back in incurs a performance penalty, it's not that bad: it
only affects the less commonly accessed data, and doesn't interfere with
everything else.

The original Varnish design works great for that. It's less compelling if
your entire working set fits in RAM (in which case the slightly newer
malloc-based storage is faster because it has lower overhead, though it
becomes much slower if you actually start swapping).
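
(A minimal sketch of the two storage strategies being compared; this is
illustrative C, not Varnish's actual code, and the function names are made
up.)

    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* File-backed storage: the kernel's VM decides which pages stay
     * resident and writes dirty pages back in its own time. */
    static void *storage_file(const char *path, size_t size)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)size) < 0)
            return NULL;
        return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    /* Heap storage: lower per-access overhead while it fits in RAM,
     * but it falls back on swap under memory pressure. */
    static void *storage_malloc(size_t size)
    {
        return malloc(size);
    }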

Varnish starts to fall down if your working set doesn't fit in RAM (in which
case, you're doomed regardless), or if the total cache is really huge (think
somewhere in the terabyte range).

The new storage engine mostly just re-organizes the existing mmap-based
caches. It has better cache eviction algorithms, which give a much higher
cache hit rate, and much lower internal fragmentation. That alone accounts for
nearly all of the performance benefit.

The only I/O change I can find is that it uses the write syscall to write
newly cached objects to the file directly, rather than storing through the
mmap'ed file. That lets the contents of those pages be replaced wholesale:
the OS can drop the new data straight into the page cache, rather than
potentially having to read the old pages back from disk just to overwrite
them.
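
(Roughly, the difference looks like this; a hedged sketch in C, with
function names of my own invention. A store through the mapping can force
the kernel to fault the stale page in from disk just so it can be
overwritten, while a page-aligned write() lets the kernel install the new
contents in the page cache directly.)

    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Old path: store through the mapping. If the target page is not
     * resident, the kernel first reads the stale page from disk, only
     * for us to immediately overwrite it. */
    static void store_via_mmap(char *map, off_t off, const void *obj, size_t len)
    {
        memcpy(map + off, obj, len);
    }

    /* New path: write() the object at the same offset. For page-aligned,
     * page-sized writes the kernel can place the data straight into the
     * page cache without re-reading the old contents. */
    static ssize_t store_via_write(int fd, off_t off, const void *obj, size_t len)
    {
        return pwrite(fd, obj, len, off);
    }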

All of the reading, memory management and I/O is still done by the VM and disk
cache systems of the OS. That hasn't changed.

~~~
acqq
Thanks for the details. It still stands that the original Varnish design was
also less than optimal, even for the computers of 2006, and that the more
recent changes make better use of the hardware.

------
tambourine_man
Previously, on HN:
[https://news.ycombinator.com/item?id=4874304](https://news.ycombinator.com/item?id=4874304)

~~~
SwellJoe
And, even earlier (also with excellent commentary and occasional rebuttal):
[https://news.ycombinator.com/item?id=1554656](https://news.ycombinator.com/item?id=1554656)

------
pjc50
It's still interesting to read about people minimising the number of syscalls
and memory copy/allocation actions. When I was working at Zeus on their
webserver back in 2001, it had a lot of effort devoted to exactly that:
strings referring to chunks of the header buffer rather than malloc-and-copy;
a stat() cache to avoid touching the disk.
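
(In the same spirit, a small sketch of the pointer-plus-length trick; this
is my own illustrative C, not Zeus code. The "view" references the header
bytes where they already sit in the request buffer instead of allocating a
copy.)

    #include <stddef.h>
    #include <string.h>
    #include <strings.h>

    /* A borrowed view into the request buffer: no malloc, no copy. */
    struct strview {
        const char *ptr;
        size_t      len;
    };

    /* Hypothetical helper: slice a header's value out of the raw
     * request without copying it. */
    static struct strview header_value(const char *buf, size_t buflen,
                                       const char *name)
    {
        struct strview v = { NULL, 0 };
        size_t namelen = strlen(name);
        const char *p = buf, *end = buf + buflen;

        while (p < end) {
            const char *eol = memchr(p, '\n', (size_t)(end - p));
            if (eol == NULL)
                eol = end;
            if ((size_t)(eol - p) > namelen + 1 &&
                strncasecmp(p, name, namelen) == 0 && p[namelen] == ':') {
                v.ptr = p + namelen + 1;
                v.len = (size_t)(eol - v.ptr);
                break;
            }
            p = eol + 1;
        }
        return v;
    }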

------
simula67
God, what a mess we are in.

I can't wait for memristors to become commercialized so we can get TBs of
register-speed memory on every processor core. None of this cache, paging,
NUMA nonsense.

~~~
dietrichepp
This is not a problem solved by memristors. The more memory you have, the
more addressing and multiplexing logic you need to reach it, and the delay
through a multiplexer grows logarithmically with the number of inputs. With a
cache it is even worse, because you have to look it up by the real address,
not by a position within the cache. So there will always be a hierarchy of
speeds, unless you can figure out a completely different way to design a
multiplexer.
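
(Back-of-envelope, ignoring wire delay and fan-out limits: a tree of 2-to-1
muxes over N words is log2(N) levels deep, so every doubling of memory adds
roughly one more gate delay to the access path.)

    #include <stdio.h>

    int main(void)
    {
        /* 1 TiB of 64-bit words: 2^40 bytes / 8 bytes = 2^37 words. */
        unsigned long long words = 1ULL << 37;
        unsigned levels = 0;

        while (words > 1) {        /* depth of a binary select tree */
            words >>= 1;
            levels++;
        }
        printf("select-tree depth: %u levels\n", levels);  /* 37 */
        return 0;
    }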

In the best case scenario, memristors give us TBs of NVRAM.

~~~
simula67
I am not a hardware designer, but are you saying one large 64 bit multiplexer
to access the whole memory would be impractically slow ? Even if we don't get
register speed, it would simplify software design, wouldn't it ?

~~~
dietrichepp
Yes, it would be slow and large. Because it's large, you'd get less memory in
the same area. Another factor is that RAM and CPUs are usually on completely
different dies to begin with, which are manufactured with somewhat different
processes so you can't just copy and paste them onto the same chip.

Incidentally, this is what computers looked like 30 years ago. You could have
a CPU with a bunch of address and data pins wired straight to a RAM chip that
would return the contents of whatever address you asked for right away.

Loosely speaking, a modern computer still works the same way, but memory
speeds haven't kept up with CPU speeds. So, to make our software run faster,
we put layers of smaller, faster memory between the CPU and the larger, slower
memory. But the hardware hides all of this from the software: you don't have
to care, unless you want to optimize things. So we have registers, L1 cache,
L2 cache, RAM, SSDs, HDDs, and the network. You can write a program today and
all seven layers of caching might be mostly transparent, some more so than
others.
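
(A tiny demonstration of "transparent for correctness, not for speed"; my
own C, not from the thread. Both loops read every element of the same array
exactly once, but the row-major pass walks memory sequentially and benefits
from cache lines and prefetching, while the column-major pass strides DIM
bytes between accesses and misses constantly.)

    #include <stdio.h>
    #include <time.h>

    #define DIM 8192   /* 8192 x 8192 bytes = 64 MiB, far bigger than cache */

    static unsigned char m[DIM][DIM];

    int main(void)
    {
        unsigned long sum = 0;
        clock_t t0 = clock();

        for (int i = 0; i < DIM; i++)      /* row-major: sequential */
            for (int j = 0; j < DIM; j++)
                sum += m[i][j];
        clock_t t1 = clock();

        for (int j = 0; j < DIM; j++)      /* column-major: strided */
            for (int i = 0; i < DIM; i++)
                sum += m[i][j];
        clock_t t2 = clock();

        printf("row-major %.2fs, column-major %.2fs (sum=%lu)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
        return 0;
    }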

A lot of this complexity is in the hardware, and other parts of it are in the
OS. Application developers have it easy.

------
Qantourisc
Or you just use mlock to prevent a civil war? (That is, prevent the kernel
from paging it out.)
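
(Something like the following; a minimal sketch. Note that RLIMIT_MEMLOCK
caps how much an unprivileged process may lock, and every locked page is one
the OS can no longer use for anything else.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 64 * 1024 * 1024;
        void *buf = malloc(len);

        /* Pin the buffer in RAM so the kernel won't page it out. */
        if (buf == NULL || mlock(buf, len) != 0) {
            perror("mlock");
            return 1;
        }
        /* ... use buf as a cache that can never hit swap ... */
        munlock(buf, len);
        free(buf);
        return 0;
    }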

~~~
dietrichepp
Locking pages to try to improve performance is a dangerous game.

