
Introducing Varnish Massive Storage Engine - Jgrubb
https://www.varnish-software.com/blog/introducing-varnish-massive-storage-engine
======
damagednoob
"Varnish allocate[s] some virtual memory, it tells the operating system to
back this memory with space from a disk file. When it needs to send the object
to a client, it simply refers to that piece of virtual memory and leaves the
rest to the kernel.

If/when the kernel decides it needs to use RAM for something else, the page
will get written to the backing file and the RAM page reused elsewhere.

When Varnish next time refers to the virtual memory, the operating system will
find a RAM page, possibly freeing one, and read the contents in from the
backing file.

And that's it. "

[https://www.varnish-cache.org/trac/wiki/ArchitectNotes](https://www.varnish-cache.org/trac/wiki/ArchitectNotes)
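
The approach the quote describes can be sketched with Python's `mmap` module (a minimal illustration of file-backed virtual memory, not Varnish's actual storage code):

```python
import mmap
import os
import tempfile

# Create a backing file and size it; once mapped, the kernel pages
# regions of it in and out of RAM on demand.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

# Map the file into virtual memory. Writes just dirty pages in the
# page cache; the kernel decides when to flush them to the file.
mm = mmap.mmap(fd, 4096)
mm[0:11] = b"cached body"

# Reading touches the same virtual memory; if a page was evicted,
# the kernel faults it back in from the backing file transparently.
data = bytes(mm[0:11])

mm.close()
os.close(fd)
os.unlink(path)
```

Once mapped, the application never calls read() or write() on the file itself; eviction and fault-in are entirely the kernel's business.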

I'll try to hold back the snark, but I find it interesting that after
attacking '1975 programming' and Squid's deficiencies, here we are 8 years
later and maybe the kernel doesn't know best after all.

~~~
nkurz
You have a strong point. I was more struck by the incongruity of the following
statement: "We started out using a memory mapped file to store objects in. It
had some problems associated with it and was replaced with a storage engine
that relied on malloc to store content. While it usually performed better than
the memory mapped files performance suffered as the content grew past the
limitations imposed by physical memory."

So it looks like they gave up on the "kernel knows best" approach quite a
while ago. But then they show a graph, where the older mmap() approach is more
than twice as fast as the malloc() approach across the entire range of the
graph. They explain this in a sentence below the graph saying "Malloc suffers
quite a bit here as the swap performance on Linux is rather abysmally bad. "
Well, yes. But presumably that is the case we're interested in, as a cache
that has room for everything is a much easier problem.

So what are we to make of the earlier contention that malloc() trumps mmap()?

~~~
wmf
Varnish was originally designed to run on FreeBSD (as PHK works on both
projects). Have they changed to primarily target Linux, perhaps in response to
market demand?

~~~
perbu
~ 99% of all Varnish instances run on Linux. FreeBSD is still important, but
Varnish Software doesn't currently have any customers who are planning to
stay on FreeBSD; they are all migrating to Linux.

~~~
vorador
I'm a linux user but I've always been interested in running my projects on
FreeBSD. Could you give a couple reasons why those people are migrating?

~~~
perbu
FreeBSD admins are hard to come by. The same goes for software; try getting
something hip running on FreeBSD these days. Better yet, try getting a
support contract for it — it will not be trivial.

That said, the sysadmins who prefer FreeBSD tend to be of high quality. And
FreeBSD is pretty neat.

~~~
vorador
Thanks!

------
fiatmoney
"Assumption 1. Using write() instead of implicitly writing to a memory map
would lead to better performance."

I've seen this mentioned in the context of RocksDB; but contradicted by e.g.
SQLite. The case for mmap has always been that one avoids the overhead of a
system call & some double-copying, and in either case it just dirties the page
cache and is only "really" written in periodic flushes (assuming it's not
writing via direct IO). Can someone explain what the bottleneck is on the mmap
side and why write() might be faster?

~~~
cbsmith
Because most POSIX systems, including Linux, provide almost nothing to help
with hinting about page management, whereas write() (and the implicit copy it
entails) provides better ways to be explicit about that.

~~~
fiatmoney
Could you go into more detail? mmap lets you specify some flags about access
patterns; write() doesn't give you any such hints.

~~~
woadwarrior01
Correct me if I'm wrong, but I think he's talking about posix_fadvise() [1],
which lets you declare access patterns for an fd in advance, while you're
talking about madvise() [2].

[1]:
[http://linux.die.net/man/2/posix_fadvise](http://linux.die.net/man/2/posix_fadvise)
[2]: [http://linux.die.net/man/2/madvise](http://linux.die.net/man/2/madvise)
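
A short sketch of the distinction, using Python's wrappers around the same syscalls (POSIX-only; both calls are hints the kernel is free to ignore):

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

# posix_fadvise(): a hint about future access to a plain file
# descriptor, for use with read()/write()-style I/O.
os.posix_fadvise(fd, 0, 4096, os.POSIX_FADV_SEQUENTIAL)

# madvise(): the analogous hint, but for a memory-mapped region.
mm = mmap.mmap(fd, 4096)
mm.madvise(mmap.MADV_RANDOM)

mm.close()
os.close(fd)
os.unlink(path)
hints_ok = True  # both hints were accepted without error
```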

~~~
cbsmith
...and madvise hardly does anything.

------
skrause
Why does the site of a web site acceleration product take almost 10 seconds
to load?

~~~
mortenlarsen
Looks like it is hosted in Norway. Loaded instantly for me, but I am close by
in Denmark. So it is probably the RTT that is affecting your page load time.

------
jallmann
Regarding three tier caching between RAM/SDD/HDD: isn't this exactly what ZFS
L2ARC is supposed to do? Relying on that seems closer to the Varnish
philosophy of leaving as much to the underlying system as possible. Or would
that encounter the same bottlenecks they are trying to solve right now? And if
so -- how?

~~~
wmf
Yes, the "1995 programming" solution would be to put each cached object in a
file and let the filesystem manage it. I'm sure there's a good reason why
they don't do that, although it would be interesting to hear what that reason
is.

~~~
Rapzid
ZFS has a very advanced caching mechanism, I wonder how it would compare to
standard file systems or the kernel's paging for something like this.

It's also pretty common for people to enable file system caching plugins in
Wordpress, Drupal, etc. and forget to leave the system enough RAM to actually
do its business. D'oh.

------
lifeisstillgood
I am reminded of the slow programming article from a while back. How rare is
it for companies to say "take a year, see if it works"?

Really good performance requires doing things differently, not doing the same
thing faster. Yet most organisations don't want to try something different,
or won't give people the space to try it.

~~~
wmf
You can promise that it will work ahead of time, work on it for a year, and
then exercise your political skills if it doesn't.

------
ForFreedom
What is better for caching: Varnish or Squid? [2012]
[http://www.quora.com/What-is-better-for-caching-Varnish-or-Squid](http://www.quora.com/What-is-better-for-caching-Varnish-or-Squid)

Varnish was built for caching web apps. Squid is a forward proxy that can be
configured to work as a web app caching program. So, when Varnish was designed
we were able to disregard a lot of stuff that isn't needed when caching in
reverse mode. On the other hand, Squid has been around for ages and is a very,
very mature product with a very well known set of strengths and weaknesses.
Varnish is only 5 years old.

------
101914
So what happens if you put the "disk file" on tmpfs?

Personally, I have more than enough RAM now and I no longer need virtual
memory. I do not need a "disk" or other secondary storage in order to retrieve
and consume data.

I consider virtual memory a relic from an earlier era of limited computing
resources, like "user accounts" designed for an era of time-limited use of
prohibitively expensive, shared computers.

We now all have our own _personal_ computers and GBs of RAM, but we still
use kernels with built-in solutions designed to address the problems of
scarce, expensive, shared computers and scarce, expensive RAM.

~~~
perbu
If you have enough memory you can just have Varnish store everything in
memory. However, there are quite a few petabyte datasets out there, and it
will be quite a few years before we can all stick those in memory.

------
dantiberian
I'd be curious to know why they didn't just use one of the existing cache
algorithms from the literature. They mention that existing algorithms come
close to theirs, but not why they chose to go their own way. I suspect it was
because their algorithm had better mechanical sympathy. There are a number of
good choices at
[http://en.wikipedia.org/wiki/Cache_algorithms#Examples](http://en.wikipedia.org/wiki/Cache_algorithms#Examples);
theirs sounds closest to ARC.

~~~
zzzcpan
I presume that ARC cannot be used because of the patent for it.

------
amelius
Speaking of memory mapped files: how easy is it to allocate and store various
data structures into mmapped files in languages such as C, C++, Rust, etc.?

If it is difficult, then why is this the case? Are languages lacking in this
respect?

~~~
xxxyy
You just have to override the new operator (C++) or use a different malloc
(C). All you need is to replace brk() with an appropriate mmap(). Don't know
how it goes for Rust.

C++ is especially appealing, because you can embed file-backed mmap
allocations in chosen classes by overriding their new operator. So you could
for example create a float array class that automagically allocates itself in
a file-backed mmap region by a simple _new Array()_ call.

Edit: Python's numpy has a file-backed array:
[http://docs.scipy.org/doc/numpy/reference/generated/numpy.me...](http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html)
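
The same idea works with only the standard library; `numpy.memmap` wraps the same underlying mmap mechanism. A toy sketch (the class name is made up):

```python
import mmap
import os
import struct
import tempfile

class FileFloatArray:
    """A tiny file-backed double array: elements live directly in a
    memory-mapped file, so the kernel manages which pages stay in RAM."""

    def __init__(self, path, length):
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self._fd, length * 8)          # 8 bytes per double
        self._mm = mmap.mmap(self._fd, length * 8)

    def __setitem__(self, i, value):
        struct.pack_into("d", self._mm, i * 8, value)

    def __getitem__(self, i):
        return struct.unpack_from("d", self._mm, i * 8)[0]

    def close(self):
        self._mm.close()
        os.close(self._fd)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "floats.bin")
arr = FileFloatArray(path, 10)
arr[3] = 2.5                 # write goes straight into the mapping
value = arr[3]               # read comes back through the mapping
arr.close()
os.unlink(path)
os.rmdir(tmpdir)
```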

~~~
amelius
Well, the problem is that the mmapped file is based at a different address
every time you mmap it (in realistic situations, and in the general case).

Thus, every pointer dereference needs a base address. How is the support for
this?

~~~
xxxyy
This is not an easy way to serialize objects; it is merely a way to help the
virtual memory manager recognize portions of memory that can be saved to disk
in the first place.

.so files are loaded through mmap, and the pointer problem is solved there
through the mechanism of relocations. But please don't write raw in-memory
objects to disk; use Google Protobuf or ASN.1.
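
The usual way around the base-address problem is to store offsets from the start of the mapping instead of raw pointers, so the data stays valid wherever a later mmap() lands. A toy sketch (not a real serialization format):

```python
import mmap
import os
import struct
import tempfile

# A two-node linked list stored in a file using *offsets* from the
# start of the mapping instead of raw pointers, so the data remains
# valid no matter what base address mmap() returns.
NODE = struct.Struct("qq")   # (value, next_offset); -1 marks the end

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "list.bin")
with open(path, "wb") as f:
    f.write(NODE.pack(10, NODE.size))   # node at offset 0 -> next node
    f.write(NODE.pack(20, -1))          # node at offset NODE.size -> end

# Remap the file (possibly at a different address) and walk the list
# by offset; no stored pointer depends on the mapping's base address.
fd = os.open(path, os.O_RDONLY)
mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
values, off = [], 0
while off != -1:
    value, off = NODE.unpack_from(mm, off)
    values.append(value)

mm.close()
os.close(fd)
os.unlink(path)
os.rmdir(tmpdir)
```

This is essentially what relocations automate for .so files: the loader patches absolute addresses at map time, whereas an offset-based layout never needs patching at all.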

------
kolev
Another payware from Varnish?

~~~
wmf
It's almost like Varnish is a company with paid developers.

~~~
kolev
There are much more complex projects that add tremendously more value, and
they don't charge a penny. Varnish and Nginx are doing it wrong, and their
flawed models will only create competitors. I can understand charging for
support and scale, but charging for features is stupid.

