
What's wrong with 2006 programming? - antirez
http://antirez.com/post/what-is-wrong-with-2006-programming.html
======
wmf
Just to amplify his point, if you want your program to take page faults as PHK
suggests, it _has_ to be multithreaded. If you choose event-driven concurrency
you can't afford to take page faults in mmap() or read(). When you make the
threads vs. events decision you're implicitly making a bunch of related
decisions about I/O and scheduling as well; a hybrid approach (like using
events and mmap) won't work well.

~~~
nostrademons
You can use events + mmap, you just need to factor the paging latency into
your design. Normally, this might mean mmapping a chunk of data at startup,
touching it all so that it's resident in RAM, and _then_ beginning to serve
queries, keeping an eye on your total resident set so that it never pages out.
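A minimal sketch of that startup pattern, assuming a Unix-like OS (the file and its size here are invented stand-ins for the real data set):

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Stand-in for the real data file the server would map.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (4 * PAGE))
    path = f.name

# Map the file read-only, then touch one byte per page so every page is
# faulted in before the event loop starts answering queries.
with open(path, "rb") as f:
    size = os.fstat(f.fileno()).st_size
    mm = mmap.mmap(f.fileno(), size, prot=mmap.PROT_READ)
    pages_touched = 0
    for off in range(0, size, PAGE):
        _ = mm[off]  # read fault: this page is now resident
        pages_touched += 1

os.unlink(path)
```

Pinning the pages (e.g. with mlock(2)) would go further and keep them from being evicted later, at the cost of flexibility.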

~~~
rbranson
What would be the point of being able to spill to disk if you've got to keep
everything in RAM? Simple serialization?

~~~
nostrademons
The point is zero-copy on load, not being able to spill to disk. Most high-
performance, scalable servers I've seen ignore virtual memory entirely and
kill (+ restart) the process if it exceeds the physical memory available on
the machine. Yes, that means they use pre-1960s technology; sometimes, the
price of performance is ignoring the programming conveniences we've come up
with in the last 50 years.
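A hedged sketch of that kill-and-restart policy; the budget and the supervisor are hypothetical, and the ru_maxrss units are assumed to be kilobytes as on Linux:

```python
import resource
import sys

# Hypothetical physical-memory budget for this process.
MEM_LIMIT_BYTES = 512 * 1024 * 1024

def rss_bytes():
    # ru_maxrss is the peak resident set size; kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024

def enforce_memory_limit():
    """Exit (so a supervisor can restart us) if we outgrow the machine."""
    used = rss_bytes()
    if used > MEM_LIMIT_BYTES:
        sys.exit(1)  # the process manager restarts us with a cold cache
    return used
```

The server would call enforce_memory_limit() periodically from its main loop; virtual memory never enters the picture.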

~~~
rbranson
Most applications need to perform some computations with data that is read
from the backing store. Even simple sorts and searches will vastly outweigh
the cost of an extra memcpy. Honestly, an HTTP cache is sort of a perfect case
for the way in which Varnish was implemented. There is very little actual
processing, if any, that needs to be done with the data that's read from disk.
It just needs to be read from the backing store and shuttled over the socket
as fast as possible, with very little friction. An extra memcpy or two matters
in the Varnish scenario.
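To make the copy count concrete, here is a sketch (not Varnish's actual code) of the two ways to shuttle a cached object to a client; sendfile(2) is the zero-copy path on Linux:

```python
import os

def serve_with_copy(sock_fd, file_fd, size):
    # Two copies: page cache -> userspace buffer, then buffer -> socket.
    data = os.read(file_fd, size)
    os.write(sock_fd, data)

def serve_zero_copy(sock_fd, file_fd, size):
    # sendfile() keeps the bytes in the kernel the whole way.
    os.sendfile(sock_fd, file_fd, 0, size)
```

When there is no per-byte processing, those userspace copies are the dominant cost, which is the point being made above.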

------
fleitz
Stop thinking of RAM/disk/etc as storage systems and start thinking of them as
retrieval systems. Then stop thinking of your data costs as $/GB (storage
systems) and start thinking of your data costs as $/(IO/sec/GB) (retrieval
systems).

I know everyone these days seems to think that removing a structured query
language parser from a database makes every other problem go away, but
realistically RDBMS vendors spend millions of dollars trying to fix this exact
problem. It's called cache invalidation and it's a hard problem to solve in a
general way.

SSDs are just a midpoint in the performance trade-off game.

The OS is the worst at this, DBs are somewhat better, but realistically if you
want serious performance out of your application you need to make those
choices for yourself, and use every strategy where appropriate: RAM for
records you need instantly (memcached, Redis, MongoDB), SSDs for the stuff you
can't afford to keep in RAM, and hard drives for the stuff you can't afford to
keep on an SSD.

What you need to think about is the value of your data in dollars per IO/sec
per GB ($/(IO/sec/GB)): if the amortized value of that data exceeds the
amortized cost of the retrieval system, then buy it. Focus on increasing the
value of your data, not reducing the cost of its retrieval, as that cost will
halve every 18 months anyway. Alternatively, change your business model so
you are going short on IO/sec/GB (e.g. pre-sell storage so that when you need
to buy it you can do so cheaply).
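As a toy worked example of that framing (every price and spec below is an invented round number, not a real quote):

```python
def dollars_per_iops_per_gb(price_usd, iops, capacity_gb):
    # Read "$/(IO/sec/GB)" as dollars per IO/sec per GB of capacity.
    return price_usd / (iops * capacity_gb)

ram = dollars_per_iops_per_gb(price_usd=800, iops=10_000_000, capacity_gb=64)
ssd = dollars_per_iops_per_gb(price_usd=400, iops=50_000, capacity_gb=256)
hdd = dollars_per_iops_per_gb(price_usd=100, iops=150, capacity_gb=2_000)

# By plain $/GB the hard drive wins easily; by $/(IO/sec/GB) the ordering
# flips, which is the point of treating these as retrieval systems.
print(f"RAM {ram:.2e}  SSD {ssd:.2e}  HDD {hdd:.2e}")
```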

What I'm trying to say is that a picture is worth more to Flickr than it is to
Facebook, so Flickr will have an easier time building its retrieval systems
than Facebook because of the costs involved. That's why Facebook had to write
their own filesystem for retrieving pictures.

I'd bet that any commercial DB would run rings around redis/mongo/etc if you
had your persistent store as a RAM disk and used hard drives for the
transaction log. The cost of a SQL Server license is negligible if you're
going to buy a server with $200,000 worth of RAM in it. If your data is
valuable enough you could just keep everything in SRAM (L1/L2 cache) and buy
processors just for the cache.

~~~
jasonwatkinspdx
This has been well understood since antiquity (in the CS world at least). Read
Jim Gray's "The 5 minute rule" and the more recent papers that cite it. Most
likely the access frequency of your objects is not high enough to demand that
they reside in L2.

Ultimately there's no need to use a commercial database either, as there are
compelling open source alternatives, though if your needs are very specific, a
commercial database may be your best tool.

~~~
mkramlich
> This has been well understood since _antiquity_ (in the CS world at least).

Yes, I believe it was Cicero who first pointed this out. Or perhaps even
Aristotle. ;)

------
scott_s
_Again, the kernel will use a simple LRU algorithm, where the granularity is
the page._

I don't think it's accurate to describe any performance critical part of the
Linux kernel as "simple." For an overview of the page replacement policy, see
<http://kerneltrap.org/node/7608>. I wondered if CLOCK-Pro [1, 2] had made it
into the kernel yet, but it looks like it has not.
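For flavor, here is a toy second-chance "clock" sweep, the family of algorithms that CLOCK-Pro extends; the real kernel policies are considerably more involved than either this or plain LRU:

```python
class Clock:
    """Minimal second-chance page replacement over a fixed set of frames."""

    def __init__(self, nframes):
        self.frames = [None] * nframes  # resident page ids
        self.ref = [False] * nframes    # "referenced" bits
        self.hand = 0

    def access(self, page):
        """Touch a page; return the evicted page id on a miss, else None."""
        if page in self.frames:  # hit: just set the referenced bit
            self.ref[self.frames.index(page)] = True
            return None
        # Miss: sweep the hand, clearing referenced bits as it goes, so
        # recently touched pages get a second chance before eviction.
        while self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        evicted = self.frames[self.hand]
        self.frames[self.hand] = page
        self.ref[self.hand] = True
        self.hand = (self.hand + 1) % len(self.frames)
        return evicted
```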

This author makes compelling arguments for implementing application-level
paging. But the nice thing about doing systems work is that we never have to
rely on arguments alone to evaluate something: show me numbers.

[1] <http://linux-mm.org/ClockProApproximation>

[2] <http://www.cse.ohio-state.edu/~fchen/paper/papers/usenix05.pdf>

~~~
bch
There was no mention of Linux in the article. This system also runs on the
various *BSDs, Solaris, MacOS...

~~~
scott_s
Then it's worth noting that CLOCK-Pro is used in NetBSD.

My point here is that a lot of people have spent a lot of time working on the
page replacement problem. I am very open to the idea that an application can
beat the page replacement policy in the underlying kernel, for a variety of
reasons. But: numbers. Always evaluate. The implementation of these algorithms
in practice is always more subtle than our high level understanding, so we
need to do real performance comparisons to know if we actually improved
anything.

(I'm getting my information from the algorithm's author's page:
<http://www.ece.eng.wayne.edu/~sjiang/> Jiang graduated from William and Mary
while I was a young grad student there.)

~~~
dasil003
I'm a bit torn on this. I realize that you need to benchmark to really prove
anything, and that optimization intuitions are often wrong, but on the other
hand, you really need a wide variety of real-world benchmarks to approach any
form of "proof". In the early implementation stages, I think clear
thinking such as provided in this article is probably more important than
spending 50 times as long setting up an array of benchmarks. Hopefully
Salvatore has done some micro-benchmarks during development to guide his
thinking, but even if not, it's hard to refute the thinking in this article.

If you think about the problem space of Redis vs Varnish, it's intuitively
obvious that Varnish deals with a wide variety of general data without many
opportunities to optimize beyond general purpose algorithms such as an OS
provides. Whereas Redis has specific data types often with small footprints,
and very careful attention paid to the details of optimization for memory and
disk usage.

~~~
scott_s
I think you're overstating the time it takes to come up with benchmarks to
evaluate performance optimizations. Even micro-benchmarks tailored to showcase
your performance under ideal circumstances are a start. The longer you go
without doing any performance comparisons, the longer you go without knowing
if your work was worth it.
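For instance, even a throwaway harness like this tells you something (the two functions are generic stand-ins, not anything from Redis):

```python
import timeit

# One implementation slices a buffer by copying; the other takes a
# zero-copy memoryview of it.
def copy_slice(buf):
    return bytes(buf[: len(buf) // 2])

def view_slice(buf):
    return memoryview(buf)[: len(buf) // 2]

buf = b"x" * (1 << 20)  # 1 MiB of dummy data

# Ideal-circumstances micro-benchmark: same input, hot cache, many repeats.
t_copy = timeit.timeit(lambda: copy_slice(buf), number=200)
t_view = timeit.timeit(lambda: view_slice(buf), number=200)
print(f"copy: {t_copy:.4f}s  view: {t_view:.4f}s")
```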

I'm not trying to deride his work - it's a neat project, and I will probably
read through his earlier entries more. I'm down with all of the reasons
provided, but I recognize that as humans, we tend to believe in things we
understand. Hence, we need to evaluate.

~~~
dasil003
Perhaps I am, but why are we assuming he hasn't done any benchmarks? It seems
quite likely that his opinion is informed by actual results. Does he need to
draw up graphs and spend more time on a blog post to be taken seriously?

~~~
scott_s
Well, yes. I take him "seriously," but I'm not yet convinced his techniques
outperform the kernel. That's how systems work is done. If you want to
convince people that your way is better, then you need data to back it up.

~~~
dasil003
You can't take any of his arguments at face value? Like the blocking argument?

~~~
jules
No, performance of complex systems is really really hard to predict.

~~~
dasil003
Yes, but again, why do we assume Salvatore is just pulling this stuff out of
his ass?

------
bediger
I blame Benjamin Zorn and this paper:
<http://www.cs.colorado.edu/department/publications/reports/docs/CU-CS-665-93.pdf>

for the whole "programmers shouldn't manage memory" myth. Clearly, if you know
what you're doing, you can do better than the OS and/or malloc() does. If you
don't know what you're doing, you have bigger problems, and writing your own
allocator won't quickly solve them.

~~~
gjm11
It seems a bit strange to blame Zorn for spreading the myth that "programmers
shouldn't manage memory" on the basis of that paper, when at about the same
time he was writing a bunch of _other_ papers extolling the virtues of
customized memory allocators: <http://portal.acm.org/citation.cfm?id=172674>,
for instance.

~~~
bediger
Did you read the "CustoMalloc" paper? It's more of the same. Just to
illustrate, the paper you referenced has the title "CustoMalloc: efficient
synthesized memory allocator".

That is, "CustoMalloc" takes a look at memory usage patterns of a particular
program, then _generates_ a semi-customized allocator for that program.

~~~
gjm11
Yes, of course I read it. Yes, the allocators are synthesized; so what? The
point is that "synthesize a semi-customized allocator for each program" is a
very different thing from "just use what the system provides you with; you
won't be able to do better".

------
stephen
I'm not an expert, but the application's memory usage being opaque to the
kernel sounds vaguely like the problems Azul had scaling its Java runtime to
large (>2GB) heaps.

Azul recently open sourced some of their kernel patches:

<http://www.managedruntime.org/faq>

But that is about all I know.

------
jamii
In a vaguely similar vein, the recent Mirage paper showed impressive GC
improvements by running OCaml on Xen without an operating system in between.
This allowed them to use a much simpler allocation algorithm.

<http://anil.recoil.org/papers/2010-hotcloud-lamp.pdf>

------
moron4hire
Something like paging virtual memory is going to be a core feature for Redis.
I can completely understand why they would want to keep core features "in
house". It's not "not invented here" syndrome if the entire point of your
company is to implement things like this.

~~~
davidw
The code is open source under a BSD-style license, so it's not really about
benefits for the company, which in this case is VMware. They make their money
elsewhere and pay antirez to hack on Redis, which is a pretty good deal for
users of Redis.

------
wingo
Fascinating article. I wonder how SSDs affect his conclusions, though. If you
really care, don't you have an SSD? Especially for the recently mentioned 1:10
problem (10G of data, 1G of memory).

------
poet
Weird to see PHK being referred to as "the Varnish guy". :P

~~~
antirez
Fixed in the article; sorry, I was not aware that the "Varnish guy" was such a
well-known programmer.

~~~
jacquesm
It's like referring to Donald Knuth as 'the TeX guy'.

~~~
antirez
With all due respect, I wish Poul-Henning Kamp the greatest of fame, but it is
hard to compare him with a worldwide legend like Donald Knuth.

