
Anti-Caching: A New Approach to Database Management System Architecture [pdf] - luu
http://www.vldb.org/pvldb/vol6/p1942-debrabant.pdf
======
andrewstuart2
Archiving. You've described archiving: moving less-recently-used data off
main storage (RAM, in this case) onto a slower, cheaper medium.

Heck, the only reason we have hard disks and memory is that it's not
economically feasible to have billions of CPU registers, and that there's a
further trade-off between volatility and speed. This is literally an older
technique than the computer. Files are kept in the main office until they're
not being used and then moved to the basement for long-term storage.

~~~
hliyan
Even more simply put: "memory hierarchy exists"?

~~~
andrewstuart2
Haha seriously. "Engineering trade-offs exist. A new theory by my startup.
(Buy my product)"

------
emeryberger
Anti-caching is _literally_ caching. The system may be fantastic, but that's
what it's doing.

Consider the following key characteristics of "anti-caching":

(1) Cold data is moved from RAM to disk.

This is cache replacement. Eventually, caches fill and you have to choose what
to evict. While there are many replacement algorithms, one of the most popular
is LRU, which is what is used here. In conventional CPU caches, data is moved
a cache line at a time, transactionally. Here, it is moved a tuple at a time,
also transactionally.

(2) Each item is present in either RAM or disk, but never both.

This is (almost) exclusive caching, which maintains exactly one copy of each
item across all levels of the cache hierarchy (as done by the AMD Athlon). The
key difference is that the exclusivity extends all the way to the bottom of
the hierarchy (disk). As far as I can tell, that is the primary novelty.
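To make (1) and (2) concrete, here is a toy sketch of the scheme as described
above: an LRU table of hot tuples in RAM, with cold tuples moved (not copied)
to a disk store, so each tuple lives in exactly one level. Illustrative only,
not the paper's actual implementation:

```python
from collections import OrderedDict

class AntiCache:
    """Toy model: LRU eviction from RAM to disk, with exclusive levels."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.ram = OrderedDict()   # hot tuples, kept in LRU order
        self.disk = {}             # evicted (cold) tuples

    def put(self, key, tuple_):
        self.ram[key] = tuple_
        self.ram.move_to_end(key)          # mark as most recently used
        while len(self.ram) > self.capacity:
            cold_key, cold_val = self.ram.popitem(last=False)  # evict LRU
            self.disk[cold_key] = cold_val  # moved, not copied

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        # Un-evict: pull the tuple back into RAM and delete it from disk,
        # preserving the one-copy (exclusive) invariant.
        tuple_ = self.disk.pop(key)
        self.put(key, tuple_)
        return tuple_
```

Accessing an evicted tuple pulls it back into RAM and removes it from disk,
which is the "removed from the bottom of the hierarchy" behavior.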

To be clear, adapting all of this to DBMS architecture may be a great idea,
but let's call things by their names.

------
rbetts
I work at VoltDB (based on the original h-store concepts) - if anyone has
questions or ideas about "anti-caching", feel free to ping me.

------
notacoward
Please put [2013] on this. It's good stuff, but it's not news.

------
falcolas
From past experience in the database industry, command logs are insufficient
for maintaining data integrity after a crash. They capture the bare minimum of
information needed to reproduce a transaction, which is frequently not enough.
A log of "I'm changing field X from A to B" is much more reliable in practice.
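A toy sketch of the distinction (illustrative names and formats, not any real
DBMS's log layout): a command log records the operation to re-execute, while a
value log records the before- and after-images of each change, so recovery can
check the old value before applying the new one.

```python
# Command log: records the operation so it can be re-executed on recovery.
command_log = [("SET", "x", "B")]

# Value log: records before- and after-images of the change.
value_log = [("x", "A", "B")]  # (field, old value, new value)

def replay_command(db, log):
    for op, key, val in log:
        if op == "SET":
            db[key] = val
    return db

def replay_values(db, log):
    for key, old, new in log:
        # The before-image lets recovery detect a corrupted/partial state.
        assert db.get(key) in (old, new)
        db[key] = new
    return db
```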

~~~
MichaelGG
If they are capturing enough to reproduce a transaction, then by definition
that's enough information :). Non-deterministic commands (e.g. WHERE RANDOM()
without a known seed) are the problem.
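A toy illustration of the seed point (a hypothetical command, not any real
DBMS's replay mechanism): if the RNG seed is logged alongside the command,
re-executing a nondeterministic command during recovery yields the same result
as the original run.

```python
import random

def apply_command(table, seed):
    """Toy nondeterministic command: drop each row with probability 0.5."""
    rng = random.Random(seed)  # seeding makes the command replayable
    return [row for row in table if rng.random() >= 0.5]

table = list(range(10))
original_run = apply_command(table, seed=42)
crash_recovery_replay = apply_command(table, seed=42)
assert original_run == crash_recovery_replay  # identical with the same seed
```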

------
nickpsecurity
As andrew noted, the premise is misleading: any worthwhile in-memory database
inverts the usual RAM-and-disk caching relationship, because the whole point
is to be in memory. They all typically have a way of spilling that data to
secondary storage over time. Silo [1], for instance, can do 700,000
transactions a second on a 32-core machine with data regularly saved to disk.

Now, the strategy they use is interesting. I'd just rather see an apples-to-
apples comparison of it against DBs like FoundationDB, F1, and others doing
high performance with strong consistency. The more modern stuff, that is;
MySQL and its performance are quite dated.

[1] http://db.csail.mit.edu/pubs/silo.pdf

------
limau
A recent survey on in-memory big data processing systems:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7097722

