
Cache is the new RAM - aristus
http://blog.memsql.com/cache-is-the-new-ram/
======
jandrewrogers
A couple points I would make with respect to the article:

\- In-memory databases offer few advantages over a disk-backed database with a
properly designed I/O scheduler. In-memory databases are generally only faster
if the disk-backed database uses mmap() for cache replacement or similarly
terrible I/O scheduling. The big advantage of in-memory databases is that you
avoid the enormously complicated implementation task of writing a good I/O
scheduler and disk cache. For the user, there is little performance difference
for a given workload on a given piece of server hardware.

\- Data structures and algorithms have long existed for supercomputing
applications that are very effective at exploiting cache and RAM locality.
Most supercomputing applications are actually bottlenecked by memory bandwidth
(not compute). Few databases do things this way -- it is a bit outside the
evolutionary history of database internals -- because few database designers
have experience optimizing for memory bandwidth. This is one of the reasons
that some disk-backed databases like SpaceCurve have much higher throughput
than in-memory databases: excellent I/O scheduling (no I/O bottlenecks) and
memory bandwidth optimized internals (higher throughput of what is in cache).

The trend in database engines is toward highly pipelined execution paths
within a single thread, with almost no coordination or interaction between
threads. If
you look at codes that are designed to optimize memory bandwidth, this is the
way they are designed. No context switching and virtually no shared data
structures. Properly implemented, you can easily saturate both sides of a
10GbE NIC on a modest server simultaneously for many database workloads.
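To make the shared-nothing idea concrete, here is a toy sketch (all names are invented for illustration; this is not how any particular engine is coded): rows are hash-partitioned so that each worker owns its slice outright, touches no shared state, and needs no locks, and the results merge trivially because keys never cross partitions.

```python
def partition(rows, n_workers):
    """Hash-partition (key, value) rows so each worker owns a disjoint slice."""
    parts = [[] for _ in range(n_workers)]
    for key, value in rows:
        parts[hash(key) % n_workers].append((key, value))
    return parts

def worker_aggregate(part):
    """Each worker sums only its own partition: no locks, no shared state."""
    totals = {}
    for key, value in part:
        totals[key] = totals.get(key, 0) + value
    return totals

def run(rows, n_workers=4):
    # Keys never cross partitions, so the per-worker results merge trivially
    # at the end; the workers never coordinate while running.
    merged = {}
    for totals in map(worker_aggregate, partition(rows, n_workers)):
        merged.update(totals)
    return merged
```

In a real engine each partition would be pinned to one core with its own pipeline of operators; the point is only that nothing is shared until the final merge.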

~~~
mpweiher
> In-memory databases offer few advantages over a disk-backed database with a
> properly designed I/O scheduler.

Hmm... Michael Stonebraker would probably disagree with you on that. For his
various newer projects, he claims, and somewhat convincingly shows, 10x or
more performance improvements. The analysis is that only ~10% of the work in a
typical disk-based system is useful work. By radically simplifying as a result
of doing away with the disk store, you remove those overheads.

A similar argument (also with benchmarks) is made by the RAMCloud people; they
claim up to a 1000x perf. increase over disk-based storage for a data center.

Since I am mostly a dabbler (but have also managed to outperform RDBMSs by a
factor of 1000 or more using RAM-based techniques), I would be curious as to
what these papers get wrong.

[http://downloads.voltdb.com/datasheets_collateral/technical_...](http://downloads.voltdb.com/datasheets_collateral/technical_overview.pdf)

[http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf](http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf)

[https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud](https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud)

~~~
platz
It's not just the disk store, also avoiding the time spent locking and
unlocking in row-based architectures (i.e. using lock-free algorithms)

[http://www.se-radio.net/2013/12/episode-199-michael-stonebra...](http://www.se-radio.net/2013/12/episode-199-michael-stonebraker/)

~~~
mpweiher
"...single-threaded, lock-free, doesn’t require disk I/O in the critical path,
..."

IIRC, the single-threadedness is made possible in large part by not having I/O
on the critical path. (When you wait on I/O, a single threaded design stalls
completely, so either you multi-thread somehow or your throughput dies).

:-)

------
temuze
The database I want still doesn't exist.

Here's what I want:

\- Easy sharding, a la Elasticsearch. I want virtual shards that can be moved
node to node and an easy to understand primary/replica shard system for
write/reads. I want my DB nodes to find each other with an easy discovery
system with plugins for AWS/Azure/Digital Ocean etc.

\- Fucking SQL. I don't want to learn your stupid DSL. I want to give
coworkers a SQL client and say "go! You already know how to use this!". If I
want a new feature, then dammit, build on top of SQL the way PostgreSQL has.
Odds are, regardless of whether it's some JSON API or SQL, my language will
have a client for it that will be superior to writing raw queries anyway.

\- Easily pluggable data management systems. For example, if I do a lot of
SUMs and I know I'm not doing writes very often, I want to use CStore. If I'm
storing a bunch of strings, I want to be able to index it any way I please - maybe
one index with Analyzer/Tokenizer X and another with Analyzer/Tokenizer Y -
all in a nice inverted index. Good, I can make an autocomplete now. Oh, and
sometimes I want a good ol' RDBMS.

\- Reactive programming! It works well in the front end and it'd be amazing in
the backend. For example, I want to make a materialized view that's the result
of a query, but that gets updated as new rows get inserted or as the rows it
uses get updated. Let's call it a continuous view or something. Eventual
consistency is fine. Clever continuous views can solve a lot of performance
issues.

\- I want to be able to choose if a table/db is always in memory or not. I
don't care about individual rows - that sounds like someone else's problem.

\- Easy pipelining - these continuous views mean that an insert can span a lot
of jobs because one continuous view can be dependent on another. I want my
database to manage all of this for me and I want to forget that Hadoop ever
existed. I want to be able to give my database a bunch of nodes that are just
for working jobs if need be. Maybe allow custom throttling for the updates of
these "continuous views" so the queries don't get re-run every update if
they're too frequent.

\- While I'm at it, I want a pony, too. But I'd settle for this being open
source instead.
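The "continuous view" wish above can be sketched in a few lines: a materialized aggregate kept fresh by delta updates on write, rather than by re-running the query. (ContinuousSumView and its methods are invented names for illustration, not any real product's API.)

```python
class ContinuousSumView:
    """Maintains the result of SELECT key, SUM(value) ... GROUP BY key."""

    def __init__(self):
        self.totals = {}

    def on_insert(self, key, value):
        # O(1) incremental update -- no re-scan of the base table.
        self.totals[key] = self.totals.get(key, 0) + value

    def on_update(self, key, old_value, new_value):
        # Apply only the delta between the old and new row.
        self.totals[key] = self.totals.get(key, 0) - old_value + new_value
```

Eventual consistency falls out naturally: the view can consume deltas from a replication log at whatever pace the throttling allows.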

There are a lot of possible directions for the DB world in the next decade. Me,
I think the line between DBs and MapReduce/ETL/Pipelining is going to be
blurred.

~~~
icelancer
I have a LOT fewer requests than you, but this one is just so irritating:

>\- Fucking SQL. I don't want to learn your stupid DSL. I want to give
coworkers a SQL client and say "go! You already know how to use this!". If I
want a new feature, then dammit, build on top of SQL the way PostgreSQL has.
Odds are, regardless of whether it's some JSON API or SQL, my language will
have a client for it that will be superior to writing raw queries anyway.

It pisses me off to no end that these developers have to out-think SQL and
reinvent the whole damn new wheel for "efficiency's" sake. It seems that no one
takes into account friction costs and instead just waggles their dick around
saying "Look how smart this new language is or how I co-opted
[erlang,perl,JS,etc] into being the query language!" Just stop it. Some people
making less than $150k/year are going to have to use this and they are good at
DB analysis but not bullshit esoteric language writing.

Good lord this makes me so mad. I feel your pain.

~~~
cpncrunch
That's an amusing (and incorrect) assumption that the smartest people are
making >$150k.

~~~
icelancer
That's also not the intent of the sentence, but thanks for putting words in my
mouth.

~~~
cpncrunch
Fair enough. What was the intent? That's certainly what it sounded like.

------
GrinningFool
> It pisses me off to no end that these developers have to out-think SQL and
> reinvent the whole damn new wheel for "efficiency's" sake.

Worked with someone who did this a couple years ago. I kept trying to get him
to explain why he was reinventing SQL, and never did get a clear answer to
that.

> just waggles their dick around saying

Sorry to be the one to say it, but I don't think that part was necessary or
appropriate to this community. Also, using the collective gender-neutral
"their" in conjunction with dick waggling is just funny, now that I think
about it.

~~~
throwaway981
There seems to be a culture clash on HN.

On one side there is California, where "dick wagging" is an unnecessary
instance of gendered language, certain to drive women out of the tech
industry.

On the other side there's the rest of the world, where being offended by "dick
wagging" seems dazzlingly childish. It's just a penis, most grown men and
women know what they are and observe their motions on a daily basis.

In Bhutan they draw penises on things for good luck.
[http://en.wikipedia.org/wiki/Phallus_paintings_in_Bhutan](http://en.wikipedia.org/wiki/Phallus_paintings_in_Bhutan)

~~~
ajuc
Avoiding gendered words as a recipe for getting more women into IT seems very
funny to me.

My language forces you to add gender-dependent postfixes to every verb and
noun. Nobody has a problem with that, and I think there are more women in IT
here than in the US.

It's obviously not the source of the problem. The source is people being
jerks. You don't fix that by changing the non-jerks' language (the jerks won't
care).

In fact, the non-jerks weren't jerks precisely because they treated women the
same as everybody else. When you embarrass them into treating women
differently (more carefully), it becomes more awkward and reminds women more
often that they are different here. Counterproductive, IMHO.

Regarding unnecessary vulgarity - it seems weird to me, no matter whether it's
gendered or not. But people use different styles of communication; that's the
diversity people are so proud of.

Regarding SQL - yes please. It's a perfectly good wheel, don't reinvent it
without a really good reason.

~~~
aninhumer
>My language forces you to add gender-dependent postfixes to every verb and
noun.

If there's truly no gender neutral way of saying something, then I'd imagine
that people always use one gender for generic cases, and as such it's not
exclusionary. Which is different from having the option and not using it.

>The source is people being jerks.

Non-jerks being exclusionary is arguably far more of a problem than isolated
jerks. If one person says "lol women suck at computers", they're obviously a
moron, and can be dismissed. Whereas if everyone around you subtly assumes
that all programmers are male, it's far more hurtful, because they're people
that you respect.

>When you embarrass them into treating women differently

Asking for gender neutral language isn't asking people to treat women
differently.

~~~
ajuc
> If there's truly no gender neutral way of saying something, then I'd imagine
> that people always use one gender for generic cases, and as such it's not
> exclusionary. Which is different from having the option and not using it.

Well, the option is always there, you can use "The person that is using this
application clicks the button X" instead of "User clicks the button X".
"Person" is female so you then use female for every verb. It's just longer,
inconvenient and sounds like legal text so nobody does it. People just write
everything in male versions. Even "Are you sure?" is different depening on
gender of "you", and I'm yet to see application that uses anything other than
the male version for dialogs (except for the applications that know your
gender somehow).

I think the distinction between using defaults without assuming gender, and
assuming gender, is important, but some people argue for using the contrived
language so as not to be exclusionary, and that's what I'm against and what I
think is counterproductive.

Imagine if everybody talked in legalese when you approached them, and switched
to normal language otherwise.

~~~
aninhumer
Humans are pretty flexible when it comes to language; what sounds awkward or
verbose can often become completely natural over time. That said, I agree that
it's easier to make smaller language changes stick.

There might be an alternative way to phrase things neutrally that's easier to
say though. (passive?)

~~~
throwaway981
Yet if neither men nor women wish to change their own language, why must it be
changed?

Sometimes HN reminds me of Christian missionaries: interposing themselves into
foreign cultures that they do not understand and saying "No! No! It's all
wrong! You do it our way."

It is an ugly kind of imperialism.

~~~
ajuc
Meh, nobody is using guns, and the discussion is interesting. I just don't
like the assumptions some people make (I've heard that my culture is
inherently sexist for example).

------
nemo44x
This article is full of so many logical fallacies I'm surprised it made it
here. And it's an advertisement, no less.

Creates a red herring by stating he's been doing this a long time and has seen
it all.

Creates straw man after straw man in the trashing of memory caches (avoids
their use cases), Dynamo (there's a good reason tons of people use various
NoSQL Databases) and Hadoop (C'mon, now).

He also commits more logical fallacies in calling various concepts silver
bullets that ended up having problems. I don't think anyone serious about
technology thinks replication, sharding, or load balancing "solves
everything". Nothing is a silver bullet, and anyone who says something is, is
selling you something...

And then he fails to really address the fact that MemSQL itself uses
replication and sharding (in a limited sense, since the core SQL concept of a
JOIN is wrecked here, and they have a big warning on their troubleshooting
page about an error users must see often).

SQL is great but I have plenty of great reasons to use other data stores. SQL
isn't a silver bullet for data.

Point is, he is calling MemSQL a silver bullet and is obviously trying to sell
something, while ripping on plenty of great ideas and concepts by picking
their worst implementations and the biggest misunderstandings of them.

------
brendangregg
Yes. Or as I've said: memory is the new disk. This is why PMCs (performance
monitoring counters) are more important than ever, to provide observability
for cache and memory analysis. (I'd like some PMCs made available in EC2. :)

------
hcarvalhoalves
> It’s been 65 years since the invention of the integrated circuit, but we
> still have billions of these guys around, whirring and clicking and
> breaking. It’s only now that we are on the cusp of the switch to fully
> solid-state computing.

Am I missing something, or should it read "hard disk" rather than "integrated
circuit" here?

~~~
jobu
He's referring to the picture of the hard drive above that line when he says
"these guys". It took me a couple of reads through that sentence to grasp the
meaning: "Why are we still using spinning metal contraptions to store data 65
years after the invention of the integrated circuit?"

------
maerF0x0
Amazon doesn't expose many of these statistics (how fast of RAM do I get with
an M3.large or a c3.med, etc.). Does this mean real performance is only for
those who own their servers?

~~~
wmf
RAM performance is the same for VMs and bare metal. And the ~1.5x performance
difference between different grades of DRAM (e.g. 1333 vs. 1600 vs. 2133 MHz)
is negligible compared to the massive cache-RAM and RAM-flash gaps.

And speaking of cache, lstopo (from the hwloc package) does work correctly
under EC2.

~~~
lgeek
It may be the same in the sense that hypervisors don't explicitly limit it,
but on a multicore host you're sharing memory bandwidth with the other guests,
in the common(?) case when the host has more cores than a guest. You can also
experience increased latency when there is access contention.

~~~
wmf
True, if you want full and predictable memory/cache performance you need to
rent a full-machine VM (usually 8xlarge) or bare metal.

------
yason
That's how it has always been. We take different storage/memory technologies,
sort them by speed and price, put the fastest but most expensive closest to
the CPU and the slowest but cheapest as far away as possible. Minimizing
memory footprint lets us do more work at the faster end, while minimizing
storage cost lets us store terabytes of data on a bookshelf.

There might have been just two or three levels initially: CPU register(s),
system RAM, and external storage. Now the spread has several more steps:
registers, L1 cache, L2 cache, maybe L3 cache, part of memory used as a disk
cache, SSD (either as a standalone drive or as an on-disk cache inside a
traditional hard drive), and the good old spinning platter. We've mostly let
go of tape storage by now, but tapes are still sold for their capacity.

However, from the programmer's point of view, nothing has necessarily changed.

We have several levels of storage, more than before, ranging from the fastest
on-chip cache ram to the mechanical storage and we still optimize our programs
to run mostly in the fastest tip of this memory pyramid. What has changed is
the size of the spread itself: the gap between the fastest and the slowest is
huge in numbers. But relatively, not so much.

A quick guesstimate of the microseconds needed for a zero-page read on a C64
vs. reading a byte from the 1541 floppy drive, versus a read from CPU cache
vs. a read from a spinning platter, suggests that the relative difference is
still roughly in the same order of magnitude. From various sources, I get a
factor of between 50 and 100 million between the fastest and the slowest read.
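For what it's worth, the modern end of that guesstimate can be redone in two lines (the latencies are ballpark figures commonly cited for modern hardware, not measurements; plug in your own and the order of magnitude barely moves):

```python
fastest_read_s = 0.3e-9   # ~0.3 ns: an L1 cache hit on a modern CPU
slowest_read_s = 20e-3    # ~20 ms: a random byte off a spinning platter

# Fastest-to-slowest ratio: in the tens of millions, consistent with the
# 50-100 million figure above.
ratio = slowest_read_s / fastest_read_s
```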

That is also what makes programming so much fun: everything gets redone all
the time and the pace of advancements is crazy yet some things don't change.
We just do more complex things but still bump into essentially the same
tradeoffs.

------
contingencies
Database vendor frames history of computing in database evolution, makes snide
remarks about competing technologies, admits it has no idea where the world is
going while invoking the 'history repeats itself' notion. Well, duh.

OTOH, databases are only one component of modern architectures, which the
article correctly asserts are largely limited in terms of scalability by
throughput and latency. However, scalability is often secondary to
functionality. And in terms of functionality, the long list of database types
trawled out through the article only serve to highlight the real chokepoint:
cognitive overhead.

Perhaps what we really need are tools that enable us to more easily stop and
think about the problem. Ideally, tools to test, profile, compare and switch
between storage or other subsystem architectures without having to delve into
the infinitesimal intricacies of each.

 _Success really depends on the conception of the problem, the design of the
system, not in the details of how it's coded._ \- Leslie Lamport

------
xacaxulu
Laughing so hard at this line:

"Bringing you yesterday's insights, TOMORROW"

------
kephra
I once learned, in the good old mainframe times, that there are 3 sizes of
databases: Small size that fit into RAM, medium size that fit on one computer,
and big databases, that require a cluster of computers.

The relational model and SQL databases play their strong roles in medium-size
databases, but are too much overhead for a fast small database, and do not
scale well for big databases.

It was hoped at the time that Moore's law would beat Wirth's law (which was
formulated much later): big databases would soon be medium-sized, nobody would
care that much about the performance of small databases, and we could happily
use SQL for all problems. This was true for a surprisingly long time, and
still is, if your problem fits into a medium-size database.

Unfortunately, computer history turns in cycles, and tends to forget lessons
from the past. Coding access to a bunch of different databases was at least
standardized under COBOL. Coding for half a dozen NoSQL databases now is a
complete mess.

~~~
jbergens
That is a nice scale to think about. When I look back, most systems I've been
involved with are small on today's servers. Rarely more than 50 GB, and that
actually fits into memory. How many systems actually need more than about
1000 GB of database data? Things like images and videos can be stored
separately anyway; I'm talking about other data, together with metadata about
images/videos/files.

------
Roboprog
I have been saying this since the late 90s.

[https://news.ycombinator.com/item?id=8557596](https://news.ycombinator.com/item?id=8557596)

Small code & data fit in cache, and run full speed. Fortunately, I can get at
the GB that used to be (mainly) on my hard drive faster, now.

------
graycat
Okay, if the title is correct, then to heck with traditional RAM and, instead,
have _very long addresses_, say,

    a(i).b(j).c(k) ...
stored in, say, a key-value store. Then, as usual for caching, just hash that
long address.

Why do that? Mostly, no one really wants the sequential addresses, and a lot
of work in software and in the processor goes into calculating those
sequential addresses nearly no one really wants anyway. So, e.g., for software
collection classes, just let the keys be the long addresses and f'get about
AVL trees, red-black trees, etc. And for sparse matrices, just use the row and
column indices as the addresses and f'get about all the tricky addressing for
sparse matrices. Etc.
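For example, the sparse-matrix case boils down to a hash map keyed by the (row, col) "long address" (a toy sketch of the idea, not a serious sparse-matrix library):

```python
# Store only the nonzero entries, keyed by their composite address;
# no compressed-row bookkeeping or index arithmetic needed.
sparse = {}
sparse[(3, 1_000_000)] = 2.5
sparse[(7, 42)] = -1.0

def get(matrix, i, j):
    # Any address we never wrote is an implicit zero.
    return matrix.get((i, j), 0.0)
```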

------
mmphosis
_It means that caching is often more trouble than it’s worth._

------
farresito
I've always found it very unfortunate that MemSQL is not open source. It looks
very interesting. VoltDB seems to fill a similar niche. Has anyone tried both?

~~~
lerchmo
I have tried VoltDB; it's nice for scaling low-latency data access (real-time
bidding / ad serving is one use case where it shines).

------
alexjarvis
[http://crate.io](http://crate.io) is pretty much the system described towards
the end of the article.

------
aesede
How come nobody has noticed qwantz.com's T-Rex yet!

