
RAMCloud Project - monort
https://ramcloud.atlassian.net/wiki/spaces/RAM/overview?mode=global
======
erwan
A cool tiny bit of trivia about RAMCloud: it's from this project that the
Raft consensus algorithm emerged!
([https://raft.github.io/raft.pdf](https://raft.github.io/raft.pdf))

Right now, I believe the algorithm is used in RAMCloud via LogCabin
([https://github.com/logcabin/logcabin](https://github.com/logcabin/logcabin)).

Raft is more practical (as in "well specified") than Paxos and closest to its
lesser-known cousin, VR ( _Viewstamped Replication_ ). Beyond the academic
genealogy of the project, what is interesting here is that usability was a
first-class concern. Clearly the fact that it was born out of a real, living
project was a driving factor.

It wasn't just an academic being creative. To pick a notorious example, have a
quick look at Leslie Lamport's paper on Paxos: you are never quite sure
whether what you are reading is a distributed systems paper or a vintage
edition of the Holy Bible.

So Raft had great timing too. It came after decades of clumsy (because novel!)
systems research on consensus algorithms, and from a laboratory of
practitioners, so its designers knew exactly how previous attempts fell short
of their own needs. And it turns out those needs overlapped with a lot of
other people's. There is wisdom to be learnt from this.

Also, on another note, I find it funny that this narrative reads exactly like
a startup story!

~~~
antics
Raft is certainly easier to _explain_ , but it's hard to argue that Raft is
more _practical_.

For one thing, implementing Raft in a performant way is incredibly difficult.
Where paxos has very fine-grained units of consensus ( _e.g._ , ballots), in
Raft everything happens through leases and is totally linear. This is one of
the major reasons a bunch of (all?) major Raft implementations do unprincipled
things like non-quorum reads. This sort of problem also makes it very
difficult to get Raft to do _anything_ on an unreliable network. Raft is also
super noisy, so you have to do things like piggyback heartbeats on other RPC
calls.
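
The heartbeat-piggybacking trick mentioned above can be sketched roughly like
this (a toy Python illustration, not any real Raft implementation; the class,
method names, and interval are all made up):

```python
import time

class Leader:
    """Toy sketch: suppress standalone heartbeats for any peer that a
    real replication RPC has already reached recently."""

    HEARTBEAT_INTERVAL = 0.05  # seconds; hypothetical tuning

    def __init__(self, peers):
        self.peers = peers
        self.last_rpc = {p: 0.0 for p in peers}  # time of last RPC per peer

    def append_entries(self, peer, entries):
        # A real replication RPC doubles as a heartbeat for this peer.
        self._send(peer, entries)
        self.last_rpc[peer] = time.monotonic()

    def tick(self):
        # Only peers that have been quiet need an explicit empty heartbeat.
        now = time.monotonic()
        for peer in self.peers:
            if now - self.last_rpc[peer] >= self.HEARTBEAT_INTERVAL:
                self.append_entries(peer, entries=[])

    def _send(self, peer, entries):
        pass  # network layer elided
```

The point is just that the heartbeat timer and the replication path get
entangled, which is exactly the kind of complexity that leaks out of the core
algorithm into the implementation.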

One way of thinking about this is: in paxos much of the complexity is in the
specification itself. Performant Raft pushes this complexity into external
systems. Both of these are extremely hard problems. There is no free lunch.

If the problem is really just verifying the model, then I think you are better
off writing a simple modeling language, writing paxos in it, and then
compiling that to C++ or something. This was the approach Google took in
Paxos Made Live [1], for example.

[1]:
[http://www.read.seas.harvard.edu/~kohler/class/08w-dsi/chandra07paxos.pdf](http://www.read.seas.harvard.edu/~kohler/class/08w-dsi/chandra07paxos.pdf)

~~~
makmanalp
To expand on this point, check out this talk by the CockroachDB VP of
engineering[0]: they used Raft but found that it's approximately as
complicated as paxos. (They also do non-quorum reads, all through the leader
replica.)

[0] [https://atscaleconference.com/videos/scale-2018-run-your-database-like-a-cdn/](https://atscaleconference.com/videos/scale-2018-run-your-database-like-a-cdn/)

------
dugluak
RAMCloud sounds cool and super fast, but why is the documentation so slow? Why
does it need to be a site with heavy JavaScript making 482 AJAX requests
instead of plain old HTML?

~~~
rad_gruchalski
Because it's Atlassian Confluence.

------
godelmachine
Excellent review by Adrian Colyer -

[https://blog.acolyer.org/2016/01/18/ramcloud/](https://blog.acolyer.org/2016/01/18/ramcloud/)

------
jmiserez
Is this still maintained? It's a really cool project, it's just so insanely
fast.

I once implemented a MariaDB storage engine that used RAMCloud for storage,
and it was an eye-opening experience. With blazingly fast reads and writes,
latency was the number one performance issue for us.

~~~
ddorian43
Did you use it in production?

~~~
jmiserez
No, it was part of a semester project at university.

~~~
ddorian43
Yugabyte is doing something similar with PostgreSQL (using FDW now, and the
storage API on pg12).

------
michaelmior
Calling RAMCloud "new" seems a bit disingenuous since the first paper came out
around 7 years ago.

[https://dl.acm.org/citation.cfm?id=2043560](https://dl.acm.org/citation.cfm?id=2043560)

~~~
manigandham
It's not new. These wiki/project pages have years of history and haven't been
updated.

~~~
michaelmior
True. 2009 should probably be added to the title.

~~~
acdha
The pages are still being actively edited, which would suggest it's still
ongoing.

------
byte1918
> Durability: RAMCloud replicates all data on nonvolatile secondary storage
> such as disk or flash, so no data is lost if servers crash or the power
> fails.

How does this work if someone is doing multiple sequential writes? Doesn't
backing up to disk take a lot longer than writing to _RAM_ , meaning some
writes could get lost?

~~~
vbezhenar
You can distribute writes over multiple disks.

~~~
byte1918
Out of pure curiosity, wouldn't the number of disks required to guarantee
100% write backup at 100% peak times be pretty big? A quick search says HDD is
~200 times slower than RAM (on average, consumer grade).

~~~
ddorian43
You should look at sequential write speed. It's probably writing a log +
checkpointing.

------
kijin
How does it replicate all the data on disk or flash while maintaining the
write latency of DRAM?

I'm thinking there must be a delay during which loss of power will result in
loss of data that was acknowledged as written. Is this something that RAMCloud
overcomes in a novel way, or is the answer simply that the data is replicated
across many nodes?

How about the problem of raw bandwidth? If you keep sending writes to a node
that exceeds the bandwidth of the persistent storage medium but is well within
the capabilities of DRAM, the storage medium will never be able to catch up
even if we allow a generous delay. Maybe this is a non-issue right now because
you can't send more than a few dozen Gbps over the network anyway, and we just
need to hope that flash performance will improve faster than 100GbE goes
mainstream.
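
A rough back-of-envelope, using assumed numbers (not RAMCloud-specific
measurements), suggests the disk-bandwidth side is manageable once replicas
are spread across many backup disks:

```python
# All figures below are illustrative assumptions, not measurements.
NETWORK_INGRESS_GBPS = 10        # 10 GbE into one node
DISK_SEQ_WRITE_MBPS = 150        # one consumer HDD, sequential writes
REPLICATION_FACTOR = 3           # assumed copies of each write

ingress_mbs = NETWORK_INGRESS_GBPS * 1000 / 8        # ~1250 MB/s of writes
total_backup_mbs = ingress_mbs * REPLICATION_FACTOR  # every byte lands 3x

disks_needed = total_backup_mbs / DISK_SEQ_WRITE_MBPS
print(f"~{disks_needed:.0f} disks' worth of sequential bandwidth")
# prints "~25 disks' worth of sequential bandwidth"
```

So a cluster with a few dozen backup disks can keep up with a full 10 GbE
pipe of writes; the harder question, as you say, is the window between the
acknowledgement and the moment the data actually reaches stable storage.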

~~~
krageon
Given the odds of these people inventing a huge solution to a hard problem in
a publishing vacuum (I am not aware of papers coming out that fundamentally
solve this, for example), I think it is pretty safe to assume it's subject to
the same limitations all such systems are subject to. Loss of power leads to
loss of data, commonly mitigated as you already state.

~~~
ernestoo
The Raft Consensus Algorithm
([https://raft.github.io/](https://raft.github.io/)) was born out of this
project.

------
newaccoutnas
Does this mean Atlassian products will be quick(er)? Perhaps they should
concentrate there if not.

~~~
tlunter
> The RAMCloud project is based in the Department of Computer Science at
> Stanford University.

It's not by Atlassian...

~~~
newaccoutnas
Sure, but the URL is for Atlassian.net

~~~
manigandham
They just host the site; it's their wiki product, Confluence. The same way
GitHub hosts pages at github.io but has nothing to do with the content people
host there.

~~~
newaccoutnas
Oh, I know, I administer Confluence, Jira and Bitbucket. Apart from Bitbucket,
the other two hosted versions are woefully slow. Don't get me started on the
Jira Calendar.

If they're blogging about some tech, maybe they need to dogfood it first,
especially when the selling point is performance.

~~~
oarsinsync
As per a previous comment, the contents of the pages linked have as much to do
with Atlassian as the contents of a page on github.io have to do with GitHub.

This is a stanford.edu project wiki page, hosted by Atlassian. Nothing more.

By the same logic, any performance improvement that GitHub might see from
random projects hosted on GitHub should be "dogfooded" in the same way.

It's not dogfooding when it's someone else's work.

------
AndrewDucker
What would be the difference between this and an SSD-based key store, or an
instance of memcached with as much RAM as necessary to hold the whole
dataset?

~~~
Rafuino
Here's a slide from Dormando (the memcached maintainer), who spoke recently at
QCon SF. He created extstore, which lets you offload values to SSD and keep
keys in memory. Testing with all hits against different types of SSDs, he's
seeing near all-DRAM performance in terms of throughput and latency.

[https://twitter.com/justincormack/status/1059540644245295105](https://twitter.com/justincormack/status/1059540644245295105)

The NAND SSD (dark blue line) is still hitting good performance but at ~250K
queries per second hits P99 latencies above 1ms (a typical SLA Netflix calls
out for their own caching using extstore). The Optane SSD (light blue line)
stays well below 1ms P99 latencies up to ~500K queries per second.

DRAM alone can handle more throughput, but at that point you have to take
into account what your network can handle.

~~~
AndrewDucker
Thank you, that's fascinating.

------
justinholmes
I worked on this a couple of years ago to make it easier to create deb
packages.

[https://github.com/ticketscale/ramcloud-deb-packaging](https://github.com/ticketscale/ramcloud-deb-packaging)

------
manigandham
I would recommend Apache Ignite if you want a similar production-ready system
today, with datasets that automatically extend to disk, built-in messaging
and distributed data structures, and read/write-through cache options.

------
macca321
Would be interesting to know how this compares/contrasts with Apache Ignite.

------
chrisweekly
Mods please update title to reflect year (2009). Thanks!

~~~
erwan
It's still under active development afaik.

Edit: Double-checked, and it is, at least somewhat. Including "2009" in the
title would be misleading.

------
continuations
Does this remain a research project or can it be used in production?

------
beamatronic
You could accomplish at least 99% of this by using a properly sized Couchbase
cluster

------
zygotic12
Regardless - RAM disks have been sexy since the '80s. Love it. Let's go
extreme!

Domain.com: Congratulations! Your domain is available. l1cache.net $12.99

~~~
zygotic12
Sorry - British. Sarcasm is our art.

