
Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage - sumitkumar
http://www.infoq.com/news/2014/06/facebook-apollo
======
dj-wonk
"Currently, Apollo is developed internally at Facebook. No firm claims were
made during the talk that it will be opensourced. It was mentioned as a
possibility after internal development settles down." from
[http://java.dzone.com/articles/facebook-announces-apollo-
qco...](http://java.dzone.com/articles/facebook-announces-apollo-qcon)

~~~
dj-wonk
HN readers, what do you think are Facebook's motivations for announcing Apollo
at this point?

~~~
necubi
I'd guess that the engineers who are building it think it's cool and want to
talk about it. Facebook seems to be generally open about their internal
systems, presumably because they don't see it as their competitive advantage
(unlike, say, Google).

~~~
alec
Google talks a fair bit about their internal systems at this level of
"descriptions but not code" - Bigtable, MapReduce, Spanner, Flume, Chubby,
and more have been influential.

~~~
timothya
In fact, they do more than just talk: they often publish papers describing how
they work. The open source community has since recreated a lot of them, which
has proven useful to a lot of people (e.g. HBase, Hadoop, Apache Crunch, etc.)

------
spiralganglion
One of their supported storage primitives is CRDT-based, according to [1]. I,
for one, am really interested to see how this works in practice. I've been
quite excited about CRDTs, but haven't seen enough examples of them in the
wild to get a sense of their drawbacks — for instance, how difficult it is to
use them to model various processes or data structures.

[1]
[https://twitter.com/adrianco/status/476843040330743809](https://twitter.com/adrianco/status/476843040330743809)

~~~
platz
Do you know of a resource for learning the basics of CRDTs that doesn't
require a PhD?

~~~
seiji
The name is intimidating, but the operations are simple.

Basically, your storage has container types ("T"). A list, a set, a
dictionary, etc. Container types can be split and added together in a
distributed fashion ("R" and "D").

The "C" in CRDT stands for "Convergent and Commutative" to imply your
distributed operations can obtain the same value when merged.

Quick example: If you have a node with a key pointing to value (set) [a, b, c]
and another node with the same key but different value [c, e, f], then when
the nodes communicate, they can do a set union for the actual result of [a, b,
c, e, f]. Keys can keep a running log of recent operations to clean up the
global result too (like: [c, e, (recently deleted f)], so on merge, if the
other list has f, it would be deleted instead of re-added).
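
A minimal Python sketch of the merge described above, assuming a "two-phase set" (one grow-only set of additions plus one grow-only set of deletion markers). The class and method names are hypothetical, and unlike the "recent operations" log mentioned above, this version keeps its deletion markers forever, so a deleted item can never be re-added:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPhaseSet:
    """Hypothetical 2P-Set: both fields only ever grow."""
    added: frozenset = frozenset()
    removed: frozenset = frozenset()  # deletion markers ("tombstones")

    def add(self, item):
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item):
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other):
        # Union both fields; merge order and repeated merges do not matter.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        # Visible contents: everything added that was never removed.
        return self.added - self.removed


node_a = TwoPhaseSet(frozenset("abc"))
node_b = TwoPhaseSet(frozenset("cef"))
plain = node_a.merge(node_b).value()      # plain set union of both nodes

node_b2 = node_b.remove("f")              # node B recently deleted f
node_a2 = node_a.add("f")                 # node A still has f
merged = node_a2.merge(node_b2).value()   # f is deleted, not re-added
```

Both nodes end up with the same value regardless of which side initiates the merge, which is the convergence property the name refers to.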

Before CRDTs were a thing, Bob made statebox, and it's very easy to
understand. Give the README a read to pick up more of the basics:
[https://github.com/mochi/statebox](https://github.com/mochi/statebox)

~~~
platz
That's helpful, thanks (I've downloaded some crdt videos to watch in the
meantime).

On the surface they sound like something vaguely resembling an abelian group
(+/- inverses), but the conflict resolution stuff is the heart of it, I'd
guess.

~~~
spiralganglion
Yes, from my (limited but growing) understanding of it, they are indeed
similar to abelian groups.

~~~
noelwelsh
CRDTs are, in the basic case, an idempotent commutative monoid, aka an
idempotent abelian monoid.
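
In plainer terms, that means a merge operation that is associative, commutative, and idempotent, with an "empty" value that changes nothing when merged. A small Python sketch, assuming a grow-only counter stored as a dict of per-node counts (the function names and node ids are made up for illustration):

```python
def merge(a: dict, b: dict) -> dict:
    """Merge two grow-only counters by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in a.keys() | b.keys()}


def value(counter: dict) -> int:
    """The counter's visible value is the sum of all per-node counts."""
    return sum(counter.values())


x = {"n1": 2}
y = {"n1": 1, "n2": 3}
z = {"n3": 5}
empty = {}  # the identity element

commutative = merge(x, y) == merge(y, x)
associative = merge(merge(x, y), z) == merge(x, merge(y, z))
idempotent = merge(x, x) == x
has_identity = merge(x, empty) == x
total = value(merge(merge(x, y), z))
```

Because of those four properties, replicas can exchange state in any order, any number of times, and still agree on the result.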

If this floats your boat, here's me on CRDTs:
[https://skillsmatter.com/skillscasts/5301-convergent-
replica...](https://skillsmatter.com/skillscasts/5301-convergent-replicated-
data-types)

~~~
ryanobjc
I don't think you understand the word basic...

CRDTs won't have mainstream success until people stop using the words 'monoid'
and 'abelian' etc.

Most programmers aren't required to learn this kind of math in a CS degree,
AND furthermore, many programmers don't have a CS degree or have forgotten it.

So the question is, are CRDTs a useful technique for all developers, or just a
way for a minute few to demonstrate their ability to sling around math words?

~~~
noelwelsh
Did you read the thread? The parents were using terms from abstract algebra,
so I replied using the same language.

In another context I would have avoided those terms and perhaps used an
explanation like this: [http://noelwelsh.com/programming/2013/12/20/crdts-for-
fun-an...](http://noelwelsh.com/programming/2013/12/20/crdts-for-fun-and-
eventual-profit/)

------
eslaught
> Apollo, Facebook’s Paxos-like NoSQL database ...

> supports anything from a minimum of three servers to thousands

Sorry, you don't run Paxos on thousands of servers. Typical Paxos cluster
sizes are 5-7. The algorithm would never converge if you did run it on
thousands of servers.

~~~
teraflop
Well, I wouldn't judge the software on the basis of the article. The words
"Paxos-like database" are enough of a tip-off that it's not exactly going for
rigorous technical accuracy.

------
tluyben2
"is on-line low latency storage - in particular Flash and in-memory." "As
distinct from a document oriented, or key value store, Apollo is about
modifications to data structures, allowing you to represent maps, queues,
trees and so on, as well as key values."

Sounds like Redis?

~~~
justincormack
With flash support though, presumably for data sets larger than what fits in
memory, rather than just as a persistent store, based on the fact that it uses
LevelDB.

