
Anna: A Fast, Scalable, Flexibly Consistent Key-Value Store - mpweiher
https://databeta.wordpress.com/2018/03/09/anna-a-crazy-fast-super-scalable-flexibly-consistent-kvs/
======
ricardobeat
The link to bloom is broken: [http://bloom-lang.net](http://bloom-lang.net)

This is a result of what appears to be a chain of ideas, papers and prototypes
started more than eight years ago (see
[http://boom.cs.berkeley.edu](http://boom.cs.berkeley.edu) \- maybe even
earlier, hard to tell since most older URLs are gone). I'm amazed that people
are able to remain funded working on something with a very theoretical and
long-term payoff, and incredibly thankful at the same time for the entities
supporting this! Wish success to the team on releasing Bedrock.

~~~
dnautics
I wonder what the performance would be like if instead of running on ruby they
ran it on a scalable virtual machine with near-first-class actors (not truly
first class, but the engine is optimized to handle them, and the constructs
are extremely important in the standard library) like BEAM.

~~~
dwenzek
While Bloom was indeed coded in ruby, they use C++ for Anna:

> We get this rich consistency in Anna with a very clean codebase, by porting
> design patterns of monotone lattice composition from Bloom to C++.

In the paper:

> The Anna actor and client proxy are implemented entirely in C++. The
> codebase—including the lattice library, all the consistency levels, the
> server code, and client proxy code— amounts to about 2000 lines of C++ on
> top of commonly-used libraries including ZeroMQ and Google Protocol Buffers.

~~~
dwenzek
Curious to see such effective code, I looked for the source code, but found
nothing.

The closest code base I found is
[https://github.com/ucbrise/LatticeFlow](https://github.com/ucbrise/LatticeFlow).

Is the Anna source repository public?

------
nicpottier
Being faster (in throughput) than Redis doesn't seem that difficult, Redis is
mostly single core. But with that restriction comes the magical constraint
that you can run Lua on that single core without having to think about race
conditions or concurrency. For many complicated systems this is an incredible
property and can provide incredible power. Yes, you may eventually outgrow
Redis, but if you do then you are entering pretty crazy territory where you
will likely need something custom anyways.

All that to say, that particular comparison feels a bit Apples and Oranges to
me.

~~~
ricardobeat
The comparison with Redis is on page 10, and (obviously) takes into account
the number of threads. It is a bit presumptuous to post a comment like this
without minimal effort, please don’t.

Regarding embedded processing, it’s very easy to embed whatever single-
threaded language you’d like into C: Lua, Javascript or something new like
Gravity. This is orthogonal to the storage / network architecture.

A better argument would be on the extra operations and data structures that
Redis offers, not being a simple key-value store.

~~~
nicpottier
The comparison to Redis is in the 4th paragraph of the linked article. "The
paper includes numbers showing it beating Redis by over 10x on a single AWS
instance"

I'm sorry I didn't go read the original paper, but I thought reading the
article qualified me to comment. Sorry dad.

I think you also misunderstood my point about Lua. Embedded Lua in Redis is so
powerful BECAUSE it is single threaded, not because it is just snazzy to have
an embedded language. That with the primitives Redis provides allows you to
build your own domain specific data structures with their own custom semantics
that just aren't possible in other systems without rolling your own. And you
can do it simply.

That Anna is faster is great, but it comes with its own set of constraints, I
don't think I would be wrong in saying that includes you not having exclusive
access to the data while you are in an embedded script running on the store.

~~~
naasking
> "The paper includes numbers showing it beating Redis by over 10x on a single
> AWS instance"

Right, single core against single core, the best scenario for Redis, and Anna's
an order of magnitude faster. I'm not sure what you don't find impressive
about this.

~~~
nicpottier
Ahem. Single AWS instance does not equal single core.

See my earlier note, it is a bit apples to oranges. Redis is not optimized for
throughput on a machine, it is optimized for throughput on a single core. And
that property allows lots of interesting things due to not having any
parallelism.

They are different tools and they have different performance characteristics.
My only point was that comparing against Redis is somewhat misguided due to
those differing goals.

------
AboutTheWhisles
This link claims to be 700x faster than something called Masstree, which
itself claims to do about 3 million requests per second with 16 cores. I'm not
sure I buy 131 million requests per second per core.

> it was up to 700x faster than Masstree, up to 800x Intel’s “lock-free” TBB
> hash table. In fairness, those systems provide linearizable consistency and
> Anna does not. But Anna was still up to 126x faster than a “hogwild”-style
> completely inconsistent C++ hashtable due to cache locality for private
> state, while providing quite attractive coordination-free consistency.

------
ralusek
I feel like between Redis, S3, Cloud Storage, RocksDB, Cassandra, etc...this
area strikes me as one that has been solved as well as we could reasonably
expect it to be.

What the world of data needs more of is continued development into novel
indexing strategies/implementations. ElasticSearch, Postgres's GIN index on
JSONB, MapReduce, graph databases.

I don't need another key value store...

~~~
xstartup
Cassandra has JVM GC issues

Redis is in-memory only

Cloud Storage - Not sure, how we can use it outside of cloud vendors

RocksDB - Facebook just outsourced the engine to the community, where is the
service which adds replication, clustering and network interface on top of it?
I am sure, they use one internally, why is it not being open-sourced?

There is also badger but most of these only offer low-level operation.

Sorry, most of my developers are unable to consume them just like Redis.

~~~
kodablah
Agree w/ all these points (forgot "S3 has provider lock-in" though others
replicate the API). I do use Cassandra for most use cases and I don't hit GC
things, but I understand the concerns that come with the JVM (no, haven't
subbed Scylla in yet). One that I haven't tried but want to hear the downsides
of is [https://github.com/pingcap/tikv](https://github.com/pingcap/tikv) (not
the DB built on top, but just that one for KV). A nice published list of all
cons of all database systems would be ideal.

~~~
SanFranManDan
From what I could tell, TiKV is virtually useless without the rest of TiDB
(the MySQL layer that sits on top, written in Go).

TiKV has no replication / sharding built in, that is actually handled by a
Placement Driver (PD)

From the docs:

> TiKV is a component in the TiDB project, you must build and
> run it with TiDB and PD together.

------
hardwaresofton
So reading through this announcement, it seems the key to ANNA's speed is:

local cache (in the form of the actor's mailbox) + background gossip

The usual restrictions (and increases in latency) still apply when you want to
make sure that something's actually written (quoruming) after you've written
it, from what I can tell.

Can someone explain to me why this is a step forward for the field -- I
haven't yet read all the papers they linked to (including their ANNA papers),
but this doesn't seem to be one of those times where a bunch of disparate
papers are combined into creating something truly groundbreaking?

I feel like I must be missing the point

~~~
ricardobeat
I don't think that's correct. The real key to its performance seems to be the
usage of _distributed lattices_ as data structures, and the ability to perform
compile-time checks that guarantee the data will be eventually consistent,
both of which allow code to be completely lock-free. This comes from the CALM
paper cited in the article - lots of reading to do!
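
To make the lattice idea concrete, here is a minimal Python sketch. It is not Anna's actual C++ code, and the class names `MaxLattice` and `MapLattice` are made up for illustration. The only property it relies on is that `merge` is associative, commutative and idempotent, so replicas can apply updates in any order, any number of times, and still converge without locks.

```python
from dataclasses import dataclass, field


@dataclass
class MaxLattice:
    """Integer lattice whose merge is max()."""
    value: int = 0

    def merge(self, other: "MaxLattice") -> "MaxLattice":
        return MaxLattice(max(self.value, other.value))


@dataclass
class MapLattice:
    """Key -> lattice map; merge is a key-wise merge of the values."""
    entries: dict = field(default_factory=dict)

    def merge(self, other: "MapLattice") -> "MapLattice":
        merged = dict(self.entries)
        for key, val in other.entries.items():
            merged[key] = merged[key].merge(val) if key in merged else val
        return MapLattice(merged)


# Two replicas that saw different updates.
a = MapLattice({"x": MaxLattice(3), "y": MaxLattice(1)})
b = MapLattice({"x": MaxLattice(5), "z": MaxLattice(2)})

ab = a.merge(b)   # a receives b's state
ba = b.merge(a)   # b receives a's state

print(ab == ba)            # both replicas end up in the same state
print(ab.merge(b) == ab)   # re-delivering a message changes nothing
```

Composing small lattices into bigger ones (a map of max-values here) is the pattern the article describes porting from Bloom; the compile-time checks mentioned above are not modeled in this sketch.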

~~~
doorbumper
I thought so before too, but after reading the paper again, I came to the same
conclusion as the parent. The usage of distributed lattices is key, but it
only works because it allows them to reduce messaging cost and gossip at
background intervals. As far as I can tell, this means that you can receive a
successfully written response, have that machine die, and all data within the
last multicast period is lost. Therefore, it isn't suitable as a datastore,
and the benchmarks are mostly worthless with the exception of Redis.
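
The failure window described here can be sketched with a toy model. This is only an illustration of the claim, assuming a replica acknowledges a write before its next gossip round; it has not been checked against Anna's implementation. The `Replica` class is hypothetical, and it overwrites values instead of doing a real lattice merge.

```python
class Replica:
    """Toy replica: acks writes locally, ships them to peers only on gossip."""

    def __init__(self):
        self.store = {}     # state visible on this replica
        self.pending = {}   # acknowledged but not yet gossiped

    def put(self, key, value):
        self.store[key] = value
        self.pending[key] = value
        return "OK"         # acked before any peer has seen the write

    def gossip_to(self, peer):
        # Periodic background step (the multicast period in the comment).
        peer.store.update(self.pending)
        self.pending.clear()


a, b = Replica(), Replica()

a.put("k1", "v1")
a.gossip_to(b)            # k1 reaches the peer

ack = a.put("k2", "v2")   # client sees a successful write
# Replica a dies here, before the next gossip round.
survivors = b.store

print(ack, survivors)     # k2 was acknowledged but only k1 survives
```

Whether this matters depends on the workload: for a cache it is usually acceptable, for a system of record it is not.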

~~~
hardwaresofton
Super duper late, but I still haven't had time to read any of the papers
(there are like 4 if I really want to get anywhere close to understanding
their spin on gossip + the lattice thing) -- glad the discussion is still
interesting though.

I'm starting to think that the quorum strategy is something like a theoretical
lower bound -- at least until someone brilliant figures out a way past it (or
technology shifts in some gigantic way or something).

------
dman
Has there been a public release of the source for this?

------
bootcat
Looks interesting, especially about coordination free consistency and lock
free structures !!

------
nickreese
Looks exciting but very light on details.

~~~
gjem97
There's a lot more detail in the linked paper.

[http://db.cs.berkeley.edu/jmh/papers/anna_ieee18.pdf](http://db.cs.berkeley.edu/jmh/papers/anna_ieee18.pdf)

------
leventov
Claims in the blog post (orders of magnitude faster than the current state of
the art systems, universal linear scalability from threads to many nodes,
dismissing Dean's rule of redesign after x10 scale) seem overblown to
me.

What have they really built: a purely in-memory KV store that doesn't support
synchronous secondary writes for durability. So, any comparisons with ACID KV
stores, either disk based (Cassandra, Mongo) or in-memory, are not
apples-to-apples comparisons from the beginning. What could be production
applications of such a system, other than a cache?

On their benchmarks: they don't really compare with state of the art.

Selection of competitors in the single-server, multi-core benchmark doesn't
include systems like
[https://github.com/fastio/pedis](https://github.com/fastio/pedis). Also, they
still use 100 millisecond granularity of gossip (within a single server!),
while for all other compared systems the corresponding metric could be evaluated
as nearly 0 by construction, which gives Anna a huge edge.

In multi-node benchmark, they claim 10x over Cassandra. ScyllaDB
([https://www.scylladb.com/](https://www.scylladb.com/)) claims the same,
while being ACID and linearizable, unlike Anna. Also, Anna achieves stronger
consistency levels by holding off reads, which kills latency, given 100
millisecond gossip granularity. If it applies only to their multi-key
consistency (Read Committed/Uncommitted) it's probably OK, because I suppose
that there is no magic bullet that preserves super low latencies while
providing similar consistency in Scylla either. But if Anna needs to hold off
reads for any of their claimed single-key consistency levels (all of which are
weaker than linearizable), that's worse than Scylla. The authors of the paper
didn't detail the algorithm for each consistency level.

Seems like the authors don't benchmark multi-node scalability of Anna on any
consistency levels except the weakest, simple eventual consistency. It would
be interesting to see if Anna scales as well on stronger consistency levels.

To me, the main outcome of this paper is another confirmation that shared-
nothing, thread-per-core, message passing designs are beneficial in the modern
computing environment. This is not new, however, see H-Store, Scylla/Seastar,
Tarantool
([https://github.com/tarantool/tarantool](https://github.com/tarantool/tarantool)),
Aeron ([https://github.com/real-logic/aeron](https://github.com/real-logic/aeron)),
Tempesta ([https://github.com/tempesta-tech/tempesta](https://github.com/tempesta-tech/tempesta)), etc.

Novelty is the framework that generalizes thread/node scalability, different
consistency levels reusing the same codebase, and having just a single knob -
gossip granularity. Practical applications are limited. Certain techniques are
probably going to be cherry-picked by systems such as Redis Cluster and In-
Memory Data Grids.

~~~
evanweaver
Can you point me to where Scylla claims to be ACID or linearizable? As far as
I know there is no Paxos implementation yet, not that Cassandra's LWT
implementation is anything to write home about.

~~~
leventov
Anna paper itself says that Cassandra and Scylla are Linearizable per-key.
Yes, obviously Scylla is not ACID, sorry for my loose usage of this term. I
was referring to durability, i. e. "D" from ACID.

------
polskibus
Can Anna's approach improve current solutions to the problem of managing
secondary indexes in a partitioned KV store while preserving consistency?

~~~
zzzcpan
I don't think this problem exists outside of the realms of linearizable or
serializable consistency, which this system doesn't provide.

------
dis-sys
> Totally ordered request processing requires waiting for global consensus at
> each step, and thus fundamentally limits the throughput of each replica-set.

Really? Such global consensus increases latency (network round trip plus fsync
write), but with a fully batched and pipelined concurrent design, when CPU
cycles are being saturated in those benchmarks, why is throughput
fundamentally limited by that increased latency?

~~~
zzzcpan
Let's say we have a primitive system with total ordering and a single static
coordinator per replica-set. And we want to update a global counter from
every node. Now since the counter is global it lives on a single replica-set and
every node has to communicate with this one coordinator. Be it 4 nodes or
4000, still one coordinator. If we drop total ordering and use something like
gossip protocol to communicate between nodes, we can merge this counter
everywhere in the system as it propagates, eliminating all unnecessary
communications and distributing communications in the network. So, yeah,
global consensus fundamentally limits both throughput and latency.
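
The counter example can be sketched as a grow-only counter (a G-Counter CRDT). This is a minimal illustration assuming one slot per node and an element-wise max merge; it is not taken from Anna's code, and the `GCounter` class and node names are hypothetical.

```python
class GCounter:
    """Grow-only counter: one slot per node, merge is element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n=1):
        # Purely local: no coordinator, no network round trip.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Called whenever gossip from another node arrives.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


nodes = [GCounter(f"n{i}") for i in range(4)]
for i, node in enumerate(nodes):
    node.increment(i + 1)   # n0 adds 1, n1 adds 2, n2 adds 3, n3 adds 4

# Stand-in for background gossip: every node merges every other node's state.
for receiver in nodes:
    for sender in nodes:
        receiver.merge(sender)

print([node.value() for node in nodes])
```

Each increment touches only the local slot, so adding nodes adds capacity instead of adding load on a single coordinator. The cost is that a read between gossip rounds may return a stale (smaller) total.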

------
nogenerics123
If anyone is interested in looking at the benchmarks or implementation:
[https://github.com/cw75/tiered-storage](https://github.com/cw75/tiered-storage)
is from the main developer and it looks similar to Anna.

------
muxator
Rant:

Why so much focus on Key Value stores? That's the easy part of the problem.

I would like to know more about the interesting ones: secondary indexes, range
scans, performance on mixed workloads, robustness, operational complexity.

~~~
snissn
> Why so much focus on Key Value stores? That's the easy part of the problem.

There aren't enough good/fast/reliable ones

~~~
kamranjon
Does Redis not count?

~~~
doorbumper
Redis is an impressive piece of engineering, but it performs best as an in-
memory kv-store on a single core. Its distributed capabilities target a
different problem than other distributed kv-stores attempt to solve. Redis
Cluster focuses on reasonable functionality for an in-memory store. However,
Redis Cluster is neither highly available, nor consistent. There are multiple
modes that can cause catastrophic data loss, so Redis Cluster works best in
situations where losing data isn't a big deal. For its intended use case,
nothing else comes close to offering the same functionality, performance,
reliability, and ease of use.

------
petre
There is already a database product called Bedrock, using SQLite.

[http://bedrockdb.com/](http://bedrockdb.com/)

------
polskibus
Can anyone with a good understanding of latest db research share their
thoughts on the paper?

------
herogreen
That's not a very lucky name IMHO, given that there already exists a data
storage related software called HANA
([https://www.sap.com/products/hana.html](https://www.sap.com/products/hana.html))

------
julienmarie
no link ?

~~~
thwd
[http://db.cs.berkeley.edu/jmh/papers/anna_ieee18.pdf](http://db.cs.berkeley.edu/jmh/papers/anna_ieee18.pdf)

------
Capaverde
Why this fad with naming apps as persons? Found this:
[http://fortune.com/2014/12/22/startup-names-human/](http://fortune.com/2014/12/22/startup-names-human/)

~~~
doorbumper
Anna is a hummingbird. Known for its fastest relative speed.

~~~
Capaverde
Why this fad with naming birds as persons?

