This is the result of what appears to be a chain of ideas, papers, and prototypes going back more than eight years (see http://boom.cs.berkeley.edu; maybe even earlier, it's hard to tell since most of the older URLs are gone). I'm amazed that people can stay funded while working on something with such a theoretical, long-term payoff, and at the same time incredibly thankful to the entities supporting this! I wish the team success in releasing Bedrock.
> We get this rich consistency in Anna with a very clean codebase, by porting design patterns of monotone lattice composition from Bloom to C++.
In the paper:
> The Anna actor and client proxy are implemented entirely in C++. The codebase—including the lattice library, all the consistency levels, the server code, and client proxy code—amounts to about 2000 lines of C++ on top of commonly-used libraries including ZeroMQ and Google Protocol Buffers.
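For intuition, here is a minimal sketch (my own toy code, not anything from the Anna codebase) of what monotone lattice composition can look like in C++: a max-integer lattice, plus a map lattice whose merge folds the inner lattice's merge over every key. Because merge is associative, commutative, and idempotent, replicas can exchange and merge state in any order and still converge.

    // Toy sketch of monotone lattice composition (not Anna's actual code).
    // A lattice wraps a value behind an associative, commutative, idempotent
    // merge(); a map of lattices is itself a lattice, merged element-wise.
    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <string>

    struct MaxIntLattice {
        int value = 0;
        void merge(const MaxIntLattice& other) {
            value = std::max(value, other.value);  // least upper bound
        }
    };

    template <typename V>
    struct MapLattice {
        std::map<std::string, V> entries;
        void merge(const MapLattice& other) {
            for (const auto& kv : other.entries)
                entries[kv.first].merge(kv.second);  // compose the inner merge
        }
    };

    int main() {
        MapLattice<MaxIntLattice> a, b;
        a.entries["x"] = {3};
        b.entries["x"] = {7};
        b.entries["y"] = {1};
        a.merge(b);  // b.merge(a) would converge to the same state
        std::cout << a.entries["x"].value << "\n";  // prints 7
    }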
The closest codebase I found is https://github.com/ucbrise/LatticeFlow.
Is the Anna source repository public?
All that to say, that particular comparison feels a bit apples-and-oranges to me.
A better argument would focus on the extra operations and data structures that Redis offers, since it is more than a simple key-value store.
I'm sorry I didn't go read the original paper, but I thought reading the article qualified me to comment. Sorry dad.
I think you also misunderstood my point about Lua. Embedded Lua in Redis is so powerful BECAUSE it is single threaded, not because it is just snazzy to have an embedded language. That, combined with the primitives Redis provides, allows you to build your own domain-specific data structures with their own custom semantics that just aren't possible in other systems without rolling your own. And you can do it simply.
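A toy example of what I mean (my own, not from the article): because EVAL runs atomically on Redis's single thread, you can compose a deduplicating queue out of a set and a list, and no other client can ever observe it half-updated.

    // Toy deduplicating queue (my example): a Lua script composed of Redis
    // primitives, executed atomically via EVAL on the single server thread.
    #include <hiredis/hiredis.h>
    #include <cstdio>

    static const char* kScript =
        "if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then return 0 end "
        "redis.call('SADD', KEYS[2], ARGV[1]) "
        "redis.call('LPUSH', KEYS[1], ARGV[1]) "
        "return 1";

    int main() {
        redisContext* c = redisConnect("127.0.0.1", 6379);
        if (!c || c->err) return 1;
        // 2 keys (the queue and its membership set), 1 argument (the item).
        redisReply* r = (redisReply*)redisCommand(
            c, "EVAL %s 2 jobs jobs:seen %s", kScript, "job-42");
        if (r) {
            std::printf("enqueued: %lld\n", r->integer);
            freeReplyObject(r);
        }
        redisFree(c);
    }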
That Anna is faster is great, but it comes with its own set of constraints; I don't think I would be wrong in saying those include not having exclusive access to the data while you are in an embedded script running on the store.
Right, single core against single core, the best scenario for Redis, and Anna is an order of magnitude faster. I'm not sure what you don't find impressive about this.
See my earlier note; it is a bit apples to oranges. Redis is not optimized for throughput on a machine, it is optimized for throughput on a single core. And that property allows lots of interesting things due to not having any parallelism.
They are different tools and they have different performance characteristics. My only point was that comparing against Redis is somewhat misguided due to those differing goals.
> it was up to 700x faster than Masstree, up to 800x Intel’s “lock-free” TBB hash table. In fairness, those systems provide linearizable consistency and Anna does not. But Anna was still up to 126x faster than a “hogwild”-style completely inconsistent C++ hashtable due to cache locality for private state, while providing quite attractive coordination-free consistency.
What the world of data needs more of is continued development of novel indexing strategies and implementations: ElasticSearch, Postgres's GIN index on JSONB, MapReduce, graph databases.
I don't need another key value store...
There really aren't that many DBMSs out there that are plain KV stores (like Redis) to handle it either. They are normally much more complicated (like adding an SQL layer on top).
Indexing requires knowledge of a higher level data model. (Again, BerkeleyDB has built-in support for secondary indexing, but last time I checked it was quite a braindead and slow implementation. It's faster to build your own indices instead, using the other facilities provided.)
With that said, while a KV store has no logical data model to apply to index generation, it can at least provide primitives for you to construct your own indices. BerkeleyDB and LMDB do this.
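For example, here is a rough sketch (mine, with error handling elided) of a hand-rolled secondary index in LMDB: the primary record and the index entry are written in the same transaction, so the index can never drift from the data.

    // Sketch of a hand-rolled secondary index in LMDB (error handling elided).
    // "users" maps id -> record; "users_by_email" maps email -> id.
    // Both puts share one transaction and commit (or abort) together.
    #include <lmdb.h>
    #include <cstring>

    static MDB_val mv(const char* s) {
        MDB_val v;
        v.mv_size = std::strlen(s);
        v.mv_data = const_cast<char*>(s);
        return v;
    }

    int main() {
        MDB_env* env;
        mdb_env_create(&env);
        mdb_env_set_maxdbs(env, 2);
        mdb_env_open(env, "./db", 0, 0664);  // the ./db directory must exist

        MDB_txn* txn;
        mdb_txn_begin(env, nullptr, 0, &txn);

        MDB_dbi users, users_by_email;
        mdb_dbi_open(txn, "users", MDB_CREATE, &users);
        mdb_dbi_open(txn, "users_by_email", MDB_CREATE, &users_by_email);

        MDB_val id = mv("user:42");
        MDB_val rec = mv("{\"email\":\"ann@example.com\"}");
        MDB_val email = mv("ann@example.com");

        mdb_put(txn, users, &id, &rec, 0);             // primary record
        mdb_put(txn, users_by_email, &email, &id, 0);  // index entry

        mdb_txn_commit(txn);  // both become visible atomically
        mdb_env_close(env);
    }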
Distribution with transaction support may require help from the storage engine (offering something resembling multi-phase commit). BerkeleyDB provides this already; LMDB will probably provide this in 1.0.
An argument could be made that the storage engine should be able to handle replication/distribution even without understanding the higher level/application data model. BerkeleyDB does this with page-level replication. IME this results in gratuitously verbose replication traffic, as every high-level operation, plus all of its dependent index updates etc., is replicated as low-level disk/page-offset operations. IMO it makes more sense to leave this to a higher layer, because then you can replicate logical operations and save a huge amount of network overhead.
As for the possible higher layers - antoncohen's response below gives a few examples. There are plenty of higher level DBMSs implemented on top of LMDB, providing replication, sharding, etc.
Along the same lines, a few newer codebases for distributed stores seem to be built with those delineations in mind. Another comment brought up TiDB/TiKV, for example; TiKV, IIRC, uses RocksDB as its local store.
Many are or can be used as key-value stores. MySQL actually has a memcached-compatible KV store, using InnoDB for storage. Postgres has hstore. A lot of the distributed databases roughly fall into the category of KV stores: HBase, Riak, Cassandra, DynamoDB, etc.
This paper literally shows that statement is false. In what other circumstances could you deliver one to two orders of magnitude of performance improvement and have people just shrug and say, "meh, I really don't think we need improved performance"?
That said, I was mostly interested in how this compares to etcd, since the use case seems pretty similar. How does Anna scale out (across differing data centers), given that doing the same with etcd is a bad idea, for example?
I'm not sure I fully understand etcd's durability guarantees. When an operation is completed, does that mean that the data is durable on a single machine, and then becomes durable on other machines at a later point? If so, it seems like Anna could offer the same durability guarantees.
I think Anna's architecture could be more of a competitor to Cassandra or DynamoDB, as long as you only need causal consistency. The performance implications do seem pretty interesting.
Redis is in-memory only
Cloud storage - not sure how we can use it outside of the cloud vendors.
RocksDB - Facebook just open-sourced the engine to the community, but where is the service that adds replication, clustering, and a network interface on top of it? I am sure they use one internally, so why is it not being open-sourced?
There is also Badger, but most of these only offer low-level operations.
Sorry, most of my developers are unable to consume them the way they can Redis.
Actually, I would prefer that kind of database (just the engine):
1. Expose it to the network via whatever framework you like: Thrift, REST, gRPC, etc. You don't have to include a different kind of library for each network service. I would love to connect to every network service (Redis, Elasticsearch, Cassandra, MySQL, ...) via a single framework (say gRPC).
2. In most large-scale scenarios, there is already some kind of log service (DistributedLog, NATS, Kafka, ...). Why not take advantage of that for replication? Isn't it great to separate the engine layer from the replication layer? Currently we are actually doing double replication: we replicate data from the master DB to the slave DB, and then replicate the same data from a DB to the cache, search, and other components. The data is already there in the log; let everyone (the slave DB as well as the cache/search modules) consume it. This is basically the state-machine-replication idiom. PNUTS, Twitter's K/V database, and LinkedIn's Espresso (as well as Ambry, their internal object store) use this approach for replication. A sketch of what I mean follows this list.
3. I would agree with that; they only support basic low-level operations.
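To make point 2 concrete, here is the shape of the consumer loop I have in mind. Log and Engine are hypothetical interfaces of mine, standing in for a real log client (Kafka, NATS, DistributedLog, ...) and a real storage engine (RocksDB, LMDB, ...).

    // Sketch of the state-machine-replication idiom from point 2.
    // Log and Engine are hypothetical interfaces; a real deployment would
    // use a Kafka/NATS/DistributedLog client and RocksDB/LMDB/etc.
    #include <cstdint>
    #include <string>

    struct Record {
        uint64_t offset;    // position in the shared log
        std::string key;
        std::string value;
    };

    struct Log {                             // hypothetical log client
        virtual bool next(Record* out) = 0;  // blocking read of next record
        virtual ~Log() = default;
    };

    struct Engine {                          // hypothetical storage engine
        virtual void put(const std::string& k, const std::string& v) = 0;
        virtual void save_offset(uint64_t off) = 0;  // restart checkpoint
        virtual ~Engine() = default;
    };

    // The same loop serves a slave DB, a cache filler, or a search indexer:
    // everyone applies the same ordered records, so there is no second
    // replication path and every consumer converges to the same state.
    void consume(Log& log, Engine& engine) {
        Record rec;
        while (log.next(&rec)) {
            engine.put(rec.key, rec.value);
            engine.save_offset(rec.offset);  // resume from here after a crash
        }
    }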
TiKV has no replication/sharding built in; that is actually handled by the Placement Driver (PD).
From the docs:
> TiKV is a component in the TiDB project, you must build and run it with TiDB and PD together.
local cache (in the form of the actor's mailbox) + background gossip
From what I can tell, the usual restrictions (and increases in latency) still apply when, after you've written something, you want to make sure it has actually been written (quorums).
Can someone explain to me why this is a step forward for the field? I haven't yet read all the papers they linked to (including the Anna papers), but this doesn't seem to be one of those times where a bunch of disparate papers combine into something truly groundbreaking.
I feel like I must be missing the point
I'm starting to think that the quorum strategy is something like a theoretical lower bound -- at least until someone brilliant figures out a way past it (or technology shifts in some gigantic way or something).
What have they really built? A purely in-memory KV store that doesn't support synchronous secondary writes for durability. So any comparisons with ACID KV stores, whether disk-based (Cassandra, Mongo) or in-memory, are not apples-to-apples comparisons from the start. What could be the production applications of such a system, other than a cache?
On their benchmarks: they don't really compare with the state of the art.
The selection of competitors in the single-server, multi-core benchmark doesn't include systems like https://github.com/fastio/pedis. Also, they still use 100-millisecond gossip granularity (within a single server!), while for all the other compared systems the corresponding metric is nearly 0 by construction, which gives Anna a huge edge.
In the multi-node benchmark, they claim 10x over Cassandra. ScyllaDB (https://www.scylladb.com/) claims the same, while being ACID and linearizable, unlike Anna. Also, Anna achieves its stronger consistency levels by holding off reads, which kills latency given the 100-millisecond gossip granularity. If that applies only to their multi-key consistency levels (Read Committed/Uncommitted), it's probably OK, because I suppose there is no magic bullet that preserves super low latencies while providing similar consistency in Scylla either. But if Anna needs to hold off reads for any of its claimed single-key consistency levels (all of which are weaker than linearizable), that's worse than Scylla. The authors of the paper didn't detail the algorithm for each consistency level.
Seems like the authors don't benchmark multi-node scalability of Anna on any consistency levels except the weakest, simple eventual consistency. It would be interesting to see if Anna scales as well on stronger consistency levels.
To me, the main outcome of this paper is another confirmation that shared-nothing, thread-per-core, message-passing designs are beneficial in the modern computing environment. This is not new, however; see H-Store, Scylla/Seastar, Tarantool (https://github.com/tarantool/tarantool), Aeron (https://github.com/real-logic/aeron), Tempesta (https://github.com/tempesta-tech/tempesta), etc.
The novelty is the framework that generalizes thread/node scalability and different consistency levels while reusing the same codebase, with just a single knob: gossip granularity. Practical applications are limited. Certain techniques will probably be cherry-picked by systems such as Redis Cluster and in-memory data grids.
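For anyone unfamiliar with the pattern, a bare-bones illustration of mine (far simpler than Seastar or Anna's actual actors): each thread exclusively owns its shard of the data, and other threads reach it only through its mailbox, so the data itself needs no locks.

    // Bare-bones shared-nothing, thread-per-core sketch (just the shape of
    // the idea, nothing like production code in Seastar or Anna).
    #include <algorithm>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <unordered_map>
    #include <vector>

    struct Msg {
        bool stop = false;
        std::string key, value;
    };

    struct Mailbox {  // the only shared, locked structure
        std::mutex m;
        std::condition_variable cv;
        std::queue<Msg> q;
        void send(Msg msg) {
            { std::lock_guard<std::mutex> lk(m); q.push(std::move(msg)); }
            cv.notify_one();
        }
        Msg receive() {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !q.empty(); });
            Msg msg = std::move(q.front());
            q.pop();
            return msg;
        }
    };

    int main() {
        const unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<Mailbox> boxes(n);
        std::vector<std::thread> workers;

        for (unsigned i = 0; i < n; ++i) {
            workers.emplace_back([&boxes, i] {
                // Exclusively owned shard: no lock ever guards this map.
                std::unordered_map<std::string, std::string> shard;
                for (;;) {
                    Msg msg = boxes[i].receive();
                    if (msg.stop) break;
                    shard[msg.key] = msg.value;
                }
            });
        }

        // Route each key to the one core that owns its hash bucket.
        std::hash<std::string> h;
        boxes[h("some-key") % n].send({false, "some-key", "some-value"});

        for (auto& b : boxes) b.send({true, "", ""});
        for (auto& t : workers) t.join();
    }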
Really? Such global consensus increases latency (a network round trip plus an fsync write), but with a fully batched and pipelined concurrent design, when CPU cycles are being saturated in those benchmarks, why would throughput be fundamentally limited by that increased latency?
At some point in a system, you may need the response to one request in order to generate the next request, and at that point latency affects throughput. For example, with a 1 ms round trip, a strictly serial chain of dependent requests tops out at 1,000 requests per second no matter how much spare capacity the server has.
Also, apparently, their "lattice composition" technique lets them push concurrency to a lower level, which allows them to avoid the overhead of doing concurrency at a higher level. What I mean is: if you have to process items one at a time in a replica, then to process more items at a time you need more replicas. But each replica has overhead; the details depend on what technology we're talking about, but it is always there. For example, to utilize 4 cores on your server you might host four replicas on that server, but now you have to maintain four memory spaces as well, which takes cycles, even though all you wanted was to make use of all your cores to process items. And it might not be four cores, but 18 or 36 or perhaps thousands. At some point, the cost of the overhead outweighs the benefit.
Why so much focus on Key Value stores? That's the easy part of the problem.
I would like to know more about the interesting problems: secondary indexes, range scans, performance on mixed workloads, robustness, operational complexity.
There aren't enough good/fast/reliable ones
On general-purpose hardware and networks, adding features like the ones you mention tends to be seriously difficult and tremendously workload-dependent. At some point you need help from hardware designers, which itself limits the use cases of your system.
FPGA-assisted databases, in-network SQL processing, server-less caches (an SSD directly connected to the network without an OS), storage systems with mind-blowingly low energy consumption, and more are among the absolutely amazing research being done today; however, the limited public availability is frustrating.
With mass public usability in mind, a general-purpose K/V system is the highest summit you can really achieve.