Of course, this is a huge optimization for data warehousing applications, where two co-partitioned tables can be joined without any network data transfer, and in some cases could even use a merge join instead of hash-based strategies. But it's the usual time/scope/quality trinity, and we'd rather not compromise the third element.
This is described in more detail here: http://www.confluent.io/blog/distributed-consensus-reloaded-...
I think there are two separate things:
1. Is there a principled notion of when a write is "committed" and a leadership election algorithm that works with this to ensure committed writes aren't lost as long as the failure criteria are met? This has been true since we added replication to Kafka.
2. The second is what is the recovery behavior when you have no available replicas. Previously we would aggressively elect any replica even if it had incomplete data. Now that behavior is configurable.
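To make point 2 concrete, here's a toy sketch of the two recovery policies (my own names and simplifications, not Kafka's actual code). If I recall correctly, this is the choice Kafka exposes via the unclean.leader.election.enable setting:

```python
# Sketch of the leader-election trade-off when replicas fail.
# Only replicas in the in-sync set (ISR) are guaranteed to have
# every committed write.

def elect_leader(live_replicas, isr, allow_unclean):
    """Pick a new leader from `live_replicas` (dict: name -> log).

    allow_unclean=True  -> aggressively elect any live replica,
                           possibly losing committed writes (old behavior)
    allow_unclean=False -> stay leaderless until an ISR member returns,
                           trading availability for no committed-data loss
    """
    live_isr = [r for r in isr if r in live_replicas]
    if live_isr:
        return live_isr[0]
    if allow_unclean:
        # Any replica will do, even one with incomplete data.
        return next(iter(live_replicas))
    return None  # unavailable until an in-sync replica comes back

# Leader and its in-sync follower both crashed; only a stale follower
# (missing committed write "w2") survives.
live = {"stale-follower": ["w1"]}
isr = ["leader", "isr-follower"]

print(elect_leader(live, isr, allow_unclean=True))   # "stale-follower"
print(elect_leader(live, isr, allow_unclean=False))  # None -> unavailable
```

The unclean election keeps the partition writable but silently drops "w2"; the clean one refuses to serve until a replica with complete data returns.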
There is a more detailed explanation here: http://blog.empathybox.com/post/62279088548/a-few-notes-on-k...
Basically, "required.acks" lets you choose between "no acks", "leader only", and "all in-sync replicas", while min.insync.replicas lets you control what "all in sync replicas" actually means.
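A rough sketch of how the two settings interact (my own simplification, not the real broker logic):

```python
# When is a produce request acknowledged, as a function of the acks
# setting, the current in-sync replica (ISR) count, and min.insync.replicas?
def write_is_acked(acks, isr_size, min_insync):
    """acks: "0" (no acks), "1" (leader only), "all" (all in-sync replicas)."""
    if acks == "0":
        return True                  # fire and forget, no durability guarantee
    if acks == "1":
        return isr_size >= 1         # the leader wrote it; followers may lag
    # acks == "all": every current ISR member must have the write, AND the
    # ISR must be at least min.insync.replicas strong, or the write is
    # rejected rather than acknowledged with a weakened guarantee.
    return isr_size >= min_insync

print(write_is_acked("all", isr_size=3, min_insync=2))  # True: safe to ack
print(write_is_acked("all", isr_size=1, min_insync=2))  # False: too few in-sync replicas
```

The point of the second knob is the last case: without min.insync.replicas, "all in-sync replicas" could shrink to just the leader, which is no stronger than "leader only".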
For those in the Bay Area, if you'd like to hear more about Kudu, I'm giving a talk about it in Palo Alto tomorrow (Wed) at the Big Data Application Meetup at 6PM: http://www.meetup.com/BigDataApps/events/227191025/
Also presenting are Julien Le Dem of Dremio on Apache Drill and James Taylor of Salesforce on Apache Phoenix. Should be a really interesting evening!
(Don't mean to hijack the thread, just thought this might be relevant to some folks. :)
- All reads go through the Raft state machine, which means that reading from one node involves talking to at least a quorum of nodes. Consul is an example of this behavior. It provides strong consistency but reduces the performance gain from horizontal scaling. (For Consul, at least, it's been shown that heavy read loads can cause leadership transitions as the leader gets overwhelmed.)
- Reads are processed by the local node only. This is fast and scalable but is basically "eventual consistency": a read immediately following a write is not guaranteed to see that write. etcd works this way by default (unless you pass ?quorum=true with your read request).
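A toy model of the two strategies above (invented names, not any real client API), showing where the stale read comes from:

```python
# One fully replicated key-value "cluster": a leader copy and a follower
# copy that receives writes asynchronously.
class Cluster:
    def __init__(self):
        self.leader = {}     # always has the latest committed writes
        self.follower = {}   # may lag until replication catches up

    def write(self, key, value):
        self.leader[key] = value   # committed through the leader/quorum
        # (async replication to the follower is modeled by replicate())

    def replicate(self):
        self.follower.update(self.leader)

    def read_linearizable(self, key):
        # Consul-style / etcd ?quorum=true: the read involves the leader,
        # so it always reflects the latest committed write.
        return self.leader.get(key)

    def read_local(self, key):
        # etcd default: answer from whatever the local node happens to have.
        return self.follower.get(key)

c = Cluster()
c.write("x", 1)
print(c.read_linearizable("x"))  # 1
print(c.read_local("x"))         # None -- replication hasn't caught up yet
c.replicate()
print(c.read_local("x"))         # 1
```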
How does Kudu behave (or, rather, how is it intended to behave)?
The short answer, though, is that reads are not done by quorum, but the client can specify whether it needs "up to date" reads. If it does, we currently force those reads to go to the leader, but we have a roadmap for reading from other replicas (see the Spanner paper for a rough idea of how that can work).
"Unlike Cassandra, Kudu implements the Raft consensus algorithm to ensure full consistency between replicas"
Cassandra does offer transactional consistency in the form of Paxos, which achieves the same goal. Additionally, Paxos- or Raft-based consistency models aren't really necessary if you're willing to accept out-of-order processing and adopt idempotent data models; quorum-based consistency is enough.
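To illustrate the idempotent-data-model point: if every write carries a timestamp and a cell only ever moves forward in time (last-write-wins), then reordered and duplicated deliveries converge to the same state. A minimal sketch (my example, not Cassandra's implementation):

```python
# Last-write-wins register: store maps key -> (value, timestamp).
def apply(store, key, value, ts):
    cur = store.get(key)
    if cur is None or ts >= cur[1]:
        store[key] = (value, ts)   # newer (or equal) timestamp wins

writes = [("k", "v1", 1), ("k", "v3", 3), ("k", "v2", 2)]

in_order, shuffled = {}, {}
for w in writes:
    apply(in_order, *w)
for w in reversed(writes):       # deliver the same writes out of order...
    apply(shuffled, *w)
apply(shuffled, "k", "v3", 3)    # ...plus a duplicate: a no-op

print(in_order == shuffled)      # True -- both converge to ("v3", 3)
```

Because applying a write twice, or late, can't move the state backwards, replicas don't need consensus on write ordering, only on quorum membership.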
I'm curious to see what your throughput was in transactions/s, not just bytes/s, to get a more apples-to-apples comparison.
I think you meant *able* to adopt those things. Those techniques are tenable, and even excellent, in some domains, but completely impermissible in others.
Analytics is what Kudu was designed for (it's not just marketing), so some tradeoffs were made. You'll get the biggest bang for the buck if your use cases are heavy on inserts and big scans with selective filters. Other use cases will perform OK or just so-so vs. other storage engines. Also note that Kudu is still pre-1.0.
Did you initially pick ES over systems like Impala because of the lack of real-time inserts/updates when used with HDFS?
BTW, here's a presentation that might help you understand Kudu: http://www.slideshare.net/jdcryans/kudu-resolving-transactio...
Secondly, ES is easy to deploy and manage. Being on the JVM, it admittedly has a considerable RAM footprint, but at least it's just one daemon per node. With anything related to Hadoop, it seems you have this cascade of JVM processes that inevitably need management. And lots and lots of RAM.
Thirdly, as you point out it's easy to do real-time writes.
I do like the fact that Kudu is C++.
You're probably happy with what you have in prod but if you get some time to try out Kudu feel free to drop by our Slack channel for a chat! http://getkudu.io/community.html