
Kudu as a More Flexible and Available Kafka-Style Queue - frew
http://blog.rodeo.io/2016/01/24/kudu-as-a-more-flexible-kafka.html?v2
======
tlipcon
Todd from the Apache Kudu (incubating) team here. I'll check this thread
throughout the day in case there are any questions (and try to check the
original post for comments as well).

~~~
nfa_backward
Does Kudu colocate data from different tables with equal keys? If not, is this
or a similar feature on the road map?

~~~
tlipcon
It doesn't yet. It's on our nebulous "we'd like to do this some time" roadmap,
but we're currently concentrating on some more basic stuff around stability
and time series features.

Of course this is a huge optimization for data warehousing applications, where
two co-partitioned tables can be joined without any network data transfer, and
in some cases could even use a merge join instead of hash-based strategies.
But it's the usual time/scope/quality trinity, and we'd rather not compromise
the third element.
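To make the co-partitioning point concrete, here's a toy sketch (plain Python, not Kudu code; the table names and partitioning scheme are invented for illustration). Both tables route rows with the same hash function, so rows with equal keys land on the same node and each partition can be joined locally:

```python
NUM_PARTITIONS = 4

def partition(key):
    # The same partitioning function for both tables is what makes
    # the join purely local -- no network shuffle required.
    return hash(key) % NUM_PARTITIONS

orders = [(1, "order-a"), (2, "order-b"), (5, "order-c")]
customers = [(1, "alice"), (2, "bob"), (5, "carol")]

shards_orders = {p: [] for p in range(NUM_PARTITIONS)}
shards_customers = {p: [] for p in range(NUM_PARTITIONS)}
for k, v in orders:
    shards_orders[partition(k)].append((k, v))
for k, v in customers:
    shards_customers[partition(k)].append((k, v))

def merge_join(left, right):
    # If both sides are kept sorted by key, a merge join avoids
    # building a hash table at all (keys assumed unique here).
    left, right = sorted(left), sorted(right)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i][0] == right[j][0]:
            out.append((left[i][0], left[i][1], right[j][1]))
            i += 1
            j += 1
        elif left[i][0] < right[j][0]:
            i += 1
        else:
            j += 1
    return out

joined = []
for p in range(NUM_PARTITIONS):
    joined.extend(merge_join(shards_orders[p], shards_customers[p]))
```

The merge-join variant is plausible here because Kudu stores rows sorted by primary key within a tablet.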

~~~
nfa_backward
Glad to hear this is at least being considered. The optimizations for data
warehousing you mentioned are my use case. I understand that it is a very
active project with a lot on the road map. It's a very cool project and I
follow you guys on
[http://gerrit.cloudera.org/#/q/status:open](http://gerrit.cloudera.org/#/q/status:open)

~~~
tlipcon
Also worth noting it's an open source project so if you're interested in
contributing in this area, we'd love to have you on board.

------
boredandroid
This post is a bit confused about how Kafka replication works. Replication in
Kafka is always synchronous in the sense that the cluster internally has a
strong notion of which messages are committed and no uncommitted message is
handed out to consumers. It is just that the client has the option of writing
to Kafka without blocking while the servers commit the message.

This is described in more detail here:
[http://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka](http://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka)
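The distinction above — "committed" is a cluster-side property, and acks only controls whether the client blocks — can be sketched with a toy high-watermark model (illustrative Python, not Kafka's actual implementation):

```python
class Partition:
    def __init__(self, replica_ids):
        self.log = []                                   # leader's log
        self.replicated = {r: 0 for r in replica_ids}   # offsets each replica holds

    def append(self, msg):
        # The leader appends immediately; whether the *producer* waits
        # for replication is a separate, client-side choice (acks).
        self.log.append(msg)
        return len(self.log) - 1

    def follower_fetch(self, replica, upto):
        self.replicated[replica] = upto

    def high_watermark(self):
        # A message is committed once every in-sync replica has it:
        # the committed prefix is the minimum replicated offset.
        return min(self.replicated.values())

    def consumer_read(self):
        # Consumers only ever see committed messages.
        return self.log[: self.high_watermark()]

p = Partition(replica_ids=["r1", "r2", "r3"])
p.append("m0")
p.append("m1")
p.follower_fetch("r1", 2)
p.follower_fetch("r2", 2)
p.follower_fetch("r3", 1)   # r3 lags: it only has m0 so far
```

Since r3 is lagging, only m0 is committed and visible to consumers; once r3 catches up, the high watermark advances and m1 becomes visible.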

~~~
frew
Apologies if I misrepresented Kafka's replication. So the system described in
the blog post is the new 0.8.2+ stuff with min.insync.replicas taking over
required.acks? Will review.

~~~
boredandroid
Thanks for clarifying!

I think there are two separate things:

1. Is there a principled notion of when a write is "committed", and a
leadership election algorithm that works with this to ensure committed writes
aren't lost as long as the failure criteria are met? This has been true since
we added replication to Kafka.

2. What is the recovery behavior when you have no available replicas?
Previously we would aggressively elect any replica even if it had incomplete
data. Now that behavior is configurable.

There is a more detailed explanation here:
[http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen](http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen)
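For reference, these behaviors map to standard Kafka config keys (the names below are real; the values are just an example, and they're shown as Python dicts purely for illustration rather than as the actual `.properties` files):

```python
broker_or_topic_config = {
    # Refuse to elect a lagging replica as leader when all in-sync
    # replicas are gone -- the "configurable recovery behavior" above.
    "unclean.leader.election.enable": "false",
    # Require at least this many in-sync replicas before a write is
    # acknowledged when the producer requests acks=all.
    "min.insync.replicas": "2",
}

producer_config = {
    # Block until all in-sync replicas have the message; the client's
    # choice to wait (or not) doesn't change what "committed" means.
    "acks": "all",
}
```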

------
mpercy
Fred, really interesting article! Thanks for posting it.

For those in the Bay Area, if you'd like to hear more about Kudu, I'm giving a
talk about it in Palo Alto tomorrow (Wed) at the Big Data Application Meetup
at 6PM:
[http://www.meetup.com/BigDataApps/events/227191025/](http://www.meetup.com/BigDataApps/events/227191025/)

Also presenting are Julien Le Dem of Dremio on Apache Drill and James Taylor
of Salesforce on Apache Phoenix. Should be a really interesting evening!

(Don't mean to hijack the thread, just thought this might be relevant to some
folks. :)

Mike

------
bkeroack
I'd like to know more about read performance. My understanding is that when
you base your system on Raft, you have to choose one of two operating modes:

- All reads go through the Raft state machine, which means that reading from
one node implies talking to at least a quorum of nodes. Consul is an example
of this behavior. It provides strong consistency but reduces the performance
gain from horizontal scaling (for Consul, at least, it's been shown that heavy
read loads can cause leadership transitions as the master gets overwhelmed).

- Reads are processed by the local node only. This is fast and scalable but is
basically "eventual consistency": a read immediately following a write is not
guaranteed to see the write. etcd works this way by default (unless you pass
?quorum=true to your read request).

How does Kudu behave (or, rather, how is it _intended_ to behave)?
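The two modes above can be caricatured in a few lines (toy Python, not Consul/etcd/Kudu code), using a 3-node cluster where one follower is lagging behind a committed write:

```python
class Node:
    def __init__(self):
        self.data = {}

leader, f1, f2 = Node(), Node(), Node()
cluster = [leader, f1, f2]

def write(key, value):
    # A write commits once a majority (leader + f1) has it;
    # f2 will apply it later.
    leader.data[key] = value
    f1.data[key] = value

def quorum_read(key):
    # Mode 1: the leader confirms with a majority that it is still
    # leader before answering, so the reply reflects all committed
    # writes -- but every read costs network round trips.
    acked = 2  # leader + f1 respond (toy stand-in for a quorum round)
    assert acked > len(cluster) // 2
    return leader.data.get(key)

def local_read(node, key):
    # Mode 2: answer from whichever node you happen to hit.
    # Fast and scalable, but may miss a recent committed write.
    return node.data.get(key)

write("x", 1)
```

A quorum read of "x" always returns 1, while a local read against the lagging follower f2 still returns nothing — the stale-read case bkeroack describes.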

~~~
tlipcon
We have a few entries in the FAQ about this:
[http://getkudu.io/faq.html#what-is-kudus-consistency-model-is-kudu-a-cp-or-ap-system](http://getkudu.io/faq.html#what-is-kudus-consistency-model-is-kudu-a-cp-or-ap-system)
as well as some more in-depth docs:
[http://getkudu.io/docs/transaction_semantics.html](http://getkudu.io/docs/transaction_semantics.html)

Basically the short answer, though, is that reads are not done by quorum, but
the client can specify if they need "up to date" reads. If they do, currently,
we force reads to go to the leader, but have a roadmap for how to read from
other replicas (see the Spanner paper for a rough idea how that can work).

------
AYBABTME
Would be cool to have some sort of link to the Git repo on the `Kudu` page.
Googling "kudu git" gave me
[https://github.com/cloudera/kudu](https://github.com/cloudera/kudu).

------
f2f
I think the term the title was looking for is "Kafkaesque" :)

------
mstump
The following comment is incorrect.

"Unlike Cassandra, Kudu implements the Raft consensus algorithm to ensure full
consistency between replicas"

Cassandra does offer transactional consistency in the form of Paxos, which
achieves the same goal. Additionally, Paxos- or Raft-based consistency models
aren't really necessary if you're willing to accept out of order processing
and adopt idempotent data models; you only need quorum-based consistency.

I'm curious to see what your throughput was in TX/s not just bytes/s in order
to get a more apples to apples comparison.
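The idempotent-model point above can be sketched like this (illustrative Python; the key/version scheme is invented for the example): if every write carries a key and a monotonically increasing version, replicas converge no matter the order or number of deliveries, so no per-write consensus round is needed:

```python
def apply(store, key, version, value):
    # Idempotent, order-insensitive upsert: keep the highest version.
    # Re-applying a write, or applying writes out of order, cannot
    # change the final state.
    current = store.get(key)
    if current is None or version > current[0]:
        store[key] = (version, value)

replica_a, replica_b = {}, {}

writes = [("k", 1, "old"), ("k", 2, "new")]

# Replica A sees the writes in order; replica B sees them out of
# order and with a duplicate delivery.
for w in writes:
    apply(replica_a, *w)
for w in [writes[1], writes[0], writes[1]]:
    apply(replica_b, *w)
```

Both replicas end up identical, which is exactly why such models can get by with quorum reads/writes alone.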

~~~
KirinDave
> willing to accept out of order processing and adopt idempotent data models;

I think you meant _able_ to adopt those things. Those techniques are tenable
and even excellent in some domains, but completely impermissible in others.

------
lobster_johnson
I see that Kudu is marketed as being for analytics, but is it usable as a
general-purpose, primary data store?

~~~
jdcryans
Hi, I work on Apache Kudu (incubating).

Analytics is what Kudu was designed for (it's not just marketing), so some
tradeoffs were made. You'll get the biggest bang for the buck if your use
cases are heavy on inserts and big scans with selective filters. Other use
cases will perform OK or meh vs. other storage engines. Also note that Kudu is
still pre-1.0.

~~~
lobster_johnson
Thanks for the explanation. We're currently using Elasticsearch for doing ad-
hoc aggregations (e.g. count per month filter by a few dimensions) in real
time. Indexes are some tens of millions of items per year per column; not much by
big-data standards, but enough that we need a whole bunch of nodes, and small
enough that even wide queries (like getting a total for the entire year)
finish in a couple of seconds. A large part of ES's speed comes from being
able to perform the aggregation function locally on each node that has the
shard, in parallel, rather than pulling the full data into the
client first. Is this a kind of use case where Kudu would perform well? How
does Kudu solve data locality? Would you actually do queries like these, or
would you precompute periodic rollups?

~~~
jdcryans
You seem to have a pretty typical use case that we're targeting. One thing to
understand about Kudu is that it doesn't run queries, it only stores the data.
You can use Impala or Drill on top of it; they'll figure out the locality,
apply the aggregations, and push the filters down to Kudu.

Did you initially pick ES over systems like Impala because of the lack of
real-time inserts/updates when used with HDFS?

BTW, here's a presentation that might help you understand Kudu:
[http://www.slideshare.net/jdcryans/kudu-resolving-transactional-and-analytic-tradeoffs-in-hadoop](http://www.slideshare.net/jdcryans/kudu-resolving-transactional-and-analytic-tradeoffs-in-hadoop)
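The division of labor described above can be caricatured in a few lines (toy Python, not the Impala/Drill/Kudu APIs): the storage layer evaluates pushed-down filters close to the data, and the query engine only aggregates the rows that survive:

```python
rows = [
    {"month": "2016-01", "dim": "a", "n": 3},
    {"month": "2016-01", "dim": "b", "n": 5},
    {"month": "2016-02", "dim": "a", "n": 7},
]

def storage_scan(rows, predicate):
    # "Pushed-down" filter: evaluated where the data lives, so only
    # matching rows cross the network to the query engine.
    return [r for r in rows if predicate(r)]

def engine_count_per_month(filtered):
    # The query engine's job: aggregate the already-filtered rows.
    out = {}
    for r in filtered:
        out[r["month"]] = out.get(r["month"], 0) + r["n"]
    return out

result = engine_count_per_month(storage_scan(rows, lambda r: r["dim"] == "a"))
```

This is the same shape as the ES setup described earlier: filter and partial-aggregate near each shard, then combine, rather than dragging all rows to the client.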

~~~
lobster_johnson
Thanks, that's helpful. We picked ES for several reasons. We're not a Java
shop, and the Hadoop ecosystem is heavily biased towards JVM languages.

Secondly, ES is easy to deploy and manage. Being on the JVM, it admittedly has
a considerable RAM footprint, but at least it's just one daemon per node. With
anything related to Hadoop, it seems you have this cascade of JVM processes
that inevitably need management. And lots and lots of RAM.

Thirdly, as you point out it's easy to do real-time writes.

I do like the fact that Kudu is C++.

~~~
jdcryans
Well, TBH you do get to pick which Hadoop-related components need to run;
HDFS's DataNode itself is happy with just a bit of RAM. I do understand the
concern though.
concern though.

You're probably happy with what you have in prod but if you get some time to
try out Kudu feel free to drop by our Slack channel for a chat!
[http://getkudu.io/community.html](http://getkudu.io/community.html)

~~~
lobster_johnson
Thanks — I will definitely be keeping an eye on the project.

------
spo81rty
FYI, Kudu is also the name of Microsoft Azure's app management tool...

~~~
jdcryans
Yeah, the title in the article does say "Apache Kudu"; maybe it was changed
later. What it should really be is "Apache Kudu (incubating)".

