Kudu as a More Flexible and Available Kafka-Style Queue (rodeo.io)
83 points by frew on Jan 26, 2016 | 32 comments

Todd from the Apache Kudu (incubating) team here. I'll check this thread throughout the day in case there are any questions (and try to check the original post for comments as well).

Does Kudu colocate data from different tables with equal keys? If not, is this or a similar feature on the road map?

It doesn't yet. It's on our nebulous "we'd like to do this some time" roadmap, but we're currently concentrating on more basic work around stability and time-series features.

Of course this is a huge optimization for data warehousing applications, where two co-partitioned tables can be joined without any network data transfer, and in some cases could even use a merge join instead of hash-based strategies. But it's the usual time/scope/quality trinity, and we'd rather not compromise the third element.
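To illustrate the point about co-partitioned joins: if both tables are partitioned the same way and each node's partitions are sorted on the join key, every node can merge-join its local data with no shuffle. A toy sketch (not Kudu code; assumes unique keys per side for brevity):

```python
# Each node holds one partition of each table, both sorted by join key.
# A merge join walks the two sorted lists in lockstep -- no hashing,
# no network transfer of either table.
def merge_join(left, right):
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            out.append((lk, lv, rv))
            i += 1
            j += 1          # assumes unique keys on each side
        elif lk < rk:
            i += 1
        else:
            j += 1
    return out
```

Each node runs this independently over its own partition pair, which is why co-partitioning removes the network cost entirely.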

Glad to hear this is at least being considered. The optimizations for data warehousing you mentioned are my use case. I understand that it is a very active project with a lot on the road map. It's a very cool project and I follow you guys on http://gerrit.cloudera.org/#/q/status:open

Also worth noting it's an open source project so if you're interested in contributing in this area, we'd love to have you on board.

You can also join us on Slack here if that's more your style: http://getkudu-slack.herokuapp.com

Still haven't congratulated you guys. Kudu is what I always wanted in a datastore.

Thanks, Alex! Appreciate the kind words.

Seriously, everything we were trying to achieve with c5, and more. I just wish the C++ was more modern, but I know when the project started.

We've just updated to C++11 as of a couple weeks ago. Partially the issue was when it started, partially the issue is that we have to support older platforms and we have a C++ client library to worry about.

This post is a bit confused about how Kafka replication works. Replication in Kafka is always synchronous in the sense that the cluster internally has a strong notion of which messages are committed and no uncommitted message is handed out to consumers. It is just that the client has the option of writing to Kafka without blocking while the servers commit the message.

This is described in more detail here: http://www.confluent.io/blog/distributed-consensus-reloaded-...

Apologies if I misrepresented Kafka's replication. So the system described in the blog post is the new 0.8.2+ stuff with min.insync.replicas taking over required.acks? Will review.

Thanks for clarifying!

I think there are two separate things: 1. Is there a principled notion of when a write is "committed", and a leadership election algorithm that works with this to ensure committed writes aren't lost as long as the failure criteria are met? This has been true since we added replication to Kafka. 2. What is the recovery behavior when you have no available replicas? Previously we would aggressively elect any replica, even one with incomplete data. Now that behavior is configurable.

There is a more detailed explanation here: http://blog.empathybox.com/post/62279088548/a-few-notes-on-k...

required.acks works together with min.insync.replicas.

Basically "required.acks" lets you choose between "no acks", "leader only" and "all in sync replicas", while min.insync.replicas lets you control what "all in sync replicas" actually means.
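Not Kafka's actual implementation, just a toy simulation of the decision described above: acks picks which acknowledgements the producer waits for, and min.insync.replicas sets the floor below which an acks=all write is rejected rather than silently under-replicated.

```python
# Hypothetical sketch of how a produce request is accepted or rejected.
def produce_ok(acks, in_sync_replicas, min_insync, leader_up=True):
    """acks: 0 ('no acks'), 1 ('leader only'), or 'all' (all in-sync replicas)."""
    if acks == 0:
        return True                       # fire-and-forget: never waits
    if not leader_up:
        return False                      # nothing can acknowledge the write
    if acks == 1:
        return True                       # leader append alone is enough
    # acks='all': every current in-sync replica must ack, and there must be
    # at least min.insync.replicas of them, or the broker rejects the write.
    return in_sync_replicas >= min_insync
```

So with min.insync.replicas=2, an acks=all write to a partition that is down to one in-sync replica fails fast instead of being acknowledged with weaker durability.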

Added a clarification to the relevant section. Apologies again for the confusion!

Fred, really interesting article! Thanks for posting it.

For those in the Bay Area, if you'd like to hear more about Kudu, I'm giving a talk about it in Palo Alto tomorrow (Wed) at the Big Data Application Meetup at 6PM: http://www.meetup.com/BigDataApps/events/227191025/

Also presenting are Julien Le Dem of Dremio on Apache Drill and James Taylor of Salesforce on Apache Phoenix. Should be a really interesting evening!

(Don't mean to hijack the thread, just thought this might be relevant to some folks. :)


I'd like to know more about read performance. My understanding is that when you base your system on Raft, you have to choose one of two operating modes:

- all reads go through the Raft state machine, which means that reading from one node implies talking to at least a quorum of nodes. Consul is an example of this behavior. It provides strong consistency but reduces the performance gain from horizontal scaling. (for Consul at least it's been shown that heavy read loads can cause leadership transitions as the master gets overwhelmed).

- Reads are processed by the local node only. This is fast and scalable but is basically "eventual consistency". A read immediately following a write is not guaranteed to see the write. etcd works this way by default (unless you pass ?quorum=true to your read request)

How does Kudu behave (or, rather, how is it intended to behave)?

We have a few entries in the FAQ about this: http://getkudu.io/faq.html#what-is-kudus-consistency-model-i... as well as some more in-depth docs: http://getkudu.io/docs/transaction_semantics.html

Basically the short answer, though, is that reads are not done by quorum, but the client can specify if they need "up to date" reads. If they do, currently, we force reads to go to the leader, but have a roadmap for how to read from other replicas (see the Spanner paper for a rough idea how that can work).

This is not totally true. There are ways to ensure a read from a follower sees the latest writes from the same client. You do this by sending the index of a client's write back to the client upon success and the client sends that index to a follower when reading. The follower awaits logs up to that index before servicing the read. Some Raft implementations do this, but clearly it can cause greater latency when switching between writes and reads and still only gets you sequential consistency and not linearizability.
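A minimal sketch of that follower-read scheme (hypothetical names, not from any particular Raft implementation): the client carries the log index it got back from its write, and the follower serves the read only once it has applied at least that far.

```python
# Follower-side read-your-writes: refuse (here, return None so the caller
# can retry) until the follower has applied the client's write index.
class Follower:
    def __init__(self):
        self.applied_index = 0
        self.kv = {}

    def apply(self, index, key, value):
        # Called as committed log entries are applied in order.
        self.kv[key] = value
        self.applied_index = index

    def read(self, key, min_index):
        if self.applied_index < min_index:
            return None   # lagging behind the client's write; wait/retry
        return self.kv.get(key)
```

A real implementation would block with a timeout rather than return None, and as the comment notes, this buys sequential consistency for that client, not linearizability.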

Would be cool to have some sort of link to the Git repo on the `Kudu` page. Googling "kudu git" gave me https://github.com/cloudera/kudu.

I think the term the title was looking for is "Kafkaesque" :)

The following comment is incorrect.

"Unlike Cassandra, Kudu implements the Raft consensus algorithm to ensure full consistency between replicas"

Cassandra does offer transactional consistency in the form of Paxos, which achieves the same goal. Additionally, Paxos- or Raft-based consistency models aren't really necessary if you're willing to accept out-of-order processing and adopt idempotent data models; you only need quorum-based consistency.
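To make the idempotent-data-model point concrete, here's a toy sketch (my illustration, not the commenter's design): each write carries a version, and replicas keep the highest one, so replayed or reordered writes converge to the same state with no consensus round.

```python
# Idempotent, order-insensitive write: last-writer-wins by version.
# Applying the same stream of writes in any order, with duplicates,
# always converges to the same value.
def apply_write(store, key, version, value):
    cur = store.get(key)
    if cur is None or version > cur[0]:
        store[key] = (version, value)

store = {}
for version, value in [(2, 'b'), (1, 'a'), (2, 'b')]:  # out of order + dup
    apply_write(store, 'k', version, value)
```

Because a stale or duplicated delivery can never overwrite a newer version, quorum reads/writes suffice for correctness.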

I'm curious to see what your throughput was in TX/s not just bytes/s in order to get a more apples to apples comparison.

> willing to accept out of order processing and adopt idempotent data models;

I think you meant able to adopt those things. Those techniques are tenable and even excellent in some domains, but completely impermissible in others.

I see that Kudu is marketed as being for analytics, but is it usable as a general-purpose, primary data store?

Hi, I work on Apache Kudu (incubating).

Analytics is what Kudu was designed for (it's not just marketing), so some tradeoffs were made. You'll get the biggest bang for the buck if your use cases are heavy on inserts and big scans with selective filters. Other use cases will perform OK or meh vs. other storage engines. Also note that Kudu is still pre-1.0.

Thanks for the explanation. We're currently using Elasticsearch for doing ad-hoc aggregations (e.g. count per month filtered by a few dimensions) in real time. Indexes are some tens of millions of items per year per column; not much by big-data standards, but enough that we need a whole bunch of nodes, and small enough that even wide queries (like getting a total for the entire year) finish in a couple of seconds. A large part of ES's speed comes from being able to perform the aggregation function locally on each node that has the shard, in parallel, rather than pulling the full data into the client first. Is this a kind of use case where Kudu would perform well? How does Kudu solve data locality? Would you actually do queries like these, or would you precompute periodic rollups?

You seem to have a pretty typical use case that we're targeting. One thing to understand about Kudu is that it doesn't run queries, it only stores the data. You can use Impala or Drill, they'll figure out the locality and apply the aggregations properly/push down the filters to Kudu.
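The scatter-gather pattern being described, partial aggregation where the data lives and only small partial results crossing the wire, can be sketched like this (illustrative only; the node/shard layout here is made up):

```python
# Each "shard" is the (month, count) rows stored on one node.
shards = [
    [("2016-01", 3), ("2016-02", 5)],    # rows on node A
    [("2016-01", 2), ("2016-03", 7)],    # rows on node B
]

def local_agg(rows):
    # Runs on the node holding the data: aggregate locally.
    out = {}
    for month, n in rows:
        out[month] = out.get(month, 0) + n
    return out

def merge(partials):
    # Only these small partial results are shipped and merged centrally.
    out = {}
    for p in partials:
        for month, n in p.items():
            out[month] = out.get(month, 0) + n
    return out

totals = merge(local_agg(s) for s in shards)
```

This is the same shape ES uses for its aggregations, and what an engine like Impala or Drill does over Kudu tablets when it pushes filters and partial aggregates down to the storage layer.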

Did you initially pick ES over systems like Impala because of the lack of real-time inserts/updates when used with HDFS?

BTW, here's a presentation that might help you understand Kudu: http://www.slideshare.net/jdcryans/kudu-resolving-transactio...

Thanks, that's helpful. We picked ES for several reasons. We're not a Java shop, and the Hadoop ecosystem is heavily biased towards JVM languages.

Secondly, ES is easy to deploy and manage. Being on the JVM, it admittedly has a considerable RAM footprint, but at least it's just one daemon per node. With anything related to Hadoop, it seems you have this cascade of JVM processes that inevitably need management. And lots and lots of RAM.

Thirdly, as you point out it's easy to do real-time writes.

I do like the fact that Kudu is C++.

Well, TBH, you do get to pick which Hadoop-related components need to run; HDFS's DataNode itself is happy with just a bit of RAM. I do understand the concern though.

You're probably happy with what you have in prod but if you get some time to try out Kudu feel free to drop by our Slack channel for a chat! http://getkudu.io/community.html

Thanks — I will definitely be keeping an eye on the project.

FYI, Kudu is also the name of Microsoft Azure's app management tool...

Yeah, the title in the article does say "Apache Kudu", maybe it was changed later. What it should really be is "Apache Kudu (incubating)".

