Apache Kudu – Fast Analytics on Fast Data (apache.org)
80 points by espeed on Jan 19, 2017 | 15 comments


Is there an infographic that organizes the Apache big data and ML projects based on their use case? There seem to be many projects with overlapping use cases.


Hortonworks has an infographic of the Apache projects they support: http://hortonworks.com/apache/

And here's an attempt to keep track of all Hadoop-related projects https://hadoopecosystemtable.github.io


Oh god, /another/ Apache fast data analytics platform.

Apache has become a powerhouse at creating these short-lived projects that force companies to hire the most expensive, esoteric talent.


I have definitely shared that sentiment. However, Apache didn't create this project; it's originally from Cloudera, and they have a pretty good track record. In this case, Kudu is more of a database/store on top of which analytics products can run (for example, it's a best-in-class engine for Impala, allowing fast reads and writes). I wouldn't quite say it's like HDFS, which is more of a distributed file system / document store.
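
To make the database-vs-filesystem distinction concrete, here is a rough sketch of Kudu's table-style API using the kudu-python client. The master address, table name, and columns are made up for illustration, and exact method names may vary by client version:

    import kudu
    from kudu.client import Partitioning

    # Connect to a (hypothetical) Kudu master.
    client = kudu.connect(host='kudu-master.example.com', port=7051)

    # Kudu tables have a schema and a primary key -- unlike HDFS files,
    # they support per-row inserts, updates, and fast scans.
    builder = kudu.schema_builder()
    builder.add_column('event_id').type(kudu.int64).nullable(False).primary_key()
    builder.add_column('payload').type(kudu.string)
    schema = builder.build()

    client.create_table(
        'events', schema,
        Partitioning().add_hash_partitions(column_names=['event_id'], num_buckets=4))

    # Row-level writes and reads, with no file rewrites involved.
    table = client.table('events')
    session = client.new_session()
    session.apply(table.new_insert({'event_id': 1, 'payload': 'hello'}))
    session.flush()

    scanner = table.scanner()
    scanner.open()
    print(scanner.read_all_tuples())

Compare that with HDFS, where the unit of work is an immutable file rather than a row.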


All the big-data-related projects were donated to Apache by companies rather than developed by the ASF itself, and they're still active: https://projects.apache.org/projects.html?category


Not just big-data projects: many other projects move from companies to the ASF. Apache Groovy was previously developed by two programmers and a project manager on VMware's dime, until they dropped it. But it would be inaccurate to say VMware "donated" Groovy to the ASF.


Kudu is meant to be a file-system replacement for HDFS:

Kudu : HDFS as Spark : Pig

Additionally, unlike most pieces of the Hadoop ecosystem, it is implemented in pure C++ instead of Java.


Fine. But all the myriad other Apache projects in this space were also "different! better!" The sentiment in the previous post still applies.


Actually no, there is currently no alternative project to HDFS. The only comparable, compatible platform for the ecosystem is MapR-FS, which is closed source. Kudu is an interesting experiment: C++ brings a level of memory management and performance in a space where the JVM typically falls down.

Alternatively, here is a list of the storage engines for MySQL:

InnoDB

MRG_MYISAM

BLACKHOLE

CSV

MEMORY

FEDERATED

ARCHIVE

MyISAM

Hadoop used to just have HDFS... now it also has Kudu


That's not a fair comparison


Kudu founder Todd Lipcon's talk, "Kudu - New Hadoop Storage for Fast Analytics on Fast Data":

https://www.youtube.com/watch?v=32zV7-I1JaM


As far as I can tell, Kudu is dominated by Cloudera. They open sourced it as a marketing strategy.

Is anyone who isn't paying for CDH using this?


Apache Kudu project founder here:

It's true that the project was initially developed at Cloudera, and Cloudera employees continue to be the main driving force behind development. That said, we have committers and contributors from other companies as well: roughly half the people who contributed a patch in the last 3 months have been non-Cloudera. Additionally, we are very strict about doing all development upstream (e.g. with the first open source release we spent a lot of effort to open the entire development history going back to 2012, including JIRA, git, etc.).

As for users, here are a couple examples off the top of my head who aren't currently paying for any support:

- Xiaomi (the world's 4th-largest smartphone maker) collects ~2TB/day of event data from >5 million phones into a cluster which simultaneously runs analytics workloads (SQL, Spark, etc.)

- CERN is looking at using Kudu to store high-energy physics experiment data from the ATLAS detector at the LHC. You can find some code at https://gitlab.cern.ch/zbaranow/kudu-atlas-eventindex and a poster here: https://indico.cern.ch/event/505613/contributions/2230964/at...

(of course there are lots more too, whose names I don't have permission to mention)
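
To give a feel for the Xiaomi-style workload above, here is a rough sketch of querying a Kudu table from PySpark via the kudu-spark connector; the master address, table name, and package version are placeholders:

    # Submit with the kudu-spark connector on the classpath, e.g.
    #   spark-submit --packages org.apache.kudu:kudu-spark2_2.11:<version> ...
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kudu-events").getOrCreate()

    # Load a Kudu table as a DataFrame.
    events = (spark.read
              .format("org.apache.kudu.spark.kudu")
              .option("kudu.master", "kudu-master.example.com:7051")
              .option("kudu.table", "events")
              .load())

    # Ordinary SQL over rows that may still be arriving in Kudu.
    events.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(*) AS n FROM events").show()

The same table stays writable through the Kudu client APIs while Spark or Impala scan it, which is the "analytics on fast data" pitch.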

Feel free to join our Slack if you're interested in chatting with more users; there are usually plenty of people online there: https://getkudu-slack.herokuapp.com

-Todd


The Hail project at the Broad Institute makes use of Kudu, I believe.

https://hail.is

Not sure if that group is paying for CDH, but this tool is definitely being built for others to use.


For those interested in a technical overview, http://kudu.apache.org/kudu.pdf is our academic-style paper (not submitted to any journal).



