Is there an infographic that organizes the Apache big data and ML projects based on their use case? There seem to be many projects with overlapping use cases.
I have definitely shared that sentiment. However, Apache didn't create this project; it's from Cloudera originally, and they've got a pretty good track record. In this case, Kudu is more of a database/store on top of which analytics products can run (for example, it's a best-in-class engine for Impala, allowing fast reads and writes). I wouldn't quite say it's like HDFS, which is more of a document store/distributed file system.
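If it helps make the distinction concrete, here's a rough sketch using the kudu-python client; the master address, table name, and schema are made up for illustration:

```python
import kudu
from kudu.client import Partitioning

# Connect to a Kudu master (address is a placeholder)
client = kudu.connect(host='kudu-master.example.com', port=7051)

# Kudu stores structured tables, not files -- you define a schema
builder = kudu.schema_builder()
builder.add_column('key').type(kudu.int64).nullable(False).primary_key()
builder.add_column('value').type(kudu.string)
schema = builder.build()

# Hash-partition rows across tablets by primary key
partitioning = Partitioning().add_hash_partitions(column_names=['key'],
                                                  num_buckets=3)
client.create_table('demo', schema, partitioning)

# Individual rows can be inserted and updated in place,
# which append-only HDFS files can't do
table = client.table('demo')
session = client.new_session()
session.apply(table.new_insert({'key': 1, 'value': 'hello'}))
session.apply(table.new_update({'key': 1, 'value': 'world'}))
session.flush()

# Fast scans are what make it a good backing store for engines like Impala
scanner = table.scanner()
print(scanner.open().read_all_tuples())
```

The point being: you talk to Kudu in terms of tables and rows, whereas HDFS gives you files and blocks.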
Not just big-data projects but many other projects move from companies to the ASF. Apache Groovy was previously developed by two programmers and a project manager on VMware's dime, until they dropped it. To say VMware "donated" Groovy to the ASF would be inaccurate, though.
Actually no, there is currently no alternative to HDFS. The only comparable, compatible platform for the ecosystem is MapR-FS, which is closed source. Kudu is an interesting experiment: C++ brings a level of memory management and performance to a space where the JVM typically falls down.
Alternatively, here is a list of the storage engines for MySQL:
InnoDB
MRG_MYISAM
BLACKHOLE
CSV
MEMORY
FEDERATED
ARCHIVE
MyISAM
Hadoop used to just have HDFS... now it also has Kudu
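To stretch the MySQL analogy a bit: the engine is a per-table choice, which you can see from any standard connector. A rough sketch, assuming mysql-connector-python and made-up credentials:

```python
import mysql.connector

# Connection details are placeholders
conn = mysql.connector.connect(host='localhost', user='root',
                               password='secret', database='test')
cur = conn.cursor()

# Ask the server which storage engines it supports
cur.execute("SHOW ENGINES")
for engine, support, comment, *_ in cur:
    print(f"{engine}: {support} -- {comment}")

# The engine is picked per table, not per server
cur.execute("CREATE TABLE kv (k INT PRIMARY KEY, v TEXT) ENGINE=InnoDB")
conn.close()
```

Same idea in the Hadoop world: HDFS and Kudu are different storage options underneath the same ecosystem of query engines.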
It's true that the project was initially developed at Cloudera, and Cloudera employees continue to be the main driving force behind development. That said, we have committers and contributors from other companies as well. Roughly half the people who contributed a patch in the last three months are from outside Cloudera. Additionally, we are very strict about doing all development upstream (e.g., with the first open source release we spent a lot of effort to open the entire development history going back to 2012, including JIRA, git, etc.).
As for users, here are a couple of examples, off the top of my head, of users who aren't currently paying for any support: