

The Data Engineering Ecosystem: An Interactive Map - jecs321
http://insightdataengineering.com/blog/The-Data-Engineering-Ecosystem-An-Interactive-Map.html

======
jkestelyn
Very nice effort. A couple things missing though to make it truly up to date:

1. In "Ingest", where's Flume?
2. Where's "Interactive SQL" (e.g. Impala and Presto)?
3. Where's "Search" (Solr, Elasticsearch)?

~~~
ddrum001
Thanks, if you mouse-over the Batch and Database, you'll see more categories.
Presto is under Batch SQL and Elasticsearch is under Document Oriented.

We see Flume, Solr, and Impala a lot but decided to omit them to strike a
balance between including more technologies and overwhelming people who are
new to the field. Inevitably, we had to leave off many of our favorite
technologies.

------
bzz01
While all maps like this tend to make little practical sense since they
inevitably over-generalize and over-simplify things, I'd still like to point
out that they got the "columnar" category quite wrong: neither HBase nor
Cassandra is a columnar store in the way this term is commonly understood.

HBase and Cassandra still store data in rows; however, rows can be partitioned
into column families, which may be stored separately. Columnar databases are
usually also relational (Vertica and Redshift) and support SQL or a SQL-like
query language.
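
To make the row vs. columnar distinction concrete, here is a minimal in-memory sketch. The table, field names, and the `to_columnar` helper are hypothetical and only illustrate the layout idea; this is not how Vertica, Redshift, HBase, or Cassandra actually store data on disk.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10},
    {"id": 2, "city": "Lima", "amount": 25},
    {"id": 3, "city": "Oslo", "amount": 7},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: an aggregate over one field still walks every whole row.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the same aggregate reads only the "amount" column.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is how much unrelated data has to be touched to compute them, which is roughly why columnar stores tend to suit analytic scans.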

Anyway, I think regardless of how you define columnar, HBase and Redshift
shouldn't end up in the same category, as they are quite different in the way
they work, their throughput/latency and read/write balance, and their use cases.

~~~
ddrum001
You're correct that it's very difficult to balance between giving an overview
and over-simplifying. We intended this to be a starting point, knowing that
we'd have to make a trade-off between including more details and giving a
concise overview.

For NoSQL specifically, it's difficult to put every database in only a few
categories. We discussed that Cassandra and HBase are "maps of maps" in the
details, and we definitely didn't want to imply that they have the same model
or use-cases as Redshift. Perhaps our next iteration will explain the "column
family" more and include a separate category for the databases that were
inspired by the Bigtable data model.
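
As a rough illustration of the "maps of maps" idea, the sketch below models a table as row key -> column family -> column qualifier -> value using nested dicts. The `put` and `get_family` helpers and the sample keys are hypothetical; this is a toy model, not the HBase or Cassandra API, and it omits details such as persistence, ordering, and versioning.

```python
# Toy model: row key -> column family -> column qualifier -> value.
table = {}


def put(table, row_key, family, qualifier, value):
    """Store one cell, creating the row and family maps on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get_family(table, row_key, family):
    """Return a copy of one column family for a row (empty if absent)."""
    return dict(table.get(row_key, {}).get(family, {}))


# Rows do not need to share the same columns or even the same families.
put(table, "user#1", "profile", "name", "Ada")
put(table, "user#1", "profile", "city", "London")
put(table, "user#1", "activity", "last_login", "2015-01-01")
put(table, "user#2", "profile", "name", "Lin")

profile = get_family(table, "user#1", "profile")
missing = get_family(table, "user#2", "activity")
```

Lookups are driven by row key and then by family, which is a very different access pattern from scanning a single column across all rows in a relational columnar store.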

------
iblaine
No real time MPP databases like Redshift, Netezza, Aster...otherwise good
graph.

~~~
ddrum001
Thanks, if you mouse-over the databases box, you'll see Redshift under
Columnar - we do see a lot of Netezza and Aster as well.

------
sampathweb
Love the text overlay. It would be nice to also have project links in the
text.

~~~
ddrum001
Thanks, we'll definitely include links in our next iteration.

