When I'm wearing the data engineer hat, I start to get interested in some of the options that aren't even databases at all, such as storing the data in ORC files, because they sometimes offer big performance advantages. The data intake pipelines I'm working on are batch-oriented, so they benefit little from what RDBMSes are optimized for: random access to individual records, and maybe also to ranges of records, though the latter only comes at the cost of write performance if you're looking to filter on a natural key instead of a synthetic key... it gets thorny. ORC is less than stellar at selective ad-hoc queries, but I'm not doing selective ad-hoc queries against the internals of my data intake pipeline, and I will fight to the death to prevent the BI team from doing so.
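To make the batch-vs-selective tradeoff concrete, here's a toy sketch in plain Python. It is not how ORC is actually implemented (real ORC adds stripes, compression, and per-column statistics); it only mimics the column-oriented layout idea. The records, field names, and helper functions are all made up for illustration.

```python
# Toy illustration only: hypothetical records, not a real ORC reader or writer.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 5.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Batch aggregate: the column layout only has to touch the "amount" values.
total_from_columns = sum(columns["amount"])

# The row layout walks every record to pull out the same field.
total_from_rows = sum(record["amount"] for record in rows)


def find_by_id(cols, wanted_id):
    """Selective lookup with no index: a linear scan of the id column,
    then one read per column to reassemble the record."""
    position = cols["id"].index(wanted_id)
    return {name: values[position] for name, values in cols.items()}
```

The full-column sum is the kind of work a batch pipeline does all day, while `find_by_id` is the selective ad-hoc case where this layout has nothing clever to offer.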
When I'm wearing my data scientist hat, I've got no idea what my next query on the data is, so there's no indexing strategy that could possibly meet my needs. That kills one of the RDBMS's biggest value propositions. So let's just wing it with map-reduce and get on with life. And, TBH, SQL is just too declarative for the way my brain works when I'm in analyst mode - the semi-imperative workflow I get out of something like Spark SQL or R or Pandas feels more natural.
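For what it's worth, "winging it with map-reduce" looks roughly like the sketch below: no index, just scan every partition, emit partial results, and merge them. This is plain standard-library Python standing in for a real engine, and the records, the partitioning, and the `amount > 5` filter are all hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical records, split into two made-up partitions.
records = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 3},
    {"region": "east", "amount": 8},
    {"region": "west", "amount": 12},
]
partitions = [records[0:2], records[2:4]]


def map_partition(partition):
    """Map step: full scan of one partition, counting qualifying rows per region."""
    return Counter(r["region"] for r in partition if r["amount"] > 5)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Imperative, step-by-step flow: map each partition, then fold the partials.
partials = map(map_partition, partitions)
result = reduce(merge_counts, partials, Counter())
```

Changing the question means changing the predicate inside `map_partition` and rerunning the scan; there is no index to design up front and none to invalidate.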