
The Revival of Great SQL Ideas - MarkusWinand
https://winand.at/newsletter/2018-12/consistency-cloud-jit
======
hodgesrm
> Nevertheless Amazon uses Oracle Database to run their business. Even though
> they don’t want to. ... In the end, they will still use SQL—just not the
> Oracle Database.

I really dislike articles that try to make it sound like smart companies like
Amazon and Facebook don't know what they are doing. Amazon has never said they
won't use SQL. They have an innovative set of AWS SQL offerings including RDS,
Aurora, and Redshift.

In fact, I would argue that Amazon Web Services is responsible as much or more
than any other vendor for 'The Revival of Great SQL Ideas' that this article
discusses. For evidence look at 'Amazon Redshift and the Case for Simpler Data
Warehouses' by Gupta et al. [0] Amazon has profoundly altered data management
not just in the SQL arena but across the board.

[0]
[https://event.cwi.nl/lsde/papers/p1917-gupta.pdf](https://event.cwi.nl/lsde/papers/p1917-gupta.pdf)
(Paragraph 3 of the abstract is my favorite part--it's a brilliant formulation
of the problem they are solving.)

------
billdybas
> The lesson we can learn from this story is the following: start with a
> generic database...SQL database are a good choice because they can do many
> tricks...The modern and successful architecture that is commonly used today
> is to have an SQL database that is sometimes surrounded by some one-trick
> ponies to take care of a few pain points.

Yup.

I like that we now have more of these one-trick ponies to choose from our
toolbox when necessary, when a relational database just won't cut it.

But, my biggest complaint around the NoSQL movement is the marketing pseudo-
hype it created. So many amateurs who don't understand database selection took
it as gospel and evangelized it across the web (eg. Mongo w/ Node).

It's hard to correct people's understanding when they learn things wrong the
first time, especially when there's a mountain of incorrect information they
can point to on the web ("These people can't all be wrong, can they?"
Well...).

~~~
hodgesrm
The quoted statement is not correct at least as far as analytics are
concerned. Two examples:

1.) Analytics in many enterprises increasingly feed off data lakes consisting
of enormous quantities of data in object storage. SQL has a part to play but
it's effectively computing aggregates and creating data marts off this deeper
pool of data. Data lake architecture is likely to be increasingly dominant
given the enormous growth in data volumes.

2.) Machine learning is transforming analytics. This looks like the next
feature likely to be absorbed into DBMS systems. SQL integration with ML is
likely to be a hot topic in future systems but a substantial fraction of ML
processing will remain outside the DBMS.

So _SQL_ is going to be present widely in most future solutions but that's not
the same as saying that a single relational DBMS architecture will solve all
problems. It's been clear for years that ACID-compliant RDBMS have a part in
this picture but it's just part.

Overall the article still seems to be fighting the SQL/NoSQL wars of the last
decade. A large part of the market is moving on to other use cases.

~~~
billdybas
Oh, yes, definitely.

Data warehousing and ML have different requirements and needs than your
typical N-tier web app. Even streaming event data warrants a different
solution. It comes back to knowing how to choose the right database for the
job.

The issue is around how these technologies are marketed – grandiose claims and
few practical use cases. Once the marketing material permeates the industry
and some part latches on, it becomes a self-reinforcing cycle. Blog posts,
books, and courses bring the information to the masses. Then companies start
to adopt the tools. Then they need to hire engineers who know those tools. So
more information gets published about them because that's what people want to
learn to get hired.

Many engineers today will turn to NoSQL for everything because of the past few
years of marketing hype (and acronym-driven-development), and that's quite a
shame.

~~~
billdybas
For those looking to understand how to choose the right database for the job,
I'd recommend first reading "Designing Data-Intensive Applications"
([https://dataintensive.net](https://dataintensive.net))

------
lixtra
I sometimes wish I could directly write a plan for my query instead of relying
on the db to come up with something that matches it (OLAP). The other 90%
percent I’m happy that it just works.

~~~
crazygringo
I agree 1,000% -- I've thought about writing an article on it.

There's a huge difference between one-time queries (let the planner figure it
out) versus production queries.

I've been bitten in the past by the SQL planner failing as it reaches some
threshold and suddenly stops using an index it was using before, etc., and
queries go from taking milliseconds to seconds. It's rare, but it happens, and
the results can be catastrophic -- usually because it starts ignoring an index
because it (wrongly) thinks the index would slow it down -- planners aren't
perfect.

I'd much rather specify in advance the query plan to handle what I knew would
be the eventual biggest-case scale of data: to _always_ use X index (forced,
not just hinted), and to deal, in advance, with whether or not sorting will
always be possible in memory or if it will need to be done in a swap file on
disk, and at what points filtering happens.

I actually don't care if this slows down my (already blazing fast) queries 10x
or even 100x when the database is small (e.g. by sorting on disk instead of in
memory, even when just 10 rows -- or using the index even when clearly
unnecessary with just 10 rows). It buys me the peace of mind that there will
be no discontinuities in performance in production, no surprises.

Of course this isn't for basic grab-a-row queries: I'm talking about more
complex queries with all the bells and whistles (subqueries, multiple joins,
filters, etc.) where the database's planner goes haywire.

~~~
evanweaver
This is FaunaDB’s strategy. Indexes and views are explicitly referenced and
thus every query is implicitly also its own query plan, with local
optimizations only. No discontinuous planning events.

------
guachesuedehack
Everything old is new again after everything new becomes old again

