
A one-size-fits-all database doesn't fit anyone - rbanffy
https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html
======
okket
A one-size-fits-all database like PostgreSQL fits most. Moving to a
specialised solution is always an option, if necessary. Starting with
specialised solutions is a bad idea that will limit flexibility needlessly.

~~~
giancarlostoro
I mean, MongoDB is schema-flexible; there are cases where data is so dynamic
that it just makes sense to use something like MongoDB. But I do agree that
PostgreSQL (or MySQL, if someone's more familiar with it - use what's most
effective for you and your team) is a good route to start with.

~~~
icebraining
PostgreSQL using JSON fields is also fully flexible.
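A minimal sketch of what that flexibility looks like - schemaless documents inside an ordinary relational table. For a runnable example this uses SQLite's built-in JSON functions via Python's stdlib; in PostgreSQL you'd use a `jsonb` column with the `->>` operator instead, but the idea is identical.

```python
import sqlite3

# Schema-flexible storage in a relational table, in the spirit of
# PostgreSQL's jsonb columns. Demoed with SQLite's JSON1 functions so it
# runs anywhere; Postgres would use: payload jsonb, payload ->> 'plan'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [('{"type": "click", "x": 10}',), ('{"type": "signup", "plan": "pro"}',)],
)
# Query by a field that only some documents have -- no schema migration needed.
rows = conn.execute(
    "SELECT json_extract(payload, '$.plan') FROM events "
    "WHERE json_extract(payload, '$.type') = 'signup'"
).fetchall()
print(rows)  # [('pro',)]
```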

------
niftich
This sort of post was most useful in the days when the term 'NoSQL' was being
thrown around as the silver bullet that would revolutionize one's business,
and there was a significant knowledge gap between the domain experts -- who
quite often weren't the promoters -- and everyone else. While it's still
plausible that someone obtains this information for the first time right now,
today the mystique around the term 'NoSQL' has become more muted, the marketing has
become more factual, and a fair number of people understand the different
types of data paradigms (KV, document, graph, timeseries) that the newer
offerings provide.

As it stands, this post is a brief comparison sheet of AWS's datastore
offerings, with bite-sized anecdotes that hint at a valuable usecase. It
doesn't make a particularly strong case for the necessities, justifications,
and tradeoffs of each particular type of paradigm or concrete offering, but is
instead a brief content marketing piece with just enough practicality to
justify its existence and make its point.

~~~
unclebucknasty
What's weird is that we create a tech (say, NoSQL) to address the shortcomings
of another tech (say, SQL). We then call it a win and give it this strange
grace period, during which we don't honestly assess the shortcomings of the
_new_ tech.

Worse, we actually declare the _good_ parts (say, ACID transactions) of the
old tech unnecessary and even dub it a feature that they are absent in the new
tech.

Finally, the grace period expires and everyone seems to get the memo at the
same time. The response is then "This is nuts. What the hell are we doing?"

The old tech is then restored to its rightful place and it's on to the next
tech.

~~~
CaptainZapp
The ironic part to me about the NoSQL crowd was:

 _SQL Bad! SQL Slow! SQL terrible overhead!_

SQL is just an abstraction layer. What makes relational databases "slow" is
actually the overhead of implementing ACID and implicitly guaranteeing
consistency and integrity _at all times_.

When Facebook loses a dozen status updates, that will piss off a few users.
When a bank loses a single transaction then, depending on the value and
importance of that transaction (e.g. maintaining a position in FX, with deals
that can go into the billions), it can literally kill the bank.

Besides: relational databases (which I'd wager are still the backbone of > 90%
of businesses) are definitely not slow. And very few companies have insane
requirements like those of Google or Facebook.

Edit: Wrong word that didn't make sense

~~~
zzzcpan
Banks do lose transactions and make plenty of other mistakes that result in
the sudden appearance or disappearance of money from people's accounts. They
are also eventually consistent, somewhat strongly. And they are the ones who
could in fact drop ACID pretty much completely in favor of proper strong
eventual consistency with CRDTs and the like, with all that "consistency and
integrity at all times".

What makes things slow with interactive transactions in distributed
environments is the explicit tradeoff of latency and availability for
consistency. It just can't be done in bounded time [1]. But that's not the
reason traditional databases are slow: they do not actually make this tradeoff
in most deployments and assume you are OK losing some consistency due to
network problems (yeah, so much for those ACID guarantees - you really need
distributed algorithms to guarantee consistency if you talk to a database over
an asynchronous network, like Ethernet).

They are slow because, for performance, SQL is just a bad leaky abstraction,
and arguing that it can be fast is like arguing for a sufficiently smart
compiler [2] that can turn your high-level abstraction into the fastest
possible opcode.

[1]
[https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf](https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf)

[2]
[http://wiki.c2.com/?SufficientlySmartCompiler](http://wiki.c2.com/?SufficientlySmartCompiler)

------
DoubleGlazing
Having so many database choices is something of an embarrassment of riches. So
many options, but also so much stress for the software architect who hopes to
make the right choices. It's impossible to be familiar enough with each option
to anticipate future issues that may arise because of a missing piece of
functionality or some other strange quirk - something made worse when you go
down the proprietary route.

I prefer to stick with one single reference data store, be it MySQL, Postgres,
SQL Server or whatever. If we need to add some kind of search optimised
database or big data analysis database they should replicate from the
reference data store and be treated as non-mission critical add-ons.

~~~
sokoloff
There are cases where that strategy doesn't work effectively (IMO). Consider a
social graph or purchase history relationships in a graph database. You
_could_ restrict your engineering team to only make entries in a (relational)
reference data store and then express those relationships in a graph DB for
querying.

That is not likely to give you the same pace of development as a competitor
who chooses to use a graph DB natively while your team goes through a labor-
intensive translation or impedance-matching process for each change. If
they're able to outpace you in development/innovation, you are more likely to
lose in the market.

~~~
golergka
> If they're able to outpace you in development/innovation, you are more
> likely to lose in the market.

These things always depend on business requirements. What a relational
database offers, above all else, is data consistency: used properly, it's an
incredibly useful safety net that prevents errors that corrupt users' data.
Transactions, and especially a proper schema, can catch a lot of bugs that
could otherwise wreak havoc.

So, it's a trade-off: how bad, from a business perspective, would some data
corruption be? If you're Tinder, mixing up profile information or deleting a
match is probably not that bad. If you're dealing with other people's money,
on the other hand, it could destroy your entire business.

~~~
diroussel
Do relational databases really give you consistency? They can, if you are
careful about how you write your app. And many people don't use serialisable
transactions because of the implications.

Also, relational databases enforce their version of consistency. What if I
only want consistency within one customer's data, but not across all
customers?

I could have two inserts for operations on two different customers fail
because of a deadlock on page splits in a secondary index. I didn't need that
consistency, but the database didn't know that.

------
andrewstuart
I've tried lots and lots of databases of all sorts.

I always hit some sort of limitation and come back to Postgres.

One day I'll just stop betting against Postgres, but I really do want to give
some of those graph databases a try just for fun.

~~~
WA
I mostly used MySQL. But for my app, I now use CouchDB. It has its
limitations, but the automatic sync between devices is pure gold. I haven't
written a single line of sync code and _it just works_.

It baffles me a bit that CouchDB isn't more popular and that people rather use
MongoDB for NoSQL, which doesn't come with the same syncing capabilities.

~~~
StavrosK
> It baffles me a bit that CouchDB isn't more popular

Agreed, it should be.

> and that people rather use MongoDB for NoSQL

I don't see why people would rather use MongoDB for anything, but that's just
me.

> which doesn't come with the same syncing capabilities.

This is the main reason. CouchDB is amazing if you need to sync, but, if you
don't, you may be better off using something else (like Postgres).

How does CouchDB handle conflicts, by the way?

~~~
xj9
[http://docs.couchdb.org/en/2.0.0/replication/conflicts.html](http://docs.couchdb.org/en/2.0.0/replication/conflicts.html)
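Per the docs linked above, CouchDB's winner selection is deterministic: among conflicting leaf revisions, the one with the longer revision history wins, with ties broken by comparing the rev strings, so every replica independently picks the same winner without coordination (losing revisions remain retrievable as conflicts until resolved). A rough sketch of that selection rule:

```python
# Sketch of CouchDB-style deterministic conflict winner selection.
# A rev looks like "3-a1b2c3": generation number, dash, content hash.
# Longer history (higher generation) wins; ties fall back to comparing
# the hash suffix, so all replicas agree without talking to each other.
def pick_winner(leaf_revs):
    def key(rev):
        gen, _, suffix = rev.partition("-")
        return (int(gen), suffix)
    return max(leaf_revs, key=key)

winner = pick_winner(["2-aaa", "3-bbb", "3-abc"])
print(winner)  # 3-bbb
```

This is only the automatic default; applications are expected to detect conflicts (`conflicts=true`) and merge properly when the arbitrary-but-deterministic winner isn't acceptable.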

------
capkutay
What's old is new again. Michael Stonebraker had a paper about this in 2007
[0]. Since then, a slew of products came onto the market claiming to do it
all: JSON, SQL, ACID transactions, streaming, time-series, full-text search,
batch processing, ETL, etc. - things that don't even fit in the same bucket.
For a while, companies were positioning Hadoop to do all of it.

Seems like with every technology hype cycle, the same reminders need to be
written and circulated again.

0:
[https://cs.brown.edu/~ugur/fits_all.pdf](https://cs.brown.edu/~ugur/fits_all.pdf)

~~~
gaius
_a slew of products came into the market that claimed to do it all. JSON, SQL,
ACID transactions, Streaming, time-series, Full-text search, batch processing,
ETL etc_

Stonebraker’s own Postgres does all of that.

The article is just an infomercial for Amazon’s (expensive, proprietary)
offerings.

------
jchanimal
I wrote this blog espousing the opposite view, that you are better off with a
unified database, here: [https://blog.fauna.com/unifying-relational-document-
graph-an...](https://blog.fauna.com/unifying-relational-document-graph-and-
temporal-data-models)

And followed up about multi-cloud here: [https://blog.fauna.com/survive-cloud-
vendor-crashes-with-net...](https://blog.fauna.com/survive-cloud-vendor-
crashes-with-netlify-and-faunadb)

------
mamcx
Sadly, it argues against "relational databases" when it means "RDBMS
implementations".

The relational model is incredibly powerful. _Maybe_ graphs need something
seriously specialized (and indexes, of course), but the relational model does
fine for:

\- Relational \- Key-value \- Document \- In-memory \- Search

In fact, an index (which is the thing that needs to fit the use case) can be
modeled as key-value, and so as relational.
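A toy illustration of that point: a secondary index is just a key-value map from column value to row ids, and such a map is itself expressible as a relation (key, row id). The table and values below are made up.

```python
# Hypothetical relation: row id -> tuple of attributes.
rows = {
    1: {"name": "ada", "city": "london"},
    2: {"name": "bob", "city": "paris"},
    3: {"name": "cyd", "city": "london"},
}

# A secondary index on "city" is just a key-value structure:
# city -> set of row ids. Equivalently, a two-column relation (city, id).
index = {}
for rid, row in rows.items():
    index.setdefault(row["city"], set()).add(rid)

print(sorted(index["london"]))  # [1, 3]
```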

Relational won against NoSQL in the past because it is MORE powerful and
flexible.

\---

What needs to change is the specifics of the implementations. And even then,
relational adapts fairly easily.

In fact, just look at how many stores add an SQL layer on top. Ironically, SQL
is what needs a makeover, not the relational model...

~~~
qop
I think there are times when the graph model shines.

[https://theburningmonk.com/2015/04/modelling-game-economy-
wi...](https://theburningmonk.com/2015/04/modelling-game-economy-with-neo4j/)

I read this article I found on HN a few years ago, and for some reason it came
to my mind just now.

The neat thing about the graph model, of course, is that it's more immediately
expressive for the person writing the queries, which takes cognitive load off
them.

I'll never say a bad thing about SQL or the relational model. I completely
agree that it's superior in terms of flexibility, performance (almost always),
modelling (usually), etc., but there are times when you have a weird project
and it's just more convenient and easier to write a few Cypher queries against
Neo4j and be done with it.

In the case of the article I linked, there is also the benefit of having a
convenient way to see how changes echo throughout the graph, something that is
more involved to do with tables and joins and all that. I'm not a DBA, I've
really only ever dabbled, but if I got dropped into that situation and DIDN'T
have a graph model, I'd have no clue where to even start. Maybe I'd start
trying to walk tables and build graphs out from each item type to each other
item type. That'd be gross.
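That "walk out from each item type" idea is essentially graph reachability, which a breadth-first search expresses directly - the kind of query a variable-length Cypher match would run natively. A sketch with a made-up game-economy dependency graph (all node names are hypothetical):

```python
from collections import deque

# Hypothetical game-economy graph: an edge u -> v means a change to u can
# affect v. A breadth-first walk finds everything a change "echoes" to.
edges = {
    "iron_ore": ["iron_bar"],
    "iron_bar": ["sword", "armor"],
    "sword": ["quest_reward"],
    "armor": [],
    "quest_reward": [],
}

def affected_by(start):
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(affected_by("iron_ore")))
# ['armor', 'iron_bar', 'quest_reward', 'sword']
```

Doing the same over normalized tables means a recursive self-join (e.g. SQL's `WITH RECURSIVE`), which works but is exactly the "more involved" path the comment describes.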

Of course, that's probably not a problem scenario that comes up very often.
Most of the times I've interacted with databases, I've been dealing with
sanitizing params, getting HTTPS configured - just boring infosec stuff. That
problem is probably very rare in the realm of "problems where graph databases
seem to have an advantage".

Anyways, yeah. Relational, awesome. Graphs, sometimes convenient.

------
dagenix
I was expecting some sort of interesting discussion of different databases.
What I got was an ad for Amazon's products. And I found it quite off-putting
that the post seemed to go way out of its way to avoid even acknowledging
databases not being sold as Amazon services.

------
phn
Well, it generally fits until it doesn't, and that's OK.

If your scale is small enough, fiddling with specialized solutions is just
premature optimization.

PostgreSQL will take you a long way before you need to look somewhere else.

~~~
zzzcpan
High availability is probably the most common case where you need to look
somewhere else. And it's a serious distributed-systems problem that requires
pretty big architectural and cultural changes; you can't just optimize for it
later.

~~~
phn
Yep, but you can "buy" HA from any DB provider (Amazon RDS, compose, etc.)
until you're big enough to dedicate resources to solve that particular
problem.

My comment doesn't necessarily contradict the article. I only mean to point
out that a general-purpose "non-optimized" thing still fits a whole lot of
companies out there, depending on their scale.

EDIT: Although I concede that yeah some DB tech makes it easier to configure
HA systems, if you can live with the downsides.

------
WaxProlix
Is there a matrix out there with the various needs that one might have of a
DB/storage solution (consistency, throughput, sharding, replication, etc) with
products associated with each of those sets of requirements? I feel like for a
lot of what I do on a daily basis I _do_ end up using the same old one-size-
fits-all approaches, and as those scale we feel the pain -- but the product
search and vetting process is tough and time consuming to even begin. Would be
nice to be able to narrow options down to a smaller search space at least.

------
ianamartin
One thing that strikes me about this article is the examples of companies that
move from one system to another (whatever that move entails) and achieve big
performance wins.

We see long articles about this very often here. Sometimes it's moving from
relational to non, sometimes it's moving back. But what really throws me for a
loop is that so many of these sites are so mind-bogglingly, face-punchingly
_slow_. All of Atlassian's web properties are stupidly slow. AirBNB (mentioned
in this particular article) also--painfully slow. Github--slow. Reddit--
somewhat slow almost always. Twitter--slow. Facebook is usually the exception.

I realize that sounds like maybe I'm just on a slow connection, but I'm not. I
don't know where the time is being eaten up if it's not in the database. But I
feel bad for all the people working on db performance when the end result is
so bad.

I don't understand how websites whose only purpose in life is to serve text
and images as fast as possible can be so slow and that the companies that make
them can find this acceptable. No database technology in the world can make up
for bad product decisions and companies that don't consider speed a feature.

------
donw
Every use case listed in this article follows the same nominal pattern:

(1) Start with a relational database.

(2) Build product until you have a deep understanding of your market and their
needs.

(3) Move to a specialized data store.

For established products, or well-researched use-cases, sure, pick a
specialized data store. It will probably serve your needs better.

If you don't yet have product-market fit, use a relational database. It will
give you far greater flexibility when you discover that "what we thought the
market wanted" is different from "what actually made the right numbers go in
the right direction".

~~~
et1337
Perfect example is that article about the Facebook messenger database
migration earlier today. Messenger transitioned from an email-like system to
an instant messenger.

------
techno_modus
Michael Stonebraker, Uğur Çetintemel, "One Size Fits All": An Idea Whose Time
Has Come and Gone:
[https://cs.brown.edu/~ugur/fits_all.pdf](https://cs.brown.edu/~ugur/fits_all.pdf)

------
HappyFapMachine
Oh, wow, another advertisement. I don't know why, but that makes me instantly
disregard any meaningful point OP wanted to make...

~~~
amelius
[https://en.wikipedia.org/wiki/Ad_hominem](https://en.wikipedia.org/wiki/Ad_hominem)

> Ad hominem (Latin for "to the man" or "to the person"), short for argumentum
> ad hominem, is a fallacious argumentative strategy whereby genuine
> discussion of the topic at hand is avoided by instead attacking the
> character, motive, or other attribute of the person making the argument, or
> persons associated with the argument, rather than attacking the substance of
> the argument itself.

~~~
HappyFapMachine
Not applicable as a fallacy here; questioning the style and the motive is
reasonable with a tech blog post mixed with advertising.

I might have added "for me" at the very end, so it would not sound general and
was only describing my personal vibe. If this was a very interesting article
about the importance of hydration and was full of references to coca cola
products, I would feel the same.

------
amelius
Does anybody have a complete list of possible requirements/features for
databases?

------
mmckeen
It never fit.....

