

Ask HN: What made SQL databases so popular? - jmilinion

I propose a persistent storage solution for ANY idea to a traditional technical community.  Immediately, a SQL relational database is proposed as the solution.  I suggest other solutions.  I'm sad now because everyone is telling me that I MUST implement it in SQL otherwise I'm wasting my time.  According to them, once SQL stops handling the solution, then can I look at the other "eccentric" solutions.<p>I'm trying to figure out how did SQL get to be the de facto solution to every single persistent data problem out there?  Data can be stored in so many ways but a SQL relational database is pushed by so many people that their voice can be overwhelming at times.  Why is this?<p>Added Note: I feel sad for the SQL language since it is one of the few ways many programmers learn about declarative programming.  With that being said, this SQL language is different from the more commonly used form of "SQL" which is the nickname people call the well known SQL "relational" databases
======
buster
Quite honestly i often do not understand the extreme focus HN puts on NoSQL
databases, as if it would be the new solution that fits every problem. The
point is that SQL solves a lot of problems rather well and every NoSQL
database has its own drawbacks. In the end you have a lot of NoSQL databases
that solve some specific problems particularly well but have major drawbacks
in other situations. SQL databases tend to fit a lot of solutions quite well,
they are tested, stable, do transactions, can even scale and (for me) most
important: SQL is a standard, the query language is well understood and
documented, it is relatively easy to switch between SQL databases. But it's
for practical purposes impossible to switch between a given set of NoSQL
databases, let alone you need to learn a lot of new things or need another set
of developers. Recent example in my project: I had a problem that did fit
particularly well into Redis (small, self contained writes) but i also needed
good querying capabilites. So after fiddling around with Redis and trying to
make it work i had to abandon Redis. Next choice would have been: MongoDB or
MySQL. The project already uses a MySQL server and i am easily able to replace
MySQL with SQLite, Postgres or Oracle by using an ORM (SQLAlchemy). So the
choice was easy. MongoDB would have been the worse choice but a possibility.

I'd wish someone would come up with a common query language that could be
reused on different NoSQL databases, that would make NoSQL much more an
option. UnSQL[1] looks abandoned :(

So after all: Every problem has its solutions, every program a good fit of a
database. Either NoSQL or SQL, don't just point to some NoSQL database because
it is the current buzzword, but ONLY because it can solve a problem better
then a given SQL database. Then it doesn't matter if you call it SQL or NoSQL.
Your NoSQL solution simply needs to redeem the benefits of SQL (standardized
query language, transactions) with other benefits.

[1] <http://www.unqlspec.org/display/UnQL/Home>

~~~
jpalomaki
I think the same applies to quite many new innovations on our field. They look
interesting, because they make things simple. Things that used to be very
complicated.

The problem is that in some cases there are reasons why the things are
complicated. Quite often it is easy to dismiss these issues during the initial
crush with the new technology. "Sure I can live without transactions, why
would I need to enforce schema for my objects, who needs type information in
language"

Once the initial crush is over, people start slowly adding features to make
the shiny-new technology look more like the old stuff. Enforce JSON schema for
Mongo, add lightweight type information for Javascript functions etc. Usually
the thinking goes we are just adding these few good bits of structure from the
Old Technology, surely we are not going to replicate the bloat and complexity.

------
justinsb
"Democracy is the worst form of government except all those other forms that
have been tried from time to time."

Relational DBs & SQL aren't perfect, but they've proven better than 30 years
of attempts to replace them. I think the burden is (rightly) on you to explain
why your alternative is better. There are applications where other approaches
are appropriate (e.g. web search); if you can't explain why something is
better solved using a non-relational system, then perhaps it isn't.

------
indymike
Short answer: SQL databases were easier to use for developers and non-
developers than other client server relational database systems in the late
80s and early 90s (dbase derivatives, btrieve and ctree were common then).
Tools like Btrieve and ctree were largely libraries, and really didn't have
the ad-hoc query capability that SQL has. Dbase suffered from incompatible
implementations... so an application written on one Dbase clone wouldn't work
exactly right in another. Old school client-server databases like Btrieve and
ctree were extremely tightly coupled, and there was little separation of
concerns between storage and logic. Lots of messy, messy code... and writing
the storage and retrieval parts of your application were non-trivial. Since
end users had no way to do their own queries (outside Dbase), most reporting
was written in C, pascal or even BASIC. There were a few reporting tools like
Crystal Reports that tried to solve the reporting problem, but it was often
just as easy to roll the report in C...

When the internet took off, you had MS-SQL server on Windows and several open
source SQL servers on Linux (MySQL, mSQL and Postgres) that freed web
developers from having to write lots of custom data storage/retrieval code. A
singe SELECT or UPDATE command did what 150-200 lines of C could do. It also
meant that web apps could go to market without advanced reporting capabilities
or CRUD for detailed configurations because you could always just roll an SQL
query to deal with that later.

SO, here is what you are up against in selling noSQL vs SQL:

* Old developers who lived through the 80s and 90s database hell and see SQL as the thing that freed us.

* An extremely mature toolchain vs. new applications without the toys needed by non-developers (reporting, ad-hoc query for end users, integration with spreadsheets, etc...)

* Familiarity with the query language. Lots of devs know SQL

* Risk of something new.

* Fear of having your job turn into writing reports constantly because no one else in the company can.

SO if you want to be persuasive in selling a noSQL solution:

* Define the problem clearly and be prepared to show why Mongo, Couch or whatever is the right solution.

* Avoid arguing that SQL isn't relational. You might be right, but Oracle, IBM, Microsoft, Ashton-Tate, Sybase, Progress and everyone else with an SQL product has outspent you on marketing.

* Be ready to deal with questions about reports, ad-hoc queries, stored procedures and security. Each of these issues is solved and documented in mature SQL based systems.

* Giving Sally in accounting access to a unix command line is not going to be as well recieved as giving her access to Excel, Access, MS SQL Management Studio or Navicat.

------
omarqureshi
In no particular order:

1\. Transactions are not implemented in most of the "eccentric" solutions

2\. SQL solutions are more battle tested.

3\. The relational model fits most solutions quite well

Where the relational model doesn't fit or transactions are not required, then
it may be worth looking at other solutions.

Polyglot persistence is also another moving part too, it needs time, expertise
and infrastructure to deal with, so even if a non relational model makes sense
and there is currently no need for transactions. Who knows when transactions
would be a good idea?

~~~
duaneb
If it doesn't have transactions it's not a database.

------
antirez
SQL is popular because for many classes of problems it _is_ awesome and uses a
very powerful abstraction (tables) and query language. Most implementations
are also very robust and able to provide very interesting guarantees (like
ACID).

The only problem of SQL was that the computer industry was trying to solve
everything with SQL databases, that is not a good idea.

It's good that there are other databases now, but I don't think SQL is going
to face away. For instance many NoSQL databases like MongoDB, RethinkDB, may
well be SQL databases with different implementation tradeoffs but just picked
a different query language because... well because... I don't know why
honestly. Not that's a problem but the model is very very similar to objects
with fields (tables) I can query.

------
benologist
It's the de facto solution because for most use cases it is a valid solution
and for many it is the best solution.

If it wasn't developers would leave it in the past like any other obsolete
technology. _Especially_ today technology competes on merit before marketing
or anything else.

------
emin_gun_sirer
My group developed the latest, fastest NoSQL system on the scene
(<http://hyperdex.org>), one that provides ACID transactions as well as a ton
of other exciting features like scalability, consistency and fault-tolerance.

Yet when someone asks this question, I usually respond with "try a traditional
RDBMS first."

The simple reason for this is that there is additional, implicit information
in the question: the fact that someone is asking it indicates that they are
not up-to-date on one of the most exciting revolutions taking place in data
management. Such users will require extensive hand-holding, and there are a
lot of SQL-related resources out there that can help them with this.

But there is a twist: what people are doing is kind of like how some religions
have to turn you away before you become a true convert. Will an RDBMS user be
completely happy? Perhaps, but if the application is at all demanding, they
will wake up in the middle of the night because of a wedged DB, running out of
internal table space. Or have to frantically hire DBAs to "optimize" their
queries. Or hand money hand over fist to Oracle. Or discover that an AWK
script is faster than their multi-million dollar RDBMS installation.

So it's not like people are turning you away from NoSQL altogether -- you will
end up with a NoSQL system eventually, they're just making sure that you do
this for the right reasons.

------
davidad_
I recommend this book: <http://amzn.to/13NgjU9> (affiliate link), which I
bought for inspiration on how to build my own database from scratch and
finally put down, after reading several chapters, with a reasoned appreciation
for the way things are.

To simplify, the SQL data model exquisitely balances:

* expressivity - queries involving both GROUP BYs and subqueries, which I've needed more than once, are challenging at best to translate into Mongo's query model, which is one of the most expressive outside SQL

* speed - as long as your query can run on a single machine, SQL query planners are the best, period. Other models tend to be more horizontally scalable, but this was not a priority for most of database history, nor for most real-world situations.

* compactness - the on-disk overhead of a SQL database is fairly limited, about 40-50% in practice. Obviously, a flat file has even less overhead (<10%, generally), but some non-SQL systems consume storage willy-nilly (it's not uncommon to see XML overheads surpass 500%, and Mongo overhead hovers around 90-100%).

* robustness - in the SQL standard, "undefined behavior" is kept to an absolute minimum. In particular, transactions are invaluable anti-Heisenbugs, but cascading rules, sanity constraints, and fixed schemas also reduce production gotchas (at the cost of development agility, of course).

~~~
twic
If i think about what scares me most about using a noSQL database, it's the
loss of expressivity.

I believe noSQL databases can be fast, compact, and well-behaved. But SQL is a
phenomenal query language; it anticipates Alan Kay's (alleged) maxim that
"simple things should be simple, complex things should be possible". Not only
does it make complex things possible, but modern query planners make a decent
fist of making them fast, too. NoSQL databases either don't support rich
queries at all (eg Riak, which is in all other respects rather wonderful), or
have substantially less power (eg Mongo). Even databases which are queried
with map-reduce operations using functions written in real programming
languages (eg CouchDB) are lacking compared to SQL, due to the lack of
anything like joins, subqueries and so on.

(The only kinds of noSQL databases that have something approaching SQL's power
are graph databases (eg neo4j, whose Cypher query language is really rather
nice), some object databases (eg ObjectDB, which supports JPQL), and some of
the pre-SQL options like Pick databases. However, these are rarely the ones
which get the attention, and it's notable that their data models are actually
not far removed from the relational model.)

The reason this flexibility matters is because it supports change in the
future. I might be storing and querying data for some purpose now, but it's
possible that a year down the line, i will want to do something entirely
different with it - usually yawn-inducing things like reporting and auditing,
sometimes just implementing new features that look at the data in a different
way. The expressiveness of SQL lets me be confident that i will be able to do
that with my relational database. With a noSQL solution, i run a substantial
risk of not being able to do it, and so being forced to export my data into
another store to do it, with all the headaches that that entails.

------
atsaloli
You may find
[http://en.wikipedia.org/wiki/Database_management_system#Hist...](http://en.wikipedia.org/wiki/Database_management_system#History)
informative

------
RyanZAG
Here are some good reasons for it:
[http://stackoverflow.com/questions/3756100/how-did-sql-
becom...](http://stackoverflow.com/questions/3756100/how-did-sql-become-the-
dominant-database-language)

In particular, you can read this book:
[http://www.nap.edu/openbook.php?record_id=6323&page=159](http://www.nap.edu/openbook.php?record_id=6323&page=159)

------
jamespitts
Relational databases provide a _framework_ for structuring data and retrieval.
And it is an approach informed by decades of practical experience and, well,
plain mathematics.

Would you try to program a web application without a framework? Most of us do,
at least once or twice in our career, not thinking about the next guy who
needs to maintain our POS spaghetti. Perhaps if you're very bright or
experienced you could do a good job structuring a web app from scratch, but in
effect you would writing your own framework. Still it makes sense to depend on
and learn from the great web frameworks -- rails, django, etc. -- so you can
think about getting your real work done.

Same goes for sql. It is structured to keep us from being an idiot.

Stand on the shoulders of giants.

~~~
VLM
Your analogy with a web framework is a good one, but it extends beyond that
into growth of application complexity.

True, my giant sprawling complex web application appears that it should have
been done in a complex framework, little does anyone know it started out 5
years ago as a single perl CGI formatting the output of "df" on a webpage...

A similar thing can happen with non-relational databases... are you willing to
bet your entire dev budget and/or the company on NEVER EVER needing features
in the future? Sometimes thats OK, usually not.

------
glimmung
You seem to be asking this question in a vacuum, which makes it impossible to
address the costs and benefits of relational vs. alternative approaches with
any context. What types of problem are you trying to solve? What alternatives
to an RDBMS are you proposing to solve them with?

Having said that, I write transactional business applications (often
dismissively referred to as "CRUD Apps), and in that context an RDBMS is a
sensible default, but not for every part - so on Google App Engine for example
I'm using CloudSQL for the business data, but Google's data store for the
session data - because that has very different characteristics.

[1] I'm reading "SQL" as shorthand for relational, although the two are hardly
synonymous.

------
michaelochurch
Relational databases have a sound mathematical model. You may not think of it
this way, but a table can be seen as a _predicate_ in logical terms. You can
frame most logical queries as a chaining of predicates and functions, and you
can create arbitrary (finite) predicates using tables. That's powerful.

NoSQL came into style with the web, in which you have large data sets but
don't need most of the relational features. You might have a trillion Wuphfs,
but you'll never need ad-hoc queries because you're doing mostly CRUD on the
Wuphfs. Sharded SQL is a painful way to deal with that problem space. NoSQL
systems are often better adapted to it. NoSQL is almost always _less powerful_
abstractly but easier to scale.

The major problem that I've seen with NoSQL systems is that they tend not to
handle the edge cases very well. Remember the old joke about the man who goes
to the doctor and says, "It hurts when I do this", so the doctor says "Don't
do that". Sometimes, you have to "do that". Many NoSQL products are also quite
low-level, requiring that more traditional database functionality be written
in client code. That gets nasty-- slow, flimsy, hard to coordinate. You don't
want, for example, clients to be responsible for generating sequential unique
IDs because the clients all have to agree on what the rules are, in addition
to generating a lot of communication overhead each time a new block of IDs is
requested.

Furthermore, if you're using NoSQL to build something that will be marketed as
a database product (e.g. within your company) and are exposed to requirement
accretion, you'll find that the requirements are often inspired by
expectations derived from relational models: ACID transactions being the big
one.

I am not saying that SQL is the best for all problems, but it is a damn good
technology. Personally, I think that a lot of SQL hate comes from the
limitations of MySQL. If that's the source of your headaches, look into
Postgres. Postgres is surprising in how modern and powerful it is.

~~~
emin_gun_sirer
>NoSQL is almost always less powerful abstractly but easier to scale.

This is no longer true. We now have second-generation NoSQL data stores that
provide strong consistency, fault-tolerance and ACID transactions, based on
the sharded NoSQL architecture that scales easily.

~~~
ucee054
Just out of interest, how are your opinions on the superiority of NoSQL
systems received by

<http://www.cs.cornell.edu/johannes/>

and

<http://www.cs.cornell.edu/annual_report/99-00/Demers.htm>

?

~~~
emin_gun_sirer
All I can say is "some of my best friends have worked on RDBMSs." I don't hold
it against them. ;-)

More seriously: Al, Johannes, and many other notable researchers in the DB
space have complex, nuanced opinions on NoSQL systems. And these opinions are
being revised daily as new information comes in; we've seen some researchers
go from decrying the whole NoSQL movement to becoming some of its fiercest
proponents, seemingly overnight. I would do them disservice if I were to try
to present them here.

So I'd encourage you to ask them directly. Or perhaps we could have a panel at
a conference. That'd be fun if done right.

------
drucken
Nothing else offered Codd's relational model of:

0\. simple user commands

1\. data independence (from hardware and software implementation)

2\. automatic navigation (instead of single record access).

The rest was competitive history with the giant of the time, IBM, who created
and used SQL in their relational System R product. SQL variants rapidly became
commercial successes, with the earliest external adoption and most notable
being that by Oracle.

In time, SQL became a standard and, as a relatively simple declarative
language (the HTML of data if you will), it is easy to learn and widespread.

------
7952
This is just a personal opinion. But as apps scale you run into concurrency
problems. You want to maintain a reliable consistent state but due to latency
and resource limits that is difficult to do within your own code. So the
responsibility is moved onto another system designed for that purpose. NoSQL
simply solves the same problem in a slightly different way. The consistent
state is allowed to be abstract and doesn't need to be transactional.

The real dichotomy is between handling state yourself in code and letting
outside software do it for you.

------
TheZenPsycho
Whoah whoah whoah. SQL IS NOT RELATIONAL. Let's put that to rest right now.
You can have relational databases if you like, but put SQL in front of it and
now you don't have a relational database system anymore. you have an SQL
database. This is because SQL actually violates a few of CODD's axioms for
what constitutes a "relational" database. The tragedy of NoSQL is really the
setting up of this dichotomy between "SQL/Relational" and
"NoSQL/NotRelational" which ignores the possibility that "SQL CAN'T BE
RELATIONAL" and "Relational with no SQL"

~~~
twic
I don't believe this is correct. SQL certainly includes some things which
don't exist in the pure relational model (LIMIT, stored procedures, array
types), but i am not aware of anything in SQL which prevents you using a pure
relational model. SQL is not a pure implementation of the relational model,
but it's close enough that is's reasonable to talk about SQL being relational.

~~~
TheZenPsycho
1\. The existence of a null value (3-value-logic) 2\. permitting duplicate
rows 3\. permitting duplicate columns. 4\. "outer" joins.

and.. other things, that you can enforce in your client application, but then
that rather defeats the whole point of having a "relational database
management system". you might as well just have a dumb key-value store, if you
have to enforce so much of what a "relational database" is in your app.

note that not even the SQL standards use the word "relational". anywhere.
Years ago I couldn't believe this, but go ahead and look. It's not there.

------
xd
The majority of HN won't like me for saying it .. but the major factor in
NoSQLs "popularity" is not needing to learn SQL. Which is a crying shame, as
it's one of the corner stones of computer science.

------
arvit
Relational DBs work for processing of business transactions -- when you need
to get the amount of money and all the other details exactly right, when you
need to retrieve the right record according to a variety of criteria which may
come from other types of data, and when you need to be sure the info is
inserted in exactly the right format.

For this, there are DB transactions, SQL SELECT with joins, and schemas.

Much of this is possible in other technologies, but RDB does it well.

------
zobzu
Simple. Same reason people code in C or Python or Bash. And not C#, Ruby, ZSH,
or much worse: their own implementation or whatever: It standard. Everyone
knows how it works. It's proven.

So unless it's really not good enough, diversity for the sake of diversity is
the opposite of being efficient.

Hence, you're asking the wrong question. In fact, you shouldn't ask a
question. Show or prove your solution brings a REAL benefit. If you can't,
you're indeed wasting your time.

~~~
IsTom
Perhaps it's standard, but implementations vary so wildly that I doubt this is
of any importance. MySQL vs PostgreSQL vs Oracle, it's just madness.

------
manidoraisamy
Tools. All data processing tools right from analytics to integration need to
work with the datastore. RDBMS has accumulated these tools for so many years.
It is not a level playing field for NoSQL to beat that ecosystem.

------
amrnt
Direct answer, SQL is the only thing that learnt at schools.

------
ExpiredLink
\- Codd's relational model

\- ACID

\- SQL

\- Chen's ERM

------
rbanffy
What exactly are you trying to store? Depending on what you are persisting,
SQL may make perfect sense.

------
VLM
Amazing no one's quoted the inner platform effect.

<http://en.wikipedia.org/wiki/Inner-platform_effect>

Hmm, SQL query too slow takes 10 seconds for one query. I know, I can use a
non-SQL solution. Each query will take 100 milliseconds. Whoops turns out the
users need all the relational "stuff" for my reporting and also in my CRUD to
keep it consistent enough. I mean, everybody knows databases are consistent
and codd normal form and relational, and nosql is a database, therefore when
the users spec a database they'll be fine either way? Well no big deal I'll
just write my own JOIN. After all, I'm as good of a programmer as the team at
Oracle who's been working on JOINs since before I was born, so I'm sure a
couple cans of red bull and I'll have something better. Whoops, each "nosql"
query does in fact take 100 milliseconds but now I need to run 1e6 of them and
shove them into my homemade JOIN routine, so I've now "improved" my total
overall system time from 10 seconds to 10 kiloseconds. Whoops.

I've never been able to implement a nosql solution at work because I always
get stuck in inner-platform tar pit, or massive consistency issues making the
CRUD a nightmare, etc. Usually if you complain about that in the nosql
community you get pretty intense complaints that your business is doin' it
wrong, lol at a paying gig thats not going to work very well... I'd like to
try something new but I haven't had the business case for it yet, or rephrased
a deep enough analysis has so far always excluded nosql solutions and that
Might indicate a positive analysis isn't deep enough. Hear hooves, think
horses not zebras. I've bought the "seven databases in seven weeks" book and
its an excellent introduction unfortunately its something like a guided
introduction into smooshing up against the business requirements limits of
each DB as much as a technical intro.

Another point is the original request was "persistent storage" not why does
nosql not work most of the time. Back in the olden days we called that a
"filesystem" but lots of $5/mo webhosters don't allow that type of thing, but
they do give you mysql... Also its hard for the PHP coder types to not shoot
themselves in their foot with filesystem access, for example, allow the user
of the order.php form to store their picture in username.gif what could go
wrong other than lack of input sanitizing and some goofball is going to
register the username order and upload a "picture" with the filename order.php
that actually does something rather naughty, etc. Even if it works, next week,
the boss asks to store two image files now, one full size and one icon size
now. Hmm the filesystem soln doesn't scale... Rather than all that input
sanitation, wouldn't it be simpler to just dump it into a blob column in a
mysql table?

------
ucee054
"I'm sad now because"

Why would you be _sad_? SQL databases are just a tool, and happen to be the
_right_ tool. Why does that make you _sad_?

It's like saying "I wanted to eat soup with chopsticks and everyone told me to
use a spoon instead. That made me _sad_."

------
martinced
Woaw. HN, the place created by pg, who advocated trying to "beat the average"
and have "frighteningly ambitious startup ideas", etc.

Yet HN has become the place for the status-quo. Where people are going to vote
everything encouraging the status-quo (like articles about the benefits of C#
and SQL) as if it was the holy gospel...

 _"Immediately, a SQL relational database is proposed as the solution."_

For a start you're probably not trying to solve a problem involving big-data ;
)

Ask the big data players who are using _fast_ and _gigantic_ key/value stores
how did SQL work for them...

For smaller needs, then SQL offers, basically: ACID. Most companies do not
care about what SQL can really do: they'll wrap everything into ORMs and
perform business logic outside of SQL and in crazy models that aren't
relational at all and then store back everything into SQL.

What most business care about is ACID, not being truly relational (which SQL
DBs ain't anyway), not set theory.

The most saddening thing about traditional SQL DBs is that there's no basis
for queries: the same query shall return different results depending on when
the query is made (because most SQL DBs are "update in place" DBs / value-
oriented DBs).

In the Real-World [TM] this is _very_ problematic because when you can't
reproduce a problem you have to try to reproduce the _state_ the DB was in at
a particular time and query from there. It is a gigantic time waster.

In my view for really big data needs SQL simply doesn't cut it while for
smaller needs I still want ACID but I want something CRA ("create read append-
only"), not CRUD.

I have nothing against using something like Datomic (which is CRA) backed with
a SQL datastore.

But the "traditional" SQL DB that the herd --bent on never creating anything
disruptive-- is using? Not for me.

CRUD is dead baby. CRUD is dead.

There are, today, people who've read and understood pg's writings who are
using Lisp dialects (like Clojure) and CRA DBs (like Datomic, backed or not by
SQL) to gain a competitive edge.

 _This_ kind of stuff is the reason why I come to HN. Not to read why the
status-quo is acceptable (even if it's all most of the HN crowd knows).

