Hacker News new | comments | show | ask | jobs | submit login
Ask HN: What made SQL databases so popular?
29 points by jmilinion 1718 days ago | hide | past | web | 55 comments | favorite
I propose a persistent storage solution for ANY idea to a traditional technical community. Immediately, a SQL relational database is proposed as the solution. I suggest other solutions. I'm sad now because everyone is telling me that I MUST implement it in SQL otherwise I'm wasting my time. According to them, once SQL stops handling the solution, then can I look at the other "eccentric" solutions.

I'm trying to figure out how did SQL get to be the de facto solution to every single persistent data problem out there? Data can be stored in so many ways but a SQL relational database is pushed by so many people that their voice can be overwhelming at times. Why is this?

Added Note: I feel sad for the SQL language since it is one of the few ways many programmers learn about declarative programming. With that being said, this SQL language is different from the more commonly used form of "SQL" which is the nickname people call the well known SQL "relational" databases

Quite honestly i often do not understand the extreme focus HN puts on NoSQL databases, as if it would be the new solution that fits every problem. The point is that SQL solves a lot of problems rather well and every NoSQL database has its own drawbacks. In the end you have a lot of NoSQL databases that solve some specific problems particularly well but have major drawbacks in other situations. SQL databases tend to fit a lot of solutions quite well, they are tested, stable, do transactions, can even scale and (for me) most important: SQL is a standard, the query language is well understood and documented, it is relatively easy to switch between SQL databases. But it's for practical purposes impossible to switch between a given set of NoSQL databases, let alone you need to learn a lot of new things or need another set of developers. Recent example in my project: I had a problem that did fit particularly well into Redis (small, self contained writes) but i also needed good querying capabilites. So after fiddling around with Redis and trying to make it work i had to abandon Redis. Next choice would have been: MongoDB or MySQL. The project already uses a MySQL server and i am easily able to replace MySQL with SQLite, Postgres or Oracle by using an ORM (SQLAlchemy). So the choice was easy. MongoDB would have been the worse choice but a possibility.

I'd wish someone would come up with a common query language that could be reused on different NoSQL databases, that would make NoSQL much more an option. UnSQL[1] looks abandoned :(

So after all: Every problem has its solutions, every program a good fit of a database. Either NoSQL or SQL, don't just point to some NoSQL database because it is the current buzzword, but ONLY because it can solve a problem better then a given SQL database. Then it doesn't matter if you call it SQL or NoSQL. Your NoSQL solution simply needs to redeem the benefits of SQL (standardized query language, transactions) with other benefits.

[1] http://www.unqlspec.org/display/UnQL/Home

I think the same applies to quite many new innovations on our field. They look interesting, because they make things simple. Things that used to be very complicated.

The problem is that in some cases there are reasons why the things are complicated. Quite often it is easy to dismiss these issues during the initial crush with the new technology. "Sure I can live without transactions, why would I need to enforce schema for my objects, who needs type information in language"

Once the initial crush is over, people start slowly adding features to make the shiny-new technology look more like the old stuff. Enforce JSON schema for Mongo, add lightweight type information for Javascript functions etc. Usually the thinking goes we are just adding these few good bits of structure from the Old Technology, surely we are not going to replicate the bloat and complexity.

"Democracy is the worst form of government except all those other forms that have been tried from time to time."

Relational DBs & SQL aren't perfect, but they've proven better than 30 years of attempts to replace them. I think the burden is (rightly) on you to explain why your alternative is better. There are applications where other approaches are appropriate (e.g. web search); if you can't explain why something is better solved using a non-relational system, then perhaps it isn't.

In no particular order:

1. Transactions are not implemented in most of the "eccentric" solutions

2. SQL solutions are more battle tested.

3. The relational model fits most solutions quite well

Where the relational model doesn't fit or transactions are not required, then it may be worth looking at other solutions.

Polyglot persistence is also another moving part too, it needs time, expertise and infrastructure to deal with, so even if a non relational model makes sense and there is currently no need for transactions. Who knows when transactions would be a good idea?

If it doesn't have transactions it's not a database.

SQL is popular because for many classes of problems it is awesome and uses a very powerful abstraction (tables) and query language. Most implementations are also very robust and able to provide very interesting guarantees (like ACID).

The only problem of SQL was that the computer industry was trying to solve everything with SQL databases, that is not a good idea.

It's good that there are other databases now, but I don't think SQL is going to face away. For instance many NoSQL databases like MongoDB, RethinkDB, may well be SQL databases with different implementation tradeoffs but just picked a different query language because... well because... I don't know why honestly. Not that's a problem but the model is very very similar to objects with fields (tables) I can query.

It's the de facto solution because for most use cases it is a valid solution and for many it is the best solution.

If it wasn't developers would leave it in the past like any other obsolete technology. Especially today technology competes on merit before marketing or anything else.

My group developed the latest, fastest NoSQL system on the scene (http://hyperdex.org), one that provides ACID transactions as well as a ton of other exciting features like scalability, consistency and fault-tolerance.

Yet when someone asks this question, I usually respond with "try a traditional RDBMS first."

The simple reason for this is that there is additional, implicit information in the question: the fact that someone is asking it indicates that they are not up-to-date on one of the most exciting revolutions taking place in data management. Such users will require extensive hand-holding, and there are a lot of SQL-related resources out there that can help them with this.

But there is a twist: what people are doing is kind of like how some religions have to turn you away before you become a true convert. Will an RDBMS user be completely happy? Perhaps, but if the application is at all demanding, they will wake up in the middle of the night because of a wedged DB, running out of internal table space. Or have to frantically hire DBAs to "optimize" their queries. Or hand money hand over fist to Oracle. Or discover that an AWK script is faster than their multi-million dollar RDBMS installation.

So it's not like people are turning you away from NoSQL altogether -- you will end up with a NoSQL system eventually, they're just making sure that you do this for the right reasons.

I recommend this book: http://amzn.to/13NgjU9 (affiliate link), which I bought for inspiration on how to build my own database from scratch and finally put down, after reading several chapters, with a reasoned appreciation for the way things are.

To simplify, the SQL data model exquisitely balances:

* expressivity - queries involving both GROUP BYs and subqueries, which I've needed more than once, are challenging at best to translate into Mongo's query model, which is one of the most expressive outside SQL

* speed - as long as your query can run on a single machine, SQL query planners are the best, period. Other models tend to be more horizontally scalable, but this was not a priority for most of database history, nor for most real-world situations.

* compactness - the on-disk overhead of a SQL database is fairly limited, about 40-50% in practice. Obviously, a flat file has even less overhead (<10%, generally), but some non-SQL systems consume storage willy-nilly (it's not uncommon to see XML overheads surpass 500%, and Mongo overhead hovers around 90-100%).

* robustness - in the SQL standard, "undefined behavior" is kept to an absolute minimum. In particular, transactions are invaluable anti-Heisenbugs, but cascading rules, sanity constraints, and fixed schemas also reduce production gotchas (at the cost of development agility, of course).

If i think about what scares me most about using a noSQL database, it's the loss of expressivity.

I believe noSQL databases can be fast, compact, and well-behaved. But SQL is a phenomenal query language; it anticipates Alan Kay's (alleged) maxim that "simple things should be simple, complex things should be possible". Not only does it make complex things possible, but modern query planners make a decent fist of making them fast, too. NoSQL databases either don't support rich queries at all (eg Riak, which is in all other respects rather wonderful), or have substantially less power (eg Mongo). Even databases which are queried with map-reduce operations using functions written in real programming languages (eg CouchDB) are lacking compared to SQL, due to the lack of anything like joins, subqueries and so on.

(The only kinds of noSQL databases that have something approaching SQL's power are graph databases (eg neo4j, whose Cypher query language is really rather nice), some object databases (eg ObjectDB, which supports JPQL), and some of the pre-SQL options like Pick databases. However, these are rarely the ones which get the attention, and it's notable that their data models are actually not far removed from the relational model.)

The reason this flexibility matters is because it supports change in the future. I might be storing and querying data for some purpose now, but it's possible that a year down the line, i will want to do something entirely different with it - usually yawn-inducing things like reporting and auditing, sometimes just implementing new features that look at the data in a different way. The expressiveness of SQL lets me be confident that i will be able to do that with my relational database. With a noSQL solution, i run a substantial risk of not being able to do it, and so being forced to export my data into another store to do it, with all the headaches that that entails.

"the on-disk overhead of a SQL database is fairly limited"

The ___minimum___ on-disk overhead...

At run time after deployment after growth sometimes I have to scramble adding indexes and depending on the business demand for "instant" online reporting or query speed, you can end up with a ridiculous number of large indexes. Which also has a negative, hopefully survivable effect on insert/update/delete speeds. I guess I'd add:

* tunability - the DBA decides at the DB level based on defined business requirements how to trade disk space for read speed.

Relational databases provide a _framework_ for structuring data and retrieval. And it is an approach informed by decades of practical experience and, well, plain mathematics.

Would you try to program a web application without a framework? Most of us do, at least once or twice in our career, not thinking about the next guy who needs to maintain our POS spaghetti. Perhaps if you're very bright or experienced you could do a good job structuring a web app from scratch, but in effect you would writing your own framework. Still it makes sense to depend on and learn from the great web frameworks -- rails, django, etc. -- so you can think about getting your real work done.

Same goes for sql. It is structured to keep us from being an idiot.

Stand on the shoulders of giants.

Your analogy with a web framework is a good one, but it extends beyond that into growth of application complexity.

True, my giant sprawling complex web application appears that it should have been done in a complex framework, little does anyone know it started out 5 years ago as a single perl CGI formatting the output of "df" on a webpage...

A similar thing can happen with non-relational databases... are you willing to bet your entire dev budget and/or the company on NEVER EVER needing features in the future? Sometimes thats OK, usually not.

You seem to be asking this question in a vacuum, which makes it impossible to address the costs and benefits of relational vs. alternative approaches with any context. What types of problem are you trying to solve? What alternatives to an RDBMS are you proposing to solve them with?

Having said that, I write transactional business applications (often dismissively referred to as "CRUD Apps), and in that context an RDBMS is a sensible default, but not for every part - so on Google App Engine for example I'm using CloudSQL for the business data, but Google's data store for the session data - because that has very different characteristics.

[1] I'm reading "SQL" as shorthand for relational, although the two are hardly synonymous.

Whoah whoah whoah. SQL IS NOT RELATIONAL. Let's put that to rest right now. You can have relational databases if you like, but put SQL in front of it and now you don't have a relational database system anymore. you have an SQL database. This is because SQL actually violates a few of CODD's axioms for what constitutes a "relational" database. The tragedy of NoSQL is really the setting up of this dichotomy between "SQL/Relational" and "NoSQL/NotRelational" which ignores the possibility that "SQL CAN'T BE RELATIONAL" and "Relational with no SQL"

I don't believe this is correct. SQL certainly includes some things which don't exist in the pure relational model (LIMIT, stored procedures, array types), but i am not aware of anything in SQL which prevents you using a pure relational model. SQL is not a pure implementation of the relational model, but it's close enough that is's reasonable to talk about SQL being relational.

1. The existence of a null value (3-value-logic) 2. permitting duplicate rows 3. permitting duplicate columns. 4. "outer" joins.

and.. other things, that you can enforce in your client application, but then that rather defeats the whole point of having a "relational database management system". you might as well just have a dumb key-value store, if you have to enforce so much of what a "relational database" is in your app.

note that not even the SQL standards use the word "relational". anywhere. Years ago I couldn't believe this, but go ahead and look. It's not there.

SQL can be relational, if you do it right: http://en.m.wikipedia.org/wiki/Relational_model#SQL_and_the_...

True. You can also make good programs in PHP if you "do it right". it's not exactly promoted or encouraged, or even anywhere on the radar though, and doing so involves a fair bit of that relational logic being implemented and enforced in the client.

The point is, it's possible to have a "relational database management system" which enforces the relational axioms for you. SQL is fundamentally incompatible with this concept, and forces the programmer to keep the databases relational.

> True. You can also make good programs in PHP* if you "do it right". it's not exactly promoted or encouraged, or even anywhere on the radar though, and doing so involves a fair bit of that relational logic being implemented and enforced in the client.

It seems while you see that you're making a concession, you fail to understand that that actually disproves your own point. I repeat: Your argument is invalid.

> The point is, it's possible to have a "relational database management system" which enforces the relational axioms for you.

People aren't saying that it's impossible - they're saying it's a question of practicality, with the most true to the abstract concept being the least practical in real world applications.

> SQL is fundamentally incompatible with this concept, and forces the programmer to keep the databases relational.


* By the way: Classic case of "doubting, supported by not looking". Even for PHP, there are plenty of resources like http://www.phptherightway.com/ and they are widely known and shared amongst people who know their stuff.

There is a difference between "SQL is relational" and "SQL can be used to maintain a relational database". the first is false, the second is true.

As for whether it's practical to have a relational database system, are you claiming that it isn't ?

Re: PHP. I said that PHP can be used to make good programs. Is this a point you disagree with? or, do you see some other point?

You've been smoking too much FabianPascal

well, just because he's an antisocial unbearable asshole doesn't mean he's not right.

No, the fact that he's not living in reality does.

I never heard about Codd complaining about IBM's first relational implementation. In fact, have you read Codd's papers defining relational databases and normal forms?

To even get to 1NF, you have to eliminate any non-atomic datatypes. How the hell are you going to do that?

Get rid of strings? Any idiot can put "key1=value1;key2=value2" into a string and violate 1NF. Any idiot can create two identically typed columns for storing the same attribute, instead of one, in the same table, and violate 1NF. Even frickin' dates are non-atomic.

There is no practical way for a database server to ENFORCE what Pascal calls "relational" on the users.

So actually the whole "SQL's not relational" bullshit boils down to three things: "WAAAH! DISTINCT should be the default!" "WAAAAH! we shouldn't have NULLs!" and "WAAAH! we shouldn't have recursive table definitions or hierarchy types!"

Back in the real world we want those features, because we NEED to represent hierarchies and "Not Applicable" values in our databases.

err.... I think you are deeply confused. http://en.wikipedia.org/wiki/First_normal_form#Atomicity

The rules of relational databases aren't the rules because some guy named CODD said they are. There are deeper principles of mathematics and logic at work. It's rather hysterical to claim that, because some guy named CODD explained the properties of this system in an ambiguous or confusing way, that the whole system should be disregarded as impractical nonsense.

Oh I forgot to add, the rules of relational database are exactly what CODD said they are because CODD invented them and said "I define the relational model to be this"

What they are NOT is what you, Chris Date and Fabian Pascal want them to be. And no-one is going to use Tutorial-D ever, so it's about time you made your peace with SQL.

What precisely am I confused about?

You are linking to the 1NF wikipedia page section on atomicity.

Are you alleging that I am confused about atomicity?

Because I said strings and dates are not atomic. Here is your link:

"A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings."

By the way, dates can also be decomposed, into years, months, days, hours, minutes etc

Oopsie, it appears that you're the one who's confused.

Anyone can click on that link and read it. You only make a fool of yourself by willfully failing to understand what is written there.

Well no, you make a fool out of yourself by reading something that isn't there.

Instead of resorting to ad-hominems, you could have actually explained what I am supposed to be confused about, and what I am supposed to have "understood" from the page which was saying exactly the same as what I was saying.

Or you could have tried to tackle the fact that we need NULLs and other "sins against Pascal", which was the point of my argument.

You failed to do both. I conclude that you got nuthin'

So you are saying that you insist that the correct definition of "atomic" is the one that is completely useless (by defining strings and dates as non atomic), and by extension makes the whole of the relational model useless? Uhhh. okay. You can have that victory I guess. pats your head.

Your recurring "rebuttal" to somebody posting a reply to your argument is to say that your point is right, that he is 'deeply confused', that he is arguing from authority, ignoring mathematical principles, that he is hysterical, that he doesn't read his own links, that he is making a fool of himself by not understanding what he has linked to himself. Up to this point, you don't use a single word to actually provide any substance at all to your argument. Then, finally, you actually do address one of his points in the most dismissive and condescending way I have seen all day.

Kudos, I guess. No substance and all hot air. Somewhere halfway through your silly dance, irony ate its own hat.

huh? Who are you? have you made me a project? ooh! that's so cute! Well I could just wave off your weird outburst here but I guess out of fairness I'll address you point for point.

> Your recurring "rebuttal" to somebody posting a reply to your argument is to say that your point is right, that he is 'deeply confused'

Recurring? Well I did say he was deeply confused. once, up thread. I suppose the part where I supposedly say "my point is right" is implied by the fact that this is an argument, and I am the person arguing my point- But I can't find where I say that explicitly, out loud. That would kind of be weird. Have I said this somewhere else?

> that he is arguing from authority, ignoring mathematical principles, that he is hysterical.

OH! Are you referring to where I was talking about mathematical principles? I suppose one way of looking at that was me saying something about his argument. From my point of view I was just talking about a subject though.

> that he doesn't read his own links,

it was my link, and it was him trying to make a point about me reading my own link. Of course I did read my own link. the problem we were having is that he skimmed it, found the one out of context quote that would support his own point, and pulled that out on display. This seems kind of ridiculous to me, because anyone can click in, and look to see that they're quotes pulled out of context in such a way as to reverse their intended meaning, and he's fooling nobody by doing that. Well, maybe he's fooled you.

> that he is making a fool of himself by not understanding what he has linked to himself.

it was my link and and trying to lie about a source that is RIGHT THERE, and anyone can look at is really kind of crazy.

> Up to this point, you don't use a single word to actually provide any substance at all to your argument.

Up until that point there was not a single word that seriously challenged my claims, either.

> Then, finally, you actually do address one of his points in the most dismissive and condescending way I have seen all day.

Well, because it came at the end of a back of forth involving me trying to get him to read something that addressed his point, and so at the end, well, it's condescending because you realise that the point he is making is at this point, not that SQL IS relational after all, but that the Relational model is so absurd and stupid to begin with so as to be useless. So, at this point I have to just accept that he is irredeemably ridiculous and let him just have that victory if he wants it.

> Kudos, I guess. No substance and all hot air. Somewhere halfway through your silly dance, irony ate its own hat

Why do you care so much?

> huh? Who are you? have you made me a project? ooh! that's so cute!

Two (now three) replies are quite a low bar to call something "a project". But I get it, you're totally casual about it.

> Well I could just wave off your weird outburst here but I guess out of fairness I'll address you point for point.

That wasn't necessary and I don't care for reading it. You might want to have this discussion with the original user replying to your statement, I was merely reiterating his complaint.

> Why do you care so much?

Because I read a quite interesting discussion on an interesting subject and you wasted my time with a weird and somewhat self-indulgent tangent. I felt that you deserved to have a little of your time wasted in return. I think we're done here.

"because you realise that the point he is making is at this point, not that SQL IS relational after all, but that the Relational model is so absurd and stupid to begin with so as to be useless."

No, I was saying that your (and Fabian Pascal's) definition of a relational database is absurd and stupid and useless.

Your definition seems to be a server that enforces the relational model whereas EF Codd's definition seems to be something that enables the relational model.

EF Codd's definition seems quite good. It also seems to include IBM's original SQL system and Ingres and everything similar.

And your (and Pascal's and Date's) response seems to be a rather dishonest attempt to rewrite history and to change what relational means in order to make your stupid, stupid ideas work, so you can continue to whinge about SQL and NULLs.

It would be a lot easier just to learn to live with the fact that people need NULLs.

The majority of HN won't like me for saying it .. but the major factor in NoSQLs "popularity" is not needing to learn SQL. Which is a crying shame, as it's one of the corner stones of computer science.

This is just a personal opinion. But as apps scale you run into concurrency problems. You want to maintain a reliable consistent state but due to latency and resource limits that is difficult to do within your own code. So the responsibility is moved onto another system designed for that purpose. NoSQL simply solves the same problem in a slightly different way. The consistent state is allowed to be abstract and doesn't need to be transactional.

The real dichotomy is between handling state yourself in code and letting outside software do it for you.

Nothing else offered Codd's relational model of:

0. simple user commands

1. data independence (from hardware and software implementation)

2. automatic navigation (instead of single record access).

The rest was competitive history with the giant of the time, IBM, who created and used SQL in their relational System R product. SQL variants rapidly became commercial successes, with the earliest external adoption and most notable being that by Oracle.

In time, SQL became a standard and, as a relatively simple declarative language (the HTML of data if you will), it is easy to learn and widespread.

Relational DBs work for processing of business transactions -- when you need to get the amount of money and all the other details exactly right, when you need to retrieve the right record according to a variety of criteria which may come from other types of data, and when you need to be sure the info is inserted in exactly the right format.

For this, there are DB transactions, SQL SELECT with joins, and schemas.

Much of this is possible in other technologies, but RDB does it well.

Simple. Same reason people code in C or Python or Bash. And not C#, Ruby, ZSH, or much worse: their own implementation or whatever: It standard. Everyone knows how it works. It's proven.

So unless it's really not good enough, diversity for the sake of diversity is the opposite of being efficient.

Hence, you're asking the wrong question. In fact, you shouldn't ask a question. Show or prove your solution brings a REAL benefit. If you can't, you're indeed wasting your time.

Perhaps it's standard, but implementations vary so wildly that I doubt this is of any importance. MySQL vs PostgreSQL vs Oracle, it's just madness.

Tools. All data processing tools right from analytics to integration need to work with the datastore. RDBMS has accumulated these tools for so many years. It is not a level playing field for NoSQL to beat that ecosystem.

Direct answer, SQL is the only thing that learnt at schools.

Short answer: SQL databases were easier to use for developers and non-developers than other client server relational database systems in the late 80s and early 90s (dbase derivatives, btrieve and ctree were common then). Tools like Btrieve and ctree were largely libraries, and really didn't have the ad-hoc query capability that SQL has. Dbase suffered from incompatible implementations... so an application written on one Dbase clone wouldn't work exactly right in another. Old school client-server databases like Btrieve and ctree were extremely tightly coupled, and there was little separation of concerns between storage and logic. Lots of messy, messy code... and writing the storage and retrieval parts of your application were non-trivial. Since end users had no way to do their own queries (outside Dbase), most reporting was written in C, pascal or even BASIC. There were a few reporting tools like Crystal Reports that tried to solve the reporting problem, but it was often just as easy to roll the report in C...

When the internet took off, you had MS-SQL server on Windows and several open source SQL servers on Linux (MySQL, mSQL and Postgres) that freed web developers from having to write lots of custom data storage/retrieval code. A singe SELECT or UPDATE command did what 150-200 lines of C could do. It also meant that web apps could go to market without advanced reporting capabilities or CRUD for detailed configurations because you could always just roll an SQL query to deal with that later.

SO, here is what you are up against in selling noSQL vs SQL:

* Old developers who lived through the 80s and 90s database hell and see SQL as the thing that freed us.

* An extremely mature toolchain vs. new applications without the toys needed by non-developers (reporting, ad-hoc query for end users, integration with spreadsheets, etc...)

* Familiarity with the query language. Lots of devs know SQL

* Risk of something new.

* Fear of having your job turn into writing reports constantly because no one else in the company can.

SO if you want to be persuasive in selling a noSQL solution:

* Define the problem clearly and be prepared to show why Mongo, Couch or whatever is the right solution.

* Avoid arguing that SQL isn't relational. You might be right, but Oracle, IBM, Microsoft, Ashton-Tate, Sybase, Progress and everyone else with an SQL product has outspent you on marketing.

* Be ready to deal with questions about reports, ad-hoc queries, stored procedures and security. Each of these issues is solved and documented in mature SQL based systems.

* Giving Sally in accounting access to a unix command line is not going to be as well recieved as giving her access to Excel, Access, MS SQL Management Studio or Navicat.

What exactly are you trying to store? Depending on what you are persisting, SQL may make perfect sense.

- Codd's relational model



- Chen's ERM

Amazing no one's quoted the inner platform effect.


Hmm, SQL query too slow takes 10 seconds for one query. I know, I can use a non-SQL solution. Each query will take 100 milliseconds. Whoops turns out the users need all the relational "stuff" for my reporting and also in my CRUD to keep it consistent enough. I mean, everybody knows databases are consistent and codd normal form and relational, and nosql is a database, therefore when the users spec a database they'll be fine either way? Well no big deal I'll just write my own JOIN. After all, I'm as good of a programmer as the team at Oracle who's been working on JOINs since before I was born, so I'm sure a couple cans of red bull and I'll have something better. Whoops, each "nosql" query does in fact take 100 milliseconds but now I need to run 1e6 of them and shove them into my homemade JOIN routine, so I've now "improved" my total overall system time from 10 seconds to 10 kiloseconds. Whoops.

I've never been able to implement a nosql solution at work because I always get stuck in inner-platform tar pit, or massive consistency issues making the CRUD a nightmare, etc. Usually if you complain about that in the nosql community you get pretty intense complaints that your business is doin' it wrong, lol at a paying gig thats not going to work very well... I'd like to try something new but I haven't had the business case for it yet, or rephrased a deep enough analysis has so far always excluded nosql solutions and that Might indicate a positive analysis isn't deep enough. Hear hooves, think horses not zebras. I've bought the "seven databases in seven weeks" book and its an excellent introduction unfortunately its something like a guided introduction into smooshing up against the business requirements limits of each DB as much as a technical intro.

Another point is the original request was "persistent storage" not why does nosql not work most of the time. Back in the olden days we called that a "filesystem" but lots of $5/mo webhosters don't allow that type of thing, but they do give you mysql... Also its hard for the PHP coder types to not shoot themselves in their foot with filesystem access, for example, allow the user of the order.php form to store their picture in username.gif what could go wrong other than lack of input sanitizing and some goofball is going to register the username order and upload a "picture" with the filename order.php that actually does something rather naughty, etc. Even if it works, next week, the boss asks to store two image files now, one full size and one icon size now. Hmm the filesystem soln doesn't scale... Rather than all that input sanitation, wouldn't it be simpler to just dump it into a blob column in a mysql table?

Relational databases have a sound mathematical model. You may not think of it this way, but a table can be seen as a predicate in logical terms. You can frame most logical queries as a chaining of predicates and functions, and you can create arbitrary (finite) predicates using tables. That's powerful.

NoSQL came into style with the web, in which you have large data sets but don't need most of the relational features. You might have a trillion Wuphfs, but you'll never need ad-hoc queries because you're doing mostly CRUD on the Wuphfs. Sharded SQL is a painful way to deal with that problem space. NoSQL systems are often better adapted to it. NoSQL is almost always less powerful abstractly but easier to scale.

The major problem that I've seen with NoSQL systems is that they tend not to handle the edge cases very well. Remember the old joke about the man who goes to the doctor and says, "It hurts when I do this", so the doctor says "Don't do that". Sometimes, you have to "do that". Many NoSQL products are also quite low-level, requiring that more traditional database functionality be written in client code. That gets nasty-- slow, flimsy, hard to coordinate. You don't want, for example, clients to be responsible for generating sequential unique IDs because the clients all have to agree on what the rules are, in addition to generating a lot of communication overhead each time a new block of IDs is requested.

Furthermore, if you're using NoSQL to build something that will be marketed as a database product (e.g. within your company) and are exposed to requirement accretion, you'll find that the requirements are often inspired by expectations derived from relational models: ACID transactions being the big one.

I am not saying that SQL is the best for all problems, but it is a damn good technology. Personally, I think that a lot of SQL hate comes from the limitations of MySQL. If that's the source of your headaches, look into Postgres. Postgres is surprising in how modern and powerful it is.

>NoSQL is almost always less powerful abstractly but easier to scale.

This is no longer true. We now have second-generation NoSQL data stores that provide strong consistency, fault-tolerance and ACID transactions, based on the sharded NoSQL architecture that scales easily.

Excellent points. Still, "powerful" is defined by client needs (especially performance) and client needs are based on 30 years of expectations derived from the relational model (even if most clients have no idea that's what they want).

If you're exposed to the accretion of business requirements you do not control, you'll eventually be pushed toward a definition of "power" that relational databases meet the best... except when the dataset gets large.

What you say is still true, of course, and will only become more true over time. As more work is put into technologies, they tend to become more well-rounded. Languages that used to have awful performance (100x C) are now closer to 3-5x, for one example. The best NoSQLs now have many of the traditionally relational benefits, and the best relational databases (e.g. PostgreSQL) are overcoming traditional deficits associated with "relational".

Just out of interest, how are your opinions on the superiority of NoSQL systems received by





All I can say is "some of my best friends have worked on RDBMSs." I don't hold it against them. ;-)

More seriously: Al, Johannes, and many other notable researchers in the DB space have complex, nuanced opinions on NoSQL systems. And these opinions are being revised daily as new information comes in; we've seen some researchers go from decrying the whole NoSQL movement to becoming some of its fiercest proponents, seemingly overnight. I would do them disservice if I were to try to present them here.

So I'd encourage you to ask them directly. Or perhaps we could have a panel at a conference. That'd be fun if done right.

"I'm sad now because"

Why would you be sad? SQL databases are just a tool, and happen to be the right tool. Why does that make you sad?

It's like saying "I wanted to eat soup with chopsticks and everyone told me to use a spoon instead. That made me sad."

Woaw. HN, the place created by pg, who advocated trying to "beat the average" and have "frighteningly ambitious startup ideas", etc.

Yet HN has become the place for the status-quo. Where people are going to vote everything encouraging the status-quo (like articles about the benefits of C# and SQL) as if it was the holy gospel...

"Immediately, a SQL relational database is proposed as the solution."

For a start you're probably not trying to solve a problem involving big-data ; )

Ask the big data players who are using fast and gigantic key/value stores how did SQL work for them...

For smaller needs, then SQL offers, basically: ACID. Most companies do not care about what SQL can really do: they'll wrap everything into ORMs and perform business logic outside of SQL and in crazy models that aren't relational at all and then store back everything into SQL.

What most business care about is ACID, not being truly relational (which SQL DBs ain't anyway), not set theory.

The most saddening thing about traditional SQL DBs is that there's no basis for queries: the same query shall return different results depending on when the query is made (because most SQL DBs are "update in place" DBs / value-oriented DBs).

In the Real-World [TM] this is very problematic because when you can't reproduce a problem you have to try to reproduce the state the DB was in at a particular time and query from there. It is a gigantic time waster.

In my view for really big data needs SQL simply doesn't cut it while for smaller needs I still want ACID but I want something CRA ("create read append-only"), not CRUD.

I have nothing against using something like Datomic (which is CRA) backed with a SQL datastore.

But the "traditional" SQL DB that the herd --bent on never creating anything disruptive-- is using? Not for me.

CRUD is dead baby. CRUD is dead.

There are, today, people who've read and understood pg's writings who are using Lisp dialects (like Clojure) and CRA DBs (like Datomic, backed or not by SQL) to gain a competitive edge.

This kind of stuff is the reason why I come to HN. Not to read why the status-quo is acceptable (even if it's all most of the HN crowd knows).

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact