
SQL Databases Don't Scale - LiveTheDream
http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
======
acabal
Can we please stop with these ridiculous "SQL is utterly hopeless" and SQL vs
NoSQL articles? Saying that SQL is not scalable, full stop, is ignoring
decades of real-world evidence. There's a right tool for every job, and SQL
isn't always that tool, but articles like this aren't helpful in the
slightest.

------
yuvadam
I think TFA gets it all wrong.

It's not that SQL databases do not scale _per se_, they just can't
arbitrarily scale.

Most applications today manage pretty well with nothing more than sharding and
replication. Sure, it's not the optimal solution, and it's not easy - but it
gets the job done.

Facebook, at the very core (behind all the layers of cache), run, arguably,
the largest MySQL landscape in the world - and they seem to be doing pretty
good.

Until Heroku comes up with a better solution (or a better question), TFA is no
more than flamebait.

~~~
SigmundA
"It's not that SQL databases do not scale per se, they just can't arbitrarily
scale."

Tell that to Clustrix, Vertica, and VoltDB.

SQL is a query language, Relational is a data model, neither one prevents
scaling out.

~~~
yuvadam
I was merely using the same terminology as the article.

------
gregjor
I can't count how many Rails fanboys have written this same article. What do
they think big companies, banks, government agencies -- organizations with
huge databases -- use? What kind of database do the RoR developers think the
real world runs on?

Claiming that SQL databases don't scale, especially using MySQL as the
example, is simply ignorant. Oracle, DB2, SQL Server all scale to handle
databases orders of magnitude bigger than anything the most successful RoR app
is running.

~~~
kolektiv
Well, they don't scale indefinitely, and you do run into Brewer's theorem at
some point. While you're right in saying that if you have enough cash and
horsepower to throw at SQL databases you can get a very long way, there is
some point where anything which must be consistent is going to have an impact
on availability.

This gets especially critical once you stop assuming that data can be
centralised. Once your required transactional context needs to span large
areas (geographically or logically) then the overhead of this is mighty. 2PC
itself does not scale indefinitely, and while interesting research is being
done in alternative transactional approaches, it would be tough to argue that
any are acceptable solutions to the problem.
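
To make the 2PC overhead concrete, here is a minimal Python sketch of the protocol's two rounds. The `Participant` class and `two_phase_commit` function are hypothetical names for illustration, and the participants are in-memory objects rather than real databases, so timeouts, durable logging, and coordinator failure are all left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant; a real one would be a remote database."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote. A real participant would persist its vote before replying.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if every participant committed, False if the transaction aborted."""
    # Phase 1: ask every participant to vote (one round trip each).
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: unanimous yes, so tell everyone to commit (a second round trip).
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the transaction on every participant.
    for p in participants:
        p.rollback()
    return False
```

Each transaction costs two round trips to every participant and waits on the slowest one, which is why the overhead grows with both participant count and distance.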

Also your example of banks is perhaps not an ideal one. While banks may indeed
run some very large databases, they don't run one HUGE database which is
expected to be always consistent. Things like ATM systems etc. run based on
eventual consistency and tolerance favouring availability precisely because a
simple horsepower approach to a transactional database (whether SQL or not)
wouldn't currently work in this domain.

~~~
gregjor
Nothing scales indefinitely, not even RoR. But that wasn't what the article
complained about. If the author had written "SQL Databases Don't Scale
Indefinitely" I wouldn't have any complaint about it.

My point is that there are plenty of real-world examples of SQL databases that
are scaled way beyond any RoR web app's needs, so flatly claiming they don't
scale is wrong.

The solution banks (and other large organizations) have arrived at -- using
multiple databases with different transactional and availability requirements
-- IS a solution to scaling large databases. There's no requirement that SQL
databases be monolithic. Even the original article describes using a MySQL
master for updates and one or more slaves for reads. That's a valid
scalability solution if your application can live with the possibility of
stale data in the slaves.
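
As a rough sketch of that master/slave split (in Python, with hypothetical names throughout: `ReplicatedRouter`, `route`, and the `require_fresh` flag are not from the article or any real driver), the routing decision could look like this. It assumes any statement that does not start with `SELECT` is a write.

```python
import random


class ReplicatedRouter:
    """Routes writes to one master and reads to read-only copies (hypothetical)."""

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        self.replicas = list(replicas)

    def route(self, sql: str, require_fresh: bool = False) -> str:
        # Assumption: anything that is not a plain SELECT counts as a write.
        is_read = sql.lstrip().lower().startswith("select")
        if not is_read or require_fresh or not self.replicas:
            return self.master
        # Read-only copies may lag behind the master, so this read can be stale.
        return random.choice(self.replicas)
```

Reads that cannot tolerate stale data pass `require_fresh=True` and go to the master, which is the trade-off the application has to accept explicitly.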

------
jasonjei
Wow, a lot of dislike of SQL, and a bit biased in their description of scaling
SQL and sharding. Again, use the right tool for the right purpose, not what
you think is intrinsically better.

Our app does double entry accounting, and considering the relational structure
of accounting ledgers as well as that every entry needs to be posted in an
ACID manner, we don't feel comfortable building out, say, a banking system in
NoSQL.

Yes, SQL is painful to scale, but it can be manageable. We partition it in a
scoped schema way (e.g. each company account/tenant has its own set of tables
on one of the shards with the server of the fewest consumed resources, or if
it's a tie, a coin toss). Scaling it in an arbitrary manner such as by names
beginning A to M, N to Q, etc simply seems like a bad way to do it. If your
app is like a 37 Signals app where it doesn't typically require access to
another account's data, this might be a good way to do it.
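
A minimal Python sketch of that placement rule, under assumptions: load is a single number per shard, and `pick_shard`, `assign_tenant`, and the `cost` parameter are hypothetical names for illustration, not our actual code.

```python
import random
from typing import Optional


def pick_shard(load_by_shard: dict[str, float],
               rng: Optional[random.Random] = None) -> str:
    """Pick the shard with the lowest load; break ties with a coin toss."""
    if not load_by_shard:
        raise ValueError("no shards available")
    rng = rng or random.Random()
    lowest = min(load_by_shard.values())
    # Exact equality is enough for a sketch; real metrics would need a tolerance.
    candidates = sorted(name for name, load in load_by_shard.items() if load == lowest)
    return rng.choice(candidates)


def assign_tenant(tenant: str,
                  placement: dict[str, str],
                  load_by_shard: dict[str, float],
                  cost: float = 1.0) -> str:
    """Place a new tenant once; later lookups reuse the stored placement."""
    if tenant not in placement:
        shard = pick_shard(load_by_shard)
        placement[tenant] = shard
        # Assumed bookkeeping: each tenant adds a fixed cost to its shard.
        load_by_shard[shard] += cost
    return placement[tenant]
```

Because the tenant-to-shard mapping is stored rather than computed from the name, a tenant can later be moved to another shard without changing any hashing or range rule.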

One of the programmers I hired was dead set on trying to use CouchDB for
everything. I told him don't try to use it for everything--it's good to use
for the audit trail and document revisions on invoices, orders, etc, but not
on actual double entry journals.

------
kls
The #1 way to get relational databases to scale is to either teach your
application developers to really really understand relational databases and
performance trade offs or don't allow them to be responsible for the data
model.

In my experience I have seen more atrocities committed due to developers using
a relational database as a file-system, an XML document, a key value store or
some variation in between.

When people complain about scaling in relational databases they usually mean
that it is more rigid, and not easily adapted to changes. Which is valid and
is why you cannot apply software development practices to database design.
With database design you have to plan for the future and think out how your
solution will grow so as not to paint yourself into a corner. Further, in
relational design there is no such thing as premature optimization, if one
thinks there is, one has already failed at setting up a robust relational
database architecture.

At one of my start-ups we supported massive amounts of traffic from
Hotels.com, Orbitz, Travelocity and Expedia all looking for pricing and
allotment each time someone hit their front door and we did it all on
applications that were backed by relational databases. We never saw Google or
Facebook size traffic, but we had constant load of more than any one of those
travel sites and many times load of all of them combined.

A solid application architecture and a sound relational model can scale quite
well; you just have to have good people that understand each discipline.

------
jsr
It's useful to differentiate between "SQL Databases do not scale" and "SQL
Databases do not _cost effectively_ scale". The second argument is more
accurate.

Vertical scaling of a DB is definitely an option for many people and has been
used to scale many applications. However, the cost curve associated with
buying bigger and bigger hardware is super-linear; doubling CPU & Memory in a
single system leads to more than doubling the hardware cost. This can be
problematic for many businesses whose revenue growth is exceeded by cost
growth of the database.
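
To illustrate the shape of that curve, here is a small Python sketch. The power-law model, the exponent of 1.5, and the function names are all assumptions chosen for illustration, not measured hardware prices.

```python
def scale_up_cost(base_cost: float, capacity_multiple: float,
                  exponent: float = 1.5) -> float:
    """Cost of one bigger machine under an ASSUMED power-law price curve."""
    return base_cost * capacity_multiple ** exponent


def scale_out_cost(base_cost: float, capacity_multiple: float) -> float:
    """Cost of many base machines, ignoring coordination overhead (an assumption)."""
    return base_cost * capacity_multiple
```

With any exponent above 1, doubling capacity on a single box more than doubles its price, while the scale-out figure grows linearly. The model leaves out the development cost of distributing the data.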

Sharding is also an option for scaling, leveraged to great success by
Facebook, Yahoo, and many others. However as the article points out, sharding
prevents the developer from using many of the features that make a relational
database a productive development environment. There are lots of foot guns
that emerge in a sharded SQL environment and if you have not set up your
development constraints appropriately, you can slow the pace of development
considerably. This again leads to a cost problem because the incremental costs
of adding features grows as you add more things like sharding around your
database.

SQL is not useless and not hopeless. In a large number of cases, SQL is the
right solution. However the techniques used to scale SQL tend to be options
only to very large budget organisations. NoSQL solutions tend to be more cost
effective in their scaling approach (scale out vs. scale up) without crippling
the developers' productivity. For these reasons, NoSQL solutions tend to be the
better choice for the cost-conscious.

------
leif
"SQL" is not a classification of a database, it's a query language. Some of
the databases which use SQL do not scale well or easily, and some databases
which don't use SQL don't scale well either.

What Adam means to say is "some of the semantics defined by SQL are hard to
get correct while maintaining scalability." Hard is very different from
impossible, and there is a large number of very smart people currently solving
this problem very well. There is another large number of smart people trying
to solve the problem by ignoring the hard parts of the semantics. It seems
they will likely come up with something very fast and scalable, but ultimately
less useful for certain things which are done easier with proper SQL
semantics. In fact, it's not clear in all cases that they are even making
something that's more performant: cf.
<http://sergeitsar.blogspot.com/2011/01/mongodb-vs-clustrix-comparison-part-1.html>

Saying "SQL Databases Don't Scale" is like saying "oil paintings on wood
aren't appealing". Not all oil-on-wood paintings are good, but some are, and
some tempera-on-fresco paintings suck too. The logic is simply invalid.

------
jhugg
>> When hundreds of companies and thousands of the brightest programmers and
sysadmins have been trying to solve a problem for twenty years and still
haven’t managed to come up with an obvious solution that everyone adopts, that
says to me the problem is unsolvable.

I take issue with the assumption here. What you want, which is obviously
"free/open mysql-ish thing that scales indefinitely", is not a problem people
have been working that hard to solve until quite recently.

To put it in perspective, 20 years ago, many banks offered no access to your
money outside open hours and the internet was not a thing normal people used.

Today, if you go to Oracle or IBM, you'll find that they'll be happy to help
you solve your enormous problem at great profit to them. The thing that's
changed in the last few years is that Web 2.0 guys want the same power (or
more) for a tiny fraction of the cost.

This is a good thing. This is exciting. People recognize that the status quo
sucks and are working hard on change. The solution will probably involve some
SQL, and probably some other tools as well. Don't be such a downer. ;-)

------
bayareaguy
Dupe <http://news.ycombinator.com/item?id=859468>

And here too <http://news.ycombinator.com/item?id=690656>

------
SeanDav
MySpace uses SQL. If that isn't a vote for SQL scalability then I don't know
what is.

~~~
sausagefeet
What's MySpace?

------
ebiester
Isn't that what Oracle RAC is for? If you are managing that much data, and you
don't want to go the specialty DB route (nosql, for example) RAC scales.

It's also hellishly complicated, but so is the problem.

~~~
kls
Right, I feel like sometimes when someone says relational databases don't scale
they mean that free databases don't scale without a lot of work. Scaling with
Oracle is fast and easy (in comparison to the alternatives), it is why people
still pay that kind of money for it. I personally find Oracle the company
distasteful but the facts are the facts and Oracle does scale.

~~~
ebiester
Sometimes I wonder... if three fortune 500 companies just spent the same
amount as their Oracle budget on Postgres devs, they could probably have a RAC
competitor in three years.

Then, just another billion on transition costs, right? :)

~~~
kls
I agree, there are no technical reasons that postgres could not be as easy or
as good at scaling as Oracle. Someone just has to put the money and effort in.

------
acangiano
DB2 pureScale (<http://programmingzen.com/2009/10/21/what-is-db2-purescale/>).
Checkmate.

------
js4all
Sure SQL databases can scale. Oracle for instance has a parallel query option,
but it costs real money and needs a lot of planning and configuration.

NoSQL databases scale much easier. Just add a new node and you are done. +1.
Many of them are also freeware. +1 again. Many of them are also faster,
because no parsing is done, +1. Many of them don't have and don't need
locking. +1. No locks mean no write waiting. +1.

Decide yourself.

------
sunjain
Oracle RAC may come closest to addressing most of these issues. Of course it
is pricey. However it fulfills pretty much all the constraints mentioned here
- application transparency while scaling (sharding does not involve changing
the app), horizontal scalability (add nodes as needed), failover for both read
& write transactions. It sounds like a sales pitch for RAC, but it seems to be
geared to deal with these kinds of scenarios.

~~~
jhugg
RAC uses a shared disk; it doesn't eliminate the single point of failure. The
shared disk also adds a contention point, such that RAC often stops scaling
after a handful of nodes. Even the first few don't give you linear scalability
without changes to the application and extensive tuning.

I'm not trying to defend the article though.

------
chanks
(2009)

------
acconrad
It's baffling to me that the founder of Heroku, a service built specifically
around automated web scalability, would write an article on how SQL doesn't scale,
when he uses Postgres as the de facto database for all of his customers.

------
zipdog
Is this why Google developed BigTable? Have other companies developed their
own BigTable-like solutions?

