
SQL Databases Don't Scale - sant0sk1
http://adam.blog.heroku.com/past/2009/7/6/sql_databases_dont_scale/
======
nettdata
I've been an Oracle database architect for almost 20 years.

His whole concept of "SQL doesn't scale" is the typical crap I always hear
from people who are not database experts, are using the wrong database
technologies, or don't know what they're doing. More than likely, a
combination of all three.

And just because you can create an object model, and a simplistic data model,
does not make you an architect of large, scalable database systems.

I have, over the past 4 years alone, built Oracle-based, fully scalable
databases that handle over 25 million users daily.

Go read up on their RAC architecture, and the "shared everything"
implementation.

Sure, it's expensive, but it works great.

For instance, if you play sports games from the world's largest video game
corporation, all of your online transactions (achievements, etc) go through
exactly such a system that I spent a year architecting and implementing.

If you do any online banking in Canada, or with some of the larger banks in
the US, that, too, is on my resume.

I find that this type of FUD comes about from people that aren't good at
designing and implementing large databases, or can't afford the technology
that can pull it off, so they slam the technology rather than accept that
they, themselves, are the ones lacking.

Most of them tend to come from the typical LAMP/SlashDot crowd that only have
experience with the minor technologies.

Those of us that do it for a living, using the right technologies, seem to
have no problems whatsoever scaling SQL.

Just saying.

~~~
blasdel
You totally ignore the background of the author -- his company Heroku :
Ruby/Rack :: Google AppEngine : Python/WSGI

But there's a huge problem -- AppEngine succeeds at seamless multi-tenant
truly-distributed clustered hosting thanks to BigTable. Heroku needs to
support standard Rails apps, so Postgres is the best they can do, and it's a
huge hole in their offering.

You just _can't_ make Postgres (or Oracle) scale on an ideal horizontal the
same way you can distribute IP, DNS, HTTP proxying, HTTP serving, memcache,
message queues, or BigTable. You can't expose Postgres as an ideal service
that just keeps up with what you throw at it.

~~~
ianso
WRT exposing Postgres, you're right that you can't simply expose a database
and then expect the DB to scale under random selects, inserts, etc. (RAC is
another matter I guess.)

However, as I said in the first comment to the blog post, if you define an
interface using stored procedures (a pattern familiar to many Oracle DBAs),
then PL/Proxy (<http://pgfoundry.org/projects/plproxy/>) lets you do hash-
based partitioning in a way that's more or less transparent to the end-user of
a DB, again assuming that it has a defined interface. The PL/Proxy installs
form a 'bus' between the DB and the user.
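
For a flavor of how it looks, here's a rough sketch of a proxy function (the
'userdb' cluster and the get_user_email function are invented for
illustration; see the PL/Proxy docs for the real setup):

    -- runs on the proxy node; PL/Proxy forwards the call to one partition
    CREATE FUNCTION get_user_email(i_username text)
    RETURNS text AS $$
        CLUSTER 'userdb';             -- partition list lives in the cluster config
        RUN ON hashtext(i_username);  -- hash of the key picks the partition
    $$ LANGUAGE plproxy;

Callers just do SELECT * FROM get_user_email('alice'); and an
identically-named function on the owning partition does the actual work.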

Self-promotion: I'm currently working on hacking PL/Proxy into something that
can be used to auto-scale a Postgres cluster on demand, which has defined
interfaces as a prerequisite, along with some other things. The end goal is to
do exactly what you say: to scale out Postgres like any other internet service
on an ideal horizontal. Link: <http://code.google.com/p/hotrepart>

(Hum, I'm gonna get accused of spamming for repeating myself so often :-)

~~~
bayareaguy
While Skype's PL/Proxy is a great way to make PostgreSQL more scalable, it
doesn't do much to refute the basic argument that SQL databases aren't
scalable since the SQL it helps you scale is limited to short RPC style
operations.

~~~
ianso
That's a good point, and it brings to light an unspoken assumption underlying
my post, which is that the use-case is a web-based, read-heavy OLTP-style
system. If you were thinking of things like OLAP and data warehousing, then
I'd agree with you absolutely.

However, under my re-qualified assertion :-) for large, complicated commits,
the logic would either have to be at the database level for it to be in the
same transaction, or a solution using temporary tables could be put together
for more convoluted calls.

Neither of these are elegant, I'll grant you, but the two basic approaches -
longer transactions over logic closely coupled to the datastore, or staged
writes - are what most DBAs on high-end databases end up doing anyway, and
would probably be reproduced in some form or another in any ACID system, no?
Either way you'd still have a database that scales.
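
To sketch the first approach (the table and function names are mine, for
illustration): a PL/pgSQL function executes as a single statement, so a
multi-step write stays atomic on the partition it runs on:

    -- installed on each partition; one call = one transaction
    CREATE FUNCTION transfer(i_from int, i_to int, i_amount numeric)
    RETURNS void AS $$
    BEGIN
        UPDATE accounts SET balance = balance - i_amount WHERE id = i_from;
        UPDATE accounts SET balance = balance + i_amount WHERE id = i_to;
        -- any error raises and rolls back both updates together
    END;
    $$ LANGUAGE plpgsql;

Of course this only stays one transaction if both rows live on the same
partition, which is exactly the sort of constraint being complained about.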

------
pj
Okay, see... here's more of the no-SQL agenda at hacker news. This isn't a
question of "Does SQL Scale or not?" but "How much do SQL Databases Scale?"

I feel like the propaganda here is that because an RDBMS doesn't scale to
YouTube or Google scale, it _sucks_, and that's not true. As if SQL were a
waste of time because at some point you're going to need to shard your
database.

Look, at _that_ kind of scale, you're going to have problems with any solution
to any problem. Handling that kind of scale is going to be expensive no matter
what solution you implement, whether it's map/reduce or flat files or some
other solution.

But deciding to build a system from the beginning on something non-relational
because someday you may have to accommodate that kind of scale is an example
of premature optimization. For the vast majority of apps, the features you get
with SQL are going to outweigh what you'd gain by accepting noSQL's
limitations.

I've worked on some pretty high scale systems built on SQL and yes, there are
problems, but there's just something irrational going on here and it's off-
putting. It's like we are throwing out the baby with the bath water or
something.

~~~
gaius
As others have pointed out, the "no SQL" crowd are invariably MySQL users who
have run into the limitations of MySQL but for ideological reasons can't state
that the problems they encounter are specific to MySQL.

DB2, Teradata and Oracle users regularly tackle problems 100x larger than
MySQL can handle.

~~~
trezor
Agreed. I also find it hard to sympathize with these startups that have
ambitions to scale to Google-like sizes, yet are unwilling to pony up the cash
for a proper database system like Oracle or SQL Server.

Seeing the amount of ugly hacks people are willing to come up with and employ
and features they are willing to cut, just to handle _trivial_ loads, kinda
makes me think that MySQL can only be considered free if your time is
worthless.

~~~
timwiseman
I agree with you, with a caveat. Everything you say is absolutely true, but
remember that many startups are highly focused on conserving cash, often more
so than on conserving hacker time.

The retail price (I know discounts can be negotiated, but for a point of
reference...) for MS SQL Server Standard Edition is $5,000; Enterprise Edition
is $25,000. I have never had to research Oracle prices, but I understand they
run even higher.

I worked as an MS SQL Server DBA for a while for a mid-sized company, and I
wrote and employed some "ugly hacks" to emulate some of the Enterprise
features because management at the time was unwilling/unable to pay for
Enterprise Edition.

~~~
gaius
SQL Server and the rest of the Windows stack are effectively free under the
BizSpark programme.

------
TimothyFitz
RAID doesn't meet his definition of "scalable" because it has a central
controller.

This whole post could be summed up as "ACID doesn't scale", which has been
proven. Consistency, Availability or Partition Tolerance; pick two
(<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495>).

Good introduction to non-ACID databases:
<http://highscalability.com/drop-acid-and-think-about-data>

~~~
nettdata
Oh please.

The mere fact that he's talking "RAID" instead of SAN speaks volumes. (No pun
intended). Any storage engineer worth their salt would be rolling their eyes
right now.

The doc you linked to was authored in 2002, about the same time that Oracle's
RAC was introduced (late 2001). Some of his citations are from the 80's. THE
80'S!

DB technology has come a LONG way in 8 years, and that paper is no longer
valid, unless you're talking about some basic, "minor"/open source db
technologies.

If you want to say "open source DB technologies have problems scaling", then
go right ahead, and I'll agree.

Just don't mind those of us who continue to build large, scalable systems,
using the proper DB technologies, that disprove that "sql doesn't scale"
generalization.

~~~
rjurney
SANs are too expensive. They don't scale cost-effectively compared to
commodity PC hardware.

~~~
nettdata
I'd beg to differ.

Go take a look at Adam Leventhal's work with the Fishworks stuff.

Specifically, go check out the Sun Storage 7310.

It will scale HUGE, and is nowhere near the stupid cost of NetApp or EMC or
the other major vendors.

I'd love to see how a bunch of commodity PC's will scale to 100TB, and still
be manageable, and have anywhere near the same feature sets.

Again, if you're railing against something as ubiquitous as a SAN, I'm not
sure there's anything I can say to change your mind.

Not that I'm really here to change your mind.

Again, the original focus was about SQL not scaling, and you seem to be
fixated on the cost of that scaling.

~~~
moe
_I'd love to see how a bunch of commodity PC's will scale to 100TB, and still
be manageable_

100T is 666 spindles when you go RAID10 with 300G drives (a common size in the
SAS/FC area), since mirroring doubles the raw spindle count.

If you go S-ATA then it'll fit on 200 spindles using 1T drives. I'll stick
with the latter variant for now, because I'm too lazy to look up Sun's pricing
for SAS spindles for my comparison below.
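
To spell out the arithmetic (sizes in GB; RAID10 mirrors everything, so the
raw spindle count is double the usable capacity), here it is as a quick
sanity-check query:

    -- usable 100T = 100000G; x2 for RAID10 mirroring
    SELECT ceil(2 * 100000.0 / 300)  AS spindles_300g,  -- ~667
           ceil(2 * 100000.0 / 1000) AS spindles_1t;    -- 200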

So, 200 spindles amounts to roughly 15 hosts (throwing in a few spares for
good measure). The whole setup will comfortably fit into _one_ rack, including
the FibreChannel machinery and other fluff that you'll likely want.

Thus from the hardware side this is trivially manageable; 100T is just not a
lot of data nowadays.

On the software side it's up to your creativity, and it mostly depends on
_what_ you actually need to store. I've seen people set up commodity postgres
clusters that way, as well as fancier things like HDFS, GFS or homegrown
storage layers. And it worked.

And, regardless of the indeed relatively sane pricing of the Sun (formerly
StorageTek) products, the bottom line is what makes the difference.

In figures, for a 100T SAN on the 7310 you're looking at something like $50k
for one head, plus around $75k for three trays. We're in the $125k ballpark,
hardware-only. And I'm being rather optimistic here: This setup actually holds
only 96T and the head is _maxed out_ (3 trays max per controller/head). That
means your next upgrade will incur another $25k markup for the next head, good
thing you didn't ask for 150T...

Squeezing the same amount of storage into 15 min-spec SuperMicro pizzaboxes, I
arrive at roughly $3000 per node, including spindles. A good buyer will get
them cheaper. This commodity cluster sets us back only ~$45k in hardware.

That's 1/3 the price of the 7310 solution, being _optimistic_ on the Sun and
_pessimistic_ on the commodity side.

That kind of difference makes up for a lot of development effort for the
custom solution - most of which is a one time investment anyways and yields
better flexibility in the long run. It's also the reason why most of the big
boys don't use off-the-shelf SANs for their primary storage.

~~~
itgoon
FWIW, I built one of those SuperMicro machines, with eight drives, around the
beginning of the year.

Allowing for redundancy (RAID 5 across six disks for data, plus mirrored boot
drives), we ended up with 6.5TB of storage at about $3500 (that includes an i7
proc, 12GB of RAM, and a fancy controller). My labor added some more on top of
the $3500.

Mostly, we've been happy. It is more sensitive to heat issues than our HP G5s,
and of course, generates more of that heat. If I were to go with any kind of
density, I'd need a much more robust cooling solution.

~~~
moe
Yep, though that spec would be over the top for a storage node. A large chunk
likely went into the controller and the i7, whereas in a storage node the
cheapo controllers (e.g. 2x8-port is cheaper than 1x16-port) and whatever old
Xeon/Core2 you have lying around will do.

Wrt cooling, for actual servers (i.e. not storage nodes) we've had good
success with the Sun XFires, the 4100+ range. They are very well built, and
the markup over the SuperMicro junk is minimal (around 15% last time I
checked). Among the niceties is an array of hot-swap fans - something that
SuperMicro still doesn't seem to deliver in their popular chassis.

I'd also consider the XFires for storage nodes if they'd take 3.5" drives, but
afaik all their low-end models max out at eight 2.5" slots. That's just not a
worthwhile density compared to the larger SuperMicro tins (which go up to
30x3.5" now, I think).

~~~
itgoon
Most of the money went to the hard drives themselves. They've come down a lot
since then (1.5TB drives had just been announced), but they were around 70% of
the cost. Everything else was relatively cheap.

The 2.5" drives are good, and seem to be becoming typical, because they run
much cooler. We've got some G5s in the same closet with just as many drives,
and they don't run nearly as warm.

------
antirez
I'm the author of Redis, so I should be biased the other way around, but the
Redis and KV experience taught me that being exposed to the KV paradigm was a
lot like learning the Scheme language: I started to write code in imperative
languages in a new way. In the same way, once you start thinking about the
scalability of your data in a new way - from the point of view of partitioning
the data, organizing this data in a way that is easy to access for your usage
pattern, and making judicious use of serialization - SQL databases can be a
good pick. You simply need to abandon the paradigm of _let's design our nice
tables and run our multiple joins and group-bys against them_.
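
A tiny sketch of what I mean (the schema is invented for illustration):
instead of normalized tables joined at query time, you store the data
pre-organized for the one access pattern you actually have:

    -- one row per user; the value is an opaque, application-serialized blob
    CREATE TABLE user_timeline (
        user_id    bigint PRIMARY KEY,
        items      bytea,        -- serialized list of timeline entries
        updated_at timestamptz
    );

    -- the only query the app ever runs: a single-key lookup, trivial to partition
    SELECT items FROM user_timeline WHERE user_id = 42;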

But of course, designing DBs in this new way makes most SQL features unneeded
in most scenarios... and you still pay the overhead. And now that complex
designs are not a good idea, a number of other limits appear; for instance,
there is no way to get the data back in its natural ordering (the order you
pushed it in, or the reverse)...

This is where KV stores start to be interesting as a real-world alternative,
not just as a way to learn the paradigm of scalability. But again, if you
don't trust KV stores, it is entirely possible to use an SQL database in this
more conservative and scalable way.

------
bayareaguy
_SQL databases are fundamentally non-scalable, and there is no magical pixie
dust that we, or anyone, can sprinkle on them to suddenly make them scale._

I find this claim laughable. If the databases Mr. Wiggins chooses to work with
fail to meet his scalability requirements, perhaps he should consider
different databases. Scalable SQL databases have been around for over 3
decades.

~~~
qhoxie
Your comment would hold more weight if you elaborated some more. Are you
referring to scaling by his standards? Please explain.

~~~
bayareaguy
Mr. Wiggins would have us think in terms of "true scalability", which
conflates the following concerns: that more servers create more capacity (the
classic definition of scalability); that the business logic of the app is
separated from the concerns of scaling server resources (this is not possible
past whatever limit you set, and so has no classic definition); and that there
is no single point of failure (more classically formalized as availability).

Companies such as Teradata have long offered SQL systems which meet the
classic definitions of scalability and availability.

I think TimothyFitz's reply above was accurate. Mr. Wiggins' article would be
better titled something like "ACID databases have scalability problems,
especially the cheap ones startups use", but then, like
<http://news.ycombinator.com/item?id=690653>, it wouldn't get much response.

------
Confusion
_So while sharding is a form of horizontal scaling, it fails point #2: it is
not transparent to the business logic of the application._

I do not believe this is usually true, for two reasons. The first may be
nitpicking: 'where to get the data' is not part of the business logic; it's
pure application logic, dependent on your solution to the problem. In that
sense, his argument is wrong. The second reason definitely isn't nitpicking:
you can solve the problem by adding a layer between your DAOs and the
databases that handles the 'where to get the data' question, as sketched
below. So yes, it requires some programming, but it is still transparent to
the business logic. It does not require an invasive change to your
application, and I think he is grossly exaggerating this point.
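
A minimal sketch of the idea (the shard count, key format and table are
invented): the layer below the DAOs turns a key into a shard number, and
everything above it stays unchanged:

    -- the routing layer's only real job: key -> shard number
    -- (4 shards assumed; hashtext is Postgres's built-in hash, & 3 keeps it in 0..3)
    SELECT hashtext('customer:31337') & 3 AS shard_no;

    -- the DAO then runs the exact same SQL it always ran, just against
    -- the connection for that shard:
    SELECT * FROM orders WHERE customer_id = 31337;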

~~~
prodigal_erik
I don't see a good reason for "which server has my data" to be in-your-face
application logic to code, when "which sector of which disk has my data" has
long since been delegated to the platform and forgotten.

~~~
Confusion
If you are bound to SQL databases, either for historical reasons or because
the no-SQL databases make other problems harder, then 'scaling the
application' is a pretty good reason for writing that logic. There is no
silver bullet.

------
alexgartrell
Crazy idea: write your data-handling code in its own tight little module. Use
good abstractions so that the rest of your app doesn't give a crap HOW it's
happening. When it's time to move up to millions of clustered servers running
BigTable, you rewrite the module, and you're done.

Software Engineering saves the day!

~~~
alexgartrell
3 upvotes and 2 downvotes without a single response.

As a junior member of Hacker News, I demand an education when I'm downvoted!
:)

~~~
akeefer
I wasn't one of the ones who upvoted or downvoted you, but since you asked . .
. while abstractions around your data-handling code are always a good idea, on
some level they always leak. (For example, see the classic post
<http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx>
describing how ORM, one of the most common approaches to encapsulating
data-handling code, tends to fail.)

Most of your code _is_ data-handling code on some fundamental level, so it's
pretty inevitable that it's going to matter to your application how that data
is stored and what sorts of operations you can do on it, and that's especially
true if you care about performance (which you generally do past some level).
So even if you have a nice abstraction of your query layer, the kinds
of questions you can ask the database efficiently depend heavily on the
underlying storage mechanism, and so your application logic has to be built so
it only asks the right kinds of questions.

Even abstracting away the differences between, say, Oracle and MS SQLServer
and MySQL is difficult enough, because of the different capabilities and
performance characteristics. When you do that, you basically end up coding to
the lowest common denominator, which can often limit what your application
does.

Trying to come up with an abstraction that can encapsulate the difference
between a row- versus column-oriented database, or between a relational
database and some other kind of storage, is pretty much a losing proposition:
they're too fundamentally different in terms of what kind of data you can
store, how you can store it, how you can query it, what kinds of transactional
guarantees you get, and what operations are fast and which ones are slow.

So you really do kind of have to take your best shot at it, choose an
approach, and if you choose wrong and have to change, it's just going to hurt.
A lot. Good encapsulation and abstraction will ease some of the pain, but it's
more like drinking whiskey before your leg gets sawed off than it is like
general anesthesia.

------
stuff4ben
I thought it was a good summary of the problems facing SQL databases as
organizations grow. However I would have appreciated some solutions to the
problem.

~~~
dflock
I think one of the points made at the end was that there kinda isn't one -
people have been trying to come up with the perfect solution to scaling
RDBMSs forever, and the current state of the art is (pretty much) as described
in the article - i.e., not great. People with really massive MySQL setups
(eBay, for example) basically just do a huge amount of the stuff described
here, but with lots of message-queue-type stuff to glue it together and loads
of hard work; i.e. you can make it scale, but it's really, really hard work,
because RDBMSs aren't inherently scalable.

~~~
gaius
_the current state of the art (pretty much) is as described in the article -
i.e, not great_

That simply isn't true.

For me, thousands of transactions per second and tens of terabytes of data on
a single database is normal. It's unremarkable, it's everyday, it's what we
do; we have done it for years. And I know of installations handling 10x that.
It's only people whose only "experience" is websites who whinge about how an
RDBMS can't handle their tiny datasets.

~~~
dflock
What hardware is this running on, roughly? What the author was getting at, I
think, was not that you can't do it, just that it's hard/expensive; you need
fairly beefy hardware and experienced DBAs to manage horizontal scale-out
manually.

~~~
gaius
That's a very curious statement, especially for HN. Why would you consider
needing good, experienced people to be a _disadvantage_?

~~~
dflock
Because that kind of hardware+DBA team is prohibitively expensive for a
startup?

~~~
gaius
Hardware yes, but the basic premise of a startup _is_ that your people are
top-notch. Scalability doesn't imply building something up-front that can
handle enormous loads - it means building something that can grow with you. As
opposed to "oh crap, we've got some load now, better start again" a la
Twitter.

------
wildwood
"Sharding kills most of the value of a relational database."

I can appreciate the point the author makes about partitioning schemes
requiring heavy integration with the business logic, but I disagree with the
claim that sharding doesn't work.

It's probably more accurate to say that sharding only works well if you design
your database very carefully, or just get lucky about how your data model maps
to sharding schemes. A bad sharded database can probably hobble your app, but
a good one does get you remarkably close to true horizontal scaling.

In my experience, at least.

~~~
psadauskas
If your data model maps well to sharding, it maps well to distributed
databases, and you're not gaining much by being on an RDBMS. I think the point
they're trying to make is that if you're going through the contortions needed
to shard a relational database, you're better off just doing it in a
distributed database in the first place.

~~~
TheXenocide
There's more than one way to skin a mongoose, and in this case the more common
shard solutions are fundamentally different from a distributed DB. A
distributed DB often contains large amounts of a subset of the data that has
to be merged with data on other DBs, whereas shards can each carry a complete
duplicate of the schema, so you can keep using relational technology just
fine; even when a query is distributed, you just add rows to the end of a
dataset rather than joining the data columns manually (implementing your own
relational joins). In many cases you partition your sharded data in such a way
that the majority of your queries are not distributed, so they can continue to
leverage the relational model you had before, only performing a distributed
query for more complex actions like reporting and such.
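
To illustrate the "add rows to the end" point (the shard hosts, connection
strings and schema are invented), a distributed query against shards with a
duplicated schema is just the same statement fanned out and concatenated, e.g.
with Postgres's dblink contrib module:

    -- same table definition on every shard; results are appended, not joined
    SELECT * FROM dblink('host=shard0 dbname=app',
                         'SELECT day, sum(total) FROM orders GROUP BY day')
             AS t(day date, total numeric)
    UNION ALL
    SELECT * FROM dblink('host=shard1 dbname=app',
                         'SELECT day, sum(total) FROM orders GROUP BY day')
             AS t(day date, total numeric);
    -- (a real report would re-aggregate the per-shard sums on top)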

Also, going back to "the first place" isn't really an option: shards usually
come out of a system that has grown, not out of a ground-up design. Ultimately
I still support the distributed model myself, but a shard model supports
relational data much better than a distributed DB does (at least out of the
box).

------
brlewis
This article is about relational databases that you read from and write to. It
isn't SQL-specific. It's a good article, not another "SQL is obsolete" piece,
as you might think from the title.

------
Shakescode
In the final paragraph he compares himself with Capt Kirk, facing the
Kobayashi Maru: "we can only solve this problem by redefining the question."

Well, perhaps that's as difficult as solving the SQL scalability problem that
20 years of brilliant programmers haven't solved?

...because he declines to even _attempt_ to reframe the question.

[ or maybe he's leaving us wanting more: like his next post? ]

Anybody up for reformulating the question?

------
ojbyrne
At a management level there are larger problems. SQL was a solution to the
"letting programmers run the show" problem, i.e. a commoditized language that
executives could mostly understand and budget for. See pretty much everything
Philip Greenspun has written.

------
JulianMorrison
Scale like what? If you aren't IBM, Facebook or Google, why do you even care?

~~~
jacquesm
Ouch. I'm not IBM, Facebook or Google, but SQL scalability issues are a good
part of my daily workload.

(The other part is made up of file-system scalability issues.)

Once you get past a certain level these are non-trivial problems, and anybody
out there who is busy solving them has my interest.

------
tybris
Can anyone give me the context? I don't believe in absolutes. I only believe
in "it depends on the application/budget/knowledge/etc"

------
antidaily
Maybe I am getting old, but I no longer find arguments over semantics as
interesting as I once did.

~~~
jmtulloss
I agree, but I'm not sure this is purely a semantic argument. Care to
elaborate?

------
tommy_chheng
Isn't Facebook using MySQL?

------
trezor
Summary of an article about how "SQL Databases Don't Scale":

    
    
      1. Mentions RAID, not SAN
      2. Mentions MySQL and only MySQL (with the exception of PostgreSQL once).
      3. Mentions Master-slave replication as a killer scalability feature.
    

I suggest he rename the piece to "A $500 server and MySQL Don't Scale" and
then we can all agree and get along.

~~~
khaless
Yeah, there certainly are a few more tricks for scaling an RDBMS than the
author covered. Depending on what your demands are, there are different
techniques and protocols that best suit you and can go very far toward solving
your problem. But with that in mind, there may be better ways to solve your
problem, and we should not forget about those.

If you start sharding you may get write gains, but you have to work very hard
to keep things consistent (depending), and you may have to duplicate shards to
make them highly available ($$$). Oh - and later down the track your schema
might change in ways your sharding scheme is just not flexible enough to deal
with, and depending on who you are, that may be too much of a risk.

Besides, the issue is really with availability, consistency and performance.
It is very hard to scale all three of these together, and even your cash-cow
solutions will hit their limits (although some of those limits are quite high
:))

