SQL Databases Don't Scale

mbrubeck · on Oct 4, 2009

It's not the scale, it's the reliability.

When I worked at Amazon.com we had a deeply-ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but it's very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.

I think everyone is missing the boat by evaluating datastores in terms of scalability. Even at Amazon's size, database scaling is only occasionally a problem. By the time your site is that big, hopefully you've split your software into independent services, each with its own (relatively) small database. And for the services that do need a distributed cache or persistence layer, there's now a huge body of literature and software that solves those problems.

Reliability is another matter. Every business can benefit from improved reliability, regardless of size. And reliability is harder. Scaling is basically an engineering problem, which can be solved with the right architecture and algorithms. Reliability is more like security: it's a weakest-link problem and requires constant vigilance in all areas, including engineering, testing, operations, and deployment. Adding redundant components improves reliability against hardware failures (which are independent between hosts) but not software bugs (which typically are not). Many of the comments on http://news.ycombinator.com/item?id=859058 are missing this point.

Anything that mitigates reliability problems has huge potential for saving costs and increasing value. CouchDB, for example, doesn't even provide basic scaling features like automatic partitioning, but it does provide reliability features like distribution, replication, snapshots, record-level version history, and MVCC. That is what makes it interesting and useful in my opinion. Similarly, although people talk about Erlang as a concurrency-oriented language, it's really a reliability-oriented language. The concurrency in Erlang/OTP is not designed for high-speed parallel computation. Rather, it's one part of an overall strategy for reliability, a strategy which also includes the ability to update code in running processes, a standard distributed transactional database, and built-in process supervision trees. (Steve Vinoski wrote something similar in his blog a while ago: http://steve.vinoski.net/blog/2008/05/01/erlang-its-about-re...)

timr · on Oct 4, 2009

"By the time your site is that big, hopefully you've split your software into independent services, each with its own (relatively) small database. And for the services that do need a distributed cache or persistence layer, there's now a huge body of literature and software that solves those problems."

Yes, yes, a thousand times, yes. We seem to be slowly moving toward this ideal at JTV as we grow, because scaling monolithic database architectures is a pain, and it isn't reliable. Even if you scale your database via replication or sharding, you've still got a global choke point for reliability and performance. If your database(s) get slow, if replications lag, if a server dies or whatever, there's a good chance it will take down your whole site. On the other hand, if you use services, you can lose individual features without taking down everything.

I would go so far as to say that it's probably a good idea to build a service-oriented architecture from the start. It's only a bit harder to do, it keeps your code base cleaner and more orthogonal over time, and it makes scaling much easier, should the need ever arise.

bayareaguy · on Oct 4, 2009

Amazon's CTO certainly is a big proponent of SOA.

Here is a light interview (by none other than Jim Gray) on this published in ACM Queue from 2006.

http://portal.acm.org/ft_gateway.cfm?id=1142065&type=pdf

ecq · on Oct 4, 2009

As i mentioned earlier, you can have a highly-scalable "shared-nothing" architecture using existing RDBMS's. It depends on your design and implementation.

Also, when used in active-active configuration, replication actually decreases the reliability of your application by the number of replicated nodes, because you're increasing the number of point of failures in your application (although scalability increases). Replication increases reliability only if used in an active-passive configuration. I've ran into this issue over and over again.

mbrubeck · on Oct 4, 2009

That's exactly why businesses like Amazon that are concerned with both scale and availability are using non-relational databases that replace ACID with "eventual consistency" or other weakened/modified guarantees. Then you can have active-active replication while increasing (rather than decreasing) reliability in the presence of node failures and network partitions.

ecq · on Oct 4, 2009

That's only applicable to applications that does not require ACID. I've seen architectures where sharding + active-passive replication was used to provide scalability without sacrificing reliability (ACID-compliance).

here's an interesting paper on sharding with Oracle vs. Mysql

http://www.oracle.com/technology/deploy/availability/pdf/ora...

mbrubeck · on Oct 4, 2009

The CAP theorem is a very strong limit on providing both availability and consistency in any distributed system. In your sharding+replication example, what happens when the datacenters containing your master and slave lose their network link? There's no way you can maintain write availability for clients in both datacenters while also providing the ACID Consistency guarantee. (But systems like Dynamo or CouchDB can do so while guaranteeing eventual consistency.)

After seeing some real-world, big business problems solved with weakened consistency guarentees, I'm skeptical that there are as many problems that "need" ACID as most people think. Rather, I think that (a) most engineers have not yet been exposed to the reasonable alternatives to ACID, and so have not actually thought about whether they "need" it, and (b) most businesses do not yet have the knowledge they need to weigh ACID's benefits against its availability costs.

ecq · on Oct 4, 2009

I agree with the CAP theorem and it applies to my example. In my example replication only provides a backup copy of the data and is not used by the application, that's why it's active-passive (provides reliability). This configuration provides the highest level of data protection that is possible without affecting the performance of the master database/shard. This is accomplished by allowing a transaction to commit writes to the master database. The same transaction is also written to the slave, but that transaction is optional and written asynchronously. Two-phase commit is not performed.

peoplerock · on Oct 4, 2009

Yes it's interesting. Yes it compares Oracle and MySQL.

For balance it would be helpful to have an analysis from a more independent source than Oracle.

jbellis · on Oct 4, 2009

Sharding isn't scaling. It's not application-transparent, and it's an operational nightmare.

ecq · on Oct 4, 2009

Sharding isn't scaling

Sure sharding is scaling, although there are several types of sharding, with varying degrees of scalability. Usually people use a combination of both database (which what I was talking about) and table sharding (w/ replication,clustering,etc.) to achieve scalability. I've encountered several highly-scalable db environments like these.

It's not application-transparent

There are several databases that have features that can make sharding transparent to your application.

and it's an operational nightmare

it depends on the RDBMS that you're using.

dasil003 · on Oct 4, 2009

I could do without the sensationalist headline. In fact I've pretty much come to hate any headline that says "X doesn't scale". It might be fair in some circumstances, but most of the time it's an indication of an uninformed or koolaid-driven opinion.

Scale is all relative, the only thing that scales infinitely are problems where the parts are all independent. Therefore you need to pick a solution that will scale appropriately to your needs. For the vast majority of applications in existence, the properties of SQL databases scale just fine for their needs. Where advanced SQL doesn't scale, you can often squeeze several orders of magnitude more scalability through a variety of optimization techniques including sharding and denormalization. A "NoSQL" (for lack of a better term) database might have better "scalability" out of the box, but will also tend to require more work up front wrangling data sets and more bugs than well tested SQL databases provide for the same data management functionality. Some people think this is offset by the lack of a parity mismatch between OO code and relational data sets, but I think that is more a symptom of people who don't really grok the relational model and the power and assurances it provides.

Increasingly as we move to more and more cloud services there is definitely a need for new approaches to more scalable data stores. However there's not going to be something that comes along and is both more scalable and equally as powerful as the relational model across the board. It's going to require tradeoffs to solve different classes of application-dependent scalability profiles. However the relational model is a solid and proven theoretical basis for describing the majority of data we have. Sure you have some edge cases like graphs and trees, but most data is not better described by key-value stores. That's just wishful thinking.

timwiseman · on Oct 4, 2009

Like most decisions, choosing your datastore is all about deciding which trade offs are best for you. For a great many of those situations, it will be SQL, but for some others it will not. But each way comes with compromises.

greml1n · on Oct 4, 2009

They drank Flavor Aid, not Kool Aid.

johnnybgoode · on Oct 4, 2009

Heroku has been hyped up far and wide as being some kind of magical scaling solution. Just give Heroku your app and it will scale infinitely on EC2, right? Wrong. As they've admitted before, they haven't done anything to scale the database. Adding more app servers is trivial by comparison.

This isn't a knock on the Heroku folks; to their credit, they've been open about the DB issue. But you wouldn't see this most basic piece of information in media coverage from TechCrunch and many others. Unfortunately (?) for Heroku, the press has been writing checks in their name that the company can't cash.

Edit: Literally seconds after I made this post, blasdel posted about the same DB issue: http://news.ycombinator.com/item?id=859545

cmelbye · on Oct 4, 2009

Yet adding more database disk capacity will cost you a fortune for anything more that 0.5GB...

sprachspiel · on Oct 4, 2009

I think scalability and reliability are mostly orthogonal issues and should be discussed separately.

Scalability is a farce. In my experience applications are implemented in such hilariously inefficient ways, because it's hard to see where the time and especially the IO is spent. But adding more machines is not a good solution, because it becomes even more difficult to understand the performance behavior of the system. Also, it transforms the system into a distributed system and therefore adds a whole new level of complexity and failure causes. Most websites don't have the some problems as Google and Amazon.

Optimizing the application could prevent the need for scaling out to multiple machines. Optimization is not rocket science: Find out the bottleneck and fix it. Often it is random IO, so either load the data into RAM or change the algorithm to use sequential IO (e.g. through batch computation).

Reliability means that the system must work correctly all the time and there is no way to fix it (e.g. a satellite on a mission). Thus, for most applications availability, which is the fraction of time a system works, is a more appropriate metric.

There are two basic strategies for achieving high availability in a system: perfection and fault-tolerance. Perfection is the default programming model, which assumes that every hardware and software component of the system works as expected. Fault-tolerant systems, on the other hand, are hard and require that the system is designed around this idea. Also, fault-tolerance makes it harder to change the system. An often overlooked fact is that for many systems it is actually much easier to achieve a particular availability goal through perfection rather than trying to build a more complex fault-tolerant system.

blasdel · on Oct 4, 2009

Heroku : Ruby/Rack :: Google AppEngine : Python/WSGI

But there's a huge problem -- AppEngine succeeds at seamless multi-tenant truly-distributed clustered hosting thanks to BigTable. Heroku needs to support standard Rails apps, so Postgres is the best they can do, and it's a huge hole in their offering.

You just can't make Postgres (or Oracle) scale on an ideal horizontal the same way you can distribute IP, DNS, HTTP proxying, HTTP serving, memcache, message queues, or bigtable. You can't expose Postgres as an ideal service that just keeps up with what you throw at it.

alnayyir · on Oct 4, 2009

>You just can't make Postgres (or Oracle) scale on an ideal horizontal the same way you can distribute IP, DNS, HTTP proxying, HTTP serving, memcache, message queues, or bigtable.

Is there something about MySQL that exempts it from this scaling weakness that I'm not aware of?

I'm not bull-baiting, I'm just covering the bases and wondering if there's some reason it's not presently under discussion. Part of the reason I ask is because I've been pushing for a shift to Postgres at work after we get some more pressing matters dealt with.

johnnybgoode · on Oct 4, 2009

> Is there something about MySQL that exempts it from this scaling weakness that I'm not aware of?

No, there isn't. We're discussing Postgres since that's what Heroku uses.

alnayyir · on Oct 4, 2009

In that case, I'll carry on with my Postgres evangelism. Thank you for your time.

jacquesm · on Oct 4, 2009

we've been here before:

http://news.ycombinator.com/item?id=690656

I had to look for a long time to see why the dupe detector didn't pick that up, 'adam.blog' vs 'adamblog'.

jpwagner · on Oct 4, 2009

Good to link to the prior discussion, but that item is no longer open for discussion, so it's not inappropriate to re-post if an interesting discussion is to continue.

  feature request: dupe only detected when discussion is still open.

jacquesm · on Oct 4, 2009

The cutoff is only 45 days, we'd be literally flooded with dupes if that was done.

I think this was one of PGs recent optimizations to keep HN responsive, it makes good sense to archive articles when they're no longer being updated.

crad · on Oct 4, 2009

I think the author is unaware of a great tool for scaling PostgreSQL. A few years ago Skype released plProxy and pgBouncer, both of which are key to PostgreSQL scaling. What they in essence do is allow for transparent horizontal scaling of servers. One defines a cluster in plProxy and then when you run stored procedure queries, they remotely execute on the proper shared server based upon the primary key data. It's fast and effective and I have been using it in a very large OLTP environment using PostgreSQL, driving a US Top-25 pageview site.

crad · on Oct 4, 2009

I should clarify, plProxy is responsible for the transparent horizontal scaling. pgBouncer is a proxy that allows you to run all of this using a minimal number of PostgreSQL backends, which is essential because of the amount of inter-process communication locking and information sharing in PostgreSQL. pgBouncer basically manages a smaller pool of active backends reducing the overhead in running high transaction velocity servers.

cturner · on Oct 4, 2009

I'm glad this was reposted because it links into something I've working on for a while. I'm on what must be the ninth rewrite now, but have had a breakthrough on a long-term problem this weekend (notation).

In the past I've noticed that organisations run into trouble when they have one schema with multiple apps off it, because schema changes become difficult. I'm trying to write an engine that contains application business logic without polluting it with user interface considerations. It aims to solve a problem for which many people currently use stored procedures. However, with a better syntax and being stateful it should be far more effective at enclosing the schema and supporting rapid changes. The dynamic of interacting with the application is a bit like a MUD, in that each session can only do things that are exposed to it via the 'room' mechanism.

I had intended that its mainstay backend would be relational databases. But I now realise - if I'm building business logic into this layer (which will be the single point-of-entry into the schema) the datastore can afford to be a lot less complex than a traditional relational db.

My focus this weekend, to 'prove' the new design, is a tool that wraps a generic-webapp-framework-style CRUD app around a schema on the fly. So if you're on crud screen rendered for the "person" table then there'll be an organisation-foreign-key datapoint in the generated user interface. It will be either a radio button list, dropdowns or search interface depending on the number of records.

How do map/reduce programmers manage the structural integrity of their schemas without the sort of foreign key protections that you get with relational databases? Is there an analogue to the ORM layer that they use for this?

mr_luc · on Oct 4, 2009

I had two thoughts:

a) Changing 'schemas' is supported by document, or advanced key-value, stores like Couch or Cassandra. You end up thinking of your objects more as hashes and program accordingly; everything else is just a mapreduce task (ie, you've updated your schema. If you coded well, nothing should break because of that, but you'd like to update all of the old objects. So write a mapreduce task, like a migration, that does that).

b) DataMapper, for ruby, is used mostly as an ORM; however, the "R" is pretty optional, as people have written data adapters for other architectures -- from CouchDB to the SalesForce API.

Those are just some associated thoughts. I'm playing around with Cassandra and Couch on my server at the moment (grr, when I have a moment).

jrockway · on Oct 4, 2009

SQL is not a data storage scheme, it's a query language. If you mean "relational database", say "relational database".

tzury · on Oct 4, 2009

Seems like nowadays, everyone's saying SQL database do not scale, while apparently, most crowded sites are using mySQL (or other SQL based) and scaling it quite well, producing speedy websites that serve billions of pages everyday.

kscaldef · on Oct 4, 2009

Serving billions of pages a day is not a particularly hard scaling problem if most of those page views only require reads. It's writes, and particularly transactional updates, that are hard.

drusenko · on Oct 4, 2009

yes, but even harder are the entire set of problems where the data is extremely interrelated, like Facebook's is -- making it very difficult to shard effectively.

datums · on Oct 4, 2009

I wouldn't put HA into the same bucket as scaling. Scaling includes performance where HA doesn't. You can take a simple html site and decide to eliminate SPOF (single points of failure). You immediately increase your OpEX. I enjoy scaling problems.

fauigerzigerk · on Oct 4, 2009

It's not so much the DBMS that doesn't scale. It's particular data access patterns in combination with particular requirements that don't scale. RDBMS alternatives do little more than educate me about that fact

ecq · on Oct 4, 2009

Teradata, Oracle RAC/Oracle Exadata,Vertica, Netezza, etc.

jbellis · on Oct 4, 2009

All (?) [edit: I stand corrected: only some] of which rely on A Big SAN, which falls under the "vertical" classification, i.e., scaling-by-throwing-money-at-it. Which is fine as long as you're aware that your price/performance sucks, and as long as your SAN is big enough.

wmf · on Oct 4, 2009

No, most of the scalable SQL databases are shared-nothing.

jbellis · on Oct 4, 2009

... A little more looking shows that while some systems, like Oracle RAC (the one I'm most familiar with) and Sybase, do rely on shared storage, many do not. Sorry to generalize from insufficient data!

ecq · on Oct 4, 2009

You can have a shared-nothing architecture using traditional "shared-everything" RDBMS.

gaius · on Oct 4, 2009

Translation: Heroku doesn't know how to scale databases.

Meanwhile, out here in the real world, banks, stock exchanges, airlines, telcos, retailers are doing transaction and storage volumes on relational databases that say otherwise.

Also "SQL database" is what the PHB in Dilbert calls 'em. No-one with any experience in the field uses that term.

chanux · on Oct 4, 2009

Apparently we can put old stuff on HN again.

http://news.ycombinator.com/item?id=690656