Relational database alternatives aren't as widely applicable as Heroku's articles make them sound. Take their examples. The first one is a social networking profile with a list of interests. "Awesome! I can just store things in a JSON-like list and not have another table or anything!" Oh, what if I want to find all the users who have 'ruby' as an interest? Oh, I can only look up by key? And yes, there are ways around that. You could create a table of interests where each interest is a key and its value is the list of people with that interest. So, you have 'Joe' => 'interests': ['ruby', 'blokus']; 'Amanda' => 'interests': ['ruby', 'rugby']. And then you have an interests table with 'ruby': ['Joe', 'Amanda']. When you want to search by interest, you fetch the document from the interests table, get the list of people, and then request the documents for those people. But is that more efficient? No. It both requires more code and takes longer to execute. An RDBMS is able to optimize a join on that data in a way that you can't.
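To make the workaround concrete, here is a minimal sketch in plain Ruby, with two hashes standing in for the two key-value "tables". The names USERS, INTERESTS and users_with_interest are made up for illustration and don't come from any particular store.

```ruby
# Hypothetical in-memory stand-ins for two key-value "tables".
USERS = {
  'Joe'    => { 'interests' => ['ruby', 'blokus'] },
  'Amanda' => { 'interests' => ['ruby', 'rugby'] }
}

# Build the reverse mapping by hand: interest => list of user keys.
# In a real key-value store the application would have to keep this
# in sync itself on every profile write.
INTERESTS = Hash.new { |hash, key| hash[key] = [] }
USERS.each do |name, doc|
  doc['interests'].each { |interest| INTERESTS[interest] << name }
end

# Two-step lookup: fetch the interest entry, then fetch each user
# document by key (one extra round trip per user in a real store).
def users_with_interest(interest)
  names = INTERESTS.fetch(interest, [])
  names.map { |name| [name, USERS.fetch(name)] }.to_h
end

puts users_with_interest('ruby').keys.inspect
```

That is the "more code" part: the reverse mapping has to be built and maintained by the application, and every search costs a lookup in the interests table plus one fetch per matching user.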
Similarly, with the e-commerce orders: what if you want to retrieve by the product ordered? It's not unreasonable to assume a situation where certain products are fulfilled by warehouse A and others by warehouse B. Well, you're in the same boat again doing a less efficient thing.
Plus, what is hard to scale when it comes to a database? The article is passing off SELECT statements as hard to scale. Now, I'm not saying that they're a piece of cake. You can get into a lot of trouble. However, reads are relatively easy to scale since you can just add more boxes. Writes are hard to scale because, unless you shard and do other hard things, you only get the power of one box for writes (since the writes have to be done on every box while a read only has to occur on one box).
So, even if you're using a document-based store, you eventually have to shard. Now, when you never relate data, sharding can be a lot easier since it can be done based on a hash function. Systems like memcached do this automatically. So when you say get(1, 42, 64, 128) it will be able to hash those ids and send each request to the proper server for that item. But never relating data means that you lose out on a lot of convenience. And most of these alternative stores don't do that sharding for you (and it's why memcached is such a useful tool alongside an RDBMS).
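A rough sketch of that hash-based routing, in plain Ruby. The server names and the helpers server_for and plan_multi_get are hypothetical, and CRC32 modulo the server count is just one simple choice, not necessarily what any given memcached client does.

```ruby
require 'zlib'

# Hypothetical server names; a real client would hold connections here.
SERVERS = ['cache-a', 'cache-b', 'cache-c'].freeze

# Pick a server for a key by hashing it and taking the remainder.
# CRC32 is used only because it gives the same answer on every run.
def server_for(key)
  SERVERS[Zlib.crc32(key.to_s) % SERVERS.size]
end

# Group a multi-get by destination so each server gets one request.
def plan_multi_get(*ids)
  ids.group_by { |id| server_for(id) }
end

plan_multi_get(1, 42, 64, 128).each do |server, ids|
  puts "#{server}: #{ids.inspect}"
end
```

This only works because each key can be fetched on its own. One known catch with plain modulo routing is that changing the number of servers remaps most keys, which is why many clients use consistent hashing instead.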
And SQL databases do scale a lot more than Heroku (in these two articles) seems to let on. Wikipedia, Facebook, Craigslist, and Flickr are all backed by MySQL. Now, not MySQL alone. Memcached is a big part of it for all/most of them (I don't remember what each site uses exactly). There's a reason why many of the largest sites use a SQL database and it's not because they're unaware of other storage engines.
It seems like Heroku might be getting a lot of complaints that the service isn't magically scaling. Computers aren't magic and document-based databases aren't any more magic. CouchDB uses B-Tree indexes just like relational databases. The difference isn't so much that these data stores offer better performance for some lookups. It's more that they only offer the lookups that can have good performance.
I feel like I should offer some free advice to Heroku: your SQL databases would scale better if your dedicated SQL boxes were Amazon's high-RAM boxes rather than the high-CPU ones you opted for. RAM means more for database performance than CPU. Oh, also, offer some consulting for clients on their database woes. A lot of the time, people are doing lookups that should be using an index, but they haven't created it and so the database has to do a full table scan rather than an index lookup. And there's a big difference there. A full table scan of 1 million records can take on the order of 50,000 times longer than an index lookup (about 1,000,000 rows examined versus roughly log2(1,000,000) ≈ 20 steps). Yeah, indexes are good. One of the reasons that CouchDB "scales so well" is that you can only do queries on things you've made indexes for.
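For what it's worth, the 50,000x figure can be sanity-checked with a rough sketch. This counts rows examined, not wall-clock time, and it assumes a sorted array with binary search as a stand-in for a B-Tree index; the helper names are made up.

```ruby
# Count how many rows each strategy examines before finding the target.
# A sorted array with binary search stands in for a B-Tree index here;
# a real B-Tree has a much wider fan-out, so it touches even fewer nodes.

def scan_probes(rows, target)
  count = 0
  rows.each do |row|
    count += 1
    break if row == target
  end
  count
end

def index_probes(sorted_rows, target)
  count = 0
  low = 0
  high = sorted_rows.size - 1
  while low <= high
    count += 1
    mid = (low + high) / 2
    return count if sorted_rows[mid] == target
    if sorted_rows[mid] < target
      low = mid + 1
    else
      high = mid - 1
    end
  end
  count
end

rows = (1..1_000_000).to_a
target = 1_000_000 # worst case for the scan: the last row

scan = scan_probes(rows, target)
index = index_probes(rows, target)
puts "scan: #{scan}, index: #{index}, ratio: #{scan / index}"
```

With the last row as the target, the scan examines all 1,000,000 rows and the binary search needs 20 probes, which is where a ratio of 50,000 comes from. Real timings depend on caching, row size and disk layout, so treat it as an order-of-magnitude argument only.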
I'm not saying that non-RDBMS stores don't have their place. They do. However, we keep seeing articles posted about RDBMS not scaling and it doesn't seem like the authors quite know the purpose of scale. Scale doesn't have to be infinite. Nothing will do that. The question is: will it scale enough to handle the traffic? And SQL databases will unless you're really, really big. Do you expect your site to become a top 500 site on the internet? Heck, WordPress.com at #20 is doing fine implementing a not very efficient SQL-backed blog. Now, I'm sure they implement caching and such, but it's still SQL-backed.
Basically: learn a lot about indexes. If you become one of the top sites on the internet, hire someone who can help you. In the meantime make your product and don't worry too much about the FUD.
Oh, also, offer some consulting for clients on their database woes.
We do this at the day job. It runs about $X00 an hour, with a minimum of Y hours, if our customers need it. Heroku, on the other hand, has a lot of customers who are wondering how far their $50 a month is going to get them. For these customers, many of whom are Rails types who don't quite grok indexes, it might be a better solution to say "Um, look, rather than us teaching you a core engineering skill that you're manifestly unwilling to pay for, how about we suggest a technology stack that makes this skill unnecessary".
Previously, one of the Rails hosts (Dreamhost or Heroku, can't remember) released stats saying something like 97% of customers create no indexes. I totally understand how that can happen, too -- you expect ActiveRecord to be magic, and with what it does it is very powerful magic, but it is not magic that totally obviates your need to think about database design. (Edited to add: My business runs on Rails, I consider myself to have low to intermediate SQL ability, and if you contact my day job to get consulting on your database woes you won't get handed off to me anytime soon.)
If there are Rails developers who work with applications of any size at all and are not familiar with indexes, the problem isn't scaling - it's lack of knowledge of one's application's stack. The mentality that one's app should magically scale without any idea of what's under the hood is toxic.
Anyway, it doesn't get much simpler than: add_index :users, :account_id
Near-trivial example: what index or indices do you need to support the business requirement "I want to know how long users stay active after they sign up, and I want you to be able to slice that data by signup date and by whether they're paying customers or not"?
So programmer Bob goes off and does this.
"Oh, Bob, the screen only lets me slice the data by signup date or by customer type, but I want to slice by both at once."
So Bob makes a one line tweak to his controller (to use both conditions, instead of one or the other)... and BOOM, down goes the poor server.
Well, this is a requirement in the Business Intelligence domain, so you should create a reporting database (probably a star schema) and put an analytics package on top of it. You'll get easy sub-second queries.
Oh definitely - sorry, I didn't mean to be flippant or to suggest that dropping indexes on everything was some sort of magic scaling powder. I once worked on an app a client brought in from an outside company that had slapped indexes on every single column in the database (including longtext - it was Postgres). I've never seen something so broken.
But yeah. It's complicated. "Software is hard." But the best thing you can do is become aware of your ignorance, then try to eliminate what you can.
Such business requirements often fall under the "I do this 20 times a month, not 20 times a second" category, so they don't need a full index. But name one of the technologies that "scales" that handles this better than a modern SQL DB.
Could be that 97% of customers have no need for indexes. I have none on my personal site; with rows numbering in the dozens and pageviews/day numbering about 20, I could probably brute-force search over the database in Python and it'd still be fast enough...
I have some sites that don't have indexes. At <1000 views/day it's not necessary. Caching generated HTML is as easy and more effective in making it faster.
"Awesome! I can just store things in a JSON-like list and not have another table or anything!" Oh, what if I want to find all the users who have 'ruby' as an interest? Oh, I can only lookup by key? And yes, there are ways around that. You could create a table of interests and each interest would be a key in that table and it would have a list of people with that interest.
This is not true of CouchDB. Indexing is done on the keys generated by arbitrary Javascript views. A view returning results keyed by interest is trivial.
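As a rough illustration of the idea (a plain-Ruby imitation, not CouchDB's actual API, which takes the map function in Javascript), a view can be thought of as a sorted list of rows emitted by a map function. The names DOCS, build_view and query are made up for this sketch.

```ruby
# A plain-Ruby imitation of a map-style view; not CouchDB's real API.
DOCS = [
  { '_id' => 'Joe',    'interests' => ['ruby', 'blokus'] },
  { '_id' => 'Amanda', 'interests' => ['ruby', 'rugby'] }
]

# The "map function": emits one [key, value] row per interest.
map_by_interest = lambda do |doc, emit|
  doc['interests'].each { |interest| emit.call(interest, doc['_id']) }
end

# Run the map function over every document and keep the rows sorted
# by key, the way a view index is ordered.
def build_view(docs, map_fn)
  rows = []
  emit = ->(key, value) { rows << [key, value] }
  docs.each { |doc| map_fn.call(doc, emit) }
  rows.sort
end

# Look up all values emitted under one key. A real store would walk
# its B-Tree here instead of filtering the whole list.
def query(view, key)
  view.select { |k, _| k == key }.map { |_, v| v }
end

view = build_view(DOCS, map_by_interest)
puts query(view, 'ruby').inspect
```

The point is that the index is derived from the documents by the map function, so the application doesn't have to maintain a separate interests table by hand.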
the writes have to be done on every box while a read only has to occur on one box
That isn't true. Well, it is with MySQL, but not with Oracle, and hasn't been for over a decade. As usual, most complaints about "SQL databases" (no-one who actually does databases calls them that) are really complaints specific to MySQL.