Erm, that solution is a little extreme, unless there are a few steps missing.
The first things I'd do:
1. Aggregate those inserts into a single SQL command. You never want to be issuing O(n) SQL queries per request.
2. Use InnoDB, not MyISAM, for the database engine. I assume it's MyISAM because he's talking about table locking. InnoDB has row-level locking, so you'll be able to INSERT and SELECT from a table concurrently.
I had a Facebook app with 20MM MAUs, and it worked fine with two machines. A component of the app was voting in polls, and I was processing ~10k votes per second. Each "vote" is an insert into a votes table.
Soo...not really sure how you go from "We're running too many INSERT queries and locking the tables!" to "Async queues and MongoDB!"
I could not agree with you more here. And they are using InnoDB since it's Rails-based. Wrapping up all the INSERTs into a single query or transaction would really help.
I'd also say consolidating the validations into a single SELECT would help, but it seems they threw the validations out when moving to MongoDB.
And on the server level, Passenger isn't optimized for running a single site. I use Unicorn for my Rails app.
0. Make the right kind of table.
create table user_apps (user_id integer, app_id integer, primary key (user_id, app_id)); -- this does the validation for you, and (should) index the table on both user_id and (user_id,app_id).
1. Make the right kind of insert statement!
insert into user_apps select 5, 6 union all select 5, 7 union all select 5, 8; -- etc
2. How do you do recommendations? That sounds like it'd be tough to do well, especially within a web request.
I didn't read the part about validations. That's silly.
Even if you have to do it in the application layer, why would you need more than one SELECT? Just get the installed apps from the DB in a single query and do a set difference operation to only get the apps not already installed.
But I agree that putting the constraint in the DB layer is the correct solution. With MySQL you can just do INSERT IGNORE to only add apps not yet marked as being installed.
Yea I agree. From what I can tell from the blog post, they aren't even close to the limits of what mySQL can handle if properly configured unless they are running it on some kind of super tiny virtual machine. Seems like they just jumped the gun and over engineered it.
Indeed it is Tumblr's erorr page (http://tumblr.com/). It's interesting (re: shady) that they don't put their name or logo anywhere on the page. Most people visitors will never know it's Tumblr's fault when they attempt to visit a Tumblr-hosted site and run into this page.
It seems that a lot of the INSERT and SELECT issues could be addressed via a stored function/procedure in a database such as PostgreSQL, MySQL's stored functions/procedures aren't there yet. You would be able to do the validation within the stored procedure, or even in a trigger, which would likely be even more efficient. I always find it odd when the solution is to use something such as Mongo DB, and continue to write more of your database logic in your app, rather than in your database, where it can run fastest without any unnecessary round-trips.
You really gave us no information at all to justify your decision. For all we know you were running a 5.0 build on a 7200rpm disk with the default my.cnf settings.
My take away from this is to plan for traffic spikes from the start. You just never know when (if) you will suddenly get mentioned on a big blog or something.
Maybe write the many to many relations in a background job/queue, and push the result back to the user when they're synced. Granted, long polling will take a number of connections, but the socket server won't take down your front facing service.
Great to see that MongoDB turned out to be a quick solution, but it seems that the writes could have just been timed more effectively.
We might have been able to get away with raw SQL, but because of some of the complexities and dependencies on this same code, I felt like it wasn't the prudent choice during a fire. I wouldn't have had time to properly test it.
As for moving the inserts to an async job, that was one of our thoughts, too. Really it's what we did, but we just have the intermediary step of MongoDB. We wanted that step (rather than just immediately returning to the user and having the client not know when the inserts actually occur) because we wanted to keep the user experience great. Our app immediately sends you to personalized recs. We didn't want the initial experience to be: open the app => sent to a blank page of "recommendations."
An async job that didn't wait for any sort of confirmation would have left us in this position (or we would have had to implement a poller to check if the inserts were complete, which seemed wasteful and would need a push to Android Market which seemed less feasible.)
They did move their writes to an async job. They write the apps list into mongodb, then a daemon later copies it into mysql. Mongodb is being used as a cache (the frontend queries mongo then mysql)... this way they can read the data while also throttling writes to their main db.
The first things I'd do: 1. Aggregate those inserts into a single SQL command. You never want to be issuing O(n) SQL queries per request.
2. Use InnoDB, not MyISAM, for the database engine. I assume it's MyISAM because he's talking about table locking. InnoDB has row-level locking, so you'll be able to INSERT and SELECT from a table concurrently.
I had a Facebook app with 20MM MAUs, and it worked fine with two machines. A component of the app was voting in polls, and I was processing ~10k votes per second. Each "vote" is an insert into a votes table.
Soo...not really sure how you go from "We're running too many INSERT queries and locking the tables!" to "Async queues and MongoDB!"