Hacker News
Handling over 300 million events per day, in real time, on a budget (playtomic.com)
39 points by benologist on Jan 22, 2011 | 11 comments



Glad to see someone's doing something like this on top of the Microsoft stack. I was expecting the usual Linux+Ruby+Hadoop+Redis+etc thing. Pretty cool.


You can say what you like about Microsoft, but their server stuff just plain works.

You never really hear stories about how to make it scale because so long as you don't go out of your way to shoot yourself in the foot, it just sorta scales on its own without you thinking about it. Every piece of the stack is crazy fast, and they're all built by one organization to work together.

In that light, it's not all that surprising that it works so well. You see a lot of bias against MS in tech circles, but this is one thing that they're actually pretty good at.


Markus Frind is also doing some pretty huge volume on a Microsoft stack with plentyoffish.com...

http://plentyoffish.wordpress.com/2006/06/10/microsoft-aspne...


That's impressive. Thanks for the ref.


The MS stack doesn't get much love but it does get the job done, probably no worse than anything else. There are languages and platforms better suited to some of what I'm doing though - MongoDB is something I plan to explore a lot more this year for some really heavy duty stuff that would be a pain to do in SQL Server.


Incidentally, I'm working on a project now that is a good fit for MongoDB. I've been testing it with the stock C# driver running on Server 2008 and I like the performance so far. It takes a while to mentally switch paradigms (for me at least) but once you "get it" it's turtles all the way down.


Yeah, it's really something else thinking in Mongo's query style. I spent a lot of time in Google, and still do when I have to go into the console, heh.
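For anyone coming from SQL, the mental shift is mostly that queries are themselves documents. A sketch of the idea in Python (plain dicts standing in for the filter documents you'd hand to a driver; the field names are invented for illustration):

```python
# MongoDB filters are themselves documents. These dicts are the shape
# you'd pass to a driver's find(); built here without a live database,
# purely to show the query style. Field names are made up.

# SQL: SELECT * FROM scores WHERE game = 'asteroids' AND points > 1000
by_game = {"game": "asteroids", "points": {"$gt": 1000}}

# SQL: ... WHERE level IN (1, 2, 3)
by_level = {"level": {"$in": [1, 2, 3]}}

# Drivers take sort order as (field, direction) pairs; -1 = descending
sort_spec = [("points", -1)]

print(by_game["points"]["$gt"])  # operators nest inside the filter: 1000
```

Once the "queries are documents" idea clicks, the console syntax stops feeling foreign.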


Ben, this is really interesting. I was under the impression that Playtomic was a one-man operation, but the idea that you're managing 300 million requests a day boggles my mind (at my office we've got a team to handle that).

I'll have to go through your article with a fine-toothed comb - but that's really quite a feat.


It is and it isn't a one-man operation. I'm the only one actually programming, managing the servers, and creating it all, but in the background is my friend Antonio, who has helped finance the development for a long time now, and very recently we were joined by my friend (and, over the years, sometimes boss) Brian.

It's not really fair to say 'I' anymore as we ramp things up on the business side of it and formalize everything.


Thanks for the article. I have a question. What kind of latency do you experience using MongoHQ? And maybe you could discuss this part of the system a little more?


The latency gets up to about 60ms+, but it's mitigated by temporarily caching a lot of the data. MongoHQ runs on Amazon, so if you're also on AWS it's going to be negligible.
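Absorbing a ~60ms remote round-trip with a short-lived cache can be as simple as a timestamped dictionary. A minimal sketch of the pattern (the class, names, and 60-second TTL are illustrative assumptions, not Playtomic's actual code):

```python
import time

class TTLCache:
    """Tiny read-through cache: repeated lookups within the TTL window
    are served from memory, so the slow remote query is paid only once."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expiry_time, value)

    def get(self, key, fetch):
        """Return the cached value, or call fetch() and cache the result."""
        entry = self.store.get(key)
        now = time.time()
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self.store[key] = (now + self.ttl, value)
        return value

cache = TTLCache(ttl_seconds=60)
calls = []

def fetch_leaderboard():
    calls.append(1)            # stands in for the ~60ms remote query
    return [("alice", 9001)]

cache.get("top10", fetch_leaderboard)
top = cache.get("top10", fetch_leaderboard)  # served from memory
print(len(calls), top)
```

Slightly stale leaderboards are usually an acceptable trade for cutting most of the remote reads.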

I use MongoDB for a few components - user-created levels and leaderboards are the biggest parts. The big advantage is that I can let developers attach custom data to scores/levels without even having to think about it - it slides right in just like any other property, and it's effortless. The other big advantage is price - they're a lot cheaper than getting another server.
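That "slides right in" property is just schemaless documents at work: a developer's extra fields ride along in the same record, no schema change required. A sketch with plain Python dicts (field names are invented; the merged dict is the shape of the document a driver would insert):

```python
# The core score fields the service always records (names invented here)
base_score = {"player": "bob", "points": 1200, "game": "blockdrop"}

# Developer-supplied extras: arbitrary keys, never declared anywhere
custom = {"character": "wizard", "combo_max": 14}

# The merged document is what would be handed to the database;
# no ALTER TABLE, no migration - the new fields just exist.
doc = {**base_score, **custom}
print(sorted(doc))
```

In SQL Server the same feature would mean either a migration per game or a serialized blob column that can't be queried directly.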

The leaderboards are the biggest one, running at a total of almost 10 gigabytes because of the indexing.

I use MongoHQ because there's a learning curve with MongoDB that I never have time to explore properly. Originally I ran it myself and even got it sharded and replicated, but when it falls over (and it did) you really need to know a lot more about it than I do. Those guys are awesome and I plan on building more on their service.

I use the Samus C# library for it. I noticed recently that there's an official one now, but it didn't exist when I started using it. One of these days I will upgrade to that.

Here are the leaderboard database stats. Mostly it's just lots of reads, and only when whichever server handles the request doesn't already have the data in memory.

  btree accesses     0.47 /sec
  btree miss ratio   0.2%
  global lock ratio  0.1%
  indexes            8
  objects            24,222,154
  op deletes         0.00 /sec
  op get mores       0.73 /sec
  op inserts         0.57 /sec
  op queries         4.02 /sec
  op updates         0.02 /sec
  all ops            0.02 /sec
  size of data       3.3 GB
  size of index      6 GB
  size of storage    3.7 GB
  uptime             4 days
  connections        128
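Those stats line up with the "almost 10 gigabytes" figure above, and show the index is nearly twice the size of the data itself. The arithmetic, for anyone skimming:

```python
# Figures taken directly from the stats block above
data_gb = 3.3    # "size of data"
index_gb = 6.0   # "size of index"

total_gb = data_gb + index_gb       # what "the leaderboards" costs overall
ratio = index_gb / data_gb          # how much the indexing dominates

print(round(total_gb, 1), round(ratio, 2))  # 9.3 1.82
```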



