

Ask HN: How to deal with the C10K problem with a simple script? - jbm

I was asked by a client to set up a survey site for Japanese mobile phones that would be displayed at a major event taking place at the Tokyo Dome. The problem is that I'll be dealing with 10k users connecting to the site at the same time, and that the site will only be up for a day.

The client says that he wants no connections dropped, and to prepare the infrastructure so that it can handle it. (My initial idea was to try to find a web service to handle it, but even the best could only deal with something like 150-200 simultaneous connections.)

My idea is to set up an Amazon EC2 instance with nginx or Lighttpd, write a simple Ruby or PHP script to handle the submitted form, and store the submissions in Amazon SimpleDB.

Unfortunately, I've never dealt with scaling and load balancing for something like this, and I can't find any information about the kind of load I am looking at. Can anyone offer any advice on how to handle scaling with EC2 or any other cloud service?
======
patio11
You're overthinking this.

The client doesn't mean that 10k people are going to hit the site at the same
microsecond. The client means that there will be 10,000 eventgoers accessing
the site within a particular period. Many of the clients in this industry,
being _cough_ fundamentally non-technical people, believe that the period of
reference is "one day".

As you've probably noticed, a $20 a month VPS plus Apache can _easily_ chew
through 10k visitors in a day (I wouldn't use Apache personally, but if I were
competent at configuring it, yeah, no problem whatsoever). Adjust upwards if
your client anticipates a particular instantaneous deluge (think "Steve Jobs
stands up at the keynote and says 'My robot minions, all of you need to hit
this URL right now!'").
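
To put rough numbers on the difference between "10k in a day" and a deluge,
here is a back-of-the-envelope sketch. The one-request-per-visitor figure and
the ten-minute burst window are assumptions picked for illustration, not
anything the client has stated.

```python
def average_rate(visitors, window_seconds, requests_per_visitor=1):
    """Average requests per second if visitors arrive evenly over the window.

    requests_per_visitor is an assumed figure; a real page load usually
    triggers more than one request.
    """
    return visitors * requests_per_visitor / window_seconds


# 10k visitors spread over a whole day.
per_day = average_rate(10_000, 24 * 60 * 60)

# The same 10k visitors squeezed into an assumed ten-minute burst.
per_ten_minutes = average_rate(10_000, 10 * 60)

print(f"spread over a day:  {per_day:.2f} req/s")
print(f"ten-minute deluge:  {per_ten_minutes:.2f} req/s")
```

Averages hide spikes, so treat the second number as a floor for load testing
rather than a capacity target.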

P.S. Your client is _wildly_ overestimating the audience that will actually
come to the site. Don't tell him I said that.

------
cperciva
This isn't the C10k problem. The C10k problem is "how do you handle 10k mostly
idle connections" -- the situation where per-concurrent-connection costs
matter (e.g., you really don't want to have 10k processes, each handling a
single connection).

Your problem is simply "how do I handle huge amounts of traffic". The first
half of the answer, as you guessed, is to use EC2; the second half is to throw
traffic at it and make sure that you get the response you want (whether that
means handling the necessary number of requests or scaling automatically).

The middle half is to make your code as efficient as possible; I would not
recommend storing directly to SimpleDB. Just log form submissions locally and
then process them as a batch.
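
A minimal sketch of "log locally, process later", assuming one JSON object per
line in a plain append-only file. The function names and the `store` callback
are hypothetical; `store` stands in for whatever eventually writes to SimpleDB
or another database. It also assumes a single writer process; with several
workers you would want one log file per worker or some form of locking.

```python
import json


def log_submission(path, form):
    """Append one form submission to the local log as a single JSON line."""
    line = json.dumps(form, ensure_ascii=False) + "\n"
    # Append mode: the request handler does no database work at all.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)


def process_batch(path, store):
    """Replay the log after the rush, passing each submission to `store`.

    `store` is a hypothetical callable that persists one submission.
    Returns the number of submissions processed.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                store(json.loads(line))
                count += 1
    return count
```

The request path is then just one small file append, and the slow part runs
whenever it is convenient, for example after the event.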

~~~
chuhnk
I have to agree with this. If you have to process 10k submissions in a very
short space of time, why incur the overhead of storing each one in the DB
right then and there when you can process it asynchronously later?

If you want something interesting to work on, you could put nginx on the
front end (faster than lighttpd) and accept each form submission by writing it
to disk and pushing an ID onto a queue in a key-value store like Redis. A
separate poller app in Ruby then pops IDs off the queue and processes the
files on disk.
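
Here is a rough sketch of that write-to-disk-then-queue flow. It is written in
Python rather than Ruby, and it uses an in-memory stand-in for the queue so
that it runs without a server. The `InMemoryQueue` class and the function
names are hypothetical; with a real Redis list the push and pop would map to
the LPUSH and RPOP commands.

```python
import json
import os
import uuid
from collections import deque


class InMemoryQueue:
    """Stand-in for a Redis list: push on the left, pop from the right (FIFO)."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.appendleft(item)

    def pop(self):
        # Returns None when the queue is empty.
        return self._items.pop() if self._items else None


def accept_submission(spool_dir, queue, form):
    """Write the form to its own file, then enqueue the file's ID."""
    submission_id = uuid.uuid4().hex
    tmp_path = os.path.join(spool_dir, submission_id + ".tmp")
    final_path = os.path.join(spool_dir, submission_id + ".json")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(form, f)
    # Rename into place so the poller never reads a half-written file.
    os.replace(tmp_path, final_path)
    queue.push(submission_id)
    return submission_id


def drain(spool_dir, queue, handle):
    """Pop IDs until the queue is empty, passing each stored form to `handle`."""
    processed = 0
    while True:
        submission_id = queue.pop()
        if submission_id is None:
            break
        path = os.path.join(spool_dir, submission_id + ".json")
        with open(path, encoding="utf-8") as f:
            handle(json.load(f))
        os.remove(path)
        processed += 1
    return processed
```

A real poller would loop and sleep (or use a blocking pop) instead of
returning once the queue is empty, and would decide what to do with a file
whose handler fails.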

