

Scaling a startup from 0 to 40 hits per second in 3 days - mmaunder
http://markmaunder.com/2007/scaling-from-0-to-40-hits-per-second-in-3-days/

======
epi0Bauqu
I'm curious about your previous set up. You said mod_perl2, MySQL and Apache2
weren't cutting it for you, but I've scaled to that level fine with mod_perl2,
PostgreSQL and Apache2. The key was database pooling via Apache::DBI, all the
perl modules cached in a startup.pl via mod_perl2, keep alive and host lookups
(and a few other things) completely off in Apache2, and all queries using
indexes, with those indexes held entirely in memory by postgres.

Early on, like in your situation, I did go to the file system at first because
I couldn't "make it work," but then eventually I went back to postgres after I
figured out its scalability details. If you have a lot of little files and you
hit them a lot, that will eventually probably become your bottleneck. Did you
figure out the bottleneck in your db setup or has there just not been enough
time yet? Just curious.

~~~
mmaunder
Ah, a fellow mod_perl hacker. :)

I'm using Apache::DBI, caching everything via startup.pl on load, keep_alive
is on but with a very small timeout, host lookups are off.

Also I'm using the worker MPM in apache2, just FYI. I've found it to be really
memory efficient.

I've been using MySQL with Apache::DBI for years and it's usually brilliant -
I ran WorkZoo.com, a high traffic job search engine with a combination of
MySQL and a full-text api.

With feedjit I'm basically storing weblogs. I either have to dump them into a
single table and query that - which is what I was doing and the high query
rate with read/write was a problem - or have lots of individual tables which
isn't feasible after about 500 with MySQL. So small files work best for me.
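
A rough sketch of what a small-files log store can look like, in Python. This
is not feedjit's code: the hashed file names, the two-character bucket
directories, the function names and the default limit of 10 are all
assumptions made for the example.

```python
import hashlib
import os
import tempfile


def path_for(root, key):
    """Map a key to a file path, bucketed by hash prefix (assumed layout)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest + ".log")


def append_hit(root, key, line):
    """Append one log line to the file that belongs to `key`."""
    path = path_for(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.replace("\n", " ") + "\n")


def recent_hits(root, key, limit=10):
    """Return up to `limit` of the most recent lines logged for `key`."""
    try:
        with open(path_for(root, key), encoding="utf-8") as f:
            return [entry.rstrip("\n") for entry in f][-limit:]
    except FileNotFoundError:
        return []


root = tempfile.mkdtemp()
append_hit(root, "http://example.com/", "referrer=http://a.example/")
append_hit(root, "http://example.com/", "referrer=http://b.example/")
print(recent_hits(root, "http://example.com/"))
```

Bucketing by hash prefix keeps any single directory from holding every file.
Each write touches only one small file, so there is no shared table or index
to lock.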

~~~
epi0Bauqu
We're talking one file per unique domain or per unique url?

Also, I take it from your comment that the bottleneck was in MySQL doing the
writes. I assume the read side is indexed appropriately so MySQL finds the
right part on the disk almost instantly. Do you think it is a locking issue
then, e.g. table lock vs row lock? (Forgive me, I haven't used MySQL in a
while.)

~~~
mmaunder
One file per URL. The problem with MySQL is pretty much what you've described.
I have (had) an index on a table that gets read a lot by the application. It's
amazingly fast - MyISAM tables really rock for fast reads on indexes. But
therein lies the problem because it also gets written to a lot. Every time it
gets written to, MySQL needs to lock the table and rebuild the index.

You can improve things a bit by using INSERT DELAYED. When you use that, mysql
doesn't guarantee that it'll insert the row immediately, but the mysql query
returns immediately when you do the insert (it doesn't block) and mysql queues
up inserts and inserts them in bulk when it feels like it. The non-blocking
and bulk inserts that INSERT DELAYED gives you speed things up, but only to a
point because you're still constantly rebuilding an index on a table that's
getting a lot of reads.
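
The queue-then-bulk-insert behaviour can be imitated on the application side.
The Python sketch below is only an analogy, not how MySQL implements INSERT
DELAYED: it uses sqlite3 so it runs anywhere, and the `BufferedInserter`
class, the `hits` table and the batch size are made up for the example.

```python
import sqlite3


class BufferedInserter:
    """Queue rows in memory and write them in one batch (hypothetical helper)."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def insert(self, url, referrer):
        # Returns right away; the row is only queued until the batch fills.
        self.pending.append((url, referrer))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with self.conn:  # one transaction for the whole batch
            self.conn.executemany(
                "INSERT INTO hits (url, referrer) VALUES (?, ?)", self.pending
            )
        self.pending.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (url TEXT, referrer TEXT)")
conn.execute("CREATE INDEX hits_url ON hits (url)")

writer = BufferedInserter(conn, batch_size=3)
for i in range(7):
    writer.insert("http://example.com/", "http://ref%d.example/" % i)

print(conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0])  # prints 6
writer.flush()
print(conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0])  # prints 7
```

As with the delayed insert described above, a queued row is not guaranteed to
be in the table yet, and in this sketch it is lost if the process dies before
`flush()` runs.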

Mark.

------
palish
Congrats. It seems like the main bottleneck of webapps is the on-disk
database. I'd say that if your app is small enough you shouldn't even start
with a database, but that would be a premature optimization.

~~~
mmaunder
Thanks. A while back I was playing with a ramdisk on linux and syncing the
data to non-volatile storage. That worked quite well. I can't use that in this
case because there are lots of tiny files and they occupy too much space for
ramdisk. But looking at the output of vmstat, it looks like there isn't much
disk io, so I think the linux filesystem cache is working quite well.

~~~
rms
I'm curious how you reached so many Japanese bloggers. Were there some big
Japanese blogs that covered you at launch?

~~~
mmaunder
We got covered by <http://www.100shiki.com/> and then got installed by two
very high traffic Japanese bloggers and it went viral from there.

Mark.

------
tocomment
Would you mind expanding a bit on keep-alive settings? In what cases would you
want to set it to a low value?

Question 2. As a web developer, should I be worried that I don't know as much
as you about scaling a web app? What will I do when my web app makes it big?

~~~
tocomment
I should have definitely posted these questions in his blog's comments. I
don't think he's coming back :-(

~~~
mmaunder
Sorry, I've been on the road - I'm busy driving from Denver to Seattle
(currently in a hotel in Bozeman). I've replied to a bunch of questions and
will check back again later today.

Mark.

------
tocomment
I noticed that if you come to a site using Feedjit via Google, your search
term is included in the referring URL. Does this present any privacy
issues? I remember AOL got in a lot of trouble for releasing search queries
from their users.

~~~
mmaunder
I've been thinking a bit about this. I have a couple of blogs of my own, so I
always think about feedjit in that context. Am I happy with my readers seeing
what search terms are sending people to my site and do I think my readers will
get mad because people can see what they're searching on?

feedjit only shows the most recent 10 referrers and clicks, so I don't think
there's anything there that'll give a 'competitor' some sort of strategic
advantage. Besides, they can just google around and find out what I'm showing
up for in the SERPs.

As far as privacy goes, as long as I'm not personally identifying people and
showing what search terms they're using, I think there aren't any privacy
issues. I see my own search terms showing up and my location 'Seattle, WA' and
no one knows who actually searched that term.

I haven't really applied my mind to this as much as I should, but those are my
initial thoughts and I'd love to hear if anyone feels different.

Mark.

------
zaidf
congrats on the great launch!

------
patrickg-zill
I suppose I will sound like a troll, but why not consider PostgreSQL?

~~~
mmaunder
I think at this stage both dbs will probably do a fine job. PostgreSQL has
been getting faster and mysql has more features. I just know mysql. I know
where it keeps its data files, how to recover from a crash, I know the my.cnf
config file backwards and how to tune it, so it's really just down to what I'm
comfortable with.

Mark.

