
Ask HN: Re-architecting on Slicehost to handle 50+ apache processes? - brandnewlow
http://forum.slicehost.com/comments.php?DiscussionID=3373
======
Fenn
Apache/FastCGI/PHP _should_ be very stable (it's mature/old).

I have used it on production sites for years, and in fact, have not used
mod_php since before PHP5 for the exact reasons you describe.

You may need to tweak your FCGI settings a bit - the number of children, fcgi
process lifetime, etc. all make a difference.

One thing I would highly recommend is ensuring you run under a PHP accelerator
(APC is free/great), particularly with FCGI. This will reduce the time each
PHP-FCGI child takes serving a PHP request (as it doesn't need to reparse the
code every time), which means you can have fewer children. Fewer children means
more memory available and we all love memory.
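
A rough sketch of that arithmetic, using Little's law (average busy workers is roughly request rate times time per request). The request rate, the timings, the headroom factor, and the helper name `children_needed` are all made up for illustration; they are not measurements from any real site.

```python
import math

def children_needed(requests_per_sec, seconds_per_request, headroom=1.25):
    """Estimate how many PHP-FCGI children are needed (hypothetical helper).

    Little's law gives the average number of busy workers; the headroom
    factor is an assumed safety margin for bursts.
    """
    busy = requests_per_sec * seconds_per_request  # average busy workers
    return math.ceil(busy * headroom)

# Assumed numbers: 40 req/s, 0.5 s per request without an opcode cache,
# 0.25 s per request with one.
without_cache = children_needed(40, 0.5)
with_cache = children_needed(40, 0.25)
print(without_cache, with_cache)
```

Halving the time per request roughly halves the children you need, which is where the memory comes back.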

If you're having specific stability problems with fcgi, do some googling on
the errors you're getting. There definitely have been bugs in the PHP fcgi
interface, so an upgrade to later/latest version may help you there.

Alternatively, something like lighttpd/nginx could be a good option for you
(though they both recommend fcgi if you want them talking direct to PHP as
well).

Good luck!

------
bretthoerner
Can you not fix the media problem? Putting nginx up front to serve media would
relieve a lot of huge Apache/PHP processes.

Are you actually hitting the limit in practice? Do you have 50+ concurrent
requests at once? Is this a "real business"? If so you shouldn't be so afraid
of putting in the time to set up nginx or something else... you'll literally
_have_to_ do different media serving and load balancing if you plan to grow
whatever_this_is.

As far as actual RAM usage, you'd probably save some space and lose very
little (depending on what your app does) on a 32-bit OS. Sadly, Slicehost only
offers 64-bit... you might check out Linode or one of their other competitors.
I'd only suggest this option if you're really broke, though. You'll still have
to do the work mentioned above on Linode eventually... the 64->32 would only
be a small stopgap.

I don't do any PHP anymore, but I think thread-safeness largely depends on
what external C libraries you use, and generally isn't guaranteed.

~~~
mandric
"you'll literally _have_to_ do different media serving and load balancing if
you plan to grow whatever_this_is."

I guess my point is that it's a mod_php issue and I may not need nginx. My
plan is to try a threaded apache without mod_php and use mod_fcgi for php. If
that doesn't work then try a threaded apache with mod_php and see what
happens.

Another solution is to find another provider that offers more memory and
similar service for the same price. But it will be hard to match slicehost's
service I think.

~~~
markh
fastcgi with Apache will be a waste of your time I think.

Apache+php is solid, but a memory hog. If you want to keep things simple, move
to Nginx with php+fastcgi. This is the suggested config for Slicehost.

Or, as previously suggested, offload the client connection to an Nginx server
sitting in front of Apache. This will relieve the pressure on the php
instances and will also handle static file serving.

------
jrockway
I assume you need this many Apaches to fully use all of your CPU?

If so, there are a number of approaches to take. If your app is blocking on
sending/receiving data to/from the client, it would benefit you to segregate
the actual app servers from the frontend servers. You can use mod_fastcgi and
run your PHP stuff in FastCGI, or you can use a frontend threaded apache
talking to a backend PHP apache via mod_proxy.

If you are blocking on things like database access or other non-web IO, then
you need an event loop so your app can do something else while waiting for IO.
I don't know what your options are in the PHP world, but this is very easy to
do in Perl.

Finally, real threads will probably also solve your problem. I recommend CL or
Haskell. (Although I think Slicehost still uses an ancient Xen that can't
handle SBCL. Oops.)

~~~
mandric
Ooh. I didn't think of having two apache configurations on the same server ...
one that does media serving and is streamlined and one that does mod_php. I
would prefer this over nginx. At least have the sanity of staying in one
world. Thanks for that tip.

So I run multiple sites on this server and plan to keep adding more, so that's
another reason configuration maintenance is an issue. Not to mention one day I
plan to write a web gui to handle all this configuration ... and I don't need
nginx's super-duper performance. I need a stable piece of software to
configure, apache does everything, I've been using it for years, know all the
ins and outs, and memory is cheap. Just mod_php is the bear and you add drupal
to that and you have a beast.

I don't know what the bottleneck is really, that is part of the problem. But I
know that every now and then when we get bursts of traffic I can find with
a /server-status that 50 apache processes are in a write state.
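
One way to track that over time is to poll the machine-readable `/server-status?auto` output and count the scoreboard characters. A minimal sketch, assuming mod_status is enabled; the helper name `scoreboard_counts` and the sample text are invented for illustration, and fetching the URL is left out.

```python
from collections import Counter

# Scoreboard characters as documented by mod_status.
STATES = {
    "_": "waiting", "S": "starting", "R": "reading", "W": "writing",
    "K": "keepalive", "D": "dns", "C": "closing", "L": "logging",
    "G": "finishing", "I": "idle_cleanup", ".": "open_slot",
}

def scoreboard_counts(status_text):
    """Count worker states from the text of /server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(STATES.get(ch, "unknown") for ch in board)
    return Counter()  # no scoreboard line found

# Invented sample; a real response has more fields and a longer scoreboard.
sample = "BusyWorkers: 6\nIdleWorkers: 2\nScoreboard: WWW_K_WR...\n"
counts = scoreboard_counts(sample)
print(counts["writing"], counts["keepalive"], counts["waiting"])
```

Logging these counts every few seconds during a burst would show whether the processes are really stuck writing to slow clients or sitting in keepalive.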

~~~
mandric
I also could live with having another slice just doing media serving. If I
could share the filesystem or run some rsync script, in some efficient/non-
cumbersome manner, between the media server (threaded apache) and app server
(main box with php etc) that could work and would allow more parallelism, as
they say, for faster web pages.

Then again you have the same problem with things like drupal, that just plop
media all over the place and it's up to the client to maintain the separation
of media in their code. Drupal users don't do that.

------
nessence
Go for nginx, it's worth it, get another IP if you have to (and can do so). If
you don't want to do that...

Don't turn keepalive off, set it to 7 seconds - or - look at your analytics
and see what your average user's session length on the site is and set it to
somewhere between 7 and half of that number. The idea is, if someone doesn't
click on something within 7 seconds they're either going to be on that page
awhile or they've left.

If you can, tweak the StartServers, MinSpareServers, and MaxSpareServers
parameters (if applicable).

The above parameter defaults can cripple your site if you're over 25
concurrent sessions and have a complex application (among other factors).
MinSpareServers defaults to 5 and MaxSpareServers to 10 under the prefork MPM,
and with a window that small you'll be guaranteed a continued thrashing
death/birth cycle for your apache processes which will constrain all your I/O.

If none of this works then the complexity (read, bloat?) of your app (Drupal?)
is too much and you need a beefier host.

Media files have nothing to do with any of this. They'll hold connections open
but they won't constrain I/O like application memory footprints and child
process cycling.

------
gojomo
Find out if disk IO is the real bottleneck -- excellent chance that it is,
especially if MySQL is also on the same host.

Set 'noatime'. Ensure there's no swapping.

Don't assume more processes improve throughput; they could be worsening things
with contention at key places.

In case there are small memory-leaks anywhere in your stack that pile up, set
'MaxRequestsPerChild' to a nonzero value ('0' means unlimited).

If you can't separate out static content at least make sure all outgoing
headers get maximum browser-side caching on non-volatile resources. You could
also add a front-end reverse-proxy-cache, on a tiny host, to get many of the
same offload-cost-of-static-content benefits.
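
As a sketch of what "maximum browser-side caching" headers look like, here is a small helper that builds them. The 30-day lifetime and the name `static_cache_headers` are assumptions for illustration; in practice you would set the equivalent in the web server config rather than in application code.

```python
import time
from email.utils import formatdate

def static_cache_headers(max_age_seconds=30 * 24 * 3600, now=None):
    """Build long-lived caching headers for a non-volatile static asset."""
    now = time.time() if now is None else now
    return {
        # Tells browsers and shared caches how long the copy stays fresh.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Same lifetime as an HTTP date, for older clients.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }

# Fixed timestamp so the example output is reproducible.
headers = static_cache_headers(now=0)
print(headers)
```

With headers like these, a returning visitor's browser should not ask the server for the asset again until the lifetime runs out.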

------
oomkiller
It seems that you already know the solution to your problems, but don't want
to do it. NGINX! Nginx may be confusing to people that are long-time apache
users, but I actually find it to be easier, once you get the hang of it.

On another note, if you're filling up 50 processes and can't afford better
than Slicehost, you should seriously reconsider your business model as it's
obviously not working. Another option would be to speed up the execution of
whatever code is being executed in those processes.

Final Word: nginx+FastCGI

------
jawngee
nginx + php-fpm NOT fastcgi.

Serve your media/static from another smaller slice running nginx on a
subdomain, or have it picked up through cloudfront or some other CDN.

------
iamelgringo
_edit_ dumb question. Sorry

~~~
mandric
The use case is handling 50 concurrent connections ...

~~~
SwellJoe
You could also make each connection finish faster. I'm not suggesting you
_don't_ need 50 Apache processes for what you're doing. Just that you might
find another way to serve the same userbase, if having 50 processes is a
problem for some reason.

mod_fcgid is more efficient for some classes of problem than mod_php...though
not "faster" by a huge amount, it does require a lot fewer resources including
having a lower process count for the same workload. Switching to mod_fcgid for
all PHP on your site would mean you could remove mod_php from your Apache
process, which is a big memory savings.
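
To put a rough number on that, here is a back-of-the-envelope sketch. Every figure in it (slice size, memory reserved for MySQL and the OS, per-process sizes) is a guess for illustration, and `max_processes` is a hypothetical helper. It ignores memory shared between Apache processes and the RAM the separate PHP FastCGI children would need.

```python
def max_processes(total_ram_mb, reserved_mb, per_process_mb):
    """Roughly how many worker processes fit in RAM without swapping."""
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_process_mb <= 0:
        return 0
    return usable // per_process_mb

# Assumed figures: 1024 MB slice, 384 MB kept for MySQL and the OS.
with_mod_php = max_processes(1024, 384, 40)     # fat Apache+mod_php children
without_mod_php = max_processes(1024, 384, 8)   # lean Apache children
print(with_mod_php, without_mod_php)
```

Measure your own per-process numbers before trusting any of this; the point is only that the per-process size dominates how many connections the box can hold open.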

eaccelerator can speed up some PHP applications.

Profiling your code and removing the bottlenecks would be another good option.

Since PHP is historically not thread safe you probably can't switch to a lower
process Apache concurrency model (though there seems to be the possibility to
use a limited subset of PHP 5 in such an environment, I'd be hesitant to do so
in production, and you'll spend an awful lot of time digging up the facts on
whether the modules you're using are thread safe).

While I've seen suggestions of running a lighter webserver for assets that
don't need the full execution environment of Apache...I would instead suggest
you make a lean VirtualHost section just for those assets. Yes, it will still
be spawning Apache processes, _but_ Apache can share a huge amount of
resources with all the other Apache processes. It's hard to see in "top" and
the like, because shared resource usage isn't very clearly delineated, but
overall resource usage of a single Apache installation (with many processes)
is probably going to be less than that of Apache plus a second web server,
particularly if you've already cleaned up your Apache and gotten rid of all of
the unnecessary modules. Only real data can tell us for sure, but I'd be
suspicious of any advice that seems certain that adding another webserver to
the mix is the best solution, without having data about your particular
deployment.

~~~
mandric
"You could also make each connection finish faster."

Well, I have set KeepAlive Off. Not sure what else I can do to close
connections.

"remove mod_php from your Apache process, which is a big memory savings."

That's what I'm shooting for but php under mod_fcgid has proven to be
unstable, or I have to dig deeper to find the problems.

"Profiling your code and removing the bottlenecks would be another good
option."

I've never done this on PHP; any hints would be appreciated. But I am not too
crazy about digging through random client code, I'd rather be able to handle
the average inefficiency.

"I would instead suggest you make a lean VirtualHost section just for those
assets."

Well I'm not sure how to make a lean virtualhost. I thought I could only
remove modules from the main apache config, not a vhost. So I don't understand
how I can make a vhost leaner than the parent apache. I should look into this
further, but any advice is appreciated.

~~~
SwellJoe
_Well I'm not sure how to make a lean virtualhost. I thought I could only
remove modules from the main apache config, not a vhost._

I mean, "with no interpreters, SSI, rewrites, proxies, etc."

The process will always contain all the same modules. But, the execution path
can be very different.

