

Ask YC: Speeding up PHP - apexauk

Just installed APC on our server (Alternative PHP Cache - http://uk2.php.net/manual/en/book.apc.php) and already seeing some great improvements in page load times/CPU usage per request. Particularly good for us as we're using Propel with lots of DB tables, so we already have 570+ .php files in the cache.

Any recommendations/gotchas, or should everything be fine out of the box?

And while we're on the subject, what other recommendations have people got for speeding up/optimising PHP/LAMP?
======
djhomeless
There is no right answer. Despite what others say, PHP is perfectly suited to
a lot of needs. Runtime may be your issue, but you also have to think about
supportability.

If you're looking to optimize your site, then as missenlinx suggested, YSlow
is a good place to start. Next, I would suggest looking at your queries: e.g.
a select statement across a big table vs. a more refined query across a
smaller data set will do wonders regardless of the language.

If your data is updated infrequently, you can look at caching your queries at
the data level. If not, maybe look at memcached?

------
thamer
APC is very nice because you can really plug it in and expect to get
performance improvements without any “real work.” That said, it really depends
on your architecture and the traffic/usage patterns you expect. Do you get
more reads? More writes? Do you use InnoDB or MyISAM?

Do you have a cache policy? For files? For DB data? Do you use the APC user
cache or Memcached? The APC user cache is faster (on the web front-end), but
is not shared. Memcached is shared but there will always be small network
latency. If DB access is a bottleneck, consider using a data cache.
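To make the trade-off concrete, here's a minimal sketch of a two-tier read: check the fast per-server cache first, then the shared one, then fall back to the DB. Plain arrays stand in for APC and Memcached so the flow is self-contained; this is an illustration, not any particular library's API.

```php
<?php
// Two-tier lookup: local (APC-style, per-server) first, then shared
// (Memcached-style, one network hop), then the DB on a full miss.
// Arrays stand in for the real cache backends in this sketch.
function tiered_get($key, array &$local, array &$shared, callable $loadFromDb) {
    if (array_key_exists($key, $local)) {
        return $local[$key];            // fastest path: no network hop
    }
    if (array_key_exists($key, $shared)) {
        $local[$key] = $shared[$key];   // warm the local tier for next time
        return $shared[$key];
    }
    $val = $loadFromDb($key);           // full miss: hit the DB
    $shared[$key] = $val;               // populate both tiers
    $local[$key]  = $val;
    return $val;
}
```

With the real extensions you'd swap the arrays for apc_fetch()/apc_store() and Memcache::get()/set(); the control flow stays the same.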

Do you serve a lot of static files? Consider using a CDN (not a silver bullet!
There are drawbacks.)

There are a lot of strategies for high performance on
<http://highscalability.com/> . But don't overdo it! Most of the time you
don't need to set up your infrastructure to be scalable for millions of
clients.

As missenlinx pointed out, also try to speed up page display by following
Yahoo!'s recommendations at <http://developer.yahoo.com/performance/rules.html>
; the number of requests per page is certainly something that made a
difference when optimized, at least in my experience - YMMV.

~~~
apexauk
Good questions - I understand most of them were kinda rhetorical, but for the
purposes of useful discussion:

Our app is generally more reads, but we're optimising different parts
depending on the read/write balance. We use a mix of InnoDB & MyISAM - my
understanding is that MyISAM is better for reads, while InnoDB's transactions
and row-level locking make it better for lots of writes while still reading...
does that hold water as an uber-simplified rule of thumb?

We're not using the APC user cache or Memcached yet - thanks for pointing out
the differences. The FB presentation linked in the other thread gave central
app config data as a good example of a use for the APC user cache.
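For reference, the APC user-cache pattern for config data looks roughly like this sketch. apc_fetch()/apc_store() are the real APC calls; the key name, TTL, and config loader are made up for the example, and the function_exists() guards let the sketch degrade gracefully where APC isn't loaded.

```php
<?php
// Hypothetical central app config, kept in APC's user cache so it is
// loaded once per server instead of once per request.
function load_config_from_disk() {
    // stand-in for parse_ini_file('/path/to/config.ini') or similar
    return array('db_host' => 'localhost', 'features' => array('forums' => true));
}

function get_app_config() {
    // Use APC when the extension is loaded; otherwise just load directly.
    if (function_exists('apc_fetch')) {
        $cfg = apc_fetch('myapp_config');
        if ($cfg !== false) {
            return $cfg;           // cache hit: no disk access
        }
    }
    $cfg = load_config_from_disk();
    if (function_exists('apc_store')) {
        apc_store('myapp_config', $cfg, 300); // illustrative 5-minute TTL
    }
    return $cfg;
}
```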

Out of interest - what scale do you need to reach before it's worth looking
into a CDN? (We're definitely not there yet!)

More generally, we're building features with a focus on developer
productivity, then optimising as bottlenecks arise.

~~~
thamer
I would say you're right about the differences between MyISAM and InnoDB.

The "feature configuration" system at Facebook is pretty good. That said, it's
not perfect: the fact that there are two different caches whether you're
running in CLI or Apache SAPI mode has been a problem for me in the past; I
still haven't figured out a way to run scripts from a crontab while getting
access to the data cached by web pages. Running curl might be an option...

For a CDN, it really depends on the number of files you have and how much
data you're sending out. If you're paying for a hosting plan with extra
bandwidth just to serve all your static images, then it might be worthwhile.
You can calculate the costs yourself, actually:
<http://calculator.s3.amazonaws.com/calc5.html> There are alternatives to S3
(I compared Akamai and S3 in a study at the web company I work for; S3 came
out cheaper for our traffic patterns.) This is not to say that cheaper is
better - you obviously have to look at the whole picture.

In general, the main problem to examine before moving your files to a CDN is
the cache policy. S3 has datacenters in both Europe and the US, and there is
certainly a propagation delay when you write a file. There are techniques to
work around this, such as naming your images with a version number, but you
have to make sure your clients are going to find the files when your pages
start linking to them.

Another important issue is downtime. S3 recently had a several-hour outage...
this has never happened with our local hosting provider. What will your plan
be when this happens? If you can detect that your CDN is down, can you
redirect the traffic to your own host? Will that risk bringing you down too,
if you downsized your equipment after the CDN migration?

It's certainly not an easy decision to make, but if done right and properly
planned, a CDN can save you a lot of money.

Anyway, have a look at Memcached. We added a "Cacheable" attribute to class
definitions in our in-house ORM so that cacheable data is taken from Memcache
instead of querying the DB (that is, if it is available in the cache;
otherwise there is a miss + Memcache SET). For our relatively large website
(~1000 people connected now, 1.7M subscribers), Memcache is saving 1 billion
queries a month. I would suspect that APC is saving at least that many also.

Having this feature in the ORM makes it transparent for the developers. This
is important, because having to wrap hundreds of DB GETs in

    
    
        if (($val = get_from_cache($key)) !== false) {
            return $val;
        } else {
            $val = get_from_db($key);
            cache_set($key, $val);
            return $val;
        }

is pretty tedious and leads to unmaintainable code.
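One way to keep that boilerplate in a single place (a sketch of the general read-through idea, not thamer's actual ORM) is a helper that takes the key and a loader callback; the check/miss/set dance then lives in one function. The array-backed cache here is purely so the example is self-contained.

```php
<?php
// Generic read-through helper: callers supply the key and a loader
// closure; the cache check, miss handling, and set happen in one place.
// $cache is a plain array for the sketch -- in production it would wrap
// Memcache or the APC user cache.
function cache_through($key, callable $loader, array &$cache) {
    if (array_key_exists($key, $cache)) {
        return $cache[$key];   // hit: skip the DB entirely
    }
    $val = $loader();          // miss: run the real query
    $cache[$key] = $val;       // populate for subsequent calls
    return $val;
}
```

Usage is a one-liner, e.g. `$forums = cache_through('forums:all', function () { /* run SQL */ }, $cache);`, which is what makes it feasible to bury the pattern inside an ORM.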

~~~
apexauk
all good sense.. we'll be looking at memcached soon, and our use of Propel
should make it easy to integrate caching into the ORM like you suggest.

Question, though: I admit I've not given this much thought, but isn't the hard
part working out when to clear the cache for individual rows/objects when the
DB is updated? I'd be grateful for any tips on how you've dealt with this,
considering it sounds like we'll be implementing something very similar...

~~~
thamer
It is a problem, that's for sure.

Let's take a simple example, the display of the list of available forums on
every page.

In our class declarations diagrams, we have something like this:

    
    
        +----------------------+
        | Forum                |
        +----------------------+
        | name: string(1,100)  |
        | topic: text          |
        | ...                  |
        | CACHE(shared, 86400) |
        +----------------------+
    

This creates a table for a forum definition, with its entries kept in the
shared cache (Memcache) for a day - CACHE(local, ...) would put them in APC's
user cache instead.

This means that when a developer writes ForumModule::getAll(), it builds the
SQL query:

    
    
        SELECT * FROM Forum;
    

and uses it as the key to query the cache. In case of a cache miss, the query
is sent to MySQL and the data added to the cache with a lifetime of 86400
seconds.

When we add a new forum to the DB, it has no impact on the website, since the
data comes out of the cache 99.9%+ of the time. We then have to erase the
cache manually using a maintenance script, running
ForumModule::removeAllCache(), which generates the same key and removes the
item.

Note that this is not always possible... For instance, if you're caching the
list of forums but have dozens of SQL results stored in cache, such as:

    
    
        SELECT topic FROM Forum WHERE id = 42;
        SELECT id FROM Forum WHERE category = 1729;
        ...
    

Then _these_ will be used as keys to get results from the cache, and we won't
be able to remove them. Last I checked, you couldn't enumerate the keys from
Memcache - you can with APC.

We usually solve this by setting low TTL values. A query per day or per hour
is not going to make a difference, but it guarantees that all your changes
become visible within a short time. In some rare cases (a large upgrade of the
code base), we have flushed the caches completely. If some people have to wait
an hour to see the new forum, it's not an issue. We tune these values
depending on the user-experience impact.

There is something I didn't mention that is quite important: do not always
rely on user requests to rebuild your caches. If the cached value is a list of
forums, fine. If it is a large tag cloud built from hundreds of queries,
update it from a cron job instead. Otherwise, if 50 clients hit the tag cloud
page just after the cache expires, they all get a miss and run the rebuild at
the same time, and you'll take a big hit on the DB. On a related note, don't
rely on Memcache alone - the data might not be there. Always have a fallback
data source in case the cache doesn't give you what you expected. This can
mean using other caches for datasets that are complex to compute: for a tag
cloud, I would store the compiled data in MySQL, not in Memcache; I don't want
the client to regenerate it, and I don't want the client to see a message
saying the data is unavailable at the moment.
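The cron-refresh idea can be sketched like this (all names hypothetical): the scheduled job recomputes the expensive value and writes it to durable storage, while the request path only ever reads the precomputed copy and degrades gracefully if it's missing. `$store` stands in for a MySQL table holding the compiled result.

```php
<?php
// Sketch of cron-side refresh vs. request-side read for an expensive
// derived value (e.g. a tag cloud). Only the cron job ever pays the
// rebuild cost, so 50 simultaneous misses can't stampede the DB.

function cron_refresh_tag_cloud(array &$store, callable $build) {
    $store['tag_cloud'] = $build();   // expensive: run it on a schedule
}

function get_tag_cloud(array $store) {
    if (isset($store['tag_cloud'])) {
        return $store['tag_cloud'];   // the common, cheap path
    }
    // Fallback when the compiled copy is missing: serve something empty
    // rather than rebuilding inline on a user request.
    return array();
}
```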

------
jawngee
<http://www.scribd.com/doc/88689/apcfacebook>

~~~
apexauk
interesting link.. thanks

------
jrockway
_what other recommendations have people got for speeding up/optimising PHP_

Use a different programming language.

No, seriously.
[http://shootout.alioth.debian.org/gp4/benchmark.php?test=all...](http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=php&lang2=ocaml)

If runtime speed is your design goal, then you definitely want to think
outside the PHP/Perl/Python/Ruby box. Those are fast enough for most people,
but other languages are _a lot_ faster. OCaml and Haskell are amazing; SBCL is
pretty good too.

~~~
josefresco
Every thread on HN about PHP garners one reply which always reads "choose
another language/framework blah"

Let's give people advice for their problem and specific language and keep the
opinions on framework/code to yourself.

~~~
jrockway
PHP is a horrible disease. We're just trying to help... before it's too late.

------
missenlinx
Best page I've seen around

<http://developer.yahoo.com/performance/rules.html>

~~~
apexauk
also good for optimising that side of things is
[http://www.thinkvitamin.com/features/webapps/serving-javascript-fast](http://www.thinkvitamin.com/features/webapps/serving-javascript-fast)
\- and in fact Cal Henderson's whole book has been a great read -
<http://www.amazon.com/exec/obidos/ASIN/0596102356>

------
blender
You can use Xdebug to profile your application. KCacheGrind provides a pretty
and useful GUI.
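For reference, with Xdebug 2.x the profiler is switched on via php.ini directives like these (the extension path is an example; adjust to your install); it writes cachegrind.out.* files that KCacheGrind opens directly:

```ini
; Load Xdebug and enable its profiler (Xdebug 2.x directive names)
zend_extension=/usr/lib/php/modules/xdebug.so
xdebug.profiler_enable=1
; Where the cachegrind.out.* files are written
xdebug.profiler_output_dir=/tmp
```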

Cheers

