Ask YC: Speeding up PHP
10 points by apexauk on June 4, 2008 | 14 comments
Just installed APC on our server (Alternative PHP Cache - http://uk2.php.net/manual/en/book.apc.php) and already seeing some great improvements in page load times/CPU usage per request. Particularly good for us as we're using Propel with lots of DB tables so already have 570+ .php files in the cache.
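
For the curious, here's roughly what our apc.ini looks like at the moment (illustrative - sizes and paths are specific to our box and still being tuned):

    ; illustrative apc.ini - values we're experimenting with
    extension=apc.so
    apc.shm_size=64   ; MB of shared memory - plenty for our 570+ files
    apc.stat=1        ; re-check file mtimes; 0 is faster, but then you
                      ; need an Apache restart to pick up deploys
    apc.ttl=7200      ; allow stale entries to be evicted under memory pressure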

Any recommendations/gotchas, or should everything be fine out-the-box?

And while we're on the subject, what other recommendations have people got for speeding up/optimising PHP/LAMP?



There is no right answer. Despite what others say, PHP is perfectly suited to a lot of needs. Runtime speed may be your issue, but you also have to think about supportability.

If you're looking to optimize your site, YSlow (as messenlinx suggested) is a good place to start. Next, I would suggest looking at your queries: a SELECT across a big table vs. a more refined query across a smaller data set will do wonders regardless of the language.
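
To make that concrete, a toy comparison (the orders table and its columns are invented for illustration):

    // Illustrative only - 'orders' and its columns are made up.
    // Pulling everything back and filtering in PHP scans the whole table:
    $res = mysql_query("SELECT * FROM orders");

    // Letting an indexed WHERE clause shrink the result set is far cheaper:
    $res = mysql_query(
        "SELECT id, total FROM orders
          WHERE user_id = 42 AND created > '2008-05-01'");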

If your data is updated infrequently, you can look at caching your queries at the data level. If not, maybe look at Memcached?


APC is very nice because you can really just plug it in and expect performance improvements without any “real work.” That said, it really depends on your architecture and the traffic/usage patterns you expect. Do you get more reads or more writes? Do you use InnoDB or MyISAM?

Do you have a cache policy? For files? For DB data? Do you use the APC user cache or Memcached? The APC user cache is faster (on the web front-end), but is not shared. Memcached is shared but there will always be small network latency. If DB access is a bottleneck, consider using a data cache.
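
A quick sketch of the difference in practice (key names and cache host invented; apc_store/apc_fetch and the pecl Memcache class are the relevant APIs):

    // APC user cache: per-server shared memory, no network hop
    apc_store('app_config', $config, 300);      // 5-minute TTL
    $config = apc_fetch('app_config');

    // Memcached: shared across web servers, at the cost of a network round-trip
    $mc = new Memcache();
    $mc->connect('cache1.example.com', 11211);  // hypothetical cache host
    $mc->set('app_config', $config, 0, 300);    // flags = 0, 5-minute TTL
    $config = $mc->get('app_config');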

Do you serve a lot of static files? Consider using a CDN (not a silver bullet! There are drawbacks.)

There are a lot of strategies for high performance on http://highscalability.com/ . But don't overdo it! Most of the time you don't need to set up your infrastructure to be scalable for millions of clients.

As missenlinx pointed out, also try to speed up page display by following Yahoo!'s recommendations at http://developer.yahoo.com/performance/rules.html - the number of requests per page is certainly something that made a difference once optimized, at least in my experience. YMMV.


Good questions - I understand most of them were kinda rhetorical, but for the purposes of useful discussion:

Our app is generally more reads, but we're optimising different parts depending on the read/write balance. We use a mix of InnoDB and MyISAM - my understanding is MyISAM is better for reads, InnoDB for transactions and row-level locking, i.e. better for lots of writes while still reading... does that hold water as a good uber-simplified rule of thumb?

We're not using the APC user cache or Memcached yet - thanks for pointing out the differences. The FB presentation linked in the other thread gave central app config data as a good example of a use for the APC user cache.

For interest - what scale do you need to reach before looking into a CDN is justified? (We're definitely not there yet!)

More generally, we're building features with a focus on developer productivity, then optimising as bottlenecks arise.


I would say you're right about the differences between MyISAM and InnoDB.

The "feature configuration" system at Facebook is pretty good. That said, it's not perfect: the fact that there are two different caches depending on whether you're running in CLI or Apache SAPI mode has been a problem for me in the past; I still haven't figured out a way to run scripts from a crontab while getting access to the data cached by web pages. Running curl might be an option...
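
For example, a crontab entry along these lines (URL and token invented) would run the job inside the Apache SAPI, where the web pages' APC user cache lives:

    # hypothetical crontab entry - the work happens inside Apache,
    # so it sees the same APC user cache as the web pages
    */10 * * * * curl -s "http://localhost/cron/rebuild.php?token=SECRET" > /dev/null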

For a CDN, it really depends on the number of files you have and how much data you're sending out. If you're paying for a hosting plan with extra bandwidth just to serve all your static images, then it might be worthwhile. You can calculate the costs yourself, actually: http://calculator.s3.amazonaws.com/calc5.html . There are alternatives to S3 (I compared Akamai and S3 in a study at the web company I work for; S3 came out cheaper for our traffic patterns.) This is not to say that cheaper is better - you obviously have to look at the whole picture.

In general, the main problem to examine before moving your files to a CDN is the cache policy. S3 has datacenters in both Europe and the US, and there is certainly a propagation delay when you write a file. There are techniques to avoid this, such as naming your images with a version number, but you have to make sure your clients will find the files when your pages start linking to them.
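
A sketch of the version-number trick (helper name and CDN host invented):

    // Hypothetical helper: a new version means a new URL, so stale
    // copies at the CDN edge or in browser caches never get in the way.
    function asset_url($file, $version) {
        return "http://cdn.example.com/v{$version}/{$file}";
    }

    echo '<img src="' . asset_url('logo.png', 42) . '" />';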

Another important issue is downtime. S3 recently had a several-hour outage... that has never happened with our local hosting provider. What will your plan be when it happens? If you can detect that your CDN is down, can you redirect the traffic back to your own host? And will that risk bringing you down too, if you downsized your equipment after the CDN migration?

It's certainly not an easy decision to make, but if properly planned and done right, a CDN can save you a lot of money.

Anyway, have a look at Memcached. We added a "Cacheable" attribute to class definitions in our in-house ORM so that cacheable data is taken from Memcache instead of querying the DB (that is, if it is available in the cache; otherwise there is a miss + Memcache SET). For our relatively large website (~1000 people connected now, 1.7M subscribers), Memcache is saving 1 billion queries a month. I would suspect that APC is saving at least that many also.

Having this feature in the ORM makes it transparent for the developers. This is important, because having to wrap hundreds of DB GETs in

    if (($val = get_from_cache($key)) !== false) {
        return $val;
    } else {
        $val = get_from_db($key);
        cache_set($key, $val);
        return $val;
    }
is pretty tedious and leads to unmaintainable code.
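
Very roughly, what the ORM does under the hood looks like this (a sketch with invented names, not our actual code - the real version reads the CACHE() attribute from the class definitions):

    // Sketch only - class and helper names are invented.
    class Orm {
        private static $memcache;                      // a connected Memcache instance
        private static $ttl = array('Forum' => 86400); // from CACHE() declarations

        private static function isCacheable($class) {
            return isset(self::$ttl[$class]);
        }

        private static function queryDb($sql) {
            // elided: mysql_query() + mysql_fetch_assoc() loop
        }

        public static function fetchAll($class) {
            $sql = "SELECT * FROM " . $class . ";";
            $key = md5($sql);
            if (self::isCacheable($class)) {
                $rows = self::$memcache->get($key);
                if ($rows !== false) {
                    return $rows;                      // cache hit
                }
            }
            $rows = self::queryDb($sql);               // miss, or not cacheable
            if (self::isCacheable($class)) {
                self::$memcache->set($key, $rows, 0, self::$ttl[$class]);
            }
            return $rows;
        }
    }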


All good sense... we'll be looking at Memcached soon, and our use of Propel should make it easy to integrate caching into the ORM as you suggest.

Question, though: I admit I've not given this much thought, but isn't the hard part working out when to clear the cache for individual rows/objects when the db is updated? I'd be grateful for any tips on how you've dealt with this, since it sounds like we'll be implementing something very similar...


It is a problem, that's for sure.

Let's take a simple example, the display of the list of available forums on every page.

In our class declarations diagrams, we have something like this:

    +----------------------+
    | Forum                |
    +----------------------+
    | name: string(1,100)  |
    | topic: text          |
    | ...                  |
    | CACHE(shared, 86400) |
    +----------------------+
This creates a table for a forum definition, with its entries kept in the shared cache (Memcache) for a day - CACHE(local...) would put them in APC's user cache instead.

This means that when a developer calls ForumModule::getAll(), the ORM builds the SQL query:

    SELECT * FROM Forum;
and uses it as the key to query the cache. In case of a cache miss, the query is sent to MySQL and the data added to the cache with a lifetime of 86400 seconds.

When we add a new forum to the DB, it has no impact on the website, since the data is served from the cache 99.9%+ of the time. We then have to erase the cache manually with a maintenance script that runs ForumModule::removeAllCache(), which generates the same key and removes the item.
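
In code, the removal amounts to regenerating the same key (sketch, reusing the invented md5-of-SQL scheme from above; assumes $memcache is a connected Memcache instance):

    // Rebuild the key exactly the way getAll() does, then drop the entry.
    $key = md5('SELECT * FROM Forum;');
    $memcache->delete($key);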

Note that this is not always possible... For instance, if you're caching the list of forums but have dozens of SQL results stored in cache, such as:

    SELECT topic FROM Forum WHERE id = 42;
    SELECT id FROM Forum WHERE category = 1729;
    ...
Then each of these will be used as a key to fetch results from the cache, and we won't be able to remove them all. Last I checked, you couldn't enumerate the keys in Memcache - you can with APC.

We usually solve this by setting low TTL values. Going from one query a day to one query an hour makes no difference load-wise, but it guarantees that all your changes become visible within a short time. In some rare cases (a large upgrade of the code base), we have flushed the caches completely. If some people have to wait an hour for the new forum to appear, it's not an issue. Naturally, we tune these values according to the user experience that follows.

There is something that I didn't mention but is quite important: don't always rely on user requests to rebuild your caches. If the cached value is a list of forums, fine. If it's a large tag cloud built from hundreds of queries, update it from a cron job instead. Otherwise, when 50 clients hit the tag cloud page just after the cache expires, they all get a miss and run the rebuild process at the same time, and you take a big hit on the DB.

On a related note, don't rely on Memcache alone - the data might not be there. Always have a fallback data source in case the cache doesn't give you what you expected. This can mean using other stores for datasets that are expensive to compute: for a tag cloud, I would keep the compiled data in MySQL, not only in Memcache; I don't want a client request to regenerate it, and I don't want the client to see a message saying the data is unavailable at the moment.
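
The fallback idea, sketched (the compiled_tag_cloud table is invented, and $memcache is assumed connected):

    // Memcache is only an accelerator here, never the sole copy of the data.
    $cloud = $memcache->get('tag_cloud');
    if ($cloud === false) {
        // Miss: read the cron-built copy from MySQL instead of
        // recomputing hundreds of queries inside a user request.
        $res = mysql_query("SELECT data FROM compiled_tag_cloud LIMIT 1");
        $row = mysql_fetch_assoc($res);
        $cloud = unserialize($row['data']);
    }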



Interesting link... thanks.


what other recommendations have people got for speeding up/optimising PHP

Use a different programming language.

No, seriously. http://shootout.alioth.debian.org/gp4/benchmark.php?test=all...

If runtime speed is your design goal, then you definitely want to think outside the PHP/Perl/Python/Ruby box. Those are fast enough for most people, but other languages are a lot faster. OCaml and Haskell are amazing; SBCL is pretty good too.


Every thread on HN about PHP garners at least one reply that reads "choose another language/framework, blah".

Let's give people advice for their problem and their chosen language, and keep the opinions on language/framework choice to ourselves.


PHP is a horrible disease. We're just trying to help... before it's too late.



Also good for optimising that side of things: http://www.thinkvitamin.com/features/webapps/serving-javascr... - and in fact Cal Henderson's whole book has been a great read: http://www.amazon.com/exec/obidos/ASIN/0596102356


You can use Xdebug to profile your application. KCacheGrind provides a pretty and useful GUI.
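
For example, these php.ini settings (paths illustrative; the option names are Xdebug 2's) dump cachegrind files that KCacheGrind can open:

    ; illustrative php.ini snippet for Xdebug 2 profiling
    zend_extension=/usr/lib/php5/xdebug.so
    xdebug.profiler_enable=1             ; profile every request, or instead:
    ;xdebug.profiler_enable_trigger=1    ; ...only when XDEBUG_PROFILE is passed
    xdebug.profiler_output_dir=/tmp      ; cachegrind.out.* files land here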

Cheers




