
Let's make the web faster: PHP performance tips by Google - Garbage
http://code.google.com/speed/articles/optimizing-php.html
======
gfodor
As someone who optimized a piece of PHP code recently, there are a number of
things I ran into that were counterintuitive wrt performance.

\- array_key_exists is 5X slower than isset (though there are different
semantics)

\- direct string concatenation is 2X faster than using "implode"

\- memoization can be a huge win since function calls are 5X slower than array
accesses. (No inlining in PHP)
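For anyone unfamiliar with the isset/array_key_exists distinction, here's a small sketch; the semantic difference is null values, and the memoization pattern uses a hypothetical slow_square() as the expensive call:

```php
<?php
// isset() returns false for a key whose value is null;
// array_key_exists() reports presence regardless of the value.
$data = array('a' => 1, 'b' => null);

var_dump(isset($data['a']));            // bool(true)
var_dump(isset($data['b']));            // bool(false) -- value is null
var_dump(array_key_exists('b', $data)); // bool(true)

// Memoization: a static array turns repeat calls into array lookups.
function slow_square($n) {
    static $cache = array();
    if (isset($cache[$n])) {
        return $cache[$n];   // cheap array access instead of recomputing
    }
    usleep(1000);            // stand-in for expensive work
    return $cache[$n] = $n * $n;
}

slow_square(7);              // computed once
echo slow_square(7), "\n";   // served from the cache: 49
```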

Of course, these optimizations don't matter except in your hotspots. (Why we
still have to have this disclaimer on HN puzzles me, but people seem to keep
trotting out the usual Knuth quote out of context as a way to write off micro-
optimization techniques in general.)

Also important to keep in mind:

\- PHP has copy on write semantics for arrays. So if you set $foo = $bar
(arrays), you don't incur any additional memory until you alter $foo. (Aside
from the additional reference.) Once you change any entry of $foo, PHP makes a
copy of the whole thing. (This can result in massive performance and memory
bloat if you don't realize it's happening.)

\- PHP arrays are not arrays, but are a hybrid linear array and hashtable.
("one data structure to rule them all.") So, even a simple "array" of integers
incurs more than what you'd expect memory wise. In fact, IIRC, an array of
integers incurs approximately 100 bytes of memory for each entry. Ouch. There
are extensions in new versions of PHP that allow you to use 'real' arrays. If
you're stuck using normal PHP arrays, good luck trying to design optimized
data structures for the problem at hand.
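The copy-on-write point is easy to observe directly (exact byte counts vary by PHP version; this is just a sketch):

```php
<?php
// Assigning an array is cheap; the real copy happens on first write.
$big = array_fill(0, 100000, 42);

$before       = memory_get_usage();
$copy         = $big;               // no data copied yet, just a reference
$after_assign = memory_get_usage();

$copy[0] = 43;                      // first write: the whole array is duplicated
$after_write  = memory_get_usage();

printf("assignment cost:  %d bytes\n", $after_assign - $before);
printf("first-write cost: %d bytes\n", $after_write - $after_assign);
```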

See also: <http://phpbench.com/>

~~~
lhnz
> There are extensions in new versions of PHP that allow you to use 'real'
> arrays.

Where?

~~~
dqminh
There is SplFixedArray <http://www.php.net/manual/en/class.splfixedarray.php>,
not sure if this is the one he mentioned though.
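For reference, a quick sketch of SplFixedArray usage (available since PHP 5.3):

```php
<?php
// SplFixedArray: fixed-size, integer-indexed, and leaner per entry
// than a regular PHP array for large numeric data sets.
$fixed = new SplFixedArray(1000);

for ($i = 0; $i < 1000; $i++) {
    $fixed[$i] = $i * $i;        // ArrayAccess works as usual
}

echo $fixed[10], "\n";           // 100

// No automatic growth -- resizing is explicit:
$fixed->setSize(2000);

// Convert to/from plain arrays when needed:
$plain = $fixed->toArray();
```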

------
dave1010uk
I wrote a PHP application (kind of a journey planner) a few months ago that at
"beta" took about 1 minute to run each time due to all the processing it did.
I spent a good few days speeding it up and making it use less memory and it
now runs in under 10 seconds. Here's a few things I did that worked for me.

1\. Install APC and use it to cache objects as well as PHP files. Cache
wherever possible. I used APC lots, as well as caching some heavy processing
to disk. Cache results of expensive functions that are called many times in 1
script in a global or static variable.

2\. Unwrap lots of the lovely OO wrappers, such as the ORM (similar to
ActiveRecord). This made the code messier but much faster. PHP takes a big hit
every time it instantiates a new object.

3\. Take advantage of PHP's copy-on-write memory allocation; understand how
PHP does garbage collection; use references where possible; understand what
PHP is doing behind the scenes.

4\. Profile with xdebug and kcachegrind. Great for finding what's taking up
the time and which functions are being called many times. Inline small
functions that are called many times.
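For point 4, a sketch of the php.ini settings that produce KCachegrind-readable profiles with Xdebug 2.x (the extension path here is an example; adjust for your install):

```ini
; Load Xdebug and profile only when requested, not on every hit
zend_extension=/usr/lib/php5/modules/xdebug.so
xdebug.profiler_enable=0
xdebug.profiler_enable_trigger=1          ; trigger via XDEBUG_PROFILE GET/POST/cookie
xdebug.profiler_output_dir=/tmp/profiles
xdebug.profiler_output_name=cachegrind.out.%t.%p
```

Then request a page with ?XDEBUG_PROFILE=1 appended and open the resulting cachegrind.out.* file in KCachegrind.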

------
rickmb
Stopped reading after the first pointless micro-optimization. With some very
rare exceptions, the cost of these kinds of optimizations, in terms of hard-
to-maintain and hard-to-change code, is far greater than the cost of bluntly
adding more hardware.

I mean seriously, if you're using PHP's OOP functionality but you find
yourself having to squeeze out performance by dropping getters and setters on
your classes, you should really first take a step back and take a good look at
your entire approach.

~~~
vog
I fully agree with that.

Those kinds of optimization may be sensible to tighten some critical inner
loops of your code. But especially there you should avoid high-level features
anyway.

Also, before trying to squeeze out performance that way, you might as well
rewrite that part of your code in C, and let the C compiler perform _real_
optimizations.

However, chances are good that you don't even need that, as some intelligent
combination of already existing fast PHP functions (implemented in C) usually
does the trick.
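As an illustration of that last point, two toy cases where a single C-implemented built-in replaces an interpreted loop (a sketch, not a benchmark):

```php
<?php
$numbers = range(1, 100000);

// Interpreted loop...
$total = 0;
foreach ($numbers as $n) {
    $total += $n;
}
// ...versus one call into the engine:
var_dump($total === array_sum($numbers)); // bool(true)

// Likewise for building a repeated string:
$s = '';
for ($i = 0; $i < 1000; $i++) {
    $s .= 'ab';
}
var_dump($s === str_repeat('ab', 1000));  // bool(true)
```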

------
smalyshev
Unfortunately, these tips were written by somebody who does not understand how
PHP works. See detailed critique of the previous version here:
[https://php100.wordpress.com/2009/06/26/php-performance-goog...](https://php100.wordpress.com/2009/06/26/php-performance-google/) The
article may have changed since then but some of the points are still valid. It
also doesn't mention current stable version of PHP (5.3) and still has no
mention of bytecode caching. At least it mentions profiling now...

------
sfrench
This article feels very hastily thrown together.

No mention of APC, and they didn't even bother to discuss profilers beyond a
link (which doesn't even return xhprof on the first page!)

The getter/setter thing is the perfect definition of a premature optimization.
Making that optimization will result in an infinitesimally small gain compared
to what could be had optimizing bad SQL and caching deficiencies.

~~~
eropple
Agreed. Said other bits in another comment, but:

-mysql_query()? BAD GOOGLE WEBMASTER, NO COOKIE FOR YOU. Seriously - there is no good reason not to use PDO, or at least MDB2. (Doctrine is better, but it's just a superset of PDO.)

-References are a modern PHP programmer's (quiet you, stop laughing!) best friend. "Don't use shortened variables because you'll copy data! Just reuse the $_POST['barf']!" Never mind that $foo = &$_POST['barf'] achieves both goals...

~~~
smalyshev
$foo = $_POST['barf'] does not copy data.

------
qeorge
This is an old article (2009), which the PHP team has responded to:

"With regards to the new article posted at [URL], all of the advice in it is
completely incorrect. We at the PHP team would like to offer some thoughts
aimed at debunking these claims, which the author has clearly not verified."

[http://groups.google.com/group/make-the-web-faster/browse_th...](http://groups.google.com/group/make-the-web-faster/browse_thread/thread/ddfbe82dd80408cc?pli=1)

Previous discussion:

<http://news.ycombinator.com/item?id=676856>

~~~
nbpoole
Those comments appear to be based on an older version of the page. The current
page doesn't have most of those "optimizations."

Edit: And confirmed. See
[http://web.archive.org/web/20090628004019/http://code.google...](http://web.archive.org/web/20090628004019/http://code.google.com/speed/articles/optimizing-php.html)

~~~
tszming
The issue of "copy-on-write" is still here, in addition to the potential SQL
injection vulnerabilities in the sample code.

~~~
nbpoole
The example they gave ( _$description = strip_tags($_POST['description']);_ )
would trigger copy-on-write, unless I'm mistaken.

And the SQL example isn't meant to be copy/pasted: it's a demonstration of a
fairly common anti-pattern that has nothing to do with SQL escaping. Adding
escaping there would just confuse the point.

~~~
pak
The problem, like with all _bad code examples_, is that people copy and paste
them without thinking and don't look back if they run, so here we are in 2011
with SQLi holes in literally every other website.

Either write a complete, secure example or don't write an example at all. Like
a colleague of mine said, putting incomplete code samples on the Internet is
like handing out loaded Glocks to children; expect feet and heads to be blown
off. I wouldn't even write a toy example with mysql_query anymore because the
number of footnotes required (that people would ignore) would fill a page.

------
Joakal
I recommend Varnish (an HTTP cache), which returns cached pages without ever
touching PHP, and APC, a compiled-PHP bytecode cache that is slated to ship
with the PHP 6 releases. Both should lower the resources required.

A lot of bad stuff about PHP in this: <http://www.phpsadness.com/> (Relevant
HN discussion: <http://news.ycombinator.com/item?id=2591845>)

------
Jach
I guess these are okay suggestions, though a few of them aren't going to make
a slow piece of code much faster. Micro optimizations result in micro gains.
(On getter/setters, I think their uselessness is better grounds for wiping
them out than function call speed, but I digress..)

I particularly dislike the glaring SQL injection error and the failure to use
mysqli in the example. They could at least have used a fake escape_data()
function around the values if they didn't want to use prepared statements. And
ignoring that mysqli_query() would be slow when called inside a loop, the
proposed solution just takes an n loop to a 2n loop. Ah, if only PHP had
inline Python-style generators to reduce it to one...

~~~
stephenr
Using mysqli* functions would still be a "mistake" if you ask me. PDO exists,
and gives you all the benefits of mysqli but with relatively painless cross-DB
support.

~~~
Jach
I totally agree that PDO is great. Though I also like the procedural style of
mysqli_ functions (don't shoot!). (In Java stuff I have a "object, build
yourself from these rows" OOPy pattern.) As for cross-DB support, unless
you're using an ORM it's probably going to be painful. Given that some
databases conform to ANSI SQL, others (like MySQL) do their own thing, etc.,
then the issue of built-ins and custom functions (e.g. LucidDB lets you write
Java/Jython/Javascript user-defined functions/procedures/transformations) I
don't trust any of the SQL strings to work on multiple databases.

PDO's advantage is a standard interface (like JDBC), and if you're planning to
ever use more than MySQL with PHP then yes you should use PDO even for MySQL
just to get used to the standard.
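A minimal PDO sketch along those lines (SQLite in-memory so it runs standalone; in real code you'd swap the DSN for your MySQL server, and the posts table here is made up):

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)');
$pdo->exec("INSERT INTO posts (author_id, title) VALUES (1, 'hello')");
$pdo->exec("INSERT INTO posts (author_id, title) VALUES (2, 'world')");

// The prepared statement keeps user input out of the SQL string entirely.
$stmt = $pdo->prepare('SELECT title FROM posts WHERE author_id = ?');
$stmt->execute(array(1));

$titles = $stmt->fetchAll(PDO::FETCH_COLUMN);
var_dump($titles); // array("hello")
```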

------
ars
The last optimization is semi-wrong.

From my experience it's faster to make SQL queries in a loop than it is to
make one huge SQL query that gets all the data but requires post-processing
in PHP.

Let me clarify:

If you are using all the data from every column and every row, then certainly
- do one big query.

But a lot of the time you write code that gets a parent, then all its children.

You only output the parent once. So if you try to write a query that returns
all the children at once, you are by necessity also returning the parent data
multiple times - yet you only output the parent data once.

To do this typically you store a variable with the previous_parent_id, then
check if the new row matches it in your client loop.

Don't do this. It's slower.

Get just the parent data and loop on it, then get the child data in individual
queries.

The reason it's faster is database indexes. When you get the parent data you
want a sort order - hopefully that column is indexed and the database can
return it directly without sorting.

Same for the child data, you want it sorted, you have an index that covers the
parent_id, and your sort column and the database can directly return the data
to you.

But, if you try to join the parent and child table, only the parent data is
pre-sorted. The child data will need to be sorted after the join - often on
disk. This is terrible for performance. (The child index is used for the join,
but not the sort.)

Additionally you are often transferring lots of data, because you are
repeating the parent columns over and over uselessly. That's not free. Even if
it's a local database the database server still needs to buffer all that data
and so does the client.

Caveat: This is my experience with MySQL, it's possible other databases are
able to use indexes to sort both the parent and child records, even through a
join.

~~~
bad_user
It really depends on the type of query you're making.

If you're talking about a tree, here's how to get a whole sub-tree (as in all
descendants of a parent), without fetching parents multiple times, without
multiple queries, sorted in a previously defined order and also indexed and
really fast: [http://dev.mysql.com/tech-resources/articles/hierarchical-da...](http://dev.mysql.com/tech-resources/articles/hierarchical-data.html)

    
    
> When you get the parent data you want a sort order

Not necessarily. People use a sort order mostly to limit the number of rows
returned (say you want only the first 50 items with the lowest prices). But
optimizing a SELECT using ORDER BY is really hard, as there are other
restrictions you need to be aware of (for example, if you're using an index on
multiple columns, you can't have a range condition on the first column and
sort on the second, at least in MySQL).

That's why, if performance is an issue, there are ways to work around the need
to sort -- for example you can keep extra data, like page=1 if position is
between 0 and 50, page=2 if position is between 50 and 100, and so on, such
that LIMITing the query to the first 50 items is just WHERE page=1 (basically
storage-efficient precached queries - if the conditions are stable, you can do
it).

And in cases where you can't fetch the data efficiently in a single query,
you're probably doing it wrong (like you chose the wrong data representation -
for example the relational model is really awful for describing anything
related to graphs).

Of course, I'm not talking about cases when you're fetching unrelated data or
cases where performance doesn't matter or cases when you've got BLOBs in your
parent :)

~~~
ars
You didn't really understand what I meant, which I guess is my fault.

When I say parent I don't mean in the same table; I mean a parent in a
separate table (a 1-to-many join).

And I use sort all the time without limit, I think it's important for data to
always be displayed in a consistent order.

> And in cases where you can't fetch the data efficiently in a single query,
> you're probably doing it wrong

Example:

I have a list of buildings, then a list of rooms, then a list of contents.

I want to create a giant page on the website displaying this data. One table
for each building, one row for each room, one cell for each content.

You can not get all this data efficiently in a single query.

You can get the data by joining all three tables, but you will be repeating
the data about the building over and over for each content.

~~~
bad_user
Right, sorry, I misunderstood you.

------
Revisor
_Avoid writing naive setters and getters_

And let clients manipulate the class properties directly. After all
encapsulation is overrated. Best use only global variables.

 _Don't copy variables for no reason_

Forget about the distinction between input and output variables and about
readability. Those are overrated too.

The advice in this article is either trivial or wrong.

------
kalelias
This article is a very good example of how partially correct information can
create a biased view.

@google

Please remove the article or replace it with a more accurate and complete
one. If you really care about PHP performance and making the world a better
place, hire some skilled guys who can help the development of the PHP core and
libraries. A side effect is that those guys will write better articles.

Thanks.

~~~
fmw
Why do you think so? It doesn't contain any snide comments. Sure, it is
clear that the developers of PHP made some choices that have a negative effect
on performance, but so did the Ruby guys. Making tradeoffs that affect
performance isn't necessarily a bad look, and this article remains neutral as
to the background of the performance issues in PHP.

As to hiring developers to work on PHP: I don't see how that would be of
benefit to Google. They aren't using it anywhere and I'm sure engineers at
Google wouldn't touch PHP with a pitchfork when it comes to selecting a
language for a new project. That doesn't mean that PHP is left behind in any
way, though, because Facebook and Yahoo are actively working on the PHP
ecosystem.

------
rimantas
So you shave .05 seconds off your PHP execution. Then I have to wait 12
seconds till the page is loaded, because the front-end is not optimized at all.

~~~
pilif
No matter how slow the frontend is, the quicker the execution of that script
finishes, the quicker that process can handle another request, saving you
resources.

~~~
rimantas
Properly optimized front-end will help your servers a lot.

Case of a typical website: 6-15 JavaScript files, a handful of CSS files, tens
of images, 60-100 HTTP requests. On a repeat visit there's still the same
number of requests, the majority of them just 304 Not Modified.

Case of an FE-optimized website: 1 request for the main document, one for
combined and minified .js, one for combined and compacted .css, 1-4 for CSS
sprite files, and some content images. Repeat visit: 1 request for the
document (the rest of the resources have far-future expire times so are not
requested). The server has to serve an order of magnitude fewer requests.

~~~
pilif
Agreed, but all the static assets shouldn't be served by your application
server; they should come either from a completely different machine (or CDN),
or at least from a reverse proxy in front of the application server. Aside
from eventual port or file-handle starvation, serving assets should have no
effect on your application server.

So while having one file with all the assets is advantageous for the end
users, it should have no or next to no influence on application server
performance.

------
voidr
> Avoid writing naive setters and getters

You might as well avoid double-quote strings while you are at it.

The test ran 1,000,000 iterations, which means you won't gain more than a few
micro- or milliseconds by doing this.

I would call this advice: naive optimization.

~~~
stephenr
Even more troubling to me is that the author completely ignores the magic
methods __get and __set. I ran the two tests he provided locally, plus an
altered version using __get and __set and one using direct access: direct
access and the magic methods were both roughly 10x the speed of the
getName/setName methods.

Even ignoring the performance issue, why would you even consider using an
explicit setName over __set ?
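For anyone who wants to reproduce this, here's a minimal harness along those lines (the classes are made up; numbers vary a lot between PHP versions and builds, so measure on your own setup):

```php
<?php
class WithGetter {
    private $name = 'x';
    public function getName() { return $this->name; }
}

class WithMagic {
    private $props = array('name' => 'x');
    public function __get($k) { return $this->props[$k]; }
}

// Time a callable over many iterations.
function bench($label, $fn, $iters = 100000) {
    $t = microtime(true);
    for ($i = 0; $i < $iters; $i++) {
        $fn();
    }
    printf("%-8s %.4fs\n", $label, microtime(true) - $t);
}

$a = new WithGetter();
$b = new WithMagic();
bench('getter', function () use ($a) { $a->getName(); });
bench('magic',  function () use ($b) { $b->name; });
```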

~~~
pornel
> why would you even consider using an explicit setName over __set ?

• You don't put the logic of all setters in one place. Although you could
create a dispatcher in __set that looks up a method for the setter and calls
it, that's a bit of magic that may not be expected by someone using the class.

• You can control visibility and override setters using standard PHP syntax
rather than custom code in __set()

• Inside the class and derived classes it's clear when `$this->foo` is direct
access and when it's a setter, otherwise you need to be careful about property
declarations and visibility.

• It's possible to pass extra optional arguments to the setter

(none of these points are particularly strong, but there are non-insane
reasons to use setters)
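The dispatcher idea from the first bullet, sketched out (class and property names are invented):

```php
<?php
class Model {
    private $data = array();

    // Route writes to an explicit setter when one exists, so
    // per-property logic still lives in named, overridable methods.
    public function __set($name, $value) {
        $method = 'set' . ucfirst($name);
        if (method_exists($this, $method)) {
            $this->$method($value);
        } else {
            $this->data[$name] = $value;   // plain property fallback
        }
    }

    public function __get($name) {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }

    protected function setEmail($value) {  // normal method: visible, overridable
        $this->data['email'] = strtolower(trim($value));
    }
}

$m = new Model();
$m->email = '  Bob@Example.COM ';
echo $m->email, "\n"; // bob@example.com
```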

------
ck2
You know what's much, much faster but no-one will ever recommend?

Globals instead of copying gobs of data between classes.

For example, WordPress used to keep a global object cache that was passed
around by reference. Modern versions instead throw around multiple copies of
huge gobs of data - all the users and posts and comments on a page - copying
them back and forth over and over again. That makes for a HUGE, easily
measured performance decrease on complex pages (200-300% slower). If you have
frequent cache misses it's quite a workout for the system.

~~~
eropple
Most applications I see just reference-copy.

This article is really, really rudimentary, and not in the good ways. Why does
it recommend "not copying data" without explaining references? Why is _Google_
recommending the use of bare mysql_query? I don't care if it's a toy example,
they should be reinforcing best practices (or at least not-horrible practices)
by using PDO or something similar.

Granted, most PHP programmers I know are not particularly competent. (I am,
but I'm also weird enough to be willing to get competent with something like
PHP.) But we can at least _try_ to hammer good practices into them.

------
michaelchisari
This example:

    
    
      $description = strip_tags($_POST['description']);
      echo $description;
    
      ...
    
      echo strip_tags($_POST['description']);
    

Sacrifices readability for performance. Sometimes that's a worthwhile
compromise, but oftentimes not. And if you plan on referencing $description
more than once (which most code would), then it wouldn't make much sense to
run strip_tags each time.

~~~
wvenable
I doubt it even sacrifices performance. The temporary result of strip_tags()
still has to be allocated in order to be echoed out -- so you're using the
same amount of memory. $description will be deallocated as soon as it goes
out of scope.

~~~
eropple
Memory cleanup isn't free (the instance of $description in scope), and unless
you return by-reference, you're making a double copy.

~~~
smalyshev
Both versions of the code do exactly the same thing:

1\. Take the description from memory

2\. Allocate a string and put the stripped version there

3\. Output the string (here you might save a little if you don't use output
buffering, since the output goes straight out instead of being copied to the
buffer)

4\. Release the memory allocated in step 2

~~~
eropple
Whoops, you're right--I was thinking it was getting returned, rather than
output.

------
BasDirks
Not trolling, honest question: why is PHP still used? What makes it preferable
over alternative server-side technologies? I have programmed PHP before, and I
know I could just Google this question, but this kind of search usually turns
up rants by language zealots.

~~~
tarmstrong
> Not trolling... I know I could just Google this question, but this kind of
> search usually turns up rants by language zealots.

Isn't that what trolling is?

I'm not sure why that would even provoke rants by language zealots. It's a
simple question with a simple answer: big libraries and projects that can't be
ported in an afternoon.

That is to say, Haskell is a nice language but there's no Drupal port to
Haskell.

~~~
BasDirks
It's not trolling, look at the excellent answers that I received, including
yours.

