

PHP's silly PRNG - zokier
http://cod.ifies.com/2008/05/php-rand01-on-windows-openssl-rand-on.html

======
halo
For some added context, the normal rand function in PHP uses an implementation
based on libc.

PHP provides a second, safer, function, mt_rand, which is based around an
implementation of the Mersenne Twister.

~~~
ecaron
The difference when using mt_rand is astonishing.

PHP 5.3 graph using rand: <http://twitpic.com/gq81b>

PHP 5.3 graph using mt_rand: <http://twitpic.com/gq827>

~~~
jacquesm
The regular rand can do that just as well. see elsewhere in this thread.

~~~
pyman
> before you criticize a project make sure you understand the material.

He's right.

Mersenne Twister = mt_rand(). The guy that wrote this is an ignorant.

Ignorant: Lacking knowledge. Unaware or uninformed (adjective: moron).

~~~
jacquesm
The only thing that is left from this whole circus is a reference to a bug
that has been known about for a long time:

<http://bugs.php.net/bug.php?id=45184>

It would have really helped if they did a bit of research before writing such
an inflammatory post and then go all out to get people to spread it.

Anyway, the damage is done.

------
jacquesm
The only thing silly here is this article.

First off, the random generator in PHP is simply a 2^32 period random number
generator, nothing special or wrong about that. Yes, there are better ones but
it's good enough for its intended purpose.

About the image in the article:

The image is generated by using the 'scale' function of the PHP PRNG to sample
the 'lowest' bit.

You do not do a scale when you should be doing a one bit and.

It pays off to study an issue for a while, once you raise a stink it's hard to
undo the damage you've done.

By replacing the line in the code that plots the pixel with:

    
    
       imagesetpixel($im,$x,$y, rand() & 1);
    

this image appears:

<http://ww.com/random.php>

Problem gone, no fuss.

That's on a linux box running php 4.4.2, the results look pretty random to me,
it isn't perfect but the aliasing that was displayed in the original test
image is clearly gone using the 'regular' (so not mt_rand) random number
generator.

To make the result reproducible I've seeded the PRNG with '0' using srand(0);
at the beginning of the program.

If you want to use the 'scale' function in PHP for something that you should
be doing by a one bit and then you will reap the results of that, you are
effectively sampling the lowest bit after a bunch of float operations, which
is clearly sub-optimal.

PHP can't correct for incorrect use.

As for the rest of the code, the $x=0; line can be dropped, there is no need
to 'reset' your loop variable at the end of a loop, last I checked.

I'm no PHP fan but before you criticize a project make sure you understand the
material.

------
robryan
If mt_rand is really that much better, what is the point of keeping them both?
Shouldn't they just make rand map to mt_rand?

~~~
Mark_B
Probably a matter of mitigating risk. Despite its drawbacks, rand() is used a
LOT and works...well, "good enough".

In the interest of making sure the code you wrote for PHP3 will work on
current/future versions and not have anything unexpected happen.

~~~
zokier
I would like to see the code that breaks if PRNG is swapped. I mean, like is
there code out there that relies on the unrandomness of current rand?

~~~
jacquesm
Testsets + known output, spots where the PRNG has been used as part of the
seed driving cryptographic code.

The fact that a given seed makes your program deterministic is a great way to
'lock' the variation that a random generator would otherwise give you.

So, say you have a series of steps that produce a result given a random number
as the seed.

You come up with a bunch of cases where the run output has been saved and
checked by hand to conform, you now use that output to verify against when
maintaining the software.

If you should change the behavior of the PRNG then output will change and the
test will break.

A database with passwords that were 'encrypted' (I use the word loosely) using
the PRNG with parts of the password as the seed. If you change the PRNG under
water none of those users can log in.

There are probably lots of other examples, these are just two simple scenarios
where code would break.

~~~
DrJokepu
As far as I see these examples are all scandalous abuses of undocumented
implementation details. While I can't be bothered to look up the actual
documentation, I'm almost 100% sure that it's not guaranteed that PHP's rand()
will always yield the same results for the same seed. But if it is, well, that
would be a peculiar example of shortsighted language framework design.

~~~
jacquesm
> I'm almost 100% sure that it's not guaranteed that PHP's rand() will always
> yield the same results for the same seed.

That's the whole point of having a seed, isn't it ?

see the Periodicity entry:

<http://en.wikipedia.org/wiki/Pseudorandom_number_generator>

and

<http://www.php.net/manual/en/function.srand.php>

Does not make any mention of the behavior you cite, if you can't be bothered
to look it up why bring it up in the first place ?

~~~
DrJokepu
All I'm saying that while it is indeed true that having a seed implies a
deterministic behaviour, this should be an implementation detail that should
not be exposed to the the users of the interface.

~~~
jacquesm
Either I'm not following you, or you are not clear. Or both ;)

It is _meant_ to work that way, for precisely the kind of reasons I outlined
above.

Say you are writing a poker program. At the beginning of every game you seed
the random number generator, you save the seed.

Halfway through a test you find a bug in the program.

You fix the bug (you think) and you make a test case for the fix.

You run the test case with the restored random seed to see if the bug is
actually under control and does not influence any other part of the game up to
there.

And so on.

That's clearly documented and very common use.

If it wouldn't be like that you'd have to play poker until the cows came home
in the hope that one day that same sequence of play would occur.

~~~
philh
What DrJokepu may have been getting at is that the implementation of rand()
itself is not guaranteed. Expecting it not to change in future versions of the
library is dubious.

~~~
jacquesm
True, but if you do not _have_ to break stuff then don't.

After all, it was pseudo random to begin with, nobody made any claims as to
how random it was, simply that it was 'periodical' and that the period was
long enough that in a short lived program you would get the impression of
random data.

Every programmer even halfway worth his weight in bits would know that that
meant that if you needed 'high grade' random data you'd best look elsewhere,
something with a longer period that downsamples to 32 bits for instance, or a
hardware based random generator.

This has been true of the 'rand' function on a unix box as well, the period is
(2^(sizeof(int) x 8))-1, that's just about the only guarantee you get. (hn
eats asterisks)

The article linked here concentrates on the lower single bit of the output of
the random generator (the rand(0,1) part of the code will generate a number
between 0 and 1), but in fact completely misses the point that if you want a
number that is either 0 or 1 you _MASK_ out the lower bit instead of asking
PHP to do this using the range option, which is meant to give you a number
which has been 'scaled' down from the output of the PRNG in the php core.
Right now you are probably sampling the lower bit of a bunch of float
operations, not the best way to get one random bit.

(rand() & 1) would have done the job just fine.

So much for programmers that can manipulate bits, even after they see the
implementation they still can't figure out how to do it properly. What's so
hard about a one bit and ?

So, first of the article uses the PRNG in a completely nonsense way, then it
is labelled as 'broken' based on a broken test and next the PHP maintainers
get flak for staying backwards compatible when there was no clear need for
change.

It wasn't 'broken', it could be improved a lot at the expense of backwards
compatibility and so they chose the appropriate route and added a library
routine for that.

I'm no big fan of PHP, but this whole thread is way over the top.

here is the same example but with the lower bit masked instead of doing a
bunch of unnecessary float ops on the output of the PRNG:

<http://ww.com/random.php>

I would say that looks quite a bit better than the provided sample.

And I also don't get what the $x=0; is for at the end of the loop on the
x-coordinate, never knew that you had to reset your loop variables.

------
jacquesm
Making real random data using a computer is hard, very hard.

These are pseudo random number generators, I take it that everybody using them
understands that the 'P' means that you can not rely on the output of these
pieces of code to be truely random.

One of the first online casinos used a PIC to sample a noisy zener diode to
get to something less predictable.

And even with that as the source concept it took quite a bit of tweaking of
the design to get really random bits, it's surprising how much hum from your
powersupply can make it through to the bits of your a/d converter.

~~~
zokier
Yeah. Random is hard.

BUT when you actually already have far better implementation of PRNG, namely
md_rand in PHP, there really is no excuse for defaulting to broken behavior.

~~~
jacquesm
That may actually have to do with not upsetting peoples unit tests that depend
on this 'broken behavior'.

Let's say someone wrote an extensive body of statistical software that gets
tested with the known input from the PRNG in php, if you then change the
random number generator all the output will be different.

Or if someone used the PRNG in PHP to encrypt a bunch of data replacing it
would break the decrypt function leaving the user with inaccessible data.

~~~
rbanffy
Trusting the broken behavior of a broken random-number generator is hardly the
best approach to encrypting data. Anyone who writes a test that depends on the
predictability of a random number generator is in for a huge disappointment
when the data is to be decrypted.

Why not do it using a proper crypto library?

~~~
jacquesm
It was two examples, one for 'test' data, one for the passwords example.

The first showed a series of cases that have been saved to check if code has
been inadvertently broken.

The second example was about data stored in a database that depends on the
functioning of the PRNG in a predictable way for its use.

Both examples are 'contrived', just theoretical situations to show you that it
is possible to break stuff by changing the function of a piece of code that
implements a language feature.

The general message was: if you can remain backwards compatible then that is
probably a good thing because there are bound to be use cases for your code
that you can't think of that will have huge consequences.

So I think the PHP team did the right thing here in making a better random
number generator available besides the old one.

Legacy data and software is always a pain, as soon as you have an installed
base of '1' you have a problem. Designers of languages break existing code at
their peril.

That said, they should have probably implemented a better one right from day
1.

~~~
rbanffy
"That said, they should have probably implemented a better one right from day
1."

I still can't believe it's a good idea not to break behavior that depends on
the predictability of a broken implementation of a random number generator.
It's not like this code will be broken by fixing the RNG. It was broken from
the start.

------
yan
Zalewski's "Silence On The Wire" (which is totally fantastic in its own right)
graphs a few PRNGs and discusses their issues.

<http://www.amazon.com/exec/obidos/tg/detail/-/1593270461>

------
Scaevolus
The problem that you see here is mostly that rand() gets the range in a way
biased towards the lower-order bits, which in more LCGs are less random than
the higher-order bits.

~~~
jacquesm
Not only that, they invoke a bunch of float code that then has to balance on
getting the rounding of the lowest bit right after a whole bunch of float
scaling operations.

That's simply not the way to do it when you are only interested in one bit of
data to check the random number generator.

That's only handy if you want to limit the range of your random numbers
between two numbers with a bit more space between them than just the lower
order bit.

Say 500 and 10000 or 0 and 500 or something like that.

------
michaelcampbell
> Tuesday, May 20, 2008

I assume PHP still has this issue?

~~~
blasdel
Of course!

PHP only gains new issues, no warts are ever really removed.

