Hacker News new | comments | show | ask | jobs | submit login
A call to PHP's mt_rand generates only odd numbers (3v4l.org)
212 points by ComputerGuru 609 days ago | hide | past | web | 119 comments | favorite



Because max value should be "mt_getrandmax()" instead of "PHP_INT_MAX", it just gets a 32 bit number then scales it up.

see: http://php.net/manual/en/function.mt-rand.php

Under caution:

The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32. This is because if max is greater than the value returned by mt_getrandmax(), the output of the random number generator must be scaled up.

edit: this post went from 5 points to 1, which I don't care about(in ~500 days I posted less than 10 times and I have ~35 points), but who downvotes documentation, seriously? -_-


While documented, that is surprising behavior. If it takes in an int, shouldn't it be able to take in PHP_INT_MAX? And shouldn't it yell at you instead of just silently going about its day?


The PHP approach seems to be that any crazy behavior is acceptable as long as it's documented.


To be fair, that is not an uncommon situation elsewhere either.


I think it is. Sure, there are plenty of calls out there in all sorts of systems that have crazy, documented behavior. But it's rare to see anyone outside of the PHP world defending the crazy behavior purely on the basis that it's documented. Crazy behavior is almost always there because of compatibility concerns, performance concerns, or an acknowledged bug that just hasn't been fixed yet. Outside of PHP, I rarely see anyone just say, "It's fine, that's how it's documented to behave" and leave it at that.


I recall a Rails bug used to hack github and inject a bugfix commit. The Rails community rejected the fix originally because the crazy behavior was documented.


Wow. Do you have any more info on this?



Outside of php and systemd.....

<Friday afternoon trolling is just too easy this week />


And mySQL of old.

I assume things are much better now, but I remember a time when things like INSERT <table> (<date_field>) VALUES ('2015-02-30') would not have raised any sort of error, amongst other terrible things that people would defend having to implement explicit checks for in other layers of your application.


> I assume things are much better now

Nope: http://grimoire.ca/mysql/choose-something-else


I'm genuinely curious, what are some examples of systemd's odd behavior?


Well, the fact my window manager is now intrinsically linked to my systems boot process strikes me as somewhat obscene. There was the amusement with it reacting to debug being passed to the kernel which caused the system to never finish booting...

systemd is by no means alone; in the last month I've discovered that `yum` deliberately breaks its output when you try and pipe it to something else (I mean really, ffs, really?). The justification for this isn't even internally consistent which really winds me up too...

Whilst I pick on a couple, there's been innumerable packages, languages and platforms over the years that have had their share of what could politely be called idiosyncrasies. I highly recommend The Unix Haters Handbook [1] for plenty of historic examples -it's a lovely read -oh, and for years one of the *nix bullet points was always "all the power in the world and not a safety in sight" [2]

[1] The Unix Haters Handbook: https://en.wikipedia.org/wiki/The_Unix-Haters_Handbook || http://www.csf.ac.at/fileadmin/user_upload/BioComp/training/...

[2] The Hole Hawg: https://web.archive.org/web/20150214132948/http://www.team.n...


> Well, the fact my window manager is now intrinsically linked to my systems boot process strikes me as somewhat obscene.

Gnome depends on logind, which is sensible. Yes, logind is part of a suite of system management tools which also includes tools to manage the boot process.

Your complaint would apply equally to your window manager being intrinsically linked to your serial port (via the Linux kernel).


Not just part of, logind will straight up not work if systemd is not running as pid1. As such, anything that depends on logind is dependent on a specific init sitting as pid1.


As an honest question, could you explain _why_ it's sensible?

Until a couple of years ago, my linux box was nearly entirely modular letting me install any combination of boot-loader, kernel, init system userland tools and/or desktop. How is this better?


What's more common outside PHP is that functions throw errors or exceptions when given invalid input, rather that trucking along and producing output that is clearly not what was intended.


Maybe they should just make the random functions return '4', documenting that it was chosen by a fair dice roll and is therefore guaranteed to be random.


That's not a "crazy behavior", it's a limitation of the algorithm they use. mt_rand() only has approx. 2^32 possible seeds; why would you expect it to support ranges larger than 2^32?

The best thing to do is to not use mt_rand().


> why would you expect it to support ranges larger than 2^32?

Because this is not a simple, stupid linear congruential generator. mt_rand(), is, as I understand it, based on Mersenne Twister, a well-regarded generator which has a period of 2^19937. And if PHP had done this right, mt_rand() would be seedable with an array of some 624 integers.

Mersenne Twister has an alternative "classic" seeding algorithm, drawn from Knuth, to be compatible with old-style generators. This algorithm takes a 32-bit seed. Apparently that's all that PHP is supplying.

It's dead simple to do random(n) uniformly. When I heard that some language implemented a function called mt_rand() which was hilariously broken in this respect, the first thing that came to my mind was "that's gotta be PHP". And it was.


I would not expect it to support ranges larger than 2^32, so why does it?

Returning garbage when one of your parameters is beyond a certain limit is "crazy behavior." Either support it properly, or don't support it at all.


If we were talking about C this would be called undefined behavior. If you don't satisfy the preconditions before calling a function then the not only is the output invalid but the entire program which uses it is invalid too.

I don't think a language like PHP, or indeed anything written in it, should have undefined behavior but I'm not the author of the library.


"Fixing" broken data silently is tremendously bad behaviour in general because you can't possibly know whether the caller knew that they were providing invalid data and whether the caller is going to be happy with your fix.

If you expect input parameters in some range and you get values out of that range, then you blow up. You don't silently truncate anything and you certainly don't reduce the entropy of your random number generator by nearly 50%.


This was my main doubt about PHP 7's scalar types, which will autocast values into the desired types. I don't expect the caller to know better than the callee what are the method's boundaries, and once casted you may not be able to see if the input was garbage. But yeah, when something breaks the method's author can wash their hands.


If the user asked for more bits of randomness, they need to generate more bits. They could have chosen a different algorithm, or perhaps called the random number generator twice and concatenated the bits. Scaling up is clearly the wrong thing to do.


It also makes me wonder how it behaves for numbers below the limit. I would wager that there's substantial bias in the results if you ask for a max of, say, 2^31 + 2^30.


The crazy part is that, instead of throwing an error when given input that doesn't make sense, it just silently produces garbage output.


This. Documenting odd behavior like this isn't an excuse for having odd behavior. It should just be changed to fail on bad input. It's not even a breaking change (apart from code already using it with bad input).


How else is it going to unseat INTERCAL?


First it needs a replacement for INTERCAL's "please come from".


If it's documented it's a feature.



A clear example of "worse is better" winning.


Almost everything in PHP is surprising behavior.

And yes, in the eleventy billion cases where PHP should yell at you that it's doing something totally different than what you wanted, it instead just goes about its day silently.


Which is why it was a great fit for MySQL, which would silently store invalid dates and truncate over-long strings with no warnings.


You joke, but I'm pretty sure this is one of the reasons why that stack was so successful. Same principle behind HTML rendering. "It's displaying something, isn't it? Errors are only suggestions."


I'm not so sure. It's kind of true for HTML (forgiveness was not "good" property of the format, but of the browser: 1 site works in one browser, but not in the other, and user will blame the browser, so let's display whatever the hell we can). But PHP in the early days was kinda handy (even though hacky) tool, allowing to use simple syntax to render HTML instead of custom Perl-scripts. And then it just got popular.

MySQL was once actually superior open-source database, being more performant and simpler to setup and use than PostgreSQL. And then, once again, it just got popular.

I don't think being faulty helped these technologies, although it sometimes is the case.


There are warnings. Most people just ignore them in their APIs


Trying to insert over-large data should have been an error by default from the outset, never a warning. Ilooked at MySQL in 2000 and went "You had one job, database; protect my data."


This is so true. It is the main reason I switched to PostgreSQL and never looked back.


... and accept nonsensical SQL queries, returning whatever it feels like.


>Almost everything in PHP is surprising behavior.

It's not. White it does have several surprising behaviors, "almost everything" is just an example of uniformed BS people say when it comes to PHP.

C also has undefined behavior and several bad decisions, ditto for C++, Javascript, etc. But PHP is just an easy target for trolls.


Even worse, some of the Internals teams argue in favor of silently trucking along instead of, e.g. throwing an exception: http://www.serverphorums.com/read.php?7,1250372

EDIT: because of HN's annoying "you are posting too fast" limit I cannot reply to the comment below.

The arguments against Exceptions began in the comments on the relevant pull request: https://github.com/php/php-src/pull/1397


from that thread..

   At what point do we stop blaming the developers for not knowing how to
   use our badly designed features, and accept responsibility for exposing
   an API that is hostile towards simple, efficient, and correct implementations?


It might be worth noting that 'sarciszewski was the author of that remark.


Actually that thread seems to strongly argue that either an Exception or fatal error should be thrown. I'm not seeing anyone in that thread wanting to silently truck along -- am I missing it?


Wow, that thread is current! This mentality really is systemic to PHP. It's like the exact opposite of "let it crash".


That is not the PHP way.


[deleted]


    php > echo PHP_INT_MAX;
    9223372036854775807


the error is from php 4.2 or something.


Ah, I see, I just misunderstood their interface. Thanks.


There is a caution box in the description, which contains a different warning, and then the warning you have cited is in a second caution box in the "Notes" section, after the changelog and the examples.

It doesn't surprise me that people might not notice the existence of that second warning. I believe that most developers wouldn't scroll down to read the changelog and the example if they think they understood what the function does from its description.


Why even accept ranges that span more than 2^32? That seems like an easy solution to a broken function.

Also fun:

    echo "<?php echo mt_rand(-mt_getrandmax(), mt_getrandmax());" | php
is always even on my PHP 5.5.20.


> Why even accept ranges that span more than 2^32? That seems like an easy solution to a broken function.

It may never have been designed to. If it was written before 64-bit machines were commonplace, mt_getrandmax() would always be the same as PHP_INT_MAX.


Why not generate two values from the underlying function and shift them into place if the range is larger than 2^32 -1?

Oh well, ofc. because this is PHP which goes by the principle of most surprise.


Yes, that's exactly what sensible RNG interfaces do, like std::random in C++. If your underlying engine only produces 32 bits at a time, it'll grab two of them when you request a 64-bit type.


no surprised about the down votes, i've seen a lot of logical explanations and correct responses get them. its almost like they would rather read drama then thoughtful answers.


> Because max value should be "mt_getrandmax()" instead of "PHP_INT_MAX", it just gets a 32 bit number then scales it up.

How does that answer anything?


Quoting a Reddit comment (https://www.reddit.com/r/lolphp/comments/3eaw98/mt_rand1_php...):

The problem is way worse than you think. Check out what this looks like when printed in hexadecimal: http://3v4l.org/XVTgS

Basically, what is going on is that PHP_INT_MAX is 2^63 - 1. mt_getrandmax() is 2^31 - 1. The way mt_rand() makes a random number when the limit is too large is that it makes a random number in the range [0,2^(31)), then it scales it to be a number in the range [0,MAX-MIN), and finally adds MIN.

So in your case, it scales everything by 2^32 and adds 1. Which is why the numbers are* extremely non-random. [See my other comment in this thread for a more detailed explanation and some more test scripts that prove this is what is happening.](https://www.reddit.com/r/lolphp/comments/3eaw98/mt_rand1_php...


And that's another reason why it's a good thing that PHP 7 (coming out soon!) has a new function called random_int() which provides an unbiased distribution of integers powered by a CSPRNG (yes, it uses urandom, in case anyone asks).

My employer is leading the effort to expose a compatible interface in PHP 5 applications so developers can add one line to their composer.json file and start writing code for PHP 7. It's MIT licensed and should nearing its 1.0.0 release soon.

https://github.com/paragonie/random_compat


Using /dev/urandom is not a good thing. It's only needed to get secure random numbers (CSPRGs) very slowly. You'll drain it so that the apps which really need it will be out of seeds. To get random numbers fast, you need to use the good bits of a PRG based on an LCG.

Everybody should know by now that the mersenne twister is not only bad, but also slow.

Everybody should know by now that the first bits are good, and the last bits horrible. That's why you should not use modulo %, but rshift or better use Melissa E. O'Neill's PCG, which uses the first good bits to improve the latter worse bits. http://www.pcg-random.org/


There is in practical terms no such thing as "draining a secure random number generator".

Most RNGs are at bottom stream ciphers. The stream cipher problem of stretching a small key into a very large keystream is fundamental to cryptography, and RNGs are an instance of that problem in the most favorable setting: no message boundaries and no coordinated state.

You don't in practice worry about AES-CTR "running out of key", and so you shouldn't worry about urandom "running out of entropy".

It's understandable why so many people believe this: the Linux urandom man page invents a whole parallel universe in which this is not only a live issue, but one with applications to specific kinds of cryptography! Until we find the appropriate incantations to shut down that particular portal to the world of the Elder Things, it's best just not to look at it.


Still no updates on this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=71211

[tumbleweeds]


It sounds kinda analog to a one-time-pad vs. ordinary key encryption. Waiting for more "actual" entropy bits rather than stretching the existing entropy is like a one time pad, and it can (with a bit of effort, and assuming the integrity of the sources) guarantee similar security of the random generator to a one time pad.

Stretching of the key means (at least in theory) that you could break the random function and predict the random series or your position in it.

I guess your point of not worrying about ordinary key encryption should be applicable in an RNG context too and be similarly safe, though.


> Using /dev/urandom is not a good thing. It's only needed to get secure random numbers (CSPRGs) very slowly.

https://gist.github.com/sarciszewski/f7bd4c0358a44321787b

http://3v4l.org/j330O

mt_rand() performs worse than CSPRNGs, funny enough

Using a strong PRF we can take a 128-bit seed and turn it into an endless stream of random numbers. If /dev/urandom drains entropy, that's a bug that your OS should fix.

> Everybody should know by now that the first bits are good, and the last bits horrible. That's why you should not use modulo %, but rshift or better use Melissa E. O'Neill's PCG, which uses the first good bits to improve the latter worse bits. http://www.pcg-random.org/

Yeah, there is no pgc_random() function in PHP though.


> mt_rand() performs worse than CSPRNGs, funny enough

Because the Mersenne twister is outdated technology, which is bad and slow. Use pcg instead.

> If /dev/urandom drains entropy, that's a bug that your OS should fix.

Defying logic? Without a HW source for noise that's impossible. Even if people are arguing being practical to drop security, you should not be convinced. I didn't see tests which would confirm this theory.

> Yeah, there is no pgc_random() function in PHP though

Writing that binding would cost a php-c dev 20 minutes. But they should not. The problem is the bike shedding to replace that in the stdlib for rand(). Better, faster? But...


> Writing that binding would cost a php-c dev 20 minutes.

Most PHP devs are not C devs.


You do understand that urandom does not block waiting for entropy, I think you are confusing /dev/random and /dev/urandom.

  $ man urandom


Yes, I said "drain", not "block". /dev/random needs to wait, /dev/urandom just goes on creating worse numbers.


Using a loop of that length (for ($i=0;$i<10000;$i++)) on a site allowing people to execute code, and then linking it to HN effectively amounts to a do-it-yourself DDoS. I don't think I wanna be the host of that site right now.


The site itself picked up on this aswell. The following message appeared for me:

    "Abusive script
    This script was stopped while abusing our resources"
edit: Too slow: 504 Gateway Time-out


Shouldn't they cache the result?


A lesson they're learning right now, I suppose.


To find out later that not everything can be cached, e.g. time(), rand(), file_get_contents()... to find out further that no sys or lib call that contains I/O can be cached. And then damn it, why bother caching at all?


Which we could call 'suiciDDOS'


It's also being linked on Reddit for a double whammy DDoS.


... and it's down :) I guess they get this all the time.


They don't even caution about it.


I believe it's also generally an anti-pattern to do things like "num = rand() % max" or "num = rand() & bit_mask" where rand() returns an integer from a pseudo-random number generator, right?

PHP may not do a very good job at ensuring an even distribution throughout the space of possible integers, but for PRNGs in general (especially the quick & dirty ones), the worst place to grab bits from is the least-significant bits.

(my source is that I hung out with a copy of Numerical Recipes in college; Numerical Recipes has a nice chapter for learning about PRNGs, along with example code for a number of implementations)


It is OK to do (rand() % modulus) when (modulus + 1) divides (MAX_RAND + 1). Otherwise you will end up with non-uniform distribution.

But yes, in general the rand-modulo pair is an anti-pattern.


Actually the modulus needs to divide the number of possible output of rand(), which is by definition (MAX_RAND - MIN_RAND + 1), if you want to have a uniform distribution (assuming your rand function has a uniform distribution).

In the specific case of mt_rand, MIN_RAND is 0 by default, then the modulus needs to divide (MAX_RAND + 1).

In the general case, there's no rule in any uniform continuous implementation of rand such as (modulus + 1) needs to divide a specific number.


Whoops, you are right, I should have written (modulus) and not (modulus + 1). Another reason why this is an anti-pattern :). Thanks for correcting me!


Yes, don't use % even with a CSPRNG!

https://stackoverflow.com/a/31374501/2224584


It's true that's an anti-pattern, but it's not a particularly exploitable one in most of the places that really need CSPRNGs.


Agreed. It's a red flag for "expect more exploitable issues to be found around the corner" and can result in biased distributions, but it by itself does not break a RNG.


Oh? I get perfectly sane results using modulo, as it should if it is actually a completely random number: http://3v4l.org/nIBGT


You found the exception! Congratulations!

You're using a bucket size which is a power of 2. A random byte fits on the interval of [0, 255] which is evenly distributed modulo 256, which if you reduce mod 2^N for 1 < N < 8 will still produce a uniform distribution.

If you copied my JS code into your console and changed the 54 to 64 the bias evaporates. Hooray :D

However, for arbitrary ranges, you should use an unbiased selection strategy to ensure a uniform distribution.

http://3v4l.org/EgfPR - notice 0 is significantly more common than the other values when reduced mod 17

http://3v4l.org/LXmKq - same with mod 15

https://defuse.ca/generating-random-passwords.htm


This is why you read the documentation. Don't deduce what a method does purely on its name and your common sense, you are probably wrong...


Yes. In fact I propose that giving methods pronounceable names is an anti-pattern. Method names should be random strings of characters. Then you don't fall into the temptation of not reading the documentation and assuming things. In fact, make them long enough so that you can't memorize them.

someObj.dbrdCfj34uW31U289u(x)

This avoids a lot of frustration. Imagine if this was called someObj.sqrt(x). People could jump to the conclusion that this takes the square root of a value, when in reality it only does so on weekdays. On weekends it returns the CPU temperature.


I suspect you are being facetious, but I've for a long time thought that this was actually an okay idea, even for systems less screwed up than PHP.

When I need to write a function with lots of complex interactions, I mash the keyboard rather than try to come up with a name that's misleadingly simple. That way you have to read the documentation.


Use unicode brah, this ain't the 90s.

If there is no smiley in the function name it won't pass QA here.


At least in PHP. Unfortunately in my experience the PHP documentation is not frequently read by PHP developers or even PHP internals developers. It seems to exist mainly for PHP hatebloggers, for whom it's a gold mine.

This illustrates some problems with the PHP way of doing things. For some reason the PHP internals community prefers to have this function continue to spit out garbage when given invalid inputs instead of throwing an error, especially if spitting out garbage is what was done in the past. The answer is always, "Don't do that. RTFM." but most developers don't read the documentation for each and every function that they use, but rather rely on the name of and common sense until there's a error. If there's no error, the documentation never gets read.


I use PHP a lot (primary product is written in it) and I actually mostly like using it but you can get really burnt applying the "principle of least surprise" to the PHP standard library.

I really wish there was more support behind the idea of creating a new standard library and gradually deprecating the old one, it's been tried but never gets any traction.

Meanwhile frameworks and libraries continue to add their own (often very good) helper libraries to smooth the edges.


It's because there simply are more odd numbers than even numbers...

(j/k!)


I think if you consider Z rather than N the opposite is true

Proof:

For ever number k there exists -k which has the same parity as k (that is, if k is even, -k is even as well)

For every k there is (k + 1) with opposite parity (same thing with -k and -(k + 1)

Now, if we count the numbers, from 1 to infinite, the number of even and odd numbers is the same

from 1 to -infinite the number of even and odd numbers is the same as well

And you get an extra zero, an even number


Technically, you can create a bijection between all natural numbers and all even numbers. For all natural numbers k, there is an even number 2k (and for every even number 2k, there is a natural number k). So, the number of even numbers and natural numbers is the same. That's what happens when you try to reason with infinity ;)


Yes, with Natural numbers

Curious what is possible in Integer numbers

> That's what happens when you try to reason with infinity ;)

Ah yes! Fascinating subject


Cardinality of natural numbers and integers is also the same. 0 -1 1 -2 2 -3 3... 0 1 2 3 4 5 6...


Intuitions based on assuming that you can treat "infinity" the way you would a finite number are often, as in this case, wrong.

The "infinity" referenced here, aleph-null, is provably both the cardinality of the set of even integers, and the set of odd integers. (And also the set of integers, and any other countably infinite set.)

So, unintuitive as it might be when thinking about finite subsets of the integers, there are not only as many positive integers as negative integers, but as many of either as there are integers.

(There are, however, bigger infinite sets, there are, for instance, more reals than integers.)

See, https://en.wikipedia.org/wiki/Aleph_number


Some fun examples about infinity:

There are as many prime numbers as there are integers.

There are as many fractions between 10 and 11 as there are integers.

There are more real numbers between 41.00001 and 41.0002 than there are integers from -infinity to infinity. Even though both are "infinite."


Hilbert Hotel is a great thought experiment on this, I saw a BBC open university show on this one night after the pub and it properly bent my mind, I wish I had more mathematical aptitude as this stuff is utterly fascinating.


Count from 2 to infinity and 0 to -infinity and you will get the opposite result: you'll have an extra odd number, 1.


The function n -> n + 1 maps every even number into an odd, and it has an inverse (obviously n -> n - 1). Thus, the two sets have the same cardinality.


Into a unique odd (which is key) : P


I see your problem, PHP.


[deleted]


The bug works for me on PHP5. Are you on a 32-bit machine? Your values would suggest that you are. See Stormcaller's comment for an explanation.


Win7, 64 bit.


64-bit PHP on Windows does not support 64-bit integers. 'echo PHP_INT_MAX;' would return 2^31 on a 64-bit Windows build.

From http://windows.php.net/: "The x64 builds of PHP for Windows should be considered experimental, and do not yet provide 64-bit integer or large file support." PHP's principle of most surprise in action.


This isn't quite true. PHP has 64-bit releases for Windows, but they are considered experimental, and prior to PHP 7, they only used 32-bit integers. That is because of the Windows/Unix C "long" size issue, which PHP 7 fixes by using a typedef for its integer type which is defined differently on Windows.


I really meant there are no 64-bit integers on PHP on Windows, even if there are 64-bit builds. I wrote that comment on my phone while getting off the bus and just updated it.


Well, I object to the "principle of most surprise". The 64-bit builds are experimental for a reason (you wouldn't expect no 64-bit integers), and this issue really isn't PHP's fault.


It is a PHP bug. You're on a 32-bit system. The bug only occurs on 64 bit systems.


You on a 32-bit system?


Given that all the numbers are less than 2^32, there's less than a 10^-192 chance that it's a 64 bit system...


for reference, the estimated probability that a proton will spontaneously decay into a neutron in a second is ~4.1x10^-37.

this has never been observed.


To be fair, we aren't looking that hard...


Neutrino detectors should detect neutron decay, shouldn't they?

If so, we are looking very hard.


You can't set up one or three neutrino detectors for something that improbable and call it a serious effort.


Do they need setting? I imagined they'd detect one by themselves, without any tunning.

But then, 10^-37 is a very low number. I wouldn't be surprised if that means that all neutrino detectors could catch one, but failed to do so up to now.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: