
The bell has tolled for rand() (2014) - methanic
http://cpp.indi.frih.net/blog/2014/12/the-bell-has-tolled-for-rand/
======
dragonwriter
> For reasons I can’t fathom, the C standard mandates that rand() must always
> start as if it were seeded with 1. I can’t imagine why anyone thought that
> was a good idea. What is the sense of making a random number facility that –
> by default – isn’t random?

It's not a random number facility, it's a pseudorandom number facility. And,
unless you've explicitly seeded it from an unpredictable source, the results
are consistent across runs with otherwise similar contexts, which is a
desirable feature in many of the domains PRNGs are used for (it's undesirable
when you need an RNG instead of a PRNG, but then so is using a PRNG like
C-standard rand() in that case.)

~~~
unscaled
It goes without saying the determinism of PRNGs can be a very useful property.
But if you need that property, you if can explicitly seed it from a random
source, you can also explicitly seed it with 1 if you wanted.

The PRNG/RNG distinction you make is slightly false. Even if you're doing
cryptography usually don't want to use a true source of entropy directly,
you're better off feeding several sources of entropy into a PRNG. That's what
both /dev/random and /dev/urandom give you (despite some common
misconceptions).

What most developers really want is a non-deterministic (randomly seeded)
PRNG. When you actually want the PRNG to be deterministic, you usually also
need to seed it explicitly anyway (e.g. having multiple runs in the same app
or allowing the user to specify a seed to generate a map in a game). The
implicit is useful for very few developers (if any), and confusing for the
vast majority of junior developers out there.

------
smitherfield
I'm not sure whether it makes sense for C++ to be deprecating common C
standard library functions, before they've been deprecated and replaced in C.
`rand(3)` is certainly very outdated and poorly-specified, but not inherently
buggy regardless of usage scenario like `gets(3)` or C++ `std::auto_ptr`. And
anyone using `rand(3)` for cryptographic purposes is much too far gone to be
helped.

~~~
tonmoy
Someone converted our codebase into a multi threaded version, didn't realize
that rand() is not thread safe as multiple threads were generating same random
number from time to time.

PS: that someone was me

~~~
slavik81
It's an easy mistake to make. rand() is built around hidden, global mutable
state. It's fundamentally a bad design, and it's the same reason I dislike the
std::randint proposal.

~~~
xyz6633
std::randint is supposed to use thread-local storage for its state. At least
according to cppreference:
[http://en.cppreference.com/w/cpp/experimental/randint](http://en.cppreference.com/w/cpp/experimental/randint)

~~~
slavik81
It's not the thread-safety itself that's the issue. The thing is that when you
can see the state, it's obvious what can influence what. When you hide the
state, there's more documentation to read and interactions are less obvious.
"Show me your tables..."

------
problems
OpenBSD decided to violate the standard and return non-deterministic results:

[http://man.openbsd.org/rand.3](http://man.openbsd.org/rand.3)

~~~
windsurfer
Doing the safer and slower thing seems better on the surface than just giving
up and not compiling, and the article doesn't talk about this option. Could
anyone chime in with some problems with OpenBSD's decision?

~~~
avar
There's lots of uses for random but predictable numbers, e.g. adding random
fuzz to a test suite run that you'd like to be able to repeat.

A weak random number generator that usually powers rand() is also usually
faster and takes less CPU to run.

Both of these things were thought to be more important in more innocent times
when C was standardized.

OpenBSD's decision makes sense on balance, it gives some security to otherwise
insecure programs by default, the trade-off is breaking programs written
against the C standard which make assumptions about the documented rand()
behavior.

~~~
problems
> There's lots of uses for random but predictable numbers, e.g. adding random
> fuzz to a test suite run that you'd like to be able to repeat.

One thing to note though, different platforms don't necessarily implement the
rand() function the same way, the standard does not define an algorithm for
it, so unless you want your tests to run differently on every different C
library it might not be the best idea to begin with.

~~~
int_19h
Running the test randomly is fine - indeed, desirable for fuzz tests (some
eventual run might uncover a new issue). The requirement here is to have
reproducibility - i.e. if the test fails, it should be able to log some kind
of state, like a seed, that allows you to reproduce that same failure exactly.
Obviously you'd need to also use the same environment, but that's usually not
a problem.

------
throwawayx999
This is an article from 2014, previous discussion here:
[https://news.ycombinator.com/item?id=9037151](https://news.ycombinator.com/item?id=9037151)

~~~
StillBored
There are a couple comments from that discussion that mirror my general take.

The standard should be defining something that is good for the general case,
without a lot of levers. KISS! That means giving people 4 different PRNG
algorithms, plus a hardware interface, etc, is not the right direction. Your
average programmer shouldn't have to make a decision more complex than
"secure" or "fast" and the API should clearly make the distinction.

In this case, the linux kernel finds about the right match with /dev/random
and /dev/urandom. If you want something more complex then you can pick up a
3rd party library and spend 6 months learning about random numbers, how to
test their randomness, seed a CSPRNG, etc. Otherwise leave the implementation
choices to the experts and make it easy for the common programmer.

~~~
slavik81
On Linux /dev/random and /dev/urandom are rather oddly misshapen abstractions.
Neither device has sensible behaviour. /dev/random blocks unnecessarily after
seeding, and /dev/urandom doesn't block even if it hasn't been seeded yet.
Contrast that with FreeBSD, which provides exactly the behaviour I'd want:
block until the RNG has been seeded, then never block again.

~~~
rocqua
> /dev/random blocks unnecessarily after seeding

That's because it's supposed to be a random-number-generator, not a pseudo-
random-number-generator. It needs real entropy.

~~~
slavik81
I am not a cryptographer, but my impression is that the points at which
/dev/random stops are not well-justified. It doesn't directly pass out
entropy, but instead feeds it through a cryptographically-secure PRNG first,
making it a rather arbitrary line that's drawn to decide when entropy is
'exhausted'.

There was a brief HN discussion on the topic when FreeBSD made their change:
[https://news.ycombinator.com/item?id=8550457](https://news.ycombinator.com/item?id=8550457)

------
Animats
"gets()" "sprintf", "strcat", and the other buffer-overflow friendly C library
functions should have been deprecated in the 1990s. It's not too late. Force
maintainers to write

    
    
        #include <unsafe-deprecated-c-library-functions>
    

if they really want them.

~~~
astrodust
Make a #pragma for deprecated functions and then harass people that use them
if they want with a -w flag.

------
mastax
> The idea is that there should be a thread local random number engine –
> possibly hidden, but it might actually be exposed to the user; that’s still
> being decided. That will most likely be a Mersenne Twister – because that’s
> just the best all-round pseudo-random number generator, and it generates
> excellent “random” sequences.

I think this is a nice article, but this part stood out to me. I've heard many
times here on HN that it's one of the worst non-cryptographic pseudo-random
number generators in common use.

I couldn't find the specific article, but this Q/A had some good information
and links: [https://cs.stackexchange.com/questions/50059/why-is-the-
mers...](https://cs.stackexchange.com/questions/50059/why-is-the-mersenne-
twister-regarded-as-good)

~~~
rurban
10 years ago the Mersenne Twister was good enough, nowadays not anymore. I
wrote this: [https://cs.stackexchange.com/questions/50059/why-is-the-
mers...](https://cs.stackexchange.com/questions/50059/why-is-the-mersenne-
twister-regarded-as-good/53475#53475)

------
elihu
> rand() is specified to always be seeded the same way.

> For reasons I can’t fathom, the C standard mandates that rand() must always
> start as if it were seeded with 1. I can’t imagine why anyone thought that
> was a good idea. What is the sense of making a random number facility that –
> by default – isn’t random?

This seems like a good decision to me. There are plenty of use cases for
arbitrary values that are consistent across runs (for instance, repeatable
unit tests). If the application calls for non-determinism, let the application
author do it themselves from an appropriate seed value.

(I maybe somewhat biased from my experiences threading random number generator
state in Haskell programs.)

~~~
int_19h
I would argue the opposite - while there are many cases where you want values
to be consistent across runs, I don't think they're the majority of the casual
use of rand(). And especially for something that has "random" right there in
the name, it is a very strange and counter-intuitive default.

Furthermore, even for those cases where you do want consistency, I doubt you'd
want a hardcoded seed value. Most of such cases allow the seed to be
parametrized between runs without recompiling (via command line arguments,
configs etc), and so they'll still need to invoke srand to set the seed.

On the other hand, saying that without srand, the default is random, would
actually allow modern implementations to e.g. initialize from /dev/random.
Right now there's basically no cross-platform way to set a truly random seed
in ISO C.

------
cjensen
To my mind this is ludicrous. Rand() is useful for many common educational
assignments. It's also useful for many applications which want pseudo-random
numbers for purposes unrelated to security.

I recently needed some pseudo-randoms, and discovered that I now have to stick
a bunch of boilerplate in the code to obtain them. Is there really a
difference between me sticking the boilerplate in vs having a function called
rand() that does the boilerplate for me?

------
John_Cena
When I was taught C in the early 2010's these caveats were added as a way to
show us the limitations inherit in data type ranges and make us wary of all
libraries we would take at face value. I think that lecture was one of the
most memorable of my college education. While I understand its limitations,
would someone explain why the author is so adamant that classes should not
teach it as a lesson on the practical limitations of programming?

~~~
scott_s
That is not the author's claim. The author's claim is that even knowing its
limitations, it's still not good to _use_ in even simple situations. This is a
different claim from yours, which is that it's useful to _demonstrate_
limitations imposed by API design.

~~~
John_Cena
Well then why not allow it to be used in practical exercises? What good is a
lecture without practical application?

Edit: The author does state "Responsible teachers and tutors should stop
teaching it to new C++ programmers"; I misinterpreted his point.

------
jepler
C++11's random number facility isn't as good as all that.

Here are some of the pain points I'm aware of:

There's no way to use the random device to uniformly seed the internal state
of an arbitrary pseudorandom generator. In fact, the usual thing to write will
seed with only 32 bits of randomness. (this has been noted by others in this
thread)

Some of the floating-point functionality such as
std::uniform_real_distribution and generate_canonical have trouble where the
text specifies half-open intervals like [0,1) but the (pseudo)code in practice
gives closed intervals like [0,1]. Usually this can only be observed when
generating values of type 'float'. This makes me generally suspect that the
real number distributions in the standard library are probably poorly
specified in general. (in fact, the original report that led to issue 2524 was
in about the exponential distribution, which per the standard must be defined
in terms of generate_canonical) [http://www.open-
std.org/jtc1/sc22/wg21/docs/lwg-active.html#...](http://www.open-
std.org/jtc1/sc22/wg21/docs/lwg-active.html#2524)

Finally, there is support for some frankly weird things in the basic random
engines: you can specify a random engine that only returns the numbers 37 and
38, for instance. It turns out that e.g., std::linear_congruential_engine,
including std::minstd_rand, can have a minimum possible output value of 1
rather than 0, which in turn makes the random engine's min() method necessary.
But why not require engines to generate integral values on the closed interval
[0..n], and make it the responsibility of linear_congruential_engine to
subtract 1 before returning? A random number generator with (2^31)-2 distinct
possible values is going to make for much weirder statistical anomalies than
one with period 2^x when someone uses them carelessly (and what other way is
there to use random numbers?).

~~~
jepler
Another flaw with generate_canonical<float> is that if you generate, say, 2^40
of the things, you should expect to get quite a few values smaller than 2^-32;
but with minstd_rand, or mt19937 I believe you will not ever see values
smaller than 2^-32 or so. A similar problem exists with double, though it
takes a much longer-running problem to prove it (ain't nobody got time to
generate 2^61 random numbers)

------
TazeTSchnitzel
Normally reusing the standard implementation is the right way to do things.
But in the case of C's rand(), it's so awful that PHP's rand() using it under
the hood was considered a serious flaw in PHP, and has now been replaced with
Mersenne Twister. (Which is itself not a particularly good PRNG, but it's so
much better than rand() you'd be forgiven for ignoring that.)

------
kazinator
Catching up with 1980's Lisp:

[http://clhs.lisp.se/Body/f_mk_rnd.htm](http://clhs.lisp.se/Body/f_mk_rnd.htm)

------
Veedrac
Oh look, yet another person seeding mt19937 poorly. I see them like clockwork.
There was a lot of good discussion about this the last time I mentioned it to
someone completely different, so here's that before I copy-paste it:
[https://www.reddit.com/r/programming/comments/6e10b8/when_ra...](https://www.reddit.com/r/programming/comments/6e10b8/when_random_numbers_are_too_random_low/di6upbd/?context=3)

\---

Your code

    
    
        thread_local static std::random_device rd{};
        thread_local static std::mt19937 engine{rd()};
    

does not properly seed a mt19937.

It is unfortunate how people keep trying to use `std::mt19937` and,
unsurprisingly given how hard it is to use, how they (almost)[0] inevitably
fail. Do yourself a favour and use a randomness library that doesn't hate
you[1], is simple to seed correctly, produces decent quality randomness, isn't
horribly bloated and has more features to boot. It's 2017[2]. Let the Mersenne
Twister die.

[0]:
[https://codereview.stackexchange.com/q/114066/40768](https://codereview.stackexchange.com/q/114066/40768)

[1]: [http://www.pcg-random.org/](http://www.pcg-random.org/) or
[http://xoroshiro.di.unimi.it/](http://xoroshiro.di.unimi.it/)

[2]: Well, late 2014 in the article, but close enough.

~~~
mturmon
I notice that the PCG paper, while seemingly drafted in 2015 or earlier, still
has not appeared in ACM _TOMS_. I thought the jokey/insider-y style is kind of
fun, but VERY different from the usual voice of the journal. I wonder what the
cause of the delay is?

------
jwilk
Are there any good random numbers libraries for C?

~~~
ben_bai
Providing good random numbers should be done by the OS (kernel and libc).
Don't try to fake it in userland.

    
    
       Please wiggle your mouse for 1 minute so we can gather entropy for you AES key.
    

REALLY?

~~~
rocqua
I always though that was supposed to feed /dev/urandom.

~~~
ben_bai
/dev/urandom is a perfectly good source for random numbers. Because it's fed
from the kernel which is gathering all kinds of entropy, including mouse
movement interrupt timings. Bad implementations not considered.

One problem with /dev/urandom is you always have to open it, create a
filedescriptor just to get to some random number!?

