It's highly ingrained in my mind that rand() provides a deterministic sequence, and I use deterministic pseudorandom sequences all the time.
If you are changing this just in OpenBSD isn't that going to be a portability nightmare? I would find this behaviour in OpenBSD extremely surprising.
I'm very sceptical of the idea that developers expect true randomness out of rand(). Unless the way people are taught this stuff has changed drastically, the very concept of randomness in computing was always introduced right alongside the idea that these functions don't produce true randomness.
In fact, the need to call srand() (or else get the same sequence every time) can't help but teach you that it's a deterministic sequence.
If people wanted deterministic sequences, they wouldn't be calling srand(time(NULL)). Or srand((time(NULL) + buf_len) ^ rand()). Or, my personal favorite, srand(getpid() * time(NULL) % ((unsigned int) -1)). Though I will admit it is very difficult for me to properly imagine what that developer was expecting.
I tend to agree. The determinism of random() is a feature, but it's a feature that almost nobody actually needs. It's just that the alternatives are either more trouble than they're worth or platform-dependent.
I use the determinism with test cases generating random input data. Every run of a test case generates a random seed based off the system time and reports the seed used. When a failure happens, the seed is fixed using an environment variable. Then the failed case can be reproduced until the bug is tracked down.
This isn't an unusual practice.
The determinism in standard PRNG functions is an important feature not a weakness. Theo is being foolish. If he's unhappy with the way the *rand functions are being used he should just change the code calling into the library to use random seeds derived from a suitable entropy source, not change the library itself.
I would even say it's common within programs that generate random sets, but that those use cases make up a tiny minority of all the uses of rand(), and it shouldn't be difficult to determine them from context.
Why are things like Mersenne Twister or linear congruential generators platform dependent?
To me, there are three classes of reasons you would need random numbers:
1. Simulation. Things like Monte Carlo. In this case, only the statistical quality of the numbers matters.
2. Repeatable simulation. Things like fuzzing tests or benchmarks, where you want them to be repeatable. In this case, you care about determinism and the statistical quality of the numbers.
3. Resource allocation. In this case, you have a set of things that you want to be unique, and either unpredictable (process numbers, invoice numbers) or generated without sharing state (GUIDs). You care about the statistical quality and unpredictability here.
The problem is that 2 and 3 are at odds with each other. 3 requires a constant source of entropy, and 2 requires there to be no entropy.
My understanding is that cryptographic randomness is typically produced by feeding a hardware entropy source in the CPU into some sort of hashing algorithm. A CSPRNG seeded with only n bits of initial entropy has the problem that, if an attacker can somehow recover the state of the CSPRNG, they can predict all future random numbers it generates (and previous ones, depending on the implementation). Constantly reseeding with fresh entropy makes that impossible.
It's in fact half of a feature, because the generation method is not specified. So you can't use it for terrain generation in portable code, for example.
I'm not going to argue that rand() isn't misused by programmers who don't know any better, but at the same time you have legions of coders who understand rand() very well and those expectations will be totally broken here.
I wouldn't have even bothered to check the man pages for rand before assuming it was deterministic for a given seed to srand().
I imagine they wanted a different set of numbers each time, and didn't really care what they were, provided the sequence was vaguely unpredictable. On the face of it this doesn't sound to me like a great reason for breaking standardized behaviour - but you know what they say about people who comment from the sidelines.
The sequence isn't necessarily even portable within a system, as the libraries used by your application may themselves call rand() in unpredictable ways. For instance, consider a DNS resolver that generates query IDs using rand(). If the resolver library generates a new ID for retransmitted queries (e.g., when a packet is dropped), any DNS resolution may desynchronize the state of the random number generator.
And, just as if that weren't bad enough, consider multithreading. The order in which threads execute is unpredictable, so, if more than one thread calling rand() is executing at once, the state of the RNG may become dependent on the order threads run in. Long story short, setting a global RNG to a known state isn't a sufficient way to make a program deterministic.
The use cases are more about being able to repeat a run on a given system - a particular simulation or a unit test that triggered an error (e.g. a sequence of values in a hash table). I could probably come up with a story where I'd want rand to be portable across systems, but it wouldn't be plausible.
For a relatively common example of where you want portable pseudo-randomness, many games generate "random" worlds with a known PRNG algorithm and expose the seeds to the player. Players can then share interesting worlds with other players by simply sharing the seed. They can also generate "personalized" worlds by using their name or other meaningful text as the seed.
Of course, it's completely reasonable to expect such games to bring their own PRNG implementation for that.
> Of course, it's completely reasonable to expect such games to bring their own PRNG implementation for that.
Indeed. I was going to point out that rand() and random() might be horrible options even if you want a deterministic, repeatable stream of random numbers. I am pretty sure celeron55 initially tried using these functions in world generation for minetest, and seeing the rng's output as cubes in a 3D image made it hilariously clear how predictable and patterned the output is.
Actually if you try using rand() for map or noise generation, you'll find that most implementations of it produce utter shit. It's not even random looking. It produces very obvious patterns. Really, it's shit.
So while it could still be useful in some situations, I don't think that's a good enough reason to require its existence in standards and standard libraries. If your application happens to need something that bears a remote resemblance to randomness, and it needs to be fast, it could just provide its own implementation of that special functionality. rand is just a few lines of code...
Whoops, I totally forgot about that, I think I was confused with range reduction on the final result. Nevertheless, Tyche-i is also blazing fast and small.
It's slightly disappointing that he didn't look through at least a sample of the apps using something like `srand(silly_value)` to see what the numbers are used for. (at least that's what I understand from his post) They're using silly pseudo-random numbers, but what if all of them expect nothing more? They're breaking a known interface that may just happen to affect some people when they don't expect it, but there's a chance the "good side" of the change is not even going to be noticed.
I mean, use cases like choosing a tip of the day to display, or choosing a starting colour for displaying a number of objects, or a million other ideas probably wouldn't care if 0 was chosen 10% of the time as a result.
It depends in what way you want your numbers to be "better". You might mean deterministic and fast and with less predictable patterns (Mersenne Twister), deterministic and indistinguishable from true random (and slower) (CSPRNG), non-deterministic (depends on the platform).
Is it really wise to choose a random number generator based on RC4, which is known to have remarkably strong biases? Surely it would be better to use something based on ChaCha20, Keccak or Spritz if you have the choice?
Yes, the name is misleading; it used to use RC4. POSIX is proposing to standardise it under a new generic name, so it is unlikely to be changed until that happens.
For most systems, open /dev/urandom and read from it.
For pure C, use a secure hash function such as SHA-256.
For a simpler approach, just fail to initialize the variable for which you want random data, especially if you frequently want the value to just be zero when debugging.
Those first two suggestions are good. The last one, not so good.
First, it's not very random. As you point out, it will often just be zero. If zero is acceptable to you, then just initialize it to zero. If you need something reliably different then you can't count on uninitialized memory to deliver it.
Second, it will trip diagnostics in tools like Valgrind, which you'll have to either silence or ignore.
Third, and this is a real fun one, any C program which reads from uninitialized memory is non-conforming, which means that the compiler is allowed to assume that it never happens. For an example, let's say you wrote some code to flip a coin "randomly":
  int uninitialized;
  if ((uninitialized % 2) == 0) printf("heads");
  else printf("tails");
The compiler would be within its rights to eliminate both branches of the if statement. It would, in fact, be within its rights to start a game of nethack upon encountering such code (as early versions of gcc did when encountering a #pragma statement) or make demons fly out of your nose (a common example of what C compilers are allowed to do with such things, albeit one that's somewhat difficult to implement).
First, sampling atmospheric noise is not a good general entropy source, because it needs careful setup to ensure that what you are sampling actually is atmospheric noise and not some man-made signal. Also, such noise may well be unpredictable, but it certainly is not secret (i.e. not readily usable for cryptographic purposes).
Second, there are significantly cheaper ways to get quality entropy in a modern digital system. Relative timing of various internal events in a system with multiple clock sources is one meaningful source. Also, purpose-built hardware RNGs integrated into silicon are essentially free (and mostly cheaper than their interface circuitry).
The problem is not that the system does not have usable entropy, but that applications tend not to use the good sources that are already there (i.e. /dev/urandom, CryptGenRandom/rand_s and such).