I wrote some replacement classes that address all of the known issues, here:
I think so, too. Because it was not only referred to, but also linked ;-)
- It is not thread-safe (and might start outputting a series of 0s when called in parallel)
- There is a bug (acknowledged by Microsoft but not fixed, for backward-compatibility reasons) in the implementation, meaning the generator has an abnormally short period and is, overall, less random-looking.
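For the first point, here's a minimal sketch of the failure mode. The corruption is timing-dependent, so it won't reproduce on every run, but hammering a shared instance from many threads like this is how people typically trip over it:

```csharp
using System;
using System.Threading.Tasks;

class RaceDemo
{
    static void Main()
    {
        // The broken pattern: one Random instance shared across threads.
        var shared = new Random(42);

        // Random.Next() mutates internal state non-atomically; concurrent
        // calls can corrupt that state.
        Parallel.For(0, 1_000_000, _ => shared.Next());

        // Once corrupted, the instance can get stuck returning 0 forever.
        for (int i = 0; i < 5; i++)
            Console.WriteLine(shared.Next());
    }
}
```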
There are enough scenarios like this that changing a PRNG algorithm is a very dangerous and breaking change. People rely on the fact that, given the same seed, you get the same sequence of values.
I understand not wanting to change the implementation now, but users should never have assumed it would be stable in the first place.
It's not an assumption. It's directly in the documentation.
"If the same seed is used for separate Random objects, they will generate the same series of random numbers."
 - https://docs.microsoft.com/en-us/dotnet/api/system.random
“However, note that Random objects in processes running under different versions of the .NET Framework may return different series of random numbers even if they're instantiated with identical seed values.”
The actual answer is that you shouldn't just provide a default "Random" class, you should provide a more general class with a pluggable algorithm.
In practice for .NET it sounds like that’s actually correct -- the bad implementation will never be fixed. That seems like a bad thing.
And this is true for all backward-compatibility concerns: you'll have a bug, or a poor syntax decision, or a crappy API, that's required to stay because of downstream concerns. If you keep breaking people's programs to improve the language, people will either eventually stop updating, or stop using the language altogether, because it becomes a massive PITA to get any new features; do it enough and people will say fuck it, you can't be trusted to stay stable, I'll write it myself. And eventually a library will come along that promises stability, and you'll be back in the same boat.
Stability is a feature. And judging from how languages treat stability today, and how one of Microsoft's major reasons for success was its almost obscene adherence to backwards compatibility, it is an important feature.
The cost, of course, is that these problems persist and eventually build up until someone forks, or a major version increments.
But there's a reason that perfect is the enemy of good. Breaking programs arbitrarily to fix bugs/issues slaughters downstream productivity.
Macs were incredible for backwards-compatibility back in the 80s and 90s, as good as PCs if not better. Games from 1985 would run happily in System 7 and MacOS 8. It didn’t help them win against the PC.
Since the return of Steve Jobs, Apple have become increasingly aggressive about killing off old “obsolete” hardware and software features. As a Mac or iOS developer it can be incredibly frustrating, constantly having to jump through new hoops just to be permitted to stay on the platform. But that doesn’t seem to have hurt Apple’s business success in the slightest.
To answer your initial question--
What's the point of providing a seed() function if the algorithm can change from under your feet, for any given implementation?
I was imagining that the algorithm would be stable across runs but permitted to change across major library updates, say.
But I forgot there are two parts to it. One is seed(), the other is the no-args constructor that uses the system clock but no additional randomness. Can we at least agree that that one should be fixed? It’s hard to see how any users even could have a hard dependency on that specific implementation. Like, code that absolutely requires independent Random objects created in the same millisecond to have the same seed? Do you see a big risk in breaking clients like that, for the benefit of improving randomness for everybody else?
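To illustrate that second part: on .NET Framework the parameterless constructor seeds from Environment.TickCount, so two instances created within the same clock tick share a seed. A sketch (the collision depends on timing and framework version, so it's not guaranteed on any particular run):

```csharp
using System;

class ClockSeedDemo
{
    static void Main()
    {
        // Created back-to-back, these will usually fall within the same
        // tick of the system clock on .NET Framework...
        var a = new Random();
        var b = new Random();

        // ...and therefore emit the exact same "random" sequence.
        Console.WriteLine(a.Next() == b.Next());
    }
}
```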
In this particular case, though, there is a chance that the bug is what is actually breaking the programs: As mentioned in the GitHub comments, it is possible to produce not-too-contrived simulations which fail completely under System.Random, and for which a fix would make the program less broken.
As long as Microsoft fails to document the brokenness on MSDN, there will be users assuming that the PRNG does what it's supposed to do, and who are at risk of drawing incorrect conclusions from statistics. What they do state in the documentation is the following:
> The implementation of the random number generator in the Random class isn't guaranteed to remain the same across major versions of the .NET Framework. As a result, you shouldn't assume that the same seed will result in the same pseudo-random sequence in different versions of the .NET Framework.
Apparently their implementation has several shortcomings.
If they wanted to incorporate Math.NET Numerics (which is pretty good; I've used it before for a lot of stuff) into the standard library alongside the existing RNG architecture, I'd be down for that, but I expect the same seeds to play the same sequences for me everywhere, because if they don't, simulations, games, etc. will break.
Eventually you can deprecate the existing System.Random and turn its implementation into (using Math.NET Numerics terminology) a separate RandomSource, and eventually remove System.Random, but you can't change the contract of System.Random.
Once upon a time that was inconvenient for me. I found Math.NET Numerics to be a good replacement.
Also check out ThreadLocal<T> for thread-safety.
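A common sketch of that pattern — one Random per thread so no instance is ever shared. (The seeding scheme here, incrementing from Environment.TickCount, is one reasonable choice to keep the per-thread seeds distinct, not the only one):

```csharp
using System;
using System.Threading;

static class SafeRandom
{
    private static int _seed = Environment.TickCount;

    // One Random per thread, each with a distinct seed, so no instance
    // is ever touched by two threads at once.
    private static readonly ThreadLocal<Random> _local =
        new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref _seed)));

    public static int Next(int minValue, int maxValue) =>
        _local.Value.Next(minValue, maxValue);
}
```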
"RNGCryptoServiceProvider is generally a safer choice when you need to generate random bytes. Creating an instance of this class is expensive, so it’s better to populate a 400-byte array than call the constructor 100 times to populate a 4-byte array."
Why would you call the constructor 100 times to populate 100 4-byte arrays?
It's a service; surely the best approach is to call the constructor once, then populate 100 4-byte arrays by calling GetBytes 100 times on the same instance?
It is explicit code without the downside of the expensive constructor.
edit for clarification: It's true that getting all the data at once will still be much faster because it'll save all the other overhead, but it's not the constructor at fault there.
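i.e., something like this sketch: pay the construction cost once, then call GetBytes as many times as you like. (RNGCryptoServiceProvider is the class the quote names; on modern .NET you'd use RandomNumberGenerator.Create() instead.)

```csharp
using System.Security.Cryptography;

class RngReuse
{
    static void Main()
    {
        // Construct once -- this is the expensive part...
        using (var rng = new RNGCryptoServiceProvider())
        {
            // ...then fill as many small buffers as needed.
            for (int i = 0; i < 100; i++)
            {
                var buffer = new byte[4];
                rng.GetBytes(buffer);
            }
        }
    }
}
```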
First of all, you're completely right. I do actually mention this fact just before the start of the section "An experiment". Here, the argument is that the bias this odd/even mismatch introduces is orders of magnitude smaller than what is introduced by the rounding errors; that is, a perfect theoretical RNG drawing from that range would not produce nearly as biased a result (and conversely, if you were to run the snippet in the blog post using `rng.Next(2, int.MaxValue)` or `rng.Next(0, int.MaxValue - 2)`, you wouldn't see the same bias, even though the ranges are still odd/even-biased to about the same extent).
EDIT: I feel like I'm missing something in the explanation. Can anyone explain how the seed table is actually used?
And thanks for the interesting article btw, just realised you're the author :-)
“Core” and “Framework” have different implementations of the same class? Who names these things?
I've almost started wishing that Microsoft would assign a reference GUID to each of their products to ease searching for them, since so many of their products follow the same pattern of having similar names differentiated only by a common English word.