Randomness in .NET (lowleveldesign.org)
75 points by lowleveldesign 8 months ago | 40 comments



I think this is the GitHub issue referred to:

https://github.com/dotnet/corefx/issues/23298

I wrote some replacement classes that address all of the known issues, here:

https://github.com/colgreen/Redzen/tree/master/Redzen/Random

https://www.nuget.org/packages/Redzen/


> I think this is the GitHub issue referred to

I think so, too. Because it was not only referred to, but also linked ;-)


Heh, fair enough :) Not easy to find though, eh (very similar colour for normal and anchor text).


It was hard to see. It also isn't underlined like a normal link.


Two vital pieces of information about the .NET PRNG:

- It is not thread-safe (and might start outputting a series of 0s when called in parallel); a thread-safe wrapper is sketched below.

- There is a bug (acknowledged by Microsoft but not fixed, for backward-compatibility reasons) in the implementation, meaning that the generator has an abnormally short period and its output is, overall, less random-looking.
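
On the thread-safety point, a common workaround is one Random instance per thread via ThreadLocal. A minimal sketch (ThreadSafeRandom is my own name, not a framework type):

    using System;
    using System.Threading;

    static class ThreadSafeRandom
    {
        // Interlocked.Increment hands each thread a distinct seed, so
        // threads spawned within the same clock tick don't share a sequence.
        private static int _seed = Environment.TickCount;

        private static readonly ThreadLocal<Random> _local =
            new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref _seed)));

        public static Random Instance => _local.Value;
    }

    // Safe to call from Parallel.For and friends:
    // int roll = ThreadSafeRandom.Instance.Next(1, 7);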


I'm really bummed out over the fact that you can't change the way a random number generator generates numbers because apparently people depend on the exact algorithm.


It does make some sense. Let's say you're making a game with a procedural world (e.g. something like Minecraft) which the player explores and makes changes to. Instead of storing the entire (potentially infinite) world, you just store the world seed and the changes players make. In that case, if the algorithm underlying the PRNG changes, the entire game would break.

There are enough scenarios like this that changing a PRNG algorithm is a very dangerous, breaking change. People rely on the fact that, given the same seed, you get the same sequence of values.
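
To make that contract concrete (a quick snippet, not from the article):

    using System;

    // Two generators with the same seed produce identical sequences;
    // exactly what seed-based world generation relies on.
    var worldGen1 = new Random(1337);
    var worldGen2 = new Random(1337);

    for (int i = 0; i < 5; i++)
        Console.WriteLine(worldGen1.Next(256) == worldGen2.Next(256)); // True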


You should use your own PRNG in that case.

I understand not wanting to change the implementation now, but users should never have assumed it would be stable in the first place.


> users should never have assumed it would be stable in the first place.

It's not an assumption. It's directly in the documentation.

"If the same seed is used for separate Random objects, they will generate the same series of random numbers."[0]

[0] - https://docs.microsoft.com/en-us/dotnet/api/system.random


Read just a little further:

“However, note that Random objects in processes running under different versions of the .NET Framework may return different series of random numbers even if they're instantiated with identical seed values.”


I mean, yeah. You probably should. But it's entirely reasonable of a game developer to say "I'm not an expert in random numbers, but Microsoft has lots of smart engineers, I'm sure they did their research and provided a good implementation".

The actual answer is that you shouldn't just provide a default "Random" class; you should provide a more general class with a pluggable algorithm.
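
Something along these lines, say (names are hypothetical, loosely modelled on Math.NET's RandomSource idea, not a real .NET API):

    using System;

    // Pluggable design sketch: each algorithm is a named type, so callers
    // opt into a specific, stable sequence instead of depending on whatever
    // the default "Random" happens to implement in this release.
    public abstract class RandomSource
    {
        public abstract uint NextUInt32();

        // Simple modulo mapping; a production version would use rejection
        // sampling to avoid modulo bias.
        public int Next(int maxExclusive) => (int)(NextUInt32() % (uint)maxExclusive);
    }

    // Marsaglia's xorshift128 as one concrete, documented algorithm.
    public sealed class Xorshift128Source : RandomSource
    {
        private uint _x, _y, _z, _w;

        public Xorshift128Source(uint seed)
        {
            _x = seed; _y = 362436069; _z = 521288629; _w = 88675123;
        }

        public override uint NextUInt32()
        {
            uint t = _x ^ (_x << 11);
            _x = _y; _y = _z; _z = _w;
            return _w = _w ^ (_w >> 19) ^ t ^ (t >> 8);
        }
    }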


No, that’s not reasonable at all. You’d be assuming not only that the implementation is exactly what you want, but also that it will be identical on all platforms and will never change.

In practice for .NET it sounds like that’s actually correct -- the bad implementation will never be fixed. That seems like a bad thing.


What's the point of providing a seed() function if the algorithm can change from under your feet, for any given implementation? In your scenario the only way to have seed() is through a custom implementation, because any implementation may have bugs or inconsistencies that may be fixed at any time. Only your own implementation will stay stable and sane.

And this is true for all backward-compatibility concerns: you'll have a bug, or a poor syntax decision, or a crappy API, that's required to stay because of downstream concerns. If you keep breaking people's programs to improve the language, people will either eventually stop updating, or stop using the language altogether, because it becomes a massive PITA to get any new features; do it enough and people will say fuck it, you can't be trusted to stay stable, I'll write it myself. And eventually a library will come along that promises stability, and you'll be back in the same boat.

Stability is a feature. And judging from how languages treat stability today, and from how one of Microsoft's major reasons for success was its almost obscene adherence to backwards compatibility, it is an important feature.

The cost, of course, is that these problems persist and eventually build up until someone forks, or a major version increments.

But there's a reason that perfect is the enemy of good. Breaking programs arbitrarily to fix bugs/issues slaughters downstream productivity.


I think that is sometimes right and sometimes wrong. It’s not consistent enough to elevate to a principle.

Macs were incredible for backwards-compatibility back in the 80s and 90s, as good as PCs if not better. Games from 1985 would run happily in System 7 and MacOS 8. It didn’t help them win against the PC.

Since the return of Steve Jobs, Apple have become increasingly aggressive about killing off old “obsolete” hardware and software features. As a Mac or iOS developer it can be incredibly frustrating, constantly having to jump through new hoops just to be permitted to stay on the platform. But that doesn’t seem to have hurt Apple’s business success in the slightest.

To answer your initial question--

> What's the point of providing a seed() function if the algorithm can change from under your feet, for any given implementation?

I was imagining that the algorithm would be stable across runs but permitted to change across major library updates, say.

But I forgot there are two parts to it. One is seed(), the other is the no-args constructor that uses the system clock but no additional randomness. Can we at least agree that that one should be fixed? It’s hard to see how any users even could have a hard dependency on that specific implementation. Like, code that absolutely requires independent Random objects created in the same millisecond to have the same seed? Do you see a big risk in breaking clients like that, for the benefit of improving randomness for everybody else?
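
For context, the collision is easy to demonstrate on .NET Framework, where the no-args constructor seeds from Environment.TickCount:

    using System;

    // Two Randoms constructed within the same timer tick get the same
    // clock-based seed and emit identical sequences (.NET Framework).
    var r1 = new Random();
    var r2 = new Random();
    Console.WriteLine(r1.Next() == r2.Next()); // frequently True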


> Breaking programs arbitrarily to fix bugs/issues slaughters downstream productivity.

In this particular case, though, there is a chance that the bug is what is actually breaking the programs: As mentioned in the GitHub comments, it is possible to produce not-too-contrived simulations which fail completely under System.Random, and for which a fix would make the program less broken.

As long as Microsoft fails to document the brokenness on MSDN, there will be users assuming that the PRNG does what it's supposed to do, and who are at risk of drawing incorrect conclusions from their statistics. What they do state in the documentation is the following [0]:

> The implementation of the random number generator in the Random class isn't guaranteed to remain the same across major versions of the .NET Framework. As a result, you shouldn't assume that the same seed will result in the same pseudo-random sequence in different versions of the .NET Framework.

[0]: https://docs.microsoft.com/en-us/dotnet/api/system.random?re...


> I'm sure they did their research and provided a good implementation

Apparently their implementation has several shortcomings.


Another use case would be simulating a physical system with random noise. You get an error for a certain noise sequence, and you might want to be able to get the same random sequence back.
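
The usual pattern: log a fresh seed on every run so a failing run can be replayed exactly (RunSimulation below is a stand-in for your own code):

    using System;

    int seed = Environment.TickCount;
    Console.WriteLine($"simulation seed: {seed}"); // log it alongside the results

    var rng = new Random(seed);
    RunSimulation(rng); // re-running with the logged seed replays the exact noise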


But does it need to be reproducible for all time, on all platforms?


Basically, yes.

If they wanted to incorporate Math.NET Numerics (which is pretty good; I have used it before for a lot of stuff) into the standard library alongside the existing RNG architecture, I'd be down for that, but I expect the same seeds to play back the same sequences everywhere, because if they don't, simulations, games, etc. will break.

Eventually you can deprecate the existing System.Random and turn its implementation into (using Math.NET Numerics terminology) a separate RandomSource, and eventually remove System.Random, but you can't change the contract of System.Random.


That actually makes sense and I have used PRNGs in that way myself. I should have thought of that.


Also: a Random instantiated with an identical seed can return different sequences on different framework versions [0]. (Also discussed in the article.)

Once upon a time that was inconvenient for me. I found Math.NET Numerics to be a good replacement.

Also check out ThreadLocal for thread-safety.

[0] https://docs.microsoft.com/en-us/dotnet/api/system.random

https://stackoverflow.com/questions/19270507/correct-way-to-...

https://numerics.mathdotnet.com/


I don't understand why the poor implementation of the default constructor (seeding the RNG with the time) cannot be changed. It is not reproducible anyway, so why not initialize with entropy requested from the OS?
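
A user-side version of that is straightforward today; a sketch using RNGCryptoServiceProvider (the crypto RNG the article also covers):

    using System;
    using System.Security.Cryptography;

    // Seed Random from OS entropy instead of the clock.
    static Random CreateEntropySeededRandom()
    {
        var bytes = new byte[4];
        using (var crypto = new RNGCryptoServiceProvider())
            crypto.GetBytes(bytes);
        return new Random(BitConverter.ToInt32(bytes, 0));
    }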


Who says it’s not reproducible if you’ve chosen to mess with time ;)


Why not do a PRNG2 or PRNGEx, like they've done with other functions? Maybe they have and I'm just blowing hot air.


A useful snippet from the article:

"RNGCryptoServiceProvider is generally a safer choice when you need to generate random bytes. Creating an instance of this class is expensive, so it’s better to populate a 400-byte array than call the constructor 100 times to populate a 4-byte array."


Both options there feel like the wrong solution.

Why would you call the constructor 100 times to populate 100 4-byte arrays?

It's a service, so surely the best approach is to call the constructor once, then populate 100 4-byte arrays by calling GetBytes 100 times on the same service?

That is explicit code without the downside of the expensive constructor.

edit for clarification: It's true that getting all the data at once will still be much faster because it saves all the other overhead, but it's not the constructor at fault there.
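
Roughly this, i.e. one provider, many GetBytes calls:

    using System.Security.Cryptography;

    // Construct once, then fill as many small buffers as you like.
    using (var rng = new RNGCryptoServiceProvider())
    {
        for (int i = 0; i < 100; i++)
        {
            var buffer = new byte[4];
            rng.GetBytes(buffer);
            // ... use buffer ...
        }
    }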


The blog-post by fuglede with a detailed analysis of the implications of the RNG-bug is well worth a read: https://fuglede.dk/en/blog/bias-in-net-rng/


Isn't there already bias in that computation because the range for the random numbers includes more even than odd numbers since it's the interval [0, 2147483647)?


Author here; thanks for the interest!

First of all, you're completely right. I do actually mention this fact just before the start of the section "An experiment". Here, the argument is that the bias this odd/even mismatch introduces is orders of magnitude smaller than what is introduced by the rounding errors; that is, a perfect theoretical RNG drawing from that range would not produce nearly as biased a result (and conversely, if you were to run the snippet in the blog post using `rng.Next(2, int.MaxValue)` or `rng.Next(0, int.MaxValue - 2)`, you wouldn't see the same bias, even though the ranges are still odd/even-biased to about the same extent).


I might be missing something, but the three predicted values are not in the "nextSeed" table? Whereas the original three are.

EDIT: I feel like I'm missing something in the explanation. Can anyone explain how the seed table is actually used?


The seed array is the internal state of the PRNG algorithm. It evolves over time, and the PRNG uses values from this array (plus some additional parameters, such as inext or inextp) to generate new "random" numbers. Thus, after seeding, there is no real randomness in non-cryptographic PRNGs. To learn more, have a look at the Mersenne Twister, which is also quite popular and has a nice description on Wikipedia [1].

[1] https://en.wikipedia.org/wiki/Mersenne_Twister
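
For a rough idea of the mechanics, the per-draw step in System.Random looks something like this (simplified from the .NET reference source; exact names may differ):

    // Knuth-style subtractive generator step: combine two slots of the
    // 56-element seed array, write the result back, advance both pointers.
    int NextSample(int[] seedArray, ref int inext, ref int inextp)
    {
        if (++inext >= 56) inext = 1;
        if (++inextp >= 56) inextp = 1;

        int retVal = seedArray[inext] - seedArray[inextp];
        if (retVal < 0) retVal += int.MaxValue;

        seedArray[inext] = retVal; // updated state feeds future draws
        return retVal;
    }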


Thanks for the link! I kind of figured it out over time, but the explanation helps!

And thanks for the interesting article btw, just realised you're the author :-)


Thanks :)


Edit again: I think I figured it out(?). The previous values are 'recorded' in the seed array, but the last one (2012846163) is the offset you use for generating the next one. The next one is generated and, in its turn, becomes the 'seed', etc. So the new values wouldn't be in the array _yet_ at the point we inspected. But all we have to do is 'replay' the RNG starting at the last recorded value to predict it correctly?


> The algorithm used in the .NET Core is the same as in the .NET Framework [...] There is a difference, however, when we use the default constructor.

“Core” and “Framework” have different implementations of the same class? Who names these things?


.NET Standard is an interface of which .NET Core and .NET Framework are implementations. I agree that it's confusing terminology at first, but for daily drivers of .NET languages, it should be pretty transparent.


As a .NET dev it is transparent, but it makes it incredibly annoying to Google anything. Core and Standard are both words that might show up on any webpage, and they both share the name .NET.

I've almost started wishing that Microsoft would assign a reference GUID to each of their products to ease searching for them, since many of their products follow the same pattern of having similar names differentiated only by a common English word.


It's somehow effectively doubled the number of search results for any one topic, only half of which will work. But you're right: as a daily user you quickly learn to treat it as another criterion to sift through to get relevant results.


Thank you for sharing - I will have to put some work into integrating this with my software that is unfortunately based on a tripleton design pattern, but it's worth the effort.


Oh, tripleton might require a special PRNG: http://dilbert.com/strip/2001-10-25 :)



