> Marsaglia is most famous for the diehard suite of RNG tests, so he knows his stuff.
Fame is relative; for my part, Marsaglia is most famous for The Ziggurat Method for Generating Random Variables and numerous other PRNG delights .. the test suite comes way down the list.
* https://www.jstatsoft.org/article/view/v005i08
* https://en.wikipedia.org/wiki/Ziggurat_algorithm
Once upon a time I made a branchless generator of large blocks of random points on S2 (the surface of a unit sphere) for an astrophysics sim; it was a fun exercise.
The Ziggurat algorithm is fantastic, and way faster than the conventional methods (like Box-Muller) for generating normally distributed variates from a uniform pseudorandom source. I implemented it a while back and it was one of those things that was a great pleasure to do.
I have also used the diehard suite, a while back, to convince myself that a random mixer thing I came up with was suitably random. Basically I wrote a thing to read from various sources of entropy, bleach them a bit, and then mix them all together in an extremely chaotic way before feeding them into the kernel entropy pool. This was because it used to be a problem (back around 1998 or so) that headless servers serving a lot of SSL sessions would run out of entropy and block from time to time. At the time, diehard had been implemented in Fortran and then compiled to C using a GNU Fortran transpiler, so it was completely impossible to understand what each of the tests was actually doing.
On the topic of TFA, one of the most practical "RNG in your brain" techniques I've heard of was suggested by the poker player Dan Harrington in his books, for sampling at random, with some probability, between discrete choices. Say you have something you want to do at random 1/3rd of the time, and the other 2/3rds of the time you're going to do another thing[1]. The way he says to do this is to base it on your watch face. You decide [2] that if the second hand of your watch is between 1 and 20 you will do thing A, and if it's between 21 and 60 you'll do thing B. Then you look, and you do whichever thing came up. The good part is that even if people know you're doing this they can't exploit it, because they don't know the set of choices you've given yourself, where the boundaries between strategies lie, etc. It doesn't work if you need to sample rapidly, but in the context he gave (tournament poker) you'd have a random amount of time, more than a minute, between decisions anyway, so it's fine there.
[1] From a game theory point of view, it can be shown that in games with randomness and hidden information like poker, the optimal strategy is a Nash equilibrium in mixed strategies which means you need to sample a probability distribution to choose your actions in this way.
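For concreteness, here's a minimal sketch of the watch-face trick in Python (the 1/3 vs 2/3 split mirrors the example above; the function name is mine):

```python
import datetime

def watch_choice():
    # Second hand as an ad-hoc uniform source: 0..59, 20 of which (1/3)
    # select thing A and the remaining 40 (2/3) select thing B.
    sec = datetime.datetime.now().second
    return "thing A" if sec < 20 else "thing B"

print(watch_choice())
```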
Isn't inverse transform sampling even faster than ziggurat? Most languages have an approximate erf^-1 implementation, and everything is straightforward from there. I needed to generate random numbers from a truncated normal distribution and found inverse transform sampling easier to implement than trying to understand and adapt the ziggurat algorithm.
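For anyone curious, here's a rough sketch of what that looks like (assuming SciPy's `scipy.special.erfinv` is available; the helper name is mine):

```python
import math
import random

from scipy.special import erfinv

def truncated_normal(a, b, mu=0.0, sigma=1.0):
    """Inverse transform sampling from N(mu, sigma^2) truncated to [a, b]."""
    # Normal CDF via the error function.
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    # Draw uniformly between the CDF values of the bounds, then invert.
    u = random.uniform(cdf(a), cdf(b))
    return mu + sigma * math.sqrt(2.0) * erfinv(2.0 * u - 1.0)

print(truncated_normal(-1.0, 2.0))
```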
Quite possibly. IIRC the benefit of ziggurat is that you get a "real" Gaussian with infinite tails in both directions, whereas inverse transform truncates them, but it has been a very long time.
I'm staying vague for the sake of anybody noodling away at this, it's fun but not deep.
It's a twofer: first there's the choice of algorithm for an unbiased unit vector (or point on the sphere), then there's implementing it on a particular vector RISC architecture to avoid stutters from branch prediction gone awry, constant steady performance being desirable (in this and all the other parts of the larger program).
"Branchless" decisions are possible;
Compute D = decision = 0 or 1 as result (no branch, just arithmetic)
Compute R = result = DA + (!D)B = result A OR result B depending on D with no jumps or decision penalties.
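In Python-flavored pseudocode (real vector hardware would use a masked multiply-add, but the arithmetic is the same):

```python
def branchless_select(a, b, condition):
    d = int(condition)           # D: decision as 0 or 1, pure arithmetic
    return d * a + (1 - d) * b   # R = D*A + (!D)*B, no jump taken

print(branchless_select(1.5, -2.5, 0.3 > 0.0))  # 1.5
```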
It's an implementation hack, and one dependent on architecture; some chips are fancy enough to pursue two branches in parallel and use the correct result without having to suffer a costly pipeline flush for the path not taken.
DSP architectures have long pipes and consistent timing; complex arithmetic statements (once set up) take a single clock cycle. Iterated FFT "atomics" of the form result_vector = (scalar1 * input_vector1 + scalar2 * input_vector2) mod N are a single clock .. you can push inputs once per clock and extract outputs once per clock (after a delay to fill the pipe).
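Not necessarily the algorithm used here, but one well-known branch-free way to get an unbiased point on S2 is to draw three Gaussians and normalize (a minimal sketch):

```python
import math
import random

def random_unit_vector():
    # The direction of an isotropic Gaussian vector is uniform on the
    # sphere, and nothing here branches on the sampled values.
    x, y, z = (random.gauss(0.0, 1.0) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r, y / r, z / r)

print(random_unit_vector())
```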
I want to take this opportunity to recommend Hillel's writing and depth of analysis (on top of the quirkiness of doing it in a weird lang). The equivalent answers on stackoverflow[4] just don't compare.
This is the most in-depth analysis of Marsaglia's mental PRNG I've seen.
A question, though. The author writes:
> So what other numbers work? ... if n is a good multiplier, then the period of the orbit starting from 1 should be 10n − 2.
> (2 3 6 11 15 18 23 27 38 39 42 50 51 62 66 71)
> The last interesting number is 18. It has a respectable period of 178 and has every possible digit transition. The downside is that you have to learn the 18 times-table. This isn’t too bad: I internalized it with maybe 10 minutes of practice.
Why is 18 the "last" interesting number? The largest number on the list, 71, has a period of 708 (10n − 2). Given that he is willing to memorize the first 10 multiples of a 2-digit number, why didn't he choose 71 over 18, which has a period of only 178?
The 2-digit transitions for 71 are pretty evenly distributed.
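For anyone who wants to check these period claims, here's a quick sketch using the article's update rule (multiply the last digit by n and add the remaining digits, i.e. the carry):

```python
def orbit_period(n, start=1):
    """Length of the cycle of x -> n*(x % 10) + x // 10 starting from `start`."""
    x = n * (start % 10) + start // 10
    period = 1
    while x != start:
        x = n * (x % 10) + x // 10
        period += 1
    return period

for n in (6, 18, 71):
    print(n, orbit_period(n))  # expect 58, 178, 708 -- i.e. 10n - 2
```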
One thing this article didn't discuss is choosing a seed. I think humans are better at coming up with a single random number than with a sequence of random numbers, but I still like to incorporate some "randomness" from the environment.
For example, you can add to the seed the count of something you won't know until you count it: floor tiles in a room, red books on a bookshelf, ... You can obviously also throw a coin or something like that, but that's a more involved action.
If you are really dedicated, you could mix in some environmental randomness after every n iterations.
And as long as you don't need numbers at a high rate, you can just use the seed as the random number. I rarely need more than one in sequence so I just use the second hand.
Come to think of it, I rarely need to choose between more than 10 alternatives so it's realistic to get at least two somewhat random alternatives per minute out of this idea.
If we're talking about something that can be done by a human, then Othello counters or coins are each 1 bit. If not, then a hash of a new photo is probably simplest.
One I use for random gaming needs (though it requires two brains): if I want to emulate a dN die (d6, d10, d20, whatever), I think of a number between 1 and N, and the other person does the same. Then we add them together and take the sum mod N. It works well enough, although you can get into game theory, and the results aren't very random because we tend to pick certain numbers.
Thinking about it, I wonder if both people picking multiple numbers and adding them all up mod N would improve things.
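A quick sanity check of the idea (the helper names and the toy bias model are mine): as long as at least one participant picks uniformly, the sum mod N is uniform, so extra picks can only wash bias out, never add it.

```python
import random
from collections import Counter

def two_brain_die(n, picks):
    # Sum everyone's secret picks, mod n, reading a result of 0 as n.
    r = sum(picks) % n
    return r if r else n

# One heavily biased player plus one uniform player still gives a flat d6:
trials = Counter(
    two_brain_die(6, [6 if random.random() < 0.5 else 3,  # biased "brain"
                      random.randint(1, 6)])              # uniform "brain"
    for _ in range(60_000)
)
print(sorted(trials.items()))  # each face close to 10,000
```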
> Thinking about it, I wonder if both people picking multiple numbers and adding them all up mod N would improve things.
I bet that our bias against clustering and repetition when creating a pseudo-random list of numbers would mean that a list would be less random than an individual pick. As a simple example, if picking two numbers, people would probably be more likely to pick one even and one odd number than two even or odd numbers, since a mix of even and odd feels more authentically random, resulting in an excess of odd sums.
As an alternative, what if each person created a short list and then blindly chose one number from the other person's list?
Blindly picking from a shortish list of values and then doing the addition-and-mod thing only seems better than the naive case of blindly picking from a list of 1 to N when we're talking about large values of N.
When I want to be random enough, I pick a large number, then mod by another random number that is bigger than the max option (mod again by the max number if needed). This seems unpredictable enough that I've been unable to see any patterns in it.
Let's say you need to pick a number from 1 to 20, and you take 1235 mod 22.
Mod is easy to do in your head, as you can just subtract in steps until you get there: 50 times 22 is 1100, leaving 135; 5 * 22 is 110, leaving 25; 25 - 22 is 3.
I picked 3, and I would have been hard-pressed to guess it would end up as 3.
In your example, the results aren't uniform: the 22 possible results of n%22 are 0 through 21, so once you mod back into range, 1 gets generated twice (by 1 and by 21) while 2 through 20 each have only one residue that yields them, and 0 can come up even though it's outside the range. You could adapt your algorithm by adding "if the result is outside the range of values you're picking from, try again with a new number."
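That "try again" fix is just rejection sampling; a minimal sketch (the `draw` source is a stand-in for the big-number-in-your-head step):

```python
import random

def unbiased_pick(n_options, modulus=22, draw=lambda: random.randrange(10**6)):
    # Keep drawing until the reduced value lands in 1..n_options; every
    # surviving value then has exactly one residue that produces it.
    while True:
        r = draw() % modulus
        if 1 <= r <= n_options:
            return r

print(unbiased_pick(20))
```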
I often use the Fibonacci numbers mod 10 when I need to do this in my head. It isn't great -- odd numbers are twice as frequent (because mod 2 they're [1,1,0]) and you get sequences like [7,0,7,7], but easy to keep track of and the cycle is 60 elements long.
The other one I use sometimes is the powers of 2 mod 100 and use the leading digit. This gives a cycle length of 20 with a flat distribution.
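Both claims are easy to verify with a throwaway check:

```python
# Pisano period of the Fibonacci sequence mod 10: expect 60.
a, b, steps = 1, 1, 0
while True:
    a, b, steps = b, (a + b) % 10, steps + 1
    if (a, b) == (1, 1):
        break
print(steps)  # 60

# Powers of 2 mod 100: a cycle of 20, each tens digit appearing exactly twice.
x, cycle = 4, []
while x not in cycle:
    cycle.append(x)
    x = (x * 2) % 100
print(len(cycle), sorted(v // 10 for v in cycle))
```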
This is about an interesting way to produce a pseudorandom sequence. From the title I was expecting something a little different, more akin to Diceware [1], but without dice. What sprang to mind was thinking up random numbers and then debiasing them with John von Neumann's method.
Technically any RNG should be perfectly able to produce a long string of the same digit, and artificially filtering that out would be a very bad sign indeed.
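Von Neumann's trick, for reference (a minimal sketch):

```python
def von_neumann(bits):
    # Read bits in pairs: emit the first bit when the pair differs
    # ("01" -> 0, "10" -> 1) and discard "00"/"11". The output is unbiased
    # whenever the input bits are independent, however biased each one is.
    it = iter(bits)
    for a, b in zip(it, it):
        if a != b:
            yield a

print(list(von_neumann([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])))  # [0, 1]
```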
Here is my idea: pick a longish phrase of words, "at random". A song lyric, a book quote, etc. Count the number of words or the number of letters. Apply modulus if necessary. Repeat.
We might be bad at generating uniform, independent random numbers, but I submit that we’re still capable of generating some true randomness. Perhaps take the best of both worlds and mix in new entropy at each step? Something like, take the previous 2-digit value, add a "random" digit and generate the next output from that?
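Something like this, say (using the multiply-with-carry step from the article; the mixing rule is my own guess at what that could look like):

```python
def mixed_step(n, state, fresh_digit):
    # Fold one externally supplied "random" digit into the state, keeping
    # it in 1..10n-2 so the generator never hits a fixed point, then take
    # the usual step: multiply the last digit by n and add the carry.
    state = (state + fresh_digit) % (10 * n - 1) or 1
    return n * (state % 10) + state // 10

state = 1
for d in (4, 7, 2):  # digits you "truly randomly" came up with
    state = mixed_step(6, state, d)
    print(state % 10)
```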
I had a math teacher who trolled the entire class.
We (the class) wanted to go outside instead of having a regular lecture. He made a deal that if we won his game we would go outside; otherwise we had to be extra focused for the rest of the period.
Everyone in the class had to pick a random number between 1 and 100 and write it down. He told us that he would get one guess for each pupil (around 30), and if he managed to guess all the numbers he would win. If any number was left unguessed, we would go outside.
Knowing how bad humans (and especially 12-year-olds) are at picking random numbers, and knowing which numbers were most commonly picked, he guessed all the numbers with a handful of guesses to spare.
If he had done it enough times with different kids in the past, he would have the perfect data to know the likelihood of certain numbers being picked, and of him winning.
From a teacher's perspective, that seems like a fascinating idea to try out.
I think the game is much better for the teacher because he doesn't have to match guesses to students: for each student, the odds of being covered are 30/100, roughly one in three, and any duplicate picks are covered by a single guess.
That formula is missing something, because with over 100 students the teacher can't lose (they get over 100 guesses and there are only 100 possible numbers).
More precisely, if there are N students, the probability is (min(N,100)/100)^N. This is 1 for N ≥ 100. And the probability at N=30 is indeed a tiny 2e-16, which shows that the children's "random" picks were far from uniformly random.
(Incidentally, even with N=99 the probability is 0.37 ≈ 1/e, and the probability is lowest at N=37 ≈ 100/e. This is not a coincidence.)
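For the curious, the calculus behind that (a sketch):

```latex
% Minimize p(N) = (N/100)^N over 0 < N <= 100 by taking logs:
%   ln p(N) = N ln(N/100).
% Differentiate and set to zero:
\[
  \frac{d}{dN}\left[ N \ln\frac{N}{100} \right]
    = \ln\frac{N}{100} + 1 = 0
  \quad\Longrightarrow\quad
  N = \frac{100}{e} \approx 36.8,
\]
% while at N = 99, p(99) = 0.99^{99} \approx e^{-0.99} \approx 1/e.
```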
The teacher picks 30 numbers out of 100. Then each student (independently) picks one number. If the picks were uniformly random, the chance of all 30 landing in the teacher's set would be 0.3^30. Obviously, the students are not picking at random.
Mathematical randomness is not exactly a skill required for survival. In fact, our intuition for randomness is so bad that our brains consider true randomness to not be random, as this blog post from Spotify shows:
https://engineering.atspotify.com/2014/02/how-to-shuffle-son...
Doesn't Spotify basically need to pay nonrandom royalties on the stuff inside any given playlist, such that streaming some content costs them more than other content, while the user's payment is constant? If that is true, it defies belief that they would forgo cost optimization on their side and give users a truly random shuffle, especially if they can argue that users don't know randomness when they see it.
I'm not sure that has anything to do with the algorithm described in the blog post. I posted it because it's a popular, often-referenced example of this type of problem. The problem exists outside of Spotify, in any music player. And I believe that back when this was written, Spotify was paying the same amount of money for all streams, as it was way before the whole problem arose of people streaming white noise to launder money, which they currently try to combat by essentially not paying artists with little exposure at all.
Spotify's shuffle is hilariously bad though (to the point that there exist outside tools to pre-shuffle your playlists). I see that article more as an "excuse" for their bad engineering.
I'm sure there are people who will get angry if they get two songs in a row from the same artist on a "random" playlist, but I'll much rather have that than the pseudo-random Spotify nonsense that will only play similar tracks next to each other or just ignore 90% of the playlist.
Reminds me of a rather lively IRC debate I had on a programming channel about how 1,2,3,4,5 was a perfectly fine random sequence of digits. Some just flat out refused to accept that, even though I repeatedly highlighted that it was just very unlikely.
At Pomona College, there’s a bit of folklore around a talk a math professor gave in which he “proved” that all numbers are equal to 47.¹ Consequently, students started noticing 47 everywhere since psychologically, it’s a popular number to choose as a “random” number between 1 and 100,² although some of the coincidences (the number of trees on the quad, the exit number on the 10 to get to Claremont) don’t fit into that schema.
⸻
1. Spoiler: there was a divide by zero in his proof.
2. Why? Because it’s near the middle of the interval and is prime, two things that will happen if you unthinkingly just grab for a number.
> The number 47 makes frequent recurrences in dialogues and on computer screens in Star Trek.
> The origin of the significance of 47 can be traced to Star Trek: The Next Generation and Star Trek: Voyager writer Joe Menosky, who attended Pomona College in California. There is a club at Pomona called The 47 Society, which claims that there exists a mathematical proof that all numbers are equal to 47.
> Joe Menosky first started including references to 47 in his scripts in the fourth season of TNG, and the in-joke quickly caught on among the rest of the staff. Since then, references to 47 have been included in many episodes and movies of all the modern series.