Interesting, but there's one part I'm not sure I agree with.
> Pseudo-random number generators are useful for many purposes, but unbiased shuffling isn't one of them.
A properly seeded CSPRNG is perfectly fine at this. And if it's not, then all of our cryptography is pretty much screwed. This is why in modern kernels, /dev/random and /dev/urandom are the same (minus differences in behavior when the initialization isn't complete). As D.J. Bernstein put it, it's superstition to not trust CSPRNGs. https://www.mail-archive.com/cryptography@randombit.net/msg0... And if it's good enough for crypto, it's good enough for card shuffling.
There are 52! possible orderings of a deck of cards; if you don't have more than 230 bits of entropy in your PRNG per deck shuffled, things go badly pretty quickly. You also tend to share the deck state revealed so far with the adversary.
Shuffling cards is a surprisingly demanding prng application.
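For reference, the exact figure is quick to compute (math.lgamma(53) is ln(52!)):

    import math

    # Bits of entropy needed to pick one of the 52! orderings uniformly.
    print(math.lgamma(53) / math.log(2))  # ~225.58 bits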
It’s almost certainly still good enough since it’s unlikely to be biased in ways that matter. Most hand shuffling methods have nasty biases as well.
For standard riffle shuffling, there is often a bias toward pulling from the pile opposite the last pulled card (so you get alternating pairs of left and right). The "7 shuffles is sufficient" model that people like to quote presumes you pull from a pile with probability proportional to that pile's size (so you get many fewer alternating runs).
There were a lot of practical attacks on poker PRNGs without enough internal state, where one could form a database of possible hands based on the first few cards.
I think the bare minimum requirement for a good shuffler is "all hands are possible." And then getting some more bits beyond that is good to debias it.
That is such an interesting idea! I wonder if someone has systematically studied hand-shuffling biases and what consequences they have (or should have) for brick-and-mortar casinos.
Yes. Card counters (advantage players at blackjack) studied shuffle tracking. I haven't seen much good material published on it, but I played on a team with a math whiz (physicist) who ran the numbers. For the casino we were playing at the time, the last 20 or so cards would be distributed over the bottom half of a six-deck shoe. A strategic cut would bring those three decks to the front, and you could play with a slight advantage.
Suppose you're at a full table and 12 of the last 20 cards are aces or tens (rare but possible). These get shuffled and cut into the first three decks, giving you a true count of four (12/3) for the first half of the shoe, which is significant.
We never really put it into practice, though, since: 1) you have to track the count for the last 20 cards in addition to the regular count, 2) shuffles change, 3) dealers are inconsistent, 4) casinos use different shuffles, and 5) the typical advantage is likely to be much smaller.
My knowledge on this is at least 20 years out of date, though, so who knows?
The bias I saw would likely be difficult to exploit without studying the mathematics a lot more. It seems to pass several statistical tests, but the problem I found was more about independence.
Let's call the two split piles L for left and R for right, and describe the resulting deck as a sequence of these letters indicating whether each card was taken from the left or right pile, so in theory you can specify a shuffle uniquely like "LRRLLLLLLRRR...". This way of labeling shuffles has the advantage of being extremely easy to record, and it also makes the bias I found apparent: there are way too many "LR" and "RL" pairs compared to "LL" and "RR", relative to what the "7 shuffles is enough" model says there should be.
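If anyone wants to poke at this themselves, here's a minimal sketch of the GSR model (cut binomially, then pull from a pile with probability proportional to its size) that emits the L/R labeling; comparing its pair counts against a recorded hand shuffle makes the excess of alternating pairs easy to see:

    import random
    from collections import Counter

    def gsr_labels(n=52):
        # One riffle under the Gilbert-Shannon-Reeds model.
        left = sum(random.getrandbits(1) for _ in range(n))  # binomial cut
        right = n - left
        labels = []
        while left or right:
            if random.random() < left / (left + right):
                labels.append("L"); left -= 1
            else:
                labels.append("R"); right -= 1
        return "".join(labels)

    def pair_counts(labels):
        return Counter(a + b for a, b in zip(labels, labels[1:]))

    # Under GSR, LL/RR pairs show up about as often as LR/RL; a recorded
    # hand shuffle with far more LR/RL than this is showing the bias.
    print(pair_counts(gsr_labels()))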
For some reason, when I did all this analysis last year, I was mostly thinking about how to ensure my MTG decks are sufficiently shuffled. I didn't really think about the implications for card counting. However, normal riffle shuffling seems to have less bias (but still present) than the mash shuffling I looked at and I think most casinos use machines for shuffling these days.
This is tangential, but I believe Persi Diaconis recommended something like 13 riffle shuffles* if national security is at stake. Who knows, maybe one day nuclear launch codes will depend on the Solitaire cipher.
* Depending on how chunky your shuffle is. A pro card dealer can do almost a perfectly alternating riffle, which is obviously no good for national security.
Yes, and the trouble I've seen is that MTG players assume they aren't shuffling like a pro card dealer when their shuffles are actually even more uniform. IIRC Diaconis' original study even mentioned that the number of required shuffles balloons out of control with these very uniform shuffles.
Yes. In particular, the cards that are never dealt out don't matter at all. So for example, in Texas Hold 'em with 9 players, you "only" deal out 23 cards, so we have overcounted the number of distinguishable deals by a factor of 29!
It's not even obvious to me that repeated shuffles of this form
• Given a deck of 52 cards (C1, C2, ... C52) pick some k near 26 and split the deck into two piles, (C1, C2, ..., Ck) and (Ck+1, ..., C52). Call these piles P1 and P2,
• Make a new empty pile, and then repeatedly take cards from the front of P1 or P2 and append those cards to the new pile. Cards from the same source pile should be in the same relative order in the new pile. Continue until all the cards are in the new pile,
• When taking cards from P1 or P2 to append to the new pile, take about the same number each time,
are capable of resulting in all possible 52! permutations. That kind of shuffle preserves a lot of order so I wondered if there might be some order that it cannot remove.
It turns out that you can in fact reach all permutations. I don't know of any elegant way to prove this though. I have an ugly way to do so.
Let R be a perfect out riffle shuffle, where the deck is split exactly in half (k = 26), and cards are merged by pulling one at a time from alternate piles starting with P1. In other words the deck after one shuffle is (C1, C27, C2, C28, ..., C26, C52).
Let S(n) be a shuffle just like R with one exception: when the cards that would end up at positions n and n+1 from the bottom of the shuffled deck are the next two cards to be pulled from P1 and P2, switch the order in which you pull them.
The resulting shuffle is the same as if you did R and then swapped the cards at positions n and n+1 from the bottom.
Let O(X) be the "order" of a shuffle X. The order of a shuffle is how many times you have to apply the shuffle consecutively to get back to where you started. O(R) for example is 8. Start with a new deck and do 8 perfect out riffle shuffles and you will be back to the original order.
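That's quick to verify numerically; a small sketch of the perfect out riffle as defined above:

    def out_shuffle(deck):
        # Split exactly in half and alternate starting with the top half,
        # so the top and bottom cards stay put.
        half = len(deck) // 2
        return [c for pair in zip(deck[:half], deck[half:]) for c in pair]

    deck = list(range(52))
    shuffled, count = out_shuffle(deck), 1
    while shuffled != deck:
        shuffled, count = out_shuffle(shuffled), count + 1
    print(count)  # prints 8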
It turns out that if you take a deck and do O(S(n))-1 shuffles using S(n) and then do an R you get back to where you started except the cards n and n+1 from the bottom are swapped.
A swap of any two cards can be accomplished using a series of swaps of adjacent cards and so this gives us a method to swap any two cards in the deck using only R and S shuffles. Since any permutation can be generated using only pair swaps this means any of the 52! possible permutations can be reached using only R and S shuffles.
This isn't very efficient. Just swapping n and n+1 in the deck this way takes at least 16 shuffles, and many more for some values of n. Here's a table:
  n                 # of shuffles
  0, 50             72
  1, 49             56
  16, 17, 33, 34    40
  22, 28            120
  all other n       16
Remember, adjacent swaps are just one small step in getting to a given permutation. Actually reaching an arbitrary permutation using R and S shuffles this particular way would take tens of thousands or more shuffles. But it does show that all permutations are reachable using reasonably normal shuffles.
I think we really need to consider the efficiency of the shuffle as well when evaluating a shuffling technique. If it takes hundreds (much less tens of thousands) of lengthy steps to shuffle the deck, then no one is going to do it.
Additionally, your analysis only considers whether each permutation is reachable. You would also want to look at how likely each permutation is.
Counterpoint: even if you only have 256 bits of entropy (which is common for many widely used PRNGs), you can prove that there's bias due to the pigeonhole principle, but finding a shuffle that is under-represented is computationally infeasible.
Dealing 2^200 hands, assuming one hand takes 1 ns, takes longer than pretty much any conceivable physical timescale. (Technically, the heat death of the universe takes longer than that, but all card decks will be long gone by then, having been consumed by black holes.)
This is a long-winded way of saying that cryptography with 2^256 security margin ought to be sufficient for all human-scale applications.
A Mersenne twister or another PRNG with a long sequence length is fine for deck shuffling. A CSPRNG is more comfortable.
As to the idea that it's superstition not to trust CSPRNGs: sometimes, you want to eliminate the variable, and sometimes your CSPRNG is actually worth attacking. A lot of CSPRNGs also involve secret state, so if you are worried that this state might get exfiltrated, some paranoia is ok.
The post here recommends using Intel's RDSEED, which is ironically trusted far less than /dev/urandom by most people who have a secret to keep or a process to protect.
> A Mersenne twister ... is fine for deck shuffling
It isn't. Please don't do that! Mersenne Twister is fully reversible given any 624 consecutive 32-bit outputs (see, for instance, this pretty good write-up: https://blog.ollien.com/posts/reverse-mersenne-twister). In other words, someone who can observe 624 outputs can reconstruct its entire output sequence, forward and backward.
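For anyone curious how mechanical this is: each output is just a state word passed through an invertible "tempering" step, so 624 consecutive 32-bit outputs hand you the entire state. A rough sketch against CPython's random module (which is MT19937 under the hood):

    import random

    def undo_xor_rshift(y, shift):
        # Invert y = x ^ (x >> shift) by fixed-point iteration.
        x = y
        for _ in range(32 // shift + 1):
            x = y ^ (x >> shift)
        return x

    def undo_xor_lshift_and(y, shift, mask):
        # Invert y = x ^ ((x << shift) & mask).
        x = y
        for _ in range(32 // shift + 1):
            x = y ^ ((x << shift) & mask)
        return x

    def untemper(y):
        # Reverse MT19937's output tempering to recover the raw state word.
        y = undo_xor_rshift(y, 18)
        y = undo_xor_lshift_and(y, 15, 0xEFC60000)
        y = undo_xor_lshift_and(y, 7, 0x9D2C5680)
        return undo_xor_rshift(y, 11)

    victim = random.Random(12345)
    observed = [victim.getrandbits(32) for _ in range(624)]

    clone = random.Random()
    clone.setstate((3, tuple(untemper(v) for v in observed) + (624,), None))
    assert all(clone.getrandbits(32) == victim.getrandbits(32) for _ in range(1000))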
You're right that you wouldn't want to use any non-cryptographic PRNG for gambling games of any significant stake. Many low-stakes games will use them, relying on having many different players connected to one result source to protect themselves.
However, "624 consecutive outputs" is not the way to think about it, because those outputs are truncated to narrow integers. A single shuffled deck leaks about 230 bits of information, and backing out the state of a Mersenne twister needs several thousand bits of information.
This is very interesting, thank you!
When I was implementing random shuffling I used xoroshiro128+ instead of Mersenne Twister and felt pretty good about it, but only because the randomness was needed for Monte Carlo simulations, not for gambling games where there are potential adversaries looking to break it.
Are you aware of how that one (xoroshiro128+) or other pseudo random generators fare against adversaries?
Don't use a non-CSPRNG if you worry about adversaries. CSPRNGs are only ca. 10 times larger and roughly as fast if you can use SIMD.
According to [1], you only need 4 outputs to predict xoshiro. Predicting PCG is a bit more difficult, but still on the level of "exercise for cryptography students" [2].
What secret state do you mean? All PRNGs have state right? Which you need to keep secret if you don't want the sequence to be predictable. Is there something more I'm missing?
CSPRNGs keep the state secret so an observer can't tell the next output from the previous ones. Most other PRNGs have a sequence that can be backed out from observing their outputs.
> The post here recommends using Intel's RDSEED, which is ironically trusted far less than /dev/urandom by most people who have a secret to keep or a process to protect.
What? If security is a special consideration for some form of data, why would someone choose a non-physical noise source over a hardware-based noise source?
Trust. It's suspected that RDSEED is essentially backdoored based on the published structure of the RNG. TRNGs are allowed to use a cryptographic conditioning component on top of a non-full-entropy physical stream, and that is what Intel does, using a long-lived secret key. The output you see is actually the result of cryptographic postprocessing, which may be partially transparent to the entity that has the secret key.
Intel has no published third-party audits of their silicon, and has a very close relationship with some three-letter agencies. Hence the concern.
By contrast, urandom is completely open, and is reseeded relatively frequently with entropy. If both are effectively acting as a CSPRNG, the one with public attention and no secret key that can be handed to a third party is better.
I recently stumbled across some other ways that random number generation can go wrong.
Suppose you want reproducible results from a seed, along with parallelism. Algorithmic random number generators are usually mutable and generate a sequence of results, limiting parallelism. Rather than a sequence, you want something tree-shaped where you can create an independent random stream for each child task. In higher-level APIs, a jump or split operator can be useful.
Counter-based random number generators [1] seem pretty useful in that context. An immutable random number generator works like a hash algorithm that maps each input to an output that's difficult to predict. The problem with this is being careful to avoid using the same input twice. You can think of it as allocating random numbers from a very large address space in a reproducible way. How do you partition the address space, predictably, so that every address is used at most once, and nobody runs out?
Giving each child a unique ID and generating a stream from that is one way. If the tree is deeper, you'll want a unique seed for each path.
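Roughly like this, as a toy sketch: make the generator a pure function of (stream id, counter), here faked with a hash (real counter-based RNGs such as Philox or Threefry use cheaper rounds, but the shape is the same). Each child task gets its own stream id; within a task the counter just increments:

    import hashlib
    import struct

    def cbrng(stream_id, counter):
        # Toy counter-based generator: a hash of (stream_id, counter).
        msg = struct.pack("<QQ", stream_id, counter)
        return int.from_bytes(hashlib.blake2b(msg, digest_size=8).digest(), "little")

    # Reproducible and parallel-friendly: no mutable generator state is
    # ever shared or copied between tasks.
    child_a = [cbrng(stream_id=1, counter=i) for i in range(4)]
    child_b = [cbrng(stream_id=2, counter=i) for i in range(4)]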
When a mutable random number generator is copied to a child task (or maybe just to an iterator), the same random numbers might be generated in two places. Avoiding this is the sort of thing that Rust's borrow checker can prevent - borrowing is okay, but you want to prevent multiple concurrent ownership.
> When a mutable random number generator is copied to a child task
When you make a child task, the parent adds one to the internal state of the random number generator and advances the random sequence by one. The child adds two to the internal state and advances the random sequence by one.
Now you can make any arbitrary tree of tasks, and those tasks each get their own random stream, there is no shared-between-task state or locking needed, and the whole thing is reproducible, even if parent and child tasks are scheduled arbitrarily. fork-ing and use of random numbers can be arbitrarily intermingled.
Makes sense. Do you have any particular implementations in mind that work this way?
I think this depends on the random number generator. For a counter-based RNG, the "internal state" is just a counter, so adding 1 or 2 would result in reusing random numbers in different streams.
The complexity is because it is not a cryptographic pseudorandom generator.
If you have something cryptographic, then any basic operation on the internal state (i.e., increment by one) is enough to totally change the output stream in a way an outsider cannot predict.
You have to think about how your operations interact with other operations you might also do on the random state. Like what if you add one to the state, but someone else subtracts one from the state, putting it back where you started? Oops! Or what if you create a new rng2 by adding one to the state of rng1. Then later you create rng3 by adding one to the state of rng1. Now rng2==rng3. Oops!
That's why I like the original idea you described of adding 1 and 2 and advancing each rng. This ensures that both streams are irreversibly altered, which should make it very hard to accidentally misuse.
This seems like an area where a mutable API beats an immutable API. It's very easy to accidentally reuse old states when doing functional programming. On the other hand, if you need a way to serialize the RNG, (to send it to another task, perhaps) then the risk of reusing states comes back again.
The counter-based RNG I was looking at is an immutable API. It makes it easy to initialize generators in separate tasks by feeding each one a unique ID, but figuring out how to do a split operation is harder. Conceptually, the way to go is to multiply the key by 2 and add either 0 or 1 for each branch, to give them unique IDs. (This is called a pedigree.) However, for unbalanced trees, you will run out of bits. Figuring out how to deal with that seems tricky.
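The pedigree idea in miniature (a sketch assuming a 64-bit key, which is exactly where the unbalanced-tree problem bites):

    def split_pedigree(key, depth, branch):
        # One bit of pedigree per split: left appends 0, right appends 1.
        # With a 64-bit key you only get ~64 levels of splitting, which is
        # the "running out of bits" problem for deep or unbalanced trees.
        assert branch in (0, 1) and depth < 64
        return (key << 1) | branch, depth + 1

    root = (1, 0)  # start the key at 1 so leading zeros aren't ambiguous
    left_child = split_pedigree(*root, 0)
    right_child = split_pedigree(*root, 1)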
The LXM generator paper (non-crypto) looks like good reading:
They initialize the new child with the next random values of the parent. There is an extra parameter that they use to add more input bits, to reduce the probability of overlap.
Another fun one for reproducibility -- rounding. It can happen when compiling for multiple architectures, compiling a second time with different optimizations, and so on. Things like normal random numbers are generated through _some_ kind of rejection sampling, and crossing that threshold due to different rounding behavior will give you a wildly different result. Even if all your application code is tolerant of small perturbations, the result isn't reasonably reproducible.
Another way things go wrong with the prng tree idea (assuming the implementation is correct and you actually have independent streams) is race conditions in other parts of your code. E.g., say you have two workers reading a queue and using fancy tree-based randomness. You don't have "race" conditions (data races) in safe Rust without a compiler bug, but the language doesn't define which worker will read which item from the queue. You wind up using a different part of the stream for different inputs.
With the counter-based RNG approach, apparently one idea is that you uniquely identify child nodes in your tree somehow. That is, each work unit is a node with a unique ID that's deterministically assigned and has nothing to do with random number generation at all. Then you use that to pick the random stream.
An API that uses split or jump seems a little easier to screw up.
Here is a trivial shuffle algorithm that is completely unbiased and only requires an unbiased coin (or random number generator giving bits):
1. Randomly assign each element to list A or list B.
2. Recursively shuffle lists A and B.
3. Concatenate lists A and B.
To prove it's correct, note that assigning a random real number to each element and sorting based on that number is an unbiased shuffle. Then we note the above does in fact do that by considering the fractional base-2 expansion of the random numbers, and noting the above is in fact a base-2 radix sort of these numbers. We can sort these random real numbers even though they have an infinite amount of random bits, as we can stop expanding the digits when the prefix of digits is unique (which corresponds to the event that a list is down to a single element).
I call the above algorithm RadixShuffle. You can do it in base-2, but also in other bases. For base-2 you can make it in-place similar to how the partition for Quicksort is implemented in-place, for other bases you either have to do it out-of-place or in two passes (the first pass only counting how many elements go in each bucket to compute offsets).
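A direct transcription of the base-2 version (a sketch using Python's secrets for the coin; if every element lands in one pile, the recursive call simply retries with fresh flips, which is where the unbounded-worst-case running time discussed in the replies comes from):

    import secrets

    def radix_shuffle(items):
        # RadixShuffle, base 2: flip a fair coin per element, recurse on
        # the two piles, concatenate.
        if len(items) <= 1:
            return list(items)
        left, right = [], []
        for x in items:
            (left if secrets.randbits(1) else right).append(x)
        return radix_shuffle(left) + radix_shuffle(right)

    deck = radix_shuffle(list(range(52)))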
The above can be combined with a fallback algorithm for small N such as Fisher-Yates. I believe even though the above is N log N it can be faster than Fisher-Yates for larger N because it is exceptionally cache-efficient as well as RNG-efficient whereas Fisher-Yates requires a call to the RNG and invokes an expected cache miss for each element.
---
Another fun fact: you can turn any biased memoryless coin into an unbiased one with a simple trick. Throw the coin twice, if it gives HH or TT you throw away the toss, if it's HT or TH you use the first toss as your unbiased coin.
This works because if p is the probability that heads comes up we have:
HH: p^2
HT: p(1-p)
TH: (1-p)p
TT: (1-p)^2
Naturally, p(1-p) and (1-p)p are equiprobable, thus if we reject the other outcomes we have distilled an unbiased coin out of our biased coin.
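In code (a sketch; it rejects HH/TT, so it's unbounded in the worst case but cheap in expectation, costing about 1/(p(1-p)) biased flips per unbiased bit):

    import random

    def biased_flip(p=0.7):
        # Stand-in for the biased, memoryless coin: heads with probability p.
        return random.random() < p

    def von_neumann_flip(flip=biased_flip):
        # Flip twice; discard HH and TT; on HT or TH report the first flip.
        while True:
            a, b = flip(), flip()
            if a != b:
                return a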
I was able to make a variant of the higher-base version that runs in a single pass, by stopping when one partition fills up and using a different method for the remaining (asymptotically few) elements. I described the idea, which is based on another effort called MergeShuffle, here: https://mlochbaum.github.io/BQN/implementation/primitive/ran...
Performance-oriented library with no benchmarking instructions, fun. I get 850ms to shuffle 32-bit integers up to 1e8 with this library versus 400ms in BQN (•rand.Deal•_timed 1e8). However, BQN also has a large advantage at smaller sizes, such as 425us versus 120us for 1e5 elements, so a lot of this may be the random number generator. I haven't figured out how to get PCG working yet. BQN uses wyrand which is faster but I now know has quality issues (can't even generate every possible output; I need to update the page I linked...).
It's substantially the same algorithm so any differences would just be down to implementation details. Other than multi-threading which BQN doesn't do. The usage is also a little different as BQN generates shuffled integers directly; generating the integers is 100ms of the 850ms for rip_shuffle but I'm not sure whether it makes sense to subtract that or not.
Not quite the same, rip_shuffle does have some contortions to be able to run in-place (I'm still scratching my head about who's running these sorts of high-performance algorithms with no auxiliary memory available), so if those cost anything they could contribute to the difference.
I implemented this for GPUs back in college and you're right, it's really good for parallelism. This shuffle is also great if you want to do a completely unbiased shuffle with real cards. Fisher-Yates is impractical to do with a real deck of cards.
This is a nice, practical algorithm. Beware though that in theory it can take an unbounded amount of time, since in order for it to make progress it must generate at least one heads and one tails on the input, and although runs of all-heads or all-tails become exponentially unlikely as input size increases (or as the number of passes performed on a fixed-size input increases), there's still no guarantee that a mixed run will happen before any fixed amount of computation has been done.
If it's implemented to run in the typical "depth-first sequential" manner of a recursive algorithm, in which each problem generates its own subproblems, solves them all, and then immediately continues execution until it is itself solved, then it is guaranteed to eventually terminate, since the only way to stall progress forever in this situation would be an infinite, uninterrupted sequence of one outcome (e.g., heads), and that would contradict the assumption that P(heads) = 0.5. OTOH, if a "breadth-first" computation was used, in which all subproblems at a given recursion depth are solved in sequence before any higher-level subproblems, the algorithm could run forever: The top-level problem could produce a mixed run, resulting in two equal-sized subproblems, each of which never makes any progress due to one subproblem always getting all-heads, the other always all-tails.
Any unbiased algorithm that uses an unbiased coin to shuffle n > 2 elements must be potentially unbounded.
Proof: there are n! possible permutations. If the algorithm always finishes within k coin tosses then there are 2^k possible outcomes. For n > 2 we have that n! does not divide 2^k, so not all outcomes can be equiprobable.
This is a proof by contradiction: it shows that any bounded algorithm doesn't have equally probable outputs, i.e., that it is necessarily biased, contradicting our assumption that it is unbiased. Therefore, an unbiased, bounded algorithm cannot exist (for n > 2): any algorithm is necessarily either biased or unbounded.
> Here is a trivial shuffle algorithm that is completely unbiased and only requires an unbiased coin (or random number generator giving bits):
How does this compare to a regular Knuth shuffle where, since you only have a 1-bit generator, when you need (for example) an integer between 0 and 23, you treat the bits you generate as the binary expansion of a real number in [0, 1), and generate them until floor(24*n) is unambiguous?
(Obviously, the random number generation is more conceptually complex, but aside from that.)
You need about n·log(n) bits, for the same reason as regular shuffling. There are n! permutations, so it takes log(n!) bits to uniquely describe a shuffle, and n·log(n) = log(n^n) is greater than log(n!); by Stirling's approximation the two agree to leading order.
OK, but I was kind of hoping for a treatment that involved the workings of one or both algorithms. They are subject to the same theoretical lower bound, yes. Do they have any advantages or disadvantages relative to each other? Does one of them do a much better job reaching the theoretical bound?
It kind of makes me mad that the very simple RS algorithm (divide and conquer) isn't more widely known given that it's so easy to implement and that it's actually parallelizable unlike Fisher-Yates. I think it's also better pedagogically since you can easily do it with a deck of cards and a coin or some dice while Fisher-Yates is pretty unwieldy for anything beyond very small lists.
That reminds me of a method for rolling a dN if all you have is a dM.
Divide the interval [0, 1) into N equal pieces: [0, 1/N), [1/N, 2/N), ..., [(N-1)/N, 1).
Use the dM to generate a real number in [0, 1). If that real number is in the interval [(k-1)/N, k/N), your simulated dN roll is k.
To generate a real number in [0, 1) with the dM you simply roll the dM an infinite number of times, and take each roll as a successive base M digit in a base M fraction 0.abcd... where a, b, c, d, etc are base M digits.
You don't actually have to roll an infinite number of times. You can stop when you have enough digits to tell which interval the number is in. For example if you were trying to roll a d12 using a d10, and the first two d10 rolls are 0 and 7 you can stop, because all base 10 numbers that start with 0.07 are in [0, 1/12).
If the first two rolls had been 0 and 8 you would have to keep rolling, because although 0.08 is in [0, 1/12) subsequent rolls could change that because 1/12 = 0.083333(3).
For any particular N and M this can be turned into a state graph where you start at the start node and then select links to follow with your dM until you reach a terminal node labeled with the simulated dN value.
Here are such graphs for rolling a d7 with a d2 [1], a d7 with a d6 [2], and a d5 with a d8 [3].
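The digit-by-digit version is short (a sketch: track the interval of reals consistent with the rolls so far and stop once it fits inside a single [k/N, (k+1)/N) bucket; the d12-from-d10 example above plays out exactly as described):

    import random

    def roll_dN_with_dM(N, M, roll_dM=lambda M: random.randrange(M)):
        # The rolls so far pin the real number down to [lo/scale, hi/scale).
        lo, hi, scale = 0, 1, 1
        while True:
            d = roll_dM(M)
            lo, hi, scale = lo * M + d, lo * M + d + 1, scale * M
            k_low = lo * N // scale          # bucket holding the low end
            k_high = (hi * N - 1) // scale   # bucket holding the (open) high end
            if k_low == k_high:
                return k_low + 1             # 1..N, like a real die

    print(roll_dN_with_dM(12, 10))  # a d12 simulated with a d10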
Looking into it a bit, it appears that this particular fun fact was first thought of (or at least put to paper) by von Neumann in "Various techniques used in connection with random digits" in 1951. Yuval Peres proves in "Iterating von Neumann's procedure for extracting random bits" that if N goes to infinity it approaches the entropy limit.
So for particular small choices of N there might be more clever schemes, but for large N you can't do a whole lot better.
Nice shuffle, and nice explanation. I like how the "wasted" recursions, where no items get assigned to one of the lists, throw away their coin flips (which is needed to get the uniform permutation distribution).
I'm sure I'll get roasted for this, but getting the right answer will scratch a years-old itch. Why aren't most random number generators seeded using the system clock in addition to the existing algos?
Seeding PRNGs from the system clock is quite common. Using the system time as a source of entropy for generating secure random numbers is problematic because it's not random; you want "noisy" sources to contribute.
Let's say we're generating some private key using the time as a source, and then writing the cert to a file. You'd probably be able to get a pretty good idea, from the timestamp of that file, of what the system time was when the private key was generated.
* original Fisher-Yates is slow; Knuth's variation is what makes it fast
* Fisher-Yates relies on "generate integer in range", which is often implemented badly (with bias, or with unnecessary slowness). See the PCG blog for the best-known solution; a minimal sketch of the basic rejection approach is below.
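A minimal sketch of the rejection approach for the "integer in range" part (unbiased, no modulo; the PCG post's version also avoids most of the arithmetic in the common case, but the idea is the same):

    import secrets

    def randbelow(n):
        # Unbiased integer in [0, n): draw just enough bits, reject values
        # that land past n. No modulo, so no modulo bias.
        k = (n - 1).bit_length()
        while True:
            r = secrets.randbits(k)
            if r < n:
                return r

    def fisher_yates(items):
        a = list(items)
        for i in range(len(a) - 1, 0, -1):
            j = randbelow(i + 1)
            a[i], a[j] = a[j], a[i]
        return a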
What's unnecessary slowness in this situation? Are we worried about a single division per random number?
I wish this page showed some separate charts for fast and slow RNGs and some better slow options. If you actually care about proper shuffling and the minuscule bias you get from 2^64 % 52 then you should be using a CSPRNG, not a cheap method.
You can generate so many megabytes of random per second with a CSPRNG though. I really don't see the use case where you care about bias enough to use very careful range functions but don't want to bother with a better generator.
In Monte Carlo simulations you usually don't care about predictability (so using a non-CSPRNG is fine), but statistical bias may compromise your results.
The C++ distributions are problematic, because the standard does not specify the algorithms. If you specify the generator and the seed, you can still get different random numbers on different platforms. Which is why applications that need reproducible results must use another library or implement their own distributions.
There are N! permutations in a shuffle (no duplicates) and there's an algorithm where you pick one random number between 1->N! and then bring up that permutation (though I don't know how to do it better than N log N with an order-statistic tree). I like this because it requires exactly one random number.
A trivial solution in functional programming (for those of us who find this swap stuff really unreadable and error-prone) would be something like:
Of course this is N log N, but personally I think it's easy to forget how slowly log N grows. log10(number of atoms in the universe) ≈ 82, so if your dataset is smaller than the number of atoms in the universe you could think of log N as bounded by the constant 82.
Your algorithm will never shuffle a deck of cards, because randomly selecting a 64-digit number means your N log N algorithm will need to generate, on average, the 2^63rd or larger permutation. It'll take several millennia for every deck shuffle. If you don't believe me, try to write it.
Also, I'm not really sure your n log n claim is accurate: Python's itertools says that the time complexity to generate the Nth permutation is still N!. So for a 52-card deck, on average you're going to have to generate 2^88 permutations.
By comparison, shuffling is O(n) where your n is the number of items.
I don't know what you're talking about, and that makes two of us.
Writing the Nth-permutation is something I've done before as a leetcode challenge, here's example code [1]
Maybe Python implements it poorly, but it can definitely be done in n log n. Also, try researching a little more before you pick a disagreement; I'm sure a simple Google search could have saved you the trouble.
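For reference, a rough sketch of the standard factorial-number-system decode (not the linked leetcode solution). The list.pop makes this O(n^2); swap it for an order-statistic tree and you get O(n log n). Either way, nothing enumerates permutations one at a time:

    import math
    import secrets

    def nth_permutation(items, index):
        # Decode `index` (0 <= index < n!) digit by digit in factorial base.
        pool = list(items)
        out = []
        for i in range(len(pool), 0, -1):
            digit, index = divmod(index, math.factorial(i - 1))
            out.append(pool.pop(digit))
        return out

    # One random number in [0, 52!) picks the entire shuffle.
    deck = nth_permutation(range(52), secrets.randbelow(math.factorial(52)))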
You can calculate the probabilities and expected number of collisions analytically, but evaluating the expressions numerically is tricky for large numbers. I can recommend https://herbie.uwplse.org/ to rewrite the expressions into a form that can be evaluated.
> > In 2020, the Linux kernel version 5.6 /dev/random only blocks when the CPRNG hasn't initialized. Once initialized, /dev/random and /dev/urandom behave the same. [17]
FYI I am not a cryptographer