It's always been pretty mind-blowing to me that you can store over 225 bits of information just in an ordering of one deck of cards. A standard way to store data in this way is a factoradic encoding like Lehmer coding:
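A minimal sketch of such a factoradic (Lehmer) ranking, mapping a permutation to its lexicographic index — the function name is mine:

```python
from math import factorial

def lehmer_rank(perm):
    """Lexicographic rank of a permutation via its Lehmer code: each digit
    counts how many later elements are smaller than the current one, and
    the digits are weighted factoradically."""
    n = len(perm)
    rank = 0
    for i, x in enumerate(perm):
        smaller_later = sum(1 for y in perm[i + 1:] if y < x)
        rank += smaller_later * factorial(n - 1 - i)
    return rank

print(lehmer_rank([0, 1, 2, 3]))  # 0: the identity ordering
print(lehmer_rank([3, 2, 1, 0]))  # 23: the last of the 4! orderings
```

Run over a 52-card deck, the rank is an integer below 52!, i.e. a number of up to 226 bits.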
I found new joy in shuffling a deck of cards, after learning that every (proper) shuffle that every human's ever done has returned a unique deck that nobody's ever held before.
edit: I just remembered a guy who made a javascript demo that encodes small strings into a card deck order: https://jerde.net/peter/cards/cards.html (explanation page linked)
It's crazy that things have sensible probabilities at all, but surprisingly often the virtual numerator and denominator are of comparable size, despite the magnitudes involved.
I used this fact in an interview ages ago. The interviewer wanted a function, in Java, that shuffled a deck of cards such that every permutation was equally likely. I pointed out this was not possible using the standard library random functions since the seed is a long (akshually... it's 48 bits).
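The gap is easy to check: a generator driven by a 48-bit seed can reach at most 2^48 distinct orderings, nowhere near 52!. A quick sanity check in Python:

```python
from math import factorial

# java.util.Random keeps a 48-bit internal seed, so a seeded shuffle can
# visit at most 2**48 distinct permutations -- far fewer than 52!.
print(factorial(52) > 2**48)       # True
print(factorial(52).bit_length())  # 226 bits needed, vs. a 48-bit seed
```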
Since log(xy) = log(x) + log(y), you can simply calculate sum(log2(i) for i in range(1, 53)) :) might be a nicer formulation for when you don’t have arbitrary precision support.
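For example, with a cross-check against the exact value (Python integers are arbitrary precision, so the exact check happens to be available anyway):

```python
from math import factorial, log2

# Bits of information in a deck order, via log(xy) = log(x) + log(y).
bits = sum(log2(i) for i in range(1, 53))
print(round(bits, 2))  # 225.58

# Cross-check against the exact factorial:
assert abs(bits - log2(factorial(52))) < 1e-9
```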
The author draws a comparison right on the front page:
> My method stores data within a deck of cards, whilst his uses the deck as a key to encrypt data kept elsewhere. His method is more useful if you want to encrypt a lot of data, while mine could be more useful if you want to smuggle a shorter hidden message without any possibility of detection
I don't understand the claim that this is more efficient, it's trivial to find a method that's information-theoretically optimal: choose an arbitrary order for the cards, and encode log(52!)/log(2) bits as the lexicographic order of the permutation, doing a binary search each time.
Shouldn't be too hard to do even with pen and paper since the 2-adic eval of 52! is large.
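A sketch of that optimal scheme (the helper name is mine): interpret up to 225 message bits as an integer below 2^225 < 52!, then take the permutation with that lexicographic rank by peeling off factoradic digits from high to low.

```python
from math import factorial

def unrank(rank, items):
    """Return the permutation of `items` with the given lexicographic
    rank, extracting one factoradic digit per position."""
    items = list(items)
    perm = []
    for i in range(len(items), 0, -1):
        digit, rank = divmod(rank, factorial(i - 1))
        perm.append(items.pop(digit))
    return perm

print(unrank(0, range(4)))   # [0, 1, 2, 3]
print(unrank(23, range(4)))  # [3, 2, 1, 0]
```

Decoding is the inverse: recover the rank from the deck order and read off the message bits.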
OP's code still only has log2(52!) different binary strings that can be encoded, but they vary in length from 52 to 1378 bits (and are the leaves of a binary tree). This is handy for easily encoding sequences drawn from a non-uniform distribution, like strings of English letters, by giving each letter a fixed-length binary codeword. It's sort of flipped from the more common method of giving symbols a varying-length binary codeword (e.g. with Huffman coding) and encoding a fixed-length binary string in a deck of cards (or anything else).
For people like me who didn't know what a "2-adic eval" is: it's a fancy way of saying that 52! can be divided by two 49 times and still give an integer.
He isn't making the claim that it can encode more bits than the information-theoretic maximum you suggest _on average_. On average his method is similar to, or worse than, optimal. But some messages are effectively compressed in this encoding, so more plaintext bits can be encoded. These better-than-average messages "borrow" bits from the worse-than-average messages, similar to how a compression algorithm works.
One can do this mentally easily enough. 52! = 52·51·50·...·3·2·1, and there are 26 even numbers in this progression, so we have 26 powers of 2. Taking them out, 13 of those even factors are divisible by 4, but we already took out the first 2 from each, so we have 13 more 2's, giving 26 + 13 = 39. Now on to factors divisible by 8: they are half of those that are divisible by 4, so half of 13, giving 6 more 2's (52 is not divisible by 8, so we round down). Thus so far we have 39 + 6 = 45 twos in the factorization of 52!. On to numbers less than 52 that are divisible by 16: that's half of those divisible by 8, so 3 more, getting us to 48. Finally there is the factor 32 = 2^5 of 52!, giving us one more 2, hence 49. In general, for p a prime, the largest k such that p^k divides n! is given by k = Floor(n/p) + Floor(n/p^2) + ... + Floor(n/p^t), where p^(t+1) > n.
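The closing formula (Legendre's formula) is a few lines of Python — the function name is mine:

```python
def prime_exponent_in_factorial(n, p):
    """Legendre's formula: the exponent of prime p in n! is
    floor(n/p) + floor(n/p^2) + ... until the terms reach zero."""
    total, pk = 0, p
    while pk <= n:
        total += n // pk
        pk *= p
    return total

print(prime_exponent_in_factorial(52, 2))  # 49, i.e. 26 + 13 + 6 + 3 + 1
```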
When you see something that doesn't look right it's good to engage and work things out, but it's also courteous to check that you haven't misunderstood. I see how you could arrive at the understanding you had, that "how many times you can divide by 2" is equivalent to base-2 logarithm. It's not the right interpretation however, and in context it's clear.
Could I recommend phrasing this kind of comment as a question in future? (Notwithstanding the lifehack that making a false statement on the internet is the shortest path to an answer.)
Fair, I should have rephrased the comment to more directly reference the thread-starter, which is encoding bits "using lexicographic order of the permutation, doing a binary search each time." It's not that your computation of the 2-adic decomposition is wrong; it's that the number it produces is too low to be the relevant bound.
Let me elaborate:
I am not 100% sure what user qsort meant by "binary search", but one of the simplest manual algorithms I can think of is to use input bits as decision points in a binary-search-like split of the input stack: you start with 52 cards; depending on the first input bit you take the top or bottom half of the set, then use the 2nd input bit to select the top or bottom of that subset, and so on, until you get down to a single card. Then place it in the output, remove it from the input stack, and repeat the procedure. Note there is no math at all, and this would be pretty trivial to do with just pen & paper.
What would be the resulting number of bits encoded this way? With 52 cards, you'd need to consume 5 to 6 bits, depending on the input data. Once you are down to 32 cards, you'd need 5 bits exactly; 31 cards will need 4-5 bits depending on the data, and so on... If I've calculated this correctly, that's at least 203 bits even in the worst case, way more than the 51 bits mentioned above.
(Unless there is some other meaning to "51" I am missing? But all I see in the thread are conversations about bit efficiency...)
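A sketch of the halving scheme described above (the helper name and the zero-padding convention are mine, not from the thread):

```python
def encode_bits_into_deck(bits, deck):
    """Each message bit picks the top or bottom half of the remaining
    stack; once a single card is isolated it moves to the output.
    Missing bits are treated as zeros."""
    deck, out, pos = list(deck), [], 0
    while deck:
        lo, hi = 0, len(deck)
        while hi - lo > 1:
            bit = bits[pos] if pos < len(bits) else 0
            pos += 1
            mid = (lo + hi) // 2
            if bit == 0:
                hi = mid  # keep the top half
            else:
                lo = mid  # keep the bottom half
        out.append(deck.pop(lo))
    return out, pos  # pos = number of message bits consumed

_, used = encode_bits_into_deck([0] * 300, range(52))
print(used)  # 203: the all-zero message takes the cheapest path
```

The consumption varies with the data because isolating one of n cards takes floor(log2(n)) or ceil(log2(n)) halvings depending on which branches the bits select.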
To be clear I agree with your interpretation about how much data you can store in the deck permutation and how to search it, my previous comment was only about p-adic valuations. I can't actually see how the 49 is relevant either.
The basic method would be to assign a number, 0 through 52!-1, to each permutation in lexicographic order. Because 52! is not a power of 2, if you want to encode binary bits, you can only use 2^N permutations, where that number is the largest power of 2 less than 52!. You can not losslessly encode more than N bits, that's a hard bound, they just won't fit.
If you wanted to turn this into an actual protocol, you would presumably flag some permutations as invalid and use the other ones. You would then encode one bit at a time doing a binary search of the set of valid permutations.
Because 52! has a large number of 2s in its factorization, for a careful choice of the valid permutations it should be practical (or at least not significantly more impractical than the OP's proposed method) to perform this by hand because you would be able to "eyeball" most splits of the binary search.
And another 51 bits of info for whether the picture of the puppy on the back of the card is right-side-up or upside-down! (Or, absent puppies, any other asymmetrical pattern or image that you chose for the message deck).
Because your recipient has to be able to determine the reference orientation of the deck, you get 51 bits of extra information from puppy orientation, and another 50 bits of extra information from face-up/face-down orientation.
To place the deck in correct orientation, in preparation for decoding, ensure that the top and bottom card are face up, and that the puppy on the back of the top card isn't upside-down.
In an asymmetrical design, the orientation of the face carries no extra information, since orientation information is already carried by the orientation of the design on the back.
For decks with a symmetrical back design, the following cards have asymmetrical faces:
- The seven of diamonds.
- The ace, three, five, seven and nine of hearts, clubs and spades.
In my deck (a standard design), none of the face cards are asymmetrical.
So there are sixteen cards that carry orientation information, one of which must be used to define the reference orientation of the deck, yielding 15 additional bits of information.
This is cool (especially the encryption), but certain sequences of bits can’t be encoded. If you have n items left in the sequence then you cannot encode a run of n zeros. So it’s not a general encoding scheme.
https://en.wikipedia.org/wiki/Lehmer_code
The python library 'permutation' has some functionality around this that's fun to play with:
https://permutation.readthedocs.io/en/stable/#permutation.Pe...