
Cheating on a string theory exam - cperciva
http://www.daemonology.net/blog/2017-02-21-cheating-on-a-string-theory-exam.html
======
asrp
Pick 5400 23-bit vectors (the answers you will write) and your friend picks
the one that minimizes its Hamming distance to the true answer.

    
    
        5400*(1 + 23 + 23*22/2) < 2**23
    

so answering 21 questions with certainty is impossible. (The examiner could
pick the true answers so as to maximize the Hamming distance between them and
whatever answer you decide to write, regardless of the scheme used.)

And there is a covering code [1] of size 4096 < 5400, so answering 20
questions is possible (by picking the 23-bit vectors in that code).

[1] see n = 23, R = 3 here
[http://www.sztaki.hu/~keri/codes/2_tables.pdf](http://www.sztaki.hu/~keri/codes/2_tables.pdf)
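For anyone who wants to sanity-check the counting argument, here's a quick Python sketch (radius-2 Hamming balls around 5400 centers can't cover all of {0,1}^23):

```python
from math import comb

# Size of a Hamming ball of radius 2 in {0,1}^23: 1 + 23 + 253 = 277.
ball = comb(23, 0) + comb(23, 1) + comb(23, 2)

# Even if the 5400 balls were disjoint, they cover too few vectors, so some
# true answer sheet is at distance >= 3 from every vector you might write.
assert 5400 * ball < 2**23   # 1,495,800 < 8,388,608
```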

Edit: formatting

~~~
curiousgal
If someone could ELI5 this, that'd be great!

~~~
philbarr
My take on this:

The naive approach (which is what I did) is to assign each second to a binary
number and work out how high the number goes - this works out at 2^12.

The clever way other people did - and which I've been thinking about all
morning, is:

The complete answer to the test is a set of true/false answers that we map to
23-bit vectors.

We know that there are 2^23 vectors, which is way more than the 90*60=5400
seconds we have available to identify one of them.

Pretty sure we're all clear on that.

The hard part comes when we start saying that we can let some of the bits be
wrong. This lets us say that some of the 2^23 23-bit vectors are pretty much
the same as some others. I think of it like grouping vectors together into
groups that are all related to each other by flipping up to 3 bits.

So let's take an example with 4-bit vectors instead because I find it more
intuitive. This is a test with 4 true/false answers. Our naive method would
make us think we need 2^4=16 seconds to correctly identify true/false answers.
But if we look at what happens when we say one of those answers can be wrong,
it gets interesting. Let's look at the first 4-bit vector:

    
    
      [0 0 0 0]
    

Now we can let any bit be wrong:

    
    
      [0 0 0 0]
      [0 0 0 1]
      [0 0 1 0]
      [0 1 0 0]
      [1 0 0 0]
    

...and we have FIVE possible vectors identified by the SAME second. Just by
looking at the first vector we've reduced our space from 2^16 to 2^16-5. And
remember we can group all of our 4-bit vectors together like this.

If you let 2 answers be wrong you have fewer, larger groups.

If you let 3 answers be wrong you only need one bit of information! Your
cheating friend gets up on second 1 or second 2.

This is my intuitive way of looking at it anyway. For the actual maths and
such look at other answers...
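If it helps, the 4-bit grouping above is easy to check in a few lines of Python (just enumerating the radius-1 ball around 0000):

```python
from itertools import product

center = (0, 0, 0, 0)
hamming = lambda u, v: sum(a != b for a, b in zip(u, v))

# All 4-bit vectors reachable from 0000 by flipping at most one bit:
ball = [v for v in product((0, 1), repeat=4) if hamming(v, center) <= 1]
assert len(ball) == 5   # the five vectors listed above
```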

~~~
philbarr
This line:

Just by looking at the first vector we've reduced our space from 2^16 to
2^16-5.

Should be:

Just by looking at the first vector we've reduced our space from 2^4 to 2^4-5.

Can't edit it now - sorry if that confused anyone.

------
steventhedev
A simple approach that gives 13 correct answers minimum is to have him walk
out after X seconds, where X is the number of true answers. Answer true for
everything if X is greater than 12. Otherwise answer false for everything.

This uses only 6 bits of information, and you should be able to pack some
extra info rather easily into the remaining 7.

Thinking about alternative approaches leads me to think that the only cases
that you should focus on to solve this are those where the number of
true/false is almost equal. Figure out how to pack them together, layer a few
solutions together, and you should be able to answer the exam to within some N
guaranteed.

EDIT: 12 correct answers. not 13. And you can easily pack this into 1 bit.
Leaving my wrong answer for posterity.

~~~
SomeStupidPoint
Doesn't your method only guarantee 12 answers -- half of 23 (rounded up)?

That can be reduced to a single bit:

If you should mark 'true' on all the answers, he leaves during the first half
of the exam; if you should mark 'false' on all the answers, he leaves during
the second part of the exam. This gets you at least 12 correct answers in 1
bit.

This leaves you with 11 bits to modify the pattern with to improve your score.
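A brute-force check of the worst case of this 1-bit scheme (a sketch; the outcome depends only on the number of true answers):

```python
# If at least 12 answers are true, write all-true; otherwise all-false.
def correct(n_true):
    return n_true if n_true >= 12 else 23 - n_true

# Worst case over every possible number of true answers:
assert min(correct(k) for k in range(24)) == 12
```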

~~~
szemet
Using this on only 3 questions, you get a guaranteed 2/3 success rate:

000, 001, 010, 100 -> buddy signals 0, I write 000

111, 110, 101, 011 -> buddy signals 1, I write 111

With 7 bits you are guaranteed at least 14 correct out of 21 questions.
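The 2/3 guarantee per group of three can be verified exhaustively (Python sketch):

```python
from itertools import product

# For each 3-bit answer pattern, the buddy signals the majority bit and we
# write it three times; score is how many positions match.
def score(pattern):
    return max(pattern.count(0), pattern.count(1))

assert min(score(p) for p in product((0, 1), repeat=3)) == 2  # >= 2 of 3
```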

~~~
SomeStupidPoint
You can get at least 17 by using the 1 bit to express a preference for just
the first 12 questions, and the remaining 11 bits to give the answers to the
last 11 questions exactly.

I've been trying to think if there's a clever scheme to spread the knowledge
around to beat 17 with statistical tricks.

(As a fun aside: 17/23 is about 73%, which typically gets you a C -- so you'd
pass the exam.)

~~~
szemet
6 bits to guarantee 12 questions from the first 18 (grouped by 3), 5 bits for
the remaining 5.

17 answers, and one bit remaining.

~~~
SomeStupidPoint
Which answer do you use the extra bit on that's guaranteed to give you one
more correct?

For any you choose, there are answer patterns where you already counted that
answer being correct in your guaranteed number as part of the 2/3rds correct.

So I don't think you can (trivially) use that bit to guarantee another correct
answer.

------
skizm
More practically (and specifically ruled out in the question): you can see
when your friend "starts" the exam (picks up his pencil) and when he ends
(gets up from his seat). Also it probably takes him at least 10 minutes to
read the whole test and figure out the answers.

This makes it easy mode: friend takes 10 minutes to read the test and get the
answers in his head, and starts the timer (0 seconds) at the 10 minute mark.
Friend "starts" sometime in the next 2048 seconds (just over 34 minutes): 11
bits of info. Then he "finishes" between 44 minutes and 78 minutes: another
2048-second window and another 11 bits of info. 22 of 23 questions answered.
Friend coughs on the way out if the last answer is true. Boom, all questions
answered.

Yes, OP said we can't do this, but it is more practical than memorizing 5400
23-bit vectors, doesn't assume the friend can instantly finish the test, and
gives the friend some lead time to actually do the binary conversions and
ensure the times are correct down to the second.

------
frivoal
Unless I am misunderstanding something, all the answers here seem to assume
that the friend is either able to finish instantly, or to leave before he is
done. That isn't stated in the question, so I don't think it's valid to assume
you can use all of the 90 minutes * 60 seconds to transmit information.

"A fraction of the time" isn't a very clear problem statement, as
999999/1000000 is a fraction, but that's certainly not what is meant by the
common-sense understanding of this phrase.

I am wondering what we can do without clarifying the statement further.

We can still encode 90*60 23-bit vectors and have the friend pick, from the
times remaining after he finishes, the one that minimizes the distance to the
true answer, without knowing in advance how early he will finish. That is
probably still the way to get the best score on average, but I don't think it
does better for the guaranteed number of correct answers than just
transmitting 1 bit of information (which guarantees at least 12 correct
answers).

Would it help to use odd vs even seconds to indicate whether the friend
finished early enough to use the clock to signal an arbitrary time, vs the
time at which he should have left having already passed by the time he
finished his exam? I can't think of how to use this (or similar) signaling to
improve the worst-case odds.

On the other hand, while I cannot think of a way to improve your worst-case
situation, you should be able to improve the average score by sorting your
5400 23-bit vectors so that the distances between the last ones are as large
as possible.

------
dbfclark
Presumably the extra credit is that the binary Golay code is closely related
to the Leech Lattice, and thus to the entire moonshine situation, which gets
you to string theory. See:

[https://en.wikipedia.org/wiki/Leech_lattice](https://en.wikipedia.org/wiki/Leech_lattice)
[https://en.wikipedia.org/wiki/Monstrous_moonshine](https://en.wikipedia.org/wiki/Monstrous_moonshine)
[http://motls.blogspot.com/2015/03/umbral-moonshine-and-golay...](http://motls.blogspot.com/2015/03/umbral-moonshine-and-golay-code.html)

~~~
cperciva
Yes. In particular, the Leech Lattice gives rise to Bosonic String Theory.

~~~
dbfclark
Hey, I'm a mathematician, not a physicist. We tend only to get as far as "oh
yeah, here's all this pretty math we care about. I hear it has some
applications to physics..."

~~~
cperciva
Yeah, that's pretty much where I'm at too. :-)

I originally wrote the puzzle without reference to string theory, then added
the "extra credit" part mainly as a hint.

~~~
dbfclark
Sure. Moonshine I know a lot about, relative to physics at least. Given that I
now work somewhere that specializes in term embeddings, it kind of feels like
I've spent half my life injecting stuff into vector spaces...

------
unabridged
I'm thinking: create a set of 90 * 60 = 5400 different true/false patterns,
and pick the one that gets the most answers correct. I'm assuming there is
some algorithm with a name I don't know that can select the optimal set of
patterns. Then the minimum they get correct in any situation probably has
something to do with Shannon entropy...

------
rinovan
I would say that the lower bound is 18.

Because we want to represent a 23-bit string by 12-bit codewords, the max
distance between 2 codewords is 11 bits.

So the protocol would be to agree on a dictionary of 12-bit codewords and
send the closest codeword to the actual answer.

The max Hamming distance between any answer and a codeword would be
floor(11/2) = 5, which corresponds to 18 correctly answered questions.

~~~
blauditore
I'm not sure I understand the max hamming distance part.

From what I understand, the hamming distance would only be relevant for
expectations with a certain probability, but worst case will still be 12.

------
allenz
Here's my writeup, _spoilers_:

We're looking for a lossy compression function that maps strings of length 23
(answers) to strings of length 12 (the bits of information in 5400 seconds).
We seek to minimize the number of errors, aka the Hamming distance. Under the
Hamming distance metric, the space of strings forms a hypercube where two
strings are adjacent if they are related by a bit flip (one error).[1]

This gives an elegant visual interpretation: we're searching for a way to
partition the hypercube of 23-bit strings into 2^12 balls[2] containing 2^11
elements each.

What is the minimum radius of the balls? There is 1 element in the center of
the ball, 23C1=23 elements at distance 1, 23C2=253 elements at distance 2, and
23C3=1771 at distance 3, which sums to precisely 2^11. This combinatorial
coincidence makes it possible to build a beautifully symmetric 23->12
compression function!
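The combinatorial coincidence is a one-liner to check (Python sketch):

```python
from math import comb

# Radius-3 Hamming ball in {0,1}^23: 1 + 23 + 253 + 1771 = 2048 = 2^11.
ball = sum(comb(23, r) for r in range(4))
assert ball == 2**11
assert 2**12 * ball == 2**23   # 2^12 disjoint balls tile the hypercube exactly
```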

In order to avoid overlap, the centers of any two balls must be at least 7
bits apart.
One idea to evenly space the centers: write all possible first 12 bits (1 bit
distance), then distribute the remaining bits 6 bits apart.

We can also see that inverting the compression function gives an error-
correcting code going 12->23, mapping 12-bit inputs to optimally-spaced 23-bit
strings. There is a duality between lossy compression (aka rate-distortion
theory[3]) and error correction.

As it turns out, the optimal 12->23 error-correcting code is the perfect
binary Golay code discovered in 1949.[4] The inverse of the Golay code is the
compression function we're looking for.

A bit of historical trivia: Golay's 1949 paper was reviewed by Berlekamp, who
in 1974 called it the "best single published page" in coding theory. At the
time, Berlekamp was working as a code breaker with Jim Simons at the Institute
for Defense Analyses. Later, Berlekamp would help Simons found Renaissance
Technologies, which remains today the most successful quant hedge fund in
history.[6] Renaissance was famous, of course, for hiring many of the best
minds in string theory.

[1]
[https://en.wikipedia.org/wiki/Hamming_distance](https://en.wikipedia.org/wiki/Hamming_distance)

[2]
[https://en.wikipedia.org/wiki/Ball_(mathematics)](https://en.wikipedia.org/wiki/Ball_\(mathematics\))

[3]
[https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory](https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory)

[4]
[https://en.wikipedia.org/wiki/Binary_Golay_code](https://en.wikipedia.org/wiki/Binary_Golay_code)

[5]
[https://en.wikipedia.org/wiki/Hamming_bound](https://en.wikipedia.org/wiki/Hamming_bound)

[6]
[https://en.wikipedia.org/wiki/More_Money_Than_God](https://en.wikipedia.org/wiki/More_Money_Than_God)

~~~
kowdermeister
Thanks for the clearest (and still mysterious) explanation so far in this
thread.

------
botexpert
Similar to the lottery problem and other covering problems.

Let's say lottery has N numbers. Tickets contain K numbers. What is the least
amount of tickets M that you need to buy so you guarantee when the winning
ticket is pulled that you matched at least R numbers on that winning ticket
with your pool of tickets? LP(N, K, R) = M.

LP(N, K, K) = (N choose K): this is winning the lottery outright, and to be
sure of that we need to buy all tickets.

LP(N, K, 2) is solved, I believe, for many values (theoretically).

LP(N, K, 3) is already a problem.

------
xfs
Solvable by basic information analysis

> answer at least N out of the 23 questions correctly

This represents S(N) = \sum_{i=N}^{23} \binom{23}{i} acceptable states of
correct answers out of all 2^23 possible states, which requires
-log_2(S(N)/2^23) bits of self-information, transmittable with a code of
alphabet size 5400. Therefore the largest N that satisfies this is 20:

23 - log_2(S(20)) = 12, log_2(5400) ≈ 12.399
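This calculation can be checked directly (Python sketch of the same counting):

```python
from math import comb, log2

# Number of 23-answer sheets with at least n positions correct.
S = lambda n: sum(comb(23, i) for i in range(n, 24))

# Self-information needed to land in the acceptable set, vs channel capacity.
bits_needed = lambda n: 23 - log2(S(n))
assert bits_needed(20) <= log2(5400)   # 12.0   <= 12.399: N = 20 works
assert bits_needed(21) > log2(5400)    # ~14.89 >  12.399: N = 21 doesn't
```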

~~~
mrkgnao
But you can use those 12 bits as a key into some pre-arranged set of
information, each entry of which carries more information.

~~~
xfs
Actually you can't. There is no prior information about the distribution of
correct answers given in the question, so everything is uniformly distributed
and there is nothing to set up a "pre-arranged" entropy encoding with.

~~~
Robin_Message
The trick (to get around the information theoretic limit of 12) is to be
uncertain about which of the answers are correct.

For example, with 1 bit, you can transmit whether the majority are 1 or 0,
already getting you at least 12 right answers but _no information about which
ones are right_.

------
owenversteeg
This is really cool, since I'm currently working on a way to transmit
extremely small amounts of data (~30 bits) in a situation where you cannot use
any RF waves.

Of course, it's banned here but the simplest thing (for the person taking the
test) would be to take a short amount of time - or a long amount of time - to
fill in each bubble. A short scribble = false, a long scribble = true. To
thwart the proctors you could swap the keys every other question, such that
for question 1 short=false, long=true; question 2 short=true, long=false, etc
etc.

Of course, with any of these answers the proctors could trivially catch you
cheating. The best way to prevent this would be to apply a simple one-time pad:
exchange a small number beforehand and use that.

------
BrandoElFollito
Shouldn't the friend also communicate the time at which he actually finished
the exam (which can differ depending on the difficulty of the questions)? In
other words, if it took him 23 minutes to answer and he left after 53, the
information about the answers is encoded in "30" (53-23). If it had taken him
17 minutes to solve the questions, he would have left after 17+30=47 minutes.

This also means that the more difficult the questions, the less information he
can provide (because the "time" allocated to encoding the answers is smaller,
as in a smaller keyspace).
------
pizza
My attempt:

- need a 23-bit string to fill 23 yes/no answers

- the test is 90 min * 60 sec/min = 5400 sec

- log2 of 5400 time-indexes = ~13 bits of time-address capability

23 bits required - 13 bits given = 10 bits that are unavailable. You can
answer 13 questions.

~~~
cperciva
I rephrased the question slightly for clarity after you posted this: You're
looking for the largest value N such that you can _guarantee_ that you get at
least N questions correct. Since log(5400)/log(2) is only ~12.3987, your
approach only counts as 12 correct answers.

(You can do better.)

~~~
yakult
We can start with the prior that there will probably be a similar number of
trues and falses. We are mapping 23 bits of state into 12 bits, we can leave
out those states where T>>F or vice versa. If the test turns out to be one of
those unexpected states, pick the answer that gives the best score.

Edit: Rereading, it looks like we're optimizing the worst case, rather than
average. So we're looking for a 12:23 map where at most n bits are inaccurate,
minimizing n. I'm sure there's a signals alg that does this...

------
wnoise
log_2 of 60*90 is about 12.4. This suggests that a naive scheme would expect
to get about 12.4 + (23 - 12.4)/2 = 17.7 questions right. I would be very
skeptical of any scheme that purports to guarantee better than this. A
reasonable approach would be to trade the possibility of doing better than
this against the possibility of doing worse.

In fact, I can guarantee 17 correct questions, with only 11 bits which has the
same floor for expected value (11 + (12/2) = 17):

Divide the first 18 questions into 6 groups of three. Use the first 6 bits of
time to indicate the majority in each group of three, giving you 12 guaranteed
answers. The last 5 questions can be encoded directly.
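A quick exhaustive check of this 11-bit scheme's floor (sketch):

```python
from itertools import product

# Majority bit per group of 3 guarantees at least 2 of 3 in each group.
group_floor = min(max(p.count(0), p.count(1))
                  for p in product((0, 1), repeat=3))
assert group_floor == 2
assert 6 * group_floor + 5 == 17   # 6 groups + the 5 exact answers
```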

It seems like there should be room for improvement, as this does very well on
each group in the 1/4 of cases where the group is uniform. I also worry that
the average here is 12 + 1.5 + 5 > 17.7.

Perhaps overlapping the groups could help, which starts to look like an LDPC
code, but with max, rather than parity. The difficulty is that overlapping
bits can no longer guarantee more, because the previous ones could have
already done so.

------
xarope
TIL about Golay codes; interesting read, thanks!

------
SFJulie
Encoding time as a one hot [https://en.wikipedia.org/wiki/One-
hot](https://en.wikipedia.org/wiki/One-hot)

Getting 12 out of 23 should get him the average on the lucky assumption the
teacher was not a joker that put all answers higher than bit 6 to false ...

------
blobman
The friend could easily transmit far more than 23 answers' worth of
information if need be. He just needs to go to the toilet.

Step 1: 10 bits information - First 10 answers as seconds from start of exam.
After x seconds, friend goes to toilet.

Step 2: Restart counter

Step 3: On return from the toilet, take time taken in toilet. You may want to
limit this to 6 bits of information / ~ 1 minute in the toilet. Anything
longer might be suspicious.

Step 4: Restart counter

Step 5: Only 7 bits of information left to hand over to reach 23 questions.
Count the time till he leaves.

Step 6: Repeat toilet-going process for more answers in different situations.

There should be no suspicions, as the answer-receiver will not go to the
toilet*.

* Do not drink tea when attempting this method.

------
bicepjai
Could someone elaborate on the solution? Thanks

------
SandB0x
I like the double entendre in the title.

------
szemet
Reed-Solomon, for example? Or anything that increases the Hamming distance
between legal codes.

------
deepnotderp
Could something similar to "illegal primes" be used?

------
anitasvasu
ELI5 Version:

In order to understand the solution, we first need to understand the basics of
coding theory. Unlike with envelopes, when we send messages over a wire or
through the air (wireless), it is possible that the messages get corrupted.
Specifically, if we are sending “Hello” it can end up becoming “Hgllo”, where
e became g because of some electrical/electromagnetic distortion. If we had
typed it incorrectly using our mobile keyboards, they would have auto-corrected
“Hgllo” to “Hello”. This is possible because our natural languages have a lot
of redundant information. Otherwise, if we lost one word in a conversation,
the whole sentence would not make sense.

Coding theory is about how we systematically add redundant information so
that computers can correct errors. We are only going to talk about one type of
error, where a letter gets replaced by another letter. If ABCDEF becomes
GBCDEH, two letters are wrong, and we say they have a distance of 2. What
happens if we simply repeat each letter? The message ABC becomes AABBCC. Now
if a single error happens (GABBCC) we know the original message got corrupted,
but we still can’t figure out what the original message was. It could be
either ABC or GBC. So a better technique would be to repeat each letter
thrice. The message ABC becomes AAABBBCCC. Now if a single error happens
(GAABBBCCC), we can say that if there is only one error then the correct
message is ABC, by taking the majority. Note that this cannot correct two
errors, because if you see GGABBBCCC you will think the real message is GBC.
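The triple-repetition scheme above takes only a few lines of Python (a sketch; majority vote per block of three):

```python
def encode(msg):
    # Repeat each letter three times: "ABC" -> "AAABBBCCC".
    return "".join(ch * 3 for ch in msg)

def decode(received):
    # Majority vote within each block of three letters.
    blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join(max(set(b), key=b.count) for b in blocks)

assert encode("ABC") == "AAABBBCCC"
assert decode("GAABBBCCC") == "ABC"   # one error corrected
assert decode("GGABBBCCC") == "GBC"   # two errors fool the decoder
```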

What if the message has 4 letters, so ABCD becomes AAABBBCCCDDD? This scheme
requires you to send 12 letters just to correct 1 error in 4 letters. We
denote this by saying [12, 4, 1], where 12 is the encoded length, 4 is the
original message length, and 1 is the number of errors we can correct.

For now let us temporarily restrict our focus to binary letters (0, 1). This
is because computers communicate purely using 0s and 1s. For example the
letter A is represented by the number 65, which is written as 0100 0001, an
8-bit message, or simply a byte.

It turns out that if we want to send 4 bits, using 12 bits for it is actually
not very efficient. You can do better. In fact just 7 bits is enough to
represent 4 bits in a way that corrects a single error. This is called the
Hamming code.

Let us understand in more detail why our repeating scheme works. Consider any
two valid code sequences, for example AAABBBCCCDDD and AAAEEECCCDDD: even
though only one character changed in the input, 3 letters changed in the
output. So if we make sure the distance between any two coded sentences is at
least 3, then we can correct one error.

There are 16 possible words using 4 bits (0000, 0001, 0010, … 1111). If we can
assign code words to each of them such that the distance between any two is at
least 3, then we can say the code corrects 1 error. The Hamming code
essentially achieves this. If we are encoding ABCD, then the first 4 bits are
the same as the input, viz. ABCD; the 5th bit is A + B + D; the 6th bit is
A + C + D; and the 7th bit is B + C + D. Of course a sum like 3 is replaced by
1, by taking the remainder mod 2. Such codes are also called linear codes,
where each output bit is a linear combination of input bits. You can read
chapter 5 of Jiri Matousek’s 33 Miniatures for a quick proof, using a bit of
linear algebra, of why the Hamming code is indeed a valid code. Or you can
simply write down the 16 codewords and check every pair, making sure they
differ in at least three places. (You will need to compare 120 pairs though.)
[http://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf](http://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf)
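Here is that parity rule written out and checked exhaustively (Python sketch; bits as 0/1, sums mod 2):

```python
from itertools import product

def hamming74(a, b, c, d):
    # First 4 bits are the data; parity bits as described above.
    return (a, b, c, d,
            (a + b + d) % 2, (a + c + d) % 2, (b + c + d) % 2)

codewords = [hamming74(*bits) for bits in product((0, 1), repeat=4)]
dist = lambda u, v: sum(x != y for x, y in zip(u, v))

pairs = [(u, v) for u in codewords for v in codewords if u < v]
assert len(pairs) == 120                       # the 120 comparisons
assert min(dist(u, v) for u, v in pairs) == 3  # minimum distance 3
```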

What if we want to correct more than 1 error? The Golay code is one such code.
G23 and G24 are two codes that encode 12 bits in 23 and 24 bits respectively.
In the first case the Hamming distance is 7 and in the second it is 8. There
is no need for a distance of 8; distance 7 is enough, because:

Imagine two valid codewords (x0) and (x7) which are distance 7 apart. Then
using 7 changes you can get from x0 to x7.

(x0) — x1 — x2 — x3 — x4 — x5 — x6 — (x7)

Even if (x0) suffers 3 errors and (x7) suffers 3 errors, you only end up at
(x3) or (x4) respectively, and never get confused. So if we want to correct k
errors, the Hamming distance between any two codewords must be 2k + 1 or
higher.

It turns out that G23 is not just a good code, it is a “perfect” code: the
radius-3 balls around its codewords exactly fill the space, so if we want to
send 12 bits with 3 bits of correction, 23 bits is exactly what we need.

Now back to our problem: you have the 23 answers in front of you. Treat them
as a corrupted G23 codeword and decode back to 12 bits, then get up at the
time representing those 12 bits. We know that when your friend encodes these
12 bits back to a G23 codeword, at most 3 bits will be out of place. In other
words your friend will get at most 3 answers wrong, hence at least 20 of them
correct.
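For the curious, G23 itself can be built in a few lines. This is a sketch, assuming I have the generator polynomial right: x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1 is one of the two degree-11 factors of x^23 + 1 over GF(2) that generate the code.

```python
# Generator polynomial x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1 as a bitmask.
G = 0b110001110101

def encode(m):
    # Multiply the 12-bit message polynomial by G over GF(2); the product
    # has degree <= 22, so it fits in 23 bits with no reduction needed.
    c = 0
    for i in range(12):
        if (m >> i) & 1:
            c ^= G << i
    return c

codewords = [encode(m) for m in range(2**12)]
assert len(set(codewords)) == 4096
assert min(bin(c).count("1") for c in codewords if c) == 7   # distance 7
assert 4096 * 2048 == 2**23   # perfect: radius-3 balls tile {0,1}^23
```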

The relationship with string theory requires a bigger write-up. But
essentially it is because of the internal structure of a field theory on some
kind of donut-shaped universe: the type of transformations you can do there
looks something like the type of transformations you can do on the Golay
codes.

------
seesomesense
Binary Golay code...

