

How to break the 'rapper code' - jgrahamc
http://blog.jgc.org/2012/02/how-to-break-rapper-code.html

======
coderdude
Recently I began reverse engineering the game files for Circuit's Edge (1989).
Most of the in-game text is stored in a separate file called DISKTEXT.TXT. The
method Westwood Associates used seems to be a mix of encryption and
compression, and looks very similar to the kind of encoding seen in this
article.

The file basically uses a number of bytes between 128-254 to represent two-
letter combinations. After an hour of tweaking a Python script I finally had
the file decrypted. As someone who had never before dabbled in this sort of
thing I felt very accomplished, although I quickly realized how rudimentary
their methods were.
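
A minimal Python sketch of that kind of scheme, assuming each byte from 128 upward indexes a table of two-letter pairs and every other byte is a literal character. The `BIGRAMS` table and the `decode` name are made up for illustration; they are not the game's actual table or script.

```python
# Hypothetical bigram table: byte 128 -> "th", 129 -> "e ", and so on.
# The real table for DISKTEXT.TXT is different; this one is illustrative only.
BIGRAMS = ["th", "e ", "in", "er", " a"]

def decode(data: bytes) -> str:
    """Expand bytes >= 128 into two-letter pairs; pass other bytes through."""
    out = []
    for b in data:
        index = b - 128
        if 0 <= index < len(BIGRAMS):
            out.append(BIGRAMS[index])
        else:
            out.append(chr(b))  # literal character (or an unmapped byte)
    return "".join(out)

sample = bytes([128, 129, 99, 105, 116, 121])
print(decode(sample))  # the city
```

In a table like this a space can sit inside a pair, so it never shows up as a byte of its own.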

~~~
jgrahamc
How did you come to realize that you were dealing with bigrams?

~~~
coderdude
Trial and error for the most part.

First I tried to find sequences of characters that, without modification,
already looked like real words. I found one in particular that turned out to
be the name of the city in which the game takes place ("Budayeen", though it
was completely garbled).

That was a stroke of luck because that word appeared many times in the text
and gave me some clue about the adjacent words, since there are only so many
words you could reasonably put around the name of the city.

I tried globally replacing the characters in DISKTEXT.TXT but was ending up
with a word half as long as I thought it should be (for "Budayeen"). One of
the fortunate things that happened though was that it revealed to me, by
accident, a couple other words (even though the decryption wasn't correct, it
made some previously indistinguishable sequences look more like real words,
which I pursued).

I think it really clicked that I was dealing with bigram substitution when I
had to come up with a theory of how whitespace was so cleverly hidden.

Here's the lookup dictionary if anyone is interested:
<http://pastebin.com/pxJU7q7F>

And here's the script that does the substitution:
<http://pastebin.com/8eYrGbzQ>

You know, now that I actually think back to it, the lower 4 bits represent one
character and the higher 4 bits represent the other. And there I was thinking
in bytes the whole time.
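
For reference, splitting a byte into its high and low 4-bit halves looks like this in Python. How each 4-bit value would map to a character is not stated above, so the sketch stops at the split; `split_nibbles` is a name chosen for illustration.

```python
def split_nibbles(byte: int) -> tuple[int, int]:
    """Return (high 4 bits, low 4 bits) of a value in 0..255."""
    return (byte >> 4) & 0x0F, byte & 0x0F

print(split_nibbles(0xA7))  # (10, 7)
```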

------
evan_
"Dr Olsson, who has worked on hundreds of cases for police around the world,
told the trial: ‘The thought process behind the code shows someone who is very
able, very intelligent, very skillful.’"

I'm sure he means "relative to the typical criminal". The average criminal
would probably not use a code at all, or would possibly use the classic A=1
B=2 cipher. Shuffling the numbers such that E=10, T=5, etc. is genius level
compared to that.
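
To make the comparison concrete, here is a rough Python sketch of both schemes. The seeded shuffle is only a stand-in for "some rearrangement of the numbers"; it is not the key used in the note, and the function names are invented for the example.

```python
import random
import string

def classic_key() -> dict[str, int]:
    """The classic scheme: A=1, B=2, ..., Z=26."""
    return {ch: i for i, ch in enumerate(string.ascii_uppercase, start=1)}

def shuffled_key(seed: int) -> dict[str, int]:
    """Same numbers 1..26, assigned to letters in a seeded random order."""
    numbers = list(range(1, 27))
    random.Random(seed).shuffle(numbers)
    return dict(zip(string.ascii_uppercase, numbers))

def encode(text: str, key: dict[str, int]) -> list[int]:
    """Replace each letter with its number; drop anything not in the key."""
    return [key[ch] for ch in text.upper() if ch in key]

def decode(numbers: list[int], key: dict[str, int]) -> str:
    """Invert the key and map numbers back to letters."""
    reverse = {n: ch for ch, n in key.items()}
    return "".join(reverse[n] for n in numbers)

print(encode("CODE", classic_key()))  # [3, 15, 4, 5]
```

Either way each letter always maps to the same number, so the same frequency counting applies to both.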

~~~
andrewtbham
By building up the criminal's intelligence and skill, he is building up his own
intelligence and skill by breaking the code.

~~~
bilbo0s
Actually, the code maker IS pretty able.

You need to consider that this individual had to create the code and
communicate its nature to compatriots on the outside. The code had to be
simple enough for said compatriots to use in encoding information themselves,
as well as being amenable to manual encryption and decryption, since we can
assume said compatriots probably were not exactly computer programmers or
mathematicians. All of these requirements had to be met while keeping the code
reasonably difficult for police to decipher.

I think it is, too often, tempting to only consider one side of the creation
process in situations like this without giving due consideration to context.
Giving full consideration to context, this would, to me, seem a relatively
dangerous individual.

On another note, this is another example of how far law enforcement is willing
to go, in terms of resources, to get their man or woman. Normally, the vast
majority of us are not worth the effort of listening to our phone calls, or
reading our emails, snail-mail, texts, or web posts. HOWEVER, once a spouse's
body turns up, or ANY bodies turn up ... or maybe a bank goes under ... all of
that changes. They will look through EVERYTHING. And you will be worth the
effort to decrypt it.

It's not only national security that will get you that level of resource
allocation.

~~~
maxerickson
It sounds like the codebreaker in the article only took a few hours to break
it (but maybe something got lost in the writing). That's not a huge allocation
of resources in a case involving assault and such.

If the police were in the habit of decoding simple ciphers like this, they
could have put it in front of somebody that wouldn't describe this code
breaking process as painstaking.

------
eggbrain
I hope John doesn't mind, but I finished up the cipher and cleaned up some of
the transcribing of the symbols, as he got a few wrong (G, P, and a few
others).

You can check out the modified code (with the translated solution as well)
here: <http://pastebin.com/umz5mM5F>

~~~
jgrahamc
Thanks. I literally spent about 15 minutes on this and I was sure there were
transcription errors.

~~~
jcr
This was fun. Thanks for leaving it to be an exercise for the reader.

The interesting bit is the underscore '_' --it can be either P or G depending
on context, but I'm yet to figure out if there's some rhyme or reason to how
it works.

------
BitMastro
From what I remember, a more efficient way of decrypting simple alphabet
substitution codes is by working on the frequency of bigrams and trigrams,
rather than single letters, because, apart from E, even T and A can easily
swap places in the frequency order.
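
A small Python sketch of that kind of counting, assuming the ciphertext has already been split into one token per plaintext letter. The toy ciphertext and the `ngram_counts` helper are invented for the example.

```python
from collections import Counter

def ngram_counts(symbols: list[str], n: int) -> Counter:
    """Count every run of n consecutive symbols."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

# Toy ciphertext: each token stands for one plaintext letter.
cipher = "5 8 10 3 5 8 10 7 5 8 10".split()
print(ngram_counts(cipher, 3).most_common(1))  # [(('5', '8', '10'), 3)]
```

A trigram that stands out like this is a natural first candidate for a very common three-letter word.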

~~~
tptacek
In practice, you spend a minute or two moving the dials ("E", or maybe " ",
&c) and when you have the right one you can tell because the output starts
making sense. Not saying you're wrong, just that you're overthinking it. :)

~~~
tripzilch
Actually no. Trigram analysis is the best way to solve simple substitution
ciphers. Most common trigram is nearly always "THE" with a much higher
probability than single-letter trials will get you.

Sure, some trial and error will probably yield results, but what he's
describing is (part of the) systematic approach, not "overthinking".

~~~
tptacek
Um. Like. I'm sure you're right? But in practice, single-character
substitution ciphers are so trivial to solve that you can more or less try
"E", then space, then _maybe_ "T" and have it.

You don't even need to know what a "trigram" is to solve single character
substitution over English text; if you couldn't do it in a job interview in
code on a whiteboard here, that'd be a "NO HIRE".

Also note that the "most efficient" ways of "breaking" (strange word to use
when we're talking about Carmen Sandiego-grade ciphers) ciphers aren't always
the best. For instance, the best way I know to break multi-character
substitution in web apps is comically inefficient (in terms of ciphertexts
required), but fits in just a couple lines of code.

~~~
jgrahamc
Also, in this example the writer didn't use the word THE, he was writing in a
vernacular and used DA.

------
omegant
What tools do you use for this kind of deciphering? Pencil and paper, some
kind of mode for vim (or any other editor), or special software?

~~~
jgrahamc
I wrote a small program in Perl to do it. All I needed was the number
frequencies and some way of doing substitutions.
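
Those two pieces are small. A rough Python equivalent (not the Perl program itself; the message and the partial key below are invented for illustration) might look like this:

```python
from collections import Counter

def number_frequencies(numbers: list[int]) -> list[tuple[int, int]]:
    """Return (number, count) pairs, most frequent first."""
    return Counter(numbers).most_common()

def substitute(numbers: list[int], guesses: dict[int, str]) -> str:
    """Show guessed letters; leave unguessed numbers visible in brackets."""
    return "".join(guesses.get(n, f"[{n}]") for n in numbers)

message = [5, 12, 10, 3, 10]      # made-up ciphertext
guesses = {10: "E", 5: "T"}       # hypothetical partial key
print(number_frequencies(message)[0])  # (10, 2)
print(substitute(message, guesses))    # T[12]E[3]E
```

Leaving unguessed numbers visible makes it easy to see which guess to try next.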

When I did the New Scientist code breaking competition I just used an EMACS
buffer and did M-% substitutions:
<http://blog.jgc.org/2011/06/how-to-break-new-scientist-cipher.html>
When I worked on the 'Reddit code' I think I just used EMACS again:
<http://blog.jgc.org/2010/12/breaking-reddit-code.html>
When I worked on how the Zodiac Killer enciphered the 408 message I started
out by hand and then wrote a small program:
<http://blog.jgc.org/2011/06/how-zodiac-enciphered-zodiac-408-cipher.html>

For the hidden part of the GCHQ challenge that I discovered and reversed the
key for I think I wrote some code in C:
<http://blog.jgc.org/2011/12/down-gchq-rabbit-hole-or-i-think-there.html>

In general, I like to stare at things, work by hand and write code to
automate.

~~~
omegant
Thank you! Since I read "The Code Book" by Simon Sinth, all this kind of
deciphering just seems awesome to me. I'll check your articles for sure!

~~~
J3L2404
I think you mean Simon Singh, but yeah a great book.

~~~
omegant
Yep, you are right. I am reading and writing from the iPhone and made that
mistake.

------
egometry
The 'expert' comment at the end is interesting and sad... speaks to the fact
that any field that deals with non-obvious knowledge will basically be
saturated with people who present themselves as knowledgeable but provide
little utility.

------
Jach
This is awesome, both how easily it's broken and how primitive the methods
that criminals still use are. Are there literally no calculators or cell
phones that can calculate available in prison? You'd think they'd all use
<http://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange>
these days. I remember more than one documentary sensationalizing "the codes",
especially focusing on an apparently frequent use of gang-symbolism instead of
words...
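
For scale, the key exchange linked above comes down to a few modular exponentiations. A toy Python sketch with deliberately tiny numbers (the parameters and the `public_value` helper are illustrative; real use needs a very large prime):

```python
def public_value(g: int, private: int, p: int) -> int:
    """g^private mod p, the value each side sends in the clear."""
    return pow(g, private, p)

p, g = 23, 5            # toy public parameters; far too small for real use
a, b = 6, 15            # each party's private number (never sent)

A = public_value(g, a, p)   # sent by the first party
B = public_value(g, b, p)   # sent by the second party

shared_1 = pow(B, a, p)     # first party combines B with its own secret
shared_2 = pow(A, b, p)     # second party combines A with its own secret
print(A, B, shared_1, shared_2)  # 8 19 2 2
```

Note that the exchange only agrees on a shared number; a separate cipher is still needed to encrypt the message with it.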

~~~
tomjen3
I doubt you could get access to something like that in prison, but you could
at least use the Vigenère cipher. It can be done by hand, but it is still
moderately difficult.

Or be fancy and use the Solitaire algorithm. It is pretty much designed for
this case.
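
As a sketch of the Vigenère cipher mentioned above, in Python, assuming an A-Z alphabet and a key made only of letters (the function name and the sample message are chosen for the example):

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching key letter; skip non-letters."""
    sign = -1 if decrypt else 1
    out = []
    k = 0  # index into the key, advanced only on letters
    for ch in text.upper():
        if ch in ALPHABET:
            shift = ALPHABET.index(key[k % len(key)].upper())
            out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
            k += 1
        else:
            out.append(ch)
    return "".join(out)

secret = vigenere("ATTACKATDAWN", "LEMON")
print(secret)                                   # LXFOPVEFRNHR
print(vigenere(secret, "LEMON", decrypt=True))  # ATTACKATDAWN
```

By hand, the same shifts are usually read off a 26x26 letter table rather than computed.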

~~~
FreeFull
I wonder if you could change the Vigenère cipher to be homophonic. Would that
be harder to decrypt than a standard homophonic cipher?

------
jcr
I cleaned up the input a good deal by studying a much better image of the
coded message (url in code and below). I also left the lines/rows as they were
written, and added block/paragraph delimiters.

<http://pastebin.com/6EfmSpyi>

Though I might not still be alive in 25 years when he gets out of prison, I
still didn't think it would be wise to automate the correction of his spelling
and grammar. ;)

------
tripzilch
For those who enjoy this sort of thing, check out the challenges at
3564020356. First one's a simple substitution cipher (plus links to tutorials
on it--though the links are dead probably, so that's another challenge)

------
jcr
For those playing along at home, there is a better image of the complete note
in this article:

<http://www.dailymail.co.uk/news/article-2106384/Rapper-Kieron-Bryan-jailed-25-years-codebreaker-exposes-gangland-hitman.html?ito=feeds-newsxml>

The @code array in jgc's Perl script is a bit off. It seems his sister's name
might be "Koh Koh", probably pronounced like "Coco" of "Coco Chanel" fame. I
haven't found any supporting evidence, but it could explain the "6 25 4, 6 25
4," at the start of the message.

------
TazeTSchnitzel
For those outside the United Kingdom, the _Mirror_ is a tabloid...

------
jrockway
Surprisingly clean Perl script for a non-programming blog. Nice work!

~~~
jgrahamc
I am a programmer: <http://blog.jgc.org/2012/02/programmer.html> :-)

