
They Cracked This 250 Year-Old Code, And Found a Secret Society Inside - pstadler
http://www.wired.com/dangerroom/2012/11/ff-the-manuscript/all/
======
kens
I suspect there's a second code hidden in there. From the article, describing
the code symbols that are Roman letters:

    
    
        These unaccented Roman letters appeared with the frequency 
        you’d expect in a European language. But they don’t 
        represent letters—they mark the spaces between words.
    

It's implausible that these characters just happen to appear with a language-
like frequency distribution and are all meaningless spaces. I suspect they
actually have a meaning and provide a second message.

To clarify, it's like taking "SthisEisCtheRfirstEmessageT" and assuming all
the capitals just indicate spaces.
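
A minimal sketch of that toy example, assuming the same convention (capitals act as word separators, lowercase carries the visible message). The function name `split_channels` is made up for illustration; it only shows that the "spaces" can be read as a channel of their own.

```python
import re

def split_channels(ciphertext):
    """Split a string into its capital 'separator' letters and the words between them."""
    # The capitals, read in order, form the candidate second message.
    separators = "".join(ch for ch in ciphertext if ch.isupper())
    # Treating each capital as a space recovers the first message.
    words = [w for w in re.split(r"[A-Z]", ciphertext) if w]
    return separators, words

hidden, words = split_channels("SthisEisCtheRfirstEmessageT")
print(" ".join(words))  # this is the first message
print(hidden)           # SECRET
```

Whether the Copiale's Roman letters carry anything like this is speculation; the sketch only illustrates the idea.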

~~~
cbr

        It's implausible that these characters just happen to
        appear with a language-like frequency distribution and
        are all meaningless spaces
    

Really? If I were to try to pick random letters I suspect I would end up
mirroring the frequency that they appeared in English.

~~~
Groxx
Probably not. People are bad at random:
<http://scienceblogs.com/cognitivedaily/2007/02/05/is-17-the-most-random-number/>

~~~
wlievens
I think his point is valid exactly _because_ people are bad at randomness.

~~~
Groxx
/me rereads

Oh, heh, I can see it that way now. I had intended my comment to say that,
since you'd be trying to reach that set of ratios to hide things, you'd
probably fail miserably against any competent analysis.

------
danso
A wonderful read. I know a little bit about frequency analysis and was
surprised to see how straightforward its application was (in theory). I'm even
more surprised that, after a decade of Google, this approach wasn't one of the
first things tried, given the length of the text. As the OP describes, it was
only through a chance encounter at a conference that machine learning was
finally introduced into the problem. Until that point, the linguist had been
trying in vain to decipher the text...is there still such a gap between the
researchers and the computational experts who know how to implement solutions?

* to put it in a less-polite way: how the _F_ else would you solve a problem like this, with non-computational methods?

~~~
Avshalom
>Until that point, the linguist had been trying in vain to decipher the text

Well no, the linguist tried in vain to do frequency analysis by hand on ~88
symbols for ~100 pages for a couple months before saying "bugger this for a
game of soldiers" and went on with her life.

"She tried a few times to catalog the symbols, in hopes of figuring out how
often each one appeared. This kind of frequency analysis is one of the most
basic techniques for deciphering a coded alphabet. But after 40 or 50 symbols,
she’d lose track. After a few months, Schaefer put the cipher on a shelf."
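
For scale, here is roughly what that cataloguing step looks like once the symbols have been transcribed into tokens. This is a sketch under assumptions: the token names below are invented placeholders, not the actual transcription scheme used for the Copiale cipher.

```python
from collections import Counter

def symbol_frequencies(tokens):
    """Return (symbol, count, share) tuples, most frequent symbol first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(sym, n, n / total) for sym, n in counts.most_common()]

# Hypothetical transcription: one token per handwritten symbol.
transcription = "lip arr o.. lip zs lip arr ns".split()

for sym, n, share in symbol_frequencies(transcription):
    print(f"{sym:>4} {n:3d} {share:.1%}")
```

The counting is trivial for a machine; the tedious part that remains is the transcription itself.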

~~~
dyselon
Like a lot of people that played Fez, I recently did some frequency analysis
by hand, to crack the alphabet in that game. It was pretty tedious, and I
messed up frequently. I wouldn't blame her for giving up after a few mistakes.

------
Turing_Machine
The next time I'm at the eye doctor, I'm going to be wondering what that eye
chart _really_ means. :-)

Another poster mentioned the Voynich manuscript. It's available on archive.org
if anyone wants to try their hand:

<http://archive.org/details/TheVoynichManuscript>

Here's a list of others:

<http://www.omniglot.com/writing/undeciphered.htm>

------
gebe
Wow, not often accomplishments from people you actually know and have had as
teachers end up on the frontpage of HN. I was at the same talk by Kevin Knight
as Schaefer and I can vouch that it was a mighty interesting one! I
actually changed my curriculum a bit (to include cryptography) as a result of
his talk.

------
keithpeter
Good catch, nice read, with a computational angle.

Take a walk down some of the older lanes in London, say near Borough Market or
back up towards Southwark, or the other side between Brick Lane and Petticoat
Lane, and imagine yourself back in the 1700s.

Coffee houses, close groups having meetings, private rooms upstairs in narrow
houses. The feeling that _true knowledge_ was being passed on. The _meaning_
people found in the processes of the primitive technology.

It strikes me that the boring bits of the decoding (tokenising the symbols,
entering the tokens) could be farmed out using a web site hosting scans of
texts. The computational resource could perhaps be spare cycles on a PC with
an appropriate application. Scope for lay science of a particularly
interesting kind, _and_ the refinement of algorithms as they are applied to a
larger corpus of texts.

------
Leszek
> Eventually we turned to the last items in the Oculist trove: nine copies of
> a four-page document written in a mixture of old German, Latin, and the
> Copiale’s coded script. The message was more or less identical in every set.

I feel kind of sorry for them, that at the end of their journey they found
what was essentially a Rosetta Stone for the code they were decoding.

~~~
Avshalom
That sentence says the nine copies (sets) were more or less identical, not
that the German, Latin, and Copiale were translations of each other.

~~~
Leszek
Oops, you're right, parsing fail.

------
nnq
this: "The unaccented Roman letters didn’t spell out the code. They were the
spaces that separated the words of the real message, which was actually
written in the glyphs and accented text." makes me think of a cyphertext
within a cyphertext, something like an ancient form of steganography.

...maybe the symbols used as spaces are not actually random and there's
another message hidden there, with another cypher, offering the writers of
this "plausible deniability" regarding its existence: they could only give the
way to decipher the first level of encryption and say that's all there is,
while the really important information was hidden in the "space characters"...

(... now putting my tinfoil hat back in the closet :) )

------
stcredzero
Actually, they cracked a 250 year old code and found a secret society inside a
secret society. (True. Read the article!)

------
Jun8
And now if only someone cracked the Voynich manuscript!

~~~
fsiefken
Yes, unfortunately the frequency and language analysis didn't result in
anything useful except for some vague hints that the encoded language might be
Asian, perhaps written by a westerner who traveled there.

------
BerislavLopac
I'll be calling my rock band "Quiet Bulldozer". ;-)

~~~
ansgri
There's a composition I like very much, by GY!BE, "She Dreamt She Was a
Bulldozer, She Dreamt She Was Alone in an Empty Field". Maybe you could do a
similar genre?

------
tsunamifury
This introduction feels eerily similar to an opening interview at Google.

~~~
chime
How so?

------
BaconJuice
Enjoyed reading this. Thank you.

------
k2xl
Question (maybe a dumb one) but how does an algorithm account for symbols that
might mean a series of letters? Or a symbol that stands for a different letter
depending on the symbol before or after it?

~~~
shabble
In general, using _n_-grams[1], probably at the character level. (So, as the
article mentions, the bigram "ch" is common in German, and "qu" is much more
common than "qX" for any other letter _X_ in English)

You can analyse texts you believe to be similar (in language, period, subject,
etc) to the coded message you are attempting to crack, and use that to build
tables of these n-grams in various semantic units.
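
A rough sketch of that idea at the character level: build a bigram table from a reference text, then score candidate decipherments against it. The reference sentence, the function names, and the add-one smoothing are all assumptions chosen for illustration, not anything taken from the article.

```python
import math
from collections import Counter

def ngram_counts(text, n=2):
    """Count character n-grams, ignoring case and anything that is not a letter."""
    # Note: dropping spaces means n-grams may span word boundaries.
    letters = [c for c in text.lower() if c.isalpha()]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))

def log_score(candidate, table, n=2):
    """Sum of log-probabilities of the candidate's n-grams (add-one smoothed)."""
    total = sum(table.values())
    vocab = len(table) + 1  # one extra slot for unseen n-grams
    return sum(
        k * math.log((table[gram] + 1) / (total + vocab))
        for gram, k in ngram_counts(candidate, n).items()
    )

# Tiny stand-in for a real corpus in the suspected language.
reference = "the quick queen quietly quit the quarrel"
table = ngram_counts(reference)

print(table["qu"], table["qx"])
print(log_score("quit", table) > log_score("qxit", table))
```

A candidate decipherment that produces plausible bigrams scores higher than one that does not; a real attack would use a far larger reference corpus.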

Of course, these are useful in many more things than code-breaking, and Google
have various datasets they make publicly available.

The Google books ngram viewer[2] is a fun tool to play around with, or for the
more serious, you can download a corpus of ~24GB of analysed web data they've
crawled (from around 1 trillion source words)[3]

One actual example of a code constructed in the manner described is the
Playfair cipher[4] which was used for a time in the late 1800s, but is now
thoroughly broken.
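
For the curious, a compact sketch of Playfair encryption, showing how it substitutes letter pairs rather than single letters. It assumes the usual conventions (5x5 square, J folded into I, X as filler) and does not handle edge cases such as a doubled X; the key and plaintext are a commonly used textbook example, not something from the article.

```python
import string

def build_square(key):
    """Build the 5x5 key square: key letters first, then the rest of the alphabet."""
    seen = []
    for ch in (key + string.ascii_lowercase).upper():
        if ch not in string.ascii_uppercase:
            continue  # skip spaces and punctuation
        ch = "I" if ch == "J" else ch  # I and J share a cell
        if ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]

def make_digraphs(plaintext, filler="X"):
    """Split text into letter pairs, padding doubled letters and odd lengths."""
    letters = ["I" if c == "J" else c
               for c in plaintext.upper() if c in string.ascii_uppercase]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else filler
        if a == b:
            pairs.append((a, filler))  # break up a doubled letter
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs

def encrypt(plaintext, key):
    square = build_square(key)
    pos = {square[r][c]: (r, c) for r in range(5) for c in range(5)}
    out = []
    for a, b in make_digraphs(plaintext):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:    # same row: take the letter to the right
            out.append(square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5])
        elif ca == cb:  # same column: take the letter below
            out.append(square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb])
        else:           # rectangle: swap columns
            out.append(square[ra][cb] + square[rb][ca])
    return "".join(out)

print(encrypt("hide the gold in the tree stump", "playfair example"))
# BMODZBXDNABEKUDMUIXMMOUVIF
```

Because each ciphertext pair depends on two plaintext letters, single-symbol frequency counts are much less informative, which is why bigram tables are the natural tool against it.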

[1] <https://en.wikipedia.org/wiki/N-gram>

[2] <http://books.google.com/ngrams>

[3] <http://googleresearch.blogspot.co.uk/2006/08/all-our-n-gram-are-belong-to-you.html>

[4] <https://en.wikipedia.org/wiki/Playfair_cipher>

------
Roelven
Woah. Awesome story but was kinda disappointed with the ending, just leads to
more riddles & codes.

------
myWordBiLLY
This was a fun read. Thanks for sharing.

