

Machine translation cracks 18th century occult cipher - hachiya
http://www.theregister.co.uk/2011/10/27/machine_translation_cracks_occult_code/print.html

======
kleiba
I just read the original paper
(<http://aclweb.org/anthology-new/W/W11/W11-1202.pdf>).

As far as I can tell, the article by theregister is complete nonsense: the
researchers suspected a simple substitution cipher and used statistical
analysis of the distribution of (co-)occurrences of the text symbols. Once
they conjectured that the source language was German, they were able to use
letter frequency information from modern German texts to derive likely
mappings between the symbols and letters from the German alphabet.
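
To make the frequency idea concrete, here is a minimal Python sketch of rank matching: the most frequent cipher symbol is guessed to be the most frequent German letter, and so on down the list. The frequency ordering is approximate, and the function names are made up for illustration. This is only the naive starting point, not the procedure from the paper.

```python
from collections import Counter

# Approximate ordering of German letters from most to least frequent.
# Assumption for illustration: exact ranks vary from corpus to corpus.
GERMAN_BY_FREQUENCY = "enisratdhulcgmobwfkzvpjyxq"


def guess_mapping(symbols):
    """Pair cipher symbols with German letters by frequency rank."""
    ranked = [sym for sym, _ in Counter(symbols).most_common()]
    # zip stops at the shorter sequence, so extra symbols stay unmapped.
    return dict(zip(ranked, GERMAN_BY_FREQUENCY))


def apply_mapping(symbols, mapping):
    """Decode a symbol sequence, marking unmapped symbols with '?'."""
    return "".join(mapping.get(sym, "?") for sym in symbols)


if __name__ == "__main__":
    ciphertext = list("AAAABBBCCD")
    mapping = guess_mapping(ciphertext)
    print(apply_mapping(list("ABCD"), mapping))
```

On a real text the first guess is mostly wrong beyond the top few letters, which is why the intermediate output then has to be checked against actual German.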

This has nothing to do with machine translation. MT does not typically operate
on the level of letters. The only time the researchers mention machine
translation in their paper is on page 6, where they piped the partly
deciphered text into a German-to-English translation software to see where
their mapping was incorrect.

The remarkable fact is that apparently none of the researchers speaks German,
so they used such software to check if it could make sense of the sequences of
letters deciphered so far. They might as well have used an old-fashioned
dictionary to achieve the same result. On page 7 they talk about finally
talking to native speakers of German (quite late in the endeavour I think), to
help them with the remaining problems.

The register article gives the impression that the researchers sort of pasted
a transcribed version of the original document into an MT system which then
magically produced a "translation" into German autonomously. That is not at
all what happened.

~~~
pvg
The cypher is a substitution one but not a 'simple' one since multiple symbols
map to single plaintext letters and some symbols map to n-grams common in
German. The analysis is related to language identification, rather than
machine translation as a whole - even if the researchers guessed the plaintext
language correctly to begin with.
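
A toy illustration of what that means, in Python. The symbol names and the key below are invented for the example and are not the actual Copiale key. The point is only that one letter can be spread over several symbols and one symbol can stand for a whole n-gram.

```python
# Hypothetical homophonic key (invented symbol names, NOT the Copiale key).
KEY = {
    "s1": "e", "s2": "e", "s3": "e",  # several symbols share one common letter
    "s4": "n", "s5": "n",
    "s6": "i",
    "s7": "sch",                      # one symbol for a common German n-gram
}


def decode(symbols, key):
    """Decode a list of cipher symbols; unknown symbols become '?'."""
    return "".join(key.get(sym, "?") for sym in symbols)


if __name__ == "__main__":
    print(decode(["s7", "s6", "s1", "s4", "s2", "s5"], KEY))
```

Because the three "e" symbols each appear only once in that message, a plain symbol count no longer shows the tell-tale peak for "e".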

It doesn't seem particularly remarkable that none of them spoke German; they
certainly spoke languages close enough to it to recognize their initial
results were German-looking enough - at which point they enlisted the help of
German speakers. They did, in fact, just use a dictionary at one point.

The analysis is more sophisticated than simply guessing the source language
and the fact that it's a substitution-type cypher. The techniques on the
n-gram level are arguably not entirely unlike what Google uses for MT, except
that MT applies them on the word and phrase level. You're right that drawing a
direct connection between them is somewhat tendentious.
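
For a feel of what n-gram scoring looks like at the character level, here is a small Python sketch: character-bigram models with add-one smoothing, trained on two tiny made-up sample texts. The samples, names and smoothing choice are my assumptions; a real system would train on large corpora, and this is not the model from the paper.

```python
import math
import string
from collections import Counter

ALLOWED = set(string.ascii_lowercase + " ")
VOCAB = 27 ** 2  # possible bigrams over a-z plus space (umlauts ignored here)

# Tiny invented training samples, for illustration only.
GERMAN_SAMPLE = ("der mensch ist nicht das was er ist sondern das was er "
                 "sein sollte und die welt ist nicht so wie sie sein sollte")
ENGLISH_SAMPLE = ("the quick brown fox jumps over the lazy dog and the cat "
                  "sat on the mat while the rain fell on the town")


def bigrams(text):
    """Return the character bigrams of the lower-cased, filtered text."""
    clean = "".join(c for c in text.lower() if c in ALLOWED)
    return [clean[i:i + 2] for i in range(len(clean) - 1)]


def train(sample):
    """Build a (bigram counts, total count) model from a sample text."""
    counts = Counter(bigrams(sample))
    return counts, sum(counts.values())


def avg_log_prob(text, model):
    """Average add-one-smoothed log-probability of the text's bigrams."""
    counts, total = model
    grams = bigrams(text)
    if not grams:
        return float("-inf")
    return sum(math.log((counts[g] + 1) / (total + VOCAB)) for g in grams) / len(grams)


if __name__ == "__main__":
    german, english = train(GERMAN_SAMPLE), train(ENGLISH_SAMPLE)
    candidate = "das ist nicht die welt"
    print("German-looking:", avg_log_prob(candidate, german) > avg_log_prob(candidate, english))
```

The same kind of score can rank candidate decipherments: a key that yields more German-looking letter sequences gets a higher average log-probability.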

~~~
kleiba
Your first point is fair enough: it is not the "simplest" substitution cipher
one could imagine (one-to-one mapping), but I would say it's still a fairly
simple mapping.

I mentioned that none of the researchers speaks German because it is
remarkable in two senses. First, it is the only reason they had to use machine
translation in their approach at all! It was only for their testing the
quality of their intermediate results, and not - as theregister suggests - a
central part of the deciphering. Second, the fact that you don't have to speak
the target language gives some credit to their methodology, because to some
degree there's hope that the complete algorithm could be automated.
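
As a rough idea of how that sanity check could be automated without MT or a German speaker, here is a Python sketch that scores a candidate plaintext by how many of its words appear in a word list. The tiny lexicon and the function name are placeholders I made up. A real check would need a full dictionary and some handling of 18th century spelling.

```python
# Placeholder lexicon: a real check would load a full German word list.
GERMAN_WORDS = {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "von"}


def coverage(candidate_plaintext, lexicon):
    """Fraction of whitespace-separated words that occur in the lexicon."""
    words = candidate_plaintext.lower().split()
    if not words:
        return 0.0
    return sum(word in lexicon for word in words) / len(words)


if __name__ == "__main__":
    # Three of the four words are in the lexicon.
    print(coverage("das ist nicht xqz", GERMAN_WORDS))
```

A low score flags an intermediate mapping as probably wrong, which is the role the translation software played in the paper.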

~~~
tripzilch
"Simple substitution cipher" is cryptographic jargon for substituting single
characters with other characters or symbols, independent of their context or
position.

See
[http://en.wikipedia.org/wiki/Substitution_cipher#Simple_subs...](http://en.wikipedia.org/wiki/Substitution_cipher#Simple_substitution)
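
In code, a simple substitution cipher in that sense is just a fixed one-to-one table. A minimal Python sketch, with an arbitrary example key that I picked for illustration:

```python
import string

PLAIN = string.ascii_lowercase
# Arbitrary example key: any permutation of the 26 letters would do.
CIPHER = "qwertzuiopasdfghjklyxcvbnm"

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def encrypt(text):
    """Replace each letter by its fixed substitute, regardless of position."""
    return text.lower().translate(ENCRYPT_TABLE)


def decrypt(text):
    """Invert the substitution."""
    return text.translate(DECRYPT_TABLE)


if __name__ == "__main__":
    secret = encrypt("geheim")
    print(secret, decrypt(secret))
```

Every "e" always becomes the same ciphertext letter, which is exactly what makes plain frequency counting so effective against this kind of cipher.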

~~~
kleiba
Oh, nice, I didn't know that!

------
DanBC
The Register article links to three other sources. Here they are in clicky
form.

([http://www.wired.com/wiredscience/2011/10/copiale-cipher-
cra...](http://www.wired.com/wiredscience/2011/10/copiale-cipher-crack))

([http://www.eurekalert.org/pub_releases/2011-10/uosc-csc10241...](http://www.eurekalert.org/pub_releases/2011-10/uosc-csc102411.php)) > _To break the Copiale Cipher, Knight and colleagues Beáta
Megyesi and Christiane Schaefer of Uppsala University in Sweden tracked down
the original manuscript, which was found in the East Berlin Academy after the
Cold War and is now in a private collection. They then transcribed a machine-
readable version of the text, using a computer program created by Knight to
help quantify the co-occurrences of certain symbols and other patterns._

(<http://stp.lingfil.uu.se/%7Ebea/copiale/>) > _The “Copiale Cipher” is a 105
pages manuscript containing all in all around 75 000 characters. Beautifully
bound in green and gold brocade paper, written on high quality paper with two
different watermarks, the manuscript can be dated back to 1760-1780. Apart
from what is obviously an owner's mark (“Philipp 1866”) and a note in the end
of the last page (“Copiales 3”), the manuscript is completely encoded. The
cipher employed consists of 90 different characters, comprising all from Roman
and Greek letters, to diacritics and abstract symbols. Catchwords (preview
fragments) of one to three or four characters are written at the bottom of
left–hand pages._
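
The EurekAlert excerpt mentions quantifying co-occurrences of symbols. Knight's actual program isn't shown in any of these sources, so the following Python snippet is only a guess at the simplest form such a count could take: tallying how often each ordered pair of adjacent symbols appears in a transcription. The symbol names are invented.

```python
from collections import Counter


def cooccurrences(symbols):
    """Count how often each ordered pair of adjacent symbols occurs."""
    return Counter(zip(symbols, symbols[1:]))


if __name__ == "__main__":
    transcription = ["a", "b", "a", "b", "c"]  # invented symbol names
    for pair, count in cooccurrences(transcription).most_common(2):
        print(pair, count)
```

Pairs that show up far more often than chance would suggest are candidates for common letter combinations in the underlying language.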

------
Luyt
A note on the content of the decrypted book (105 pages), which according to
the Register,

 _"[...] has been revealed as the rituals and political thoughts of a German
secret society, with a strange fascination for eye surgery and
ophthalmology."_

However, the Wikipedia entry [1] makes this ritual sound more like a
fraternity/student joke:

 _"[...] an initiation ritual in which the candidate is asked to read a blank
piece of paper, and on confessing inability to do so, is given eyeglasses and
asked to try again, and then again after washing the eyes with a cloth,
followed by an "operation" in which a single eyebrow hair is plucked."_

[1] <http://en.wikipedia.org/wiki/Copiale_Cipher>

~~~
yesbabyyes
Yeah that's what it says in the book:

 _He carries him thereafter to a secondary table where, next to a lot of
candles, several instruments and eye glasses, microscopic perspective, a cloth
and a glass of water must be present. He has to lower himself on to a tabouret
and to look upon an unwritten piece of paper for a while. If, after a while,
he answers that he cannot see anything written on there, than the master of
ceremonies puts him a pair of eye glasses and asks him again if he is not able
to read the writing. Answer no. During this time the master of ceremonies
comforts him as good as he can, raises his hopes for improvement washes his
eyes with a cloth and if nothing helps, he will announce that they have to
proceed with the operation

then all those present members reach for the candles place themselves around
the candidate and the master of ceremonies [...] plucks a hair from the
eyebrow with a pair of small tweezers under constant urging, comfort and
encouragement and concludes herewith the operation _

[http://stp.lingfil.uu.se/~bea/copiale/copiale-translation.tx...](http://stp.lingfil.uu.se/~bea/copiale/copiale-translation.txt)

