
Computer scientist uncovers clue to deciphering the Voynich manuscript - kposehn
https://globalnews.ca/news/3984789/voynich-manuscript-computer-decipher-alberta/
======
rspeer
I attended the ACL talk where he presented this. While the method was
reasonably well-justified, it seems clear that the data was flawed, leading to
a flawed model, leading to a false result. The author failed to ask himself
the question "what if this didn't work?".

Here's what I would identify as the main flaw: To identify the language of a
centuries-old document, he trained his model on a _modern_ multilingual text
-- the Universal Declaration of Human Rights.

(A questioner in the audience asked "If you needed a document in hundreds of
languages, why didn't you use the Bible?", to which he had no real answer.)

So he ran this somewhat anachronistic model and it told him that the text was
a good statistical match for being in... Esperanto. He threw this prediction
out on the basis that it makes absolutely no sense. You'd think this would be
the first sign that the model did not fit the data.

The #2 prediction was Modern Hebrew, which of course did not exist in the 15th
century, so he said "that's close enough to biblical Hebrew" and claimed that
as the result.

He asked a Hebrew speaker he knew to decipher a couple of sentences based on
the model. The Hebrew speaker could not actually decipher them. The author
pressed on and the Hebrew speaker obligingly produced some unsatisfying
guesses at sentences, which sound remarkably like sentences from unsuccessful
attempts to decipher Linear B texts.

And I don't understand why anyone would so confidently report a modification
of the #2-ranked prediction of the model. We might as well go ahead and say
"AI confirms: the Voynich Manuscript is in ancient Esperanto!"

The much more realistic interpretation is that the Voynich text matches the
statistics of _none_ of the language samples that the model was trained to
recognize, and the model is overconfident in the probabilities it outputs.
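
To see why a closed set of candidate languages forces an answer, here's a toy sketch (this is _not_ the paper's actual method; just smoothed character-bigram models over made-up snippets standing in for the UDHR translations). Even on gibberish that matches nothing, normalizing the log-likelihoods still crowns a confident "winner":

```python
import math
from collections import Counter

def bigram_model(text, alpha=0.5):
    """Character-bigram probabilities with add-alpha smoothing."""
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    vocab = len(set(text)) ** 2
    return lambda a, b: (pairs[(a, b)] + alpha) / (total + alpha * vocab)

def log_likelihood(model, text):
    return sum(math.log(model(a, b)) for a, b in zip(text, text[1:]))

# Toy "training corpora" standing in for the multilingual training text.
samples = {
    "english": "all human beings are born free and equal in dignity and rights",
    "latin":   "omnes homines dignitate et iuribus pares liberique nascuntur",
}
models = {lang: bigram_model(t) for lang, t in samples.items()}

# Gibberish (EVA-flavored) that matches *none* of the training languages.
mystery = "qokeedy qokeedy dal qokedy qokedy"

scores = {lang: log_likelihood(m, mystery) for lang, m in models.items()}

# Softmax turns log-likelihoods into "probabilities" that must sum to 1,
# so the best of a set of bad fits still looks confident.
z = max(scores.values())
probs = {lang: math.exp(s - z) for lang, s in scores.items()}
total = sum(probs.values())
probs = {lang: p / total for lang, p in probs.items()}
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

The normalized probabilities always sum to 1, so a merely _relative_ best fit masquerades as an absolute one; "none of the above" is never an option.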

I think the fact that the method was cool must have been part of the paper's
acceptance at ACL. But it does not give us a real answer about the Voynich
manuscript.

~~~
eindiran
I studied that manuscript a bit at university, and the questioner in the
audience was definitely on the right track wrt the Bible being valuable as a
source of training data. We trained on the pre-15th century translations of
the Bible in a number of languages. Off the top of my head, there was Old
Church Slavonic, Vulgate Latin, and Biblical Hebrew, plus a number of other
languages people had proposed over the years, but the statistics from Voynich
Herbal A and Herbal B were never very similar to any of the languages. The
closest fit we found was for Manchu, trained using a Manchu translation of the
'Dao de Jing', which made us hopeful for Basanik's Manchu Hypothesis or a
modified Proto-Manchu hypothesis
([http://www.ic.unicamp.br/~stolfi/voynich/04-05-20-manchu-
the...](http://www.ic.unicamp.br/~stolfi/voynich/04-05-20-manchu-theo/)) but
as we dug deeper, the n-gram frequencies ended up being wildly off.
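
For the curious, that kind of comparison can be sketched in a few lines (placeholder snippets below, not the actual corpora we used): compute relative character n-gram frequencies per text, then take the cosine similarity of the frequency vectors.

```python
import math
from collections import Counter

def ngram_freqs(text, n=2):
    """Relative character n-gram frequencies of a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(p[g] * q.get(g, 0.0) for g in p)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(p) * norm(q))

# Placeholder snippets; the real comparison would use the EVA
# transcription of Herbal A/B against full pre-15th-century Bible texts.
voynich_eva = "fachys ykal ar ataiin shol shory cthres y kor sholdy"
latin_vulgate = "in principio creavit deus caelum et terram"

sim = cosine(ngram_freqs(voynich_eva), ngram_freqs(latin_vulgate))
print(round(sim, 3))
```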

Over the years, I have become more and more doubtful that the Voynich Manuscript
is natural language at all. Hopefully someone will come along and prove me
wrong, but I stopped hoping quite a while ago that any of these headlines is
_the answer_.

~~~
IntronExon
You might enjoy this: [https://postbarthian.com/2018/01/10/review-david-
bentley-har...](https://postbarthian.com/2018/01/10/review-david-bentley-
harts-new-testament-translation/)

It’s a bit off topic, but in the spirit.

------
ageitgey
This is a bad article like most mainstream media writing on ML/AI. The
original is a bit better and has an interview with him:
[https://globalnews.ca/news/3984789/voynich-manuscript-
comput...](https://globalnews.ca/news/3984789/voynich-manuscript-computer-
decipher-alberta/)

He _only_ claims to have potentially identified the document's language as a
form of Hebrew using a language detection model trained on UN documents. That
let him incidentally decipher a few lines by literally typing them into Google
Translate, but he doesn't claim to be an expert on Hebrew or have the ability
to translate the whole document. He's looking for Hebrew experts to test his
theory.

But by the time the story makes it into its third re-write on a chain of bad
news websites, it's "Man claims his AI deciphers unbreakable code that stumped
Enigma codebreakers" or whatever.

~~~
DonaldFisk
The articles in the Canadian press are just clickbait. Their paper is
[https://transacl.org/ojs/index.php/tacl/article/view/821/174](https://transacl.org/ojs/index.php/tacl/article/view/821/174)

and the claims are somewhat more modest.

The language identification method they use (does it count as AI?) assumes
that the Voynich Manuscript text is in a language they check for, the words of
which may have been anagrammed. There is no consensus that this is the case.
The Voynich Manuscript could be entirely meaningless, or encrypted using a
different mechanism, or be written in a language they don't check for. They
then find the closest match. The use of anagrams adds an extra degree of
freedom, which makes finding spurious matches more likely.
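
A toy illustration of that extra degree of freedom (a tiny made-up English lexicon, not their actual Hebrew dictionary): once a cipher word is treated as an unordered bag of letters, every dictionary word sharing those letters becomes a candidate reading.

```python
from collections import defaultdict

# Small illustrative word list; a real test would use a full lexicon.
lexicon = ["dog", "god", "cat", "act", "stop", "pots", "tops",
           "spot", "post", "opts", "least", "steal", "tales", "slate"]

# Index words by their sorted-letter "anagram key": any cipher token
# whose letters sort to that key matches every word in the bucket.
by_key = defaultdict(list)
for w in lexicon:
    by_key["".join(sorted(w))].append(w)

for key, words in by_key.items():
    if len(words) > 1:
        print(key, words)

# The largest class here gives 6 candidate readings for one token, so
# an anagram decoder gets 6 chances per word to find something
# "meaningful" -- and the odds compound across a whole line.
largest = max(len(ws) for ws in by_key.values())
```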

So they identified Hebrew, which is at least plausible. They then translate
_one line_ of the manuscript, _using Google Translate_, into English.

There was no control. They should have applied their method to text known to
be meaningless, and questioned their approach if meaning was found. Gordon
Rugg (whom they cite) and myself (here:
[http://web.onetel.com/~hibou/voynich/generated-voynich-
manus...](http://web.onetel.com/~hibou/voynich/generated-voynich-
manuscript.html) ) have generated meaningless Voynichese text.
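
As a rough sketch of how such meaningless-but-plausible text can be generated (a simple character-level Markov chain over EVA-style tokens; much cruder than Rugg's table-and-grille method, but enough to mimic the word statistics):

```python
import random
from collections import defaultdict

random.seed(1)  # deterministic output for the example

# Seed "Voynichese" tokens in EVA transliteration style.
seed = ("daiin shol chol qokeedy okaiin chedy qokain shedy "
        "daiin cthor okal dar qokedy chor shol daiin").split()

# First-order Markov chain over characters within words;
# "^" and "$" mark word start and end.
chain = defaultdict(list)
for word in seed:
    for a, b in zip("^" + word, word + "$"):
        chain[a].append(b)

def fake_word():
    out, c = [], random.choice(chain["^"])
    while c != "$":
        out.append(c)
        c = random.choice(chain[c])
    return "".join(out)

line = " ".join(fake_word() for _ in range(8))
print(line)
```

Feed a page of that to any closed-set language identifier and it will still dutifully name a "closest" language.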

I haven't applied their entire approach, but Google Translate translates the
first line of my generated manuscript, when the language is identified as
Bengali, as "Do not worry about the fact that the person is very friendly".

~~~
sekh60
Brb, writing an article about how Gordon Rugg and Donald Fisk's previously
thought-to-be-meaningless text was deciphered by an AI. Did they channel
ancient Bengali spirits when writing it? Read more to find out.

------
anotheryou
I'm quite confident in this guy's approach:
[https://www.youtube.com/watch?v=4cRlqE3D3RQ&feature=youtu.be](https://www.youtube.com/watch?v=4cRlqE3D3RQ&feature=youtu.be)

Even if he's wrong, it's super cool to watch his explanation. It makes so
much sense.

I doubt any AI can do better than him here: there are so many vague clues you
have to get to the bottom of, and loads of slight matches, because it's an old
language that only has relatives, but no equivalents, today.

~~~
fapjacks
Yes! I came here to post this. He builds on work by Stephen Bax, widely
recognized in the "Voynich community" as the most insightful. As a
trained linguist, this guy does the grunt work to build a model, which he then
uses (in his "Voynich update" video) to predict an outcome successfully. That
is a huge indicator that he's onto something. What a fascinating series of
videos!

~~~
anotheryou
exactly. Stephen Bax was the most promising before Volder Z took it this step
further :)

I'm so glad someone finally shares my enthusiasm; I have high hopes of seeing
a full translation in my lifetime. [SPOILERS:] It sounded like someone just
needs to find some old, knowledgeable Sinti/Roma.

------
perone
In the past I trained a word2vec model on the EVA transcription, in case
anyone is interested: [http://blog.christianperone.com/2016/01/voynich-
manuscript-w...](http://blog.christianperone.com/2016/01/voynich-manuscript-
word-vectors-and-t-sne-visualization-of-some-patterns/)

------
dbatten
Unrelated to the content of the story, but interesting nonetheless...

While reading the article (which was low quality with a flawed attention-
grabbing headline, as others have pointed out), I noticed 3 consecutive
articles linked on the right side were for porn-related stories. I don't know
too much about the International Business Times, but the name would have made
me think it was a relatively upstanding news source. Plus, the .co.uk always
adds a bit of extra class for us Americans.

So, I looked the place up on Wikipedia[1]. Here's what I found...

"In late 2011, Google allegedly moved the outlet's articles down in search
results in response to excessive search engine optimization activity."

"Reporting in 2014, Mother Jones claimed that IBT journalists are subject to
constant demand to produce clickbait; one former employee reportedly
complained that management issued 'impossible' demands, including a minimum of
10,000 hits per article, and fired those who couldn't deliver."

And it goes on...

So, yep, pretty much a click-bait factory. No wonder the story is so mangled.

[1]
[https://en.wikipedia.org/wiki/International_Business_Times](https://en.wikipedia.org/wiki/International_Business_Times)

------
dTal
Sigh. Call me when there's a translation.

------
jwildeboer
TL;DR it didn’t.

------
reificator
I don't think anyone is going to get further in deciphering this thing than
Randall Munroe:

[https://xkcd.com/593/](https://xkcd.com/593/)

~~~
forgot-my-pw
Not solved, but this one sounds pretty close: [https://www.the-
tls.co.uk/articles/public/voynich-manuscript...](https://www.the-
tls.co.uk/articles/public/voynich-manuscript-solution/)

Article: [https://arstechnica.com/science/2017/09/the-mysterious-
voyni...](https://arstechnica.com/science/2017/09/the-mysterious-voynich-
manuscript-has-finally-been-decoded/)

~~~
rnhmjoj
This is my favorite translation: [https://youtu.be/lhtZc-
nFNt0](https://youtu.be/lhtZc-nFNt0)

