
Unsupervised Machine Translation Using Monolingual Corpora Only [pdf] - stablemap
https://arxiv.org/abs/1711.00043
======
mabbo
Neat! If I'm reading this right (smarter people please correct me if I'm
wrong), the process they used is:

1- Train a system that translates language A sentences into a representation
space, and can translate back from that space.

2- Train a second system that does the same, but with language B, onto the
same representation space.

3- Train an adversarial system that looks at the representation space and
tries to identify which language a sentence came from, while retraining the
two translation systems to fool that recognizer.

The best way for the translation systems to 'hide' from the recognizer is to
produce very similar distributions, using similar points in the
representation space for the same concepts.
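
A minimal sketch of that adversarial step, in toy PyTorch (the names enc_a,
enc_b and disc, the feature sizes, and the random "sentences" are all
invented for illustration; the paper's full model also has denoising and
cross-domain reconstruction losses):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    DIM = 64  # size of the shared representation space

    # Stand-ins for the two encoders from steps 1 and 2 (the real ones are
    # seq2seq autoencoders; these are placeholders just to show the losses).
    enc_a = nn.Sequential(nn.Linear(100, DIM), nn.Tanh())
    enc_b = nn.Sequential(nn.Linear(100, DIM), nn.Tanh())

    # Step 3: the recognizer tries to tell which language a code came from.
    disc = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_enc = torch.optim.Adam(list(enc_a.parameters()) + list(enc_b.parameters()))
    opt_disc = torch.optim.Adam(disc.parameters())

    for step in range(1000):
        xa = torch.randn(32, 100)  # toy batch of language-A sentence features
        xb = torch.randn(32, 100)  # toy batch of language-B sentence features
        za, zb = enc_a(xa), enc_b(xb)
        ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

        # Recognizer: learn to label A-codes 1 and B-codes 0.
        d_loss = (F.binary_cross_entropy_with_logits(disc(za.detach()), ones)
                  + F.binary_cross_entropy_with_logits(disc(zb.detach()), zeros))
        opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

        # Encoders: flip the labels to fool it, pushing both languages toward
        # the same distribution over the shared space.
        g_loss = (F.binary_cross_entropy_with_logits(disc(za), zeros)
                  + F.binary_cross_entropy_with_logits(disc(zb), ones))
        opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()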

Brilliant stuff.

~~~
OscarCunningham
I wonder if this relies on the sentences of the corpora referencing similar
concepts. I'd be interested to see if it worked on a corpus of modern English
and a corpus of classical Latin or Greek.

~~~
visarga
> the model starts with an unsupervised naive translation model obtained by
> making word by word translation of sentences using a parallel dictionary
> learned in an unsupervised way

So they DID have a parallel dictionary at first. How they learned it is
another problem.
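
For the curious: the dictionary itself comes from aligning two monolingual
word-embedding spaces without supervision. A toy sketch of just the
word-by-word translation step, with the aligned embeddings faked (real
lookups use CSLS rather than plain cosine):

    import numpy as np

    vocab_a = ["the", "cat", "sleeps"]
    vocab_b = ["le", "chat", "dort"]
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(len(vocab_a), 50))
    # Pretend the unsupervised alignment already mapped B's embeddings next
    # to their A translations; in reality this is the hard, learned part.
    emb_b = emb_a + 0.01 * rng.normal(size=emb_a.shape)

    def translate_word(w):
        v = emb_a[vocab_a.index(w)]
        # Nearest neighbour (cosine) in the other language's embedding space.
        sims = emb_b @ v / (np.linalg.norm(emb_b, axis=1) * np.linalg.norm(v))
        return vocab_b[int(np.argmax(sims))]

    print(" ".join(translate_word(w) for w in "the cat sleeps".split()))
    # -> "le chat dort" in this rigged toy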

------
canjobear
It seems like there's a hard limit on how well this approach can work, defined
by the extent to which the distribution of words in texts uniquely identifies
their meaning without reference to any external semantics.

With monolingual corpora and no semantic grounding, all you can say is that
certain words (or letters, morphemes, or whatever) covary in certain ways that
are similar to the ways in which other words covary in the other language.
It's super interesting that this is enough to identify some translation pairs
across languages.
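
A toy illustration of how little there is to go on: within-language
co-occurrence statistics are the whole signal. (The parallel corpora, window
size, and perfect correspondence here are rigged for illustration.)

    import numpy as np

    def cooc(corpus, vocab):
        idx = {w: i for i, w in enumerate(vocab)}
        m = np.zeros((len(vocab), len(vocab)))
        for sent in corpus:
            ws = sent.split()
            for i, w in enumerate(ws):
                for u in ws[max(0, i - 2):i + 3]:  # +/- 2-word window
                    if u != w:
                        m[idx[w], idx[u]] += 1
        return m

    en = ["the cat sleeps", "the dog sleeps", "the cat eats"]
    fr = ["le chat dort", "le chien dort", "le chat mange"]
    m_en = cooc(en, ["the", "cat", "dog", "sleeps", "eats"])
    m_fr = cooc(fr, ["le", "chat", "chien", "dort", "mange"])

    # Under the correct word pairing the two matrices coincide; finding that
    # pairing from noisy, non-unique statistics is exactly the hard part.
    print(np.array_equal(m_en, m_fr))  # True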

~~~
OscarCunningham
In particular I wonder if it can get "left" and "right" the correct way
around.

~~~
schoen
I'd imagine you can potentially get that through connections with strength,
weakness, dexterity, etc.

~~~
harperlee
I'm not sure I understand you... But the parent's comment is quite
interesting: how could the system end up with a correct left/right
understanding of our world, instead of a mirror world? Without experiencing
the outside world, it seems impossible!

~~~
pathsjs
The remark is that the two words, while apparently interchangeable, are not
interchangeable in actual use. One is often used with a positive meaning
related to dexterity and so on. They also have different political meanings.
So they are not at all symmetric in actual use, much like any other word
pair.

~~~
OscarCunningham
Not all of those differences are cross-cultural.

~~~
schoen
They definitely work in a lot of languages.

[https://en.wiktionary.org/wiki/derecho#Spanish](https://en.wiktionary.org/wiki/derecho#Spanish)
[https://en.wiktionary.org/wiki/direito](https://en.wiktionary.org/wiki/direito)
[https://en.wiktionary.org/wiki/diritto](https://en.wiktionary.org/wiki/diritto)
[https://en.wiktionary.org/wiki/sinister#Latin](https://en.wiktionary.org/wiki/sinister#Latin)
[https://en.wiktionary.org/wiki/dexter#Latin](https://en.wiktionary.org/wiki/dexter#Latin)
[https://en.wiktionary.org/wiki/laevus#Latin](https://en.wiktionary.org/wiki/laevus#Latin)
[https://en.wiktionary.org/wiki/%CE%BB%CE%B1%CE%B9%CF%8C%CF%8...](https://en.wiktionary.org/wiki/%CE%BB%CE%B1%CE%B9%CF%8C%CF%82#Ancient_Greek)
[https://en.wiktionary.org/wiki/recht#German](https://en.wiktionary.org/wiki/recht#German)

and many languages related to these. Some Slavic languages have a close
cognate between words meaning 'right (direction)' and words meaning 'right
(correct)', although I didn't immediately find examples in current use on
Wiktionary where the two forms are identical. A "dated" example is Polish

[https://en.wiktionary.org/wiki/prawy#Polish](https://en.wiktionary.org/wiki/prawy#Polish)

Edit: a Finnish friend pointed out that this works in Finnish, which is
European but not related to any of the other languages above.

[https://en.wiktionary.org/wiki/oikea#Finnish](https://en.wiktionary.org/wiki/oikea#Finnish)

I don't know if these linguistic phenomena definitely occur in non-European
languages, but I think that many people in the Middle East, many parts of
Africa, and South Asia consider the left hand unlucky or impure not only
because it's weaker in most people but also because it's conventionally used
when going to the toilet. It would be a little surprising to me if that
attitude didn't show up in languages from those places too.

Edit: it looks like it works in Arabic too

[https://en.wiktionary.org/wiki/%D8%B4%D9%85%D8%A7%D9%84#Arab...](https://en.wiktionary.org/wiki/%D8%B4%D9%85%D8%A7%D9%84#Arabic)
[https://en.wiktionary.org/wiki/%D8%A3%D9%8A%D8%B3%D8%B1#Arab...](https://en.wiktionary.org/wiki/%D8%A3%D9%8A%D8%B3%D8%B1#Arabic)
[https://en.wiktionary.org/wiki/%D8%A3%D9%8A%D9%85%D9%86#Arab...](https://en.wiktionary.org/wiki/%D8%A3%D9%8A%D9%85%D9%86#Arabic)

------
nl
This is pretty interesting, and something I've played around with too
(although not to the extent they have - I was playing with aligning word
embeddings and using them for cross-lingual tasks).
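
For the curious, one standard recipe for that alignment is orthogonal
Procrustes over a seed dictionary of word pairs; here's a toy sketch with a
synthetic rotation standing in for real embeddings:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 300))        # source-language seed-word vectors
    true_W = np.linalg.qr(rng.normal(size=(300, 300)))[0]
    Y = X @ true_W                         # target vectors (toy: exact rotation)

    # W = argmin ||XW - Y||_F over orthogonal W, solved by SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt

    print(np.allclose(X @ W, Y))  # True: source space mapped onto target space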

Google released a paper[1] doing zero-shot translation between unseen pairs.
That relied on a shared representation which they called an "interlingua",
and that seems quite similar to this.

[1] [https://research.googleblog.com/2016/11/zero-shot-translatio...](https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html)

~~~
unhammer
[https://en.wikipedia.org/wiki/Interlingual_machine_translati...](https://en.wikipedia.org/wiki/Interlingual_machine_translation)
is an old concept; it used to be seen as The Right Way (as opposed to the
pragmatic way) of doing MT back in the day, before the success of (very
pragmatic, low-knowledge) statistical approaches.

I find it interesting that in Google's system, they didn't set out to build an
interlingua, but instead built a system that happened to end up with something
like one …

------
unhammer
This is pretty cool! But I would've liked to see some eval on smaller datasets
(or just a graph of BLEU vs training set size), since the main use for
something like this is where you don't have parallel data, in which case the
monolingual resources are likely to be much smaller too. A few years back I
was trying different methods to create bilingual dictionary word-pair
candidates (to be post-edited by a linguist – checking is faster than writing)
for Saami languages, and tried aligning pure word vectors, but the existing
corpora were too small to get good vectors, so I gave up that avenue :(

Another neat approach is [https://www.isi.edu/natural-language/mt/RLILdecipher.pdf](https://www.isi.edu/natural-language/mt/RLILdecipher.pdf),
specifically designed for low-resource languages that happen to have a
higher-resource related language (e.g. Haitian Creole–French). They train a
character-based "cipher model" on the source language (a weighted
finite-state transducer that turns source characters/character pairs into
target characters/character pairs) so as to maximise probability under a
target-language language model.
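
A toy sketch of that objective (the real system is a weighted FST trained
with EM; here it's just a hand-made substitution table scored by a smoothed
character-bigram LM, and the example strings are invented):

    import math
    from collections import Counter

    target_corpus = "le chat dort le chien dort"
    bigrams = Counter(zip(target_corpus, target_corpus[1:]))
    unigrams = Counter(target_corpus)

    def lm_logprob(text, alpha=0.1):
        # Add-alpha smoothed character-bigram log-probability.
        lp = 0.0
        for a, b in zip(text, text[1:]):
            lp += math.log((bigrams[(a, b)] + alpha)
                           / (unigrams[a] + alpha * len(unigrams)))
        return lp

    def score_cipher(source_text, table):
        deciphered = "".join(table.get(c, c) for c in source_text)
        return lm_logprob(deciphered)

    # Training searches over tables to maximise this score; here we just
    # compare two hand-made tables on a made-up related-language string.
    good = {"k": "c"}  # hypothetical source language spells "chat" as "khat"
    bad = {"k": "z"}
    print(score_cipher("le khat dort", good) > score_cipher("le khat dort", bad))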

------
zardo
What an awesome idea. I wonder how well this technique would work for
different data types - text and audio, for example.

~~~
DonaldPShimoda
Most audio translation is first transcribed to a text representation, I think,
and then _that_ text is what's translated. If you assume a perfect speech
recognizer (for the sake of argument), then the end translation should be the
same.

Or am I misunderstanding your point?

~~~
zardo
>Or am I misunderstanding your point?

Only that you are talking about specifics, while I was talking in generality.
(IMO) unsupervised learning of semantic relationships is a 'big deal'. This
seems to be a very general method that would extend to any pair of data
streams, but that hasn't been proven out yet.

------
matheist
Feels reminiscent also of CycleGAN
([https://github.com/junyanz/CycleGAN](https://github.com/junyanz/CycleGAN))

------
BoiledCabbage
At first I thought this was really insightful, but now it seems like just a
redefinition of meaning.

An analogy is an experiment where a scientist says "I can prove these two
people have ESP and can pick the same #". He sits them in two separate rooms
and has the first person pick a number. He then goes to the second room and
has the second person guess it. 17? No, higher. 24? No, higher. 52? No,
lower. 43? Yes, exactly!!

"See, these two people both came up with the same number without any
communication with each other. ESP!"

If they want to show this language learning is powerful, demonstrate that once
trained it can now be applied to a third language without any new "adversarial
feedback".

Or it reminds me of the fake "we can communicate faster than light via
quantum entanglement" claims, where the caveat turns out to be that you can
only learn what the value of the communicated bit was through a causal
(classical) channel.

~~~
benkuhn
I think you missed something.

Previously, machine translation required being trained on a bilingual corpus,
that is, a corpus of _the same set of sentences_ in e.g. English and French.
These corpora are pretty hard to come by and expensive to produce.

The paper describes a technique to use two monolingual corpora instead, i.e.
one set of sentences in English and a _different_ set of sentences in French.
That's _way_ easier to find.

It's far from just a definitional trick.

~~~
JPLeRouzic
>> These corpora are pretty hard to come by and expensive to produce.

Actually there are a lot of texts translated by qualified translators into
several languages, for political reasons: the EU Commission's websites, and
perhaps some other countries' official websites.

You can have a look at Linguee [0], which uses these to provide translation
suggestions:

[0] [https://www.linguee.com/](https://www.linguee.com/)

