
Facebook's AI Just Set a New Record in Translation - wfalcon
https://www.forbes.com/sites/williamfalcon/2018/09/01/facebook-ai-just-set-a-new-record-in-translation-and-why-it-matters/
======
delhanty
The referenced link to the original work on Facebook code looks more suitable
for HN:

[https://code.fb.com/ai-research/unsupervised-machine-
transla...](https://code.fb.com/ai-research/unsupervised-machine-translation-
a-novel-approach-to-provide-fast-accurate-translations-for-more-languages/)

~~~
gwern
And was already submitted:
[https://news.ycombinator.com/item?id=17886827](https://news.ycombinator.com/item?id=17886827)

------
eganist
Heads up for those of you obliging Forbes' forced resistance against
adblocking:

There's (edit: what appears to be) an active exploit in their ad network, one
that's getting around Chrome's redirect blocking through an apparent 0day.

[https://imgur.com/a/sRIB7pn](https://imgur.com/a/sRIB7pn)

I'm on Chrome Beta 69.0.3497.53 on Android, so this may not apply outside
that.

Chrome team:
[https://bugs.chromium.org/p/chromium/issues/detail?id=879938](https://bugs.chromium.org/p/chromium/issues/detail?id=879938)

~~~
jdangu
Chrome's protection only works in cross-origin iframes [1] and has been in
beta for years. I haven't checked in a while but can't find a source that
confirms that it went live.

Forbes serves a large portion of their ads in same origin iframes and so is
not fully covered by this protection.

[1] [https://blog.chromium.org/2017/11/expanding-user-
protections...](https://blog.chromium.org/2017/11/expanding-user-protections-
on-web.html)

------
throwaway2246
> Instead of giving the system whole words, they give the system the word in
> parts. For example, the word “hello” might be given as 4 word parts “he” “l”
> “l” “o”. This means we could learn a translation for the word “he” without
> the system ever having seen the word “he”.

Can anyone add context to this? Can't seem to wrap my head around this part.
Doesn't "he" as a part of a word translate differently in different words?

~~~
slashcom
An easier way to understand it is in the context of morphology: word prefixes
and suffixes mean things, and words have common roots.

For example, polymorphism could be decomposed into poly-morph-ism.
Antidisestablishmentarianism, which is unlikely to appear much in the corpus,
becomes anti-dis-establish-ment-arian-ism. Now the system can learn how to
reuse "anti-" or "establish" from other examples more easily than trying to
learn the full word's meaning from the one or two examples it might see in the
corpus.

BPE (byte-pair encoding) is a clever way to induce these sorts of
decompositions automatically without any linguistic annotation, which makes it
useful in multilingual settings. Other languages are much more morphologically
rich than English, and there it _really_ helps.
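
To make the idea concrete, here is a minimal sketch of how BPE merges can be
learned, assuming a toy word-frequency table. The function names (`learn_bpe`,
`merge_pair`, `segment`) and the corpus are made up for illustration, and this
is not Facebook's actual pipeline. It also skips the end-of-word marker that
real implementations usually add.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: count} table (toy sketch)."""
    # Every word starts out as a sequence of single characters.
    vocab = {tuple(word): count for word, count in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often the word occurs.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {tuple(merge_pair(list(s), best)): c for s, c in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    # Hypothetical toy corpus; the counts are invented for illustration.
    corpus = {"establish": 6, "establishment": 4, "antisocial": 3, "antibody": 3}
    merges = learn_bpe(corpus, num_merges=20)
    print(segment("antiestablishment", merges))
```

The point is that frequent character sequences get merged into reusable units,
so a rare word can be split into pieces the system has seen elsewhere. Which
pieces you actually get depends on the corpus and the number of merges.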

~~~
jstandard
Thanks, this makes much more sense than the author's strange example.

------
pnloyd
>> For example, “translate” between neural activity in the brain to videos on
a screen,

That sounds almost too good to be true. Excited to see what gets developed
with these techniques!

------
gok
In unsupervised translation, to be specific.

------
personjerry
This seems like it would only work for similar languages (e.g. Romance
languages), in that it depends on the embedding spaces of the two languages
being similar.

~~~
maneesh
Is Urdu a Romance language? Or closely related to English?

~~~
lgessler
It's a distant relative: Urdu is Indo-Aryan, which is the language family
Sanskrit belonged to, and Indo-Aryan is in the Indo-European language family,
to which English and Romance languages also belong.

GP's point is still a good one though: while Urdu and English have diverged
quite a bit despite being of the same stock, they probably still share a lot
more typologically than, say, English and Mandarin or Warlpiri.

------
DataJunkie
Ok, but do they actually use it for anything?

