
Zero-shot transfer across 93 languages - moneil971
https://code.fb.com/ai-research/laser-multilingual-sentence-embeddings/
======
minimaxir
> The encoder is five-layer bidirectional LSTM (long short-term memory)
> network. In contrast with neural machine translation, we do not use an
> attention mechanism but instead have a 1,024-dimension fixed-size vector to
> represent the input sentence.

5 layers of 1024-cell _bidirectional_ LSTMs (edit: actually 512-cells x2?)?
Can consumer GPUs even fit that (+ the decoder) into RAM?

~~~
pmalynin
It's not 1024 cells. The size of the hidden vector is 1024, which puts each
weight matrix on the order of megabytes (a 1024x1024 matrix of 32-bit floats
is 4 MB). Here they have a cell per word, which is reasonable.
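
Back-of-envelope arithmetic answers the GPU-RAM question upthread. This sketch uses the figures discussed in the thread (5 bidirectional layers, 512 hidden units per direction), plus an assumed 320-dim input embedding and 32-bit floats; both of those are assumptions for illustration, not figures from the post.

```python
# Rough parameter/memory estimate for the encoder described above.
# ASSUMPTIONS (not from the post): embedding dim 320, fp32 weights.

def lstm_params(input_size, hidden_size):
    # One LSTM direction has 4 gates; each gate has an input-to-hidden
    # matrix, a hidden-to-hidden matrix, and two bias vectors.
    return 4 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)

embed_dim, hidden, layers = 320, 512, 5
total = 0
for layer in range(layers):
    # Layer 0 sees the token embeddings; upper layers see the
    # concatenated forward+backward outputs (2 * hidden) of the layer below.
    in_size = embed_dim if layer == 0 else 2 * hidden
    total += 2 * lstm_params(in_size, hidden)  # x2 for the two directions

print(f"~{total:,} parameters, ~{total * 4 / 2**20:.0f} MB in fp32")
```

At roughly 29M parameters (~109 MB in fp32 under these assumptions), the encoder's weights alone fit comfortably in consumer GPU memory; activations and the decoder add more, but not prohibitively.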

~~~
minimaxir
Max-Pooling a 1024-cell output LSTM will result in a 1024-sized vector.

Looking at the code for the Encoder
([https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...](https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2bea9d127e8252b10e87278586c0/source/embed.py#L169)),
each LSTM has the same number of hidden cells (although the default
parameters of that class don't quite match the ones used in the post, so I
_assume_ it's 512x2x5).

~~~
pmalynin
Yes, but they're not max-pooling over the last dimension. They're max-pooling
over the sequence length [0] (the other way doesn't really make sense in this
context).

The output size is 1024, the hidden vector size is 512, but they're using
bidirectional LSTMs, which concatenate the outputs of each direction -- so the
total is 1024 [1].

[0]
[https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...](https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2bea9d127e8252b10e87278586c0/source/embed.py#L246)

[1]
[https://pytorch.org/docs/stable/nn.html#lstm](https://pytorch.org/docs/stable/nn.html#lstm)
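
A quick numpy sketch of the shape bookkeeping being described, with random arrays standing in for the actual LSTM outputs (the 512-per-direction figure follows the thread's assumption about the model):

```python
# A bidirectional LSTM with hidden size 512 emits, at every timestep,
# the concatenation of the forward and backward states (512 + 512 = 1024).
# Max-pooling over the *sequence* axis then yields a fixed 1024-dim
# sentence vector regardless of sentence length.
import numpy as np

seq_len, hidden = 7, 512                 # a 7-token sentence, 512 units/direction
fwd = np.random.randn(seq_len, hidden)   # forward-direction outputs
bwd = np.random.randn(seq_len, hidden)   # backward-direction outputs

outputs = np.concatenate([fwd, bwd], axis=1)  # shape (seq_len, 1024)
sentence_emb = outputs.max(axis=0)            # pool over sequence length

print(outputs.shape, sentence_emb.shape)  # (7, 1024) (1024,)
```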

~~~
minimaxir
Gotcha, that makes sense. (I'm less familiar with dimension ordering in
PyTorch)

------
nograpes
I think the English translations of the Hindi and the Bulgarian are mixed up.
The Hindi should be "Their destination was secret", and the Bulgarian should
be "Nobody knew where they went". Also, the Devnagari script is not rendered
properly; the diacritical marks should be directly over (or under) the related
character, and conjunct characters are not "squished" together.

~~~
thaumasiotes
> The Hindi should be "Their destination was secret", and the Bulgarian should
> be "Nobody knew where they went".

Those aren't good translations of each other (barring a helpful context); to
me, "their destination was secret" states that the _goal_ was for nobody to
know where they went (with a weak implication that the goal was achieved),
whereas "nobody knew where they went" states that, in reality, nobody knew
where they went (and says nothing about whether that was intentional).

~~~
yorwba
> Those aren't good translations of each other

That's because they're not translations. The table in the article makes it
pretty clear that they're arbitrary sentences that were classified as
"related" based on their embedding vectors.

------
raldi
Can someone explain what "zero shot" means? The link doesn't explain, and some
basic googling doesn't either.

~~~
avinium
It's a catch-all term for "learning to solve a task without seeing any
specific training examples for that task."

In this context, it's learning to translate from (say) Czech to Urdu, without
ever having been explicitly provided Czech-Urdu sentence pairs.

~~~
raldi
Is it just a synonym for
[https://en.wikipedia.org/wiki/Unsupervised_learning](https://en.wikipedia.org/wiki/Unsupervised_learning)
?

~~~
avinium
Not quite.

Unsupervised learning implies that there are no human-annotated labels
whatsoever (in this context, meaning that the model had no paired translations
at all).

Zero-shot learning (usually) means that the model can generalize learning from
seen labels to unseen labels.

That being said, conceptually I guess there could be an "unsupervised zero-
shot learning" model - say, a language model that learns word embeddings from
English Wikipedia and then tries to use those embeddings to generate French
sentences. My guess is that it simply doesn't work.

~~~
nl
To expand on this (completely correct) response, unsupervised training is often
part of the training process for a zero-shot prediction task.

For example, it's pretty common to use unsupervised learning to build
embeddings for each target language, align the embeddings somehow (noting that
you don't have labels, so you are using the multi-dimensional shapes within
the embeddings to try to match them) and then finally test against labelled
data (the zero-shot thing).

Zero-shot, cross-modal transfer is something humans do really well. You can
read a description of a platypus and then label it correctly even if you have
never seen one before.

A seminal paper in this was Richard Socher's _Zero-Shot Learning Through
Cross-Modal Transfer_ [1]. It's the paper that earmarked him as a star, and
look at the co-authors (Chris Manning and Andrew Ng).

[1] [https://arxiv.org/abs/1301.3666](https://arxiv.org/abs/1301.3666)
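
For illustration, the alignment step can be sketched with the classic orthogonal-Procrustes solution. Note this is the supervised variant, which assumes a seed dictionary of paired vectors; fully unsupervised methods first bootstrap such a dictionary from the geometry of the two spaces and then refine with this same step. All names and the toy data here are illustrative.

```python
# Minimal sketch of aligning one embedding space onto another with an
# orthogonal map (Procrustes): find orthogonal W minimizing ||X @ W - Y||_F.
import numpy as np

rng = np.random.default_rng(0)

def procrustes(X, Y):
    # The minimizer is U @ Vt, where U, S, Vt = svd(X^T Y).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy data: "language B" embeddings are an exactly rotated copy of
# "language A" embeddings, so the rotation is recoverable.
d = 8
A = rng.standard_normal((100, d))
true_rot = np.linalg.qr(rng.standard_normal((d, d)))[0]
B = A @ true_rot

W = procrustes(A, B)
print(np.allclose(A @ W, B))  # prints True: the recovered map aligns A onto B
```

Real embedding spaces are only approximately related by a rotation, so in practice the aligned vectors match nearest neighbors rather than coincide exactly.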

~~~
raldi
Okay, so unsupervised learning would be if you had never seen any Earth
animals before, and were presented with 99 photos of fish and one giraffe and
noticed that the latter was the oddball, whereas zero-shot would be like if
you were told that a giraffe was yellow and brown with four legs and a long
neck and then said, "That must be a giraffe!" the first time you ever saw a
photo of one.

~~~
nl
Yeah that's a reasonable way to look at it.

------
XaspR8d
Tangent: I've always wondered if there would be utility in humans gaining
expertise in writing "for" translation, i.e. knowing what kinds of semantic
and syntactic constructs are the least lossy when localized, or perhaps even
learning to write in some intermediate, non-native-human language whose
reduced feature set guarantees a certain level of translatability.

I suppose the answer might be that machine translation will improve fast
enough that such a field wouldn't have time to emerge. But I always think that
using humans to intelligently fill gaps in machine competency is a _neat_
solution!

~~~
radarsat1
Living the last several years as an expat, I can at least attest that one
tends to lean towards expressions and words that are easy to understand, or
even towards expressions that are translations of expressions in the target
language instead of native expressions or word choices. One also sort of
subconsciously starts to repeat the typical English mistakes that non-native
speakers make when speaking English to you (e.g. dropping articles or
pronouns), which is bad because it reinforces their mistakes, but it's hard
to avoid.

And yes, I do the same when using Google Translate: I will _very often_ write
things in English in a way that I know will translate better to the target
language, similarly to how I will write a search query using words that I
think will be more likely to return useful results even if they aren't
completely "natural". I just consider it part of the skill of knowing how to
use automatic translation to my benefit. It's interesting in itself that you
learn to be "skilled" at using something that's supposed to be auto-adaptive,
to learn and adapt to its dynamics. It's also interesting that I have come to
depend on automatic translation as a way of living. Many things related to
living abroad would not have been as easy, or even possible, without the help
of Google Translate, despite having learned the language colloquially.

~~~
peteretep
Also, mimicking the way non-native speakers pronounce words is useful. You can
get very, very far with English in Thailand in almost every situation if you
know how to Thai-ify words (Apple -> Ah-Poon, Stereo -> Sah-teh-lee-oh, for
example).

~~~
wingerlang
It gets extra interesting when you learn to read Thai script: you can really
see how they have tried to write the English words in Thai script, and how the
Thai speech/reading rules "break" the word.

For example Apple: in Thai it is written แอปเปิ้ล, with each Thai character
trying to be 1:1, e.g. แอ=ae, ป=p (loosely). The interesting part is at the
end: the character ล on its own is generally pronounced as an L, so for the
English word, they put it at the end to make the word "apple". But in Thai,
characters have different sounds depending on their location, and ล is
pronounced N when placed at the end of a syllable - so "apple" in Thai is
generally pronounced "appen" instead.

~~~
peteretep
And the nickname "Ple" is pronounced "Pun"!

~~~
wingerlang
Can you write it in Thai?

~~~
peteretep
Sure, it's just the last syllable of what you posted, เปิ้ล. See also:
[http://www.thai-language.com/id/152113](http://www.thai-language.com/id/152113)

~~~
wingerlang
Oh right, yeah that's a tricky one. I've always read it as 'pen', but looking
closer there is technically a consonant cluster present.

------
sideral
The license is CC non-commercial. Does anyone here know if this means that it
cannot be used to train models that will be used commercially?

~~~
jahewson
The data they provide is a test set (not a training set) and is from
[https://tatoeba.org](https://tatoeba.org) which is in turn under the
permissive CC-BY license. You can use that data for anything you like, but you
can't use the code in the GitHub repository for commercial purposes - that
includes _running it_ to train models.

Academic code often used to be under licenses like this, though it's much less
common now.

------
etiam
Wish they'd stayed away from LASER. That acronym's already taken...

~~~
jholloway7
Are you referring to the Alan Parsons Project?

~~~
owenversteeg
They're referring to the term laser itself, which was originally "light
amplification by stimulated emission of radiation" :)

~~~
throwaway427
And your parent is referring to this clip:

[https://www.youtube.com/watch?v=2Duj2oZIC8U](https://www.youtube.com/watch?v=2Duj2oZIC8U)

------
nahh
If we get babelfish from Facebook, was it worth it?

~~~
DoctorOetker
if we didn't have copyright, we'd eventually or already have it too... so
IMHO, no

------
vladislav
"LASER achieves these results by embedding all languages jointly in a single
shared space (rather than having a separate model for each)". There could be a
good reason for why the mutual embedding of several languages works better
than individual, beyond the extra data. If human languages share some minimal
representation (universality so to say), training on multiple languages may be
required to extract it with today's techniques, since training on just one
language is bound to overfit to its particulars.

------
stephanimal
That graphic of the language families seems to misspell Estonian (as Estinain)
and Finnish (as Finish)? Seems like an odd oversight for such a project.

~~~
jechamt
I came here to post this (and add little to any substantial discussion). I'm
glad to see someone else beat me to it! Also: Slovak (Solvak) and Slovenian
(Solvene)? There were so many I thought maybe I misunderstood how they were
being represented in that graphic.

------
MikusR
It even works on the Latavian language (as shown in the top animation)

------
ahurmazda
I have been pleasantly surprised by FB's suggested translation even when the
messages are written in seemingly (to me) complicated transliteration of
Bengali.

------
vectorEQ
Pretty nice release. I just find it a bit silly to refer to the Berber
language as if it's one language: it's a group of languages, and moreover they
are written phonetically, so the text you train on can vary greatly between
the languages, and even between writers, in how a word is written.

------
bregma
My hovercraft is full of eels.

~~~
yesenadam
That was one of the first phrases I learnt in Spanish: _Mi aerodeslizador está
lleno de anguilas_

p.s. Is there any truth to the story that many decades ago, an early machine
translator, going from English to Chinese and back again, rendered "Out of
sight, out of mind" as "Invisible idiot"?

------
oska
I find it interesting that this page has already been 'snapshotted' on the
Internet Archive 24 times [1], less than a day after it appeared. Is this
because, like me, people are wary of visiting any facebook domain? Or is it
because people consider it an important research result? (Obviously it can
also be both).

[1] [https://web.archive.org/web/*/https://code.fb.com/ai-researc...](https://web.archive.org/web/*/https://code.fb.com/ai-research/laser-multilingual-sentence-embeddings/)

~~~
munk-a
By default I'm quite wary of clicking on any links to Facebook's domains. It's
a bit silly, since their tendrils cover pretty much every corner of the web,
but I still hesitate.

