Zero-shot transfer across 93 languages (fb.com)
283 points by moneil971 56 days ago | 85 comments

> The encoder is five-layer bidirectional LSTM (long short-term memory) network. In contrast with neural machine translation, we do not use an attention mechanism but instead have a 1,024-dimension fixed-size vector to represent the input sentence.

5 layers of 1024-cell bidirectional LSTMs (edit: actually 512-cells x2?)? Can consumer GPUs even fit that (+ the decoder) into RAM?

Notwithstanding the memory complexity, I love that they reach such impressive results with relatively straightforward components: there's no attention mechanism, and encoding into latent space is simply max pooling over the last LSTM layer.

I don't fully understand how the training step works, though. They train only on translation to English and Spanish?

Anyway very cool stuff.

It’s not 1024 cells. The size of the hidden vector is 1024, which works out to something on the order of a megabyte per cell (a 1024x1024 matrix). Here they have a cell per word, which is reasonable.

Max-Pooling a 1024-cell output LSTM will result in a 1024-sized vector.

Looking at the code for the Encoder (https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...), each LSTM has the same number of hidden cells (although the default parameters of that class don't quite match the ones used in the post, so I assume it's 512x2x5).
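For a rough sense of whether the "512x2x5" configuration fits on a consumer GPU, here is a back-of-the-envelope weight count. It is only a sketch: it uses the weight layout of PyTorch's nn.LSTM (four gates, each with input weights, recurrent weights, and two bias vectors), and the 320-dimensional input embedding is an assumption that is not stated anywhere in this thread. It ignores the embedding table, the decoder, and activations.

```python
def lstm_param_count(input_size, hidden_size, num_layers, bidirectional=True):
    """Count the weights of a stacked LSTM laid out like PyTorch's nn.LSTM."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # Four gates, each with input weights, recurrent weights and two biases.
        per_direction = 4 * hidden_size * (layer_input + hidden_size + 2)
        total += directions * per_direction
        # The next layer sees the concatenated forward and backward outputs.
        layer_input = hidden_size * directions
    return total


EMBED_DIM = 320  # assumption: the input embedding size is not given in the thread

params = lstm_param_count(EMBED_DIM, hidden_size=512, num_layers=5)
megabytes = params * 4 / 1e6  # 4 bytes per float32 weight
print(f"{params:,} weights, about {megabytes:.0f} MB as float32")
```

Under those assumptions the encoder comes to roughly 28.6 million weights, or about 114 MB in float32, so the weights alone are not the problem; memory use during training is dominated by activations and batch size.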

Yes, but they're not max pooling over the last dimension. They're max pooling over the sequence length [0] (the other way doesn't really make sense in this context).

The output size is 1024. The hidden vector size is 512, but they're using bidirectional LSTMs, which concatenate the outputs of each direction, so the total is 1024 [1].

[0] https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...

[1] https://pytorch.org/docs/stable/nn.html#lstm
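To make the shapes concrete, here is a tiny pure-Python sketch of what the two comments above describe: concatenate the forward and backward outputs per token (512 + 512), then take the maximum over the sequence axis for each feature. The toy values and the helper name are made up for illustration; this is not the LASER code.

```python
def max_pool_over_time(outputs):
    """Element-wise max over the sequence axis.

    outputs is a list of per-token vectors (seq_len x feature_dim);
    the result has feature_dim entries, whatever the sentence length.
    """
    return [max(column) for column in zip(*outputs)]


HIDDEN = 512
SEQ_LEN = 3

# Toy per-token outputs of the forward and backward directions.
forward = [[float(t)] * HIDDEN for t in range(SEQ_LEN)]
backward = [[float(10 - t)] * HIDDEN for t in range(SEQ_LEN)]

# A bidirectional layer concatenates both directions for every token.
outputs = [f + b for f, b in zip(forward, backward)]

sentence_vector = max_pool_over_time(outputs)
print(len(outputs), len(outputs[0]), len(sentence_vector))
```

The sentence vector has 1024 entries for any sequence length, which is why pooling over the feature dimension instead would not give a fixed-size representation.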

Gotcha, that makes sense. (I'm less familiar with dimension ordering in PyTorch)

I think the English translations of the Hindi and the Bulgarian are mixed up. The Hindi should be "Their destination was secret", and the Bulgarian should be "Nobody knew where they went". Also, the Devanagari script is not rendered properly; the diacritical marks should be directly over (or under) the related character, and conjunct characters are not "squished" together.

Also, the Hindi translation is very formal; hardly anyone speaks like that colloquially. That is understandable, because the translation probably reflects the training data, but it's something to keep in mind.

There are also half a dozen typos in the third graphic (the one showing relationships between languages). Hopefully the sloppiness here isn't a reflection on the overall quality of the work.

> The Hindi should be "Their destination was secret", and the Bulgarian should be "Nobody knew where they went".

Those aren't good translations of each other (barring a helpful context); to me, "their destination was secret" states that the goal was for nobody to know where they went (with a weak implication that the goal was achieved), whereas "nobody knew where they went" states that, in reality, nobody knew where they went (and says nothing about whether that was intentional).

> Those aren't good translations of each other

That's because they're not translations. The table in the article makes it pretty clear that they're arbitrary sentences that were classified as "related" based on their embedding vectors.
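As a rough illustration of what "related based on their embedding vectors" can mean, here is a minimal cosine-similarity lookup. The three-dimensional vectors and sentence names are invented for the example (real sentence embeddings here would have 1024 dimensions), and cosine similarity is an assumed choice of measure, not something the article is quoted as using.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_related(query, candidates):
    """Return the name of the candidate closest to the query vector."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Made-up toy embeddings; sentences in any language share one vector space.
query = [0.9, 0.1, 0.0]
candidates = {
    "sentence_a": [0.8, 0.2, 0.1],
    "sentence_b": [0.0, 0.1, 0.9],
    "sentence_c": [0.1, 0.9, 0.2],
}

print(most_related(query, candidates))
```

Two sentences picked this way are neighbours in the shared space, which does not require them to be exact translations of each other.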

Can someone explain what "zero shot" means? The link doesn't explain, and some basic googling doesn't either.

It's a catch-all term for "learning to solve a task without seeing any specific training examples for that task."

In this context, it's learning to translate from (say) Czech to Urdu, without ever having been explicitly provided Czech-Urdu sentence pairs.

Not quite.

Unsupervised learning implies that there are no human-annotated labels whatsoever (in this context, meaning that the model had no paired translations at all).

Zero-shot learning (usually) means that the model can generalize learning from seen labels to unseen labels.

That being said, conceptually I guess there could be an "unsupervised zero-shot learning" model: say, a language model that learns word embeddings from English Wikipedia and then tries to use those embeddings to generate French sentences. My guess is that it simply doesn't work.

To expand on this (completely correct) response, unsupervised training is often part of the training process for a zero-shot prediction task.

For example it's pretty common to use unsupervised learning to build embeddings for each target language, align the embeddings somehow (noting that you don't have labels, so you are using the multi-dimensional shapes within the embeddings to try to match them) and then finally test against labelled data (the zero-shot thing).

Zero-shot, cross-modal transfer is something humans do really well. You can read a description of a platypus and then label it correctly even if you have never seen one before.

A seminal paper in this was Richard Socher's Zero-Shot Learning Through Cross-Modal Transfer[1]. It's the paper that earmarked him as a star, and look at the co-authors (Chris Manning and Andrew Ng).

[1] https://arxiv.org/abs/1301.3666

Okay, so unsupervised learning would be if you had never seen any Earth animals before, and were presented with 99 photos of fish and one giraffe and noticed that the latter was the oddball, whereas zero-shot would be like if you were told that a giraffe was yellow and brown with four legs and a long neck and then said, "That must be a giraffe!" the first time you ever saw a photo of one.
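The giraffe example maps onto one common formulation, attribute-based zero-shot classification: classes are described by attributes, and an unseen class is recognised by matching observed attributes against its description. The sketch below is a toy version of that idea only; the attribute list, the class table, and the assumption that some detector already produced the attributes from the photo are all hypothetical.

```python
# Attribute order: [four_legs, long_neck, yellow_and_brown, lives_in_water]
CLASS_DESCRIPTIONS = {
    "giraffe": [1, 1, 1, 0],  # known only from a description, never from a photo
    "fish": [0, 0, 0, 1],
    "horse": [1, 0, 0, 0],
}


def zero_shot_label(observed_attributes, descriptions):
    """Pick the class whose description disagrees least with the observation."""
    def mismatches(name):
        return sum(a != b for a, b in zip(observed_attributes, descriptions[name]))
    return min(descriptions, key=mismatches)


# Assumed output of an attribute detector that was trained on other animals.
photo_attributes = [1, 1, 1, 0]
print(zero_shot_label(photo_attributes, CLASS_DESCRIPTIONS))
```

The classifier never sees a labelled giraffe photo; the description table plays the role of being "told" what a giraffe looks like.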

Yeah that's a reasonable way to look at it.

Tangent: I've always wondered if there would be utility in humans gaining expertise in writing "for" translation, i.e. knowing what kinds of semantic and syntactic constructs are the least lossy when localized, or perhaps even learning to write in some intermediate, non-native-human language whose reduced feature set guarantees a certain level of translatability.

I suppose the answer might be that machine translation will improve fast enough that such a field wouldn't have time to emerge. But I always think that using humans to intelligently fill gaps in machine competency is a neat solution!

Living the last several years as an ex-pat, I can at least attest that one tends to lean towards using expressions and words that are easy to understand, or even towards expressions that are translations of expressions in the target language instead of native expressions or word choices. One also sort of subconsciously starts to repeat the typical English mistakes that non-native speakers make when speaking English to you (e.g. dropping articles or pronouns), which is bad because it reinforces their mistakes, but it's hard to avoid.

And yes, I do the same when using Google Translate: I will very often write things in English in a way that I know will translate better to the target language, similarly to how I will write a search query using words that I think will be more likely to return useful results even if they aren't completely "natural". I just consider it part of the skill of knowing how to use automatic translation to my benefit. You literally learn to be "skilled" at using something that's supposed to be auto-adaptive, and you start to learn and adapt to its dynamics, which is interesting in itself. It is also interesting that I have learned to depend on automatic translation as a way of living. Many things related to living abroad would not have been as easy, or even possible, without the help of Google Translate, despite my having learned the language colloquially.

About five years back I had to do a good deal of communication with the aid of Google Translate. I would frequently copy what I wrote into Google Translate, then copy the result into another tab to translate back to English. I found that if the text could make a round trip without losing anything important, then it could be understood.

Have you tried DeepL? It is a translation service with surprising quality.


Also, mimicking the way non-native speakers pronounce words is useful. You can get very, very far with English in Thailand in almost every situation if you know how to Thai-ify words (Apple -> "Ah-Poon", Stereo -> "Sah-teh-lee-oh", for example).

It gets extra interesting when you learn to read Thai script: you can really see how they have tried to write the English words in Thai script and how the Thai speech/reading rules "break" the word.

For example Apple, in Thai it is written แอปเปิ้ล with each Thai character trying to be 1:1, for example แอ=ae ป=p (loosely). The interesting part is at the end, ล this character on its own is generally pronounced as an L. So for the English word, they put the L at the end to make the word apple. But - in Thai, characters have different sounds depending on their locations, ล is pronounced N when it is placed at the end of a syllable - so "apple" in Thai is generally pronounced "appen" instead.

And the nickname "Ple" is pronounced "Pun"!

Can you write it in Thai?

Sure, it's just the last syllable of what you posted, เปิ้ล. See also: http://www.thai-language.com/id/152113

Oh right, yeah, that's a tricky one. I've always read it as 'pen', but looking closer there is technically the consonant cluster present.

> or perhaps even learning to write in some intermediate, non-native-human language whose reduced feature set guarantees a certain level of translatability

I cannot pass an opportunity to mention Europanto (not to be confused with Esperanto), a language where you arbitrarily mix and match words from various European languages.

If you know a Romance language and a Germanic language (in addition to English) you can craft sentences that will be understood by many, because of the shared roots and vocabulary.

When you write a sentence you select among synonyms and origin based on how close the word is to its variants in the other languages.

Unfortunately mon Deutsch est too bad pour escribir full paragraphes en cette technik.


“Unfortunately mon Deutsch est too bad pour escribir full paragraphes en cette technik.”

Wow, this is beautiful! I love it! Thank you for exposing me to this idea. It really optimizes for the kind of pattern matching most humans are adapted for.

that works on word level, but unfortunately is the syntax too different to full paragraphs adapt (this is German word order). too different is an exaggeration but highlights the problem the GP mentions. por mir zu traduire weird idiomatics sono (sonet?) malade to het capo/tete/Kopf--For me to translate weird idiomatics sounds crazy in the head (that's along the lines of for them to ... and not ... sounds crazy for me, although the former might stem from a corruption of the latter if I'm any judge).

Tangent from your tangent: people are really bad at this for their native language. They usually tend towards using language that would be understandable by a young child.

This biases them towards using simple, concrete words that are used in everyday spoken language, despite the fact that many abstract terms and jargon can be very easy to translate, while many terms that are used in everyday language (most notably common idioms) can be very hard to translate, and which concrete terms are used as analogies for abstract concepts can be very culturally specific.

This, a hundred times.

To a native English speaker, "I turned the light off" is a perfectly straightforward sentence. To a foreign learner... can you imagine how many different meanings "turn off" can have? (Not to mention "turn" and "light" can both have a gazillion meanings.)

In my native language you "close" the lights to turn them off and "open" the lights to turn them on. It was really hard for me to explain this to my American roommates; one of them kept saying that this is "illogical" because when you turn lights on you're actually closing the circuit, not opening it...

My favourite example of such variation between languages and locales concerns examinations. The examples that follow are of different English locales; in the case of India, their form of English tends to match the words that would be used in their local languages.

In different places as a student you might: give an exam; take an exam; write an exam; do an exam. Give and take are opposites, and some of these terms can be used for the examiner as well, again with variation between countries, languages and locales.

If without context you said “give an exam” in India you’d mean you were the student performing an exam, but if you said “give an exam” in Australia it’d probably sound a little odd but indicate you were the lecturer presenting an exam to your students. Similarly with “write an exam”: in India that sounds perfectly normal and indicates you’re the student, performing the exam; but in Australia it sounds perfectly normal and means you’re the examiner, preparing the exam.

In English you turn off the switch, set off the alarm, and switch off the alarm again. And they have the nerve to tell others what's logical? :P

(Also, I hope you're not turned off by my remarks. Or turned on. That would be awkward.)

Your remarks may not be a turn-on, but as it turns out, I wouldn't turn them down. Or away, unless they turned up again. They set me off. It's clear that it was a set up; in fact it's starting to set in. They're spot-on; but if they weren't, they still wouldn't be spot-off.

It's fun to think about how we sit down, we sit up, we stand down, we stand up, we lie down, we don't lie up. Except according to Merriam-Webster, which claims that we do when we stay in bed. (https://www.merriam-webster.com/dictionary/lie%20up)

That's a fun set of examples. I think lie up is used naturally sometimes, but I may have just been thinking about it hard enough to fool myself.

"She had obviously not slept, the events of the previous day being so disturbing. 'You were lying up all night,' I said the next morning."

Ah, interesting. I may have heard it in that context before, hard to tell.

You can be laid up sick in bed, but I've never heard "lie up."

It kind of makes sense for lamps that are constantly burning and where you regulate the brightness by opening or closing a small door. I guess ancient China must have had such lamps for that usage to develop. On the other hand, I can't find a rationalization for the use of "open" to mean "drive a car".

> constantly burning

The sun has this property, so we open and close the blinds / curtains to regulate light flow.

Small step from there to electric lights.

When my daughter was first learning to talk, she kept asking us to close and open the lights. We've only spoken English to her, but somehow she decided that lights should be opened and closed.

I wonder if this is a more natural metaphor.

Français? I know that even English speakers in Quebec will often "open" and "close" the lights.

I don't know what language your parent commenter is speaking, but in Chinese, we also "open" and "close" the light.

Italians often say this too. They also "open" and "close" air conditioners where I would turn/switch them on/off.

This is for current technology, but maybe your "opening the light" has an older origin. Going back in time we had gas lamps (open gas?), candles, fireplaces. In my language (Italian) we use "accendere" and "spegnere", which we also use with fire, as in light a fire or extinguish a fire.

I remember some old adventure games (SCUMM engine maybe?) where you picked a verb (from a menu) and then an object. Open and Close were apparently synonyms for Turn On and Turn Off, so you could Close the Light or Turn Off the Door.

That's very interesting, and especially confusing to anyone with a bit of electronics education. A light switch in a simple circuit needs to close the circuit to allow electricity to flow through the light source.

English is my first language, and yet my wife and I tell each other to “open the blinds” and “close the blinds” with far more frequency than “turn on/off the light” (which is on a timer).

In my language you "close the circuit" and when you do so you "open lights" and vice versa. It's not confusing (to me) since "the circuit" and "lights" (in plural) are different words.

An apocryphal story on commercial translation for aircraft maintenance manuals stated how aircraft engineers were instructed to "take out the broken object and place it back in". The original sentence was "to replace the broken object".

Particularly bad: some objects will be fixed by removing them from a socket and then being put back in. Loose connections, wrong orientation, dirty contacts...

Like radarsat1 mentioned, after living as an expat you start to optimize speaking for the understanding of the locals. At first you're pretty bad at this, but after understanding the local language more you get much better. I'm both Dutch and American, but when in the Netherlands (speaking English) I optimize my English for easier understanding by Dutch people. (Which is not really necessary, but does help in many situations.) That takes the form of simpler words and speaking slower and clearly, but also more subtle things, like modifying word order, using as many cognates as possible, and avoiding false cognates.

Some of the things are really subtle.

My wife grew up in Tamil Nadu, India - and English was her primary language growing up. When we travel, her English is infinitely easier for people to understand than mine, despite how hard I try. (In India, especially, but elsewhere too.) I think, "speak simply, deliberately, clearly, not-quickly" and get blank stares. She can mumble and speak quickly - comprehension is instantaneous.

Some things are obvious, like your example of modifying word order. Front-loading objects in sentences seems critical for understanding. Hindi and Tamil are both subject-object-verb languages, though I'm not convinced this is the reason it works, and suspect it's more universal than that.

Her pronunciation also changes dramatically. It doesn't feel so much like speaking broken English as being fluent in a local pidgin. After a while, sentences like "Ma, what you take?" start to sound so natural that I catch myself doing it unconsciously.

Software translation does often use an intermediate language, and one of them (mentioned in the link) is Esperanto, which is an artificial/constructed language.


Sounds like your idea for a constructed language specifically designed for translation might be a good one. Maybe it could even help improve the accuracy of machine translation?

I’ve been learning Esperanto for a while, and although it’s fun, I really don’t expect good things from having a machine translation system use it as an intermediary. Something else, sure, but Esperanto was designed with (1890s European) human translators in mind, and A.I. works best when you don’t limit it to thinking like you.

In the field of Technical Writing there are whole books written on how to write for human and machine translation. Main takeaway: consistency and unambiguity.

Controlled languages try to solve this with constraints. There is quite a bit of research around this topic: https://scholar.google.de/scholar?q=controlled+language+tran...

I teach part time, and my students are largely ESL, while my non-English skills are decidedly limited. They speak English, but there's a big gap between "mostly-not-broken" and "can handle arbitrary nuance".

I regularly struggle (1) to find language to explain concepts that doesn't require strong fluency in English and (2) to recognize whether I've achieved number 1.

There are languages like Lojban that are designed to be an ideal intermediary language for machine translation.


You can get a pretty good idea today of how close a specific text is to that ideal by pasting it into Google translate, translating it to a few different languages, then translating it back and comparing the results to your intended meaning.
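The round-trip check described above can be sketched as a small script. Everything here is illustrative: `translate` is a hypothetical callable that you would have to wire up to whatever translation service you use (no real API is shown), the word-overlap score is a crude stand-in for "comparing the results to your intended meaning", and `fake_translate` exists only so the example runs on its own.

```python
def token_overlap(a, b):
    """Jaccard overlap of lowercase word sets; a crude proxy for 'same meaning'."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def round_trip_scores(text, pivots, translate, source="en"):
    """Translate text into each pivot language and back, then score the result.

    translate(text, src, dst) is a hypothetical callable supplied by the caller.
    """
    scores = {}
    for lang in pivots:
        there = translate(text, source, lang)
        back = translate(there, lang, source)
        scores[lang] = token_overlap(text, back)
    return scores


def fake_translate(text, src, dst):
    """Stand-in translator for the demo: drops 'the' when leaving English."""
    if src == "en":
        return " ".join(w for w in text.split() if w.lower() != "the")
    return text


scores = round_trip_scores("turn off the light", ["xx", "yy"], fake_translate)
print(scores)
```

A low score for one pivot language is a hint that the phrasing relies on something that does not survive translation, though a human still has to judge whether what was lost actually matters.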

The license is CC non-commercial. Does anyone here know if this means that it cannot be used to train models that will be used commercially?

The data they provide is a test set (not a training set) and is from https://tatoeba.org which is in turn under the permissive CC-BY license. You can use that data for anything you like, but you can't use the code in the GitHub repository for commercial purposes - that includes running it to train models.

Academic code often used to be under licenses like this, though it's much less common now.

Wish they'd stayed away from LASER. That acronym's already taken...

They did redundant work to make it LASER (Language-Agnostic SEntence Representations). If they'd just gone with initials they'd have had LASR, which is a cooler-looking acronym anyway.

Are you referring to the Alan Parsons Project?

They're referring to the term laser itself, which was originally "light amplification by stimulated emission of radiation" :)

And your parent is referring to this clip:


I think they're referring to the Townes project ;)


I propose "debabelizer".

If we get babelfish from Facebook, was it worth it?

if we didn't have copyright, we'd eventually or already have it too... so IMHO, no

"LASER achieves these results by embedding all languages jointly in a single shared space (rather than having a separate model for each)". There could be a good reason for why the mutual embedding of several languages works better than individual, beyond the extra data. If human languages share some minimal representation (universality so to say), training on multiple languages may be required to extract it with today's techniques, since training on just one language is bound to overfit to its particulars.

That graphic of the language families seems to misspell Estonian (as Estinain) and Finnish (as Finish)? Seems like an odd oversight for such a project.

I came here to post this (and add little to any substantial discussion). I'm glad to see someone else beat me to it! Also: Slovak (Solvak) and Slovenian? (Solvene). There were so many I thought maybe I misunderstood how they were being represented in that graphic.

It even works on Latavian language (as shown on the top animation)

I have been pleasantly surprised by FB's suggested translation even when the messages are written in seemingly (to me) complicated transliteration of Bengali.

pretty nice release. just note that i find it a bit silly to refer to the berber language as if it's 1 language. it's a group of languages, and moreover they are phonetic, so the text you train on can vary greatly between the languages and even the writers on how you would write it.

My hovercraft is full of eels.

That was one of the first phrases I learnt in Spanish: Mi aerodeslizador está lleno de anguilas

p.s. Is there any truth to the story that many decades ago, an early machine translator, going from English to Chinese and back again, rendered "Out of sight, out of mind" as "Invisible idiot"?

I find it interesting that this page has already been 'snapshotted' on the Internet Archive 24 times [1], less than a day after it appeared. Is this because, like me, people are wary of visiting any Facebook domain? Or is it because people consider it an important research result? (Obviously it can also be both).

[1] https://web.archive.org/web/*/https://code.fb.com/ai-researc...

By default I'm quite wary of clicking on any links to Facebook's domain. It is a bit silly, since their tendrils cover pretty much every corner of the web, but I still hesitate.
