5 layers of 1024-cell bidirectional LSTMs (edit: actually 512 cells x2?)? Can consumer GPUs even fit that (plus the decoder) into RAM?
I don't fully understand how the training step works, though. They train only on translation to English and Spanish?
Anyway very cool stuff.
Looking at the code for the Encoder (https://github.com/facebookresearch/LASER/blob/fec5c7d63daa2...), each LSTM layer has the same number of hidden cells (although the default parameters of that class don't quite match the ones used in the post, so I assume it's 512x2x5).
The output size is 1024: the hidden vector size is 512, but they're using bidirectional LSTMs, which concatenate the outputs of the two directions -- so the total is 1024.
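If it helps to see the arithmetic, here's a minimal PyTorch sketch -- not the actual LASER code, and the 320-dim input embedding is my assumption -- of a 5-layer bidirectional stack with 512 hidden units per direction:

    # Minimal sketch (not the actual LASER code) of the dimensions above:
    # 5 stacked bidirectional LSTM layers, 512 hidden units per direction,
    # so the concatenated output is 1024-dim. The 320-dim input embedding
    # is my assumption, not a number from the post.
    import torch
    import torch.nn as nn

    encoder = nn.LSTM(
        input_size=320,     # assumed token-embedding size
        hidden_size=512,    # per direction
        num_layers=5,
        bidirectional=True,
    )

    tokens = torch.randn(20, 1, 320)   # (seq_len, batch, embed_dim)
    output, _ = encoder(tokens)
    print(output.shape)                # torch.Size([20, 1, 1024])

    # Back-of-envelope memory for the encoder weights alone:
    n_params = sum(p.numel() for p in encoder.parameters())
    print(f"{n_params / 1e6:.1f}M params, ~{n_params * 4 / 2**20:.0f} MiB in fp32")

That comes out under 30M parameters (~110 MiB in fp32) for the encoder weights, so the weights themselves fit comfortably on a consumer GPU; during training it's the activations and the decoder that dominate memory.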
Those aren't good translations of each other (barring a helpful context); to me, "their destination was secret" states that the goal was for nobody to know where they went (with a weak implication that the goal was achieved), whereas "nobody knew where they went" states that, in reality, nobody knew where they went (and says nothing about whether that was intentional).
That's because they're not translations. The table in the article makes it pretty clear that they're arbitrary sentences that were classified as "related" based on their embedding vectors.
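For intuition, mining such "related" pairs can be as simple as thresholding cosine similarity in the shared embedding space. A rough sketch, assuming you already have sentence embeddings from an encoder like LASER's (the 0.9 threshold is an arbitrary illustration, not the article's value):

    # Hedged sketch of how "related" pairs like the ones in the table
    # could be mined: embed sentences from two languages into the shared
    # space and keep pairs whose cosine similarity clears a threshold.
    # The embeddings would come from an encoder like LASER's; the 0.9
    # threshold is an arbitrary illustration, not the article's value.
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def related_pairs(embs_a, embs_b, threshold=0.9):
        return [(i, j)
                for i, ea in enumerate(embs_a)
                for j, eb in enumerate(embs_b)
                if cosine(ea, eb) >= threshold]

    rng = np.random.default_rng(0)
    embs_a = rng.standard_normal((3, 1024))   # stand-ins for real sentence embeddings
    embs_b = rng.standard_normal((3, 1024))
    embs_b[1] = embs_a[0] + 0.1 * rng.standard_normal(1024)   # one genuinely close pair
    print(related_pairs(embs_a, embs_b))      # [(0, 1)]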
In this context, it's learning to translate from (say) Czech to Urdu, without ever having been explicitly provided Czech-Urdu sentence pairs.
Unsupervised learning implies that there are no human-annotated labels whatsoever (in this context, meaning that the model had no paired translations at all).
Zero-shot learning (usually) means that the model can generalize learning from seen labels to unseen labels.
That being said, conceptually I guess there could be an "unsupervised zero-shot learning" model -- say, a language model that learns word embeddings from English Wikipedia and then tries to use those embeddings to generate French sentences. My guess is that it simply doesn't work.
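To make the "seen labels to unseen labels" distinction concrete, here's a toy sketch (all names and vectors are made up, nothing from the article):

    # Toy sketch of zero-shot classification in the sense above: a model
    # maps inputs into a label-embedding space using only "seen" classes,
    # then classifies by nearest label embedding -- which also works for
    # "unseen" classes. All names and vectors here are made up.
    import numpy as np

    label_embeddings = {
        "cat":   np.array([0.9, 0.1, 0.0]),   # seen during training
        "dog":   np.array([0.8, 0.3, 0.1]),   # seen during training
        "zebra": np.array([0.1, 0.2, 0.9]),   # never seen at training time
    }

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def predict(x):
        # Nearest label embedding by cosine similarity, unseen labels included.
        return max(label_embeddings, key=lambda lbl: cos(x, label_embeddings[lbl]))

    # An input whose learned representation lands near "zebra" is labelled
    # correctly even though no zebra examples existed in the training set.
    print(predict(np.array([0.05, 0.25, 0.95])))   # zebra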
For example, it's pretty common to use unsupervised learning to build embeddings for each target language, align the embeddings somehow (noting that you have no labels, so you're using the multi-dimensional shapes within the embedding clouds to try to match them up), and then finally test against labelled data (the zero-shot thing).
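Once you have anchor pairs (in the fully unsupervised setting they're induced from the structure of the embedding clouds rather than taken from a dictionary), the classic alignment step has a closed-form solution: the orthogonal Procrustes rotation. A minimal sketch with a toy self-check:

    # Minimal orthogonal-Procrustes sketch: find the rotation W mapping
    # source-language embeddings X onto target-language embeddings Y.
    # X and Y are (n_anchors, dim) matrices of paired anchor embeddings;
    # in the unsupervised setting the anchors are induced from the shape
    # of the embedding clouds, not taken from a bilingual dictionary.
    import numpy as np

    def procrustes_align(X, Y):
        # W = argmin ||XW - Y||_F over orthogonal W, solved in closed
        # form via the SVD of X^T Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    # Toy self-check: recover a known random rotation from noisy pairs.
    rng = np.random.default_rng(0)
    dim = 50
    true_W, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    X = rng.standard_normal((200, dim))
    Y = X @ true_W + 0.01 * rng.standard_normal((200, dim))
    print(np.allclose(procrustes_align(X, Y), true_W, atol=0.05))   # True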
Zero-shot, cross-modal transfer is something humans do really well: you can read a description of a platypus and then label one correctly even if you have never seen one before.
A seminal paper in this was Richard Socher's Zero-Shot Learning Through Cross-Modal Transfer. It's the paper that earmarked him as a star, and look at the co-authors (Chris Manning and Andrew Ng).
I suppose the answer might be that machine translation will improve fast enough that such a field wouldn't have time to emerge. But I always think that using humans to intelligently fill gaps in machine competency is a neat solution!
And yes, I do the same when using Google Translate: I will very often write things in English in a way that I know will translate better to the target language, similarly to how I will write a search query using words that I think are more likely to return useful results even if they aren't completely "natural". I just consider it part of the skill of knowing how to use automatic translation to my benefit.

You literally learned to be "skilled" at using something that's supposed to be auto-adaptive -- to learn and adapt to its dynamics -- which is interesting in itself.

It's also interesting that I have come to depend on automatic translation as a way of living. Many things related to living abroad would not have been as easy, or even possible, without the help of Google Translate, despite my having learned the language colloquially.
For example "apple": in Thai it is written แอปเปิ้ล, with each Thai character mapping roughly 1:1, e.g. แอ = ae, ป = p (loosely). The interesting part is at the end: ล on its own is generally pronounced as an L, so to spell the English word they put it at the end to make "apple". But in Thai, characters have different sounds depending on their position, and ล is pronounced as an N when it comes at the end of a syllable -- so "apple" in Thai is generally pronounced "appen" instead.
I cannot pass an opportunity to mention Europanto (not to be confused with Esperanto), a language where you arbitrarily mix and match words from various European languages.
If you know a Romance language and a Germanic language (in addition to English), you can craft sentences that will be understood by many, because of the shared roots and vocabulary.
When you write a sentence, you select among synonyms and origins based on how close each word is to its variants in the other languages.
Unfortunately mon Deutsch est too bad pour escribir full paragraphes en cette technik. (Roughly: "Unfortunately my German is too bad to write full paragraphs in this technique.")
Wow, this is beautiful! I love it! Thank you for exposing me to this idea. It really optimizes for the kind of pattern matching most humans are adapted for.
This biases them towards simple, concrete words from everyday spoken language. But many abstract terms and much jargon are actually very easy to translate, while many terms used in everyday language (most notably common idioms) are very hard to translate -- and which concrete terms get used as analogies for abstract concepts can be very culturally specific.
To a native English speaker, "I turned the light off" is a perfectly straightforward sentence. To a foreign learner... can you imagine how many different meanings "turn off" can have? (Not to mention "turn" and "light" can both have a gazillion meanings.)
In different places as a student you might: give an exam; take an exam; write an exam; do an exam. Give and take are opposites, and some of these terms can be used for the examiner as well, again with variation between countries, languages and locales.
If without context you said “give an exam” in India you’d mean you were the student performing an exam, but if you said “give an exam” in Australia it’d probably sound a little odd but indicate you were the lecturer presenting an exam to your students. Similarly with “write an exam”: in India that sounds perfectly normal and indicates you’re the student, performing the exam; but in Australia it sounds perfectly normal and means you’re the examiner, preparing the exam.
(Also, I hope you're not turned off by my remarks. Or turned on. That would be awkward.)
It's fun to think about how we sit down, we sit up, we stand down, we stand up, we lie down, we don't lie up. Except according to Merriam-Webster, who claims that we do when we stay in bed. (https://www.merriam-webster.com/dictionary/lie%20up)
"She had obviously not slept, the events of the previous day being so disturbing. 'You were lying up all night,' I said the next morning."
The sun has this property, so we open and close the blinds / curtains to regulate light flow.
Small step from there to electric lights.
I wonder if this is a more natural metaphor.
My wife grew up in Tamil Nadu, India - and English was her primary language growing up. When we travel, her English is infinitely easier for people to understand than mine, despite how hard I try. (In India, especially, but elsewhere too.) I think, "speak simply, deliberately, clearly, not-quickly" and get blank stares. She can mumble and speak quickly - comprehension is instantaneous.
Some things are obvious, like your example of modifying word order. Front-loading objects in sentences seems critical for understanding. Hindi and Tamil are both subject-object-verb languages, though I'm not convinced this is the reason it works, and suspect it's more universal than that.
Her pronunciation also changes dramatically. It doesn't feel so much like speaking broken English as like being fluent in a local pidgin. After a while, sentences like "Ma, what you take?" start to sound so natural that I catch myself doing it unconsciously.
Sounds like your idea for a constructed language specifically designed for translation might be a good one. Maybe it could even help improve the accuracy of machine translation?
Controlled languages try to solve this with constraints. There is quite a bit of research around this topic: https://scholar.google.de/scholar?q=controlled+language+tran...
I regularly struggle to (1) find language to explain concepts that doesn't require strong fluency in English and (2) to recognize if I've achieved number 1.
Academic code often used to be under licenses like this, though it's much less common now.
p.s. Is there any truth to the story that many decades ago, an early machine translator, going from English to Chinese and back again, rendered "Out of sight, out of mind" as "Invisible idiot"?