
Google Translate invented its own language to help it translate more effectively - alexkadis
https://www.linkedin.com/pulse/mind-blowing-ai-announcement-from-google-you-probably-gil-fewster
======
jdmichal
No, it didn't. Or, rather, if it did, then so does every human. The neural
network is doing what NNs do and associating particular input patterns with
particular neurons / pathways. So the same or highly similar concepts end up
in the same place, which is how they are connected for the purpose of
translation.

All this likely has similar analogues in the human brain. That is, I would be
rather surprised if there wasn't a dedicated neural pathway identifying a
banana, which fires whenever the thought of a banana is invoked. This is also
where banana is associated with yellow and food and delicious etc.

Also, don't forget that in the human brain reading and listening may as well
be two separate languages processed by entirely different portions of the
brain. I would have to see pretty convincing evidence to believe that reading
"banana" and hearing it don't at some point touch the same part of the brain
where the concept and associations of "banana" "live".

------
xbmcuser
I disagree with the author's conclusion that it invented a new language. I
know a few languages; if you could see into my brain you would probably find
that it has saved meanings of words that only my brain could understand. That
is not a new language. What Google Translate is doing now is comprehending
multiple languages and then articulating the output in the required language.

------
balabaster
This is fascinating... I wonder how it copes with words and phrases that
don't have any meaningful translation: figures of speech that only work
within a culture, because removing the culture removes the context in which
the phrase makes sense.

I remember a past girlfriend who had cute little phrases (which I'd love to
recall right now by way of example, but they escape me) that made no sense
when translated to English, because the context that made them make sense
didn't exist outside of her language.

~~~
jdmichal
This is known as idiomatic speech, with any particular phrase being an idiom
[0]. Idioms are absolutely a part of translating natural language, and they
must be treated as a complete phrase set-piece. One of the big original
advancements in Google Translate was actually exactly this: treating the
phrase, rather than the word, as the unit of translation. I'm sure they
didn't lose that moving forward.

[0] [https://en.wikipedia.org/wiki/Idiom](https://en.wikipedia.org/wiki/Idiom)

------
jasode
For technical HN readers, I think the article[1] that the author linked is
better.

After reading Google's explanation, I don't think his comment is accurate:

 _> Google Translate invented its own language to help it translate more
effectively.

>What’s more, nobody told it to. It didn’t develop a language (or interlingua,
as Google call it) because it was coded to. It developed a new language
because the software determined over time that this was the most efficient way
to solve the problem of translation._

That makes it sound like the middle GNMT box (alternating in blue and orange)
was automatically fabricated by the algorithm. Instead, what seems to have
happened is that the _existence_ of an "intermediate" representation was a
deliberate architecture choice by human Google programmers. What got "learned
by machine" was the build up of internal data (filling up the vectors with
numbers to find mappings of "meaning").

Google programmers can chime in on this, but as an outsider, I'm guessing the
previous incarnations of Translate were more "point-to-point" instead of
"hub-&-spoke".

With 103 languages, point-to-point, computed as "n choose 2"[2], means
5253[3] possible direct mappings. (Although one example pair, such as
_Swahili_ to an _Australian Aboriginal language_, would probably not be
filled with translation data.)

With the new GNMT (the intermediate hub), you don't need 5253 mappings.
Instead of n(n-1)/2 combinations, it's just n. (However, I'm not saying that
reducing the mathematical combinations was the main motivator for the re-
architecture.)
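To make the counting concrete, here's a minimal Python sketch of the
arithmetic above (just the combinatorics from this comment, obviously not
Google's code):

```python
from math import comb

n = 103  # number of languages Google Translate supported at the time

# Point-to-point: one translation mapping per unordered language pair,
# i.e. "n choose 2" = n*(n-1)/2
pairwise = comb(n, 2)

# Hub-and-spoke: each language only needs a mapping to/from the shared
# intermediate representation
hub_and_spoke = n

print(pairwise, hub_and_spoke)  # 5253 103
```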

An analogy would be the LLVM IR intermediate representation. Frontend
compilers can target an "intermediate hub" language like LLVM IR, which
reduces the combinatorial complexity: instead of languages like Rust & Julia
writing point-to-point backends for specific machine languages like x86, ARM
& SPARC, each frontend only has to emit the IR. The difference with Google's
GNMT is that the "intermediate language" itself was not pre-specified by
humans.

[1] [https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html](https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html)

[2] [https://en.wikipedia.org/wiki/Combination#Number_of_k-combinations](https://en.wikipedia.org/wiki/Combination#Number_of_k-combinations)

[3] [https://www.google.com/search?q=(103%5E2-103)%2F2](https://www.google.com/search?q=\(103%5E2-103\)%2F2)

~~~
rvense
> Instead, what seems to have happened is that the existence of an
> "intermediate" representation was a deliberate architecture choice by human
> Google programmers.

Which has been proposed as a machine translation technique for many, many
years.

Maybe Google are the first to learn an interlingua, but they are not the first
to use the term, much less the concept.

------
wooot
What is the difference between how this neural network translates from
Japanese to Korean, and just translating from Japanese to English and then
from English to Korean?

------
Hnrobert42
Is it just me or do you have to have a linkedin account to read the article?

