
Google’s Neural Machine Translation System - fitzwatermellow
http://arxiv.org/abs/1609.08144
======
d_burfoot
This paper is a good illustration of what I see as a critical shortcoming of
modern NLP work: it is all about math and algorithms; there is nothing about
actual language.

The paper has lots of information about: neural network architectures,
parameter update equations, learning rules, and inference algorithms.

There is nothing about: part of speech categories, relative clauses,
morphology, affixes or compound words, the theta criterion, the
content/function distinction, verb tenses, agreement, or anything else related
to actual linguistic phenomena.

To me this seems a bit like the Aristotelian thinkers who tried to reason
about physics based on pure mathematical analysis, without any empirical work.

~~~
d_burfoot
Fred Jelinek famously quipped: "Every time I fire a linguist, the performance
of the system goes up." Here are two hypotheses that could explain this
observation:

1) Linguistic knowledge and theory is not useful for NLP work; instead, people
should rely entirely on machine learning techniques.

2) Linguistic knowledge is useful in principle, but the field of linguistics
has not yet obtained a sufficiently high-quality theory, and it is better to
rely on ML than on low-quality theory.

Almost everyone in NLP adopted hypothesis #1; I believe hypothesis #2 is true.

~~~
gradys
I'd extend hypothesis #2. We might not have a sufficiently high-quality
theory, but we might also not know how to integrate the benefits of the theory
we have with the benefits of data-driven approaches to NLP like those
described in this paper.

It's clear to me that data-driven techniques are an essential part of the
answer to the problem of language understanding. It's infeasible to design a
complete set of rules for understanding natural language with all its
exceptions and fuzzy relationships. I'd point to Wittgenstein for a stronger
philosophical argument on this.

That's not to say that theory has no place in language understanding, but it's
not immediately obvious how to combine rule-oriented theory with data-driven
ML.

There has been work in this area, though, and several important,
state-of-the-art-moving contributions have actually been inspired by
linguistics. My favorites are recursive neural nets, which benefit from the
linguistic insight that language has recursive structure. This paper[1], for
example, describes a system that combines a dependency parser with a recurrent
neural net (a gross oversimplification) to yield a state-of-the-art sentence
understanding model.

[1] - [http://arxiv.org/abs/1603.06021](http://arxiv.org/abs/1603.06021)
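To make the "recursive structure" insight concrete, here is a toy sketch of
the recursive-composition idea: word vectors are merged bottom-up following a
parse tree's bracketing. All names, dimensions, and weights here are
illustrative, not the model from [1].

```python
import numpy as np

# Toy recursive neural net: compose word vectors bottom-up over a
# binary parse tree. Dimensions and weights are arbitrary for illustration.
DIM = 4
rng = np.random.default_rng(0)
W = rng.standard_normal((DIM, 2 * DIM)) * 0.1   # composition weights
b = np.zeros(DIM)

def compose(left, right):
    """Merge two child vectors into one parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree, embeddings):
    """tree is either a word string or a (left, right) pair of subtrees."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))

# ("the", "cat") is composed first, mirroring a parser's bracketing.
emb = {w: rng.standard_normal(DIM) for w in ["the", "cat", "sleeps"]}
sentence_vec = encode((("the", "cat"), "sleeps"), emb)
print(sentence_vec.shape)  # (4,)
```

The point is that the composition order comes from a parser, not from
left-to-right token order as in a plain recurrent net.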

~~~
MarkMc2412
Do you have any further detail or reference to Wittgenstein here? Curious to
learn more

~~~
thegreatestape
Not OP, but Wittgenstein covers this in his Philosophical Investigations. In
the Tractatus, he tried to create an all-encompassing system for language, but
in PI he more or less says it's impossible. It's an amazing read in any case,
it has totally changed the way I approach philosophy.

~~~
MarkMc2412
Ok, thanks!

------
Houshalter
I hate when people judge machine translation progress by looking at Google
Translate. Google Translate is really old and is no longer state of the art.
But the best neural-network-based systems like this one are much too expensive
to run in production, at least for a free service. They say their new neural
network ASICs will make that more practical, at least.

Anyway the scale of these neural nets is quite incredible. Google is getting
far ahead of what any individual researcher with a few consumer GPUs can do.

~~~
habitue
But not, apparently, ahead of what you can do with 8 consumer GPUs

~~~
aab0
A $3k Tesla K80 GPU is not 'consumer', nor do most consumers (or most
researchers or small businesses) have $24k to drop on a set of GPUs alone.

~~~
dgacmu
You're right - a $3k K80 is generally inferior to a $1k Titan X for most DNN
applications. The primary reasons that big companies use K80s have more to do
with achieving high computational density, and with licensing issues, than
with the performance of a single GPU. Sticking 8 Titan X boards in a machine
is a pain if you want to pack them closely together. But for academic
researchers, a quad Titan X box is pretty solid and quite affordable.

~~~
aab0
A quad Titan X is still $4k for the GPUs alone, and was only possible in the
past few months - people might have wanted to get stuff done in the years in
between the last generation and the current generation...

------
runesoerensen
Related blog post: [https://research.googleblog.com/2016/09/a-neural-network-for...](https://research.googleblog.com/2016/09/a-neural-network-for-machine.html)

------
kuschku
60% better than Google Translate is still, at best, elementary-school-level
fluency in a language. It’s a needed improvement, but still far from good.

It’s crazy how bad Google Translate is: try it with any German text, and
you’ll get 90% incomprehensible garbage out.

EDIT: Seriously, downvotes for this?

Try this:
[https://translate.google.com/translate?sl=auto&tl=en&js=y&pr...](https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http://www.spiegel.de/politik/ausland/bratislava-eu-verteidigungsminister-wollen-enger-zusammenarbeiten-a-1114151.html)

What’s " If you look at Ursula von der Leyen and Jean-Yves Le Drian before the
meeting of defense ministers in Bratislava, one might think that Berlin and
Paris would never prefer liked." supposed to mean?

Or "Individual states can not prevent the European Council may decide by a
qualified majority, the SSZ."

~~~
lorenzhs
I think you're getting downvoted because you picked one of the hardest
language pairs there is for machine translation. German grammar is really hard
to parse, e.g. verbs being split with one part at the beginning and the other
at the end of the sentence. Generally, figuring out which part of the sentence
refers to which other part is quite hard.

Google Translate works a lot better with many other language pairs, so people
might be downvoting because your opinion sounds exaggerated and doesn't line
up with their own experience. It's also quite dismissive of the submission,
which is never nice. The guidelines ask us to "avoid gratuitous negativity". I
often find your comments to be quite negative, often unnecessarily so.

~~~
DonaldFisk
Translations from French to English seem to be a lot better. Why is that?

~~~
pigscantfly
The French/English language pair is the one machine translation research
started with, and it has seen the most work. There are also greater
similarities between those two languages than between most pairs, owing to the
Norman conquest in 1066.

~~~
DonaldFisk
I accept your first point, but I disagree with your second - 1066 and all that
notwithstanding, English is still very much a Germanic language, and
grammatically very different from French.

~~~
pigscantfly
You're right; I was arguing that French and English are more highly correlated
than a pair of languages chosen uniformly at random and therefore easier for
MT, but I could have made that more clear. I think the difference in accuracy
between English <--> French and English <--> German does largely come down
to historical academic preferences and data availability, as opposed to
linguistic reasons. English <--> French vs. English <--> Mandarin (for
example) is a different story.

------
andrewtbham
"In addition to releasing this research paper today, we are announcing the
launch of GNMT in production on a notoriously difficult language pair: Chinese
to English.... we will be working to roll out GNMT to many more of these over
the coming months."

interesting news for all the translation startups... like yc's unbabel.

------
WhitneyLand
Dumb question: Why doesn't Google Translate hardcode perfect translations for
1,000,000 of the most popular requests?

So many times it screws up really basic, common phrases, and I'm always
wondering how the bar can be that low.
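A minimal sketch of what that could look like: a curated table of
human-verified translations consulted before falling back to the MT system.
Everything here (function names, the sample phrases) is hypothetical and not
how Google Translate actually works.

```python
# Curated, human-verified translations keyed by (source, target, phrase).
CURATED = {
    ("de", "en", "wie geht es dir?"): "how are you?",
    ("de", "en", "guten morgen"): "good morning",
}

def machine_translate(src, dst, text):
    """Stand-in for the real MT system."""
    return f"<MT:{src}->{dst}:{text}>"

def translate(src, dst, text):
    key = (src, dst, text.strip().lower())
    return CURATED.get(key) or machine_translate(src, dst, text)

print(translate("de", "en", "Guten Morgen"))  # good morning
print(translate("de", "en", "Es regnet."))    # falls back to the MT system
```

The hard parts in practice would be normalization (casing, punctuation,
whitespace) and keeping the curated table consistent with context-dependent
phrases, which is presumably why it isn't this simple.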

~~~
kafkaesq
Not a dumb question at all. Actually a very fundamental question.

------
vnglst
"Using a human side-by-side evaluation on a set of isolated simple sentences,
it reduces translation errors by an average of 60% compared to Google's
phrase-based production system."

Does this mean that after checking the translation by a human it becomes 60%
better than the phrase-based production system? (by which they presumably mean
Google Translate...?). That seems rather disappointing.

~~~
wyldfire
I interpret this to mean "new system delivers 60% improvement. 'How did we
measure this improvement?' you ask? Well, we asked humans to evaluate the old
system and the new one."
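Concretely, a "60% reduction in errors" is a relative figure. With toy numbers
(not from the paper):

```python
# Illustrative numbers only: errors per 100 sentences as judged by
# human side-by-side evaluation.
old_errors = 50   # old phrase-based system
new_errors = 20   # new neural system

reduction = (old_errors - new_errors) / old_errors
print(f"{reduction:.0%}")  # 60%
```

So the new system still makes errors; it just makes 60% fewer of them than the
old one on the evaluated sentences.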

~~~
vnglst
Yeah, that makes more sense! Thanks.

------
readams
I've been attempting to use this system for a while today to converse with a
native Chinese speaker and it seems to be better but still very far from
human-level translation. Maybe they've been using especially bad human
translators in their comparisons?

~~~
DonaldFisk
If its performance depends on how closely word order matches between sentences
in the two languages, its output for Chinese would be better than you'd
otherwise expect. Chinese and English have similar word orders (though the
underlying grammar is different).

