
The Shallowness of Google Translate (by Douglas Hofstadter) - jsomers
https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570#hn?single_page=true
======
skybrian
Previous comments:
[https://news.ycombinator.com/item?id=16267363](https://news.ycombinator.com/item?id=16267363)

------
paulsutter
Superb reminder of the difficulty of true NLP. Google Translate is an amazing
and very useful tool, but machines have a long way to go before they really
understand language.

Example sentence used by article:

> In their house, everything comes in pairs. There’s his car and her car, his
> towels and her towels, and his library and hers

~~~
danso
I agree with the thrust of the article, that Google Translate is fundamentally
limited (as most systems are) by its inputs and ostensible goals. But what
does it mean to "really understand language"? What is the recognizable
benchmark for "true NLP"?

At some point, how "good" comprehension/translation is becomes very subjective,
because it intersects so heavily with culture and history. Think of how
debates still rage over interpreting Shakespeare among Western readers and
academics, even though there isn't a language barrier.

~~~
eesmith
I notice you quoted "really understand language." That is not a quote from the
essay. Is it an indicator that you are taking the conversation in a different
direction?

Hofstadter has written a lot about the question of what it means to "really
understand a language", and the related more relevant question of what it
means to translate from one language to another. Several of his essays in
"Metamagical Themas" in the 1980s covered those topics, and they are quite
interesting.

He gives an example "Mary was sick yesterday", with different levels of
'understanding'. Call it "sentence M" (quoting from
[https://archive.org/stream/MetamagicalThemas/Metamagical%20T...](https://archive.org/stream/MetamagicalThemas/Metamagical%20Themas%2C%20Hofstadter#page/n29/mode/2up/search/street)
, including apparent typos)

> 1\. Sentence M contains twenty characters.

> 2\. Sentence M contains four English words.

> 3\. Sentence M contains one proper noun, one verb, one one adverb, in that
> order.

> 4\. Sentence M contains one human's name, one linking verb, one adjective
> describing a potential health state of a living being, and one temporal
> adverb, in that order.

> 5\. The subject of Sentence M is a pointer to an individual named 'Mary',
> the predicate is an ascription of ill health to the individual so indicated,
> on the day preceding the statement's utterance.

> 6\. Sentence M asserts that the health of an individual named 'Mary' was not
> good the day before today.

> 7\. Sentence M says that Mary was sick yesterday and

> Just where is the boundary line that says, "You can't do that much
> processing!"? A machine that could go as far as version 7 would have
> actually understood-at least in some rudimentary sense-the content of
> Sentence M.
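
The first two levels in that list are purely mechanical, which a few lines of Python make concrete. This is only an illustrative sketch: the function names are made up here, and it assumes that the "twenty characters" of level 1 means letters counted without spaces. Levels 5 to 7 have no such short counterpart, which is where the question in the quote starts to bite.

```python
sentence_m = "Mary was sick yesterday"


def count_characters(sentence: str) -> int:
    """Level 1: count characters, ignoring the spaces between words."""
    return sum(1 for ch in sentence if not ch.isspace())


def count_words(sentence: str) -> int:
    """Level 2: count whitespace-separated tokens."""
    return len(sentence.split())


print(count_characters(sentence_m))  # 20
print(count_words(sentence_m))       # 4
```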

Elsewhere in the same book he (like in this Atlantic essay) talks about the
problems of translating gender across languages.

I think it was in Godel, Escher, Bach where he talks about the problem of
translating a line from Russian, along the lines of "He lived on B____ street."
It's possible from the story, which takes place in a real city, to figure out
which street that was. Let's say it was "Main Street". Does the English
translation keep the original Russian word and initial? Does it translate to
"Main Street" and replace the "B" with an "M"?

Or consider his book "Le Ton beau de Marot: In Praise of the Music of
Language", which contains 88 different translations of a 16th-century French
poem.

I think it would be difficult to incorporate all of those thousands of pages
of writing on the topic into a single essay for a lay public magazine, and
that your expectations are too high.

While it's true that "good" is subjective, this is a solved problem in the
Turing test or Chinese room sense. We judge professional translators, like
those who work at the UN. We judge students learning a foreign language.
There's no reason to believe that we can't apply similar techniques to judge
machine translation.

Indeed, he gives many concrete examples of a minimum level of translation
competency which should be expected for a good-enough system.

~~~
danso
I agree that the OP gives good and reasonable examples of what _should_ be
doable by a competent translator. I think he is entirely on point about how
people fail to realize how shallow Google Translate is in terms of what we
understand as "understanding" language. My quote about "really understand
language" is from the GP -- I was asking _him_ what he meant by that phrase.

I don't think I disagree with the OP much at all. But I was confused because
much of his essay shows how Google Translate is getting the most basic things
wrong. But then he ends with discussions about what it means for a computer to
have true understanding, the type of understanding that can't be achieved with
just more data.

It seems to me that the obvious screw ups that Google Translate is
demonstrated to make _could_ be alleviated through better algorithm design and
data -- i.e. _without_ achieving what Hofstadter argues is true understanding.
Just like a self-driving car could be very safe despite having no more deep
understanding of driving than Google Translate does of words.

~~~
eesmith
My apologies, I missed the thread context.

This sort of conversation of "really understand" is Searle's Chinese Room
thought experiment. Hofstadter and many others have written a lot on the
topic.

------
oh-kumudo
Seems Google has updated their translation model?

For the Chinese one, at least:

> After one year of working in Tsinghua University, Zhong Shu was transferred
> to Mao's translation committee to live in the city and back to school on
> weekends. He still holds the post of graduate student.

> The leader of the Mao Selected Translation Committee is Comrade Xu Yonglian.
> Introducing Zhong Shu to do this job is Tsinghua classmate Qiao Guanghua.

> On the appointed day, after dinner, an old friend hired a rickshaw to come
> from the city to congratulate. After the guests go, Zhong book said to me in
> fear:

> He thought I had to do a "Southern study walk." This is not a good thing to
> do.

Now it correctly singles out the person's name, "锺书 (Zhong Shu)", rather than
translating it as "Book" (the meaning of the character Shu). Even with
南书房行走, IMO, it did not do a bad job: it at least recognizes the phrase as a
single entity, rather than breaking it into parts and translating each one.

As a native speaker, the style of the example text provided is quite elegant
and old-school. 南书房行走 is a very confusing phrase: it looks like a verb but is
used as a noun phrase, and without context, it is hard for me to tell the
meaning of it.

The updated version of the translation is pretty serviceable. Google Translate
works best with functional text, like news/report, etc. It is not quite there
with literature, which is well known and probably by design. As someone who
works on an MT project, this quality is pretty amazing. I won't necessarily
say it is shallow, TBH.

~~~
abusoufiyan
> As someone who works on an MT project, this quality is pretty amazing. I
> won't necessarily say it is shallow, TBH.

Kind of exactly what this article is saying. To the people who work on this
stuff, it's fascinatingly accurate and serviceable, but to regular people it's
a very poor substitute for actual bilingual understanding.

------
gwern
I wonder about those samples. The Chinese I would expect to be kind of garbage
going off BLEUs from published papers, but the French and German should be
great. Am I imagining it, or do they just not sound like RNN translations? The
RNN translations in my experience tend to flow well and are comprehensible,
albeit sometimes just wrong because they picked a wrong semantic
interpretation. But these samples sound like the older phrase-based
translations, where they immediately descend into word salad. As people noted
at the time, the jump in quality for French/German was so large that people
knew when the new translations went live without Google publishing anything...
(I assume Hofstadter at least was careful enough to do his sampling _after_
the new RNN translations were opened up.)

Google doesn't discuss how it rolls out the translation upgrades, exactly, and
it's an uneven deployment. Can any Googlers comment on the possibility
Hofstadter was using the old translations? Or can any NMT researchers compare
and contrast his examples with current SOTA models?

~~~
GuiA
The French translation in the article matches what Google returns today (i.e.
garbage), except for one difference: it now translates "in pairs" as "par
deux" instead of "en paires", which is a slight improvement but doesn't
address the fundamental errors of the translation.

The garbage it gives is consistent with the garbage statistical approaches
tend to give; particularly when translating from a language without gendered
pronouns to a language with them.

My favorite, most concise example to demonstrate this is the sentence "my
cousin and her wife". Anyone with basic understanding of English grammar would
infer that my cousin is a woman married to a woman; Google Translate gives me
back a French sentence where suddenly my cousin has become a man.

This is a great example of something that a rule-based translation system
would never get wrong (of course, rule-based translation systems have plenty
of other shortcomings) and that statistical approaches have a hard time
dealing with.

See also:

[https://twitter.com/seyyedreza/status/935291317252493312](https://twitter.com/seyyedreza/status/935291317252493312)

[https://twitter.com/taikadahlbom/status/935612093906194432](https://twitter.com/taikadahlbom/status/935612093906194432)

------
CurtHagenlocher
Hofstadter's "Le Ton beau de Marot" (which is about translation) is one of my
favorite books of all time.

------
ceautery
Hofstadter is still a boss. I felt like I was reading Metamagical Themas
again.

------
cooper12
Great article that tackles some of the unwarranted hype machine learning has
gotten lately, with everyone foaming at the mouth about Skynet and their jobs
becoming obsolete. The author also tackles the common excuse of "it'll only
get better". One interesting highlight of the article for me was the
parenthetical around his translation of a paragraph in Chinese: "it took me
hours". Translation is so much more than vectors between words in two
languages and this article expresses that quite clearly.

------
ronilan
Google translate, like a screwdriver is a tool. One can possibly do nasty
things with it, but one, if is not a tool, and if limits oneself to just
screwing things, should be ok.

Can we translate that? Naaa. But that doesn’t make the tool shallow.

P.S - I use the following trick to improve the odds of a good translation -
the phrase has to be a “stable Google translate triangulation”.

This is when a phrase does not change while switching back and forth between
three languages, two of which you know at a native level.
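
One way to read that trick is as a round-trip check, sketched below. Everything here is an assumption made for illustration: `translate` stands in for whatever translation service is used (no real API is being called), `toy_translate` and its lookup table are invented, and the check follows only one direction around the triangle.

```python
from typing import Callable, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


def is_stable_triangulation(
    phrase: str, langs: Tuple[str, str, str], translate: Translator
) -> bool:
    """Return True if the phrase survives the cycle A -> B -> C -> A unchanged."""
    a, b, c = langs
    in_b = translate(phrase, a, b)
    in_c = translate(in_b, b, c)
    back_in_a = translate(in_c, c, a)
    # Compare loosely so casing and stray whitespace do not count as changes.
    return back_in_a.strip().lower() == phrase.strip().lower()


# Invented lookup table standing in for a real translation service.
TOY_TABLE = {
    ("en", "fr"): {"thank you": "merci", "good morning": "bonjour"},
    ("fr", "de"): {"merci": "danke", "bonjour": "guten tag"},
    ("de", "en"): {"danke": "thank you", "guten tag": "good day"},
}


def toy_translate(text: str, src: str, dst: str) -> str:
    """Look the text up in the toy table; return it unchanged if unknown."""
    return TOY_TABLE.get((src, dst), {}).get(text, text)


langs = ("en", "fr", "de")
print(is_stable_triangulation("thank you", langs, toy_translate))     # True
print(is_stable_triangulation("good morning", langs, toy_translate))  # False
```

A stable result only shows that the phrase round-trips; the person who knows two of the three languages still has to judge whether the intermediate translations are actually right.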

Dope for me. YMMV. :)

------
fenomas
Always nice to read Hofstadter, but in my experience Google Translate often
returns gibberish even for single words, never mind subtleties of grammar. I
don't think it replaces a foreign language _dictionary_ yet, so arguments that
it doesn't replace human translators seem like overkill!

Random examples for Japanese: for "七輪" (brazier) it returns "tambourine", for
"ちゃぶ台" (tea table) it returns "Shabu-bashi".

------
mirimir
If you don't know any Chinese, and you need to use a website in Chinese,
what's the alternative?

I like the Firefox add-on Perapera for Chinese and Japanese. You hover, and
see a pop-up translation.

Edit: style

~~~
abusoufiyan
Hire a human translator...

~~~
mirimir
Really? You can get short translations 24/7 with just a few minutes lead time?
Can you recommend any services?

------
hyperpallium
I tell to you, although he appears as the friendly interlocutor, this fellow
is an interrogator of the utmost cunning, with the most sinister goal... of
entrapment!

------
debt
Rev works really well, but it's an actual human who transcribes what you say,
and it takes a day.

