
Douglas Hofstadter: The Shallowness of Google Translate (2018) - andrepd
https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/
======
crazygringo
> _that human translators may be an endangered species. In this scenario,
> human translators would become, within a few years, mere quality controllers
> and glitch fixers, rather than producers of fresh new text._

Huh? As someone who's worked in translation briefly before, that's _already
the case_ and has been for at least a decade -- at least for written material
that's not "art" (literature, poetry).

It's far faster for professional translators to run something through Google
Translate or similar and then clean it up, than type it out from scratch. And
identical quality in the end.

There's little "soul-shattering" about this at all. Someone who needs to churn
out translations of UN meeting notes day after day isn't doing it for their
soul -- they're doing it because it's their job, and so that people of different
nationalities can find out what happened in the meeting and do _their_ job.

~~~
irq11
That’s true...people are lazy and costs want to fall, but it’s a little
disconcerting in areas that require nuance and detail.

I’m by no means a translator, but it isn’t difficult for me to find sentences
where Google Translate simply _butchers_ the nuance of Japanese/English
conversions. Things like politeness and tone? Gone.

Maybe it’s better for closer language pairs, but if a professional translator
starts with automated translations for an important document, they’re taking a
huge risk. Even as an intermediate speaker, the things I am good at (concept
understanding, idiomatic speech, politeness, tone) are exactly the things that
translation tools are bad at, and correcting the errors often just requires
re-writing whole paragraphs from scratch.

~~~
rwmj
Google Translate is spectacularly bad at Japanese/English. I received an email
yesterday where someone who is visiting me said (in translation) that she was
looking forward to eating "my rice" (proper translation: my food), which is
such a basic thing I can't believe that Google still gets it wrong.

~~~
p1esk
As a counter example, GT is surprisingly good at English to Russian
translation: my sister’s husband knows no Russian, but is able to participate
in family iMessage chats with no issues. Most of the time I can’t tell his
messages have not been written by a native Russian speaker. He is able to
fully participate in a written conversation. Mind boggling!

------
coldtea
> _In this scenario, human translators would become, within a few years, mere
> quality controllers and glitch fixers, rather than producers of fresh new
> text. Such a development would cause a soul-shattering upheaval in my mental
> life. Although I fully understand the fascination of trying to get machines
> to translate well, I am not in the least eager to see human translators
> replaced by inanimate machines. Indeed, the idea frightens and revolts me.
> To my mind, translation is an incredibly subtle art that draws constantly on
> one’s many years of experience in life, and on one’s creative imagination._

Well, there's another way to achieve the same effect, by turning anything
written or spoken into such trivial BS, that any old translation will do.

When nuance is not appreciated or delivered, then you don't need to translate
nuance. When literature is replaced by skimming online articles, there's not
much nuance to begin with.

~~~
cptskippy
I don't think you're wrong and there are still many valid use cases for machine
translation.

There are many situations where text should not be nuanced or contain
subtlety, like business signage or instructions. It should be clear and
direct, however people often want to convey frustration either actively or
passively. Often people want to be playful around awkward or uncomfortable
messaging. And there are times when they want to appear humble.

------
oefrha
Shallowness is one obvious problem (obvious to any bilingual person). The more
troubling problem I’ve noticed is that occasionally it’s inexplicably outright
wrong about basic things.

Anecdote time. One day in late 2017 (I remember that because of an event), a
friend of mine read a Google-translated page in Chinese, and sent it to me
(fluent in Chinese) because a statistic was “amazing”. I looked at it, turned
out Google translated 万, meaning “ten thousand”, to “million”. Of course the
number was unbelievably good. I honestly couldn’t think of how that happened.

~~~
danaris
I don't know about Chinese, but in Japanese, 万 is frequently used
idiomatically in the same way we might in English use "a million"—just to mean
"a whole hell of a lot."

~~~
oefrha
You’re right, except it was a statistic like 2.8万, translated to 2.8 million.
Your take could point to a possible cause, though, maybe also putting it under
the “shallow” category.

(Additionally, when 万 is used in Chinese to mean a whole lot, I would
translate it to “thousands of” or “tens of thousands of”.)
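
To make the size of that error concrete, here is a minimal sketch of the arithmetic. The `expand` helper and the `UNIT_VALUES` table are hypothetical names made up for illustration; the only facts used are that 万 means ten thousand and that a million is 10^6.

```python
from decimal import Decimal

# Multipliers for the two units in question (illustrative table, not from any library).
UNIT_VALUES = {"万": Decimal(10_000), "million": Decimal(1_000_000)}

def expand(quantity: str, unit: str) -> int:
    """Expand a quantity such as ("2.8", "万") into a plain integer."""
    return int(Decimal(quantity) * UNIT_VALUES[unit])

correct = expand("2.8", "万")              # 2.8 x 10,000
mistranslated = expand("2.8", "million")   # 2.8 x 1,000,000
ratio = mistranslated // correct           # how far off the mistranslation is

print(correct, mistranslated, ratio)
```

So reading 2.8万 as "2.8 million" inflates the figure by a factor of 100, which is why the number looked unbelievably good.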

~~~
carlmr
Myriad might be the perfect translation then. It comes from Greek for 10,000
and is now routinely used for "a lot" in English.

------
pen2l
The trick is not to rely on Google translate for a finished translation, but
to use it just as a tool toward reaching that goal.

As someone who is learning French these days, it is an absolute boon.

One of my favorite things is letting Google auto finish a sentence.... because
those auto finished sentences usually tend to be phrases or constructs French
speakers use a lot.

The other neat thing is that Google shows a check mark near a translation
that it considers good with high confidence. In my experience those
translations really are good and without fault.

~~~
andrepd
He addresses that:

> Let me return to that sad image of human translators, soon outdone and
> outmoded, gradually turning into nothing but quality controllers and text
> tweakers. That’s a recipe for mediocrity at best. A serious artist doesn’t
> start with a kitschy piece of error-ridden bilgewater and then patch it up
> here and there to produce a work of high art.

Somebody learning a language would be even more easily misled.

~~~
Mathnerd314
It's faster than using a dictionary to look up words one-by-one, and Google's
actually integrated a dictionary into the interface. And beginner sentences
are well-represented in the corpus so they're relatively unlikely to mislead.

Using Google Translate for learning a language could be likened to using
training wheels on a bike.

~~~
Jamwinner
The training wheels don't randomly fall off of bikes. Either a great analogy
counter to your point, or a terrible one.

------
timonoko
Did somebody say "shallow"?:
[https://photos.app.goo.gl/wDy8TQR69iwuGUZb6](https://photos.app.goo.gl/wDy8TQR69iwuGUZb6)

~~~
clarry
_The intersection of a donkey and a horse is called a mule or a mule . The
mule father is a donkey and the mother is a horse cat, while the mule father
is a horse and the mother is a donkey cat. Both junctions are almost always
sterile._

 _At the age of two, stallions are driven away from the herd and form their
own herds by collecting oaks._

[https://translate.google.com/translate?sl=auto&tl=en&u=https...](https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Ffi.wikipedia.org%2Fwiki%2FAasi)

 _There are four species of wild deer in the wild: deer , wild deer , deer and
white-tailed deer . Elk are by far the largest of these species. All species
of Finnish deer are game animals . Despite its name, spruce deer do not belong
to deer but to deer [2] [5] ._

 _It has also been referred to as the deer [4] [5], to distinguish it from the
deer ( Cervinae ). The Mammalian Nomenclature Committee has proposed that the
sub-tribe be renamed in Finnish to the goats ._

[https://translate.google.com/translate?hl=&sl=fi&tl=en&u=htt...](https://translate.google.com/translate?hl=&sl=fi&tl=en&u=https%3A%2F%2Ffi.m.wikipedia.org%2Fwiki%2FPeurat)

 _The subspecies of Canis aureus lupaster, previously defined as a subspecies
of live cabbage in North Africa, originates from South Asian gray matter based
on DNA analyzes published in 2011. [18] [19] The North American red wolf (
Rufus ) has also been the subject of scientific debate, for example as a cross
between coyote and gray matter._

 _The same genus of wolves as the wolf canis is the goldfish, the warbler ,
the warbler , the Ethiopian wolf and the coyote , and the dingo , which is
nowadays generally classified as a wolf subspecies._

 _The survival of young kittens depends a lot on how many adult wolves and
babysitters remain in the herd._

 _Less than half of the wolf catching attempts are successful. A deer or a
creepy male stuck in a threatening position is likely to survive, but the
fugitive gets the wolves to his feet._

 _The wolf can carry loose pieces of pregnant mother and puppies in the nest
or vomit the contents of their stomachs into the puppies._

 _In Central Finland, for example, the formation of new worms has been
observed to have led to a reduction in foxes and raccoon dogs and, as a
result, to an increase in woodland chicks. [40] This behavior, on the other
hand, exposes wolves to infectious diseases in small beasts , especially the
wardrobe ._

[https://translate.google.com/translate?hl=&sl=fi&tl=en&u=htt...](https://translate.google.com/translate?hl=&sl=fi&tl=en&u=https%3A%2F%2Ffi.m.wikipedia.org%2Fwiki%2FSusi)

 _Whales are also reared as fur animals, known as the blue fox._

 _The bean is most active in the twilight, when it comes up in search of food.
Its main food is the fells and, to some extent, the moles . In years when
small rodents are scarce, the bean has to deal with birds, their eggs and
chicks, and berries and carrion, for example._

 _The situation of the species is particularly poor in Scandinavia and
Finland. However, globally, the whale is not endangered. According to a report
published in 2008, the stock of nuts is gradually gaining ground in the Nordic
countries ._

 _In Finland, the whale is classified as extremely endangered . [5] As a
popular fur animal , in the twentieth century, wild white fox was hunted
almost extinct in Finland. Even in the early 20th century, the species was
abundant in Lapland, and during the winter of 1908-1909, whales migrated all
the way to the southern coast of the country._

 _Attempts have been made to improve the livelihood of Naal by hunting foxes
and providing the nipples with nourishment and other food. [12] [13] In Norway
and Sweden, dogfish are fed on dogfish, which has contributed to the growth of
the stock. [7] The survival of nails is also hampered by the fact that they
are no longer provided with enough reindeer kills for wolves and wolves._

[https://translate.google.com/translate?hl=&sl=fi&tl=en&u=htt...](https://translate.google.com/translate?hl=&sl=fi&tl=en&u=https%3A%2F%2Ffi.m.wikipedia.org%2Fwiki%2FNaali)

fi->en translations are so funny that I read them when I'm feeling down. Few
other things make me laugh out loud so much.

Part of me hopes they never invent a better way to translate.

~~~
andrepd
It's like an algorithmic "English as she is spoke" :p

[https://en.wikipedia.org/wiki/English_As_She_Is_Spoke](https://en.wikipedia.org/wiki/English_As_She_Is_Spoke)

------
robomartin
I’ll equate this to being critical of any early technology as if no further
development will happen.

Translation, good translation, is not easy. If it were, it would have been a
solved problem years ago. And, yes, some languages pose more challenges than
others.

I often have to write the same text in multiple languages. I usually write in
English and use Google Translate to quickly generate a rough draft translation
in other languages, say, Spanish. I then copy-paste the translation, proofread
and edit. It is a huge time saver. Aside from not having to re-write the
material, the cognitive load of the proofreading and editing process is
significantly lower.

~~~
glenstein
I guess the right things to take from the article are (1) the validity of the
criticism, as a legitimate face value observation, but also as a way of
sketching the kinds of problems and areas where we hope for progress and (2)
the fact that Google Translate is nevertheless useful in its current form.

I think his overall thesis isn't a declaration that no further progress will
be possible, but that it may have to draw from resources that are more than
just statistical associations of words - things like interior human
experiences. So it would not just measure words against surrounding passages,
but against some understanding of concept generation that emerges from human
subjectivity.

And, this is where I think a fuse blows for many people, things get
complicated, circuits fry, and people find themselves filled with an urge to
declare "well, computers just CAN'T do X." But I think what Hofstadter is
saying is that it _is_ a hard problem but also the kind of thing that in
principle could be done to make progress on translation.

~~~
robomartin
Achieving human-level understanding is the canonical problem in AI. I think I
can say we don’t yet know how to encode and represent this thing called
understanding.

We are excellent at classification and sub-par at understanding, with
translation being just one of the many application domains that will benefit
when we go through the understanding inflection point.

------
lucb1e
[https://deepl.com](https://deepl.com) seems to be better at this. I've long
been looking for differences, dreading that Google (with its superior amount
of money, larger dataset, and perhaps also a larger talent pool) would
outperform it on something. It has always been on par so far, often giving
word-for-word the same translation. When trying the author's example, however,
Deepl actually does it correctly (also outperforming Microsoft Translate,
which I threw in for good measure): the manually-translated French turns into
correct English. (Not speaking French, I can't judge the French translation it
makes.)

Now I'm reading this article:
[https://news.ycombinator.com/item?id=21559633](https://news.ycombinator.com/item?id=21559633)
and this made me curious again:

> However, in German, there is no expectation at all that the subject must
> come first (although it often does). These two German sentences share the
> same meaning: "Der Hund hat den Ball." and "Den Ball hat der Hund."

It totally trips up Google (Den ball hat der Hund -> "the ball has the dog")
but Deepl has this sort of silliness for breakfast (Den ball hat der Hund ->
"the dog has the ball").

Of course, this is not negating the point of the article: translating is
interpreting (or "art", as the author puts it), so deepl still can't be as
good as a human. But for edge cases, deepl does seem to be better, at least in
these two examples. In the real-world cases that I have tried, as mentioned,
I've found it to be 100% equivalent so far (n=20 or so).

------
crgwbr
From my own experience, I’ve found that Google can be a useful tool. But you
have to understand the difficult bits of each language you’re translating
between and specifically write around them, making the translation tool's job a
lot easier. E.g. specifically avoid idioms, words with double meanings, etc.

------
glenstein
Hofstadter is trying to draw a conceptual distinction between "understanding"
(generating language by drawing from a full interiority of human subjective
experience), and mere "translation" (based on looking at words).

But analyzing how words are used is a great way of working backward into
something that gets closer and closer to human subjectivity. Some neural net
trained on how words are used would start to model concepts that reflect
"under the hood" human experiences that generate language.

Maybe it's not 'enough', and a robust translation engine would have a sense of
irony, jealousy, favorite foods, etc that it would need to rely on to
translate things, and maybe that set of concepts has to be obtained via something
other than training on words. But I don't think there's a clear conceptual
line to be drawn. Studying the way we use words can get us a loose
approximation, maybe even a good one.

~~~
Mathnerd314
The Chinese example at the end where he had to look up the phrase himself (on
Google) is particularly revealing. Humans don't have any more insight into
words than the machine. They're just better at interpreting explanations
written by humans and intended for humans.

------
lowdose
Lately I think the auto-translation on YouTube is better than Google Translate's.

------
fizixer
If I'm not wrong, this is a continuation of Hofstadter's critique of the "new
AI" (i.e., AI boom since 2006 based on machine learning) and how it is at odds
with his conception of what needs to be done in order to move towards AGI
based on ideas surrounding his book Gödel, Escher, Bach [1].

For context, please read this in light of the 2013 piece in The Atlantic [2].

[1]
[https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach](https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach)

[2] [https://www.theatlantic.com/magazine/archive/2013/11/the-man...](https://www.theatlantic.com/magazine/archive/2013/11/the-man-who-would-teach-machines-to-think/309529/)

------
mistermann
After reading this I can't help but think (due to my fascination with the
subject) of the parallels one can observe in modern day polarized political
discourse on social and other forms of media.

> Whenever I _translate_ , I first read the original text _carefully_ and
> internalize the _ideas_ as clearly as I can, _letting them slosh back and
> forth in my mind_. It’s _not that the words of the original_ are sloshing
> back and forth; it’s the _ideas_ that are _triggering all sorts of related
> ideas_ , creating a _rich halo of related scenarios in my mind_. Needless to
> say, _most of this halo is unconscious_. _Only when the halo has been evoked
> sufficiently in my mind do I start_ to try to express it—to “press it
> out”—in the second language. _I try to say_ in Language B what strikes me as
> a natural B-ish way to talk about the kinds of situations that constitute
> the halo of meaning in question.

In any communication between two or more people, verbal or written, there is
an ever present translation process going on. Political (and other similar)
conversations, despite the communication taking place in the same language (or
so we conceptualize it), are subject to the very same AI translation problem
Hofstadter notes: the translation of _words_ into _ideas_.

In his case, Hofstadter takes his time, reading the text carefully,
"internalizing the ideas _as clearly as I can_ " (~steelmanning). He is
_explicitly aware_ of the mind's behavior (ideas, triggering (an incredibly
complex and poorly understood process) all sorts of related ideas, creating a
rich halo of related scenarios), and that "most of this halo is unconscious".
He takes his time, letting the subconscious mind do its thing. And only then,
"when the halo has been evoked sufficiently in my mind", does he put pen to
paper to finalize the task of producing as accurate a translation as
possible.

You can see this same sort of thing happening all over the place in the real
world, except unlike Hofstadter, most people are unaware of what is happening
under the covers, and lack the explicit and conscious intent to perform an
accurate (as possible) translation before setting their fingers on the
keyboard to have their say, which will then in turn be inaccurately
interpreted (translated from words to ideas, and then processed) by readers,
each in their own unique way due to the nature of their personal heuristics
and mental model. "Round and round she goes, where she stops, nobody knows."

I'd love to find someone of Hofstadter's background and capability who is
focusing precisely on this phenomenon. The closest I've come across so far is
Jonathan Haidt, and a few people like Eric Weinstein (who often _touches on_
these ideas, but is focused on them to a much lesser degree than Haidt). Any
recommendations of others to look into would be appreciated, as would any
words of encouragement from people who believe there is actually something
important going on here. I believe this, a lack of _true mutual understanding_
of the ideas and beliefs of others, and the lack of intent or desire to do so
(if not revulsion toward the very idea (in my perception, speaking of sloshing
ideas)), is one of the fundamentally important and mostly overlooked problems
in the world today.

------
JasonFruit
Douglas Hofstadter is a pompous buffoon. His style is infuriating, pushing his
undoubtedly fine intellect in the reader's face. He writes for people who want
to believe themselves smarter than most, and the only cost of admission is
that they validate Hofstadter's high self-opinion.

We all know by the end of the article that Hofstadter is a better translator
than Google Translate, but nobody believes that Google Translate is going to
match the work of someone who understands the text they are translating.
Hofstadter claims he's not cherry-picking difficult examples, but he has
chosen two passages where gender is critical, and moved them between languages
where gender is expressed in very different ways. It's a tricky part of
translation, as Hofstadter knows, and nobody expects a computer to speculate
on the intentions of the author of a passage, as you must to successfully
translate his examples.

Google Translate, plus some critical thinking and a bit of work with a
dictionary, will get you a rudimentary understanding of a text in a foreign
language --- _and that's a great advance_. Hofstadter expresses an
expectation that no-one holds for an automatic translation tool, shows that
Google Translate does not live up to it, and then is satisfied with himself,
as usual.

~~~
sorokod
Instead of an ad hominem, would you care to address the specific examples
cited in the article?

~~~
watt
I skimmed the article, but it seems the purpose of it would be better served
as a series of issue reports in the Google Translate bug tracking system than a
full article.

I think it's not really that interesting to diss the state of the machine
translation art in 2019, instead of looking where it could be in 2020, 2025,
2030.

~~~
andrepd
>issue reports

You miss the point. The point is that the "deep" learning approach is
fundamentally flawed and will not lead to proper translation no matter how
many racks of GPUs you throw at it, since it lacks the "understanding of
ideas" required to do it.

------
laichzeit0
I really hate pieces like this. Hofstadter should know better than anyone how
state-of-the-art machine translation works. Only absolute laymen would use
words like “understanding” to describe what’s going on. It’s really good, and
we’re a lot further than we were 10 years ago but honest to god people should
just stop dragging anthropomorphism-laden terms like “understanding” and
“intelligence” into these discussions. The I in AI is unfortunately taken too
literally. Maybe just do yourself a favor and watch the videos for CS224n [1]
and then you’d be less surprised that these systems do not “understand”
(whatever the hell that even means, unless rigorously defined).

[1]
[http://web.stanford.edu/class/cs224n/](http://web.stanford.edu/class/cs224n/)

