
The Shallowness of Google Translate - ehudla
https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/?single_page=true
======
YeGoblynQueenne
For the past 5 years or so, as a kind of benchmark, I've been checking how
Google translates a particular word from Greek to French.

The word is "χελιδόνι", meaning "swallow" (the bird) in Greek. For as long as
I've been trying this tiny little experiment, Google has been translating it
to the French word "avaler", the verb "to swallow".

[https://translate.google.com/#el/fr/%CF%87%CE%B5%CE%BB%CE%B9...](https://translate.google.com/#el/fr/%CF%87%CE%B5%CE%BB%CE%B9%CE%B4%CF%8C%CE%BD%CE%B9)

Once in a while, a different translation appears: "machaon", which is a kind
of butterfly, the Old World swallow-tail. This is slightly closer but still
absurd. If GT can make the connection to "swallow-tail", how can it not see
the connection to the bird, "swallow"?

The problem seems to be that, in order to translate from Greek to French,
Google goes via English. A great big chunk of context is lost in the process,
especially since now the translation between two languages that have different
forms for male and female nouns goes through a third language that does not.

So for example, I've seen the same gender-reversal as Hofstadter reports in
his article. The following string in Greek means "I saw my teacher and she
said hello to me".

    
    
      Είδα τη δασκάλα μου και μου είπε γειά. 
    

The following are the French and English translations by GT:

    
    
      J'ai vu mon professeur et il m'a dit bonjour.
      I saw my teacher and he said hello.
    

This seems to happen because Google's English language model has learned that
"he" is found in the context of "teacher" and "said" more often than "she" is.
But of course, such a statistical association is, well, meaningless, and
therefore useless, when it comes to translation, where you actually need to
know when the least likely case is the correct one.
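
The failure mode can be sketched as a toy model. The corpus counts below
are invented purely for illustration (a real system conditions on far
richer context than two nearby words), but the argmax step is the point:

```python
from collections import Counter

# Hypothetical corpus counts for the pronoun seen near "teacher ... said".
# In a real system these would come from billions of sentences; the skew
# here is invented to illustrate the failure mode.
pronoun_counts = Counter({"he": 7_412, "she": 2_958})

def most_likely_pronoun(counts: Counter) -> str:
    # A purely statistical model picks the argmax, regardless of what
    # the source sentence actually said.
    return counts.most_common(1)[0][0]

# The Greek source marks the teacher as female ("τη δασκάλα"), but once
# the sentence is pivoted through genderless English context, only the
# corpus statistics remain to choose the pronoun.
print(most_likely_pronoun(pronoun_counts))  # prints "he": the majority wins
```

Whatever the source said, the minority case can never win this lookup,
which is exactly the gender-reversal shown above.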

Generally, it looks like the complete abandonment of any attempt at
representing meaning, and relying instead on text statistics to do meaning-
intensive work like translation, is producing a lot of nonsense. And that's
not a criticism of Google Translate only. I think professor Chomsky might
"win" that old debate with prof. Norvig, after all.

~~~
braindongle
"...such a statistical association is, well, meaningless..."

Good point, but it's even worse than meaningless. It's stereotype-reinforcing.
That particular inference, that the role in society is such-and-such, so
presume that the subject is male, is to be fought at every turn.

I would much prefer the awkwardness of "he/she" when the machine is uncertain
rather than some "this is just the way things are" presumption.

~~~
squeeeeeeeeeee
Social Justice Warrior spotted.

~~~
dang
Could you please stop posting ideological flamebait to HN? Two of your last
three comments have been that—this is bad. The other of the three was quite
good, so if you stick to posting civilly and substantively, you'll be fine.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

------
larrysalibra
At least Google Translate is a bit better than Baidu Translate which at some
point decided that my name is the English translation of 扒饭 (grilled rice).

2 years ago, I appeared on the menu of a restaurant on Huawei's campus in
Shenzhen because someone apparently used Baidu Translate to translate the menu
to English:
[https://twitter.com/larrysalibra/status/959749866036408320](https://twitter.com/larrysalibra/status/959749866036408320)

And 2 years later, I'm still grilled rice:
[http://translate.baidu.com/#zh/en/扒饭](http://translate.baidu.com/#zh/en/扒饭)

Human language is hard!

~~~
kiwidrew
Wow, that's awesome, the only way it could be more surreal is if you had been
holding the menu in front of you!

"扒饭" isn't, on its own, an actual Chinese word. The "扒" is basically a
modifier that comes after a type of meat and indicates that it is a "steak" or
"cutlet".

So your menu item "美式杂扒饭" should translate as "American-style" (美式) "mixed
grill" (杂扒) "on rice" (饭)... but the machine translation is getting the word
boundaries wrong and translating it as "American-style" (美式) "mixed" (杂)
"larrysalibra" (扒饭). Fascinating.

~~~
larrysalibra
The really crazy thing is how I found out about it. A friend of mine in
Austin has a friend, whom I don't know, who was apparently at that restaurant
in Shenzhen, ~10 miles north of where I live in Hong Kong, traveling on
business. He saw the menu, recognized my name from somewhere, and told my
friend in Austin, who then passed the message and pic on to me. Small world.

China's a big country - makes me wonder what other menus I'm on!

------
cs702
The author is Douglas Hofstadter, author of _Gödel, Escher, Bach_.[a]

In this article, he shows with concrete examples how Google Translate falls
short, and then offers two criticisms:

* Feeding more data to current models won't bring them any closer to understanding, since understanding involves _having ideas_ (including ideas about the state of the world), and this lack of ideas is the root of all the problems for machine translation today. His examples are powerful evidence of this limitation of current state-of-the-art machine translation systems.

* Current machine translation systems make no attempt to go beyond the surface level of words and phrases. These systems merely discover statistical regularities that relate words to other words at multiple, hierarchical levels of composition. In Hofstadter's words, "there's no attempt to create internal structures that could be thought of as ideas, images, memories, or experiences. Such mental etherea are still far too elusive to deal with computationally." He is right.

That said, AI researchers are aware of these limitations, and are exploring
possible ways to overcome them. An early, crude example of such research is
the multi-modal model of "One Model to Learn Them All"
([https://arxiv.org/abs/1706.05137](https://arxiv.org/abs/1706.05137)), which
is trained to learn to perform multiple image-recognition, language-
translation, image-captioning, speech-recognition, and language-parsing tasks
_at the same time_ , using _representations that are shared by all tasks_.

While these early research efforts fall far short of the kind of
"understanding of the world" Hofstadter shows is necessary for human-level
language translation, it's encouraging to see AI researchers actively looking
for ways to move beyond the mere discovery of 'hierarchical statistical
regularities' that relate words to other words.

This is an exciting time for AI research.

[a]
[https://en.wikipedia.org/wiki/Douglas_Hofstadter](https://en.wikipedia.org/wiki/Douglas_Hofstadter)

~~~
brd
I was under the impression that modern word embedding techniques have emergent
properties that start to approach conceptual understanding. The trick is
figuring out how to leverage that property properly.

I believe Google Brain has a demo where they show translation between two
languages without having any corpus specifically linking the two (i.e. the
system learned English/Japanese translations and English/Greek translations
and then translated Greek/Japanese) with a reasonable degree of accuracy. This
seems to me to hint at a more conceptual understanding as opposed to a simple
pattern matching activity.

In either case, I 100% agree that the multi-modal model is the path forward.
It feels like there's a lot of low-hanging fruit in the area of model
ensembles.
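
That zero-shot behavior can be sketched as nearest-neighbor lookup in a
shared embedding space. The vectors below are invented toy values, not
anything a real system learned; the sketch only shows why no direct
Greek/Japanese corpus is needed once everything lives in one space:

```python
import math

# Toy shared embedding space: in multilingual NMT, words from every
# language are mapped into one vector space. These 2-d vectors are
# hand-picked for illustration.
embeddings = {
    ("en", "swallow (bird)"): (0.90, 0.10),
    ("el", "χελιδόνι"):       (0.88, 0.12),
    ("ja", "ツバメ"):          (0.91, 0.09),
    ("en", "swallow (verb)"): (0.10, 0.90),
    ("ja", "飲み込む"):        (0.12, 0.88),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def zero_shot(word, src, tgt):
    # Translate src -> tgt with no src/tgt corpus: embed the source word,
    # then return the nearest target-language neighbor in the shared space.
    q = embeddings[(src, word)]
    candidates = {w: v for (lang, w), v in embeddings.items() if lang == tgt}
    return max(candidates, key=lambda w: cosine(q, candidates[w]))

print(zero_shot("χελιδόνι", "el", "ja"))  # ツバメ, the bird, not the verb
```

Whether proximity in such a space amounts to "conceptual understanding"
is exactly the question the article raises.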

------
bambax
I sell things on Amazon European platforms (UK FR DE ES IT) and they receive
reviews.

I only speak English and French and therefore for all other languages I try to
understand the reviews using Google Translate.

The result is never good. Sometimes it's barely intelligible; often it's not
intelligible at all. (For some reason, GT is incapable of translating
Italian; by which I mean: the resulting translation gives you no idea
whatsoever of the original meaning.)

It really makes one wonder if the hype/fear about AI is maybe misplaced.

Google is a self-described "AI company", with all the money in the world, and
staffed with the best people, and access to the most data, and this is all
they can come up with?

The only explanation is that the problem is simply too hard.

~~~
narag
Do you happen to sell slippers?

Maybe the problem is not too hard; it's just that statistical models are not
the right way to solve it. Actually, it seems like THE problem that
exemplifies the difference between partial problems that are suitable for
stats (Bayesian spam filters) and "true" AI.

Also, using English as an intermediate step is a no-no. I doubt anything will
improve until a competitor tries a fresh approach and gets better results.

~~~
bambax
> _Do you happen to sell slippers?_

Not at all, why? Do you need any? ;-) Or is it a known difficult term to
translate?

~~~
narag
Both.

It's a kind of shoe that was common in Spain forty years ago. Also, the word
might be getting obsolete, so yes, difficult to source:

[https://www.calzadoslobo.com/tienda/bambas/bamba-de-lona-
con...](https://www.calzadoslobo.com/tienda/bambas/bamba-de-lona-con-cordones-
made-in-spain)

There was a time when a lot of people on IRC added an "x" to their nicks
in the fashion of unix, linux, aix, hp-ux... I can't help making the
association every time :)

------
chx
Ho humm. Of course he is right, but is that enough? This reminded me of a
quote from Doctorow's Microsoft Research DRM talk, which I remember well
because I translated (hah) it into Hungarian as my last act as a Hungarian
journalist -- I sort of resurrected myself as one, because by that point I
hadn't written in years.

[http://craphound.com/msftdrm.txt](http://craphound.com/msftdrm.txt)

> This is the overweening characteristic of every single successful new
> medium: it is true to itself. The Luther Bible didn't succeed on the axes
> that made a hand-copied monk Bible valuable: they were ugly, they weren't in
> Church Latin, they weren't read aloud by someone who could interpret it for
> his lay audience, they didn't represent years of devoted-with-a-capital-D
> labor by someone who had given his life over to God. The thing that made the
> Luther Bible a success was its scalability: it was more popular because it
> was more proliferate: all success factors for a new medium pale beside its
> profligacy. The most successful organisms on earth are those that reproduce
> the most: bugs and bacteria, nematodes and virii. Reproduction is the best
> of all survival strategies.

If you want the Hungarian one: [https://www.hwsw.hu/hirek/40796/a-digitalis-
jogkezelo-rendsz...](https://www.hwsw.hu/hirek/40796/a-digitalis-jogkezelo-
rendszerek-kritikaja.html)

~~~
csydas
I think Doctorow's point is dramatically different here. He's talking about a
change in medium and specifically arguing against the claims about the
original experience offered by DRM controlled media (e.g., theatres, ebooks,
etc).

The article's point is more that, while Google Translate has its impressive
moments, it needs to be understood for what it is: the most basic of
translation services.

If you've ever had to rely on Google Translate (or any automated translate
service) you'll know what I mean by "basic translation services". It's great
if you want to say "I want to order Fish and Rice" or "Excuse me, where is the
restroom". The moment you try to move past such simple phrases is where
automated translation just can't get a good grip on human languages.
Languages from the same families naturally are a bit easier (English to
Spanish, for example usually can allow for more complexities), but you still
lose a lot simply because it's a 1:1 translation from the box on the left/top
to the box on the right/bottom, which doesn't happen.

I'm learning Russian (in Russia no less) and one of the hardest things for me
as a native English speaker is avoiding the use of "to be", which is almost
never used in Russian. There are other oddities as well regarding prepositions
in Russian that are handled differently than English (my poor teacher still
can't satisfactorily explain to me why stuff is "on the kitchen", but it's "in
the ___" for other rooms) (в гостинной versus на кухне). Now, surprisingly
Google translate gets these two right, but other such preposition specialties
it bungles completely, and I've had more than my fair share of confused looks
when I have to resort to using translate for more complex sentences.
Especially when it comes to things like words of motion, for which there are
tons of different words depending on how you're going.

It's good that it's accessible, but when the reverse happens (Russian to
English), for me it requires a fairly fast sed operation to fix Google
Translate's word choices, or just some additions to help convey what the
person means.

Edit: Addendum to the main point: I can see that translation tools, if they
receive widespread use in real-world settings, may allow pidgin languages to
crop up, but I'd be curious to see from a large sample how many people use
the conversation-style features. I'm sure there's plenty of anecdata on the
subject from those who have to use them (myself included), but I wonder if
they seem as well used as you increase the sample size further. Globally,
people still pretty much stay at home and/or know enough English to get by.

~~~
ericpearl
Correct: the stove is in the kitchen. Incorrect: the stove is on the kitchen.

------
ddmd
There is a new online translator, [http://deepl.com](http://deepl.com), which
relies on deep learning techniques and provides higher (semantic) quality and
accuracy of translation. Previously I had quite a positive experience with
[http://translate.yandex.com](http://translate.yandex.com) (but I had to
manually compare and combine its results with Google Translate's).

~~~
nicois
I just tried the following, to German, and got the predictable result, which
is completely wrong. Until the proper semantic parts of the text are
identified, these sorts of mistakes prevent statistical methods from being
trustworthy:

Sailing ships in rough seas is asking for trouble

~~~
romseb
Put a period at the end of the sentence and it changes from

> Segelschiffe in rauer See verlangen Ärger.

to

> Segelschiffe in rauher See sind ein Problem.

~~~
Freak_NL
The translating software can't distinguish between 'sailing ships' the noun,
and 'sailing' (verb) 'ships' (noun) the act. The latter is what is probably
intended. It also misses chances to use idiomatic expressions in its
translations.

~~~
claudius
Well, 'is' is singular, so it can't be the correct verb to associate with
'sailing ships' if the latter means multiple sailing ships rather than the
activity of sailing ships.

------
btrettel
Having used Google Translate to translate at least 6 Russian academic papers
in full and about 40% of a Russian dissertation (along with a far larger
number of partial translations of papers), I can say that Google Translate
works decently for this task. The result won't win any awards for elegance,
but it's by and large intelligible. I submitted many corrections over the
past few years, and that seems to have noticeably improved the quality, at
the very least reducing the number of gibberish sentences. My contributions
are limited
to fluid dynamics, so perhaps they won't generalize, but I am very happy with
the results.

Google in particular had problems with technical phrases which it translated
literally, where the literal translation does not correspond to the equivalent
English phrase. In one case there was no equivalent English phrase and Google
Translate returned a phrase that seemed like it had a Latin etymology. After
reverse engineering the word based on the Latin, I recognized the concept and
found it interesting that there was a Russian word for this. If I recall
correctly, I added a footnote explaining the word. Anyway, I doubt your
average human
translator without subject knowledge could do this. So I don't blame Google
Translate for this too much.

I have a list of problem sentences that I'll take to a human translator some
time in the future. Machine translation has been convenient but does not yet
replace human translation even when one sets a fairly low bar as I do.

~~~
petra
How did you find interesting articles in Russian? And I assume they contained
knowledge not available in English?

~~~
btrettel
The articles absolutely contained information not available in English. I
would not have translated the articles in full if they had not, as my time is
limited. My specific field is the breakup of liquid jets into droplets. As an
example, I found a paper that applied the maximum entropy principle to the
prediction of droplet size distributions in 1938 (by V. Ya. Natazon), almost
50 years before the approach was popularized for droplet size in the west (and
before Jaynes). I actually was not attracted to the paper for that, though.
The specific constraint used involved a measure of the strength of the
turbulence, and that was what attracted me as my own research was going in
that direction. The constraint would be regarded as novel if it were published
in a western journal today. I am working on a conference paper right now which
cites this.

As for how I find the articles, there are many approaches. I found the
Natanzon paper by reading an old English translation of a Russian textbook.
Textbooks can be particularly fruitful in terms of foreign references. I also
found bibliographies useful to find citations along with English abstracts
(though few are recent). Going backwards through the citation network for a
particular foreign paper you find interesting can also return good papers.

It's also worth noting that in the Cold War era many translations were
produced, though they may be hard to track down today. I only translated
articles where no previous translation existed, as best I could tell. I could
write more on identifying whether an article was translated and tracking down
the translation if you are interested.

------
nabla9
Hofstadter's strength is his ability to stay focused on the highest level
cognition problem. His theory is that analogies are the core of human
cognition. His main work revolves around discovering how extensive human
ability to think using analogies is.

I especially like his attempts to understand and capture the high level
cognition in simple toy problems. His `Copycat` and `Letter Spirit` programs
illuminate the problem, and his thinking, in a very clever way.

The current AI and ML research is building things bottom up and there is still
significant gap before we reach the high level cognition that Hofstadter is
interested in. What is the representation that binds low level and high level
cognition and allows high level fluid concepts to be used as analogies from
one domain to next with just one or two examples? Style transfer, variational
autoencoders and transfer learning are very limited in this regard.

Challenges for deep learning:

* Deep Letter Spirit. Show 1-3 examples of lowercase letters of the roman alphabet in some font and have an algorithm that understands the style and completes the rest of the alphabet in the same style.

* Bongard problems solver.

------
jhanschoo
In defense of Google Translate, I have to point out two constraints of
machine translation that human translators usually do not face.

1. Machine translation lacks a direct conception of the physical world: it
only understands the "grammar" imposed by physical constraints indirectly
through digitized verbal corpora and hand-constructed parameters.

2. Machine translation does not have the luxury of understanding its target
audience's domain knowledge. Much of the jargon Google Translate does not
understand comes from very specific situations that few people generally
experience, e.g. Pan-Germanic. Normally, if such words were used in a news
article, the journalist would have to spend a sentence or two describing what
is meant by them. If Google Translate were tuned to favor translating jargon
into jargon, its translations would likely contain much jargon from various
domains of experience and be very difficult to read in general.

~~~
discreteevent
3. Even 'just' acting as a translator, Douglas Hofstadter is a high bar for
intelligence, never mind artificial intelligence.

------
paganel
I heartily recommend Umberto Eco's text "Experiences in Translation" (full
text here, a little long:
[https://archive.org/stream/UmbertoEcoEXPERIENCESINTRANSLATIO...](https://archive.org/stream/UmbertoEcoEXPERIENCESINTRANSLATION/Umberto%20Eco%20EXPERIENCES%20IN%20TRANSLATION_djvu.txt)),
which perfectly describes the difficulty (and I'd say beauty) of the act of
translation.

> In this book Umberto Eco argues that translation is not about comparing two
> languages, but about the interpretation of a text in two different
> languages, thus involving a shift between cultures.

The above short presentation of the book says it all, basically, but
surprisingly enough it is really hard to understand for lots of technical
people who think that translation is just "de-coding", as Hofstadter
explains. We won't have proper translations until AGI is here.

------
olasaustralia
Jeffrey Shallit has a response - [http://recursed.blogspot.com/2018/02/doug-
hofstadter-flight-...](http://recursed.blogspot.com/2018/02/doug-hofstadter-
flight-and-ai.html)

~~~
randomsearch
I think he’s (deliberately? Give him the benefit of the doubt) misinterpreted
Hofstadter’s use of the words “processing text”.

What he means, which seemed quite obvious to me, is that the machine is not
reading text, building up a semantic interpretation of the sentence in the way
a human does. That’s because a neural net is a simple pattern recognition
machine that does not work in the same way as the brain. It doesn’t have the
immense life experience to draw upon (probably relatively easy to fix) but
more importantly it doesn’t have a concept of what a sentence actually means.

A neural network doesn’t have an understanding of what “double” means. It just
pattern matches a translation.

I think there’s some serious symbolic reasoning going on in the brain, which
neural nets don’t yet perform. It all feels like shallow syntactic matching
right now, rather than semantic reasoning.

Unfortunately I don’t know how the brain works, so you end up in an absurd
argument where people say “the brain is just a neural net” and it’s impossible
to completely refute their claims, just as if someone said “the brain is a
very large lookup table”. Well, from what I see it does seem that the brain is
much smarter than that, but I can’t be certain without knowing what that extra
kicker is. So whilst such an assertion seems terribly simplistic and self-
evidently insufficient, it is difficult to argue with.

~~~
mojuba
“The brain is just a neural net” plus real life experience, in fact years and
years of experience. Pretty much every example that automated translation gets
wrong is when a real life context is required. Our language is not just a
sequence of words and sentences; it almost always implies some contextual
knowledge. Where two humans have significantly different backgrounds, they
may have difficulty understanding each other for the same reason. A total
lack of
real life experience on one of the sides makes it even worse: it produces
barely comprehensible near-nonsense.

------
lukas099
> The bailingual engine isn’t reading anything—not in the normal human sense
> of the verb “to read.” It’s processing text. The symbols it’s processing are
> disconnected from experiences in the world. It has no memories on which to
> draw, no imagery, no understanding, no meaning residing behind the words it
> so rapidly flings around.

I had forgotten how edifying Hofstadter's writing was about topics such as
this.

------
ehudla
DRH's /Le Ton Beau De Marot: In Praise Of The Music Of Language/ is a
remarkable book about translation of literature (all of his books are must
reads, btw). Still, it seems to me that shallowness is a feature, not a bug,
of automatic translation.

------
mehrdadn
Related heads-up: if you ever want to use or test Google Translate, don't make
the mistake of assuming Gmail's translator would return its results. I have no
idea what it uses, but it sure isn't the same Google Translate. I've seen it
produce output far inferior to that of Google Translate for the exact same
text.

------
Yetanfou
Google Translate - and all the other translation engines bar some experimental
ones - are a step in the right direction. No, they do not translate like a
human translator does as they are better compared to transpilers than
translators. Be that as it may the emergence of the likes of Google Translate
has made it possible for just about anyone who can read and write to get the
gist of what is written in another language without needing to get outside
help. While the translation might be rickety it generally is possible to get
what was written. The next step will be taken when the technology is ready for
it, no sooner. Given the rate at which machine learning or 'AI' is being
pushed it won't be that long before human translators are taken out of the
loop for most tasks. They'll still have a job translating literature and some
legal texts [1] but most business communications will be translated by
machine.

[1] even though legal texts should be a prime candidate for machine
translation as they are written in something resembling human byte code to
start with.

------
cannam
In 1968 the Sunday Times ran a competition to translate a poem by Baudelaire
("Je suis comme le roi d'un pays pluvieux") into English.

The poet Nicholas Moore read about the competition and, apparently angered by
the fruitlessness of the task, entered it 31 separate times with 31 different
poems, many submitted pseudonymously and in the styles of other poets.

One of them even begins "I'm like the Winner of the Competition / the one who
wrote the strong, rewarding phrase..."

None of his entries won, but his anger ("All I have against translation is
that it can't be done!") carried him a surprisingly long way.

You can read his entries (and the original) here
[http://www.ubu.com/ubu/pdf/moore_spleen.pdf](http://www.ubu.com/ubu/pdf/moore_spleen.pdf)

------
jvvw
My sister is a professional translator and when companies pay her they
generally do it because they want really good translations not because they
just want people to understand the content. This can be for semi-legalistic
reasons (though she's not a legal translator per se, but she has done a fair
bit of work related to the EU) or for reasons of professional reputation.
Literary translation, although not her mainstay, is also obviously really hard
to do well. She's not worried about Google Translate at all yet, although she
does have the advantage that one of her languages is rare for native English
speakers and she is an extremely good writer in English.

~~~
YeGoblynQueenne
I have a cluster of friends who are professional translators and interpreters
and the translators have complained to me a lot about a machine translation
program called Trados, that they are pretty much forced to use in their day-
to-day jobs.

The complaint I heard was that translation agencies hire translators with
contracts that state that 80% of the translation work will be performed by
Trados and the human translator will only be paid for 20% of it (I obviously
don't know exact numbers!).

Of course, in practice, Trados' translations are nowhere near 80% of the work
needed to translate some text, to the point that the human translator ends up
doing the bulk of the work and being paid for only a small fraction of it.

It looks like translators are already in trouble with machine translation
software, even if they can still do the job way better than that software.

I think we can expect to see this pattern a lot more often in the future:
automation taking over jobs not because it's better at them, but because it's
more profitable that way.

~~~
amake
Trados is not a machine translation tool; it is a CAT (computer-aided
translation) tool, which helps human translators do their jobs more
efficiently. It does not produce translations.

Trados may additionally offer MT services, and it probably provides easy
access to MT services within the tool, but it's not fair to blame Trados
itself for poor translations, as the human must choose what ultimately to
output for each input string.

> contracts that state that 80% of the translation work will be performed by
> Trados and the human translator will only be paid for 20% of it

This is probably referring again not to MT, but to "leveraging" existing
translations. This is basically a lookup (usually with some fuzziness to allow
n% matches where e.g. 80% < n < 100%) against a database of existing
translations. If you are translating a document that has been translated
before, and only x% of it has changed, then it doesn't make sense for the
whole thing to be re-translated.
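
The "leveraging" lookup described above can be sketched roughly like this.
The memory contents and the 0.8 threshold are illustrative, and real CAT
tools use their own segment-similarity scoring, but the shape of the
operation is a fuzzy lookup:

```python
import difflib

# Minimal sketch of translation-memory "leveraging": before a segment is
# sent to a human, it's fuzzy-matched against previously translated
# segments. This memory is invented for illustration.
memory = {
    "The unit must be stored in a dry place.":
        "L'appareil doit être stocké dans un endroit sec.",
}

def best_match(segment, tm, threshold=0.8):
    """Return (source, translation, score) for the closest TM entry at or
    above the threshold, else None."""
    best = None
    for src, tgt in tm.items():
        score = difflib.SequenceMatcher(None, segment, src).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (src, tgt, score)
    return best

# A high-percentage fuzzy match: under the contracts described above, the
# translator would only be paid for editing the difference.
hit = best_match("The unit must be stored in a cool, dry place.", memory)
print(hit is not None and hit[2] >= 0.8)  # True
```

The dispute in the parent comments is essentially about how often that
match percentage reflects the real editing effort left for the human.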

------
kccqzy
The author makes one point that is wrong. The current deep learning version
of Google Translate specifically _didn't_ use all of the available corpus and
data that the old translation engine used, simply because the new engine
takes much longer to train with all the data. Google internally looked at
statistics and decided that the new engine is already better, so there's no
need to feed it more data. I believe they decided that a reasonable training
time (around two weeks IIRC) is more important.

So any want of quality is definitely a pragmatic choice made by Google, but
not for want of data.

------
anotherevan
He never did explain why Frank and his Danish friend used Google Translate
despite both being fluent in each other's native languages. I read right
through mainly to try and find that out.

------
pooya72
On the other hand, it's interesting how well Google Translate translates
philosophical text. Just compare the translations below.

Gadamer's text from Truth and Method:

Die folgenden Untersuchungen haben es mit dem hermeneutischen Problem zu tun.
Das Phänomen des Verstehens und der rechten Auslegung des Verstandenen ist
nicht nur ein Spezialproblem der geisteswissenschaftlichen Methodenlehre. Es
hat von alters her auch eine theologische und eine juristische Hermeneutik
gegeben, die nicht so sehr wissenschaftstheoretischen Charakters waren,
als"vielmehr dem praktischen Verhalten des durch die Wissenschaft
ausgebildeten Richters oder Pfarrers entsprachen und ihm dienten.

 _Weinsheimer and Marshall translation:_

These studies are concerned with the problem of hermeneutics. The phenomenon
of understanding and of the correct interpretation of what has been understood
is not a problem specific to the methodology of the human sciences alone.
There has long been a theological and a legal hermeneutics, which were not so
much theoretical as corollary and ancillary to the practical activity of the
judge or clergyman who had completed his theoretical training.

 _Google translate:_

The following investigations have to do with the hermeneutic problem. The
phenomenon of understanding and the right interpretation of the understanding
is not only a special problem of the humanistic methodology. There has also
been a theological and juridical hermeneutics from ancient times, which were
not so much scientific-theoretical in character as they corresponded to and
served the practical behavior of the scientist or pastor trained by science.

~~~
ACow_Adonis
Cynically, I wonder if a translation of that text was actually fed
in/available to the translation process.

I mean texts already translated into other languages would presumably be one
of the most obvious things to feed in as training data.

~~~
PeterisP
There's no need to wonder, one can be virtually certain that _every_ "classic"
work that has been translated to many languages is included in the training
data of major translation systems.

The MT world runs on data: it includes as much as it can, and a book's
translation would be left out only if it isn't readily available in digital
form - and pretty much all famous books have been digitized.

------
SZJX
It already works very well, especially after its improvement in recent years
from using neural networks. I've frequently been surprised by the quality of
the translations it provides for Japanese sentences, not to mention between
German and English. The test cases I put in for some other languages
(e.g. Chinese) are really promising too.

Machine translation is a really hard problem. The messiness of language as a
system, the importance of context in daily conversation, etc. all play a part.
Another layer of complexity is the gap between everyday usage and the official
written form, which is also being tackled by researchers. You have to put this
in perspective: much of the old rule-based/Chomskyan software was simply
unusable for decades. Statistical approaches have been in use for barely 10
years, and industrial deep learning for less than half a decade. There is
still much more to come. The hype IMO is well justified.

------
Houshalter
These NNs have fewer neurons than an insect brain. They have to use so little
computing power they can be provided for free to everyone in the world. Of
course they have limitations. But there's been exponential progress in the
last few years, and it's amazing they can do as much as they do.

------
half-kh-hacker
I've found that, for Korean at least, Naver Papago is miles ahead of Google
Translate, especially when decoding unnatural phrases.

'백조가 연못에 있지만 나는 안봐' (roughly, "A swan is in the pond, but I don't look [at
it]") has its [implicit] object pronoun dropped for its second verb when
translated through Google, but not when translated through Papago.

------
buovjaga
Regarding the "gender blunder":
[https://en.wikipedia.org/wiki/Anaphora_(linguistics)](https://en.wikipedia.org/wiki/Anaphora_\(linguistics\))

I noticed Apertium is looking to solve the problem in their system:
[http://wiki.apertium.org/wiki/Anaphora_resolution](http://wiki.apertium.org/wiki/Anaphora_resolution)

It's a potential GSoC project for them:
[http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Cod...](http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Anaphora_resolution)
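
The pivot-language failure mode behind the gender blunder can be shown with a
toy sketch (hypothetical two-entry dictionaries, nothing like the real
system): once the gendered Greek nouns collapse onto the genderless English
pivot word, no downstream step can recover which French form was meant.

```python
# Toy sketch of pivot translation losing gender (hypothetical dictionaries,
# not Google's actual pipeline).

GREEK_TO_ENGLISH = {
    "δάσκαλος": "teacher",  # masculine ("male teacher")
    "δασκάλα": "teacher",   # feminine -- collapses onto the same pivot word
}

ENGLISH_TO_FRENCH = {
    # With no gender left in the pivot, one form has to be picked blindly.
    "teacher": "professeur",
}

def pivot_translate(greek_word: str) -> str:
    """Translate Greek -> French via an English pivot."""
    english = GREEK_TO_ENGLISH[greek_word]
    return ENGLISH_TO_FRENCH[english]

# Both gendered Greek forms come out identical in French:
print(pivot_translate("δάσκαλος"))  # professeur
print(pivot_translate("δασκάλα"))   # professeur
```

The distinction is destroyed at the pivot step, so even a perfect
English-to-French stage cannot restore it; anaphora/gender resolution would
have to happen before or across the pivot.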

------
jmadsen
I use Google Translate extensively to help write things in Japanese -
considered (one of) the most difficult languages in the world.

I show the results to Japanese people who nearly always tell me it looks just
fine to them.

I have also passed off translations as my own work on tutoring sites like
Lang8 and had natives correct "my work". They will often give a slightly
different wording, but I have never had a "WTF does this mean?" type
response.

\----

I think it is important to distinguish between a perfect translation with
nuance and a simple "I need to make this point" translation, which is what we
need most of the time.

~~~
lifthrasiir
> [...] Japanese - considered (one of) the most difficult languages in the
> world.

Just in case: one of the most difficult languages for native English speakers
to learn. It is pretty easy for Koreans, for example---they share the same
sentence order (subject-object-verb), and many words share an etymology due to
pervasive Chinese influence.

More quantitatively, Idibon once catalogued [1] the _weirdness_ of natural
languages compared to one another. English ranked 33rd out of 239 languages.
Quoting the original, "[p]art of this is to say that some of the languages you
take for granted as being normal (like English, Spanish, or German)
consistently do things differently than most of the other languages in the
world".

[1] [https://corplinguistics.wordpress.com/2013/06/21/the-
weirdes...](https://corplinguistics.wordpress.com/2013/06/21/the-weirdest-
languages/) (the original link is dead)

------
Stranger43
It might be worth looking into the pidgin languages that tend to emerge when
two cultures meet and start to trade without the intervention of an officially
appointed/educated class of arbitrators; i.e., machine-translated text might
be seen as a kind of pidgin that's not really either of the two languages in
question.

------
vanderZwan
>> _“South study walking” is not an official position, before the Qing era
this is just a “messenger,” generally by the then imperial intellectuals
Hanlin to serve as. South study in the Hanlin officials in the “select chencai
only goods and excellent” into the value, called “South study walking.”
Because of the close to the emperor, the emperor’s decision to have a certain
influence. Yongzheng later set up “military aircraft,” the Minister of the
military machine, full-time, although the study is still Hanlin into the
value, but has no participation in government affairs. Scholars in the Qing
Dynasty into the value of the South study proud. Many scholars and scholars in
the early Qing Dynasty into the south through the study._

> _Is this actually in English? Of course we all agree that it’s made of
> English words (for the most part, anyway), but does that imply that it’s a
> passage in English? To my mind, since the above paragraph contains no
> meaning, it’s not in English; it’s just a jumble made of English
> ingredients—a random word salad, an incoherent hodgepodge._

> _In case you’re curious, here’s my version of the same passage (it took me
> hours)_

I stopped reading here for now, to avoid having his translation affect what I
am about to do.

What Hofstadter doesn't really go into is that I can still manage to extract
_some_ information from the machine translation, compared to _none_ from the
original Chinese. Not only that, but interpreting machine translations is
itself a skill. In a sense, instead of learning a second language, one learns
to translate _poorly machine-translated English_. Of course, one can still ask
whether that's a good thing or not. Here's my attempt:

> _“South study walking” is not an official position, before the Qing era this
> is just a “messenger,” generally by the then imperial intellectuals Hanlin
> to serve as._

“South study walking” is GT's best attempt at labelling an unofficial position
taken by intellectuals, comparable to being a messenger for the emperor.

> _South study in the Hanlin officials in the “select chencai only goods and
> excellent” into the value, called “South study walking.”_

It was a position only available to highly-qualified <Hanlin officials>. Quick
google search for "Hanlin": _The Hanlin Academy (Chinese: 翰林院; pinyin: Hànlín
Yuàn; literally: "Brush Wood Court"; Manchu: bithei yamun) was an academic and
administrative institution founded in the eighth-century Tang China by Emperor
Xuanzong in Chang'an._
[https://en.wikipedia.org/wiki/Hanlin_Academy](https://en.wikipedia.org/wiki/Hanlin_Academy)

.. so people from Hanlin academy, suggesting the position was administrative
in nature.

> _Because of the close to the emperor, the emperor’s decision to have a
> certain influence._

The position was close to the emperor, giving those who held it some influence
over him.

> _Yongzheng later set up “military aircraft,” the Minister of the military
> machine, full-time, although the study is still Hanlin into the value, but
> has no participation in government affairs._

"Study is still Hanlin" is likely referring to the “South study walking”
position, since we established the connection to Hanlin earlier. With that,
this reads as: Yongzheng set up a ministry of defence, which meant the
position was excluded from direct government affairs, although there was still
value in having the position.

> _Scholars in the Qing Dynasty into the value of the South study proud. Many
> scholars and scholars in the early Qing Dynasty into the south through the
> study._

Many scholars in the Qing Dynasty have taken the position of "south study
walking", and it was a prestigious position.

I'm sure this is terrible, full of errors, and even the information I
correctly inferred undoubtedly misses a lot of nuance, but again: it gives
_some_ sense of the information the original passage contains.

So here is Hofstadter's translation.

>> _The nan-shufang-xingzou (“South Study special aide”) was not an official
position, but in the early Qing Dynasty it was a special role generally filled
by whoever was the emperor’s current intellectual academician. The group of
academicians who worked in the imperial palace’s south study would choose,
among themselves, someone of great talent and good character to serve as
ghostwriter for the emperor, and always to be at the emperor’s beck and call;
that is why this role was called “South Study special aide.” The South Study
aide, being so close to the emperor, was clearly in a position to influence
the latter’s policy decisions. However, after Emperor Yongzheng established an
official military ministry with a minister and various lower positions, the
South Study aide, despite still being in the service of the emperor, no longer
played a major role in governmental decision-making. Nonetheless, Qing Dynasty
scholars were eager for the glory of working in the emperor’s south study, and
during the early part of that dynasty, quite a few famous scholars served the
emperor as South Study special aides._

Well, that definitely reads a lot better, but I wasn't that far off in terms
of the meaning of the text. And it didn't take me hours (though writing this
comment took a long time).

I absolutely agree that human translation by experts is an art, that it
produces much, much better results, and that we should not let it be devalued.
But the value in getting a quick impression, even if flawed, through machine
translation should not be undervalued either. It has a very different
application. On social platforms, for example, it is the difference between
being _completely_ out of the loop of a conversation or still somewhat
following it and being able to ask a question for clarification in a shared
language.

------
babuskov
I don't know what translation engine Bing is using, but I find that combining
Google Translate and Bing results gets me pretty close to the real thing.

I wish teams making those two could work together :(

------
EvgeniyZh
And what is "real understanding"?

------
Manishearth
The 锺书 example matches my experiences with Chinese through Google Translate.

Chinese is a language where you'll often redundantly say things because each
character may have (a) multiple meanings and (b) a lot of homophones, even
accounting for tone.

So the word for vegetable(s) is "蔬菜", which really is two different words for
"vegetable" put together.

I _suspect_ that the engine has "learned" that words can be thrown away
sometimes.

Which leads to fun cases where it tells you the _literal opposite_ of what you
said. An example is in
[https://twitter.com/ManishEarth/status/919434569446776832](https://twitter.com/ManishEarth/status/919434569446776832)
\-- it has since been fixed.

I bet what happened with 锺书 was that Translate decided to throw away 锺 in one
of the cases because it didn't know what to do with it, leaving it with 书
("book").

\-------------

Regarding the anecdote about the Danish-speaking friend, this isn't too
uncommon. For example, I natively speak Marathi but spoken/written Marathi
differ a lot and I don't have much practice reading/writing, so I make
spelling/grammar mistakes. So using Google Translate as a crutch (and then
modifying/verifying the output) is great. Though I've found that it's not
really good at non-EU languages (the EU translates all of its documents into
the languages of its members, so this forms an excellent corpus of
professional translations for Google Translate) and that my written Marathi is
often better.

\-------------

My favorite example of Google Translate limitations is what happens if you ask
it to translate "Yes." and "No." to Chinese. You get the answer 是 and 没有,
which literally translate to "am" and "don't have". (there are no conjugations
and kind of no tenses, so by "am" I mean "whatever conjugation of the English
word 'to be' makes sense in this context")

The thing is, Chinese doesn't really have words for Yes and No. 是的/不是
_sometimes_ works (I think this is more because there are implicit 是s in
verbless sentences). But the basic idea is that if you want to answer a
question, you repeat the verb. "Do you eat meat?" "Don't eat." or "Do you read
books?" "Read.". 是 and 有 are pretty common (as are "to be" and "to have" in
English), so it seems like it picked up on the most common yes/no answers. For
some reason, it picked a different one for no than it did for yes.
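
The verb-echo pattern described above can be sketched as a toy function
(deliberately simplified: real answers also depend on aspect and politeness;
the 没/不 split is the only detail encoded here):

```python
# Toy sketch of Chinese yes/no answers: echo the question's verb, adding a
# negator for "no". 有 ("have") is negated with 没; most other verbs take 不.

NEGATORS = {"有": "没"}

def answer(verb: str, affirmative: bool) -> str:
    """Answer a yes/no question by echoing (and optionally negating) its verb."""
    if affirmative:
        return verb                         # "Do you eat meat?" -> 吃 ("Eat.")
    return NEGATORS.get(verb, "不") + verb  # "Do you have...?" -> 没有

print(answer("吃", True))   # 吃 ("Eat.")
print(answer("有", False))  # 没有 ("Don't have.")
print(answer("是", False))  # 不是 ("Am not.")
```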

Now, this isn't exactly an example of the machine translation sucking, but
more of the translation not having an outlet to express ambiguity at all. The
article gives examples involving this as well, but the basic idea is that some
languages do not allow for the same kind of ambiguity that others do. You
should be able to display something about that when translating from an
ambiguous term to a specific one.

Yes/No is an example of a set of ambiguous terms in English where Chinese has
no similar ambiguous terms. In the other direction, as I already explained, 是
(or any verb!) in Chinese is an example of an ambiguous term that doesn't
translate to English, because the tense need not be fully specified (Chinese
does tense from context, and even then it may not be as fully specified as it
is in English), and English _requires_ you to handle tense.

------
ColinWright
For those who might be interested in the HN discussion of this article, it's
been submitted and discussed before:

[https://news.ycombinator.com/item?id=16287171](https://news.ycombinator.com/item?id=16287171)
(23 comments)

[https://news.ycombinator.com/item?id=16267363](https://news.ycombinator.com/item?id=16267363)
(12 comments)

There are other submissions without comments, showing that the article is of
interest:

[https://news.ycombinator.com/item?id=16285196](https://news.ycombinator.com/item?id=16285196)

[https://news.ycombinator.com/item?id=16279656](https://news.ycombinator.com/item?id=16279656)

[https://news.ycombinator.com/item?id=16265302](https://news.ycombinator.com/item?id=16265302)

[https://news.ycombinator.com/item?id=16294491](https://news.ycombinator.com/item?id=16294491)

[https://news.ycombinator.com/item?id=16296792](https://news.ycombinator.com/item?id=16296792)

~~~
dTal
1 day ago, 3 days ago... there is a serious problem with the short attention
span of HN. No sensible forum should split threads like that. Some interesting
discussions take longer than a day to play out, and there is no reason why a
thread couldn't stay alive for years, but HN actively discourages it - there
is no interface for discovery besides the novelty firehose of frontpage+new,
and voting is forbidden after a certain age (possibly replying too? I've never
tried).

~~~
ColinWright
I see this as an inevitable consequence of several deliberate design features.
It's a link-sharing site, and people get karma for popular links. The votes
serve to make the items rise on the front page, getting more votes, hence more
karma.

So people will rush to be first, and there is no serious mechanism to check
whether the same story, or even the nearly exact same link has been submitted
before.

But anything more sophisticated will be more complex, harder to understand,
more fragile, and possibly less popular. It's not hard to see several things
that _might_ make it better, but it's uncertain, more work, and guaranteed to
make the system less transparent.

<fx: shrug />

To some extent it is what it is. But FWIW, I agree.

~~~
cheschire
Reddit does a simple link comparison to see if an article has been submitted
before, which does limit double-posts; however, some people then intentionally
subvert the process by running the link through a processor like outline.com
or archive.is, or simply by finding alternate articles from other sources.

This is unfortunate as the thing I really want to see is an in-depth
conversation about a _topic_ more than an individual article.
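
The "simple link comparison" and its subversion can be made concrete with a
small canonicalization sketch (assumed heuristics - stripping tracking
parameters, "www.", and trailing slashes - not Reddit's actual rules). Note
that it still fails exactly as described when a link goes through a mirror
like outline.com, because the host itself changes:

```python
# Sketch of duplicate-link detection via URL canonicalization (assumed
# heuristics, not Reddit's real implementation).
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def canonicalize(url: str) -> str:
    """Reduce a URL to a stable key so trivial variants compare equal."""
    parts = urlparse(url.lower())
    netloc = parts.netloc
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]
    # Drop tracking parameters and sort the rest for a stable key.
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k not in TRACKING_PARAMS)
    return urlunparse(("https",                # fold http/https together
                       netloc,
                       parts.path.rstrip("/"),
                       "",
                       urlencode(query),
                       ""))                    # ignore fragments

# Superficially different submissions collapse to one key...
a = canonicalize("http://www.example.com/story/?utm_source=hn")
b = canonicalize("https://example.com/story")
print(a == b)  # True

# ...but a mirror defeats the comparison, since the host differs:
c = canonicalize("https://outline.com/https://example.com/story")
print(a == c)  # False
```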

~~~
jwilk
HN does that too.

------
87ert3746
What's going to happen is that we humans are going to become more machine-
like as we lose our appreciation of our own sophistication and subtlety, so
that most people will be satisfied with machine translation, writing, art, and
music, never having known that there are higher levels.

