
Building a language translator from scratch with deep learning - saip
https://blog.floydhub.com/language-translator/
======
jeffreyrogers
This is very cool. One thing I wonder about though is whether small companies
will be able to compete with large ones like Google in ML in the future. One
reason Google's translator is better is because they have way more data. In
the past they digitized tons of books so they have an excellent dataset that
has been translated by professional, human translators. This data collection
is effectively cross-subsidized by Google's primary business: advertising.

Since most competitors to Google offerings aren't going to have a hugely
profitable core business with which to fund all the data collection and
normalization that goes into building a high quality ML system, the future for
poorly capitalized competitors seems bleak to me. This seems to
support some of the growing rumblings about enforcing antitrust laws against
the large tech companies.

Edit: better, not bigger.

~~~
l9k
DeepL had a lot of good press when it came out last year. Some said it was
better than Google.

[https://www.deepl.com/en/translator](https://www.deepl.com/en/translator)

~~~
akie
Wow, thank you for mentioning that. I cannot believe how good the translations
are! My native tongue is Dutch and I threw in some (long!) English, French and
German texts and honestly, they read like they were written by a native
speaker. Hugely impressive.

------
pixelHD
The transformer paper was quite influential in the machine translation space.
This resource [0], posted here a while back, is a good place to learn and get
a better idea of how it works.

[0]:
[http://nlp.seas.harvard.edu/2018/04/03/attention.html](http://nlp.seas.harvard.edu/2018/04/03/attention.html)

~~~
lucidrains
one of the best visual tutorials on the transformer I came across
[http://jalammar.github.io/illustrated-transformer/](http://jalammar.github.io/illustrated-transformer/)

~~~
pixelHD
Wow, that does look really good! Thanks!

~~~
lucidrains
you're welcome! :)

------
i_made_a_booboo
Machine translation has made some pretty impressive progress over the last
decade. Unfortunately no methods will ever cover the very last mile as
languages don't have perfect 1 to 1 mappings. Though it is amusing watching
the machines try.

~~~
jchw
If we ever do solve the last mile, it would probably be one of the less
interesting consequences, as it would probably imply we've built an algorithm
capable of learning and thinking to a similar degree as a human.

To that, though, I'm definitely not holding my breath :)

~~~
i_made_a_booboo
The last mile isn't solvable. Some languages contain concepts, set phrases,
vocabulary and pop-culture references entirely unique to that language. There
isn't a translation in every single case. Machines however will always try to
come up with one and the results are amusing.

Also people make the assumption that as soon as we make strong AI comparable
to a human we will be able to translate anything and everything (let's say we
are excluding the last mile for argument's sake). That assumption ignores an
important fact that sometimes translation is a team effort where certain
words, phrases or concepts are debated among multiple translators to reach a
consensus. It's not always done by a single intelligence.

Some people might argue that's because people have far more limited capacity
to consider all the examples in the corpus whereas a machine can consider all
of them lightning fast and thus can arrive at the right answer.

A perfect edge case that illustrates why that doesn't matter and where
multiple human intelligences will often grapple with how something should be
translated would be what name to give to a movie you are translating to an
international audience. The same movie often has quite different names
depending on which language it gets translated into. There isn't actually a
correct answer; there are just answers that are deemed 'good enough'.

~~~
jchw
You know, though, machine translators have long been able to make subjective
choices in translations. We deem them correct because a human can verify that
the translation carries roughly the same intent, meaning, tone, etc. Not
because it matches exactly what a human says.

Secondly, you are conflating concepts in my opinion. Localizing a movie may
involve translators translating lines, but it also involves the creative work
of localizing the title and other things, as you mentioned. A machine
translator by today's definition translates a string of text in one language
to a string of text in another. We needn't consider every type of work a human
translator might do; it would be quite enough of a difference to close the gap
on translating strings straightforwardly.

~~~
i_made_a_booboo
This presumes you can translate all strings straightforwardly. You can't.
There are times where I've been given a string and had to have an in-depth 30
minute discussion to understand enough of the surrounding context to be able
to spit out a result. In certain cases no mapping exists.

Also, anyone who is able to verify that a translation conveys a meaning in
enough of the same direction as the original utterance by definition doesn't
need a translation as they know both the source and target language.

It's everybody else, who is not able to verify, for whom the accuracy matters,
for they have no recourse but to trust it. They are frequently led astray.

A couple of examples to illustrate.

掘り炬燵 (horigotatsu) is a noun referring to a "low, covered table placed over a
hole in the floor of a Japanese-style room".

Now, given this is something that doesn't exist in any Western,
English-speaking country, it simply doesn't have a mapping in English. The best that
can be done is to give an explanation of what it is.

Google translate "translates" it as "digging". Welcome to the last mile. In
this case Google should just spit out an explanation of what it is. Digging is
entirely incorrect and unhelpful.

But it gets worse. Imagine if it's used in a sentence. Here is a good example
of a last mile issue in translation. It's impossible for you to translate it
directly, so you have to fall back to a best effort attempt and either
simplify and lose some information or stop mid-sentence and give an
explanation of what the thing actually is.

掘り炬燵に座ってご飯を食べてた。

This sentence is all kinds of problematic from a translation point of view.

Google translates it as: "I sat on a digging stone and ate rice."

That borders on D+/C- in terms of quality for me. But there are a few good
reasons as to why.

The original Japanese doesn't give the context of who is performing the action
because that's simply not necessary to say in Japanese. It's almost always just
inferred from context in the moment, and that gets lost when you only have a
string. Thus it's possible this could be a "he, she, it, we, I, they". If the
machine is forced to pick one option then it will pick one option.

Then there is the horigotatsu part which gets "translated" as "digging stone".
What the hell is a digging stone? It ought to just say horigotatsu* and have a
footnote. Machine translation today doesn't do footnotes. I wish it did.
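
A minimal sketch of what the footnote idea could look like, assuming a
hand-maintained glossary of terms that have no direct English equivalent. The
`GLOSSARY` dict, the `add_footnotes` function and the placeholder-style
template are all hypothetical; no real translation API is being called here.

```python
# Hypothetical glossary of terms with no direct English equivalent.
# Keys are source terms; values are (romanization, explanation).
GLOSSARY = {
    "掘り炬燵": (
        "horigotatsu",
        "low, covered table placed over a hole in the floor of a "
        "Japanese-style room",
    ),
}


def add_footnotes(source, template):
    """Keep glossary terms romanized in the output and footnote them.

    `template` stands in for the output of a real translator that left
    each glossary term as a named placeholder such as {horigotatsu}.
    """
    marked = {}
    footnotes = []
    for term, (romaji, gloss) in GLOSSARY.items():
        if term in source:
            mark = "*" * (len(footnotes) + 1)  # *, **, *** ...
            marked[romaji] = romaji + mark
            footnotes.append(f"{mark} {romaji}: {gloss}")
    return template.format(**marked), footnotes


text, notes = add_footnotes(
    "掘り炬燵に座ってご飯を食べてた。",
    "We sat in the {horigotatsu} and had dinner.",
)
print(text)
for note in notes:
    print(note)
```

The hard part (deciding which terms belong in the glossary and getting the
translator to leave them alone) is assumed away; the sketch only shows the
output shape.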

Again there is a lack of context as to the meaning of ご飯 (gohan) which
technically can mean cooked white rice but in this case most likely refers to
a "meal". Though which meal is not specified and it could be breakfast, lunch
or dinner but I'm going to guess it's dinner.

But what should the translation actually be? Is it even fundamentally
"translatable"?

One valid translation would be "we sat in the horigotatsu and had dinner".
That still requires an explanation.

Anyway, I hope it's a little clearer what I mean when I say it's not actually
always possible to translate things.

I think we can hit parity with humans one day, but it requires fundamentally
rethinking certain things at a UX level. For instance, instead of just an
input form, Google Translate could be more like a chatbot that probes for more
context when needed. That's more my idea of where things need to ultimately
wind up. Perhaps a model like Rap Genius, where annotations contain extra
details around possible alternatives and why the current word was chosen.
This is my 2 cents on the issue.
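
A rough sketch of the annotation idea as a data structure, assuming the
translator can report its alternatives for each ambiguous span. The
`Annotation` and `AnnotatedTranslation` classes and the `open_questions`
method are hypothetical names made up for illustration, and the example
values are filled in by hand from the sentence discussed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    source: str                 # span in the source text ("" if only implied)
    chosen: str                 # what the translator went with
    alternatives: List[str] = field(default_factory=list)
    reason: str = ""            # why `chosen` was picked


@dataclass
class AnnotatedTranslation:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def open_questions(self):
        """One clarifying question per span that has other candidates."""
        return [
            f"{a.source or '(omitted)'}: {' / '.join([a.chosen] + a.alternatives)}?"
            for a in self.annotations
            if a.alternatives
        ]


result = AnnotatedTranslation(
    text="We sat in the horigotatsu and had dinner.",
    annotations=[
        Annotation("", "we", ["I", "he", "she", "it", "they"],
                   "subject is not stated in the Japanese"),
        Annotation("ご飯", "dinner", ["rice", "breakfast", "lunch"],
                   "most likely a meal; which one is a guess"),
    ],
)
for question in result.open_questions():
    print(question)
```

A chatbot-style UI could ask these questions before committing to a
translation, while a Rap Genius-style UI could show the same annotations on
hover.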

~~~
jchw
No, I am not presuming every sentence has a straightforward translation, just
suggesting that a meaningful measure for the "last mile" of machine
translation would be reaching human parity at that specifically.

Being able to provide additional context would be great, but I don't see why
it would have to be done in a "human" way to satisfy the constraints.

------
psergeant
The grammar correction in Google Translate is a little too good. I was trying
to create some broken Russian phrases to send a Russian friend, but I’d put in
weird or bad English as an input and get very good Russian as an output!

------
skookumchuck
I find that Google translator does very well when the text to be translated
has no spelling errors and is grammatically correct. Add any errors, and it
falls to pieces, even though a human reader doesn't have any issues with it.

