Hacker News
110 new languages are coming to Google Translate (blog.google)
57 points by mfiguiere 5 days ago | 24 comments





While more languages are great, compared to translation by the better LLMs Google Translate leaves much to be desired. Accurate and appropriate translation requires sensitivity to the context and purpose of the translation. While it is possible to get better translations from ChatGPT, Claude, etc. by prompting them with information about the text and requests for how it should be translated, Google Translate takes a one-size-fits-all approach, where a text can be translated in only one way.

Even when not explicitly prompted, the LLMs will adapt the translations based on the context. A simple example: Japanese uses different words to describe older and younger siblings. An older sister is ane, and a younger sister is imōto (also written imouto). When asked to translate “I was born in 2003, and my sister was born in 2002” into Japanese, both ChatGPT 4o and Claude 3.5 Sonnet correctly translate “sister” as ane; for “I was born in 2001, and my sister was born in 2002,” they both correctly use imōto. Google Translate, in contrast, uses imōto for both.

LLMs are also able to answer questions about the text and the translation. After Claude correctly translated “I was born in 2003, and my sister was born in 2002” into Japanese, I asked a follow-up: “Why did you translate ‘sister’ as ane?” Its response:

  I appreciate you asking about that translation choice. Let me explain:

  I used "姉" (ane) for "sister" because the context implies an older sister. In Japanese, there are specific terms for older and younger siblings:

  1. 姉 (ane) - older sister
  2. 妹 (imouto) - younger sister

  Since the sentence mentions that the sister was born in 2002, which is before the speaker's birth year of 2003, it indicates that the sister is older. Therefore, I chose "姉" (ane) as the appropriate term. This distinction between older and younger siblings is an important aspect of the Japanese language and reflects the significance of age relationships in Japanese culture. If the sister had been younger, or if the age relationship wasn't clear from the context, a different translation might have been more appropriate. For example:

  - For a younger sister: 妹は2002年に生まれました。(Imouto wa 2002-nen ni umaremashita.)
  - For a sister of unspecified age: 私の姉妹は2002年に生まれました。(Watashi no shimai wa 2002-nen ni umaremashita.)

  Would you like me to elaborate on any other aspects of the translation?
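The rule Claude describes above can be written down in a few lines. This is only a sketch of the decision logic. The function name `japanese_sister_term` and the fallback to 姉妹 (shimai) when the age relationship is unknown or the birth years are equal are my own assumptions, and it says nothing about how an LLM or Google Translate works internally.

```python
from typing import Optional


def japanese_sister_term(
    speaker_birth_year: Optional[int],
    sister_birth_year: Optional[int],
) -> str:
    """Pick a Japanese word for "sister" from two birth years.

    Hypothetical helper for illustration only; it is not part of any
    translation product or API.
    """
    # Without both years, the age relationship cannot be inferred.
    if speaker_birth_year is None or sister_birth_year is None:
        return "姉妹"  # shimai: age relationship unspecified
    if sister_birth_year < speaker_birth_year:
        return "姉"  # ane: older sister
    if sister_birth_year > speaker_birth_year:
        return "妹"  # imouto: younger sister
    # Same birth year: the years alone do not say who is older (assumption).
    return "姉妹"


# The two sentences from the comment above.
print(japanese_sister_term(2003, 2002))  # speaker born 2003, sister 2002
print(japanese_sister_term(2001, 2002))  # speaker born 2001, sister 2002
```

The point is that the choice depends on information elsewhere in the sentence, so a translator that looks at "sister" in isolation has to guess.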

An important thing to consider is cost: Google Translate is used by hundreds of millions of people and probably handles an enormous number of requests daily, yet it's free for personal use. LLMs are much less economical.

Here's the full list: https://support.google.com/translate/answer/15139004.

Still no Ladino, for the however many people (one?) who were hoping that they'd add it. (Ladino, for those wondering, is a Jewish language, kind of like how Yiddish is a hybrid of German and Hebrew but with Spanish instead. The challenge would be that it's sometimes written in the Latin alphabet and sometimes in the Hebrew alphabet, but maybe more people would want to start learning it if they knew it existed.)


> The challenge would be that it's sometimes written in the Latin alphabet and sometimes in the Hebrew alphabet, but maybe more people would want to start learning it if they knew it existed.)

This is true of Yiddish too, and it has probably a couple of million speakers. Wikipedia claims about 50K speakers of Ladino, about 25X the number of Manx speakers (I mention Manx as it's included in Google's new list)


[flagged]


If you mean languages of genociders, then you’re 100% wrong and that’s disgusting. If you mean languages of oppressed people who were genocided, well, lots of those languages already exist in Translate.

I would argue that Google could work to improve the quality of translation in its existing languages and make that the priority. Even for major language pairs like English to Spanish (both of which I speak fluently), the translations quickly reveal myriad errors.

If the quality of Google Translate is any indication, I have deep doubts about how good Gemini is as an LLM too.


Isn’t this like complaining about NASA getting funding when poverty on earth isn’t solved yet?

It's like complaining that the poor are getting services when NASA needs more money. There are already many translation options for major languages like English. There were no free automated translation services for some of these 110 languages, until now. I'm delighted that Google is spending money on minor languages, with little promise of making a financial return.

No, it's not. For one thing, I'm not saying Google should forsake all other exploration of translation options until it perfects translation between major languages. I'm just pointing out that this is a big quality gap that they shouldn't forget in their drive toward something flashier like over 100 new languages (otherwise a good thing, having those as options).

Secondly, I'm comparing a persistent deficiency inside one specific area of one company to a new project inside that same area of that same company. That's a lot more reasonable than whining about some random global disaster because company X or some government agency gets extra money for a project completely unrelated to said disaster. It's much more absurd to hold up one of those latter two as a point against the other.


Hopefully the quality of translations will improve; they are behind DeepL and ChatGPT.

And probably their own Gemini too!

The more languages the better. Google Translate has been very useful while abroad.

Yeah, it's good. At least some of the languages (three in particular) will see the light of day and will be studied by linguists after the languages die out in the next few years. All three use the Latin alphabet, by the way, since they only had an oral tradition. So the people who will actually be using them are people who can read and write English: most of the speakers are illiterate, some speak the dominant language of the country they reside in, and of those only a few will use this. Languages are dying at an accelerating pace: lands that are good for mining are being exploited, and climate change is putting increasing strain on the speakers' ways of life and making interaction with outsiders a necessity.

I never expected some of these languages to get added to Google Translate. I'm delighted!

And yet it's still massively behind DeepL on the languages that they support. Why is it still so bad at the top languages like Spanish?

I understand the impetus behind keeping dying minority languages alive, but it makes no sense to devote even a second to implementing Manx[1], for example, if you're not excellent at Spanish, which is second in the world after Mandarin in terms of number of native speakers[2].

[1] One of the languages they just added. Its last native speaker died in 1974, and there are currently fewer than 2,000 speakers in total, all of whom use it as a second language after English. https://en.wikipedia.org/wiki/Manx_language

[2] https://en.wikipedia.org/wiki/List_of_languages_by_number_of...


I don’t think that follows. Most users of a translation app don’t have any ability to judge the quality of the translations it produces, so the only distinguishing feature is ubiquity - if every language you can think of is in Google Translate, you might never go looking for other apps.

Plus if you want to keep the best linguists employed, you have to give them fun things to do. Hard to write papers making Spanish translations a fraction of a percent better.


> Plus if you want to keep the best linguists employed, you have to give them fun things to do. Hard to write papers making Spanish translations a fraction of a percent better.

Shipping shiny stuff is also more visible in the performance review.


> Why is it still so bad at the top languages like Spanish?

Which Spanish? I thought South American, North American, and European Spanish all have their own peculiarities.


Just Spanish translation to/from English in general, really. In my experience, the problem is not regional nuance but how often it simply falls for false friends or produces overly literal wordings that miss the original meaning.

> Manx is the Celtic language of the Isle of Man. It almost went extinct with the death of its last native speaker in 1974. But thanks to an island-wide revival movement, there are now thousands of speakers.

Wait, how do these people even know how to pronounce things? Isn't it essentially a conlang at that point?


By 1974, presumably plenty of recordings of the language could have been made, and apparently that was the case.

Disappointed that Tibetan is still not available. It's not clear how the decision of what languages to add was made.

Tibetan is part of the languages listed. Perhaps it is not released to everyone yet: the link below says "Over the next few days, you will see new languages available..."

https://support.google.com/translate/answer/15139004



