
Is i18n deteriorating at a global scale?

I’m seeing more and more nonsense machine translations, and systemic errors in manual translations, in my language over the past few years. Once I saw a story about a corporation closing a report of “should” and “should not” being swapped in documentation as a non-reproducible issue; other times I see semi-sensical expressions where a single English phrase has several possible translations (a 1:n relationship) that require context to choose between, but they seem to be thrown around at random, probably as a best effort from translators.

e.g. [“X had occurred”, “Please do X now”, “Use this to do X”, “Choose which you want for X”, “Do X for this event”]

Microsoft used to be great in this regard in the 2000s, but now it feels like I’m back in the days when gcc was telling me “$DIRNAME am directory entering”




I don't feel that it's deteriorating as much as that tech in general never properly understood how to do good localisation.

There was a time when many apps I downloaded were apparently machine-translated in a very bad way, so badly that it was almost impossible to understand what was meant. That hasn't happened to me in a while now, which means that I either download higher-quality apps, Google has stopped pushing the autotranslation feature, or people have naturally migrated away from it...

In general, I feel that tech, maybe due to being so overwhelmingly from the US, has very poor support for things like multilingualism, which is in fact more common than not across the world (the US traditionally being an outlier, and even there, I think the influence of Spanish is growing).

For example:

- On some streaming/movie purchasing services, it can be hard to get a movie in the original version and not a localised one

- It's impossible on Android to have different apps use different languages (unless the app itself allows for it), which would not only fix the issue mentioned above with the badly translated apps, but also be really helpful e.g. for language learners

- It took Google Maps years to add a feature where, if you start typing a street name and it suggests a street, it gives you the option of filling in the street number directly too (e.g., I type "Foob", it suggests "Foobarstraße, 11111 Berlin", and I can type the street number before the comma). My hypothesis for why this took so long is that people from the US were totally oblivious to the need for it: in the US, the street number comes before the street name, so people could just type "123 Foob" and get the suggestion for the full address

- There is simply no way in the Play Store (and I believe the App Store is similar?) to see reviews in a language other than your store's. This makes no sense to me: for many apps there are very few, if any, German reviews, but I'd still like to see English ones. I think it's even worse for app developers, although maybe they have some separate way of seeing that? Amazon doesn't have that problem, btw.

- Also, a pet peeve of mine: using country flags for languages. Yeah, nope.

and so on ...


> I don't feel that it's deteriorating as much as that tech in general never properly understood how to do good localisation.

Absolutely not. I saw the i18n/l10n process in practice when I was contributing code to KDE. It was incredibly thorough and well thought-out (I actually learnt most of what I know about i18n/l10n at that time). Not just translating strings verbatim, but stuff like different languages having different plural forms (so you might need to translate "users" differently depending on if you're talking about 2 users or 3 users).
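The plural-forms point above can be sketched in code. This is a minimal illustration modeled on GNU gettext's `ngettext` idea (KDE's own i18n API works along similar lines); the Polish catalog and function names here are hypothetical, but the three-form selection rule is the standard one for Polish:

```python
# Polish has three plural forms; which one applies depends on the
# number in a non-trivial way. A flat "singular/plural" catalog
# cannot express this, which is why i18n frameworks ship per-language
# plural rules.
def polish_plural_index(n: int) -> int:
    if n == 1:
        return 0                                      # "1 plik"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return 1                                      # "2 pliki"
    return 2                                          # "5 plików"

# Hypothetical message catalog for the string "%d file(s)":
PL_FORMS = ["%d plik", "%d pliki", "%d plików"]

def ngettext_pl(n: int) -> str:
    return PL_FORMS[polish_plural_index(n)] % n
```

Note how 22 takes the second form ("22 pliki") while 12 takes the third ("12 plików"); a naive "n == 1 ? singular : plural" check gets both wrong.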

This is not rocket science. We know how to do it. It's just that most businesses don't give a shit. English gets you far enough in terms of adoption in most markets that you don't really have to care about l10n unless you have to fulfil legal requirements. (Also, some markets have bad rates of English literacy, e.g. China, but those are usually served by local app providers.)

> On some streaming/movie purchasing services, it can be hard to get a movie in the original version and not a localised one

Where I live (Germany), this has gotten way better over the years. Around ten years ago, cinema chains started offering screenings in original language (i.e. English) for the more popular movies. And cable TV started showing shows undubbed as well. The first thing I can remember there was Game of Thrones airing undubbed on the same day as the US release. I think the major reason was that piracy sites allowed users to access undubbed content easily. If you have the choice of watching the new GoT episode right now or waiting a year for the dub, most people are going to go with piracy. TV/cinema execs saw this and realized that there was a market to tap into.

Having access to undubbed content was actually quite eye-opening to me. Having only had contact with dubs up until that point, I only then realized how eye-wateringly shitty German dubs are. It appears to me like German dubbers don't really consider themselves voice actors (emphasis on the "actor" part). Sometimes it's like they think they're reading a newscast when it's actually an action scene.


You're not contradicting me. It's not that we don't have the technology to do proper localisation; it's that the tech industry is oblivious to the needs of non-English speakers, and in particular multilingual users. There is this assumption that 1 person = 1 language, which is just wrong in many parts of the world.


My disagreement is that, in the phrase

> tech in general never properly understood how to do good localisation

you're using "understood" when it should actually be "cared about" which is substantially different. Also,

> the tech industry is oblivious to the needs of non-English speakers and in particular multilingual users

I'm part of the tech industry and at the same time a non-English speaker and a multilingual user, and I'm not oblivious to my own needs. The problem is not that tech people don't understand, it's that business decisions don't take multilingual users into account.

This differentiation is important. Rephrased like this, it becomes apparent that this is a matter of policy, not literacy. It becomes possible to imagine (though I'm not arguing this) a scenario in which apps with a sufficient number of users could be required by law to accommodate multilingual users.


> I don't feel that it's deteriorating as much as that tech in general never properly understood how to do good localisation.

Yes and no. The majority of my experience with localization is via WordPress. Frankly, it's a PITA. Another thing to know, and so on. It's not an extension so to speak, it's another mountain to climb, another silo to wrestle with.

Furthermore, and yes this is unique to WP, it makes no effort to leverage its scale. Certainly, once (e.g.) 'Add to Cart' has been translated and vetted, it doesn't need to be done again. You should be able to submit your language file, have it parsed, and get it back fleshed out as much as possible. Then you need only focus on the bits that didn't find a match.
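The "submit your language file and get it back fleshed out" idea amounts to pre-filling from a shared translation memory. A minimal sketch, with entirely hypothetical names and data (real tooling would operate on .po/.mo files):

```python
# Hypothetical shared pool of already-vetted translations,
# keyed by (language, source string).
VETTED_MEMORY = {
    ("de", "Add to Cart"): "In den Warenkorb",
    ("de", "Checkout"): "Zur Kasse",
}

def prefill(lang, catalog):
    """catalog maps source string -> translation ('' = untranslated).
    Fills in vetted matches; returns the catalog plus the list of
    strings a human translator still has to handle."""
    remaining = []
    for source, translation in catalog.items():
        if not translation:
            hit = VETTED_MEMORY.get((lang, source))
            if hit:
                catalog[source] = hit
            else:
                remaining.append(source)
    return catalog, remaining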

Yes, some increase in understanding is in order. But an update and upgrade of the tool(s) is overdue as well.


Even trying to use popular software in a relatively big language like Spanish still inevitably ends up producing lots of untranslated strings (or worse, semi-translated nonsense).

I always set my software to en-us, even though that mightn't be my preferred language or dialect, because it's the only way I can be sure the developers actually checked it.


I've seen that too and I simply don't understand that. Why do some people half-translate an app? Did they just hardcode some of the copy by accident?


They translated it once years ago, likely using external resources. My team has had this issue; we made internationalized versions of our website once, years ago, and it was quite expensive and used external language contractors. Then the i18n versions gradually fell out of date as we continually updated the English version (because that's what we full-time devs speak). Eventually, years later, the i18n versions were hopelessly out of date and turned down, because it wasn't worth paying for another round of internationalization on them given how little it turns out they were used.


But if you had put yourselves in a position to accept community contributed translations, this would not have happened, right?


How is a random static marketing website for a corporation supposed to put itself into a position where it accepts community translations? How would you vet said translations before going live with them (which would be mandatory) given that you don't speak the language?

And who in the world would volunteer their labor for free to do said translations?


It was intended to be a strange mixture of snark, cynicism and sarcasm connecting your described situation with what Google is doing, with the hope of illuminating the different ways people might think about their own work.

In your case, describing it as "a random static marketing website for a corporation" more or less shuts down the discussion.

In Google's case, while they probably don't see YT in those same terms, the convergence of the approach towards i18n suggests that maybe they're a little closer to it than they were.


The context of this particular thread, though, is about why the user interface in an app might not be translated well. I responded by way of example explaining how translations can fall behind in a very similar situation, that of a website.

Neither of these are the same as the YouTube situation, because in the YouTube situation you're dealing with user-contributed content being translated, which is a crucial difference. Community contributions were never about translating the YouTube UI (or god forbid privacy policy, ToS, etc.); they were only ever about users submitting translations of other users' content.


Maybe they fully translated an earlier version of the app but more strings were added in subsequent versions and not translated. When there's no translation available for a string, it falls back to English.
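That fallback behaviour can be shown in a few lines. A minimal sketch, with hypothetical in-memory catalogs (real apps load compiled catalogs and longer locale chains):

```python
# Older strings got translated; newer ones like "Export" never did.
CATALOGS = {
    "de_DE": {"Save": "Speichern"},
}

def localize(msgid: str, locale: str) -> str:
    # Try the full locale, then the bare language code,
    # then fall back to the English source string itself.
    for candidate in (locale, locale.split("_")[0]):
        translation = CATALOGS.get(candidate, {}).get(msgid)
        if translation:
            return translation
    return msgid
```

This is exactly how a half-translated UI arises: every string added after the last translation pass silently falls through to the English `msgid`.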


Because either you have to pay some native a decent money to make a translation or because some strings are embedded in code?


Google Translate used to have instant translation from a photo for more languages, and then they turned them off, with no reason given.

Also, you get tremendously worse translations when you translate a document (pdf, doc) than when you use the "scan/import an image of text" function.

It's a real bummer, because hypertext and mobile UIs should be excellent media for presenting multiple candidate translations and letting the reader indicate the best one.


Auto translation is probably a big deal for YT/Google long-term. If they maintain/achieve dominance in this field, they're practically the only ones with detailed insight and analysis of video content. Fostering an alternative market would be a negative and would cause more uproar as soon as they expand their auto-translation services.


I don't see auto translation being anything else but terrible, for many many years still. Possibly forever. I just don't see a machine being able to handle the culture, tradition or customs that are so ingrained in language.

English-to-Danish translations are universally awful. Barely comprehensible gibberish(¤). So I use English operating systems with the locale set to en_DK, so dates display as D-M-Y like God intended, but a surprisingly large amount of software somehow thinks it knows better and displays its UI in Danish anyway, so I get to have a brain aneurysm while trying to parse their Danish translation for "anisotropic filtering".

(¤) Pre-emptive snarky comment: "Exactly like spoken Danish, LOL".
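The en_DK complaint is really about software conflating two separate locale settings: the date/number format (LC_TIME) and the UI language (LC_MESSAGES). A minimal sketch of respecting them independently; the pattern table here is hypothetical (real code would read formats from the OS locale database):

```python
from datetime import date

# Hypothetical per-locale date patterns. en_DK is the hybrid locale
# described above: English language, Danish (D-M-Y) date format.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",   # M-D-Y
    "en_DK": "%d-%m-%Y",   # D-M-Y
}

def format_date(d: date, lc_time: str) -> str:
    # Fall back to ISO 8601 for unknown locales.
    return d.strftime(DATE_PATTERNS.get(lc_time, "%Y-%m-%d"))
```

Software that keys its UI language off the country part of "en_DK" (and so serves Danish text) is making exactly the mistake this separation exists to avoid.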


I don't feel this way at all. Videos have automatic subtitles which you can automatically translate into the language of your choice, speech recognition is so good that the right tool will let you program with it, text to speech is a button-click away for basically any Web page (all I need is an extension). Post-processing for color blindness is amazing, left-to-right languages render readably on a consistent basis. OCR is progressing dramatically and we're starting to see projects focused at individual users, and automatic image tagging gives textual descriptions of a huge amount of picture content.

We're at a point where a lot of these tools haven't matured in their consumer implementations, but that's coming. It's just a matter of time.

That's all ignoring the soft accessibility of things like iPads that have made computing accessible to Grandma.


Automatic subtitles for videos in a different language are basically a joke currently.

I agree that we're progressing fast, but fully automated machine translation is IMHO still lightyears away (if at all feasible). And to automate subtitle generation in a foreign language, you first need to have speech to text, which is also still error-prone, so now you have two sources of errors.

We're seeing the uncanny valley problem: by now, things like machine translation are so good for simple use cases that they're being aggressively pushed, and at first the output may even appear correct, as if it were done by a human, but then suddenly the translation becomes nonsensical and weird. Even for the well-received DeepL, it's still surprisingly easy to find text that it really struggles with.

Incidentally, I remember attending a lecture about 12 years ago by the then-new professor of NLP, who was talking about his success with machine-aided human translation of subtitles from Swedish into Norwegian. Granted, a lot may have improved in 12 years, but it still struck me as impressive that even for languages that closely related, the best they could hope for in a research project was machine-aided translation.


Machine translation can never replace real translators, unless we develop an AI with actual understanding.

Even with human-translated texts it's usually noticeable when the translator didn't understand the subject. To make sense of the translated text you then have to try to reverse-engineer the translator's mapping to figure out what the text would have said in the original.

Much like how you can't properly parse HTML using only regular expressions and string substitution, you can't truly translate human languages without understanding. You have to parse the input language, process the meaning of what was said and finally serialize to the target language.


Subtitling adds even more issues that machine translation simply can't handle, because like a good book translation, it's an artform.

Making good subtitles means you prioritize readability over accuracy. You have a limited amount of space for your text, and you want to keep a low characters per second, so you cut words, ruthlessly. But you have to choose which words to cut so that it still makes sense, which means that you have to identify filler words so you can cut them, or figure out ways to re-phrase something into a shorter sentence.
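The reading-speed constraint above is usually expressed as characters per second (CPS). A minimal sketch of the check subtitlers apply; the 17 CPS threshold is a commonly cited guideline for adult subtitles, not a universal standard:

```python
MAX_CPS = 17.0  # commonly cited reading-speed guideline, not a hard rule

def cps(text: str, start_s: float, end_s: float) -> float:
    """Characters per second for a subtitle shown from start_s to end_s."""
    return len(text) / (end_s - start_s)

def too_fast(text: str, start_s: float, end_s: float) -> bool:
    """True when the line needs trimming or a longer on-screen time."""
    return cps(text, start_s, end_s) > MAX_CPS
```

When a line trips this check, the subtitler either extends its display time or starts cutting words, which is where the judgment about filler versus meaning comes in.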

You probably also want to preserve the tone and style of the dialogue, which means you have to choose the right synonyms, not just the most common ones.

And if you're creating hearing-impaired subtitles, it becomes even more necessary to understand what's going on in the video. If someone slams a door center-screen, you can cut that from the subtitles if you have more important things to display, but if someone slams a door off-screen, you absolutely have to include it in the subtitles, because that's the kind of information a hearing-impaired person needs.

Good luck training your little machine-learning network to identify which sound effects originate from objects on-screen and which originate off-screen...


I agree in the general sense. The problem is that good human translation works as follows: The translator reads the text, decodes this into some mental representation, and then encodes that representation in the target language. Both decoding and encoding are also highly subjective (which is why works of literature can be translated in many different ways, see e.g. all the translations of works like the Bible, the Odyssey, etc.).

Machine translation still works by a straightforward source-to-target mapping. This assumes that there is somehow a 1:1 correspondence between concepts in one language and concepts in the other one.

There are some cases where this can yield OK results: when the languages are very closely related and/or if the material is very technical (e.g. instruction manuals), because in such cases, the concepts do tend to align a bit better.

But in general, I think the problem is intractable without solving general AI.


> left-to-right languages render readably on a consistent basis

‫os epoh dluohs I.



