When I was in Japan I did proof reading for a Japanese feature phone. A major Japanese brand, actually. That was really comical.

There was an Australian guy for English, a German guy, an Italian lady, and me for French. What they did prior to the meeting was:
* translate from Japanese to English, done by Japanese people with a poor level of English (maybe the software engineers, actually)
* translate from that weird English into the other languages, done by translators who had only the strings and absolutely no context.

In the meeting we had all the strings, and one person from the manufacturers who had access to the "super-confidential" unreleased device.

More than half of the translations were off because of lack of context. The French guy actually translated "Garbage day" to something like "Shitty day", apparently he thought that was a way to mark in your calendar that you had a really bad day.

Pretty often we had sentences like "delete one", and invariably one of us had to ask "One what? I need to know if it's masculine/feminine/neutral". Of course they hadn't prepared for that and it was too late to change the code, so they made us do ugly things like "%n item(s)".
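For what it's worth, the "%n item(s)" workaround is what plural-aware lookups are meant to avoid. Below is a minimal sketch, not the phone vendor's code: `CATALOG`, `n_items` and the rule functions are hypothetical names, and the rules mirror the commonly published gettext Plural-Forms expressions for English, French and Polish. Gender is a separate problem; a string like "delete one" still needs one variant per kind of object.

```python
# Hypothetical in-memory catalog: language -> (plural rule, message forms).
def _english(n):
    # Two forms; singular only for exactly 1.
    return 0 if n == 1 else 1

def _french(n):
    # Two forms; 0 and 1 both take the singular.
    return 0 if n <= 1 else 1

def _polish(n):
    # Three forms: 1 / 2-4 (except 12-14) / everything else.
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2

CATALOG = {
    "en": (_english, ["{n} item", "{n} items"]),
    "fr": (_french, ["{n} élément", "{n} éléments"]),
    "pl": (_polish, ["{n} element", "{n} elementy", "{n} elementów"]),
}

def n_items(lang, n):
    """Pick the message form that matches the count n in the given language."""
    rule, forms = CATALOG[lang]
    return forms[rule(n)].format(n=n)
```

With this, `n_items("pl", 22)` and `n_items("pl", 12)` pick different forms, which a single "%n item(s)" string cannot express.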

Also the Australian guy was losing faith in humanity:
- That sentence, it's completely wrong, it just doesn't mean anything in English. People will just go "WTF?" when they read that.
- We're not allowed to change the English strings, they're already validated.
- .....

I don't know why nobody seems to put information like "warning: this phone's UI in <your local language> is total and utter crap".

Anyway, what you wrote is exactly why I stick to using all software and web services - OS, text editors, Facebook, et al. - in en_US instead of my native pl_PL. Because translations are always crappy - even for big players. Lack of context is the key here - translated text often feels out of place, because there usually is some overarching idea behind it that isn't communicated to translators. Then there is the lack of consistency. Words in the original text often have some site-specific meaning, which also tends to get lost in the translation process. For example, on Facebook the word "like" refers to a well-defined thing, not the dictionary meaning, so it's totally not OK to randomly replace it with synonyms during translation [0].

I realized at some point that I often look at a crappy translation, guess what the English original was, and then in my mind translate it to what it should have been in the first place. Because for some strange reason I, the user, have the context, and the paid translation team does not. I guess I'm going to put that into my "Translation issues" file in the "Mysteries of capitalism" drawer, right next to the "how on Earth multi-million media companies can't do a movie translation that isn't total crap" file. I mean, seriously, you're better off looking for pirated subtitles even if you bought the original, because pirates at least seem to have watched the movie they're translating.


[0] - I wish more translators would use the approach Jehovah's Witnesses used when doing their own Bible translation. Since it was designed to be studied and analyzed, they preferred accuracy over aesthetics - therefore one of the translation rules was "as much as possible, let's have any given word in original text be always represented by the same word in English". Adhering to that single rule would eliminate like half of the "context missing" problems with software translations.

You know what multi-million movie has a translation that isn't total crap? Frozen. They really put resources into that. You can look up random Disney songs on YouTube in different languages, and then look up the Frozen songs, and you can sort of tell that they've done a better job even if you don't speak the language.

Even relatively obscure languages like Dutch where they usually just watch English-language movies: https://www.youtube.com/watch?v=yOueN0sV2SY

I agree. Frozen, and other Pixar/Disney/Dreamworks children's movies (like Shrek) tend to be of awesome quality in all languages. But I attribute this to the fact that those movies are not translated - they're localized, which by definition requires much more work and much closer attention.

The Latin American Spanish localization of Dreamworks's Shrek is a great example.

They brought in Eugenio Derbez, a Mexican comedian, to voice Donkey (voiced in English by Eddie Murphy). Donkey in particular speaks in colloquialisms and pop culture references with wordplay, so Derbez wrote a bunch of new lines and jokes that referenced Latin American colloquialisms and pop culture.

Children learn different fairy tales in different countries, so they also managed to change the identity of some of the characters without changing their appearances (and without altering video at all, just audio).

Exactly. I mentioned Shrek for a reason. It was (and still is) hugely popular in my country (Poland), and one of the reasons for that is the deep localization. They replaced original jokes and pop culture references with local ones.

@daxelrod: Wow, thanks for that. Quite interesting. I always wondered about how that was done and if it was a direct conversion of sorts but I guess it's not. Very interesting.

I just have to ask: How do you know this?

All of this information comes from the teacher of a Spanish class I took (we watched the Latin American version of Shrek in the class). I wish I had some more tangible sources to cite.

EDIT: http://www.imdb.com/name/nm0220240/otherworks: "Jeffrey Katzenberg and Dreamworks allowed [Eugenio Derbez] not only to dub Donkey's voice, but to translate and adapt the script of "Shrek" and "Shrek 2" to make it more appealing to Latin America"

I also remember that the Gingerbread Man was one of the characters who was altered, but I don't remember the name of the Latin American replacement.

Gingerbread Man is translated as "El Hombre de Jengibre", which was previously unknown to Spanish speaking children.

What was relocalized was the Muffin Man nursery rhyme (http://en.wikipedia.org/wiki/The_Muffin_Man), which was substituted by Pinpon's ronda song:

Pinpon is a puppet very handsome and made out of cardboard. He washes his little face with soap and water. He untangles his hair with an ivory comb. And in spite of the hair pulling he cries not nor even winces.

Ah! Thank you, you're absolutely right, I misremembered the character.

I remember the Hindi translation of Aladdin getting a bunch of critical acclaim too

Not to mention that the context of each line of dialogue is pretty close to unambiguous, since you have the source material right in front of you for a movie. I imagine it's still a huge job, but it has to be more enjoyable than translating/localizing for us asshole programmers and our magic translation strings.

That's why I say some translators seem not to bother even watching movies they're working on. Otherwise they wouldn't make such stupid mistakes.

Nitpick: Shrek is a Dreamworks Animation movie.

Thanks. Updated the comment.

I don't watch this type of movie often so I tend to bundle them together into a "Pixaresque" category in my mind ;).

If, by analogy, the visuals of a movie musical are the "backend" and the audio is the "frontend," then what these localizers do is the equivalent of completely redesigning the entire frontend. Dubbed musicals have an incredible number of constraints in terms of number of syllables, scansion, etc., so the script translators need to be given a tremendous amount of leeway and creative freedom. They're basically lyricists in their own right, in a world where everything's composed melody-first!

In software, this would translate to localization coders being able to (and having the talent to) rewrite the entire frontend logic. And if your software product is going to make multi-millions in new markets by virtue of feeling like it's translated natively, it might be worth retaining native-speaker coder(s) to maintain a branch that parallels (and consistently merges in) your master branch, and rewrites display logic as it comes in. I'd imagine the Googles of the world do exactly this.

Disney generally spend a lot of effort on the translations.

Idina Menzel, who voiced Elsa also plays the lead in the Broadway musical Wicked. Several translations of Frozen use an actress for Elsa who played the lead in a localized version of Wicked.

For Big Hero 6 we also translated and replaced all of the Japanese text in San Fransokyo with Chinese and Korean for the Chinese and Korean markets. By "text" I mean all of the CG signs, posters, and environmental set dressing in the actual movie (not just the dialogue).

We literally had to re-render the entire movie for each translation. Disney Animation takes these translations very seriously :-)

To be honest, I thought the Dutch translation was relatively awkward (I have to admit I've only heard the Dutch "Let It Go" version, not the rest of the movie). I thought the Flemish version was much nicer, despite some Flemish phrasing sounding off as Dutch...

(For those wondering: Flemish is the Belgian variant of Dutch.) And I agree, I also prefer the Flemish dubbed voices. For example, Timon & Pumbaa in The Lion King are Flemish, to great effect.

Because Kids (and Disney)

High-end kids movies are usually localized carefully, including songs, etc (there's a YT video with snippets of all the versions)

And of course they don't split the movie into phrases and make them translate one by one

> I don't know why nobody seems to put information like "warning: this phone's UI in <your local language> is total and utter crap".

This is what makes me use SW and equipment only in English.

Translations are pretty much useless (and of course, Googling the English error messages usually gives the best results)

It makes sense if all of your target demographic, or at least a large part of it, cannot read English at the level necessary to use your tool.

However, for a lot of modern tools that isn't the case. Often a translated tool is an order of magnitude less usable because of broken translations and inability to Google things.

It's also infuriating that lots of tools look at your Windows location when deciding your language. No, I don't want your broken native translation on my English Windows installation, just because I do like to still have € in front of my currency.

That's if you're lucky! Google does far worse, and picks a language based on their often broken geo IP system. It's not even fixable. When viewing alt text for Google doodles, for instance, I get "localized" text even if the rest of the UI is in English. Google Play also has jacked up section titles from time to time.

Netflix search box is the same way. This is in addition to the discriminatory practice of often not providing subtitles in the language of the audio track.

Chrome also would install in the geo IP language, regardless of system settings. How arrogant is that? They deliberately disregard your OS language settings and pick another for you. And it would force the default Google search to go to the localized version until it detected a new location. Which, in Denver, has had me appear in France and then Hungary, as their database somehow accumulates errors. And again, there is no way to fully opt out: even selecting English and .com would still show country-specific logos and such on e.g. YouTube.

Xbox is also a mess. You can buy games that are country restricted, with zero warning. When downloading, the Xbox provides no indication of a problem until it finishes. Then it does a geo check and reports "download corrupted". Xbox support wanted to RMA the unit, as they were convinced this was a hardware issue and had no KB info on country restrictions. Using a VPN fixed it.

Basically it seems that many developers are brain dead or simply do not care about travelers, expats, or anyone with different language prefs.

If you prefer using Google search untranslated, try their no-country-redirection: http://www.google.com/ncr

Don't forget monolingual managers that just don't care though. I've done so many last-minute string translations... "Oh shit, we forgot French, this was supposed to launch last week, you have 1 hour." And then insert the types of cases from the original article here where you need grammatical logic in a fixed string. Ugh.

I think Gettext on Windows still uses the locale to determine UI language, despite those two things being completely separate concepts.

At least it can be overridden with environment variables.
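The override presumably refers to the variables GNU gettext consults for the message language: `LANGUAGE`, then `LC_ALL`, `LC_MESSAGES` and `LANG`. A minimal sketch of that precedence follows; `preferred_languages` is a hypothetical helper, and it simplifies the real behaviour (GNU gettext, for instance, ignores `LANGUAGE` when the locale is "C").

```python
import os

# Precedence used for picking the message language, highest priority first.
_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")

def preferred_languages(env=None):
    """Return the language list from the first non-empty variable, else ["C"]."""
    env = os.environ if env is None else env
    for name in _VARS:
        value = env.get(name)
        if value:
            # LANGUAGE may hold a colon-separated priority list, e.g. "en_US:en".
            return value.split(":")
    return ["C"]
```

So a user with a Polish system locale can still get English messages by setting `LANGUAGE=en_US:en` before starting the program.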

I have tended to use the localised versions of the OS and software, not because I prefer them but because I often ended up supporting end users with localised versions.

> Adhering to that single rule would eliminate like half of the "context missing" problems with software translations.

Sort of. Almost all the issues I have with translators and context come from the same word being used for different things. This is particularly true for words like "date" and "time", which can have different translations depending on context (is it the time of day, or the time the test has been running?)

So as well as using the same word for the same meaning throughout, using different words for similar-yet-subtly-distinct meanings is also required.
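One way to give translators that distinction is to key each string by a context label as well as the source text, which is roughly what gettext's message contexts (`msgctxt`) do. A minimal sketch, assuming a hypothetical dictionary catalog; the context labels and the Polish entries are illustrative only.

```python
# Hypothetical catalog keyed by (context, source string).
CATALOG_PL = {
    ("time of day", "Time"): "Godzina",
    ("elapsed duration", "Time"): "Czas trwania",
}

def translate(context, msgid, catalog=CATALOG_PL):
    """Look up msgid within a context; fall back to the source string."""
    return catalog.get((context, msgid), msgid)
```

The same English word "Time" now yields two different translations, and the translator sees the context label next to each entry instead of a bare string.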

I'm pretty sure general opinion among everyone other than Jehovah's Witnesses is that their bible translation is not good, and that the same-word rule is one of the reasons why.

And I really don't see how having a similar rule for translations of text in software would eliminate most "context missing" problems. It seems what it would actually do is stop translators even guessing those missing contexts.

> And I really don't see how having a similar rule for translations of text in software would eliminate most "context missing" problems. It seems what it would actually do is stop translators even guessing those missing contexts.

Well, because for one, translations would at least be consistent. In software, words often have a specific meaning related to the application itself. You're not free to translate the word "like" on Facebook however you like because it has its own, specific meaning that is different from the dictionary one. The same applies to things like tools in Photoshop, etc. In general, wrong use of synonyms for things that have an application-specific meaning is one of the most common problems with translations I see (and the same happens in official movie translations).

When you don't have (or can't be bothered to get) context, this is the least you can do to play safe.

I have a tool that does this. I produced machine generated translations of German and French as an expedient. The German messages were corrected by a native speaker and I cleaned up the French as best I could. When French is in use an extra message is printed appealing for someone to edit the weak translations.

I dealt with word order issues by avoiding formatted strings with more than one replacement field. Only one string needs to deal with singular vs plural quantities.

Lack of context was a definite problem with the machine translations. It would inconsistently choose translated words that should have been the same because the short string snippets could not be evaluated for their meaning within the narrow domain of the program using them.
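This kind of inconsistency can be caught mechanically once the application's key terms are written down. The sketch below assumes a hypothetical glossary that maps a source term to its one approved translation; the German strings are made-up sample data, and the substring matching is deliberately naive (it would also flag "Unlike" for the term "like").

```python
def glossary_violations(pairs, glossary):
    """Return (source, target, term) triples where a glossary term occurs in
    the source string but its approved translation is missing from the target."""
    problems = []
    for source, target in pairs:
        for term, required in glossary.items():
            term_used = term.lower() in source.lower()
            translated_ok = required.lower() in target.lower()
            if term_used and not translated_ok:
                problems.append((source, target, term))
    return problems

# Made-up sample data: the second entry uses a synonym instead of the term.
SAMPLE_GLOSSARY = {"like": "gefällt mir"}
SAMPLE_PAIRS = [
    ("Like this photo", "Gefällt mir: dieses Foto"),
    ("Like this page", "Diese Seite mögen"),
]
```

Running `glossary_violations(SAMPLE_PAIRS, SAMPLE_GLOSSARY)` reports only the second pair, which is the sort of item a reviewer would then look at by hand.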

Same here. Technical text translation is normally a disaster. I truly hate web pages that switch languages based on IP location, as MSDN sometimes does. In my language the translators always get wrong the keywords that should not be translated, especially when they're adjectives. E.g. "You should use the 'new' keyword..." often gets translated as something like "You should use the word that is not old..." / "Debe usar la nueva palabra..."

To be fair this is because the code is wrong. It should use:

    printf (_("You should use the '%s' keyword"), "new");
indicating that the main string should be translated (_ == gettext) and the keyword itself should not. This also happens to generate slightly smaller binaries and fewer translations in the case where you have several keywords.

@eloisant: I work in automotive, where I deal with translations for instrument clusters for one of the largest automakers. Automotive companies are moving to what are called reconfigurables, which is basically an instrument cluster with no mechanical gauges; just a screen with gauges rendered by a 3D engine. Center stacks too.

I kid you not, the way we translate is to use Google Translate as a first pass and then the screens get reviewed by people who know the language. I guess some things slip through, and we evidently pissed off a lot of Chinese folks because of a similar flub. The folks doing the first pass don't know any other languages.

It's quite comical but also a real pain. One of the things I had to work into our code was the left-to-right vs. the right-to-left; you would think it would be just a C-style string but we have to know for text justification semantics.
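A common heuristic for that direction check is to look at the first strongly directional character of the string. The sketch below uses Python's `unicodedata.bidirectional`; `base_direction` and `justify` are hypothetical helpers, and padding by character count is only a stand-in for what a real renderer does with glyph widths.

```python
import unicodedata

def base_direction(text):
    """Guess text direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-style or Arabic-style letters
            return "rtl"
        if bidi == "L":           # Latin, Cyrillic, CJK, ...
            return "ltr"
    return "ltr"  # no strong character found, e.g. digits only

def justify(text, width):
    # Right-align RTL strings and left-align LTR ones in a fixed-width field.
    if base_direction(text) == "rtl":
        return text.rjust(width)
    return text.ljust(width)
```

Digits and spaces are skipped because they carry no strong direction, so "123 שלום" is still classified as right-to-left.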

I don't actually do translations but work on the HMI where the text is displayed. Another pain point is that we have a certain space where text needs to be displayed and everything is fitted using English but after translating to other languages, some strings are much longer than the allotted space.

I can totally imagine that. Recently I've been helping a guy with tweaking his integrated car navigation/radio/media player and I spent some time browsing through its firmware and file system. I saw the translation strings and man, they were horrible. Also half of the stuff wasn't even translated (though it didn't show anywhere on the UI).

BTW. half of the files and directories were named in Chinese which made my work very "fun", but that's another story...

> Another pain point is that we have a certain space where text needs to be displayed and everything is fitted using English but after translating to other languages, some strings are much longer than the allotted space.

As a rule of thumb, if you're using English strings to design your UI, you should account for 30% more space so other languages can fit in. Of course, sometimes this might not be enough.

I worked for a place that used German as the "placeholder text", to avoid the problem of text longer than the allotted space. German is supposedly about 1.5 times as long as English, so if you can fit the German version, you won't have any problems with other translations being longer than the space.
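Either rule of thumb is easy to turn into an automated check over the string table. A minimal sketch, assuming character count is an acceptable proxy for rendered width (it often is not with proportional fonts); `max_length` and `overflows` are hypothetical names.

```python
def max_length(english, allowance_pct=30):
    """Length budget: the English length plus a percentage allowance."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return len(english) * (100 + allowance_pct) // 100

def overflows(translations, english, allowance_pct=30):
    """Return the languages whose string exceeds the padded English length."""
    limit = max_length(english, allowance_pct)
    return [lang for lang, text in translations.items() if len(text) > limit]
```

For "Settings" the 30% budget is 10 characters, so the German "Einstellungen" (13) is flagged while the Spanish "Ajustes" is not; even a 50% allowance (12 characters) would not be enough for the German string.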

German is longer than English, but there are other languages where the difference is even more noticeable, e.g. French.

> and me for French
> The French guy actually translated "Garbage day" to something like "Shitty day"

So you translated it to "jour de merde"?

That's the other way around, the guy who did the translation without context wrote "jour de merde", and I saved them from releasing that in their phone calendar app during my one-day job paid in cash.

Well you really... saved the day!

The commenter wasn't the translator, they were the proof-reader:

> When I was in Japan I did proof reading for a Japanese feature phone.

Ah that makes more sense, thanks.

Your experience speaks to this observation:

With very few exceptions, consumer hardware companies are bad at software.

Yes, I've noticed that since forever. Has anybody ever tried to come up with an explanation?
