
The way you phrase that is a bit misleading. It's not that Japanese speech has any less info than most languages, it's just that you can set topics (add state to a stack, to put it in geek terms) that carry over into subsequent phrases. The correct translation of an individual sentence may thus depend on that previous context.

A simple but famous case is "Watashi wa hamburger desu". The sentence has no subject, so with no context that would get translated as "I am a hamburger", but if you fill in a previously defined subject, it could be "I [order] a hamburger", "My [favorite food] is a hamburger", etc.

To clarify further for those unfamiliar with the language, a super literal translation of "watashi wa hanbaga desu" is something like:

    (concerning/as for) myself, (it's) hamburger.

On its own, if Bob says this, it basically comes out as "I (am) (a) hamburger".

However, if Sally has just said something like "I'll have a salad...what about you Bob?" then it makes sense as Bob's order is the implied subject and it becomes "My order is hamburger." or "I'll have hamburger."
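The "state on a stack" idea above can be sketched in a few lines of Python. This is only a toy illustration, not how any real translator works: the `Conversation` class, its method names, and the English templates are all made up for the example.

```python
class Conversation:
    """Toy model of topic carry-over (hypothetical, for illustration only)."""

    def __init__(self):
        self.topics = []  # most recently set topic is last

    def push_topic(self, topic):
        # Something said earlier ("what will you order?") sets the topic.
        self.topics.append(topic)

    def translate_watashi_wa(self, noun):
        # Gloss "watashi wa <noun> desu" using the current topic, if any.
        if not self.topics:
            # No context: fall back to the literal reading.
            return f"I am a {noun}."
        return f"My {self.topics[-1]} is a {noun}."


talk = Conversation()
print(talk.translate_watashi_wa("hamburger"))  # no context yet
talk.push_topic("order")                       # Sally asks what Bob will have
print(talk.translate_watashi_wa("hamburger"))  # same sentence, new reading
```

The same input sentence produces two different outputs, which is the whole point: the translation is a function of the sentence plus whatever is on the stack.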

I know very little about linguistics but I think there are a bunch of other things that make Japanese-English difficult to translate via software as well.

There is the whole aspect of culture embedded in it. あなた could mean "you" or something like "dear/sweetie" depending on the context. There is also the question of how to translate "you" (etc) in English text to Japanese, as you have to consider politeness etc. If you are just translating a business web page it's probably safe to stick with polite forms, but if you are translating, say, the dialogue in a TV show, you want to preserve the tone of the characters.

In terms of voice recognition, Japanese seems to have a lot of homophones to me when compared to English. It may just be my imagination, but here are some I ran into recently:

舶, 錘, 頭, 摘む, 積む, 詰む, and 紡錘 are all pronounced つむ and mean completely different things. Or 六, 碌, and 録 are pronounced ろく. 上, 神, 紙, 髪, and 加味 are all pronounced かみ.

I seem to run into things like that regularly; when just hearing it spoken you need the context to figure out what they mean.

> 上, 神, 紙, 髪, and 加味 are all pronounced かみ.

This can be mostly solved by context. There are very few situations in normal speech where you'd hear "kami" and not know if they're talking about 神 (god) or 髪 (hair). Also, it's not particularly hard to code that knowledge. E.g. try かみにいのる (pray to god) and かみのけをきった (cut hair) on Google Translate. It will suggest the correct kanji in both cases.
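To make "coding that knowledge" concrete, here is a deliberately naive sketch in Python. It is an assumption-laden toy, not what Google Translate or any real input method does: the `CANDIDATES` cue table and the `guess_kami` function are invented for this example, and a real system would use a statistical language model rather than hand-picked cue strings.

```python
from typing import Optional

# Hypothetical cue table: kanji candidates for かみ, each with a few
# kana substrings that tend to appear nearby when that reading is meant.
CANDIDATES = {
    "神": ["いのる"],          # god: "pray"
    "髪": ["のけ", "きった"],  # hair: かみのけ, "cut"
    "紙": ["かく", "おる"],    # paper: "write", "fold"
}


def guess_kami(sentence: str) -> Optional[str]:
    """Pick the kanji whose cues match the sentence most often."""
    scores = {
        kanji: sum(cue in sentence for cue in cues)
        for kanji, cues in CANDIDATES.items()
    }
    best = max(scores, key=scores.get)
    # With no cue at all, the context does not decide, so return None.
    return best if scores[best] > 0 else None


print(guess_kami("かみにいのる"))      # 神
print(guess_kami("かみのけをきった"))  # 髪
print(guess_kami("かみ"))              # None
```

The last call shows the limit of the approach: with no surrounding words there is nothing to disambiguate with.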

Anyway, I'm not a native Japanese speaker, but I find the whole homophone thing a bit overrated. As far as I can recall, the only pairs of homophones that cause trouble in normal speech are 科学/化学 (both pronounced kagaku, meaning science/chemistry) and 私立/市立 (shiritsu, private/municipal).

Thanks for the reply.

> This can be mostly solved by context.

Right, as I said. It's not too bad, but it's easier when you can just translate word for word.

かみにいのる gives me "pray to bite" on Google translate; as you say, it suggests the right kanji...but that's precisely my point. It needs you to disambiguate for it to be sure.

I'm not saying this is an insurmountable problem, I'm contrasting the difficulty.

> There are very few situations in normal speech where you'd hear "kami" and not know if they're talking about 神 (god) or 髪 (hair).

I ran into it recently in music. Babymetal has a song that starts:


When you listen to the song, it'd be easy to momentarily think she might be saying "black god" or "black paper" since while the pronunciation wouldn't be identical, it's pretty close. Since I'm human, I figured out pretty quickly what she is saying...but in the equivalent English phrase there's no issue there...it's "black hair" or "black paper".

This is admittedly not "normal speech", but I could see it popping up there too.

I've seen confusion over 神/髪 in other situations too. Those were deliberate puns, so they probably don't count, but they demonstrate it's possible to have situations where it's at least somewhat ambiguous.

> I find the whole homophone thing a bit overrated

I'm sure it's exaggerated to me because my Japanese is pretty atrocious, but I think my point is valid: any time you have homophones in a language it makes things more difficult to set up a system that listens to speech and translates. Japanese seems to have more homophones than English, and if that's true it is proportionally more difficult to translate in that regard.

Also, from what I understand, certain homophones are differentiated in practice by differing pitch accent (raising/lowering) in speech. This is, however, region specific.

> I'm not saying this is an insurmountable problem, I'm contrasting the difficulty.

Fair enough. I'm not claiming there's no homophone ambiguity either, just that it's a relatively easier problem compared to, say, the stuff Microsoft is doing.

Yeah, when I say "normal speech" I don't include pop music lyrics.

As someone already pointed out, you should have written the translation as "I - hamburger", not "I hamburger." This implies a pause in speech. I am a native Russian speaker, and Russian also has this case where the meaning of a sentence has to be deduced from the context of the previous sentence. But in written Russian you could look at that sentence alone and understand that someone is likely replying to something. I don't practice my Russian daily, as my day-to-day communication is only in English. I tend to forget words when I communicate with my Russian friends via email, so I use Google Translate a lot to translate from English to Russian. I actually find that Google is pretty good at translating the formal sentence structure you'd see in literature and absolutely abysmal at everything else.

Japanese has noticeably fewer phonemes than most languages (IIRC something like 21 compared to a "normal" 24-28) so it makes sense that there are more homophones. One interesting effect is that puns and innuendo are easier in Japanese. Of course it's easy enough to disambiguate in normal conversation.

This is not a unique feature of Japanese. You can do exactly the same in Russian using a dash (which translates to a short pause in speech): "я — гамбургер" (I — hamburger). In general "X — Y" means "X is Y" or another relationship between X and Y as indicated by preceding context.
