
I can see several problems with this.

1. The Japanese subtitles are completely different from what is said in the Japanese dub. Subtitles are optimized for readability and use completely different phrasing, or even different terms (for example, the dub says クイーン ("queen"), while the sub says 女王 ("jo-oh")).

2. The Japanese subtitles and the English subtitles are also different, because many lines have to be completely rephrased in order to express them in a linguistically very distant language.

3. The furigana (pronunciation hints) in these screenshots are not correct. "困った" is "komatta", not "koma ta". "妻は" is "tsuma wa", not "tsuma ha". If only they used ichi.moe's engine... (disclosure: I'm the author of ichi.moe).

The way the "koma" is placed above 困っ makes me suspect they first tokenize the text, get furigana for each segment (most tokenizers have this built-in) and then derive the romaji without correctly handling the sokuon っ when it appears just before a segmentation boundary. Usually it's represented in the romanization by doubling the following consonant, but of course that doesn't work if there's no following consonant because each token is treated as a separate string. They do get it right for "erikku" where there's no segmentation boundary.
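A minimal sketch of the suspected failure mode, not the app's actual code. It assumes the tokenizer splits 困った into the readings こまっ and た, and it uses a tiny hypothetical kana table (`KANA`) that covers only the examples in this thread, written in hiragana for simplicity. Romanizing each token separately loses the sokuon; romanizing the joined reading keeps it.

```python
# Hypothetical mini table: only the kana needed for the examples in this thread.
KANA = {
    "こ": "ko", "ま": "ma", "た": "ta",
    "え": "e", "り": "ri", "く": "ku",
}
SOKUON = "っ"


def romanize(kana: str) -> str:
    """Romanize a hiragana string, doubling the consonant after a sokuon."""
    out = []
    geminate = False
    for ch in kana:
        if ch == SOKUON:
            geminate = True  # remember to double the next consonant
            continue
        syllable = KANA[ch]
        if geminate:
            syllable = syllable[0] + syllable  # "ta" -> "tta"
            geminate = False
        out.append(syllable)
    # A sokuon at the very end of the string has nothing to double,
    # so it is silently dropped: this is the suspected bug.
    return "".join(out)


def per_token(readings):
    """Romanize each token in isolation (the suspected buggy approach)."""
    return " ".join(romanize(r) for r in readings)


def whole_phrase(readings):
    """Join the readings first so the sokuon can see the next consonant."""
    return romanize("".join(readings))


readings = ["こまっ", "た"]      # assumed segmentation of 困った
print(per_token(readings))       # koma ta
print(whole_phrase(readings))    # komatta
print(romanize("えりっく"))      # erikku (no boundary after the sokuon)
```

A real fix would probably carry the pending sokuon across the token boundary rather than joining everything, so that word spacing can still be preserved where it is wanted.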

If they're having trouble with correct romaji transliteration (and even if not, to be honest), I don't see why they don't just show the furigana in kana. The source data of tokeniser dictionaries such as NAIST JDIC isn't even in romaji, so if they are using a proper tokeniser, they're actually doing an extra step and throwing away data to transform it into romaji form.
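To illustrate how little work kana furigana would take, here is a rough sketch assuming the dictionary readings arrive in katakana (an assumption about the dictionary format, not something confirmed for this app). The function name `katakana_to_hiragana` is made up for the example. Since the katakana and hiragana Unicode blocks are laid out in parallel, the conversion is a fixed code point offset and nothing is lost on the way.

```python
def katakana_to_hiragana(reading: str) -> str:
    """Convert a katakana reading to hiragana, leaving other characters alone."""
    # Katakana ァ (U+30A1) through ヶ (U+30F6) sit exactly 0x60 above
    # hiragana ぁ (U+3041) through ゖ (U+3096). The long vowel mark ー
    # (U+30FC) is outside that range and is passed through unchanged.
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in reading
    )


print(katakana_to_hiragana("コマッ"))  # こまっ
print(katakana_to_hiragana("ツマ"))    # つま
```

The sokuon survives intact in this form, so the segmentation boundary problem does not arise at all.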

The feature is probably targeted at absolute beginners who don't even know kana yet. The screenshot of the settings does show an option to select a different transliteration, but romaji seems to be the default.

That makes sense.

Hopefully it can be fixed, since geminates are one of the most difficult features of Japanese pronunciation for native English speakers to master. Erasing them from an orthography intended to aid beginners certainly doesn't help.

> (for example dub says クイーン ("queen"), sub says 女王 ("jo-oh"))

My Japanese comprehension is still very minimal, but wouldn't this be pretty normal? It seems common in manga at least to use kanji for semantic meaning and annotate it with furigana for a foreign word or in-universe term.

This is slightly off-topic, but ichi.moe is a wonderful tool. Thanks so much for making it.
