Show HN: Neural Japanese Transliteration (github.com/kyubyong)
115 points by kyubyong on June 10, 2017 | 52 comments



Interesting project! Does anyone know what iOS is using for its Japanese transliteration predictions? Mine worked well for a long time, but in the past 3 months it's gone haywire for common kanji suggestions. The other day it had "機能" as the first/only suggestion for "きのう," and I had to dig down into the menu to arrive at the intended "昨日." Had a lot of similar experiences recently.


I've had a similar experience on OS X; it seems to lose its mind and actually refuses to display the most common version of a jukugo as a choice. Very weird.

There is also a reset for "Conversion Learning" under Keyboard > Input Sources > Japanese that I've just found, thanks to the hints in this thread about the same feature in iOS.


Try resetting your keyboard under Settings > General > Reset


Does it act up on your desktop OS as well? I know there's an alternative from the ATOK developers for iOS (http://www.justsystems.com/jp/products/atok_ios/) but I haven't tried that one.


No, just my phone. But I didn't know about that alternative. Thanks, I'll check it out!


I also had this problem and had to turn off "Live Conversion" in the Japanese settings.


You can reset the keyboard dictionary in iOS settings; maybe doing that will help.


> In the digital environment, people mostly type Roman alphabet

Might be selection bias, but I mostly notice people using the 10-key flick one.


True on the smartphone. On the computer, most Japanese speakers I know just type romaji. However, this is pretty much irrelevant to this article, as romaji -> kana (the phonetic alphabets) is a pretty straightforward and solved problem (there is a clear bijection between the two).
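For illustration, that romaji -> kana step really is just table lookup. A minimal sketch with a toy table (a real table also covers digraphs like kya/sho and gemination rules):

  # Toy romaji -> hiragana table; invented subset for the example.
  TABLE = {
      "ki": "き", "no": "の", "u": "う", "shi": "し",
      "ka": "か", "to": "と", "ha": "は",
  }

  def romaji_to_kana(s):
      out, i = [], 0
      while i < len(s):
          for n in (3, 2, 1):  # longest key in the table is 3 chars
              if s[i:i+n] in TABLE:
                  out.append(TABLE[s[i:i+n]])
                  i += n
                  break
          else:
              out.append(s[i])  # pass unknown characters through
              i += 1
      return "".join(out)

  print(romaji_to_kana("kinou"))  # -> きのう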

The real problem is transforming the phonetic transliteration into the correct word in either kanji (for most Japanese words) or katakana (for words of foreign origin).

This problem is akin to disambiguating between two homophones (which are much more frequent in Japanese). In some cases it is easy by looking at the previous words, but in others it is heavily context-dependent.

Nowadays, most Japanese typing systems will propose a list of kanji as you type, corresponding to the most frequent writings of your transliteration, but sometimes for unusual kanji (or people's names) you have to dig deep into the list.

I can see how such a system could improve typing speed in Japanese.


Do people really use romaji on keyboards in Japan? It strikes me as an odd way to type, since it means you first need to learn romaji. I thought that hiragana keyboards (hiragana mapped onto the normal layout) were the norm, especially on laptop keyboards.


A typical keyboard will look like this: https://qph.ec.quoracdn.net/main-qimg-2394848830a7526592f3a1...

But none of the Japanese people I know use the hiragana directly. They told me it is mostly older people who use it. Almost every Japanese person knows romaji now, so there is no additional cost of learning a new alphabet.


I agree. I use it on iOS with swiping enabled - it is way faster and less error-prone than the romaji keyboard.


Except in Asia. Koreans and Japanese use a lot of dial-type input systems.


Not sure what you mean by dial-type input system, but if you mean that people use the number keys 0-9 to input text, that's the exact thing the grandparent meant.


Awesome! Where can I get this keyboard?


Have you tried Google Japanese Input? It's probably more accurate at word prediction than anything else just because Google has more data.


Not yet. This is just a preliminary project for my paper. I'll probably add more keyboards.


I'm more interested in (kind of) the reverse.

Given a Japanese sentence (that uses kanji), figure out the proper reading for each kanji character, using a neural network.

I know there are already hardcoded analyzers, like kuromoji, but they produce incorrect answers in a lot of edge cases.


It's very hard to do that since there are cases where you can read the kanji in multiple ways, like in people's names, for example. Japanese is full of exceptions because the writing system was imported very, very late into Japan (c. 300 AD) without much effort to standardize its application.


No need to shoot down an NLP task because it can't be solved with 100% accuracy. That's every NLP task.

The kanji -> kana direction should be considerably easier than the kana -> kanji direction. There are many fewer sources of ambiguity, and the space of possible answers is smaller.


Well, it only needs to be as good as a knowledgeable human expert. When there's no right answer, then there's no right answer. And that's ok.


Cool! So, would it be correct to say that this in essence generates a disambiguation model for a language with a lot of homonyms due to having relatively few sounds (but also with some variation in morpheme boundaries, e.g. "an-i" vs "ani")?


For what it's worth, this is basically just the same as any popular Japanese (or Chinese) input method. Usually the approach is to greedily form the smallest set of the longest words from the given syllables, because people tend to give inputs where all the words are complete. Sometimes people use Markov models to fix situations where that falls over.
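A toy version of that greedy longest-match conversion, assuming a made-up kana -> word lexicon:

  # Toy lexicon: kana reading -> most frequent written form (invented here).
  LEXICON = {"きのう": "昨日", "き": "木", "のう": "脳", "いきました": "行きました"}

  def greedy_convert(kana):
      out, i = [], 0
      while i < len(kana):
          for j in range(len(kana), i, -1):  # longest match first
              if kana[i:j] in LEXICON:
                  out.append(LEXICON[kana[i:j]])
                  i = j
                  break
          else:
              out.append(kana[i])  # no entry: keep the raw kana
              i += 1
      return "".join(out)

  print(greedy_convert("きのういきました"))  # -> 昨日行きました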

Not sure how well this model performs, but the task is not novel.

P.S. Mandarin has a fair few homophones as well (yes, tones and all). English has tonnes of phonemes, and still we have loads of homophones (and in some of the most common words, no less!). Japanese, to my elementary-level ear, doesn't sound an order of magnitude more ambiguous than English.


> Japanese, to my elementary-level ear, doesn't sound an order of magnitude more ambiguous than English

I don't have any citations, but subjectively I disagree. One can think of English words that are homophones, but in JP the challenge is more to think of words that aren't.

On one hand, if you include uncommon words (as a keyboard's corpus would) then practically any medium-length word will have multiple possibilities, which is not the case in English. But much more important is that, where an English homophone typically has two or possibly three interpretations, a kanji jukugo might have 3-4 everyday meanings and a bunch more uncommon ones.

And all this is on top of the matter of word boundaries. If the user enters し, there could be 10+ possible transliterations of that character as a standalone word/particle, above and beyond whatever readings are possible together with the characters before and after.

So I don't know what an order of magnitude would mean in this context, but I think the whole matter is significantly more ambiguous than English.


Spoken Japanese isn't any more ambiguous than English (for a human, or a speech-to-text AI) because Japanese people pause between spoken words just like anyone else.

But a stream of romaji furigana with no spaces is quite ambiguous—since there's nothing to indicate word boundaries, any substring of the input might turn out to have actually been intended as e.g. a katakana spelling of a name.

If CJK IMEs expected and required people to hit the spacebar between the "words" (lexer tokens) of the provided input for matching, they'd have a much simpler job. But as it is, they fall over quite badly when you type multiple words into the IME input box, and are mostly only usable if you resolve single words at a time (which, sadly, throws away a lot of the inter-word context that would otherwise be available for matching.)
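To make the ambiguity concrete, here is a tiny sketch that enumerates every way an unspaced kana string can be segmented against a lexicon (the lexicon entries are invented for the example):

  # Invented toy lexicon of kana "words"; a real one has hundreds of thousands.
  LEXICON = {"き", "のう", "きのう", "し", "か", "しか"}

  def segmentations(s, prefix=()):
      # Yield every split of s into lexicon entries.
      if not s:
          yield prefix
          return
      for i in range(1, len(s) + 1):
          if s[:i] in LEXICON:
              yield from segmentations(s[i:], prefix + (s[:i],))

  for seg in segmentations("きのう"):
      print(" / ".join(seg))
  # き / のう
  # きのう

Each extra character multiplies the candidate set, and with names and katakana spellings in scope, almost any substring is a valid "word".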


>because Japanese people pause between spoken words just like anyone else.

This is (surprisingly) not true. People do not pause between words; however, when listening to a language they understand, they do perceive pauses between words, even though such pauses do not exist.


Wow, there are a lot of responses putting a lot of weight in my exact choice of words, here.

I'm not a linguist; I don't know what the proper name is for the thing people do between each pair of spoken words—that they don't do inside words—but I do know that there is something people do there. I would call it "a pause" because that is the function it serves. It's an overlap of lesser "terminal" sounds that forms something that is detectably a semantic gap—like the pause between crossfaded tracks on a gapless record, or between movements in a concerto.

Whatever it is, it is there, because speech-recognition systems use it to detect spoken word boundaries regardless of language. (This heuristic does screw up sometimes; spoken language does often "slur" particular word-pairs together. But it's rare enough that these can be trained as specific exceptions to the rule, rather than needing to throw out the rule.)


I disagree. There's a subtle difference between breathing cadence and inflection, and a completely monotempo monotonal string of sounds.


It seems like it'd be useful for one or both of you to cite any research that has been done on this.

Seems more productive (and enlightening to all) than the agree/disagree dialogue here.


Unfortunately, this is so well established that it is hard to find research. I did find this paper [1], which looks into how babies acquire word boundaries. As you identify, there are probably phonetic cues, but not pauses.

The best way to see this is to try listening to a language you do not understand and try to identify word boundaries.

Indeed, the paper I link argues that some phonetic cue must exist because babies can recognise word boundaries.

[1] https://www.ncbi.nlm.nih.gov/pubmed/8176060


It seems I linked only to the abstract. Here [1] is the pdf.

[1] https://www.sissa.it/cns/Articles/94_doInfPerceiveWordB.pdf


Well, for starters, a Japanese speaker can tell the difference between God "kami" and paper "kami"...

A monotonal, monotempo sound would not be able to make that difference audible.

https://en.m.wikipedia.org/wiki/Japanese_pitch_accent


I think the claim being made is that the pause between consecutive syllables is (on average) the same whether they're part of the same word or not, so inter-syllable time is not a reliable indicator of word boundaries. Anecdotally, I think this is probably true of my speech most of the time (i.e. any time I'm not consciously enunciating). It might be less true for people with accents that have a slower cadence.


There certainly aren't pauses between words in Japanese. (Actually, I'm not aware of any language that has them.)

The Japanese disambiguate word boundaries in spoken language using the pitch accent as the primary clue. Tokyo Japanese has a phenomenon called initial rise, which differentiates the pitch between the first two moras of an accent phrase – either the pitch rises or steeply falls.

Here's an example - upper case: high pitch, lower case: low pitch.

  KYOu,  kaINI  iKIMAshita
  today, to buy I went
  
  KYOu   KAini      iKIMAshita
  today, to meeting I went
  
  kyoUKAINI iKIMAshita
  to church I went


> Spoken Japanese isn't any more ambiguous than English (for a human, or a speech-to-text AI) because Japanese people pause between spoken words just like anyone else.

Also intonation, which is not captured by the written system at all. Japanese isn't strongly tonal in the way Chinese is, but it has a regional prosody, like Swedish, which helps in disambiguating meaning.


It’s usually analysed as having both normal prosody and a pitch accent (similar to a stress accent) that varies somewhat by region. I’ve read that broadcasters are expected to use a standard (Tokyo?) pitch accent when speaking.


That is correct. The Tokyo dialect is considered "neutral" in the same way a Midwestern accent is for American broadcasters.


Pause between each word? Certainly not. Japanese speakers are known to speak very fast and to remove any form of gap between words, like when asking お元気ですか? It sounds as if everything is attached. Please don't spread inaccurate information.


Well, to be honest my first impression was that this is what I've been using all along. I just liked that someone experimented a bit. :) Perhaps I assumed this would end up with a different predictive model. No idea what kind of corpora have been used for the input methods already available.

Edit: your P.S. made me remember the "ma-ma-ma-ma..." mouthful the Chinese language students I studied in parallel with discovered (apparently well-known - something about a horse and a...mother?). If tones are not represented in romanised Chinese, things seem to get tricky, indeed.


I don't speak Chinese but I've encountered this series a lot:

https://en.wiktionary.org/wiki/%E5%AA%BD#Chinese 媽 mā 'mother'

https://en.wiktionary.org/wiki/%E9%BA%BB#Chinese 麻 má 'hemp' (sometimes 'flax')

https://en.wiktionary.org/wiki/%E9%A6%AC#Chinese 馬 mǎ 'horse'

https://en.wiktionary.org/wiki/%E7%BD%B5#Chinese 罵 mà 'scold'

It's also cool that you can see that 媽 is made up of "semantic 女 + phonetic 馬", where 女 means 'lady' and 馬 sounds like "ma", so the character was meant to suggest "a word relating to ladies that sounds like ma".

https://en.wikipedia.org/wiki/Chinese_character_classificati...


There's one more:

https://en.wiktionary.org/wiki/%E5%97%8E#Chinese 嗎 ma 'question particle'


Oh yeah, thanks!

Apparently the phono-semantic derivation for that is "mouth ma" (maybe because a mouth is used to ask questions?).


There's also this rather fun poem, Shī Shì shí shī shǐ.

https://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_...


For what it's worth, there's a lot of work to do on Japanese input methods. To a human it may be obvious that hawotogu should become 刃を研ぐ (sharpen a/the blade), and that one does not typically sharpen or hone a 歯 (tooth), 葉(leaf), 派(party/ingroup), or even 覇(~hegemony); but to a computer without topic-specific context, each of these is equally valid. One might be thankful that humans don't really say all that many different or interesting things, at least in this context.


I didn't mean to imply otherwise. My curiosity was/is with the generation of the model in the link (especially compared to current models), hence the question in my original post. There's also pitch accent that helps with disambiguation of spoken Japanese (I should know, I accidentally called a friend seaweed for a long time).


I don't know about "topic-specific context", but doesn't a word-granular Hidden Markov Model trained on huge amounts of text from the language let you figure out things like "given that the sentence includes 刃, 研 is the highest-scoring match"?
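A toy sketch of that kind of scoring, with invented counts standing in for a corpus-trained model (the candidate words echo the 刃を研ぐ example above):

  from math import log

  # Invented counts; a real model estimates these from a large corpus.
  UNIGRAM = {"刃を": 60, "歯を": 100, "研ぐ": 55, "磨く": 85}
  BIGRAM = {("刃を", "研ぐ"): 50, ("歯を", "磨く"): 80, ("歯を", "研ぐ"): 1}

  def score(prev, word):
      # Crude add-one smoothing over bigram and unigram counts.
      return log(BIGRAM.get((prev, word), 0) + 1) + log(UNIGRAM.get(word, 0) + 1)

  # Two candidate conversions of the kana input "はをとぐ":
  candidates = [("刃を", "研ぐ"), ("歯を", "研ぐ")]
  best = max(candidates, key=lambda c: score(*c))
  print("".join(best))  # -> 刃を研ぐ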


It does, but the suggestions will be nowhere to be found if somebody is making a novel use of words. That's what I was getting at.


Was the phrase possibly this: 你敢罵我媽的馬嗎?

(roughly: "you dare to scold my mother's horse?")


"Scold" or something similar to it was in there, so quite possibly yes. Thanks, for the heads up.


I feel like I read that readme file 2-3 years ago, but everything says 14 hours ago.

Anybody familiar with the history of this project?


I created this repo early this year.


Then it must be my memory playing a trick on me. Thank you for elaborating.



