
Show HN: Neural Japanese Transliteration - kyubyong
https://github.com/Kyubyong/neural_japanese_transliterator
======
alphonsegaston
Interesting project! Does anyone know what iOS is using for its Japanese
transliteration predictions? Mine worked well for a long time, but in the past
3 months it's gone haywire for common kanji suggestions. The other day it had
"機能" as the first/only suggestion for "きのう," and I had to dig down into the
menu to arrive at the intended "昨日." I've had a lot of similar experiences
recently.

~~~
blipmusic
Does it act up on your desktop OS as well? I know there's an alternative from
the ATOK developers for iOS
([http://www.justsystems.com/jp/products/atok_ios/](http://www.justsystems.com/jp/products/atok_ios/))
but I haven't tried that one.

~~~
alphonsegaston
No, just my phone. But I didn't know about that alternative. Thanks, I'll
check it out!

------
scentedmeat
> In the digital environment, people mostly type Roman alphabet

Might be selection bias, but I mostly notice people using the 10-key flick one

~~~
Iv
True on smartphones. On the computer, most Japanese speakers I know just
type romaji. However, this is pretty much irrelevant to this article, as
romaji -> kana (the phonetic alphabets) is a pretty straightforward and solved
problem (there is a clear bijection between the two).
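That romaji -> kana step can be sketched as a longest-match table lookup. A minimal sketch (the syllable table here is a deliberately tiny toy subset; a real converter also handles sokuon っ, yōon ゃゅょ, and the ん/n ambiguity):

```python
# Toy romaji -> hiragana converter: longest-match lookup against a
# (deliberately tiny) syllable table.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
}

def romaji_to_kana(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        for length in (2, 1):  # try the longer syllable first
            chunk = text[i:i + length]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # pass unknown characters through
            i += 1
    return "".join(out)

print(romaji_to_kana("kinou"))  # -> きのう
```

Because the mapping is (essentially) a bijection, this direction needs no statistics at all, unlike the kana -> kanji step discussed next.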

The real problem is transforming the phonetic transliteration into the correct
word in either kanji (for most Japanese words) or katakana (for words with
foreign origin).

This problem is akin to disambiguating between two homophones (which are much
more frequent in Japanese). In some cases it is easy by looking at the previous
words, but in others it is heavily context dependent.

Nowadays, most Japanese typing systems will propose a list of kanji as you
type, corresponding to the most frequent writings of your transliteration,
but sometimes for unusual kanji (or people's names) you have to dig deep into
the list.

I can see how such a system could improve typing speed in Japanese.

~~~
ue_
Do people really use romaji on keyboards in Japan? This strikes me as an odd
way to type, as it means that you first need to learn romaji in order to type.
I thought that hiragana keyboards (hiragana mapped onto the normal layout)
were the norm, especially on laptop keyboards.

~~~
Iv
A typical keyboard will look like this: [https://qph.ec.quoracdn.net/main-qimg-2394848830a7526592f3a182e18966d2](https://qph.ec.quoracdn.net/main-qimg-2394848830a7526592f3a182e18966d2)

But none of the Japanese people I know use the hiragana keys directly. They
told me it is mostly older people who use them. Almost every Japanese person
knows romaji now, so there is no additional cost of learning a new alphabet.

------
Sir_Cmpwn
Awesome! Where can I get this keyboard?

------
Grue3
Have you tried Google Japanese Input? It's probably more accurate at word
prediction than anything else just because Google has more data.

~~~
kyubyong
Not yet. This is just a preliminary project for my paper. I will probably add
more keyboards.

------
hasenj
I'm more interested in (kind of) the reverse.

Given a Japanese sentence (that uses kanji), figure out the proper reading for
each kanji character, using a neural network.

I know there are already hardcoded analyzers, like kuromoji, but they produce
incorrect answers in a lot of edge cases.

~~~
ekianjo
It's very hard to do that, since there are cases where you can read the kanji
in multiple ways, like in people's names, for example. Japanese is full of
exceptions because the writing system was imported into Japan very late
(around 300 AD) without much effort to standardize its application.

~~~
rspeer
No need to shoot down an NLP task because it can't be solved with 100%
accuracy. That's every NLP task.

The kanji -> kana direction should be considerably easier than the kana ->
kanji direction. There are many fewer sources of ambiguity, and the space of
possible answers is smaller.

------
blipmusic
Cool! So, would it be correct to say that this in essence generates a
disambiguation model for a language with a lot of homophones due to having
relatively few sounds (but also with some variation in morpheme boundaries,
e.g. "an-i" vs "ani")?

~~~
microcolonel
For what it's worth, this is basically just the same as any popular Japanese
(or Chinese) input method. Usually the approach is to greedily form the
smallest set of the longest words from the given syllables, because people
tend to give inputs where all the words are complete. Sometimes people use
Markov models to fix situations where that falls over.
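The greedy longest-match approach described above can be sketched like this (the lexicon entries are invented for illustration, not taken from any real IME dictionary):

```python
# Greedy longest-match conversion: repeatedly take the longest lexicon
# entry that prefixes the remaining kana, emitting its written form.
# Toy lexicon, invented for illustration.
LEXICON = {
    "きのう": "昨日",
    "はれ": "晴れ",
    "でした": "でした",
}

def greedy_convert(kana: str) -> list[str]:
    out, i = [], 0
    while i < len(kana):
        for j in range(len(kana), i, -1):  # longest candidate first
            if kana[i:j] in LEXICON:
                out.append(LEXICON[kana[i:j]])
                i = j
                break
        else:
            out.append(kana[i])  # no match: emit the raw kana
            i += 1
    return out

print(greedy_convert("きのうはれでした"))  # -> ['昨日', '晴れ', 'でした']
```

Greedy matching works when every word in the input is complete, which is exactly the assumption noted above; a Markov or neural model steps in when the greedy split picks the wrong reading.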

Not sure how well this model performs, but the task is not novel.

P.S. Mandarin has a fair few homophones as well (yes, tones and all). English
has tonnes of phonemes, and still we have loads of homophones (and in some of
the most common words, no less!). Japanese, to my elementary-level ear,
doesn't sound an order of magnitude more ambiguous than English.

~~~
derefr
_Spoken_ Japanese isn't any more ambiguous than English (for a human, or a
speech-to-text AI) because Japanese people pause between spoken words just
like anyone else.

But a stream of romaji furigana _with no spaces_ is quite ambiguous: since
there's nothing to indicate word boundaries, any substring of the input might
turn out to have actually been intended as, e.g., a katakana spelling of a name.
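One toy way to see how quickly that ambiguity grows: enumerate every way an unspaced string can be split against a lexicon (the lexicon entries here are invented purely for illustration):

```python
# Enumerate all ways an unspaced string can be segmented into lexicon
# words, to illustrate how ambiguity grows without word boundaries.
# Toy lexicon, invented for illustration.
def segmentations(s, lexicon):
    if not s:
        return [[]]
    results = []
    for j in range(1, len(s) + 1):
        if s[:j] in lexicon:
            for rest in segmentations(s[j:], lexicon):
                results.append([s[:j]] + rest)
    return results

lex = {"きょ", "きょう", "う", "うか", "か"}
for seg in segmentations("きょうか", lex):
    print(seg)
# Three readings of the same four characters:
# ['きょ', 'う', 'か'], ['きょ', 'うか'], ['きょう', 'か']
```

With spaces in the input, each word would be a single lexicon lookup; without them, the IME has to score every such segmentation.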

If CJK IMEs required people to hit the spacebar between the "words" (lexer
tokens) of the provided input, they'd have a much simpler job. But as it is,
they fall over quite badly when you type multiple words into the IME input
box, and are mostly only usable if you resolve single words at a time (which,
sadly, throws away a lot of the inter-word context that would otherwise be
available for matching).

~~~
gizmo686
>because Japanese people pause between spoken words just like anyone else.

This is (surprisingly) not true. People do not pause between words; however,
when listening to a language they understand, they do perceive pauses between
words, even though such pauses do not exist.

~~~
yomly
I disagree. There's a subtle difference between breathing cadence and
inflection, and a completely monotempo monotonal string of sounds.

~~~
snakeboy
It seems like it'd be useful for one or both of you to cite any research that
has been done on this.

Seems more productive (and enlightening to all) than the agree/disagree
dialogue here.

~~~
gizmo686
Unfortunately, this is so well established that it is hard to find research. I
did find this paper [1], which looks into how babies acquire word boundaries.
As you identify, there are probably phonetic cues, but not pauses.

The best way to see this is to try listening to a language you do not
understand and try to identify word boundaries.

Indeed, the paper I link argues that some phonetic cue must exist, because
babies can recognise word boundaries.

[1]
[https://www.ncbi.nlm.nih.gov/pubmed/8176060](https://www.ncbi.nlm.nih.gov/pubmed/8176060)

~~~
gizmo686
It seems I linked only to the abstract. Here [1] is the pdf.

[1]
[https://www.sissa.it/cns/Articles/94_doInfPerceiveWordB.pdf](https://www.sissa.it/cns/Articles/94_doInfPerceiveWordB.pdf)

------
wodenokoto
I feel like I read that readme file 2-3 years ago, but everything says 14
hours ago.

Anybody familiar with the history of this project?

~~~
kyubyong
I created this repo early this year.

~~~
wodenokoto
Then it must be my memory playing a trick on me. Thank you for elaborating.

