
Show HN: Nihongo – Study Japanese using authentic text from games, songs, etc. - chrisvasselli
http://nihongo-app.com
======
Blahah
OP, this is wonderful. I'm a longtime Japanese learner and I wrote the iOS
Anki app.

Just looking through the website (unfortunately I don't have an iOS device on
hand), you've got some killer features here that I've wanted for a long time.

'clippings' looks really excellent. I would go so far as to say this is a
game-changer for me as it currently stands. If you took the same idea and went
bigger, that would be awesome: I'm thinking an e-book reader. That is, load up
a Japanese ePub and then have one-click access to a context-aware J->E
dictionary lookup via your app. Then being able to generate cards from the
book would be perfect.

If I had to pick a missing feature (again just going from the description) it
would be audiovisual context. This has been crucial to my learning - any plans
to integrate it?

Finally, I don't know if you're aware, but there is precedent for 'clippings'
in the tools that generate Anki decks from movie subtitles. They create cards
that have a still from the movie, the audio clip, and the subtitles in
Japanese and English. Yours is more universal, but I think something that
combined features of both (i.e. audio and images) would be even more powerful.

~~~
chrisvasselli
I would LOVE to have integration with e-books, I think that could totally
change the way I learn Japanese. Since over time the app learns what words you
know and what words you don't, it could even suggest books for you that are
around the right difficulty level. Wouldn't that be amazing?

Unfortunately, I haven't been able to find any place where you can purchase
DRM-free e-books, so I think this would require working with Japanese book
publishers. The one exception is the Harry Potter books, which are published
DRM-free through their own website. I read the first two Harry Potter books
using clippings to generate flashcards for each chapter as I went along, and
it was incredibly helpful.

The idea of audiovisual context is interesting. Could you tell me more about
what you have in mind? I've thought about adding text-to-speech options on the
dictionary entries and in flashcards. But something more complete, like what
Rosetta Stone does, seems a lot harder, just from the perspective of acquiring
rights to the images.

~~~
Blahah
Aozora Bunko ([http://www.aozora.gr.jp/](http://www.aozora.gr.jp/)) is the
only legitimate source I know of for ebooks (and they are free). Long term it
would be really great to get some Japanese publishers to commit to ebooks, but
I know that historically they have been strongly, publicly resistant.

The Wakaru app does allow ebook reading with a dictionary, but I always found
it clunky. I don't think it lets you easily generate cards from a book, its
dictionary was less useful than yours looks, and it doesn't learn your vocab
over time.

For audiovisual context, I'm really thinking of:

- images to match vocab items (potentially auto-sourced from a Creative Commons licensed image search engine)
- images from articles to match sentences from articles
- sound clips to match text taken from audio captions
- video clips OR audio clips + screenshots to match sentences from video subtitles

Regarding image rights, I would say that there is a huge amount of material
in the public domain or under liberal licenses (e.g. on Wikimedia Commons).
But you could also allow users to add their own images, in which case they
take responsibility for making sure they have the rights.

~~~
chrisvasselli
Yeah, unfortunately I find the content in Aozora Bunko not all that
interesting. Or at least, I don't know how to find good content in it. I
actually built a proof-of-concept of a "Books" tab in Nihongo using Aozora
Bunko, but I found it sort of unsatisfying, so I never pursued it further. I
haven't used Wakaru before, I'll have to check that out.

I'll explore the audiovisual ideas more, that's pretty interesting, thanks.
I'd love to have user-generated content in the app, like the ability to share
decks. Seems like that could be a good way to handle image rights as well.

~~~
jamesknelson
I've got the same problem with finding decent content in Aozora Bunko. For me,
the main issue is finding content that matches my level (whether it's actually
interesting is the other problem, but that one seems harder to solve).

I set up a really hacky "grader" algorithm, ran it on a bunch of the Aozora
Bunko books, and put the results online [1]. The very "easiest" books are
kinda boring, but it is certainly easier to find usable material in that list
than by browsing Aozora Bunko itself. The one I've found most interesting so
far is 銀河鉄道の夜 (Night on the Galactic Railroad) [2].

[1] [http://www.readyourgrade.com/](http://www.readyourgrade.com/)

[2]
[http://www.aozora.gr.jp/cards/000081/files/43737_19215.html](http://www.aozora.gr.jp/cards/000081/files/43737_19215.html)
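The thread doesn't say how the grader works. One simple, hypothetical way to
grade books against a particular reader is to measure what share of a book's
running text they already know, which is also the signal the app could use to
suggest books "around the right difficulty level". The sketch below assumes
the text has already been split into words (e.g. by a morphological analyzer
such as MeCab); `coverage` and `rank_by_difficulty` are illustrative names,
not taken from readyourgrade.com.

```python
# Hypothetical "grader": score each book by how much of its running text a
# reader already knows. Assumes books are pre-segmented into word tokens.

def coverage(tokens: list[str], known: set[str]) -> float:
    """Fraction of tokens (repeats included) found in the known vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def rank_by_difficulty(books: dict[str, list[str]], known: set[str]) -> list[tuple[str, float]]:
    """Return (title, coverage) pairs from easiest (highest coverage) to hardest."""
    scored = [(title, coverage(tokens, known)) for title, tokens in books.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    books = {
        "easy": ["猫", "が", "いる"],
        "hard": ["銀河", "鉄道", "の", "夜"],
    }
    known = {"猫", "が", "いる", "の"}
    print(rank_by_difficulty(books, known))  # [('easy', 1.0), ('hard', 0.25)]
```

Counting tokens rather than distinct words weights frequent words more
heavily, which is closer to how much of the page a reader can actually follow.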

~~~
chrisvasselli
This is awesome. I'd love to have these kind of features in Nihongo.

------
chrisvasselli
Hey HN, creator of Nihongo here!

I spent a long time studying Japanese vocabulary using Anki, Flashcards
Deluxe, etc., but I found that studying using premade word lists never worked
that well for me. Nihongo is built around studying the words you're
encountering naturally through your own hobbies and interests. Flashcard packs
are automatically generated from the words that you look up in the built-in
dictionary. You can also copy and paste Japanese text into the app like song
lyrics, textbook readings, videogame scripts, or books, and it will
automatically find all the words contained in the text, and generate
flashcards from them.
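To illustrate the "find all the words contained in the text" step: Japanese
has no spaces between words, so the pasted text has to be segmented first.
Nihongo's actual parser isn't described here (it evidently also handles verb
conjugation); the sketch below swaps in the simplest technique, greedy
longest-match against a word list, and then dedupes the result into one card
per word. The `dictionary` set and both function names are hypothetical.

```python
# Simplified stand-in for a real parser: greedy longest-match segmentation
# against a known word list. No conjugation or inflection handling.

def extract_words(text: str, dictionary: set[str], max_len: int = 8) -> list[str]:
    words = []
    i = 0
    while i < len(text):
        # Try the longest dictionary entry that starts at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            i += 1  # nothing matches here (punctuation, unknown word): skip one char
    return words


def make_flashcards(text: str, dictionary: set[str]) -> list[str]:
    """One card per distinct word, in order of first appearance."""
    return list(dict.fromkeys(extract_words(text, dictionary)))
```

Longest-match prefers 銀河鉄道 over 銀河 when both are in the word list, which
is usually what a learner wants for compounds, but it can mis-split text that
a statistical analyzer would get right.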

I built Nihongo because it was the tool I wanted and felt was missing for
studying Japanese. Hopefully it'll be useful for some of you too! I'd love to
hear your feedback.

~~~
veidr
Hi Chris, looks like an awesome start. Some quick suggestions and questions:

1. You should fully support romaji entry for lookups. The mechanics of
switching input methods on iOS are pretty terrible, so having to switch into
Japanese input mode is really inconvenient. I searched for "hana" but got only
花合わせ and 花を引く as matches (both related to hanafuda, the flower-card
game), because those have 'hana' in their English definition strings.
Searching for はな brought the two obvious matches (花 "flower" and 鼻 "nose")
to the top, but I had to go back and cycle through several other input
methods to get to Japanese input mode, which was annoying and took longer than
the rest of looking up the word. Some other dictionaries already accept
romaji, and even sites like Amazon Japan now do so for searches. Dealing with
input methods is just a hassle in general.

2. The understanding of verb conjugation seems really awesome, and jumping
between things on a "learning tangent" works really well.

3. You _have to_ sync vocab lists and spaced repetition drill status/progress
to the cloud. It's just too useful not to: you want to be able to drill in
quick downtime moments from any device. Luckily there are many easy ways to do
it nowadays. I don't blame you for shipping without it, though! ;-)

All in all, great start!
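For point 1, one way to make "hana" find はな, sketched here as an assumption
rather than how Nihongo implements it, is to convert the query to hiragana
with greedy longest-match over a romaji table and then search with both the
raw query (against English glosses) and its kana form. The table is a
deliberately tiny Hepburn subset (no voiced rows like ga/ba, no combinations
like kya), and `ROMAJI`, `to_hiragana`, and `search_keys` are hypothetical names.

```python
from typing import Optional

# Tiny subset of a Hepburn romaji-to-hiragana table; a real IME covers far more.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ya": "や", "yu": "ゆ", "yo": "よ",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "wa": "わ", "wo": "を",
}


def to_hiragana(query: str) -> Optional[str]:
    """Greedy longest-match conversion; None if the query isn't convertible."""
    s = query.lower()
    out = []
    i = 0
    while i < len(s):
        for size in (3, 2, 1):
            chunk = s[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            if i + 1 < len(s) and s[i] == s[i + 1] and s[i] not in "aiueon":
                out.append("っ")  # doubled consonant ("kitte") -> small tsu
                i += 1
            elif s[i] == "n":
                out.append("ん")  # "n" not starting a syllable -> syllabic n
                i += 1
            else:
                return None  # not romaji we understand
    return "".join(out)


def search_keys(query: str) -> list[str]:
    """Search the raw query (English glosses) plus its kana form, if any."""
    kana = to_hiragana(query)
    return [query, kana] if kana else [query]
```

Searching both forms keeps English lookups working while letting "hana"
reach 花 and 鼻 without switching keyboards.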

~~~
chrisvasselli
Thanks for the feedback!

1. I actually just finished implementing romaji entry, and it will be coming
in the next version update. Honestly, after implementing it, I can't believe
how much time I've wasted switching input methods. =)

2. Thanks!

3. I know, I know! I really want to, but I spent a couple weeks diving into
it and found it really buggy and hard to get right. I'm using Core Data under
the hood to store this stuff. Any suggestions for tools I should look at to
help with syncing?

~~~
sharp11
For syncing, look at Couchbase Mobile, which has recently added support for
Core Data:
[http://blog.couchbase.com/syncing-with-core-data](http://blog.couchbase.com/syncing-with-core-data)

~~~
chrisvasselli
Looks promising, thanks. I initially tried with iCloud syncing, and it was a
mess.

------
ericmo
@chrisvasselli:

When I was a child, my parents tried to teach me Japanese as my first
language, but we live in Brazil and it didn't work out: as soon as I started
school I dropped it and spoke only Portuguese. Still, I have some knowledge of
"spoken Japanese", even though I don't know how to write kanji.

That's why I find this clippings feature really great. I can read the
pronunciation over the kanji and deduce the meaning from context, which I've
always missed when trying to study Japanese. And if I can't deduce the
meaning, knowing the pronunciation makes it easier to look the word up in the
dictionary (in the app it's probably just a click). Some people think that
going through this trouble actually helps you memorize kanji, but for me it's
just time-consuming.

Too bad I have no plans to buy an iPhone. No plans for an Android or a web
version?

Also, how did you get these texts to feed the clippings? By hand? Scraping?

~~~
chrisvasselli
Nice, that's exactly what I'm going for! Part of my philosophy with the app
was to make my studying of Japanese as efficient as it could possibly be. I
tried to shave out all of the time spent doing tasks that don't contribute to
my actual learning, like trying to draw kanji, count strokes, manually type
out flashcards, etc.

As for the clippings, those are all user-supplied. For example, I love classic
JRPGs, and FFVI is my favorite. If you google around a bit, you can find the
script for the Japanese version of FFVI. Import the script into Nihongo, and
you can pre-study the words as you play along. You can also filter the
flashcards down to just the words that show up more than once, so if you're
not ready to study everything, you can just focus on the important words. It
also works great if you want to study song lyrics, and I even used it to read
Harry Potter in Japanese (thanks J.K. Rowling for putting out DRM-free books!)
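Filtering the flashcards down to "the words that show up more than once" is a
frequency count over the words extracted from the script. A minimal sketch,
assuming the words have already been extracted; `repeated_words` and its
`min_count` parameter are hypothetical (the default of 2 corresponds to "more
than once").

```python
from collections import Counter


def repeated_words(words: list[str], min_count: int = 2) -> list[str]:
    """Words occurring at least min_count times, most frequent first."""
    counts = Counter(words)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [word for word, n in counts.most_common() if n >= min_count]
```

Raising `min_count` trims the deck further for long scripts, trading coverage
for a shorter study list.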

Unfortunately, I don't have any plans for Android or the web right now, since
this is a solo side-project for me, and I just don't have the time. But I'd
love to do it in the future!

------
arcameron
Looks great! Any timeline on Android?

~~~
chrisvasselli
Thanks! Unfortunately, I don't have any plans for Android right now. This is a
solo side-project for me, so I just don't have the time. I'd love to at some
point in the future though!

------
Nadya
_> Our dictionary is made to find what you're looking for fast, with the words
Japanese speakers actually use at the top of every search result._

May I ask how this was determined? Also, what if I want to use a certain
dialect? Osaka might opt for a different word than Hokkaido. I assume standard
Tokyo dialect?

 _> Word Commonality_

Also interested in how you measure this. Frequency of use in newspapers?
Whether it's part of the Jouyou kanji? How about the age of the speaker?
"Common among the younger generation" and "common among the older generation"
might be a better benchmark for commonality.

 _> We'll even add furigana._

Is it editable for instances when it is wrong?

I don't have an iPhone to test, but I am very interested in helping others
learn Japanese and, more importantly, in making sure they are learning
_correct_ information. Few automated systems I've seen provide accurate
information unless they are heavily curated. There's nearly always a
"well...that's...actually wrong" somewhere in the datasets these tools use.

Not saying that's the case here - just that it is a possibility.

~~~
chrisvasselli
Great questions!

We include words from all different dialects, but standard Tokyo dialect will
inevitably float to the top. I learned Japanese in Kansai, so I'm sensitive to
this issue. =)

As for how the commonality is determined, I use a combination of corpuses
including newspapers, novels, literature/poetry, and spoken language. I
actually initially had a feature that would tell you "this word is more common
in novels", but it didn't work consistently well enough, so I ended up
scrapping it.
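How the corpora are combined isn't spelled out. One plausible approach,
sketched under stated assumptions: rescale each raw count to a per-million
rate so corpora of different sizes are comparable, average the rates, and cut
the average into the app's three labels with "rare" explicitly below
"uncommon". The corpus names, sizes, thresholds, and function names are all
invented for illustration.

```python
# Hypothetical combination of several corpora into one commonality label.

def per_million(count: int, corpus_size: int) -> float:
    """Raw counts aren't comparable across corpora of different sizes."""
    return count * 1_000_000 / corpus_size


def commonality(counts: dict[str, int], sizes: dict[str, int]) -> float:
    """Mean per-million rate over every corpus; a missing word counts as 0."""
    rates = [per_million(counts.get(name, 0), size) for name, size in sizes.items()]
    return sum(rates) / len(rates)


def label(score: float, common_at: float = 50.0, uncommon_at: float = 5.0) -> str:
    """Three ordered buckets: common > uncommon > rare."""
    if score >= common_at:
        return "common"
    if score >= uncommon_at:
        return "uncommon"
    return "rare"
```

Averaging means a word absent from one corpus is pulled down even if it is
frequent in another; taking the maximum rate instead would favor genre-specific
words such as spoken-only expressions.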

When we actually choose how to order the search results, we take into account
a few more things, like whether the search string is part of the primary
meaning of the word, whether it's a partial or full match of the meaning
string, etc. To determine the relative importance of the various factors, we
actually automatically optimized them against a huge set of vocabulary lists
that a Japanese native, English fluent speaker made for Japanese students
studying English. This gives us the best chance of picking the Japanese word
that a Japanese native speaker would choose for a given English word.
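The thread doesn't describe the optimizer. A minimal sketch, assuming each
search result is reduced to a few features like the ones listed above and
scored with a linear weighted sum; the weights are then picked by brute-force
grid search to maximize how often the word from the reference vocabulary list
ranks first. The `Candidate` fields, the linear model, and the grid are all
illustrative assumptions, not Nihongo's actual method.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Candidate:
    word: str
    commonality: float     # e.g. a normalized corpus frequency in [0, 1]
    primary_meaning: bool  # query appears in the word's first sense
    full_match: bool       # query equals a whole gloss, not just part of one


def score(c: Candidate, w: tuple[float, float, float]) -> float:
    # bools multiply as 0/1, so each flag adds its weight when set
    return w[0] * c.commonality + w[1] * c.primary_meaning + w[2] * c.full_match


def rank(cands: list[Candidate], w: tuple[float, float, float]) -> list[Candidate]:
    return sorted(cands, key=lambda c: score(c, w), reverse=True)


def top1_accuracy(queries, w) -> float:
    """queries: (candidates, expected_word) pairs built from a reference list."""
    hits = sum(rank(cands, w)[0].word == expected for cands, expected in queries)
    return hits / len(queries)


def fit_weights(queries, grid=(0.0, 0.5, 1.0, 2.0)):
    """Exhaustive grid search for the weight triple with the best accuracy."""
    return max(product(grid, repeat=3), key=lambda w: top1_accuracy(queries, w))
```

A grid search is crude but transparent; with more features, a learning-to-rank
method or coordinate ascent would scale better.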

As for the furigana, since we're taking arbitrary user input, there are bound
to be mistakes here or there, and that's the case with this app too. I don't
have any plans to let people edit the furigana right now, but I am improving
the parsing all the time.

~~~
Nadya
_> we actually automatically optimized them against a huge set of vocabulary
lists that a Japanese native, English fluent speaker made for Japanese
students studying English. This gives us the best chance of picking the
Japanese word that a Japanese native speaker would choose for a given English
word._

That's actually a pretty awesome way to handle it compared to other methods
I've seen used.

 _> As for the furigana, since we're taking arbitrary user input, there are
bound to be mistakes here or there, and that's the case with this app too._

Of course, I understand that issue. Parsing arbitrary input will always
produce errors; Google isn't exempt from this problem either. :)

It's very, very difficult to get some of these things right (and sometimes
just impossible given current technology/software).

My issue with it is that not all people who use the app will be
technologically savvy and understand that the furigana could be wrong. For
someone _learning_ a new language, giving the wrong reading could be extremely
detrimental for them.

I guess I'm just voicing a gripe. I understand that, from a marketing point of
view, every JP app does this when it automatically adds furigana readings. I
just wish there were some transparency that they aren't always accurate. :\

~~~
chrisvasselli
Yeah, it's a good gripe to bring up. I'd love to come up with some indication
of "confidence" in the parsing. Or maybe when there are two possible parses,
give the user the ability to see both somehow?

~~~
Nadya
I think giving the option of two possible parses would confuse people who are
learning. They wouldn't know which one is right! So adding a second option
just adds to the confusion...

I think a good solution would be to show a confidence level based on the
number of possible readings of the kanji in the given context.

生 would be a good example. When it's used by itself, the confidence that the
reading is correct should be low (since there are many possible options!), but
in 生まれる the confidence level is very high (because there is only one
possible reading). In 生す it could be 50/50 (since it could be read なす or
むす).

Explaining this to learners is a little tricky.

~~~
chrisvasselli
Yeah, 生 is a good example. Fortunately, there are corpuses out there with
hand-parsed sentences, so we at least train on those to pick the most common
reading for 生 on its own (I just tried: it's なま). We could use the same data
to get our confidence for 生す better than 50/50.

I can see how giving users more options could just bring more confusion. Such
a tricky problem! I'll be thinking about this one.
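The two confidence ideas above can be sketched side by side, as an
illustration rather than either participant's implementation: with no other
evidence, n plausible readings give each a confidence of 1/n; with counts from
hand-parsed corpora, the most frequent reading's share of all observations is
a better estimate. The optional add-alpha smoothing, the function names, and
the counts used in the tests are all hypothetical.

```python
from collections import Counter


def uniform_confidence(readings: list[str]) -> float:
    """No corpus evidence: each of n plausible readings gets 1/n."""
    return 1.0 / len(readings)


def corpus_confidence(reading_counts: Counter, alpha: float = 0.0) -> tuple[str, float]:
    """Pick the reading seen most often in hand-parsed text and return it with
    its (optionally add-alpha smoothed) share of all observations."""
    best, best_count = reading_counts.most_common(1)[0]
    total = sum(reading_counts.values())
    k = len(reading_counts)
    return best, (best_count + alpha) / (total + alpha * k)
```

Smoothing keeps a reading seen only a handful of times from being reported
with near-certain confidence, which matters for the rare kanji uses where a
wrong furigana is most likely.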

------
archseer
Sweet! Looks like a nice addition to WaniKani. I'm using WaniKani for my
general studies, but automatic card generation based on my word lookup is
definitely neat.

~~~
chrisvasselli
Yeah, I think Nihongo makes a great complement to WaniKani! Nihongo doesn't
have lessons, and isn't really meant to teach you Japanese the way WaniKani
does. I like to think of Nihongo as a companion tool for however you're
already learning Japanese, whether that's a class, living in Japan, reading
books, or studying with other tools.

------
dtouch3d
Lately I've been thinking of learning Japanese the same way I learned
English. After two years of basic English courses starting in third grade, I
started playing videogames like Pokemon and watching movies with subtitles,
and my vocabulary and expression greatly improved. So I should study
elementary Japanese for a while, then start playing Pokemon in Japanese and
study as I see fit.

Good job OP! An Android version would be great!

------
abustamam
Is there anything like this in different languages? I think you could make a
great language-learning platform with this idea.

~~~
chrisvasselli
Not that I know of! I'd love to expand to other languages at some point.

One of the tricks is that I think it's important for there to be a best-in-
class dictionary attached to it. If users feel like they'd rather use a
different dictionary, then you don't get their dictionary history as
flashcards, and overall you create a crappy user experience for them if they
have to switch back and forth all the time.

Making a best-in-class dictionary requires someone deeply familiar with the
language, and specifically with the pitfalls of learning it. So I think to
expand to other languages, I'd want to get passionate learners of those
languages involved as well.

Definitely a long-term goal though.

~~~
abustamam
I'd be interested in helping out, if you decide to take this further. I'm a
passionate learner of French and Arabic but I am not deeply familiar with
either language yet. I do know some people who are native speakers though.

------
deciplex
> _Have you ever tried to use a word you looked up in the dictionary, only to
> find out that no one has ever heard of it?_

This is so great. EDICT has so many entries that Japanese speakers have never
heard of or would never use, or whose definitions are archaic or just plain
wrong.

Maybe someday, this app will be available for Android :-)

------
djent
Unfortunately doesn't run on iOS 7 or below. Any chance you could support it?
I'd love to use this app.

~~~
chrisvasselli
The first couple versions of the app supported iOS 7, so when you try to
download it, you should get a prompt to download the earlier version.
Unfortunately, I don't have any plans to support iOS 7 going forward. =/

~~~
djent
The App Store doesn't present me with that option. I believe you need to have
an older version already installed to be able to redownload it.

~~~
chrisvasselli
Ah, sorry, I didn't realize that! Unfortunately I make use of some of the
features in iOS 8 in a way that would make it difficult to add support for iOS
7. If you ever get a new device compatible with iOS 8 I hope you'll check it
out!

------
CoryG89
Android version please!

------
it_learnses
I have android :(

------
yaiu
I need this, but for Mandarin

------
buster
And now for Android please! Always looking for a good Japanese learning app.

------
PSeitz
Could you shed some light on your algorithms (how you classify something as
Common or Uncommon) and data sources (JMDict, Tatoeba)?

Disclaimer: I am developing a free Japanese dictionary for Android.

------
newman314
Does it use text-to-speech for pronunciation?

~~~
chrisvasselli
Not yet! But it's on my short list of features to add! Good to know there's
interest.

------
pavlov
Looks lovely.

Seems like even a starter-level student could benefit from this, with the
flashcards and word commonality features.

------
deeteecee
Just curious, how do you mark a word as common? I often used jisho.org or
classic.jisho.org (based on WWWJDIC), and from what I remember, a lot of the
words marked as "common" were most definitely not.

~~~
chrisvasselli
Yeah, I found the commonalities in WWWJDic/JMDict to be pretty problematic, so
I came up with my own. I use a combination of corpuses including newspapers,
novels, literature/poetry, and spoken language. Some of these have been hand-
parsed by humans, and others I parse using the same parsing as clippings. I
think the result is pretty good!

~~~
glandium
Commonality is a tricky thing. There are words that most natives won't
actually know, words they know but rarely use, words that used to be trendy
but aren't anymore, etc. From "common", "uncommon", and "rare" it's hard to
tell which bucket a word or expression falls into, while "natives won't
actually know it" is a very important distinction to make. Moreover, since the
two words are synonyms, it's not clear which of "uncommon" and "rare" ranks
above the other below "common". To give an example, since I gave your app a
try (the best I've found so far, by the way, though unfortunately it doesn't
match my own needs): 自問自答 (asking and answering one's own question) is
marked rare, and 出勤 (going to work) is marked uncommon. Looking at this and
some other words, my guess is that uncommon ranks above rare. Fine. 自問自答
might be rare, but there's no Japanese person I know (excluding small kids)
who wouldn't understand it. (In fact, it stuck in my mind because I keep
hearing it.) So your classification has no room to set apart the words that
most Japanese people won't know.

~~~
chrisvasselli
Thanks, it's useful to hear that the difference between "uncommon" and "rare"
was unclear to you. I'll think about how to make that better.

I definitely would love to have a distinction in the app for "this is a word
that any educated Japanese person would know". I haven't found the dataset yet
that can let me build that, unfortunately. I checked, and in the case of 自問自答,
it actually doesn't appear even once in any of the corpuses I'm using. So it
seems like we need some other source of data.

~~~
glandium
The data used for commonality in JMDict is a little dated (1998) and biased
(based exclusively on newspapers, which tend to have a specific vocabulary),
which I guess is why you found it problematic. However, there are more recent
datasets available on the Monash FTP archive:
[http://ftp.monash.edu.au/pub/nihongo/](http://ftp.monash.edu.au/pub/nihongo/).
For example, there's a 2008 dataset built from blog entries on goo.ne.jp, and
another built from novels. They could be good additions to your corpus (FWIW,
for 自問自答 the goo.ne.jp dataset lists 18038 occurrences and the novels
dataset 68). Doing similar work with current data from the net would
certainly be useful too. I wish there were some regular scraping done, so that
we could always use fresh data. Hell, I wish Google, Bing, or any other search
engine would just publish word frequencies from their crawler data (and not
just for Japanese).

~~~
chrisvasselli
Hmm, I actually use that novels corpus. Sounds like you may have found a bug
in Nihongo. I'll look into it. Thanks!

------
amitlan
'Clippings' looks really handy. Thanks!

