
Okinawan manuscripts digitised - lermontov
http://blogs.bl.uk/asian-and-african/2017/05/okinawan-manuscripts-digitised.html
======
titanix2
This is great. Old dictionaries that have fallen into the public domain are a
great source of data for some poorly endowed languages and language pairs.

I worked on such a dictionary for my master's thesis. A researcher I'm working
with has gone as far as building a whole crowdsourcing platform to correct OCR
output from a Japanese-French dictionary compiled by Cesselin. You're more
than welcome to join if you can contribute to this language pair.
[http://jibiki.fr/](http://jibiki.fr/)

~~~
peterburkimsher
If you're interested in data for your corpus, I recommend translated Manga,
song lyrics, and the Bible.

See my other comment above about Pingtype - some friends have asked me to make
one for Japanese-Chinese, or Japanese-English, so I'm making a note of
possible data sources (Naruto, One Piece, One OK Rock, Ellegarden, Hillsong
Global Project, 新改訳聖書). I've made a note of your dictionary project, so I can
look into it in future!

------
peterburkimsher
Does the pronunciation differ significantly from Hepburn Romanization?
Honestly, I can't read his handwriting, so it's hard to tell.

I sympathise with his emotions: "What a beastly labour of hand & back bending,
besides mental toil & anxiety"

I'm currently working on a machine translator for Chinese dialects, including
traditional & simplified characters, pinyin, bopomofo, literal English
translation using a dictionary, and parallel English text. I'm adding
Taiwanese and Cantonese dialects now, with their own romanisations. I'll
publish it here on Hacker News when my friend finally translates the
documentation.

[https://pingtype.github.io](https://pingtype.github.io)

~~~
shiro
Not significantly, but on the hiragana page there are some interesting
differences from standard romanization.

The most notable one is the conflation of the vowels [イ] and [エ]. Often both
are denoted with "i"; sometimes one of them is "yi". In modern American English
the sound /i/ falls in the middle between Japanese イ and エ. I'm not sure
whether that is the cause in this case, or whether the Ryukyu dialect had
shifted vowels.

セ is denoted as "she" (usually "se"). This consonant variation appears in
some Japanese dialects.

ヒ is denoted as "fi". This might stem from an old Japanese pronunciation.

Curiously, ヰ is denoted as "i" and ヱ is denoted as "yi/ye/e". Usually they are
"wi" and "we", but those pronunciations have been lost in modern Japanese.

------
DamonHD
Fascinating. No hiragana! B^> Interesting how modesty was not important in
this, though there are acknowledgements of possible faults etc.

