
Machine Learning Can Help Unlock the World of Ancient Japan - hughzhang
https://thegradient.pub/machine-learning-ancient-japan/
======
czr
for those interested in the different ml methods attempted for the challenge,
many of the top kagglers in this contest posted rundowns of their solution
[0].

for those interested in kuronet [1] and the competition alex gave a
presentation here (en, [2]) as did tarin (ja, [3]).

for those interested in classical japanese literature, you can see the tools
they've built here [4]. tarin is also very active on twitter [5].

[0] [https://www.kaggle.com/c/kuzushiji-
recognition/discussion](https://www.kaggle.com/c/kuzushiji-
recognition/discussion)

[1]
[https://arxiv.org/pdf/1910.09433.pdf](https://arxiv.org/pdf/1910.09433.pdf)

[2] [https://youtu.be/e2-mVBvygm8?t=1428](https://youtu.be/e2-mVBvygm8?t=1428)

[3] [https://youtu.be/e2-mVBvygm8?t=537](https://youtu.be/e2-mVBvygm8?t=537)

[4] [https://mp.ex.nii.ac.jp/kuronet/](https://mp.ex.nii.ac.jp/kuronet/)

[5] [https://twitter.com/tkasasagi](https://twitter.com/tkasasagi)

------
gumby
> Chirashigaki was a method of writing popular in premodern Japanese due to
> the aesthetic appeal of the text. ... When humans read these documents, they
> decided where to start reading based on the size of characters and the
> darkness of the ink.

Extraordinary.

~~~
garmaine
Sounds more extraordinary than it is. It basically means “you read it the
obvious way.”

~~~
Iv
I wopud not call it really obvious:

[https://thegradient.pub/content/images/2019/11/image2.jpg](https://thegradient.pub/content/images/2019/11/image2.jpg)

~~~
ninjin
Well, even someone such as myself with only occasionally passable Japanese
skills can fairly easily see that one most likely ought to start at “御”
towards the upper right and do find that the other order markings “makes
sense”. So perhaps it is just a matter of needing a little bit of exposure?
Now, reading it is its entirety is an entirely different story, but I do know
that native Japanese readers do struggle greatly as well on this point.

On a related note, does the first text region not look an awful lot like the
Japanese main island of Honshu?

~~~
Iv
Spotting the starting point is easy, but I doubt even a human can order the
text correctly without taking cues from the textual information.

It is indeed doable, calling it obvious is what I react to :-)

------
9nGQluzmnq3M
Describing _kuzushiji_ ("broken characters") as a "writing system" is a bit
much, I'd liken it to a cursive script. Not that this diminishes the
importance of ML here, since anybody who's tried to read historical English
documents will know that these can also be pretty incomprehensible to the
untrained eye:

[https://www.nationalarchives.gov.uk/palaeography/doc5_popup/...](https://www.nationalarchives.gov.uk/palaeography/doc5_popup/image.htm)

But as ever, Japanese ramps up the difficulty because there are thousands of
characters instead of 26 letters to deal with.

------
sova
It would be really cool if they released these generated images en masse
because that could help a whole generation of linguists catch up on ancient
Japanese literature swiftly. These symbols are hard to decipher, but with
enough Machine Generated examples it could advance the field considerably!

