
Recurrent Net Dreams Up Fake Chinese Characters in Vector Format with TensorFlow - kogir
http://blog.otoro.net/2015/12/28/recurrent-net-dreams-up-fake-chinese-characters-in-vector-format-with-tensorflow/
======
patio11
_Stroke order is very important to Japanese culture, in a society where the
process matters just as much as the end result. Some calligraphists take
stroke order very seriously, and will probably explode if they see someone
writing a Kanji with incorrect stroke order._

One might phrase this "Stroke order is assigned by convention in Japanese;
deviating from the conventional order is incorrect, much laik hau Einglish
speling iz nat ap four eendevizyuelle tyoise." ("What's the matter? You can
sound it out. Pfft, Americans, such rigid traditionalists. It makes sense in
the context of their religious views and conservative political tendencies,
though." [+])

This avoids unfortunate Man Japanese People They're So Craaaaazy overtones.
(FWIW: stroke order is prescriptionist not descriptionist but AFAIK
prescriptionism has a virtually hegemonic mindshare among relevant
authorities.)

n.b. Otherwise this project and post is freaking excellent.

[+] You can actually read Japanese-language takes on American culture which
are exactly as bad as this Orientalism-in-reverse.

~~~
emmett
I think a key point about this, which was only clear to me after I learned to
write Hanzi in Mandarin, is that if you write the strokes in the wrong order
it will not look right at the end.

To someone looking at characters for the first time, it's not obvious that the
stroke order matters to the reader as spelling does in English. It's not about
fetishization of the process over the result; it's really obvious to the
reader if you wrote the character incorrectly.

~~~
schoen
I've kind of wondered about how you can tell -- do you know of an example
that's online anywhere that makes it clear how it looks different? (... when
written with a pen or pencil?)

~~~
toufka
see: す - imagine if it were written from bottom to top. The placement of the
'straight' part of the vertical line and the placement of the 'curl' would be
opposite (especially if you're using a brush instead of a pen). This is
similar to the lowercase English 'a' or 'u' written bottom to top - especially
if using cursive.

or see: お

the last 'dash' in the top right is a retouching of the paper after the curl
on the bottom right. Imagine a swirl that starts in top middle, swings around
on the bottom left, swings around on the bottom right, lifts off the paper,
and then touches back down briefly in the top right. If you're hand-writing,
that little landing dot can be huge, tiny, or even conjoined, but to be
legible it's generally visually a continuation of the lower stroke. If you
just copied the letter without any concept of stroke, it would be difficult to
read because you'd be looking for that swirl. Imagine the dot on a lowercase
'i' - it's supposed to be directly above the stroke for your brain to render
it as an 'i' vs an 'l'. It's why when written out you can distinguish 'il' and
'lt' even though they're quite similar if drawn literally, quickly.

An example of stroke order vs stroke direction is more apparent when you need
good proportions. Kanji like 風 or 看 are really hard to 'draw' with good square
proportions if you don't stroke them in a particular order. Kind of like
trying to draw a face by starting with the nostrils, then doing the eyebrows,
and ending with the outline of the face - it's tough.

Interestingly, the Japanese hiragana letters are really designed to be written
with a brush from top to bottom. The letters flow from one to the next really
nicely when written from top to bottom, and are actually much harder to write
properly when written left to right. See: す - き - の. They're much more natural
to write top to bottom, and keep going down, rather than from left to right
(which is also indicated by stroke order - the horizontal bars on the letters
above are written before the downstrokes).

~~~
timr
Ehhhh. This is all starting to sound rather speculative. People draw the
lowercase 'a' in all sorts of different ways and end up with an equally valid
result. There's no particular reason why drawing す from bottom-to-top would
_require_ that the shape be different. You just have to know the shape.

I also don't buy Emmett's claim. There are regional differences in stroke
order, even for hanzi. Japanese people use a different stroke order for kanji
than mainland Chinese people use for hanzi, which again is different than the
stroke order used by people in Taiwan and Hong Kong. The differences implied
by these things are smaller than the differences in CJK fonts, which have no
inherent "order" at all, yet people somehow manage to read.

Stroke order matters for calligraphy and memorization and dictionaries, but as
patio11 says, it's largely just prescriptive. Spelling is prescriptive, too:
it's historically and/or conventionally derived, but knowing the rules makes
learning words easier, and you certainly can't use a dictionary without it.

~~~
xiaoma
There are some minor differences in stroke order by region, but the basic
rules are always the same (e.g. top before bottom, left before right, etc).

It's genuinely hard to read people's writing if the stroke order is off.

If you write super slowly and deliberately it may not matter much, but in my
actual real-life usage I've found that it does. If you're learning, you're
definitely better off learning the standard stroke order for the area where
you live.

~~~
timr
I mean, sure...there's a gray area here. I could theoretically write all of my
Latin characters from right-to-left, but it would be wonky and hard, if only
because English is written from left to right.

I believe there's a reason for the rules and that you need to know them. I
think patio11's metaphor is the correct one -- it's a lot like English
spelling.

------
joe_the_user
About the reference in article to deep learning producing art:

The thing is that what a current net is doing is very simple. It's just
extrapolating from a set of examples in a high-dimensional feature space.

There's no meaning here, and so it's hard to see generating a bunch of
pseudo-data as art. It's fun and interesting when you do it a few times, as
with any semi-random technique like ink blots or fractals, but if you produce
a steady stream of something like that, its unoriginal, non-artistic qualities
become more obvious.

~~~
anon4
If a urinal on a pedestal can be art, I think we've lost the fight when it
comes to the word having any meaning.

------
trhway
>suddenly realise they forgot how to write Kanji. I am also guilty of this –
even though I read a lot of Chinese and Japanese content in my everyday life,
I struggle to write Chinese characters. What we notice is that while we can
definitely read and recognise the characters we are able to write, the
converse is certainly not true.

When written in ink with a brush, the order of strokes seems to follow a very
natural flow, yet when it is taught using pencil it looks much less natural.

~~~
zhemao
It's not about stroke order. You just forget how it should look. It's like the
difference between being able to recognize someone's face and being able to
draw it from memory.

~~~
trhway
Thanks. I did feel that my interpretation wasn't enough.

------
vvpan
"Dreams up" sounds a lot like modern "handcrafted with love" marketing.

------
teddyh
_'Twas brillig, and the slithy toves_

 _Did gyre and gimble in the wabe;_

 _All mimsy were the borogoves,_

 _And the mome raths outgrabe._

------
niccaluim
I wonder what RNN-generated fake Latin characters would look like. Sort of a
handwriting version of that "what English sounds like to
non-English-speaking people" video.
([https://www.youtube.com/watch?v=Vt4Dfa4fOEY](https://www.youtube.com/watch?v=Vt4Dfa4fOEY))

~~~
hardmaru
I reproduced an English handwriting generation experiment in the previous post.

[http://blog.otoro.net/2015/12/12/handwriting-generation-demo...](http://blog.otoro.net/2015/12/12/handwriting-generation-demo-in-tensorflow/)

------
aftbit
The more I read about machine learning using neural networks, the more I
realize that we've just moved the "decisions" up into the choice of
architecture and hyperparameters. This blog post describes a process of tuning
the neural network until it generates Kanji that "look right" to the author.

~~~
habitue
What's interesting though is that debugging hyperparameters like this is much
more at the level of human thought, rather than the nitty-gritty of
programming. It's much closer to the Star Trek concept of programming:

"Computer, create fake Chinese characters. These don't look right, stop
drawing earlier. Ok, cut out all that are outside a one inch by one inch
square."
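
For illustration only, those two spoken instructions could be sketched as a
post-processing filter over generated characters. This is a minimal sketch, not
the article's code: the stroke representation (a character is a list of strokes,
each a list of (x, y) points measured in inches), the names `truncate_strokes`,
`fits_in_box` and `filter_characters`, and the `MAX_STROKES` cutoff are all
assumptions made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Character = List[Stroke]

MAX_STROKES = 12  # hypothetical cutoff standing in for "stop drawing earlier"
BOX_SIZE = 1.0    # hypothetical: side of the allowed square, in inches


def truncate_strokes(char: Character, max_strokes: int = MAX_STROKES) -> Character:
    """Keep only the first max_strokes strokes of a generated character."""
    return char[:max_strokes]


def fits_in_box(char: Character, size: float = BOX_SIZE) -> bool:
    """True if the character's bounding box is at most size by size."""
    points = [p for stroke in char for p in stroke]
    if not points:
        return False  # an empty drawing is treated as a reject
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) <= size and (max(ys) - min(ys)) <= size


def filter_characters(chars: List[Character]) -> List[Character]:
    """Apply both rules: truncate first, then drop anything too large."""
    truncated = [truncate_strokes(c) for c in chars]
    return [c for c in truncated if fits_in_box(c)]
```

Each rule is an ordinary, inspectable function here, which is the easy part;
the harder part sago describes below is that the generator producing the
input is not.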

~~~
sago
"No, not like that. Why are you doing that? What on earth is causing that?
What can I do to change it? I'll throw away the ones that don't work - oh,
there are none left. Has it stopped working? Why is it eating 100% CPU and not
spitting out any results?"

One of the earliest games projects I worked on with learning AI scrapped the
learning AI because it had a habit of sometimes, but rarely, causing the
characters to run into the corner of a room and stay there until shot. Nobody
had the slightest clue how to 'debug' the neural network to solve the
intermittent issue.

It's all fun and games when things are going well. But like Jurassic Park,
when it comes to paying customers, eventually there will be running, and
screaming.

~~~
hardmaru
The problem with this particular algorithm is that I needed to have a lot of
data to make it sort of work. I'm not sure how your AI was trained to play
that game, but it sounds like a difficult problem.

------
ziyao_w
Pretty cool - some of the characters are actually real Chinese characters.

------
est
[http://hanja.naver.com/](http://hanja.naver.com/)

Here's a Korean site providing many Hanja/Kanji/Hanzi stroke orders.

~~~
hardmaru
Cool, I wonder how difficult it would be to extract all of the stroke data.
LINE would probably not put the data on GitHub...

------
thrownaway2424
Maybe this can be used to generate embarrassing nonsense tattoos for idiotic
Americans.

