
Turning two-bit doodles into fine artworks with deep neural networks - coolvoltage
https://github.com/alexjc/neural-doodle
======
bd
These are really cool. Though if you were, like me, puzzled how some really
complex and coherent features could come from those simple drawings / masks,
have a look at the original paintings that were used as sources and compare
them with the generated images:

Original #1:

[https://github.com/alexjc/neural-doodle/blob/master/samples/Monet.jpg](https://github.com/alexjc/neural-doodle/blob/master/samples/Monet.jpg)

Generated #1:

[https://github.com/alexjc/neural-doodle/blob/master/docs/Coastline_example.png](https://github.com/alexjc/neural-doodle/blob/master/docs/Coastline_example.png)

Original #2:

[https://github.com/alexjc/neural-doodle/blob/master/samples/Renoir.jpg](https://github.com/alexjc/neural-doodle/blob/master/samples/Renoir.jpg)

Generated #2:

[https://github.com/alexjc/neural-doodle/blob/master/docs/Landscape_example.png](https://github.com/alexjc/neural-doodle/blob/master/docs/Landscape_example.png)

So the generated images are structurally very similar to the original sources.
The neural net seems to be good at "reshuffling" the sources; that's probably
how things like reflections on the water got there, even though they aren't
present in the doodles.

~~~
nuclai
Thanks for clarifying, I'll update the README. The research paper does a
better job of explaining this with its figures!

The algorithm can only reuse combinations of patterns it knows about; it can
extrapolate, but the result often ends up looking like a blend. However, you
can give it multiple images and it will borrow the best features from each of
them, for example drawing from all of Monet's work. (This needs more
optimization to work well though, as it takes a lot of time and memory.)

As for the images, as long as the type of scene is roughly the same it'll work
fine. The fact that it can copy things "semantically", by understanding the
content of the image, makes it work much more reliably, at the cost of needing
extra annotations from somewhere. The original Deep Style Network is very
fragile to input conditions, and the composition needs to match very well for
it to work (or you pick an abstract style). That was part of the motivation
for researching this over the past months.

~~~
bd
So if I understood correctly, this GIF shows you, a human being, exploring the
possibilities / limitations of your method, hand-tweaking it for one
particular image?

[http://nucl.ai/files/2016/03/MonetPainting.gif](http://nucl.ai/files/2016/03/MonetPainting.gif)

That is, the final image, the one that looks the best, is the result of you
tweaking the doodles to get something that the neural net can then fill in
convincingly?

Or are these different runs of the same method on the same inputs, with some
natural variability, from which you selected the one that looked best?

Or are these progression steps in one run of the automated algorithm?

The language in the blog post is kinda ambiguous; I'm not sure which steps
were done by the algorithm and which by a human being.

~~~
nuclai
Exactly, the doodling is done by humans and the machine paints the HD images
based on Renoir's original. I've edited the blog post to clarify.

~~~
bd
That part was clear :)

What still isn't clear to me is how exactly that "workflow" demo (and
consequently the "money-shot" final generated images) happened.

There is a progression of generated images with increasing quality. Who did
which steps in those iterations?

The blog post uses ambiguous language: "N-th image tries / removes / fixes", etc.

It's not clear though if it was:

1) algorithm steps (keep computing more till generated image looks good), or

2) human being tweaking inputs to fixed algorithm (keep painting new
input/output doodles till generated image looks good), or

3) human being tweaking algorithm itself (change code till generated image
looks good).

~~~
nuclai
The algorithm does the same thing every time (it's triggered on request); only
the input changes, as the human modifies the doodle, as shown in the video.

The output gets better because the glitches are removed incrementally through
iteration, and it converges on a final painting that looks good!

------
nuclai
(Author here.)

For details, the research paper is linked on the GitHub page:
[http://arxiv.org/abs/1603.01768](http://arxiv.org/abs/1603.01768)

For a video and higher-level overview see my article from yesterday:
[http://nucl.ai/blog/neural-doodles/](http://nucl.ai/blog/neural-doodles/)

Questions welcome!

~~~
ThePhysicist
You should make an app for that (seriously)!

~~~
Untit1ed
I can't believe he posted it up on github before doing this - there's so much
potential for this to go viral once it's packaged with a doodling app.

EDIT: Actually reading more closely I guess 10 minutes on a machine with a
decent GPU is a lot of server load :|.

~~~
nuclai
The research is based on work I did writing and improving @DeepForger
([http://twitter.com/deepforger](http://twitter.com/deepforger)), an online
service for "basic" style transfer. The GitHub repo is a standalone version
for learning and education, which doesn't do HD rendering as well yet and uses
a bit more memory. The positive side, however, is that opening up the source
code makes these ideas progress faster!

We'll try to integrate the idea of semantic style transfer into @DeepForger in
the future, but this requires quite a bit of work to get it to reliably
understand portraits or landscapes without anyone's intervention. The fact
that it requires these semantic maps for all images makes it less
straightforward to release as a service.

~~~
TuringTest
One question: is the semantic map created on the fly at the same time as the
final image is composed, or are the maps pre-computed?

~~~
nuclai
The semantic map remains static during the optimization, so it can be provided
as a pre-computation (e.g. pixel labeling, semantic segmentation, etc.) or
done by hand. The ones in the repository are done manually, but we're now
experimenting with other algorithms. Anything that returns a bit-field or
masks can be used!
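
To make that concrete, here's a minimal sketch of one possible pre-computation:
clustering pixel colours with k-means and writing the result out as a
flat-colour annotation image. The output file name, class count and palette
below are made up for the example (they're not a convention of the
repository), and a real semantic segmentation would label regions by content
rather than colour:

    import numpy as np
    from PIL import Image
    from sklearn.cluster import KMeans

    def make_annotation(in_path, out_path, n_classes=4):
        """Cluster pixel colours with k-means and save one flat colour per class."""
        img = np.asarray(Image.open(in_path).convert("RGB"), dtype=np.float32)
        h, w, _ = img.shape
        labels = KMeans(n_clusters=n_classes, n_init=4).fit_predict(img.reshape(-1, 3))
        palette = np.array([(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)],
                           dtype=np.uint8)[:n_classes]
        Image.fromarray(palette[labels].reshape(h, w, 3)).save(out_path)

    # e.g. make_annotation("samples/Monet.jpg", "Monet_annotation.png")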

------
pygy_
I'd love/dread to see this kind of work (neural nets run in reverse mode)
applied to voices and accents.

You could credibly put any words in the mouth of anyone.

~~~
mdasen
This basically already exists. Siri and similar TTS voices today are generated
off of a lot of recorded speech from a person. There's a lot to get right for
it to sound natural, not just hit the phonemes. You have to deal with the
transitions between phonemes, declination, etc.

I've even seen a demo converting one person's voice to another (without going
through text) while trying to preserve the pattern (pauses, stresses, etc.).
It was kinda cool, but you wouldn't genuinely mistake it for the other person.

~~~
lhnz
Do you know of any projects on GitHub that do this?

------
beeswax
That's pretty cool. It might speed up asset creation for games by orders of
magnitude: train on concept art, then generate variations via these networks.
That adds consistency to the output and helps loosen the asset bottleneck /
content treadmill, especially for smaller studios and individuals.

~~~
logicrook
No, that's a gross misunderstanding of what concept art is. A concept art
piece is about the idea, not the style. If you take a famous protagonist, say
Batman, you can have him drawn in a medieval, realistic, or sci-fi version, in
a stylized, realistic, or cartoon way; in each case you will recognize him,
because the idea and the shape language have nothing to do with the style of
the drawing.

Even for illustrative work, where it can give you a good base, it still sucks,
because for actual painters this step (thumbnailing) is actually the quickest;
most of the time-consuming painting process is 'finishing', or 'detailing' the
rough.

However, where it's great is in giving inexperienced people the ability to
paint well. The hard part of painting is getting the lighting, color scheme,
and perspective right; the finishing process is quite mechanical. So it could
ease the outsourcing of some art asset creation.

~~~
beeswax
Yeah, you are right in terms of concept art; I figure that, unfortunately,
more often than not the lines between concept, mood, detailing etc. tend to be
blurred depending on who looks at it. Also, this is not going to replace
individual character design or other specific assets, but it might remove
scalability issues with projects that require a huge number of different
backgrounds and texture variations.

Of course the original craft of producing high quality output is still needed,
and just one image is not going to be sufficient anyway.

As you mentioned, I can also see it giving less experienced folks the ability
to tinker with scene setup, dimensions, ratios etc. and get faster 'final'
results, although the 'old school' approach of getting those right before
actually detailing something is quite important.

~~~
logicrook
> Yeah, you are right in terms of concept art; I figure that, unfortunately,
more often than not the lines between concept, mood, detailing etc. tend to be
blurred depending on who looks at it. Also, this is not going to replace
individual character design or other specific assets, but it might remove
scalability issues with projects that require a huge number of different
backgrounds and texture variations.

You can already do that in 3D if the overall concepts have been decided and
some basic assets are there. Just switching the textures, lighting conditions,
and some predefined building blocks does basically the same thing as this
algorithm, except it's already part of the pipeline, and it gives you much
more since it's 3D.

The other thing is that these algorithms look indecently good when you see a
thumbnail, but very bad if you look at them too closely. The cool 'concept
art' pieces that look good are 90% of what people see, but they don't
represent even 10% of the concept art work; most of it is boring details of
joints, how the blade is strapped to the costume, how windows open, stuff as
unsexy as can be (which you can't do with such algorithms).

~~~
Aperson4321
Beginner artist/programmer combo here. I am sure this type of tool can be very
powerful for prototyping the mood of 2D games, if not parts of 3D games; for
all its limitations, this method does capture the general scene mood rather
well. In any case, it's far better than those shitty programmer-art
placeholders. The process could let game devs be many times faster and better
at finding the right mood for a game; just swapping out the images the program
learns from, to find a new mood theme for the game, has a level of intuitive
logic to it that I am sure can make it very accessible to non-technical
people. A well made tool of this type would be very useful in the early
prototyping pipeline for 2D games and more. It would not make even really bad
artists jobless, but for specific jobs it would be a powerful tool.

------
ogreveins
I played with something similar for a while,
[https://github.com/jcjohnson/neural-style](https://github.com/jcjohnson/neural-style)

What I've found so far is that it takes a while to get good results, i.e.
something that looks like its own creation instead of an overlap of pictures.
There's no exact way to do this. If you modify existing artwork it works well
enough, since the source is already somewhat divorced from reality, but photos
are difficult. When it works it's amazing, though.

~~~
nuclai
From that perspective, this research is two steps further along than Neural
Style; I wrote about it yesterday here:
[http://nucl.ai/blog/neural-doodles/](http://nucl.ai/blog/neural-doodles/)

First, the paper I call "Neural Patches" (Li, January 2016) makes it possible
to apply context-sensitive style, so you have more control over how things map
from one image to another. Second, we added extra annotations (which you can
specify by hand or from a segmentation algorithm) that help you control
exactly how you want the styles to map. We call that "semantic style transfer"
(Champandard, March 2016).
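
Conceptually (glossing over the details in the paper), the annotations act as
extra feature channels during the patch matching, so patches can only match
where the labels agree. A toy sketch of that idea, where the array shapes and
the `gamma` weighting are assumptions made for illustration rather than the
repository's actual code:

    import numpy as np

    def augment_with_semantics(features, sem_masks, gamma=10.0):
        """Append (down-sampled) semantic mask channels to CNN activations.

        features:  (C, H, W) activations from some layer of the network
        sem_masks: (K, H, W) one channel per doodle colour, resized to match
        gamma:     how strongly the labels dominate the patch matching
        """
        return np.concatenate([features, gamma * sem_masks], axis=0)

Patch matching on the augmented features then pairs sky with sky, water with
water, and so on.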

You're right about it being hard otherwise; it was for many months, and that's
what pushed this particular line of research! Try it and see ;-)

~~~
wimagguc
This reminds me of "If Edison didn't invent the light bulb, someone else would
have: there were thousands of other engineers experimenting with the exact
same thing, a natural next step after electricity came about" (-- paraphrased
from Kevin Kelly)

~~~
SixSigma
One of them was called Swan. Edison tried to sue him for patent infringement,
but Edison's lawyers warned him about prior art, so instead he negotiated a
joint venture.

You may remember the "Mazda" brand of bulbs:

[https://en.wikipedia.org/wiki/Edison_and_Swan_Electric_Light...](https://en.wikipedia.org/wiki/Edison_and_Swan_Electric_Light_Company)

------
Angostura
Looked at the images and honestly thought that someone had posted an April
Fools' joke a few weeks early. Amazing.

------
MichaelBurge
Very interesting! The thing that amazes me most about these neural network
projects is how small the source usually is compared to what they're doing.
Your doodle.py is only 453 lines.

~~~
afandian
I imagine that between scikit-image[0], Theano[1] and Lasagne[2], the total
size is a little north of 453 lines.

[0] [http://scikit-image.org/](http://scikit-image.org/)

[1] [https://github.com/Theano/Theano](https://github.com/Theano/Theano)

[2] [https://github.com/Lasagne/Lasagne.git](https://github.com/Lasagne/Lasagne.git)

------
amelius
What data has been used to train the neural network?

~~~
nuclai
It's a network pre-trained on an image classification dataset from 2014 called
ImageNet. The network is called VGG; the paper is here:
[http://arxiv.org/abs/1409.1556](http://arxiv.org/abs/1409.1556)

There's no additional training apart from that. The neural network is used to
extract patterns (grain/texture/style) and a separate optimization tries to
reproduce them as appropriate.
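
For a rough idea of what "extracting patterns" means, here's an illustrative
sketch of grabbing activations from a pre-trained VGG19 at a couple of layers.
Note this is not the repository's code (which uses Theano/Lasagne); it uses
torchvision purely for brevity, and the layer indices 11 and 20 correspond to
relu3_1 and relu4_1 in VGG19:

    import torch
    from PIL import Image
    from torchvision import models, transforms

    # Pre-trained VGG19 feature extractor (ImageNet weights).
    vgg = models.vgg19(pretrained=True).features.eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def extract_features(path, layer_indices=(11, 20)):
        """Run the image through VGG and keep the activations at the given layers."""
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        feats = []
        with torch.no_grad():
            for i, layer in enumerate(vgg):
                x = layer(x)
                if i in layer_indices:
                    feats.append(x)
        return feats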

~~~
amelius
Interesting. If A is the input image, and B is the style image, then from
which of those two images is the NN extracting patterns? And how is the other
image used to get the desired effect?

Just trying to get a bird's-eye view of the algorithm :)

~~~
nuclai
Both images have their patterns extracted by the NN, and the optimization then
tries to match the best patches from one image with the other, performing
gradient descent to adjust the pixel values from a random start image.
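
A toy sketch of that loop, again using PyTorch for brevity rather than the
Theano/Lasagne code in the repository; `feature_fn` stands in for a
differentiable pass through the pre-trained network, and the shapes, patch
size, and optimizer settings are assumptions for illustration:

    import torch
    import torch.nn.functional as F

    def patches(feats, size=3):
        """Cut a (1, C, H, W) feature map into rows of flattened size x size patches."""
        return F.unfold(feats, kernel_size=size).squeeze(0).t()

    def nearest_style_patches(content_feats, style_feats):
        """Pair each content patch with its best style patch by normalised correlation."""
        a = F.normalize(patches(content_feats), dim=1)
        b_raw = patches(style_feats)
        b = F.normalize(b_raw, dim=1)
        idx = (a @ b.t()).argmax(dim=1)   # index of the best style patch per location
        return b_raw[idx]                 # target patch for each content location

    def synthesise(feature_fn, targets, shape, steps=200, lr=0.05):
        """Start from random pixels and descend toward the matched style patches."""
        img = torch.rand(shape, requires_grad=True)
        opt = torch.optim.Adam([img], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((patches(feature_fn(img)) - targets) ** 2).mean()
            loss.backward()
            opt.step()
        return img.detach()

The full method works across multiple layers and resolutions and re-matches
patches as the image improves; this only shows the shape of one pass.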

------
mkj
In the coming years this, combined with improving VR tech, will create a very
strange reality...

------
Dowwie
Have you run children's paintings through this yet?

~~~
hcrisp
Another suggestion: try running a copy of Tolkien's Middle-earth map to
transfer the style to a more detailed USGS-style map [1].

[1] e.g.
[https://www.google.com/search?q=usgs+map&safe=active&client=ms-android-uscellular-us&biw=360&bih=559&prmd=ismvn&source=lnms&tbm=isch&sa=X&ved=0ahUKEwim-Kr4nrbLAhXL1CYKHQ_wC88Q_AUIBigB#imgrc=N1UT1Hw5ptb_bM%3A](https://www.google.com/search?q=usgs+map&safe=active&client=ms-android-uscellular-us&biw=360&bih=559&prmd=ismvn&source=lnms&tbm=isch&sa=X&ved=0ahUKEwim-Kr4nrbLAhXL1CYKHQ_wC88Q_AUIBigB#imgrc=N1UT1Hw5ptb_bM%3A)

------
wslh
Exciting! Where can we find image databases for this?

~~~
nuclai
You can use any image as a source, but currently you have to create the
annotations yourself. Simple segmentation libraries (or clustering) can do a
good job for certain images, or you can look at better solutions for semantic
segmentation:
[http://gitxiv.com/search/?q=segmentation](http://gitxiv.com/search/?q=segmentation)
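
As one example of such a simple segmentation, here's a hedged sketch using
scikit-image's SLIC superpixels collapsed into a handful of flat-colour
regions; the file names and the choice of four regions are just for
illustration:

    import numpy as np
    from skimage import io
    from skimage.segmentation import slic

    img = io.imread("photo.jpg")
    labels = slic(img, n_segments=4, compactness=20)   # a few large superpixels
    annotation = np.zeros_like(img)
    for lab in np.unique(labels):
        # paint each region with its mean colour to get a doodle-like mask
        annotation[labels == lab] = img[labels == lab].mean(axis=0)
    io.imsave("photo_annotation.png", annotation)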

------
api
This project should be named Bob Ross.

------
intrasight
Now please combine this with TiltBrush

------
mhurron
Finally a way to draw things without learning how to draw. I'll be famous!

------
tjaad
Would this work with photos?

~~~
nuclai
You can specify two pairs of images (content + annotation) and it'll transfer
the style from one to the other as consistently as possible. The downside is
that you need to find an algorithm, neural network, or person to create the
annotations. (We're working on training one for portraits only.)

These examples are in the paper above; direct links for convenience:

[https://twitter.com/alexjc/status/705784566657720320](https://twitter.com/alexjc/status/705784566657720320)

[https://twitter.com/alexjc/status/705811208901939200](https://twitter.com/alexjc/status/705811208901939200)

