
iGAN – Deep learning software that generates images with a few brushstrokes - visionp
https://github.com/junyanz/iGAN
======
junyanz
Thanks for sharing our work. Check out the full video:
[https://m.youtube.com/watch?v=9c4z6YsBGQ0](https://m.youtube.com/watch?v=9c4z6YsBGQ0)

This work is a deep learning extension of our previous average image project:
[https://m.youtube.com/watch?v=1QgL_aPPCpM](https://m.youtube.com/watch?v=1QgL_aPPCpM).
See the New Yorker article for details:
[http://www.newyorker.com/tech/elements/out-of-many-one](http://www.newyorker.com/tech/elements/out-of-many-one)

I guess deep learning might be a better way to blend millions of images to
create new visual content.

------
Springtime
The image generation portion reminded me of Neural Doodle [1], an impressive
project that translates very simple sketches into realistic representations
using what it calls 'style transfer'. The page has some great examples like
Impressionist paintings and texture creation.

[1] [https://github.com/alexjc/neural-doodle](https://github.com/alexjc/neural-doodle)

GIF: [https://github.com/alexjc/neural-doodle/raw/master/docs/Work...](https://github.com/alexjc/neural-doodle/raw/master/docs/Workflow.gif)

------
rl3
It's going to be very interesting to see how this type of technology plays out
in relation to art asset creation in both the game and film industries.

A common technique concept artists use for matte painting is to take existing
images and blend them into their creation, so this would almost be an
evolution of that methodology.

~~~
junyanz
You are absolutely right. I believe this deep learning technique is a fancy
way of mixing many, many images automatically, guided by the user.

~~~
rl3
Do you think a similar technique could work for generating 3D models?

For example, it's not hard to imagine future organic sculpting packages (e.g.
ZBrush) having this type of tech integrated. Perhaps in-game character
sculpting systems as well.

~~~
junyanz
It's possible, but 3D data (3D models, videos) is much more difficult to
model with a deep neural net. While most researchers have focused on modeling
2D images in recent years, there is some work on 3D. For example, here is a
project on modeling 3D objects like chairs and tables:
[https://arxiv.org/abs/1411.5928](https://arxiv.org/abs/1411.5928)

~~~
rl3
Thanks for the reply. As coincidence would have it, this appeared on HN just a
couple of hours later, referencing the same paper:

[https://news.ycombinator.com/item?id=12581420](https://news.ycombinator.com/item?id=12581420)

Appears to be fundamentally 2D, but the interpolation between orientations
gives it a sort of meta 3D aspect.

------
RangerScience
Okay. Here's a wacky thought. Let's say I want more work of a particular style
than exists. AFAIK, what I'd do is train the neural net on a body of work
within that particular style, and then use tools like this one to "paint" and
produce new work in that style for minimal effort.

However - what knowledge or tools would best help me influence the work
that the neural net then produces? As in, affect the "style" that the network
applies?

~~~
junyanz
This is a brilliant idea. I guess it would be difficult for this work to
accomplish that, since you need to train a neural net on tons of data (like
100k images, or millions), and it's hard to find that many paintings with a
consistent style.

Work like deep style transfer or Prisma can transfer the style of one
painting to an existing user photo, but you cannot use it as a painting tool
for creating new stuff.

~~~
RangerScience
Thanks!

There's got to be a way, although it might be incestuous: use deep style
transfer and/or Prisma to massively increase the body of work by transforming
other work into that style, and then use that as training data for this...?
Then I guess the artistry is in filtering those images, but that's a lot of
images...

OOOOOHHHH WAIT. Remember how there's that dude who gets shown surveillance
images from the Middle East, and a computer watches his brain for the faster-
than-thought responses to there being things in those images? That same trick
MIGHT work for artistic sensibilities, but the response might not be
identifiable enough.

~~~
junyanz
We are working on something similar to your idea. We generate sketch images
from real images automatically and train a model on the sketch images. So,
ideally, if a user draws the left wheel of a bicycle, the system will produce
the entire bicycle sketch. We will release this 'sketch' feature in a few days
and hope it will help users sketch objects better.

As you said, one can also apply other filters like Prisma.

~~~
RangerScience
Neat! Who's "we"? I'd like to read more.

------
infinitone
Question: is there something like this but for text? Like if I write a
sentence, it starts generating sentences similar to it. Or is this an area of
research?

~~~
junyanz
Yes! In 1948, Shannon proposed using a Markov chain to create a statistical
model of the sequences of letters in a piece of English text and this model
can be used to generate random text given some existing text.
([http://www.cs.princeton.edu/courses/archive/spr05/cos126/ass...](http://www.cs.princeton.edu/courses/archive/spr05/cos126/assignments/markov.html)).
Here is a GitHub implementation:
[https://github.com/jsvine/markovify](https://github.com/jsvine/markovify)
Deep models like LSTM/RNN can probably produce better results.
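The Markov-chain idea is simple enough to sketch in a few lines of plain Python (this is a minimal illustration of the technique, not the markovify library's actual API):

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each `order`-word prefix to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain

def generate(chain, length=20, seed=0):
    """Random-walk the chain to emit new text in the corpus's style."""
    rng = random.Random(seed)
    prefix = rng.choice(list(chain))
    out = list(prefix)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(prefix):]))
        if not followers:
            break  # dead end: this prefix only appeared at the end of the corpus
        out.append(rng.choice(followers))
    return " ".join(out)

# Toy corpus for illustration; a real model would be trained on a large text.
corpus = ("the quick brown fox jumps over the lazy dog "
          "the quick brown cat naps beside the lazy dog")
chain = build_chain(corpus, order=2)
print(generate(chain, length=10))
```

With a large enough corpus and a small `order`, the walk produces locally plausible but globally novel sentences; markovify wraps essentially this idea with sentence-level bookkeeping.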

~~~
clickok
According to a talk by Max Tegmark[0] (and its associated paper[1]), neural
nets (particularly LSTMs) might be inherently better at this sort of thing due
to the way they model mutual information.

Markov models are best suited to situations where an observation k steps in
the past gives exponentially less information about the present[2] (decaying
according to something like λ^k for 0 <= λ < 1). Intuitively, the amount of
context imparted by a word or phrase decays somewhat more slowly. That is, if
I know the previous five words, I can make a good prediction about the next
one, and likely the next one, and slightly less likely the one after that,
whereas in a Markovian setting my confidence in my predictions should decay
much more quickly.

So in answer to the grandparent, such a thing should be reasonably
straightforward to build if it doesn't exist already, and it may offer
improvements over a similar model based on Markov chains.

\---

0\.
[https://www.youtube.com/watch?v=5MdSE-N0bxs](https://www.youtube.com/watch?v=5MdSE-N0bxs)

1\. [https://arxiv.org/abs/1606.06737](https://arxiv.org/abs/1606.06737)

2\. Why is this? Lin & Tegmark offer details in the paper, but it comes from
the fact that the singular values of the transition matrix are all less than
or equal to one (an aperiodic & ergodic transition matrix has only one
singular value equal to one), and so the other singular vectors fall away
exponentially quickly, with the exponent's base being their corresponding
singular value.
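Footnote 2's exponential decay is easy to check numerically. For a two-state chain (transition probabilities here are made up for illustration), the gap between P(X_k | X_0) and the stationary distribution shrinks by exactly the second eigenvalue each step:

```python
# Toy 2-state Markov chain; its second eigenvalue is trace(P) - 1 = 0.7.
P = [[0.9, 0.1],
     [0.2, 0.8]]

def step(dist, P):
    """One step of the chain: the row vector `dist` times matrix P."""
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

pi = [2/3, 1/3]        # stationary distribution of P (pi @ P == pi)
dist = [1.0, 0.0]      # start knowing X_0 = 0 with certainty
gaps = []
for k in range(1, 8):
    dist = step(dist, P)
    gaps.append(abs(dist[0] - pi[0]))  # distance from stationarity at step k

# Successive gaps shrink by the second eigenvalue, 0.7, so what X_0
# tells you about X_k decays geometrically in k.
ratios = [gaps[k + 1] / gaps[k] for k in range(len(gaps) - 1)]
print(ratios)  # each ratio ≈ 0.7
```

For a 2-state chain the initial deviation lies exactly along the second eigenvector, so the 0.7 ratio is exact; in general chains it is the asymptotic rate, which is the exponential decay the paper contrasts with the slower decay of mutual information in natural language.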

~~~
tfgg
It sounds like Tegmark is pointing out a pretty obvious and deliberately
designed property of LSTMs... the entire point of them is to avoid
exponentially decaying / exploding gradients and allow propagation of
information over longer time-scales.

------
visionp
See the article at NVIDIA developer blog:
[https://news.developer.nvidia.com/artificial-intelligence-so...](https://news.developer.nvidia.com/artificial-intelligence-software-easily-generates-digital-art/)

------
chanux
Finally I can draw an owl following this guide
[http://imgur.com/gallery/RadSf](http://imgur.com/gallery/RadSf)

------
camillomiller
Wow, it's an AI Bob Ross! Please add a soothing, slightly robotic voice! :)

------
Roritharr
This is amazing work, well done! I see this as potentially very powerful not
only in the generation of images, but also in the search field.

How often have I looked for shoes or clothing items that have "a stripe of
white around the soles, black for the body, with some dark red decals"? With
this, I could basically enter that as a kind of visual search query.

Exciting stuff.

------
optforfon
How is Adobe involved?

This is Python + OpenCV, but I'm under the impression that Adobe is a pretty
serious C++ shop with its own graphics libraries (I'm mainly aware of
boost::GIL and their STL)

~~~
santaclaus
Adobe's research group in Seattle is pretty separate from the core of the
company down in San Jose. The researchers there don't have academic-professor
levels of do-whatever-you-want freedom, but a lot of their work doesn't look
like it even comes close to being associated with any of Adobe's products.

~~~
junyanz
I actually did two internships there. The Seattle lab did ship many new
features, like content-aware fill and the shake reduction introduced in
Photoshop CC 2015. The researchers there also have lots of freedom to explore
directions not directly related to the products, and it turns out that many
of those explorations become new product features within a few years.

------
restapi
Looks very neat. Will try it out, thank you for sharing! Would be interested
to see this kind of tech in different verticals...

------
ptrkrlsrd
The results of this image editing tool would be a great starting point for
matte paintings and for similar applications.

------
zump
Two drawbacks:

\- Lack of close up detail as expected from generative networks. Looks like
someone has used the Photoshop clone tool.

\- Low resolution results as seems to be common with GANs.

~~~
junyanz
Sure. Current generative models _cannot_ produce good details, and the
generated images are often low resolution (e.g. 64x64). In the paper, we tried
to enhance the low-res result by stealing the high-res details from the
original photo, but in general there is not much you can do.

On the other hand, in recent years we have seen dramatic improvements in image
quality from these generative models. Overall, I think this is a promising and
exciting direction.

