Hacker News
iGAN – Deep learning software that generates images with a few brushstrokes (github.com)
194 points by visionp on Sept 25, 2016 | 38 comments

Thanks for sharing our work. Check out the full video: https://m.youtube.com/watch?v=9c4z6YsBGQ0

This work is a deep learning extension of our previous average image project: https://m.youtube.com/watch?v=1QgL_aPPCpM. See the New Yorker article for details: http://www.newyorker.com/tech/elements/out-of-many-one

I guess deep learning might be a better way to blend millions of images to create new visual content.

The image generation portion reminded me of Neural Doodle [1], an impressive project that translates very simple sketches into realistic representations using what it calls 'style transfer'. The page has some great examples like Impressionist paintings and texture creation.

[1] https://github.com/alexjc/neural-doodle

GIF: https://github.com/alexjc/neural-doodle/raw/master/docs/Work...

It's going to be very interesting to see how this type of technology plays out in relation to art asset creation in both the game and film industries.

A common technique concept artists use for matte painting is to take existing images and blend them into their creations, so this would almost be an evolution of that methodology.

You are absolutely right. I believe this deep learning technique is a fancy way of mixing many, many images automatically, given a user's guidance.

Do you think a similar technique could work for generating 3D models?

For example, it's not hard to imagine future organic sculpting packages (e.g. ZBrush) having this type of tech integrated. Perhaps in-game character sculpting systems as well.

It's possible. But 3D data (3D models, videos) is much more difficult to model with a deep neural net. While most researchers have focused on modeling 2D images in recent years, there is some work on 3D. For example, here is a project on modeling 3D objects like chairs and tables: https://arxiv.org/abs/1411.5928

Thanks for the reply. As coincidence would have it, this appeared on HN just a couple of hours later, referencing the same paper:


Appears to be fundamentally 2D, but the interpolation between orientations gives it a sort of meta 3D aspect.

I can definitely see something like this being used for story boarding and doing rough sketches quickly.

Okay. Here's a wacky thought. Let's say I want more work of a particular style than exists. AFAIK, what I'd do is train the neural net on a body of work within that particular style, and then use tools like this one to "paint" and produce new work in that style for minimal effort.

However, what knowledge or tools would best help me influence the work that the neural net then produces? As in, affect the "style" that the network applies?

This is a brilliant idea. I guess it would be difficult for this work to accomplish, as you need to train a neural net on tons of data (like 100k images, or millions), and we cannot find that many paintings with a consistent style.

Work like deep style transfer, or Prisma, can transfer the style of one painting onto an existing user photo. But you cannot use it as a painting tool for creating new content.


There's got to be a way, although it might be incestuous. Use deep style transfer and/or Prisma to massively increase the body of work by transforming other work into that style, and then use that as training data for this...? Then I guess the artistry is in filtering those images, but that's a lot of images...

OOOOOHHHH WAIT. Remember how there's that dude who gets shown surveillance images from the middle east, and a computer watches his brain for the faster-than-thought responses to there being things in those images? That same trick MIGHT work for artistic sensibilities, but the response might not be identifiable enough.

We are working on something similar to your idea. We generate sketch images from real images automatically and train a model on the sketch images. So ideally, if a user draws the left wheel of a bicycle, the system will produce the entire bicycle sketch. We will release this 'sketch' feature in a few days and hope it will help users sketch objects better.

As you said, one can also apply other filters like Prisma.

Neat! Who's "we"? I'd like to read more.

Question: is there something like this but for text? Like if I write a sentence, it starts generating similar sentences. Or is this an open area of research?

Yes! In 1948, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in English text, and such a model can be used to generate random text given some existing text (http://www.cs.princeton.edu/courses/archive/spr05/cos126/ass...). Here is a GitHub implementation: https://github.com/jsvine/markovify. Deep models like LSTMs/RNNs can probably produce better results.
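markovify (linked above) packages this idea; as a rough self-contained sketch of the same technique (plain Python, not markovify's actual API, and all names here are made up for the sketch), an order-2 word-level Markov chain fits in a few lines:

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=20, seed=0):
    """Walk the chain from a random starting state, emitting one word per step."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    out = list(state)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:  # dead end: this state was never continued in the corpus
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = ("the cat sat on the mat and the cat saw the dog "
          "and the dog sat on the mat near the cat")
chain = build_chain(corpus, order=2)
print(generate(chain, length=10))
```

Every generated word is drawn from the corpus, so output stays locally plausible but has no long-range coherence, which is exactly the limitation the LSTM discussion below is about.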

Some fun examples of text generation using LSTM/RNN (and a good overview of RNNs for sequences): http://karpathy.github.io/2015/05/21/rnn-effectiveness/#fun-...

According to a talk by Max Tegmark[0] (and its associated paper[1]), neural nets (particularly LSTMs) might be inherently better at this sort of thing due to the way they model mutual information.

Markov models are best suited to situations where an observation k-steps in the past gives exponentially less information about the present[2] (decaying according to something like λ^k for 0 <= λ < 1). Intuitively, the amount of context imparted by a word or phrase decays somewhat more slowly. That is, if I know the previous five words, I can make a good prediction about the next one, and likely the next one, and slightly less likely the one after that, whereas in a Markovian setting my confidence in my predictions should decay much more quickly.

So in answer to the grandparent, such a thing should be reasonably straightforward to build if it doesn't exist already, and it may offer improvements over a similar model based on Markov chains.


0. https://www.youtube.com/watch?v=5MdSE-N0bxs

1. https://arxiv.org/abs/1606.06737

2. Why is this? Lin & Tegmark offer details in the paper, but it comes from the fact that the singular values of the transition matrix are all less than or equal to one (an aperiodic & ergodic transition matrix has only one singular value equal to one), and so the other singular vectors fall away exponentially quickly, with the exponent's base being their corresponding singular value.
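This geometric fall-off is easy to check empirically. A toy sketch with a 2-state chain in NumPy (using eigenvalues rather than the paper's singular-value framing; for a small chain like this the picture is the same):

```python
import numpy as np

# A small aperiodic, ergodic transition matrix (rows sum to 1).
# Its eigenvalues are 1 and 0.7.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Distance from the k-step distribution to stationarity shrinks
# geometrically, with ratio equal to the second eigenvalue (0.7 here).
dist = np.array([0.0, 1.0])  # start fully in state 1
errors = []
for k in range(1, 8):
    dist = dist @ P
    errors.append(np.linalg.norm(dist - pi))
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(ratios)  # each ratio ≈ 0.7, the second eigenvalue of P
```

So after k steps the chain "remembers" its start only up to a factor of 0.7^k, matching the λ^k decay described above.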

It sounds like Tegmark is pointing out a pretty obvious and deliberately designed property of LSTMs... the entire point of them is to avoid exponentially decaying / exploding gradients and allow propagation of information over longer time-scales.

Check out this rather entertaining talk from GitHub universe about the use of an LSTM to generate a film script: https://www.youtube.com/watch?v=W0bVyxi38Bc

and the short film they made, using that script: https://www.youtube.com/watch?v=LY7x2Ihqjmc

(disclosure: I work for GitHub on events/AV)

His subsequent colleagues fired it at Usenet


Robin Sloan (author of Mr. Penumbra's 24-Hour Bookstore and Julie Rubicon[1]) built something like you describe using a corpus of old sci-fi stories ("If I had to offer an extravagant analogy (and I do) I'd say it's like writing with a deranged but very well-read parrot on your shoulder"):


[1] https://m.facebook.com/notes/robin-sloan/julie-rubicon/98569...

Check Anonymouth.

See the article at NVIDIA developer blog: https://news.developer.nvidia.com/artificial-intelligence-so...

Finally I can draw an owl following this guide http://imgur.com/gallery/RadSf

Wow, it's an AI Bob Ross! Please add a soothing, slightly robotic voice! :)

This is amazing work, well done! I see this as potentially very powerful not only in the generation of images, but also in the search field.

How often have I looked for shoes or clothing items that have "a stripe of white around the soles, black for the body, with some dark red decals"? With this I could basically enter that as a kind of visual search query.

Exciting stuff.

How is Adobe involved?

This is Python + OpenCV, but I'm under the impression Adobe is a pretty serious C++ shop with their own graphics libraries (I'm mainly aware of Boost.GIL and their STL).

Adobe's research group in Seattle is pretty separate from the core of the company down in San Jose. The researchers there don't have academic-professor levels of do-whatever-you-want freedom, but a lot of their work doesn't even come close to being associated with any of Adobe's products.

I actually did two internships there. The Seattle lab did ship many new features, like the content-aware fill and shake reduction introduced in Photoshop CC 2015. The researchers there also have lots of freedom to explore directions not directly related to the products, and it turns out that many of these become new product features within a few years.

In the Acknowledgements it says:

This work was supported, in part, by funding from Adobe, eBay and Intel, as well as a hardware grant from NVIDIA. J.-Y. Zhu is supported by Facebook Graduate Fellowship.

I assume Adobe is in the title, and eBay and Intel aren't, for some reason.

I am not sure. I think it could be a proof-of-concept open-source prototype. An Adobe researcher was also involved.

Because large corporation funding usually specifies which language you must use or else no money for ya.

Looks very neat. Will try it out, thank you for sharing! Would be interested to see this kind of tech in different verticals...

The results of this image editing tool would be a great starting point for matte paintings and for similar applications.

Two drawbacks:

- Lack of close-up detail, as expected from generative networks. It looks like someone has used the Photoshop clone tool.

- Low-resolution results, as seems to be common with GANs.

Sure. The current generative models cannot produce good details, and the generated images are often low resolution (e.g. 64x64). In the paper, we tried to enhance the low-res result by stealing the high-res details from the original photo. But in general, there is not much you can do.
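The "steal the high-res details" idea can be roughly illustrated as a generic high-frequency transfer (this is a sketch, not the paper's actual method, and every function name here is made up): upsample the generated image and add the residual between the original photo and a blurred copy of it.

```python
import numpy as np

def box_blur(img, k=5):
    """Simple box blur: average of k*k shifted copies (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def transfer_details(original, generated_lowres):
    """Hypothetical detail transfer:
    result = upsampled(generated) + (original - blur(original))."""
    # Nearest-neighbour upsample of the generated image to the original size.
    fy = original.shape[0] // generated_lowres.shape[0]
    fx = original.shape[1] // generated_lowres.shape[1]
    generated_up = np.kron(generated_lowres, np.ones((fy, fx)))
    detail = original - box_blur(original)  # high-frequency residual
    return np.clip(generated_up + detail, 0.0, 1.0)

# Toy data: a 64x64 "photo" and a 16x16 "generated" image, values in [0, 1].
original = np.random.default_rng(0).random((64, 64))
generated = np.random.default_rng(1).random((16, 16))
result = transfer_details(original, generated)
print(result.shape)  # (64, 64)
```

The low-res model output supplies the coarse structure while the original photo supplies texture, which is why the trick only works when you are editing an existing photo rather than generating from scratch.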

On the other hand, in recent years we have seen dramatic improvements in image quality from these generative models. Overall, I think this is a promising and exciting direction.
