
Deep Convolutional Generative Adversarial Networks - nazka
https://github.com/Newmu/dcgan_code/tree/gh-pages
======
temuze
I think AI-generated art is an industry waiting to happen. Imagine telling a
VR system "show me a mountain ski resort with skiers going everywhere" and it
just generates it. It could have a huge impact on procedurally generated maps
in gaming.

Or even plays or fiction! Look at how formulaic detective shows are. It'll
probably be hard to get the emotional nuance, but maybe AI could make all of
the characters and plots.

While I think we're far away from generating TV shows with Conv-Nets, perhaps
it can do simpler things (porn?).

~~~
rl3
>It could have a huge impact on procedurally generated maps in gaming.

I think it could go far beyond just maps. A major immersion breaker in most
open-world games is the lack of detail in the lives of non-player characters.
It's impractical to manually create this content at any real scale, so I think
that's an area where AI would shine.

However, that isn't terribly valuable without also being able to generate
convincing conversation dynamically—and that's definitely Turing Test
territory. It would also help to synthesize that output as a convincing,
unique human voice, which is also hard, and probably squarely in the same
territory.

That said, I think all of this could be accomplished relatively soon, and
without general intelligence necessarily being solved.

Of course, once general intelligence is solved (and doesn't go sideways on
us), it's a good bet that this kind of thing will be nothing short of
incredible. I imagine the Holodeck from Star Trek might be a fair
approximation.

As an aside, utilizing general intelligence within interactive entertainment
mediums for the purpose of creating believable characters may prove to be
highly unethical. Bostrom's _Superintelligence_ touched on this, and it's
quite interesting if not terrifying to ponder.

------
riordan
Well, this is delightfully terrifying: neural networks that can create
environments and people on their own.

~~~
gcr
These models are just clever image decompression. Nothing scary here. :)

It's just like using Markov chains to make English-like gibberish, but for
pictures.
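
For the curious, here's a minimal sketch of the text version of that trick
(the corpus is just a stand-in):

    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat and the dog sat on the rug".split()

    # First-order Markov chain: map each word to its observed successors.
    chain = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        chain[prev].append(nxt)

    # Walk the chain: each step is locally plausible, the whole is gibberish.
    word = random.choice(corpus)
    words = [word]
    for _ in range(12):
        word = random.choice(chain[word] or corpus)
        words.append(word)
    print(" ".join(words))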

The interesting bit is that (I think) the generative adversarial network is
regressing from random noise to an image, which isn't how most autoencoders
work.
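
Schematically, the generator side looks something like this (the sizes are
made up, and the real model uses a stack of fractionally-strided convolutions
rather than one dense layer):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for a trained generator: a single dense layer from a
    # 100-dim noise vector to a 64x64 grayscale image. DCGAN instead stacks
    # fractionally-strided convolutions with batch norm.
    W = rng.normal(scale=0.02, size=(100, 64 * 64))

    def generate(z):
        # tanh keeps pixel values in [-1, 1], as in the paper.
        return np.tanh(z @ W).reshape(64, 64)

    z = rng.normal(size=100)  # draw from the noise prior
    image = generate(z)       # a different z yields a different image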

~~~
davmre
Compression is a subtle and powerful thing. The ability to compress is closely
related to prediction: if you can predict 95% of the moves a chess grandmaster
will make, you can compress the game by explicitly representing only the other
5% of moves. If you could perfectly predict (or equivalently, compress) the
actions of a real-world human being, you'd have solved AI.
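
Back-of-the-envelope, assuming an average of 30 legal moves per position:

    import math

    p = 0.95      # fraction of moves the predictor gets right
    n_moves = 30  # assumed average number of legal moves

    # Naive code: spell out every move.
    naive = math.log2(n_moves)

    # Predictive code: one flag bit per move ("did the predictor guess
    # right?"), plus the full move only on the 5% of misses.
    predictive = 1 + (1 - p) * math.log2(n_moves)

    print(f"naive:      {naive:.2f} bits/move")       # ~4.91
    print(f"predictive: {predictive:.2f} bits/move")  # ~1.25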

Despite their simplicity, Markov chains are used throughout modern statistical
AI, e.g., the Google Translate language model is essentially a big Markov
chain. The fact that deep networks can apparently form better generative
models across a wide range of applications (no one has ever actually gotten
these kinds of image generation results from Markov chains) means that they
really are getting at more interesting structure. They're not a panacea, but
it's still a pretty big deal.

~~~
frisco
Pretty sure Google Translate is not a Markov model at this point; it's a deep
recurrent network.

~~~
abrichr
Yep, LSTM RNNs:

[http://googleresearch.blogspot.ca/2015/08/the-neural-networks-behind-google-voice.html?m=1](http://googleresearch.blogspot.ca/2015/08/the-neural-networks-behind-google-voice.html?m=1)

------
grandalf
I wonder if, with the right training data, this kind of thing could make use
of detailed 3D/structural knowledge when doing the transforms.

There is lots of information about the 3D world present in the light (at least
for the bedrooms).

This could be very useful for spatial navigation or computer vision, I'd
think.

~~~
nl
Yes. I know a guy doing his PhD on this.

~~~
grandalf
Interesting! Any links or terms to google would be appreciated.

~~~
nl
I believe it is part of this project:
[https://cs.adelaide.edu.au/~hengel/?/research/project-lego/](https://cs.adelaide.edu.au/~hengel/?/research/project-lego/)

Paper:
[http://arxiv.org/pdf/1511.02570.pdf](http://arxiv.org/pdf/1511.02570.pdf)

He's in that research group, anyway.

------
cr4zy
Looks similar to the top project on this page from cs231n by Jon Gauthier:

[http://cs231n.stanford.edu/reports.html](http://cs231n.stanford.edu/reports.html)

~~~
smhx
Both use GANs to generate images, and that similarity is indeed there.
Original GAN paper:
[http://papers.nips.cc/paper/5423-generative-adversarial-nets](http://papers.nips.cc/paper/5423-generative-adversarial-nets)

------
deepnet
The latent space of the image vectors is rich with emergent semantic algebra.

You can extract a vector for wearing glasses and add it to another person;
rotating a face is a consistent vector; there is a vector for removing windows
from any imagined room, and one for smiling.

This all seems a bit like Geoff Hinton's thought vectors - nice work.

------
gcr
The `master` branch is the one with the code. This link points to `gh-pages`,
which doesn't have any.
[https://github.com/Newmu/dcgan_code/tree/master](https://github.com/Newmu/dcgan_code/tree/master)

------
djfm
Looks very interesting (and beautiful); wish I could understand some more of
it :)

Could someone explain in simple terms how the arithmetic works? E.g. when they
do "smiling woman" - "neutral woman" + "neutral man" = "smiling man", is it
like you perform the operations on the networks' weights or something?

~~~
smhx
The input to the G-network is a vector of numbers, of size K. The output is an
image. Different random vectors give different output images.

We are doing the vector arithmetic in this input space (not on the network's
weights), and the outputs seem to change semantically in step with it.
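
In pseudo-numpy, with `generate` standing in for a trained G-network (the
paper averages the z vectors of a few examples per concept before doing the
arithmetic):

    import numpy as np

    rng = np.random.default_rng(0)
    K = 100  # size of the generator's input vector

    def generate(z):
        # Placeholder so the sketch runs; a real G maps z to pixels.
        return np.tanh(z)

    # z vectors whose generated faces matched each concept (in practice,
    # an average over several such examples for stability).
    z_smiling_woman = rng.normal(size=K)
    z_neutral_woman = rng.normal(size=K)
    z_neutral_man = rng.normal(size=K)

    # The arithmetic happens on the input vectors, not the weights.
    z_smiling_man = z_smiling_woman - z_neutral_woman + z_neutral_man
    image = generate(z_smiling_man)  # decodes to a smiling man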

------
michael_h
Those faces are right in the bottom trough of the uncanny valley. Especially
the ones with the...teeth under the lips?

Buh.

