
CAN: Generating Art by Learning About Styles and Deviating from Style Norms - rimraf
https://arxiv.org/abs/1706.07068
======
onemoresoop
Art is the soul embedded in the medium, not the medium itself, so art could
be anything digital as long as it has some human touch. But is art produced
by "generating [...] by learning about styles and deviating from them" really
art? I think it is, but a new kind of art. Nature can produce art, but
algorithms are not nature. It is very interesting to observe where this is
going. So far, generated art has had its wow moments, but people quickly get
bored of it until a new formula is discovered, has its moment, and then fades.
This type of generated art, learned from existing styles, has a lot of
potential to change what we see as art. End of rant.

------
xrd
I am surprised there are not more "GANs-as-a-service" startups; I only know
of runwayml.com. GANs seem ripe for collaborations with artists, but they
remain inaccessible to artists without easier tools.

~~~
modulate_ai
What do you mean by GANs-as-a-service here? I'm not sure that our
understanding of GANs is refined enough to offer something like "a framework
for letting non-experts train a GAN to do cool things" outside of a very
limited scope: there are lots of issues with things like making sure that the
generator/discriminator converge and avoiding oscillations where one or the
other wins, which can unexpectedly ruin your training.
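
To make that concrete: the standard loop alternates discriminator and
generator updates, and either side running away breaks it. A minimal sketch
(illustrative PyTorch with toy MLP shapes, not anyone's actual code;
data_loader is assumed to yield batches of flattened images):

    import torch
    import torch.nn as nn

    # Toy G/D; real ones would be conv nets.
    G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                      nn.Linear(128, 784), nn.Tanh())
    D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2),
                      nn.Linear(128, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    bce = nn.BCEWithLogitsLoss()

    for real in data_loader:  # assumed: batches of shape (B, 784)
        # Discriminator step: push D(real) -> 1, D(fake) -> 0.
        z = torch.randn(real.size(0), 64)
        fake = G(z).detach()
        loss_d = (bce(D(real), torch.ones(real.size(0), 1)) +
                  bce(D(fake), torch.zeros(real.size(0), 1)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator step: push D(G(z)) -> 1.
        z = torch.randn(real.size(0), 64)
        loss_g = bce(D(G(z)), torch.ones(real.size(0), 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        # If D wins outright, G's gradients vanish; if G races ahead, D
        # chases it and the pair can oscillate instead of converging.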

On the other hand, if you mean "startups that use GANs to build cool systems,
such as manipulable latent spaces for interesting data," then I totally agree!
For example, we're using GANs to train voice conversion systems, which let you
manipulate a space of voice representations to create new voices that you can
then sound like. I definitely expect to see a lot of startups doing
interesting things like that popping up!
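
The "manipulate a latent space" part is just vector arithmetic once the
generator is trained. A generic sketch (numpy; not our actual voice model)
of interpolating between two latent codes:

    import numpy as np

    def slerp(t, z0, z1):
        # Spherical interpolation: stays near the Gaussian shell, unlike a
        # straight lerp, so the intermediate points still look "typical".
        cos_omega = np.dot(z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1))
        omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
        so = np.sin(omega)
        if so == 0:
            return (1.0 - t) * z0 + t * z1  # parallel vectors: plain lerp
        return (np.sin((1.0 - t) * omega) / so) * z0 + \
               (np.sin(t * omega) / so) * z1

    z_a, z_b = np.random.randn(512), np.random.randn(512)  # two "voices"
    path = [slerp(t, z_a, z_b) for t in np.linspace(0, 1, 8)]
    # Decoding each point with the generator walks smoothly from one
    # voice to the other.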

~~~
gwern
> there are lots of issues with things like making sure that the
> generator/discriminator converge and avoiding oscillations where one or the
> other wins, which can unexpectedly ruin your training.

I think the situation has gradually improved. I've been messing around with
GANs since Soumith first released his DCGAN code, and I've tried out DCGAN,
WGAN (in several variants), SAGAN, MSG-GAN, ProGAN, etc., and GANs
increasingly 'just work'. ProGAN especially: I've yet to see any of my ProGANs
diverge or show serious instability, and I've been doing things like
retraining the CelebA-HQ faces model with anime faces, and it's worked just
fine (better than any other GAN so far, actually). ProGAN is _slow_, but the
progressive training works wonders for stability and image-generation quality.
I haven't had to tweak the learning rate or mess with filters or any of the
other things necessary with earlier GANs.
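
The trick behind that stability is the fade-in: each new resolution gets
blended in gradually instead of switched on at once. Schematically
(illustrative PyTorch, not the actual ProGAN code):

    import torch
    import torch.nn.functional as F

    def fade_in(alpha, upsampled_low, new_high):
        # alpha ramps 0 -> 1 over the transition phase, so early on the
        # network leans on the already-trained low-res pathway and only
        # gradually shifts weight onto the freshly added high-res block.
        return (1 - alpha) * upsampled_low + alpha * new_high

    x_low = torch.randn(1, 3, 64, 64)     # output of the trained 64px stack
    x_high = torch.randn(1, 3, 128, 128)  # output of the new 128px block
    x = fade_in(0.3,
                F.interpolate(x_low, scale_factor=2, mode='nearest'),
                x_high)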

It might be expensive to offer as a service, because you are still talking
weeks of training on a GPU with >=10GB VRAM (so a 1080ti or P100 instance),
but I think you could offer it as a service where the artist merely uploads a
tarball of images and gets back a trained model and samples, without needing
much more expertise.

~~~
xrd
Can I ask, what are you using to play with these GANs? An online playground
like Colab from Google, or your own hardware? Are you using a shared
notebook, or building them up yourself? I glanced at your blog/site, so sorry
if I missed a description of your research playground already. Thank you for
the rich comment.

~~~
gwern
I didn't have a GPU when I began; I started with a borrowed GTX 980 I SSHed
into, but concluded that DCGAN/WGAN were inadequate to generate decent anime
faces, much less anime images in general.

So I took a very long break to work on creating a large anime image corpus (
[https://www.gwern.net/Danbooru2017](https://www.gwern.net/Danbooru2017) ),
with the idea that the rich tagging information could fix the GAN problems.
StackGAN impressed me a great deal: it generated high-quality images quickly
by feeding in a text embedding of the image description and then doing
progressive training, so I figured that Danbooru tags would be almost as good.
(I couldn't get StackGAN itself to work even though I had tags downloaded for
preliminary versions of Danbooru2017: it uses an old pre-generated set of text
embeddings, and I wasn't good enough at Python/TensorFlow to either fix the
original embedding code to generate a new set, or create a new embedding &
edit StackGAN to use that.)
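
The conditioning idea itself is simple; regenerating the embeddings was the
blocker. Schematically, a text-conditional generator just concatenates the
embedding with the noise vector (generic PyTorch sketch, not StackGAN's
actual code):

    import torch
    import torch.nn as nn

    class ConditionalG(nn.Module):
        def __init__(self, z_dim=100, embed_dim=128):
            super().__init__()
            # Noise + text/tag embedding in, image out, so samples can
            # specialize on the description.
            self.net = nn.Sequential(
                nn.Linear(z_dim + embed_dim, 256), nn.ReLU(),
                nn.Linear(256, 64 * 64 * 3), nn.Tanh())

        def forward(self, z, text_embed):
            h = torch.cat([z, text_embed], dim=1)
            return self.net(h).view(-1, 3, 64, 64)

    G = ConditionalG()
    z = torch.randn(4, 100)
    tags = torch.randn(4, 128)  # stand-in for real tag embeddings
    imgs = G(z, tags)           # (4, 3, 64, 64)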

Two months ago or so, I finished building my desktop with 2x1080ti, and I've
been using a spot P100 AWS instance as well to run 2 or 3 instances of
MSG-GAN/ProGAN/SAGAN/Glow simultaneously. The upshot there so far:

- MSG-GAN works well but simply requires too much RAM, since it doesn't use
progressive growing.

- SAGAN works nicely, but the self-attention mechanism means it has a hard
time scaling past 128px.

- Glow has very impressive results in OpenAI's paper, but the memory
consumption is enormous (reversible gradients or no), and the paper doesn't
make clear that it requires several GPU-months and is much slower than
ProGAN, so I never got beyond 'cool textures' before I began to wonder if I
had a bug and checked the paper/README much more carefully.

- ProGAN works well but is still relatively slow (1-2 GPU-weeks for anime
faces of decent quality from scratch, ~3 GPU-days if initializing from
CelebA-HQ for transfer learning). It can handle faces easily, but handling
anime images in general may require several GPU-months of training; I'm not
sure, since I only gave it about a month before moving on.

I post samples from various runs occasionally on my Twitter thread:
[https://twitter.com/gwern/status/1040323921961213957](https://twitter.com/gwern/status/1040323921961213957)

I mostly work in the terminal, no fancy notebook stuff. I should probably
write it all up, but you know how it is: it's a pain to reconstruct what you
were doing years ago and figure out which samples survive, etc.

