Hacker News

Thanks for all the upvotes! Since I made this site, people have already started to train on datasets beyond just real faces. Turns out it can disentangle pretty much any set of data.

Gwern has applied this to an anime dataset: https://twitter.com/gwern/status/1095131651246575616

Cyril at Google has applied it to artwork: https://twitter.com/kikko_fr/status/1094685986691399681

This was to raise awareness of what a talented group of researchers at Nvidia built over the course of two years: the latest state of the art for GANs. https://arxiv.org/pdf/1812.04948.pdf (https://github.com/NVlabs/stylegan)

Rani Horev wrote up a nice description of the architecture here. https://www.lyrn.ai/2018/12/26/a-style-based-generator-archi...

Feel free to experiment with the generations yourself at a colab I made https://colab.research.google.com/drive/1IC0g2oDQenrDmwbtkKo...

I'm currently working on a project to map BERT embeddings of text descriptions of the faces directly to the latent space embedding (which is just a 512 dimensional vector). The goal is to control the image generation with sentences, once the mapping network is trained. Will definitely post on hacker news again if that succeeds. The future is now!
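A minimal sketch of that text-to-latent mapping idea (everything here is illustrative: the weights are random, and the real mapping network would be trained on paired description embeddings and latent vectors):

```python
# Hypothetical sketch: a tiny MLP mapping a 768-d BERT sentence embedding to
# a 512-d StyleGAN latent. Weights are random here; the real mapping network
# would be trained on (description embedding, latent) pairs.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((768, 1024)) * 0.02  # illustrative initialization
b1 = np.zeros(1024)
W2 = rng.standard_normal((1024, 512)) * 0.02
b2 = np.zeros(512)

def text_to_latent(bert_embedding):
    """Map a 768-d sentence embedding to a 512-d latent code."""
    h = np.maximum(bert_embedding @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

z = text_to_latent(rng.standard_normal(768))
print(z.shape)  # (512,)
```

Once trained, the 512-d output would be fed straight into StyleGAN's generator in place of a randomly sampled latent.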

I love this part of the system req's (from the stylegan repo):

"One or more high-end NVIDIA GPUs with at least 11GB of DRAM. We recommend NVIDIA DGX-1 with 8 Tesla V100 GPUs."

I'm sure my wife will understand why I took out that second mortgage on our home...no problem.

Or use it in the cloud, much cheaper: https://cloud.google.com/nvidia/

They recommend 8 Tesla V100s, but would a 2080 Ti work? That's not too bad. I might try this at home on my gaming PC.

I like the fact that the server returns an image directly without any HTML or other content. It makes the loading experience fun too.

How many faces are generated? This can't be real time.

Thanks! Initially, I thought about just generating a big batch and cycling through it. Then I thought it would be more dramatic if the machine was "dreaming" up a face every 2 seconds in real time. I went for the dramatic approach just so I could phrase it that way to my non-tech friends!
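That direct-image serving pattern is easy to sketch with just the standard library (a hypothetical skeleton, not the site's actual code; `my_sampler` stands in for whatever function pulls a frame out of the GAN):

```python
# Hypothetical skeleton of the serving pattern described above: a background
# thread "dreams" a new image every 2 seconds, and every GET returns raw
# image/jpeg bytes with no HTML wrapper.
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

latest_jpeg = b"\xff\xd8\xff\xe0placeholder"  # stand-in for real GAN output

def regenerate_forever(generate, interval=2.0):
    """Replace the served image with a fresh generation every `interval` s."""
    global latest_jpeg
    while True:
        latest_jpeg = generate()
        time.sleep(interval)

class FaceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = latest_jpeg
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# To serve (my_sampler is hypothetical):
#   threading.Thread(target=regenerate_forever, args=(my_sampler,), daemon=True).start()
#   HTTPServer(("127.0.0.1", 8000), FaceHandler).serve_forever()
```

Because the response is bare image bytes with an `image/jpeg` content type, the browser renders it with no page chrome at all, which is exactly the loading experience described above.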

Are the old images discarded? It would be interesting to use these as references for hyper-realist drawings. You could have something completely authentic that could never be traced back to its source.

Oh wow, so it really is. Great work!

Google image search must be getting smashed by this. Is there a canonical EXIF tag, perhaps isBot?

Looks like someone else has already used another neural net to find our president in the latent space. https://github.com/Puzer/stylegan

That artwork demo has really piqued my interest regarding using techniques like these for creating animation more easily.

Love the work you guys are doing in the progressive GAN space. Last year I did something similar to make a face-aging network: I trained an encoder to get an initial guess of a latent vector for someone's face in the ProGAN space, then relied on BFGS optimization to fine-tune the latent vector, followed by further fine-tuning of some intermediate layers of the generator network to really match the input pixels. I also snuck an affine transform layer in there, allowing the network to shift the image around to better fit the target.

The results were... eh... okay, at least on my ugly face. https://twitter.com/RustBot/status/1044120159022022658

But overall, I'm still tweaking. In the meantime, I've been focusing on static image analysis for aging research, but I hope to find better encoding schemes down the road.
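The BFGS latent-search step described above can be sketched with a toy stand-in generator (dimensions shrunk from 512 so it runs in seconds; the real generator is a ProGAN/StyleGAN network, and an encoder would supply the initial guess):

```python
# Toy sketch of BFGS fine-tuning a latent vector to match target pixels.
# The real generator is a ProGAN/StyleGAN network and latents are 512-d;
# here a fixed random linear map + tanh stands in, with shrunken dimensions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
G = rng.standard_normal((32, 16))  # stand-in "generator" weights

def generate(z):
    """Map a latent code to fake 'pixels'."""
    return np.tanh(z @ G)

target = generate(rng.standard_normal(32))  # the face we want to reconstruct

def pixel_loss(z):
    """Squared pixel error between the generated image and the target."""
    diff = generate(z) - target
    return float(diff @ diff)

z0 = np.zeros(32)  # in the real setup, an encoder supplies this initial guess
result = minimize(pixel_loss, z0, method="BFGS")
print(pixel_loss(z0), "->", result.fun)  # the loss should drop
```

With no analytic gradient supplied, SciPy's BFGS falls back to numerical differentiation, which is why starting from a decent encoder guess matters so much at real latent dimensionality.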


Cool page, and great job.

> Turns out it can disentangle pretty much any set of data.

All the examples I have seen (including your links) are variants of face-generation algorithms. Any ideas on how this could be useful beyond image generation in some style? Specifically for (data) science?

Sorry if this is a naive question.

Edit: By "variants of face generation algorithms" I mean any image generation really.

The original Karras et al 2018 paper did both cars and cats, which aren't faces. Worked very well, unsurprisingly. (ProGAN also did well on those, though it was the faces everyone paid attention to.) Look at the samples in the paper or the Google Drive dumps, or at the interpolation videos people have posted on Twitter.

Aside from the original work, on Twitter people have done Gothic cathedrals very well, graffiti very well, fonts very well, and WikiArt oil portraits not so well. On Danbooru2017 full anime images (linked in my thread), one person has... suggestive blobs, but has only put 2-3 GPU-days into it, and we aren't expecting much so early into training. skylion has been running StyleGAN on a whole-body anime character dataset he has, and the results overnight (on 4 Titans) are pretty impressive, but he hasn't shared anything publicly yet.

Great job on the Danbooru training! I've been following you on twitter and machinelearning for the longest time haha

Thanks! The wait on training is killing me, though. I've been doing large minibatch training to try to fix the remaining issues in the anime face StyleGAN and it's frustrating having to wait days to see clear improvement. Checking GAN samples is so addictive and undermines my ability to focus & get anything else done. I'm also eager to get started on full Danbooru image training, which I intend to initialize from skylion's model - whenever that finishes training...

(Who says we aren't compute-limited these days?!)

Haha, having to work around computation limits is welcome! It feels like building web apps back in the late '90s again. These days we have so much memory and disk space at hand it doesn't even feel like a challenge anymore.

That is, until Graphcore delivers their IPU.

I forgot one failure case: a few hundred/thousand 128px pixel art Pokemon sprites. StyleGAN seems to just weakly memorize them and the interpolations are jerky garbage, indicating overfitting. (No GAN has worked well on it and IMO the dataset is too small & abstract to be usable.)

no, not naive at all. this method isn't specific to extracting features from faces. it can disentangle features from any kind of image. in fact, the next dataset i might train on is flowers (or birds)


OK, my point is what could be done beyond generating images in some style? Can we generate interesting mock data given a database for instance (of course this is exactly what you did in a way, but I have in mind e.g. a database containing some numerical/categorical features known to a specific accuracy)?

You can use GANs to generate fake data based on stuff like particle accelerator data or electronic health records. Whether you can use StyleGAN specifically is unclear. What's the equivalent of progressive growing on tabular/numeric data? Or style transfer?

Could be used to generate building plans or other schematics (though probably of no practical use). Could certainly be put to good use generating pornographic images.

Hey, you might want to consider bert-as-service[0] for deep feature extraction from a BERT model. It will give you a 768 dimensional representation of the description, then you can embed that in the 512-dim latent space? I've been thinking of something similar.

It's not that hard to do it yourself, but it's a really clean package, and it gives you nice CLI flags for most things like pooling strategy, and what layer you want to get the activations from.

[0] https://github.com/hanxiao/bert-as-service

Some enterprising developer could use images from Tinder (or better, OkCupid) tagged with data coming from the individuals' profiles, then interpolate based on abstract factors such as risk taking, gender bias, etc. Well... you get the picture.

I think this is a very dangerous game we are playing here but I guess it is going to be done.

@lucidrains - this is pretty amazing. Every time you refresh the page, is it a real-time generation, or does it draw from a pool/DB of previously generated images? I'm asking because I got the exact same image twice, which dampened the "cool" factor just a notch.

So does this make it trivial to input a source portrait, and then visualize different hair styles?

you'll have to build an encoder to encode someone's face into the latent space. then you'll have to dive into the latent space and find the dimension(s) that control hair style (just fork the colab and start experimenting with interpolations)

then yes, it should be possible
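The interpolation experiment suggested above is only a few lines once you have latent vectors. A sketch (the attribute "direction" for something like hair style is hypothetical here; in practice you find it by probing the space, e.g. averaging latents that share the trait):

```python
# Sketch of interpolating in StyleGAN's 512-d latent space. The attribute
# direction is hypothetical; finding a real one takes probing the space.
import numpy as np

def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent codes, t in [0, 1]."""
    return (1.0 - t) * z_a + t * z_b

def move_along(z, direction, strength):
    """Shift a latent code along a unit-normalized attribute direction."""
    d = direction / np.linalg.norm(direction)
    return z + strength * d

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(512), rng.standard_normal(512)

# Eight frames of a morph; each would be fed to the generator to render.
frames = [lerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 8)]
```

Rendering each frame through the generator gives the smooth morph videos seen on Twitter; pushing a single latent along a found direction is how you'd hold the face fixed and vary just the hair.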

What license are the generated images under? Could you release them under creative commons?

They're probably public domain [1,2]. Generally in the USA and Europe, you can't copyright computer generated images or images created by nonhuman entities.

"To qualify as a work of 'authorship' a work must be created by a human being": https://www.copyright.gov/comp3/chap300/ch300-copyrightable-... [PDF], see section 313.2 "Works that lack human authorship"

Monkey selfie case: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

From the Wiki article of the "Monkey Selfie Case"

>>> On 23 April, the court issued its ruling in favor of Slater, finding that animals have no legal authority to hold copyright claims [1]

[1] https://petapixel.com/2018/04/24/photographer-wins-monkey-se...

I feel this is wrong. If I make random generative art, do I instantly lose copyright? What about Photoshop? Really, I'm asking myself where the line is drawn.

You don’t “lose” copyright, you never had it in the first place.

Copyright is (read the law!) a temporary monopoly granted for works meeting certain criteria, creativity being one of them. You’d hold copyright for the code you wrote to generate the “art”. If you download somebody else’s code (as this site uses Nvidia’s), you lack the creative element.

You can, though, own the algorithm you used to generate the art, as in the case of Fractal Flame [0], created by Scott Draves.

[0] https://en.wikipedia.org/wiki/Fractal_flame
