Generating custom photo-realistic faces using AI (insightdatascience.com)
569 points by homarp 6 months ago | 116 comments



Very impressive work!

On a side note, the section "A word about ethics" was a welcome addition. I found this:

> If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see.

resonated with another article posted on HN (https://news.ycombinator.com/item?id=18309305) talking in part about the ethical impact of engineered addiction.

I guess very soon we will be able to generate "super-attractive" (as in "superstimuli") faces for virtual personas, according to targeted demographics and purpose (advertisement, youtube videos for kids, political messages and so on).

Rather disconcerting...


Lately, I find myself wondering if the Great Filter that resolves the Fermi paradox is the point where a species has enough technology to completely super-stimulate themselves to oblivion.


Fermi's paradox doesn't need a resolution besides the vastness of space. The concept of such logical extrapolations from a sample size of 1 irritates me to my core.

Devil's advocate: I doubt that. "Super-stimulating" ourselves into oblivion requires a level of willful complacence we manage to avoid, at least on the whole as a species. Otherwise things like vegan diets wouldn't be fads, because no one would willfully choose crappier food options for the sake of abstract reasons like "ethics" or "morality" that have zero impact on our daily lives.


Not going against your general argument and I don't want to derail the conversation, but saying veganism is a fad is perhaps not a good example to give.

There is solid evidence against vegan diets being a fad, unless you regard a >3% yearly sales growth of vegan-labeled food or a roughly 600% increase in Google searches since 2004 as a fad[1][2], in addition to the roughly 500m-1b people who are on a mostly plant-based diet for cultural or practical reasons[3]. I'd wager that only a small percentage of people are in it solely for ethical or moral reasons.

[1] https://www.statista.com/statistics/562911/global-sales-grow... [2] https://trends.google.com/trends/explore?date=all&q=%2Fm%2F0... [3] https://foodrevolution.org/blog/vegan-statistics-global/


In addition, there is good evidence for vegan diets having a lower impact on the environment [1] and some evidence for health benefits [2]. For many vegans, their diet is not motivated solely by ethical concerns. So it's really not a good example for parent's parent :)

[1] https://josephpoore.com/Science%20360%206392%20987%20-%20Acc... [2] https://www.choosemyplate.gov/vegetables-nutrients-health


It also ignores the reality that many of us _do not like_ to be super stimulated by vapid and shallow experiences. I find there also tends to be a correlation between a person's drive and intelligence and their lack of eagerness to engage in things that are shallow grabs for stimulation.


> It also ignores the reality that many of us _do not like_ to be super stimulated by vapid and shallow experiences.

Everyone likes to be super stimulated. That's what the phrase means. It's tautological. You may just like to be super stimulated by different things, but at the end of the day, it's just dopamine in your Ventral Tegmental Area.


> Everyone likes to be super stimulated. >> by vapid and shallow experiences

GP does not deny being super stimulatable. Just not by a beautiful face. Once AI reaches the level where it can be an intellectual mentor, people like GP may be sucked into infinite pointless learning just for the kick of it.

It’s already happening without AI (I read HN for mostly pointless stimulation; there is much more fascinating knowledge on the internet than you can understand in your lifetime)


I'm imagining a fact bot that shows you a really mind-blowing fact in exchange for $0.25 at the press of a button.

The first fact is free. That's how confident they are that you'll become a paying member.


There is such a thing as overstimulation. I'm rather sensitive to stimulation and become uncomfortable quite quickly around bright lights, strong smells, loud films, large groups of people, etc. The flush of dopamine causes too much stimulation and we sensitive souls become too aroused; our thoughts become too loud and overlap each other, and we become confused and disoriented and function poorly. It's quite uncomfortable.


Interesting. I have found that there is a correlation with drive, but not intelligence.

I know many brilliant people with zero drive who hedonistically seek out the greatest stimuli they can find. When I talk with them about this, they say it is partly out of a feeling of depression/helplessness alongside feeling like you can’t actually change anything in the world so you might as well just enjoy the ride.

I’ve found driven people have more impulse control regardless of “intelligence” (however we may choose to measure this).


Correlation is not causation


Right, this is a great point. Future "mates" would be entirely generated, tailored exactly to one's neural patterns in every dimension, and then projected onto some morphing android or some future VR tech that would render other human contact utterly boring and unappealing. The entire experience of the virtual world would be so sugar-coated in every aspect that we become completely detached from and uninterested in the real one, at least en masse, essentially living inside our own minds' fantasy while AI moves on without us, and rightly so.


So, if everyone gets that, and dies happy, and we leave the earth to the stewardship of the AI and robots... is that the worst possibility? I mean, it sounds dystopian for humans, but it isn't the worst dystopia for humans. I sure prefer that to nuclear holocaust.


I don't know if it's a bad future, but it would limit our potential to our current biological evolution's zeitgeist. An analogy would be giving kids all the candy they want: it isn't good for them in the long run. We are still fairly simple, relying on our mammalian brain to drive our internal desires. The neural lace idea, for example, could potentially direct our motivations toward a higher cause and purpose than we can imagine now with our current limitations. Focusing on expanding our understanding is probably more noble but less fun.


I think the current context is the Fermi Paradox, so the relevant question is whether the AIs have any expansionist behaviour after all the biological life dies.

Imagine a version of Star Trek where it turns out that every planet is Risa, only Vulcan is Space Buddhist Risa, Qo’noS is Kinky Risa, Kzin is Furry Risa etc., and all the Vulcans, Klingons, and Kzinti etc. have just been extinct for thousands of years leaving bots behind.


I think many of us would prefer that to normal mundane life.


It does have the pleasing effect that if the VR is convincing enough for us, it would alleviate all material scarcity, which would arguably allow many people to reach greater potential. I.e., sculptors can learn to sculpt without wasting marble, and woodworkers don't have to cut down trees or buy machinery. If the simulation is _that good_ then maybe it's a great equalizer. We may not arrive at true post-scarcity, but virtual post-scarcity might be a literal next best thing.


Notice that such a hedonistic trap has existed before, and still exists nowadays. Wise men have always been aware of the risks related to endless and effortless satisfaction. Even where there was no obvious adverse effect, they pointed out the lack of purpose and meaning in such a way of living. To a degree, they considered that human life is only worth living if it requires overcoming some sort of adversity.

There is no doubt that technology will revive the hedonistic trap of earlier times and lead to some degree of decadence, but the stoic mindset should survive and prevent mankind from falling completely.


Have you thought about this: dinosaurs ruled this planet for roughly 200 million years. They didn’t even get close to leaving it...


I think painters have been trying to paint super-attractive people for several hundred years now, but the most beautiful drawn/painted people are not dramatically more attractive than the most beautiful real-life photo models. It seems that there is an upper limit on how attractive faces get, and we have already hit it without using advanced machine learning.


I’m not sure paintings are a good reference point. There’s an enormous difference between a static image and a moving, talking person with personality, charisma and apparent agency. Those are the spellbinding aspects of a person’s appearance.


Imagine having Tinder's dataset for your target audience and, for each potential customer, generating advertisement with faces designed in order to attract that particular person. That's a disturbing thought.


I don't think people have such different tastes in physical attractiveness that this would make any difference compared to today's use of models for advertising (unfortunately). You could target by ethnic group, but beyond that...


There is probably a feedback effect here. It is possible that if systems like this were commonplace people's individual tastes might begin to diverge, which ironically could be beneficial for the human race.


That's not even far-fetched: the ever-expanding possibilities of targeting individuals through the big data evolution will clearly lead to some nightmare like this. My hope is that capitalism will be rendered obsolete through cultural and technological progress before we arrive there. Stuff like this always reminds me of YTCracker's cyberpunk album "Introducing Neals"

"...big data and your love live after this..."

https://ytcracker.bandcamp.com/album/introducing-neals


The problem is that technical progress is much faster than cultural progress. Human nature is basically the same as it was 2000 years ago: people in power want more power. And while the totalitarianism of the early 20th century eventually fell, totalitarianism powered by constant surveillance, Big Data, and machine learning has the potential to be so effective that we won't be able to escape it. Some elements of that are already implemented in China (think: Great Firewall).


>a disturbing thought

You misspelled "profitable"


> I guess very soon we will be able to generate "super-attractive" (as in "superstimuli") faces for virtual personas

We have done that already, except it was not using AI. I would classify most anime under that label of unrealistic super-attractive images.

About the ethical impact, Akihabara seems to be one end result of this. So it would be the same but on a larger scale.


I would be worried about this getting used to generate fake faces for social media profiles etc.


OK, I am not well-versed in machine learning, so I need to ask:

How is it really different, in an essential way, from using Eigenfaces[1] for the same task?

There's a paper from year 2000 that does exactly that[2]; you can see the results on page 5.

Now, the results in the linked article are way more impressive than the Eigenfaces-generated random faces, but from skimming the text, it seems like the principles are the same: dimensionality reduction from a high-dimensional space into a space where the axes are "meaningful" components (in Eigenfaces, the reduction is done using SVD).
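To make the Eigenfaces side of the comparison concrete, here is a minimal NumPy sketch. It uses random arrays as a stand-in for real face images, and the names (`fit_eigenfaces`, `sample_face`, `n_components`) are made up for illustration; it is not the code from the paper in [2].

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_samples, n_pixels) array of flattened images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal directions ("eigenfaces") in pixel space.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Standard deviation of the data along each kept direction.
    scales = singular_values[:n_components] / np.sqrt(len(faces) - 1)
    return mean_face, components, scales

def sample_face(mean_face, components, scales, rng):
    # Draw random coefficients in the low-dimensional space, then map back linearly.
    coeffs = rng.normal(size=len(scales)) * scales
    return mean_face + coeffs @ components

rng = np.random.default_rng(0)
faces = rng.random((200, 32 * 32))   # stand-in for 200 flattened 32x32 face images
mean_face, components, scales = fit_eigenfaces(faces, n_components=16)
new_face = sample_face(mean_face, components, scales, rng)
```

The whole mapping from coefficients to pixels is a single matrix product plus the mean, which is the linearity the question is about.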

Eigenfaces is a linear approach; there's also a multilinear version, TensorFaces by Vasilescu et al[3][4]. I wonder if similar quality images can be obtained using this approach.

[1]https://en.wikipedia.org/wiki/Eigenface

[2]https://link.springer.com/content/pdf/10.3758/BF03207802.pdf

[3]http://alumni.media.mit.edu/~maov/mica/mica05.pdf

[4]http://web.cs.ucla.edu/~dt/papers/eccv02/eccv02.pdf

[5]https://en.wikipedia.org/wiki/Convex_set


As you said, the result is way more impressive. That is the main theme when you compare deep learning based approaches to traditional ML approaches, especially on unstructured data (images, speech, text).

Also, deep learning based approaches don't require too much fiddling with features. Main work involves coming up with NN architecture and loss function.


In both cases, there's a low-dimensional latent space, and there's a pixel space of images. In the case of Eigenfaces, the correspondence between the two is linear. That's not a bad approximation, but "make him smile" or "make him bald" is obviously a very nonlinear operation in pixel space. Eigenfaces approximates it with something linear, and that's just not very good. The justification for all the added nonlinearity in the mapping is that this operation really becomes linear in the latent space.


Deep learning will do eigenfaces of eigenfaces of... it's eigenfaces all the way down. Actually it's not all the way down, there are only a few hidden layers.


You are right that they are very closely related.

For variational autoencoders (one baseline technique for this sort of thing) if you make your neural network have one layer without a nonlinearity, and train it, it ends up minimizing the same objective as PCA (i.e. finding the eigenfaces).
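A quick numerical check of the simpler, non-variational version of this claim. The sketch below assumes a tied-weight linear encoder/decoder with orthonormal columns and synthetic data; the helper name `reconstruction_error` is made up. It compares the PCA subspace against a random subspace of the same size.

```python
import numpy as np

def reconstruction_error(x, basis):
    """Squared error of encoding x with basis (d, k) and decoding with its transpose."""
    codes = x @ basis            # linear "encoder"
    recon = codes @ basis.T      # linear "decoder" with tied weights
    return float(np.sum((x - recon) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))  # synthetic correlated data
x = x - x.mean(axis=0)
k = 5

_, s, vt = np.linalg.svd(x, full_matrices=False)
pca_basis = vt[:k].T                                       # top-k principal directions
random_basis, _ = np.linalg.qr(rng.normal(size=(20, k)))   # random orthonormal basis

pca_error = reconstruction_error(x, pca_basis)
random_error = reconstruction_error(x, random_basis)
discarded_energy = float(np.sum(s[k:] ** 2))
```

The PCA error matches the sum of the discarded squared singular values, and no other k-dimensional linear code can do better on squared error, so a linear autoencoder trained on that loss ends up spanning the same subspace.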

I believe this is also true of GANs where you similarly restrict the generator and discriminator to be very simple.

I bet there is a nonlinear non-NN approach that could perform well, but we may not have the investment in hardware, well-optimized algorithms, etc to train big models fast.

edit: here's a paper that connects GAN to PCA in a simple case, among many other things. not the easiest to follow, though.

https://arxiv.org/pdf/1710.10793.pdf


Compare the images used for eigenfaces to those that GANs synthesize: different angles, facial expressions, glasses vs no glasses, etc.


I have to wonder - is anyone working on using generative AI to create 3d models for games? Presumably they are but I'm surprised I've not seen more about it because that's a huge use case for this sort of tech. I only tinker with game development for fun but I can see the value in, e.g., having an app that can generate various kinds of models using these feature sliders. Combine that with style transfer to copy an aesthetic from one model to another and that could cut down on the asset creation time for video games considerably.


And movies. Every Pixar movie uses variants of the same faces which aren't really all that variable in their models so it really does seem like a character playing different roles. Would love to see this sort of work combined with the character pipeline such that not only could you generate a new 'face' you could generate the full expression pipeline as well so that the character can be brought to life.



Creative directors and other Artists in video games (my experience) tend to like to 'stamp' the visuals of a game with a certain aesthetic, from their own inspiration. Generating 3D models (offline, not in game) would be a great start, but in the end it's only a 'rough' draft and will likely go through the same amount of work to get to a final 'approved' visual state.

Unlikely in the near future, highly probable in the longer term. Imagine writing a script (movie) and having the AI generate the characters ... storytelling to a new level.


Regarding specifically the creation of 3D models for usage in games, this is particularly relevant:

Avatar Digitization From a Single Image For Real-Time Rendering

http://www.hao-li.com/publications/papers/siggraphAsia2017AD...

(from the people at Pinscreen)

There's a lot of work being done on using deep learning for computer graphics in general as well.


> I have to wonder - is anyone working on using generative AI to create 3d models for games?

It's not AI, but MakeHuman[1] has parametric features for human meshes[2], and you can hit "Random". It also has sliders for age and gender parameters. I last used it a while ago, I'm sure I'm underselling it.

1. http://www.makehumancommunity.org/

2. https://www.youtube.com/watch?v=iDhb-6FcqeU


Daz Studio uses the same approach with better quality: https://www.daz3d.com/gallery/

This approach doesn't work very well because simple linear morphs don't account for all the correlations and multidimensional variations of an actual human body shape, resulting in similar or unrealistic body shapes. It's like nudging a face in a photo editor. Different characters in Daz Studio are still hand-crafted in a separate 3D modeling software, and just adjusted with the morphs.


I have heard of video game companies experimenting with GANs for texture synthesis. I don’t see why you couldn’t also try to do some simple 3D modeling, although that may be a much harder problem.


I have no specific info on this, but I can only imagine a lot of work being done here, none of which will be made public until it is close to release and the marketing benefits of showing it outweigh tipping off their competitors.


I think it's a bit misleading to claim that AI is "generating" these images, as if from whole cloth.

Images of real people have been scored along the adjustable metrics, and then as you click the +/- adjustments, the source images are very craftily "blended" to produce a finite spectrum of results.

Yes, no output image is itself a "real" human (although I wonder if you tweak the settings just right to exactly match one of the input images if it would just spit it back out?).

It does not appear to me that we are seeing an AI that has learned what humans look like and can now generate arbitrary fictional humans. The magic trick is exposed by lossolo's comment below. [1]

EDIT: Apparently I'm just agreeing with what femto113 said 3 hours ago down-thread [2]

[1] - https://news.ycombinator.com/item?id=18310461

[2] - https://news.ycombinator.com/item?id=18311377


In the recent DeepMind paper on BigGAN, they discuss this point and compare their generated images with the most similar images in the training set. They certainly make it seem like the network has learned something real and can generate new images that are not just blends of previous ones.


Yeah, the progress is scary. At some point it will be possible to have AI generate highly addictive, binge-worthy TV shows in real time.

Imagine if you could watch Breaking Bad season 9, for example: Walter White Jr. breaks bad.


This is an incorrect description of what the algorithm is doing.

Two networks--a "generator" and a "discriminator"--play a minimax game where the generator maps random vectors into images, and the discriminator attempts to distinguish between these images and real images of celebrities. When the discriminator is (close to) optimal, image-gradients based on the features it uses to predict are passed to the generator, so that the generator can learn the distribution in feature space that makes up human faces.
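For reference, the minimax game is usually written in the textbook form from the original GAN paper (Goodfellow et al., 2014), with G the generator, D the discriminator, and z the random input vector. The model in the article may well use a different loss variant, so treat this as the standard formulation rather than the exact objective used there:

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The generator only appears inside D(G(z)), which is why everything it learns about real images reaches it through the discriminator's gradients.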

> Images of real people have been scored along the adjustable metrics, and then as you click the +/- adjustments, the source images are very craftily "blended" to produce a finite spectrum of results.

There is no reason to believe the source images are actually embedded in the latent space.

The descriptors that you can edit in the gui don't necessarily span an orthogonal basis in that latent space, so some of them are correlated, which is why editing one value can change others. Additionally, there is no a priori reason to believe that the manifold of "human face-like images" of 628x1024 is 512-dimensional, so there are areas of the space that still don't map well to real images. The network's ability to cover this space is limited by the number of unique training images it sees, how long it is trained, and its architecture.


> The descriptors that you can edit in the gui don't necessarily span an orthogonal basis in that latent space, so some of them are correlated, which is why editing one value can change others.

I think both you and the author of the article are making the same mistake here. (Although at least you use "orthogonal" and "correlated," whereas the author calls nonorthogonal vectors "entangled" for some reason.)

If you have a nonlinear function f on a vector space, there's no reason why an orthogonal basis for that space will give a better parameterization than a nonorthogonal basis. Even if you have a linear function, there's no reason why that should make a difference.

(For example, take f(x,y) = (x-y,y). Then f(x,0)=(x,0) and f(y,y)=(0,y), so "correlated" input directions (1,0) and (1,1) are mapped to "independent" or orthogonal outputs.)
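The example is easy to check numerically. A small sketch (the variable names are just for illustration):

```python
import numpy as np

def f(v):
    """The linear map f(x, y) = (x - y, y) from the example above."""
    x, y = v
    return np.array([x - y, y])

u = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])

input_dot = float(u @ w)         # nonzero: the input directions are not orthogonal
output_dot = float(f(u) @ f(w))  # zero: their images are orthogonal
```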

I think it is a bit of a mystery why Gram-Schmidt orthogonalization makes a difference here. Perhaps the author should experiment more with different inner products.


>I think both you and the author of the article are making the same mistake here.

Maybe?

>If you have a nonlinear function f on a vector space, there's no reason why an orthogonal basis for that space will give a better parameterization than a nonorthogonal basis.

I don't think I made that claim. Here's all I'm saying: To whatever degree the features of interest are linearized in the latent space (and there's really no guarantee that they are), we don't have any guarantee that those linear features are orthogonal to one another, so tuning the latent representation along one feature will also impact others.

> (For example, take f(x,y) = (x-y,y). Then f(x,0)=(x,0) and f(y,y)=(0,y), so "correlated" input directions (1,0) and (1,1) are mapped to "independent" or orthogonal outputs.)

That's true, but remember that the nonlinear mapping is from our latent space (spanned by uniformly random 512-element input vectors) to pixel space. We really don't care about linear algebra in pixel space. I have zero expectation that we would preserve orthogonality from latent to pixel space.

I don't think any part of the GAN objective requires that these interesting features actually be linearized in the latent space (obviously they are not in pixel space), but the approach is to use a GLM to find the latent vectors that best fit the features anyway. Whether or not the vectors you identify with the GLM really retain their semantic meaning through the latent space, they're also clearly not orthogonal, so changing the latent representation along one dimension also changes others.
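A rough sketch of what "use a GLM to find the latent vectors" can look like, using ordinary least squares (the simplest GLM: Gaussian family, identity link). Everything here is synthetic and assumed for illustration: the scores are generated from a hidden direction rather than from real labeled images, and I don't know which GLM the article actually fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim = 2000, 512

# Synthetic setup: pretend an attribute score (say "smile") depends linearly
# on a hidden direction in latent space, plus noise. Real scores would come
# from a classifier or human labels on the generated images.
latents = rng.normal(size=(n_samples, latent_dim))
true_direction = rng.normal(size=latent_dim)
true_direction /= np.linalg.norm(true_direction)
scores = latents @ true_direction + 0.1 * rng.normal(size=n_samples)

# Least-squares fit of score ~ latents; the weight vector is the feature direction.
weights, *_ = np.linalg.lstsq(latents, scores, rcond=None)
direction = weights / np.linalg.norm(weights)

# Editing a face then means nudging its latent vector along the fitted direction.
z = rng.normal(size=latent_dim)
z_edited = z + 2.0 * direction

cosine = float(direction @ true_direction)  # how well the fit recovers the hidden direction
```

Fitting a second attribute the same way gives a second direction, and nothing in the fit forces the two to be orthogonal.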


That's kind of an absurd statement. It's not possible in this universe at all for a human or AI to generate an image of a face without any previous impressions of one.


If it had any "impression" of what a face was, you would never see the results that occurred in [1]. My point is the algorithm has no impression of what a "face" is at all. It seems like this was given prototype pictures with scoring attributes, and mapped those into some sort of gradient, completely independent of any notion of "face".


The generator never sees a single image. But it receives gradients from the discriminator teaching it to make better fakes. The discriminator here acts like a 'learned loss function' for the generator.


> It does not appear to me that we are seeing an AI that has learned what humans look like and can now generate arbitrary fictional humans

Here, take a look at this video, one hour long of high-res generated faces. Can you still say AI hasn't learned to generate arbitrary identities?

https://www.youtube.com/watch?v=36lE9tV9vm0&t=262s



Very very rad!

I've got a similar algorithm https://bit.ly/2ELVG50

To use, upload a photo with the #ageme command. This is beta though, so will take a while to return. The other thing this bot does (the #showage command) runs instantly.

Example output: https://bit.ly/2PXKTWu

Your model seems to produce much higher quality output. I think I paid a penalty by putting the user's face into latent space, which is a difficult optimization to perform.


(It's not mine, I just found the link in the article)


It's very interesting and kinda creepy. The controls behave randomly, if you decrease "age" it might start changing skin color, etc.


Actually, you can click on the name of the attribute to "lock" it. That way you can change one without changing the other!
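One way such a lock could work is a single Gram-Schmidt step: subtract from the edit direction whatever component it shares with the locked attribute's direction. This is a guess at the mechanism, not taken from the demo's source, and the directions below are random stand-ins with made-up names.

```python
import numpy as np

def lock(edit_direction, locked_direction):
    """Remove from edit_direction its component along locked_direction."""
    unit_locked = locked_direction / np.linalg.norm(locked_direction)
    return edit_direction - (edit_direction @ unit_locked) * unit_locked

rng = np.random.default_rng(0)
age = rng.normal(size=512)
skin_tone = 0.5 * age + rng.normal(size=512)   # deliberately correlated with age

age_locked = lock(age, skin_tone)              # "age" edit with "skin tone" locked
overlap_before = float(age @ skin_tone)
overlap_after = float(age_locked @ skin_tone)
```

This only makes the two directions orthogonal in latent space; since the generator is nonlinear, it does not guarantee the locked attribute stays fixed in the rendered image.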


Excellent


Oh yeah! I wrote a similar algorithm and constantly suffered similar problems. Constraining the latent space can be a major pain. It's hard to visualize, but I think a lot of latent features may exist in non-continuous latent "pockets", meaning that the desired direction of the latent vector depends on your position in the space.


I'm having a blast changing all the parameters. Real uncanny valley feelings. Some women come out very attractive, looking like Jessica Alba. How can I react to the face of a person who doesn't even exist?

On second thought, I don't mean that the controls behave randomly. I understand they affect related parts of the system. If you increase "baldness" on a woman's face, it will obviously increase the "male" factor and the "gray hair" factor. I understand that these faces are being generated from a continuous space. Fascinating.


Looking like Jessica Alba is not a coincidence. This model was trained on CelebA, a dataset of Hollywood celebrity face photos. Jessica Alba is almost certainly in the training set.


It would be cool to integrate a face tracker to measure features, then you could learn the latent vectors for things like eye position and then you could drag the eyes around.


HN effect in full effect


Well, I didn't know changing skin tone would melt your lip, make you bald and change your gender.

https://imgur.com/a/vc5dz0L

AI magic.


I think "generated" may be too strong a word for what this system seems to be doing. It feels more like it has found a bunch of averages among a large enough set of similar pictures of some fairly homogenous celebrities. When enough of those averages are put on a gradient along a particular axis you get transformations that seem magically smooth. In the example you gave I think it didn't have enough source photos along that axis. I'll further guess that this works best for celebrities that are generally considered attractive because attractive faces are already averages.

[1] https://www.ncbi.nlm.nih.gov/pubmed/15376799 "Images of faces manipulated to make their shapes closer to the average are perceived as more attractive."


The point is that the image has been transformed into a latent space encoding, and interpolating those latent-space variables creates those mind-blowing effects (imagine just moving from one value to another via a simple convex combination). How those latent variables are constructed and what they represent (or whether they can be described easily) is completely up to a non-linear optimization process running over a huge number of dimensions.
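The convex combination is about as simple as it sounds. A minimal sketch, with random vectors standing in for the latent codes of two faces and a made-up helper name `interpolate`:

```python
import numpy as np

def interpolate(z_start, z_end, n_steps):
    """Convex combinations (1 - t) * z_start + t * z_end for t in [0, 1]."""
    ts = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - ts) * z_start + ts * z_end

rng = np.random.default_rng(0)
z_a = rng.normal(size=512)   # latent code of face A (random stand-in)
z_b = rng.normal(size=512)   # latent code of face B (random stand-in)
path = interpolate(z_a, z_b, n_steps=8)
# Each row of `path` would be fed to the trained generator to render one frame.
```

All the smoothness in the resulting morph comes from the generator; the latent-space side is just a straight line.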


I don't understand this as the image being transformed. The "space" and "dimensions" seem to originate not with the visual features of the images but with the attributes that the CelebA image set is annotated with. The coverage of the set is not uniform with respect to gender and skin color. I'm not 100% sure how to interpret the attribute values, but some quick math shows that images with "Pale_Skin" outnumber their opposites by about 20 to 1.


This is one of the biggest challenges with AI in my opinion. The models can generate the transformations but they have no concept of correctness when applied to vague generative tasks like creating a face based on a set of existing photos.

Basically, there's just one level of cognition. In this case, the AI would only achieve that expected fidelity if the system is layered with more and more models that aim for correctness and accuracy (does this look like a woman, does this look like a mouth, does this look like a nose, etc). The problem with this approach is that it becomes incredibly hard to determine what's needed to be 100% successful at a complex task.

This is the reason why I think we are still far far away from a fully cognitive AI and is the same reason why you only see AI used for very narrow use cases.

Self-driving cars seem to be the first real attempt to have a broad AI system applied to a super-complex and unpredictable field, but I always see conflicting information regarding the progress and challenges in this area.


I think you could feed the output of the GAN into yet another network that assesses the quality of the generated image and automatically tweaks the parameters a bit until it doesn't look like an alien.

In fact, that network is probably already part of the original GAN training phase.


Ultimately, this algorithm tries to find a single direction vector in a 512-dimensional space that approximates what it means to change skin color.

Expecting that this works all the time (or expecting that all points in this 512-dim space result in a beautiful person) is probably a bit too much to ask. :-)


The training set probably consisted of black and white celebs, without representation of latino or south asian celebs to fill in the gradient in between.


Can't wait until such latent-space variable changes will be outlawed...


This is gold.


beauty alchemy?


I appreciate the author acknowledging -- if belatedly -- the inbuilt essential bias of the training set,

> Training on photos of celebrities of Hollywood means that our model will be very good at generating photos of a predominantly white and attractive demographic... If we were to deploy this as an application, we would want to make sure that we have augmented our initial dataset to take into account the diversity of our users.

Could this system generate credible faces of people of color? If it has a "gender" axis, could it have a "melanin" adjustment axis? Or various ethnic axes?


Yes, iff it's given relevant training data that includes a wider mix of ethnicities.


Yes


Interesting that the top-level layers of that network correlate with what is essentially gene expression. Come to think of it, it seems feasible to train a network with the whole genome as input and 3D-scanned morphology as output. Again, the top-level weights in the network would end up encoding the salient gene variants that cause morphological differences.

High cost to generate the data, but the first thing I’d do if I was setting up a designer baby startup.


Amazing work! But I find myself amused and distracted by the spurious correlations. Eg.: eyeglasses are apparently the domain of balding old men, not beautiful young women:

https://imgur.com/a/99WUJ6J


Degree of having eyeglasses on doesn't make much sense anyway. So it has to be some spurious correlation for the network to interpolate it anyway.


There's a transition where a quality signal becomes easy-to-fake, during which time humans are "fooled", but then we become sensitized. It's no longer a quality signal and we subsequently refine our aesthetics toward harder-to-fake signals. For example, wood veneer over particle board, tile-patterned linoleum, etc. In light of that, I wonder what will happen to the concept of beauty as a result of this?


Not sure where you were going with this: your examples are examples of fake-detection in real life. If you show a photograph with wood veneer over particle board, and it's a high quality wood veneer, then no: no one can tell it's not wood unless they're at the shoot and have the opportunity to examine it. Same for linoleum: as long as the photo's lit properly, you can't tell.

And the same is true for what's presented here: as "fake photos", once you hit photorealistic, you're done. There is no "until humans start seeing the pixels".


> once you hit photorealistic, you're done

Maybe once, but not over repeat occurrences. Then you have an arms race between consumers tuning their aesthetic preferences to certain quality signals, producers trying to exploit those preferences without actually delivering quality, consumers re-adjusting their preferences after getting burned, etc.

A closer comparison might be dating profile pics, which as I understand dating "shoppers" quickly learn to distrust, at least for certain angles or types of shots. This AI enhancement stuff would presumably cause some rapid evolution in that particular arms race.


Such an arms race is likely until either the AI or the customer brains hit their limits. The in-principle limits of a synthetic mind vastly exceed anything biologically possible; it's like comparing wolves to hills, both in size and speed, and silicon has the advantage both ways.

The only thing keeping us safe is that we don’t yet know how our minds work.


Could this be used for non-intrusive photo anonymization? Just replace faces with randomly generated but plausible-looking ones.


Nice idea! You could do that but this current method has no way of constraining the face to fit with an existing body. I'm sure there's a way to do that though.


Next workers put out of jobs by AI: Models.


Not sure what this has to do with modeling.

On the other hand, this could be useful for generating NPC/character portraits in RPGs. Fat chance this will ever happen, though. No one wants to deal with the Python mess of libraries and no one seems to care about putting neural net stuff in self-contained, reusable packages.


Unity has been investing heavily in neural nets for AI. I could envision this also being used for art asset generation. The core TensorFlow API is written in C++, not Python.


As foreseen by Looker (1981), written and directed by Michael Crichton.


I remember enjoying this one as a kid. I guess I was too young to have understood the implications of the plot, I can only remember the LOOKER hypnosis gun from this film.


Also photographers (especially stock image photographers).


Those are some really amazing results. I'm wondering when this shows up as an aid to Police sketch artists.


Thanks for mentioning this! I think that, other than using this model to enhance Photoshop, Instagram, etc., police sketching would be an awesome application of it.


This is really cool. My first thought of using it would be seeing how I look with a beard haha.


Can anyone explain how to do that (in theory)?

I guess one needs to train a model from a dataset that contains a picture of oneself and many images of other people, some with beards (possibly CelebA-HQ). According to the README this would take about 2 weeks on an NVIDIA Tesla V100.

Given that we already have a model from CelebA-HQ would it be possible to use that as a pre-trained model and just train it a bit more with an additional image? If possible would that speed things up enough to do it on cheaper hardware in less than a day?


I'm definitely not an expert, but I think the way to go is to use an algorithm that can find a specific face in the latent space. If the latent space has enough dimensions (trained with a big enough set) it should already contain your face. The problem is to find it.

Actually I think this could do the job: https://arxiv.org/abs/1702.04782

https://github.com/simoroma/RecoverGANlatentVector


Thanks for the reply! I just want to add a few more details. As long as the training dataset is diverse enough, the trained GAN generator should be capable of generating images that are similar to the training dataset. Therefore there is no need to retrain the GAN. However, to "embed" a given face image into the latent space, we need to either use the optimization-based method https://arxiv.org/abs/1702.04782 mentioned in "Geee"'s post, or train an encoding network. The encoding network would be much easier to train than the decoding/generator network because we do not need to use adversarial loss. I would love to include this functionality in my model once I get a job offer and have some spare time :)
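
To make the optimization-based option concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the tiny tanh "generator", its weights W, and the step size stand in for a real trained GAN, and the gradient is written out by hand where a real setup would use automatic differentiation. The only idea taken from the discussion above is that the generator stays frozen and the latent vector is adjusted until the generated output matches the target.

```python
import math

# Hypothetical toy "generator": G(z) = tanh(W z). It stands in for a trained
# GAN generator; its weights stay frozen while we search for the latent vector.
W = [[1.0, 0.5],
     [-0.5, 1.0],
     [0.3, -0.8],
     [0.9, 0.2]]

def generate(z):
    """Map a latent vector to a (toy) output vector."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in W]

def loss_and_grad(z, target):
    """Squared reconstruction error ||G(z) - target||^2 and its gradient w.r.t. z."""
    out = generate(z)
    resid = [o - t for o, t in zip(out, target)]
    loss = sum(r * r for r in resid)
    # Chain rule through tanh: d tanh(u)/du = 1 - tanh(u)^2
    back = [2.0 * r * (1.0 - o * o) for r, o in zip(resid, out)]
    grad = [sum(W[i][j] * back[i] for i in range(len(W))) for j in range(len(z))]
    return loss, grad

def recover_latent(target, dim=2, lr=0.1, steps=500):
    """Plain gradient descent on the latent vector, starting from the origin."""
    z = [0.0] * dim
    for _ in range(steps):
        _, grad = loss_and_grad(z, target)
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z

z_true = [0.7, -0.4]            # latent vector we pretend not to know
target = generate(z_true)       # the "image" we want to embed
z_hat = recover_latent(target)  # recovered latent vector
final_loss, _ = loss_and_grad(z_hat, target)
```

With a real face photo the target is not guaranteed to be exactly reproducible by the generator, so the loss would settle at some nonzero value rather than at zero as it does in this toy case.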


On a related note, I've long been fascinated by how easily we can tell a male from a female, even from a distance. For example, driving around and looking at people walking along the road or following people in the mall, I can tell, with remarkable accuracy, if someone is male or female.

Is it the clothes, or gait, or size? Is it hair style or style of shoes? And at what age does this female/male separation begin? Generally, I can't assess newborns very well, but by the age of, say, 5 years old it becomes much easier.

This software seems to answer some of these questions by doing things like presenting males with stronger jawlines or females with higher cheekbones. Does this same thing happen with other parts of the body (barring the obvious)? I know that females usually have wider hips and as a result walk differently than males. Are there other examples of differences that we intrinsically know but don't consciously realize?

NOTE: I'm speaking in traditional gender roles. Obviously I can't tell if someone is gender dysmorphic from a distance.


> Are there other examples of differences that we intrinsically know but don't consciously realize?

Probably everything from https://en.wikipedia.org/wiki/Sex_differences_in_human_physi....


Sometimes I think the idea that we live in a simulation is preposterous...and then I see things like this, and wonder.


It's fun to create paradoxical faces by alternating between increasing two contradictory vectors. For example, alternating between +bald, +wavyhair, +bald, +wavyhair, ... eventually seems to produce unsettling faces where it's hard to tell where the hair begins and the skin ends.
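
As a rough illustration of what that alternation does in latent space, here is a small sketch. The two axes, the starting vector, and the step size are all hypothetical, and real feature axes would not be this neatly aligned. The arithmetic just shows that repeatedly adding two directions pushes the latent vector steadily away from where it started, which may be part of why the results drift toward odd-looking faces.

```python
import math

# Hypothetical unit directions in a 3-dimensional latent space.
bald_axis = [1.0, 0.0, 0.0]   # assumed "+bald" direction
wavy_axis = [0.0, 1.0, 0.0]   # assumed "+wavyhair" direction

z = [0.3, -0.2, 0.5]          # hypothetical starting latent vector
step = 0.5                    # hypothetical step size per click

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

start_norm = norm(z)
for _ in range(10):
    # Alternate: +bald, then +wavyhair
    z = [zi + step * a for zi, a in zip(z, bald_axis)]
    z = [zi + step * a for zi, a in zip(z, wavy_axis)]
end_norm = norm(z)
```

After ten rounds the vector has moved five units along each axis, so it sits much farther from the origin than any of the individual steps would suggest.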


Nice try, you managed to push to the limit of this algorithm!


I wonder if you could apply this sort of model to music. Go from a description of a song to a song that fits that description. I know there are APIs with great descriptors of songs out there.


That would be a super cool idea and I have thought about it! To deal with music, we may want to use an RNN, LSTM or WaveNet instead of a GAN as the backbone generator and use my tl-GAN idea to find the "feature" axes for controlled generation. I would love to contribute to such projects.


I'm not impressed with the demo gif. I've seen basically the same results from the 90s-era software elastic image (motion image morphing).


Seems more like it is very good at blending images than at creating new ones.

Still very cool but not as cool as it initially appeared when I saw it.


Check out the first image in the Results section. It's not blending between two images, it's taking one image (in the middle row) and changing different characteristics. It's the understanding of what these characteristics are that's interesting. Taking them all together, they have created a "feature space" which is a multidimensional space where every point corresponds to a different kind of face.


Thank you for the clarification! The latent space is so well behaved that I can find a linear relationship between the latent vector and the feature labels, and then use the regression slope as the "understanding" of the space.
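
A minimal sketch of that idea, under loud assumptions: the latent vectors and feature labels below are synthetic (a made-up "true" direction plus noise), the fit is ordinary least squares without an intercept, and the helper names fit_feature_axis and move_along are hypothetical. It is not the actual tl-GAN code, only an illustration of treating the regression slope as a direction to move along.

```python
import math
import random

def fit_feature_axis(latents, labels):
    """Least-squares slope of label vs. latent vector (no intercept),
    solved through the normal equations, then scaled to unit length."""
    d = len(latents[0])
    # A = Z^T Z and b = Z^T y
    A = [[sum(z[i] * z[j] for z in latents) for j in range(d)] for i in range(d)]
    b = [sum(z[i] * y for z, y in zip(latents, labels)) for i in range(d)]
    # Gaussian elimination with partial pivoting
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    slope = [0.0] * d
    for i in reversed(range(d)):
        s = b[i] - sum(A[i][j] * slope[j] for j in range(i + 1, d))
        slope[i] = s / A[i][i]
    length = math.sqrt(sum(s * s for s in slope))
    return [s / length for s in slope]

def move_along(z, axis, step):
    """Shift a latent vector along a feature axis."""
    return [zi + step * ai for zi, ai in zip(z, axis)]

random.seed(0)
true_axis = [0.6, -0.8, 0.0, 0.0]   # made-up direction for some feature
latents = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
labels = [sum(t * zi for t, zi in zip(true_axis, z)) + random.gauss(0, 0.1)
          for z in latents]

axis = fit_feature_axis(latents, labels)
cosine = sum(a * t for a, t in zip(axis, true_axis))  # both are unit length
shifted = move_along(latents[0], axis, 2.0)           # "more" of the feature
```

In a real pipeline the labels would come from some classifier or annotation of the generated images rather than from a known direction, and the shifted latent vector would be fed back through the generator to render the edited face.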


Hi, I am the author of this blog post, feel free to ask me any questions about this work! Thank you for your comments!


Interesting but also lends itself to bad actors and faking photos.


I am looking for a way to generate photos of me after plastic surgery.

The AI should generate a face which will be most beautiful and trusted in all cultures.

It will be using my present face and morph things from there according to the limitations.

Using the fewest incisions in the least risky areas of the face.

After that we get a surgeon to perform plastic surgery as per this AI.

I am looking for ideas on how to accomplish it.


Occasionally I find the notion that we exist in a holographic reality incredible... then I am presented with examples such as this, and reconsider.



