
Generating custom photo-realistic faces using AI - homarp
https://blog.insightdatascience.com/generating-custom-photo-realistic-faces-using-ai-d170b1b59255
======
jaster
Very impressive work !

On a side note, the section "A word about ethics" was a welcome addition. I
found this:

> If we can generate realistic looking faces of any type, what are the
> implications for our ability to trust in what we see.

resonated with another article posted on HN
([https://news.ycombinator.com/item?id=18309305](https://news.ycombinator.com/item?id=18309305))
talking in part about the ethical impact of engineered addiction.

I guess very soon we will be able to generate "super-attractive" (as in
"superstimuli") faces for virtual personas, according to targeted demographics
and purpose (advertisement, youtube videos for kids, political messages and so
on).

Rather disconcerting...

~~~
munificent
Lately, I find myself wondering if the Great Filter that resolves the Fermi
paradox is the point where a species has enough technology to completely
super-stimulate themselves to oblivion.

~~~
syn0byte
Fermi's paradox doesn't need a resolution besides the vastness of space. The
concept of such logical extrapolations from a sample size of 1 irritates me to
my core.

Devil's advocate: I doubt that. "Super-stimulating" ourselves into oblivion
requires a level of willful complacence we manage to avoid, at least on the
whole as a species. Otherwise things like vegan diets wouldn't be fads, because
no one would willfully choose crappier food options for the sake of abstract
reasons like "ethics" or "morality" that have zero impact on our daily lives.

~~~
dclowd9901
It also ignores the reality that many of us _do not like_ to be super
stimulated by vapid and shallow experiences. I find there also tends to be a
correlative relationship between one's drive and intelligence and one's lack
of eagerness to engage in things that are shallow grabs for stimulation.

~~~
darawk
> It also ignores the reality that many of us _do not like_ to be super
> stimulated by vapid and shallow experiences.

Everyone likes to be super stimulated. That's what the phrase means. It's
tautological. You may just like to be super stimulated by _different_ things,
but at the end of the day, it's just dopamine in your Ventral Tegmental Area.

~~~
lixtra
> Everyone likes to be super stimulated.

>> by vapid and shallow experiences

GP does not deny being super-stimulatable, just not by a beautiful face. Once
AI reaches the level where it can be an intellectual mentor, people like GP
may be sucked into infinite pointless learning just for the kick of it.

It’s already happening without AI (I read HN mostly for pointless stimulation;
there is far more fascinating knowledge on the internet than you could
understand in your lifetime).

~~~
therein
I'm imagining a fact bot that shows you a really mind-blowing fact in exchange
for $0.25 at the press of a button.

The first fact is free. That's how confident they are that you'll become a
paying member.

------
romwell
OK, I am not well-versed in machine learning, so I need to ask:

How is it really different, in an essential way, from using Eigenfaces[1] for
the same task?

There's a paper from year 2000 that does exactly that[2]; you can see the
results on page 5.

Now, the results in the linked article are _way_ more impressive than the
Eigenfaces-generated random faces, but from skimming the text, it seems like
the principles are the same: dimensionality reduction from a high-dimensional
space into a space where the axes are "meaningful" components (in Eigenfaces,
the reduction is done using SVD).

Eigenfaces is a linear approach; there's also a multilinear version,
TensorFaces by Vasilescu et al[3][4]. I wonder if similar quality images can
be obtained using this approach.

[1][https://en.wikipedia.org/wiki/Eigenface](https://en.wikipedia.org/wiki/Eigenface)

[2][https://link.springer.com/content/pdf/10.3758/BF03207802.pdf](https://link.springer.com/content/pdf/10.3758/BF03207802.pdf)

[3][http://alumni.media.mit.edu/~maov/mica/mica05.pdf](http://alumni.media.mit.edu/~maov/mica/mica05.pdf)

[4][http://web.cs.ucla.edu/~dt/papers/eccv02/eccv02.pdf](http://web.cs.ucla.edu/~dt/papers/eccv02/eccv02.pdf)

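
For reference, the Eigenfaces construction described above can be sketched in a few lines of NumPy. This is a toy illustration with random data standing in for real face images; the structure (center, SVD, sample coefficients along the top components) is the part that matters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a dataset: 200 flattened 32x32 grayscale "faces".
faces = rng.random((200, 32 * 32))

# Center the data and take the SVD; rows of Vt are the eigenfaces.
mean_face = faces.mean(axis=0)
U, S, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)

# Generate a new face: the mean plus a random linear combination of
# the top-k eigenfaces, with coefficients scaled like the data's own.
k = 20
coeffs = rng.normal(scale=S[:k] / np.sqrt(len(faces)), size=k)
new_face = mean_face + coeffs @ Vt[:k]
print(new_face.shape)
```

The GAN's latent space plays a role loosely analogous to the coefficient space here, except the map back to pixels is a deep non-linear generator rather than a linear sum.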

~~~
mlboss
As you said, the result is way more impressive. That is the main theme when
you compare deep-learning-based approaches to traditional ML approaches,
especially on unstructured data (images, speech, text).

Also, deep-learning-based approaches don't require much fiddling with
features. The main work involves coming up with the NN architecture and loss
function.

------
beaconstudios
I have to wonder - is anyone working on using generative AI to create 3d
models for games? Presumably they are but I'm surprised I've not seen more
about it because that's a huge use case for this sort of tech. I only tinker
with game development for fun but I can see the value in, e.g., having an app
that can generate various kinds of models using these feature sliders. Combine
that with style transfer to copy an aesthetic from one model to another and
that could cut down on the asset creation time for video games considerably.

~~~
sangnoir
> I have to wonder - is anyone working on using generative AI to create 3d
> models for games?

It's not AI, but MakeHuman[1] has parametric features for human meshes[2], and
you can hit "Random". It also has sliders for age and gender parameters. I
last used it a while ago, so I'm sure I'm underselling it.

1\. [http://www.makehumancommunity.org/](http://www.makehumancommunity.org/)

2\.
[https://www.youtube.com/watch?v=iDhb-6FcqeU](https://www.youtube.com/watch?v=iDhb-6FcqeU)

~~~
Geee
Daz Studio uses the same approach with better quality:
[https://www.daz3d.com/gallery/](https://www.daz3d.com/gallery/)

This approach doesn't work very well because simple linear morphs don't
account for all the correlations and multidimensional variations of an actual
human body shape, resulting in similar or unrealistic body shapes. It's like
nudging a face in a photo editor. Different characters in Daz Studio are still
hand-crafted in a separate 3D modeling software, and just adjusted with the
morphs.

------
zaroth
I think it's a bit misleading to claim that AI is "generating" these images,
as if from whole cloth.

Images of real people have been scored along the adjustable metrics, and then
as you click the +/- adjustments, the source images are very craftily
"blended" to produce a finite spectrum of results.

Yes, no output image is itself a "real" human (although I wonder: if you tweak
the settings just right to exactly match one of the input images, would it
just spit it back out?).

It does not appear to me that we are seeing an AI that has learned what humans
look like and can now generate arbitrary fictional humans. The magic trick is
exposed by lossolo's comment below. [1]

EDIT: Apparently I'm just agreeing with what femto113 said 3 hours ago down-
thread [2]

[1] -
[https://news.ycombinator.com/item?id=18310461](https://news.ycombinator.com/item?id=18310461)

[2] -
[https://news.ycombinator.com/item?id=18311377](https://news.ycombinator.com/item?id=18311377)

~~~
colordrops
That's kind of an absurd statement. It's not possible in this universe at all
for a human or AI to generate an image of a face without any previous
impressions of one.

~~~
zaroth
If it had any "impression" of what a face was, you would never see the results
that occurred in [1]. My point is the algorithm has no impression of what a
"face" is at all. It seems like this was given prototype pictures with scoring
attributes, and mapped those into some sort of gradient, completely
independent of any notion of "face".

------
VikingCoder
Interactive demo here:

[https://www.kaggle.com/summitkwan/tl-gan-demo](https://www.kaggle.com/summitkwan/tl-gan-demo)

~~~
transpy
It's very interesting and kinda creepy. The controls behave randomly: if you
decrease "age" it might start changing skin color, etc.

~~~
e_ameisen
Actually, you can click on the name of the attribute to "lock" it. That way
you can change one without changing the other!
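
One way such "locking" can be implemented (a sketch of the general idea, not necessarily the demo's exact code) is to project one feature axis to be orthogonal to the locked one, so that moving along it no longer changes the locked attribute. The axis names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 512

# Hypothetical feature axes discovered in the latent space.
axis_age = rng.normal(size=dim)
axis_skin = rng.normal(size=dim)

def lock(axis, locked):
    """Remove from `axis` its component along `locked`, so that
    moving along the result leaves the locked feature unchanged."""
    u = locked / np.linalg.norm(locked)
    return axis - (axis @ u) * u

age_without_skin = lock(axis_age, axis_skin)
print(abs(age_without_skin @ axis_skin))  # ~0: skin tone is locked
```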

~~~
transpy
Excellent

------
lossolo
Well, I didn't know changing skin tone would melt your lip, make you bald and
change your gender.

[https://imgur.com/a/vc5dz0L](https://imgur.com/a/vc5dz0L)

AI magic.

~~~
femto113
I think "generated" may be too strong a word for what this system seems to be
doing. It feels more like it has found a bunch of averages among a large
enough set of similar pictures of some fairly homogenous celebrities. When
enough of those averages are put on a gradient along a particular axis you get
transformations that seem magically smooth. In the example you gave I think it
didn't have enough source photos along that axis. I'll further guess that this
works best for celebrities that are generally considered attractive because
attractive faces are already averages.

[1]
[https://www.ncbi.nlm.nih.gov/pubmed/15376799](https://www.ncbi.nlm.nih.gov/pubmed/15376799)
"Images of faces manipulated to make their shapes closer to the average are
perceived as more attractive."

~~~
bitL
The point is that the image has been transformed into a latent-space encoding,
and interpolating those latent-space variables creates those mind-blowing
effects (imagine just moving from one value to another via a simple convex
combination). How those latent variables are constructed and what they
represent (or whether they can be described easily) is completely up to the
non-linear optimization process running on a huge number of dimensions.
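
A minimal sketch of that convex combination between two latent codes (the generator itself is omitted here; decoding each frame with it is what would render the smooth morph):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 512  # latent dimensionality; 512 is a common choice for pg-GAN

z_a = rng.normal(size=dim)  # latent code of "face A"
z_b = rng.normal(size=dim)  # latent code of "face B"

def interpolate(z_a, z_b, t):
    """Convex combination: t=0 gives z_a, t=1 gives z_b."""
    return (1 - t) * z_a + t * z_b

# Eight evenly spaced points along the segment between the two codes.
frames = [interpolate(z_a, z_b, t) for t in np.linspace(0, 1, 8)]
print(len(frames), frames[0].shape)
```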

~~~
femto113
I don't understand this as the image being transformed; "space" and
"dimensions" seem to originate not with the visual features of the images but
with the attributes that the CelebA image set is annotated with. The coverage
of the set is not uniform with respect to gender and skin color. I'm not 100%
sure how to interpret the attribute values, but some quick math shows that
images with "Pale_Skin" outnumber their opposites by about 20 to 1.

------
fernly
I appreciate the author acknowledging -- if belatedly -- the inbuilt essential
bias of the training set,

> Training on photos of celebrities of Hollywood means that our model will be
> very good at generating photos of a predominantly white and attractive
> demographic... If we were to deploy this as an application, we would want to
> make sure that we have augmented our initial dataset to take into account
> the diversity of our users.

 _Could_ this system generate credible faces of people of color? If it has a
"gender" axis, could it have a "melanin" adjustment axis? Or various ethnic
axes?

~~~
PeterisP
Yes, iff it's given relevant training data that includes a wider mix of
ethnicities.

------
leblancfg
Interesting that the top-level layers of that network correlate to what is
essentially gene expression. Come to think of it, it seems feasible to train a
network with a whole genome as input and 3D-scanned morphology as output.
Again, the top-level weights in the network would end up encoding the salient
gene variances that cause morphological differences.

High cost to generate the data, but the first thing I’d do if I was setting up
a designer baby startup.

------
nkoren
Amazing work! But I find myself amused and distracted by the spurious
correlations. Eg.: eyeglasses are apparently the domain of balding old men,
not beautiful young women:

[https://imgur.com/a/99WUJ6J](https://imgur.com/a/99WUJ6J)

~~~
red75prime
A "degree of having eyeglasses on" doesn't make much sense as a quantity
anyway, so the network has to fall back on some spurious correlation to
interpolate it.

------
_greim_
There's a transition where a quality signal becomes easy-to-fake, during which
time humans are "fooled", but then we become sensitized. It's no longer a
quality signal and we subsequently refine our aesthetics toward harder-to-fake
signals. For example, wood veneer over particle board, tile-patterned
linoleum, etc. In light of that, I wonder what will happen to the concept of
beauty as a result of this?

~~~
TheRealPomax
Not sure where you were going with this: your examples are cases of fake-
detection in real life. If you show a photograph of wood veneer over particle
board, and it's a high-quality wood veneer, then no: no one can tell it's not
solid wood unless they're at the shoot and have the opportunity to examine it.
Same for linoleum: as long as the photo's lit properly, you can't tell.

And the same is true for what's presented here: as "fake photos", once you hit
photorealistic, you're done. There is no "until humans start seeing the
pixels".

~~~
_greim_
> once you hit photorealistic, you're done

Maybe once, but not over repeat occurrences. Then you have an arms race
between consumers tuning their aesthetic preferences to certain quality
signals, producers trying to exploit those preferences without actually
delivering quality, consumers re-adjusting their preferences after getting
burned, etc.

A closer comparison might be dating profile pics, which, as I understand it,
dating "shoppers" quickly learn to distrust, at least for certain angles or
types of shots. This AI enhancement stuff would presumably cause some rapid
evolution in that particular arms race.

~~~
ben_w
Such an arms race is likely until either the AI or the customer brains hit
their limits. The in-principle limits of a synthetic mind vastly exceed
anything biologically possible — it’s like comparing wolves to hills, both in
size and speed, and silicon has the advantage both ways.

The only thing keeping us safe is that we don’t yet know how our minds work.

------
hoseja
Could this be used for non-intrusive photo anonymization? Just replace faces
with randomly-generated-but-plausibly-looking ones.

~~~
IshKebab
Nice idea! You could do that but this current method has no way of
constraining the face to fit with an existing body. I'm sure there's a way to
do that though.

------
jatsign
Next workers put out of jobs by AI: Models.

~~~
gambler
Not sure what this has to do with modeling.

On the other hand, this could be useful for generating NPC/character portraits
in RPGs. Fat chance this will ever happen, though. No one wants to deal with
the Python mess of libraries, and no one seems to care about putting neural-
net stuff in self-contained, reusable packages.

~~~
calebh
Unity has been investing heavily in neural nets for AI. I could envision this
also being used for art asset generation. The core TensorFlow API is written
in C++, not Python.

------
ChuckMcM
Those are some really amazing results. I'm wondering when this shows up as an
aid to Police sketch artists.

~~~
summitkwan
Thanks for mentioning this! I think, other than using this model for enhancing
Photoshop, Instagram, etc., police sketching would be an awesome application
of it.

------
jpmoyn
This is really cool. My first thought of using it would be seeing how I look
with a beard, haha.

~~~
weinzierl
Can anyone explain how to do that (in theory)?

I guess one needs to train a model from a dataset that contains a picture of
oneself and many images of other people, some with beards (possibly CelebA-
HQ). According to the README this would take about 2 weeks on a NVIDIA Tesla
V100.

Given that we already have a model from CelebA-HQ would it be possible to use
that as a pre-trained model and just train it a bit more with an additional
image? If possible would that speed things up enough to do it on cheaper
hardware in less than a day?

~~~
Geee
I'm definitely not an expert, but I think the way to go is to use an algorithm
that can find a specific face in the latent space. If the latent space has
enough dimensions (trained with a big enough set) it should already contain
your face. The problem is to find it.

Actually I think this could do the job:
[https://arxiv.org/abs/1702.04782](https://arxiv.org/abs/1702.04782)

[https://github.com/simoroma/RecoverGANlatentVector](https://github.com/simoroma/RecoverGANlatentVector)

~~~
summitkwan
Thanks for the reply! I just want to add a few more details. As long as the
training dataset is diverse enough, the trained GAN generator should be
capable of generating images that are similar to the training dataset, so
there is no need to retrain the GAN. However, to "embed" a given face image in
the latent space, we need to either use the optimization-based method
([https://arxiv.org/abs/1702.04782](https://arxiv.org/abs/1702.04782))
mentioned in Geee's post, or train an encoding network. The encoding network
would be much easier to train than the decoding/generator network because we
do not need to use an adversarial loss. I would love to include this
functionality in my model once I get a job offer and have some spare time :)
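
A toy sketch of the optimization-based embedding: gradient descent on the latent code z to minimize the reconstruction error against a target image. A fixed linear map stands in for the trained generator here (a real GAN would need autodiff, but the loop is the same idea):

```python
import numpy as np

rng = np.random.default_rng(3)
latent_dim, img_dim = 16, 64

# Toy stand-in for a trained generator: a fixed linear map.
G = rng.normal(size=(img_dim, latent_dim))

z_true = rng.normal(size=latent_dim)  # "the face we want to find"
target = G @ z_true                   # the given face image

# Gradient descent on z to minimize ||G(z) - target||^2.
z = np.zeros(latent_dim)
lr = 0.9 / (2 * np.linalg.norm(G, 2) ** 2)  # step size below 1/L
for _ in range(2000):
    z -= lr * 2 * G.T @ (G @ z - target)

print(np.allclose(z, z_true, atol=1e-3))
```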

------
Corrado
On a related note, I've long been fascinated by how easily we can tell a male
from a female, even from a distance. For example, driving around and looking
at people walking along the road or following people in the mall, I can tell,
with remarkable accuracy, if someone is male or female.

Is it the clothes, or gait, or size? Is it hair style or style of shoes? And
at what age does this female/male separation begin? Generally, I can't assess
newborns very well, but by the age of, say, 5 years old it becomes much
easier.

This software seems to answer some of these questions by doing things like
presenting males with stronger jawlines or females with higher cheekbones.
Does this same thing happen with other parts of the body (barring the
obvious)? I know that females usually have wider hips and as a result walk
differently than males. Are there other examples of differences that we
intrinsically know but don't consciously realize?

NOTE: I'm speaking in traditional gender roles. Obviously I can't tell if
someone is gender dysmorphic from a distance.

~~~
defanor
> Are there other examples of differences that we intrinsically know but don't
> consciously realize?

Probably everything from
[https://en.wikipedia.org/wiki/Sex_differences_in_human_physi...](https://en.wikipedia.org/wiki/Sex_differences_in_human_physiology).

------
chrisco255
Sometimes I think the idea that we live in a simulation is preposterous...and
then I see things like this, and wonder.

------
xamuel
It's fun to create paradoxical faces by alternating between increasing two
contradictory vectors. For example, alternating between +bald, +wavyhair,
+bald, +wavyhair, ... eventually seems to produce unsettling faces where it's
hard to tell where the hair begins and the skin ends.

~~~
summitkwan
Nice try! You managed to push this algorithm to its limit!

------
ninetax
I wonder if you could apply this sort of model to music: go from a description
of a song to a song that fits that description. I know there are APIs out
there with great descriptors of songs.

~~~
summitkwan
That would be a super cool idea, and I have thought about it! To deal with
music, we may want to use an RNN, LSTM, or WaveNet instead of a GAN as the
backbone generator, and use my tl-GAN idea to find the "feature" axes for
controlled generation. I would love to contribute to such projects.

------
ohiovr
I'm not impressed with the demo gif; I've seen basically the same results from
the '90s-era software Elastic Image (motion image morphing).

------
callesgg
Seems more like it is very good at blending images than creating new ones.

Still very cool but not as cool as it initially appeared when I saw it.

~~~
sp332
Check out the first image in the Results section. It's not blending between
two images, it's taking one image (in the middle row) and changing different
characteristics. It's the understanding of what these characteristics are
that's interesting. Taking them all together, they have created a "feature
space" which is a multidimensional space where every point corresponds to a
different kind of face.

~~~
summitkwan
Thank you for the clarification! The latent space is so well behaved that I
can find a linear relationship between the latent vector and the feature
labels, and then use the regression slope as the "understanding" of the space.
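
That regression step can be sketched with synthetic data. A hypothetical "smiling"-style label is generated to depend linearly (plus noise) on a hidden direction in latent space, and least squares recovers that direction as the feature axis:

```python
import numpy as np

rng = np.random.default_rng(4)
n, dim = 5000, 64

# Latent codes, plus a noisy feature label that depends linearly
# on a hidden direction in latent space.
z = rng.normal(size=(n, dim))
true_axis = rng.normal(size=dim)
true_axis /= np.linalg.norm(true_axis)
labels = z @ true_axis + 0.1 * rng.normal(size=n)

# Least-squares regression of labels on z; the slope vector is the
# learned "feature axis" used to steer generation along that feature.
axis, *_ = np.linalg.lstsq(z, labels, rcond=None)
axis /= np.linalg.norm(axis)

print(float(axis @ true_axis))  # cosine close to 1: axis recovered
```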

------
summitkwan
Hi, I am the author of this blog post, feel free to ask me any questions about
this work! Thank you for your comments!

------
moneil971
Interesting, but it also lends itself to bad actors and faked photos.

------
choot
I am looking for a way to generate photos of myself after plastic surgery.

The AI should generate a face that will be most beautiful and trusted across
all cultures.

It would use my present face and morph things from there, within the
limitations.

Using the fewest incisions in the least risky areas of the face.

After that, we get a surgeon to perform the plastic surgery as per this AI.

I am looking for ideas on how to accomplish it.

------
samstave
Occasionally I have the notion that the idea we exist in a holographic reality
is absurd... then I am presented with examples such as this, and I reconsider.

