
Identifying the dimensions used by the primate brain to decode faces - pk2200
https://www.nytimes.com/2017/06/01/science/facial-recognition-brain-neurons.html?_r=0
======
iandanforth
Key sentence - "the correct choice of face space axes is critical for
achieving a simple explanation of face cells’ responses."

They did PCA over two sets of metrics, taking the top 25 components from each
set and combining them into a 50-d space. Using these dimensions and the
measured responses to fit a model explained 57% of the variance in real cell
firing rates (much better than other models, including a 5-layer CNN).
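That pipeline (two feature sets, top 25 PCs each, concatenated into a 50-d face space, then a linear fit to firing rates) can be sketched in a few lines of numpy. Everything below is synthetic stand-in data, not the paper's actual features; in the paper the two sets are shape-like and appearance-like face metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two feature sets measured on 200 faces.
shape_feats = rng.normal(size=(200, 100))
appear_feats = rng.normal(size=(200, 100))

def top_pcs(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# 25 components from each set, concatenated into a 50-d face space.
face_space = np.hstack([top_pcs(shape_feats, 25), top_pcs(appear_feats, 25)])

# Fit a linear model from face-space coordinates to a cell's firing rate
# (synthetic rates here), then report the variance explained.
rates = face_space @ rng.normal(size=50) + rng.normal(scale=0.5, size=200)
w, *_ = np.linalg.lstsq(face_space, rates, rcond=None)
r2 = 1 - np.var(rates - face_space @ w) / np.var(rates)
```

With synthetic linear data the fit is of course near-perfect; the interesting part of the paper is that real face cells are this close to linear in the right coordinate system.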

This is pretty cool. I'd like to see a follow-up where the chosen dimensions
were further refined using something a bit more iterative than an arbitrary
PCA cutoff.

Also, I really want to know what eye motion was present during each trial.
This paper presents a very "instantaneous" recognition perspective and doesn't
talk about integration over time or the impact of sequential perception of
face components on recognition. (E.g. an upside-down face is hard to recognize
because your gaze has to move up from the eyes to see the mouth, a sequence
rarely encountered in the real world.)

~~~
highd
A 5-layer CNN is absurdly shallow, so this isn't particularly surprising. I
routinely work with 150+ layer CNNs; that's fairly standard practice if you
want high-quality results.

~~~
forgotpw1123
This comment is absurd. "Your CNN sucks, I know this because I work with
better ones all the time. (insert metric that doesn't mean much)"

Oh, BTW I do ML consulting lol

~~~
zo7
VGG has 16-19 layers, Inception has ~50 layers, and ResNet has 152 layers,
each of which was state of the art at some point over the last ~2-3 years. A
more faithful comparison with CNNs would've used one of these models
pre-trained on a much larger face dataset.

~~~
forgotpw1123
I don't think it's anywhere near conclusive yet that more layers = better.
It's pretty telling that the current state of the art is combining a bunch of
layers in a pseudo-random fashion. Nobody understands how these things work
well enough to write down a formula or equation that produces better CNNs, or
even to predict with any accuracy which models will be more effective. You
think more layers is better because the best models we have happen to have the
most layers? Some deep understanding of concepts there.

~~~
zo7
I don't think I or the parent comment are necessarily suggesting that more
layers are better; we're pointing out that using only 5 layers suggests they
weren't using a state-of-the-art architecture. You can't faithfully say "a CNN
cannot model this relationship" when the evaluation wasn't thorough.
(Especially since they don't mention modern face recognition systems like
DeepFace or FaceNet; if a simple PCA model works so well, I'd be interested to
see whether there's any correlation with the embeddings those systems
produce.)

Also, don't be so dismissive: we have a strong enough empirical and intuitive
understanding of CNNs to make thoughtful improvements over time. In fact, the
insight behind the ResNet paper was noticing that past a certain depth, adding
layers stops helping and training error actually degrades. The solution was to
construct the network so that each block learns a residual mapping that only
modifies its input rather than completely transforming it. The whole point of
that paper was solving this degradation problem so they could use a
ridiculously deep architecture, like a 152-layer network, to get better
results.
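The residual idea is simple enough to sketch in plain numpy (a toy two-weight block, not a full network; weight shapes and the ReLU choice here are just illustrative):

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + F(x): the block learns only a modification F of its input.

    When W1 and W2 are near zero the block is close to the identity, which
    is what keeps very deep stacks of these blocks trainable.
    """
    h = np.maximum(0, x @ W1)   # ReLU nonlinearity
    return x + h @ W2           # skip connection adds the input back

# With zero weights the block is exactly the identity mapping.
x = np.array([1.0, -2.0, 3.0])
W = np.zeros((3, 3))
out = residual_block(x, W, W)
```

The design point is that "do nothing" is the easy-to-learn default, so extra depth can't make training error worse the way it does with plain stacked layers.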

------
lexicality
"It is a remarkable advance to have identified the dimensions used by the
primate brain to decode faces, he added — and impressive that the researchers
were able to reconstruct from neural signals the face a monkey is looking at."

"These dimensions create a mental “face space” in which an infinite number of
faces can be recognized. There is probably an average face, or something like
it, at the origin, and the brain measures the deviation from this base."

"Dr. Tsao said she was particularly impressed to find she could design a whole
series of faces that a given face cell would not respond to, because they
lacked its preferred combination of dimensions. This ruled out a possible
alternative method of face identification: that the face cells were comparing
incoming images with a set of standard reference faces and looking for
differences."

I'm surprised that they didn't attempt to generate a face with exactly 0 on
all dimensions.
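The mechanism the quote describes (each cell projecting a face onto its preferred axis) implies a whole null space of faces a given cell can't tell apart, which is exactly how you'd design non-responsive faces. A toy numpy illustration, with a hypothetical random axis standing in for a measured one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Model a face cell as a linear projection onto its preferred axis
# in the 50-d face space (the axis here is hypothetical).
preferred_axis = rng.normal(size=50)
preferred_axis /= np.linalg.norm(preferred_axis)

def response(face):
    return face @ preferred_axis

# Any component orthogonal to the axis leaves the response unchanged,
# so infinitely many distinct faces drive the cell identically.
face = rng.normal(size=50)
ortho = rng.normal(size=50)
ortho -= (ortho @ preferred_axis) * preferred_axis  # strip axis component

# The all-zero face (the origin, roughly the "average" face) produces
# zero deviation along every cell's axis.
origin_response = response(np.zeros(50))
```

Under this model, a face at the origin should be the one face that no axis-tuned cell deviates on, which is why testing it would have been interesting.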

It would be fascinating to know what the most memorable face looks like - and
if it's different per-brain. (Presumably it is monkey shaped!)

------
pk2200
Here's the paper:
[http://www.cell.com/cell/pdf/S0092-8674(17)30538-X.pdf](http://www.cell.com/cell/pdf/S0092-8674\(17\)30538-X.pdf)

~~~
paulfrancisco
All the faces seem computer generated. It would have been nice if they had
used celebrity faces we can all recognize, to see what their system comes up
with.

~~~
iandanforth
They are real faces morphed along one of the 50 axes they came up with. They
look less real because it's standard practice to exclude hair, shoulders, and
backgrounds from face databases (even though this is totally unrealistic, it
helps isolate the system under study).
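Morphing along one axis is just sweeping a single face-space coordinate while holding the rest fixed, then decoding back to pixels. A toy sketch, with a hypothetical random linear decoder in place of the paper's actual reconstruction model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear decoder: 50-d face-space coords -> 64x64 "pixels".
basis = rng.normal(size=(50, 64 * 64))
mean_face = rng.normal(size=64 * 64)  # the "average" face sits at the origin

def render(coords):
    return mean_face + coords @ basis

coords = rng.normal(size=50)
for step in np.linspace(-2, 2, 5):  # sweep axis 0, hold the other 49 fixed
    morphed = coords.copy()
    morphed[0] = step
    img = render(morphed).reshape(64, 64)
```

Setting all 50 coordinates to zero recovers the mean face, which connects back to the "average face at the origin" idea from the article.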

------
DanielleMolloy
This paper is highly exciting for anybody working on the neural code and so-
called encoding models.

Only a few days ago there was a similar study that read faces from human
brains by trying to construct a latent space:

[https://arxiv.org/abs/1705.07109](https://arxiv.org/abs/1705.07109)

[https://twitter.com/ccnlab/status/866548346751725568](https://twitter.com/ccnlab/status/866548346751725568)
(animation, over time more dimensions from this latent space (afaik PCA
components) are added)

Given the limitations of fMRI (we cannot do single-cell recordings in human
brains), the results are not as accurate, but to my knowledge this is the best
we can do in humans so far.

------
curun1r
I remember reading about a study [1] showing that humans recognize faces based
on how similar they are to the faces of their parents. It's well known that
people more easily differentiate faces within their own race. What the study
did was look at people who were adopted by parents of a different race: those
people were more easily able to differentiate faces of their adoptive parents'
race and had difficulty differentiating faces of their own race. The inference
is that we actually store/recognize facial deltas, not full facial images.

I'm curious how this study would explain or contradict the results of that
study. Also, were the monkeys raised by monkey parents or human scientists?
Monkeys that were allowed to imprint on humans might be more similar to humans
and, yet, unrepresentative of monkeys.

[1] I think it was
[https://www.ncbi.nlm.nih.gov/pubmed/15943669](https://www.ncbi.nlm.nih.gov/pubmed/15943669)

~~~
anothercomment
Hm, I always guessed that the brain simply goes by a kind of "maximum
information gain" approach, picking out the features that tend to stick out
most when distinguishing things. To someone unused to it, skin color sticks
out so strongly that the signal it generates would drown out the other
signals.

But that is just a guess and might also go against the findings of the
article, which I haven't fully understood yet.

In any case it seems to me that the brain is optimized for processing certain
features, as apparently there are people who are unable to recognize faces
(face blindness).

------
daxfohl
I'm skeptical. Like "faked results" skeptical. Crime-witness studies show that
most humans can't reproduce another human's face all that accurately: so-so
when the face is of the same race, but for a different race it's a coin flip
whether they can even recognize it. (That said, I've only heard this on
various TV shows and never seen the actual research, so the presumption could
be wrong.) How can primates do so much better with an entirely different
species? Or not even primates, but some AI going through primate neural
signals?

~~~
tmalsburg2
Very good point, and yes there's a lot of research showing that humans are
terrible at recognizing faces from other races. No idea why you're being
downvoted.

------
ragebol
If I understand this correctly, this works similar to an embedding in e.g.
deep learning: faces are represented by high-dimensional vectors.

Reading the Cell article on this [0] I couldn't help to see the similarities
with OpenFace [1].

[0]
[http://www.cell.com/cell/fulltext/S0092-8674(17)30538-X](http://www.cell.com/cell/fulltext/S0092-8674\(17\)30538-X)
[1]
[https://cmusatyalab.github.io/openface/#overview](https://cmusatyalab.github.io/openface/#overview)
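The analogy holds at the level of use, too: in OpenFace-style systems, comparing identities is just measuring the distance between embedding vectors. A minimal sketch (the 128-d size and the threshold value are illustrative, not OpenFace's calibrated settings):

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.0):
    """Decide identity match by Euclidean distance between embeddings,
    as embedding-based face recognizers do. Threshold is illustrative."""
    return np.linalg.norm(emb_a - emb_b) < threshold

rng = np.random.default_rng(3)
anchor = rng.normal(size=128)
near = anchor + 0.001 * rng.normal(size=128)  # slightly perturbed embedding
far = rng.normal(size=128)                    # unrelated embedding
```

If the brain's 50-d face space works the same way, nearby points are faces that look alike, which is exactly what the morphing experiments suggest.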

~~~
supernumerary
Interestingly in the article, they try to differentiate the system from
machine learning:

Advances in machine learning have been made by training a computerized mimic
of a neural network on a given task. Though the networks are successful, they
are also a black box because it is hard to reconstruct how they achieve their
result.

“This has given neuroscience a sense of pessimism that the brain is similarly
a black box,” she said. “Our paper provides a counterexample. We’re recording
from neurons at the highest stage of the visual system and can see that
there’s no black box. My bet is that that will be true throughout the brain.”

~~~
taneq
At this stage I don't think "black box" is a very fair description. We now
understand a fair bit about how artificial neural nets encode information and
calculate things. And what we don't understand, we can still see and study the
processes.

~~~
bonzini
We cannot be entirely sure of the behavior in any situation that is
sufficiently different from anything that was presented during training. You
can't necessarily expect a "common sense" response in that case, and in this
sense the neural networks are black boxes.

~~~
taneq
The same applies to any complex system. 'Black box' means 'not inspectable',
not 'anything which doesn't have predictable behaviour over all conceivable
inputs'.

------
mzitelli
Amazing results. It is incredible to see that our brains use a process similar
to CNNs, encoding information through multiple layers of neurons to extract
features. This makes me think that consciousness could be just an extremely
high-level temporal representation of our own senses.

~~~
fatjokes
I'm not certain, but I don't think that's a coincidence. I.e. convnets were
inspired by the neural visual system.

------
gech
Can this model be translated into computer vision code? I always wonder
whether nature still holds new, more efficient models for us to copy, or
whether the brain's model isn't actually the most efficient and is just the
result of evolution.

~~~
ragebol
Sort-of:
[https://cmusatyalab.github.io/openface/#overview](https://cmusatyalab.github.io/openface/#overview)

------
folli
How are these macaques able to so finely differentiate faces of a different
species? I'm pretty sure I wouldn't be able to differentiate many macaque
faces from each other.

------
linux2647
I wonder if this is related to why we recognize faces in nonliving objects
such as cars.

