Deep Chernoff Faces (ihatethefuture.com)
138 points by pxx 80 days ago | 33 comments

The idea behind Chernoff faces (or using faces for data visualization) seems good: we humans are very good at distinguishing faces, so we can quickly find groups and outliers if the data is encoded with a face.

But we have to be careful with this. Changing a facial expression is not the same as increasing the height of a bar in a bar plot: we're relating features to expressions, and the visualization might express things that you don't intend.
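A minimal sketch of what "encoding data with a face" means in practice. It assumes simple min-max normalization and three hypothetical face parameters (`mouth_curve`, `eye_size`, `brow_slant`), which are not taken from any library or from the linked post. Each variable is rescaled to [0, 1] and then mapped linearly onto one feature's range. The `invert` flag makes the point above concrete: which end of a variable becomes a smile is a decision the designer makes, not something the data decides.

```python
# Hypothetical feature ranges; a real Chernoff-face renderer would define its own.
FEATURE_RANGES = {
    "mouth_curve": (-1.0, 1.0),   # -1.0 = frown, +1.0 = smile
    "eye_size": (0.5, 1.5),       # relative eye radius
    "brow_slant": (-30.0, 30.0),  # eyebrow angle in degrees
}


def normalize(values):
    """Min-max scale a column to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def encode_faces(rows, mapping):
    """Turn data rows into face-parameter dicts.

    rows:    list of dicts, variable name -> numeric value
    mapping: variable name -> (feature name, invert flag)
    """
    scaled = {var: normalize([row[var] for row in rows]) for var in mapping}
    faces = []
    for i in range(len(rows)):
        face = {}
        for var, (feature, invert) in mapping.items():
            t = scaled[var][i]
            if invert:
                t = 1.0 - t  # flip which end of the variable gets the "happy" value
            lo, hi = FEATURE_RANGES[feature]
            face[feature] = lo + t * (hi - lo)
        faces.append(face)
    return faces


# Toy values for illustration only (not the data behind any real map).
rows = [{"unemployment": 2.0, "affluence": 80.0},
        {"unemployment": 12.0, "affluence": 20.0}]
faces = encode_faces(rows, {"unemployment": ("mouth_curve", True),
                            "affluence": ("eye_size", False)})
# faces == [{"mouth_curve": 1.0, "eye_size": 1.5},
#           {"mouth_curve": -1.0, "eye_size": 0.5}]
```

Here low unemployment was deliberately inverted to produce a smile. Dropping that flag would make the same data draw a frown, which is exactly the kind of value judgement a bar height never carries.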

There is a very famous example of this in "Life in Los Angeles" (1977) by Eugene Turner [1]. Maybe you can infer the data well, but in the end this just ends up being a map of angry black people. The choice of features and how to visualize them is clearly racist.

[1] https://mapdesign.icaci.org/2014/12/mapcarte-353365-life-in-...


That map colored black people dark like their skin, and encoded misery as unhappy faces.

The result accurately showed happy white people and unhappy black people. How is it racist to acknowledge the racially biased distribution of suffering?

I was once involved with the development of a system that was trying to show a required minimum diversity in hiring. Quite by accident, this ended up being an outline of a person that was filled in further as diversity improved, and both the outline and the fill happened to be black. It took a while for this to be noticed, and I think they eventually swapped it out for a small pie chart instead.

See the legend. Things are ranked from "good" to "bad" (urban stress, unemployment), and the proportion of white population is right there in parallel, screaming "White"=="Good".

There are a number of other, lesser reasons, including conflating regions of high urban stress with "Evil" or "Angry" instead of just unhappy[0]. The face visualization simply implies too much of a value judgement on the data -- too correlated with the issue (yet misrepresentative) to be a good idea imo.

[0]: Even happy/unhappy may not reflect this variable well at all (again, because that's a complicated human emotion, not a simple function of "Health crime and transportation factors").

Edit: I also don't think it's necessary to call the map creator (or maybe the map itself) racist; that implies some kind of intentional discrimination (and is quite strong imo). But to me it does have the serious problems mentioned above.

Yeah, I am also having a really hard time reading this visualization as perpetuating any stereotypes - assuming ofc that the data itself is correct. It would indeed be perpetuating them if it were not the case that urban stress is disproportionately felt in the Black population - but even proponents of racial justice are quite clear that this is the case in many American cities.

The legend uses the words "Low/High" and not "Bad/Good." It's a quantitative measure, not a moral/aesthetic judgment.

The conclusion of this graphic would be - There is more stress in South Central LA, which has a higher proportion of Black Americans. Or possibly, the least stress is towards the West, where there are relatively fewer Blacks.

I guess if you are concerned it will read as, "In South Central LA, there are angry black people, don't go there!" - that leap of faith will be made regardless of how you visualize the data. "Poor people are naturally lazy/violent/immoral," is a stereotype that has existed for far longer than any widely accepted attempt at data visualization.

> The legend uses the words "Low/High" and not "Bad/Good." It's a quantitative measure, not a moral/aesthetic judgment.

This is definitely not true -- yes, it uses Low/High, but Low/High means different things for different variables -- and "Good" is always on top. "Low" unemployment is on top while "High" affluence is also on top (i.e. the ordering is not purely numeric). That should be obvious, because the positive emotions are being associated with the positive ("good") variable ranges.

I mean, if a quantitative measurement were the main focus here surely we shouldn't be using faces which are known to be loaded with emotions.

You did not seem to address my main concern, which is specifically that low white percentages are portrayed as negative in the legend. (I'm trying to avoid the Motte/Bailey here) -- do you disagree?

> Poor people are naturally lazy/violent/immoral

I think it's more excusable to equate "Poor"=="Bad" than "Certain race"=="Bad", because it's generally accepted that being poor is undesirable, while you can't change your ethnic background (and generally I don't think you should).

I think I see now why someone else asked, "Would it be okay if the legend was horizontal?" Because then you don't have the "on top/below" connotation. Is that what you mean by "portrayed as negative", because they are visually lower?

Re how faces are emotionally loaded, that is somewhat the point of using Chernoff faces as a visualization method, though I understand your concern that therefore it SHOULD not be used as such a method because it will convey ideas not implied in the data because of how we interpret faces.

Fundamentally though, to go to this data set in particular and away from the merit or otherwise of Chernoff faces, there's always going to be a tension to depicting the correlation between race and wealth in the US. One way or the other, you have to say the same thing - Blacks are poorer; and/or Blacks have lower factors of general well-being (though somewhat unintuitively, not lower levels of hopefulness.) And you can't avoid the fact that if you have a data visualization that really brings home the correlation then someone is bound to assume causation in the wrong direction and feel the data validates their racist feelings.

But that doesn't make the attempt at using that visualization a racist one.

So you would be ok if the row order of the legend were randomized?

I would find it better if the proportion white population were placed horizontally or elsewhere entirely.

The other comment noted those metrics are "objective" and only "High"/"Low", but in all cases the least desirable situation (i.e. "bad") is the lowest. I read charts and benchmarks all the time, and a usual cognitive shortcut is to look for good/bad (e.g. you see a decreasing graph -- if it's latency, "low"=="good"; if it's profits, "low"=="bad"), so I'd be surprised if people didn't make similar quick assessments. Note that a high white percentage is also on top, and up usually means good too (profits, growth, etc.).

I guess no representation is perfect, but I think at least ethnicity could/should be separated here.

> The idea behind Chernoff faces (or using faces for data visualization) seems good: we humans are very good at distinguishing faces

Except that:

1. We humans are actually _ABYSMAL_ at distinguishing faces (https://en.wikipedia.org/wiki/Cross-race_effect)

2. The ability to differentiate between two things and the ability to translate attributes into metrics are fundamentally so different from each other that any possible truth to the idea instantly becomes wildly irrelevant. https://eagereyes.org/criticism/chernoff-faces

One case of being imperfect isn't "abysmal". Cross-race face discrimination is still far more accurate than cross-species body discrimination, or wood grain discrimination, or many other things. Brain regions dedicated to facial recognition have been identified.

> Turner does a good job of building a facial profile out of social conditions and ethnicity. It’s a simple map but one that characterises the spatial structure of socio-economic life in Los Angeles. It’s also a provocative and arresting image and one which is difficult to hide from.

That's a very successful application if you ask me. It shows unhappy black people. They are displayed as unhappy because they are unhappy. The focus should be on how to make them happy, not 'racism'. It is displaying the effects of racism for all to see.

There is an implementation of Chernoff faces where a fish is used instead of a human face, so it is called Chernoff fish:


It is implemented in D3 and React and the source code is here:


I implemented my own version long ago (for MS-DOS) and I am quite surprised that there is still some interest in the topic.

Super cool! Also, a character visualizes data this way in Watts' "Blindsight" novel.

Yep, that was my introduction to the concept.

Blindsight is wonderful. A bit difficult to follow at times due to the 'unique' writing style. In a single novel it tackles uploading consciousness, vampires, artificial intelligence, aliens, psychology (heavily) and a few more things.

I've yet to find a better description of what an actual alien lifeform would be like. Even if it leans a bit toward marine biology given the author's background, the oceans host the most 'alien' environments we know of that still contain life.

I still can't wrap my head around the concept of the Icarus Array though.

Anyway, for the uninitiated: https://rifters.com/real/Blindsight.htm

Came here to say this! That creepy, cool, Chernoff-face-using vampire captain. So cool.

Given that humans were prey for vampires, the variables selected by the captain showed faces in different states of anguish, which the predator was well-equipped to detect. So it is even creepier.

Chernoff faces are an old idea that has become more of a joke (eg https://dl.acm.org/doi/pdf/10.1145/3170427.3188398 ). The problem is that they are weird enough to occasionally lure in well-intentioned people who are new to datavis. Encoding abstract data as faces results in unpredictable, unrelated, gross visualizations that are difficult to read. The only good one I have seen is https://projects.propublica.org/graphics/workers-compensatio... and it’s creepy.

It kind of reminds me of the genre of study material in Japan that personifies everything as a cartoon character. Here is the periodic table as little figures https://resemom.jp/article/img/2017/03/16/37127/162618.html

and this book does it as manga girls.


One of the weirdest research projects I ever got involved in was 'Real-Time Feedback System for Monitoring and Facilitating Discussions'. Using a game engine as the platform, it used dance-off moves to visualize social interaction within a discussion.

One quote in the paper, taken from 'The Craft of Information Visualization' (Bederson and Shneiderman): 'Humans can recognise the spatial configuration of elements in a picture and notice relationships among elements quickly.'


this is a very interesting idea, but I think using gender and skin tone to represent data differences is potentially problematic, particularly if the data has any sort of normative meaning. the other variables are interesting though

I think this post is meant to be a joke.

It’s not funny.

It's not a joke about data or race or sex, and it doesn't make any claims about data or race or sex, and it doesn't signal anything about data or race or sex either intentionally or accidentally, and it doesn't do those things because it's demonstrably satire about how chernoff faces are a terrible idea. See the first footnote on the word "favorite" and the second footnote on the word "Clearly".

I think your contempt is a serious misfire.

I’ve got to pull the alarm on this one. This is a huge liability to objectify and promulgate prejudice. It’s literally teaching people to associate good or bad distinctions with specific faces or facial features. How is that not objectifying? I think this should be treated highly suspiciously and probably not done at all.

Edit: I see people may take this as a joke. Sorry, not funny. Especially not right now.

You have to be joking. This is obviously one programmer's experiment on some esoteric data visualization mode that nobody ever used. The author even says in a footnote that:

> "One of my favorite¹ concepts for multi-dimensional data visualization is the Chernoff Face"

> ¹ "Favorite" might be code for "useless," going with the theme of this blog

Don't act like this project will have actual real-world uses and ramifications. No, it won't literally teach people to associate good or bad qualities with facial features, because nobody will ever use it for practical purposes. No, there's no liability to promote prejudice; this is just some programmer's ML side project.

> How is that not objectifying?

It literally is. The generated faces are mathematical objects. They're not real people.

In the faces I saw there was a clear connection between race and facial expression of happiness or sadness.

So? That just means that two variables are closely related. You have to be very blind and ignorant to take the implication that white people are less happy, or something. Is it racist to take a photo of a sad white person?

Your comment is pretty alarmist, but I don't find the probability of this article (or Chernoff faces in general) having a negative impact on racial relations very high. Maybe you can convince me otherwise, but I think it's just an interesting, if not very useful, way to display data. I'll leave it up to you to come up with a compelling (plausible) scenario in which this negatively impacts anyone.

It exploits the fact that we are used to looking at faces as a way to display multivariate categorical variables. Not all such dimensions are good/bad dichotomies. Deeming a visualization technique evil because it can be used in poor taste may seem clever, but ultimately it will leave us with no visualization techniques at all.

it's definitely a joke. nobody uses chernoff faces

> Especially not right now.

Right now what? Sounds like you're jumping at shadows. In general, false positives are good for finding errors in systems. However, they can go too far and lead to pathological patterns, like autoimmune reactions in the human immune system.
