Take this graph:
It looks like women are rated as more than twice as smart as men. Huge difference.
Except it doesn't hold up once you run the numbers. Women are rated about 4.3% "smarter" than men. Not twice as smart, as the graph implies. Not 20% smarter. Not even 5%.
Please, pay attention to your graphs. They're great tools, but they can mislead as much as they can help elucidate.
The first change I would make is to put all the red bars next to each other, and the blue bars next to each other; there is more value in comparing across ethnicities/genders than there is in comparing across variables. The second change I would make is to get rid of the eye-popping primary colors and choose two more neutral ones (two different shades of beige, maybe?).
Finally, the Y-axis scale changes too frequently. Also, error bars would be nice, as has been mentioned.
edit: Now that I think more about it, at a glance it would be nice to show a scatter plot with a dot for each of Male/Female/Ethnicity/etc in the way that you have a scatter plot on your home page. http://judg.me
Not just nice but absolutely crucial. If you're looking at differences of 2%, you need tens of thousands of samples to say anything definitive. He's only got 1000 photos total, so I'm pretty sure most of this article is just an analysis of random noise.
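To put rough numbers on that claim, here is a back-of-envelope sample-size sketch using the standard normal-approximation formula for comparing two means. The standard deviation and the difference to detect are assumptions I picked for illustration, not figures from the article:

```python
import math

# Rough sample size needed PER GROUP to detect a small difference in mean
# ratings, via the normal-approximation formula:
#   n = 2 * (z_alpha/2 + z_beta)^2 * sigma^2 / delta^2
# sigma and delta below are assumed values, not from the article.
Z_ALPHA = 1.96   # two-sided 5% significance
Z_BETA = 0.84    # 80% power
sigma = 2.0      # assumed SD of ratings on a 1-10 scale
delta = 0.2      # difference worth detecting (~2% of the scale)

n_per_group = 2 * (Z_ALPHA + Z_BETA) ** 2 * sigma ** 2 / delta ** 2
print(math.ceil(n_per_group))  # ~1568 per group
```

That is already more than the 1000 photos total, per group, before you start slicing by ethnicity × gender × hair length; with many small cells, the required totals climb fast.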
Also, maybe I'm just dense, but it took me a REALLY long time to wrap my head around the scoring matrix of the website, and even after a few practice rounds, I still find myself having to think painfully hard about what each quadrant means for each photo. I have to imagine this adds even more noise to the data, if people's heads are hurting and they don't know where to click.
One suggestion would be to provide an intuitive guide for each of the quadrants, not just the axes, e.g. "Outgoing idiot," "Shy nerd" etc. but of course you risk biasing the responses based on the language you use. The other option would be to simplify the input, only rating one variable at a time.
Pick another 1000 samples, redo the blog post and see if there is anything in common (not that that would make it statistically significant).
Then maybe there isn't a real difference. See statistical significance.
In the future, if you are making a graph like this, and the differences are so small that they don't show up when your y-axis starts at 0, they probably aren't important enough to talk about.
If you've taken a stats class, you probably know there are times when small differences are meaningful. Any time you're analyzing your own data, though, these differences are rarely worth even considering. Taking a survey and analyzing the resulting data is very hard. There are countless nuances that can affect your results, both mathematical and psychological. Look at political polling - even the people who do it for a living regularly get things wrong.
That's the point: the differences are small.
Second, there is a huge problem with causality here. For instance, the author writes: "Be Asian if you want to appear smart; Latino if you want to appear extroverted." The problem is a methodological flaw. On the first photo I saw on judge.me, I was presented with this image: http://images.judg.me/82e7fcbd988dbdcac0d00bd53fb93e96.jpg This appears to me to be a Latino or Hispanic male at a party. I'm highly inclined to rate him highly on the extrovert scale: he's at a party. But that doesn't mean stereotypically Latino or Hispanic features indicate extroversion. It could be that people with stereotypically Latino or Hispanic features were more likely to upload photos portraying a more stereotypically extroverted activity.
Third, it appears that users can upload a photo to the site and see their feedback from votes. It seems highly possible that users self-select a photo that will best affirm the image of themselves they wish to cultivate. In that respect, there's both a huge confirmation bias and a huge self-selection bias. If I want to think of myself as an academic, I'll upload a picture of me at my desk studying and watch the "intellectual" ratings pour in. Then I can feel assured that other people perceive me the way I want to be perceived. Additionally, if people want to conform to social expectations (and things like Asch's line test http://en.wikipedia.org/wiki/Asch_conformity_experiments indicate conformity is common), this data might really show nothing more than the degree to which people post photos affirming their conformity to social expectations (i.e. 'smart' ethnicities posting 'smart-looking' photos) and say nothing at all about how people actually perceive ethnic cues.
There are huge methodological concerns with this 'study'. Instead, the real revelation of this data might be the insight that "pictures of yourself at social events make you look more social." Taking much of anything at all away from this data set would be rather unwise.
The problem is that none of this matters, since we don't know anything about the people rating the photos. Not their sex, not their age, not their location, nothing.
"and nothing about users who judge the photos."
Which is especially problematic since user-generated ratings are ordinal data, not interval data. Since the idea of an interval between points in ordinal data is essentially meaningless, the summary statistics you mentioned are not meaningful either.
It's one thing for Amazon to come up with a mean user rating to give you a sense of how people like something, but it's not a valid method of comparing the data we have here, especially when the differences are so small.
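For ordinal ratings, a rank-based comparison is more defensible than comparing means. A minimal sketch of the Mann-Whitney U statistic, with made-up ratings (none of this is the article's data):

```python
def mann_whitney_u(a, b):
    """U statistic: number of (a, b) pairs where a beats b; ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Invented ordinal ratings for two groups of photos
group_a = [3, 4, 5, 5, 4]
group_b = [3, 3, 4, 2, 4]

u = mann_whitney_u(group_a, group_b)
# Common-language effect size: P(random rating from A beats one from B)
p_superiority = u / (len(group_a) * len(group_b))
```

This only uses the ordering of ratings, never the (meaningless) distances between them, which is the whole point for ordinal data.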
E.g. in a model that predicts students' GPA, you could divide your data into a hierarchy consisting of, at the highest level, geographic area, followed by high school, maybe followed by teacher. In that model, the correlation between students who are in the same state, the same school or in the same classroom would be accounted for. You could even go as deep as at an individual level if you have >1 observation per student.
In addition to regular predictive variables, judg.me could probably use their weblogs to group people's judgement scores by country of origin and by individuals, among other possibilities.
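As a toy version of that grouping idea, here is a sketch that removes each judge's personal baseline before comparing photo groups. The tuples and field names are hypothetical, and a real analysis would use a proper mixed-effects model; this just shows the mechanics of accounting for who did the rating:

```python
from collections import defaultdict

# Hypothetical (judge_id, photo_group, rating) tuples -- not real judg.me data
ratings = [
    ("j1", "A", 8), ("j1", "B", 6),
    ("j2", "A", 5), ("j2", "B", 3),
]

# Per-judge mean rating (a crude stand-in for a per-rater random intercept)
by_judge = defaultdict(list)
for judge, _, score in ratings:
    by_judge[judge].append(score)
judge_mean = {j: sum(v) / len(v) for j, v in by_judge.items()}

# Center each rating by its judge's mean, then compare groups
by_group = defaultdict(list)
for judge, group, score in ratings:
    by_group[group].append(score - judge_mean[judge])
group_effect = {g: sum(v) / len(v) for g, v in by_group.items()}
```

Note that j1 is simply a generous rater (mean 7) and j2 a harsh one (mean 4); centering strips that out so the remaining group difference isn't an artifact of which judges happened to rate which photos.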
You should also check intra-coder reliability, i.e. give the same person the same set of photos with two weeks or so between rounds. You can then again calculate the correlation. This tells you whether your categories are too fuzzy (e.g. what exactly is medium-length hair?).
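That reliability check is just a correlation between the two rounds of ratings. A minimal sketch with invented numbers (the two rating lists are hypothetical):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical: same coder, same 6 photos, rated two weeks apart
round1 = [7, 5, 8, 3, 6, 4]
round2 = [6, 5, 8, 2, 7, 4]
r = pearson(round1, round2)  # close to 1.0 means the categories are stable
```

A low r here would mean the coder can't even agree with themselves, so the categories (and any results built on them) are too fuzzy to trust.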
All in all this has serious methodological flaws; from a social science perspective it's not salvageable, and I haven't even talked about the complete lack of statistical tests (which, to be honest, would just be like polishing a turd).
If you didn't plan to use gross ratings like this blog did (I think), then I'm pretty sure you could do a post-normalization by analyzing the frequencies in the sample, determining how much you'd expect each trait to affect the rating for every other trait, and then trying to determine whether the deviations from that were statistically significant in a universe that contained only those traits.
Honestly - just take the original data and assign every trait a 5 rating, then pick a random trait and pull that value up or down, then check and see what the gross ratings now say about the other traits.
I apologize if the methodology was more complicated than it looks, and I hope there's a link to a spreadsheet of the original distribution somewhere in the blog that I missed, so someone could make sense of this data.
Who are the users that are judging? What is the breakdown of those users (age, sex, location, education, etc.)? What can possibly be inferred from this without knowing that info?
The entire premise of the site is for the user to be judged by strangers. Why would age/sex/location/education of the person doing the judging matter?
Not trying to be overly critical - I like the concept and execution, but these things are really important in statistics.
Unless you're very careful with adjusting for factors (like age, location, etc) and careful with statistics you end up with garbage.
Either do something that's just purely fun, or do something properly and call it research. But don't do something wrong and call it research, because there are so many people ignorant about how science works and ignorant about numbers.
It can bias the results. How did your users find your site?
I am disappointed. I was recently thinking about how people are judged based on looks (and blogged about it) so was hoping for/looking forward to something meatier.
It is an interesting study so I hope they update the post once they have been in business longer.
I have a statistics final exam coming up; if the complete data set were available, I'd love to play with it for practice.
If most black women who sent in their photos are fat, and people don't rate fat women highly, then black women will be rated low not because of race but because being a black woman and being a fat woman are correlated in the sample data.
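A toy simulation makes the confounding concrete. All numbers below are invented: ratings depend only on one attribute ("heavy"), never on group membership, yet slicing by group shows a difference anyway because the attributes co-occur:

```python
import random

random.seed(0)

# Toy data: raters penalize "heavy" photos only, never the group itself.
# Group "X" photos just happen to be "heavy" more often in this sample.
def make_photo(group):
    heavy = random.random() < (0.7 if group == "X" else 0.3)
    rating = 4.0 if heavy else 7.0  # rating driven entirely by weight
    return group, rating

photos = [make_photo("X") for _ in range(500)] + \
         [make_photo("Y") for _ in range(500)]

def group_mean(g):
    vals = [r for grp, r in photos if grp == g]
    return sum(vals) / len(vals)

# Group X scores lower even though raters never considered the group at all
print(group_mean("X"), group_mean("Y"))
```

Slicing the ratings by group "discovers" a group effect that does not exist; the only real effect is the correlated attribute.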
Owners of such sites have large samples of some data, and they assume that large equals representative; then they go on slicing their data by different parameters, not controlling for anything, and making statements that are only technically true with respect to their data but strongly misleading in many ways.
Similarly, being introverted doesn't mean you have low social skills.
I don't know how you can say that.
The definition here: http://dictionary.reference.com/browse/introvert?s=t
implies low social skills.
Even when people discuss it here, they talk about wanting to be left alone, not going to parties because they aren't interested in socializing, and feeling "weak" after socializing for a short amount of time.
With all of this, I don't know how your social skills could ever be considered high.
Social skills are the ability to make conversation, to make people feel comfortable, to engage others, etc.
Extroversion is the desire to do those things.
Yes, they are clearly correlated, but by absolutely no means interchangeable. There are plenty of introverts who have perfectly decent social skills, and plenty of extroverts who have terrible social skills, because the desire for something is not the same as having a skill for it. For example, an adolescent extrovert may feel strong motivation to be outgoing, loud, talkative, and active, but that doesn't mean he will have good social skills and be polite.
The most common psychological definition of extrovert and introvert that we currently use was largely shaped by Carl Jung, and his definition involves far more than just sociability. However, sociability is the easiest application. What it really comes down to (as the link you provided points out) is a focus on the "external world" vs the "internal world", so I use the word "are" in those two definitions above very loosely.
It also has a definition used in psychology, which is completely different from the term as used colloquially.
I know you mention a random sample of 1000 images, but what were your overall metrics? Did you have a good data set across the board (i.e. as many Hispanic females as Caucasian males)? What kind of advertising did you do as well?
The reason I ask is that I've been working on trying to build a face-morpher based on different criteria (making you look 80, fat, African), and these are some of the questions I've got bouncing around in my head about how to collect the data.
The law of large numbers says that your error should scale like 1/sqrt(N), where N is the sample size. In this case N = 1000, so 1/sqrt(N) ≈ 3%.
That's 1 standard deviation (68% of values lie within about ±3% of the reported value). To be on the safe side you should take 2 or 3 standard deviations for the error bars. That already nullifies most of the results!
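The arithmetic behind that, as a sketch. Treating the rating's spread as comparable to its scale (an assumption, matching the comment's rough 1/sqrt(N) reasoning):

```python
import math

N = 1000                 # total sample size from the article
se = 1 / math.sqrt(N)    # ~0.032, i.e. roughly 3% relative error at 1 SD
two_sigma = 2 * se       # ~6.3% for more conservative error bars
```

With 2-sigma error bars of about ±6%, a reported difference of 4.3% is well inside the noise.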
The actual site rates extroversion vs introversion, but the analysis here mistakenly uses the term social scale, implying that extroversion and sociability are interchangeable. They are correlated, but by absolutely no means are they interchangeable. This analysis should have stuck with the original vocabulary more consistently.
It is hard to determine significance from these graphs, especially as pflats commented that the y-axes are skewed.
But I wonder how many long-haired men were in the sample. They are quite rare, and a few ponytailed grad students might lower the score.
Best of luck.
Sorry but that is a turn-off to me.
It's pretend deference and, to me, mildly sexist. He said men but not 'gentlemen'.
In other words be a rugged, outdoorsy, all-american white guy.
Pretty sure this is only confirming what was already common knowledge.