People of class drink alcohol (discovermagazine.com)
89 points by iamelgringo on May 3, 2010 | 60 comments

Please note that some of the graphs there start with 50% as original y-axis. It creates an appearance of males drinking twice as much as females, for example.

I guess if you're the kind of person who likes looking at bars instead of reading the text that gives them meaning as well.

I think a floor set at a number like 0.5 and with a linear scale is pretty reasonable for readability, unless you're a lout and ignore those little letters that tell you what things actually are.

WTF is the point of a graph if you have to read the fine print to avoid misinterpreting it? Why not just use a table? A graph is supposed to convey the relative magnitudes of the data points, not just their ranking. If I just wanted to show who drank more than whom I would just list them in order.

Looking at these graphs you'd think that: Almost nobody in the south drinks; The vast majority of blacks do not drink; Catholics are twice as likely to drink as Protestants.

And so on, when in fact these are all false conclusions. The real conclusion, which you would easily get from properly scaled graphs, is that the majority of people drink across almost all demographic groups.

If you blame me for drawing false conclusions from such graphs I blame the shitty fucking graphs for being misleading. I mean, they don't all even start at 50%. One starts at 40% and one at 0%. Seriously, these graphs are garbage.

Er, so all graphs of temperature should start at absolute zero? It's absurd to expect all graphs to start at zero -- the point of a bar graph is to show relative comparisons, and the whole point of the labels on the axes is to relate the graph to an absolute scale. You even admit this, "[a] graph is supposed to convey the relative magnitudes of the data points" -- Exactly, and so the graph should be scaled in a way that efficiently conveys the relative information. I apologize for the ad hominem, but this is just serious PEBKAC: you need to learn to read graphs. The mistaken conclusions you arrived at could have been avoided if you paid attention to the labels.

Yes, a poorly made graph can make a tiny effect look huge, e.g. if it was the case that 99% of people drank, and the graph was just plotting variation within the remaining 1%, that could be pretty misleading. This is yet another reason that the consumers of information should pay attention to the labels and understand what the real message is from the data. Don't expect the creators of the graph to present the data in the "fairest" way (whatever that is); instead it's our responsibility to think critically and make sure we understand what's really going on. The creator of the graph will naturally act to advance his own interests.

tl;dr: It's a mistake to consider the labels to be "fine print".

You are both missing the forest for the trees. Graph axes should scale in proportion to observed variance of the statistic being shown, automatically creating a "natural scale". The magnitude of the scale is of course proportional to the error in the measurement(s) you are showing, it has nothing to do with absolute numbers.

"The creator of the graph will naturally act to advance his own interests" -> "The biased creator of a graph with too much riding on his hypothesis will surreptitiously act to advance his own interests". People who do this are not worth listening to.

Fully agreed/conceded. I was addressing the graph consumer's point of view. There are definitely good practices the graph creator can follow to most effectively/honestly convey the information contained in the data.

When I see a graph that starts at some random point on the Y axis I assume the person is extremely biased and instantly ignore what they are saying. If I don't see error bars on data points I assume they have zero idea how accurate their information is.

Unfortunately I suspect that this approach is less common among the undereducated, because trying to create misleading charts is very common.

You make a good point about temperature, but I stand by my critique because temperature is a quantity and these graphs are all plotting percentages, which should always be graphed from 0% to 100%. If your percentages are all within the range of 99% to 100% then you shouldn't be plotting them as percentages, and if you did then your graph would effectively convey the point that there really isn't a lot of variation among the data.

I'll reiterate my opinion that the purpose of a bar chart is not to give relative orderings (if you want that just list them in order) but to illustrate the magnitude of the differences between data points with respect to the natural range of the data. Percentages have a natural range of values: 0-100. Temperature does not have a natural fixed range, unless we're talking about the weather which is incidentally why I like Fahrenheit.

> with respect to the natural range of the data.

There's hardly ever such a thing as a "natural" range. Even if there is, as in the case of percentages, what's the point of wasting half of the graph on variation that has a mundane explanation? Like if you're plotting popularity of religions in the USA, do you really want to spend about 90% of the vertical space of your graph on Christianity? The point of a graph is to shed light on the part of the variation that doesn't have a mundane explanation -- i.e., the interesting part.

What you should be complaining about are graphs that don't have labels, as used by some drug companies in advertisements[1]. Maybe that "graphs without labels are bad" meme has been misinterpreted here.

1. In the USA drug companies are permitted to advertise prescription medications.

> There's hardly ever such a thing as a "natural" range.

Another good point, but if bar A is twice as tall as bar B then variable A's value had better be twice that of B's. Anything else is precisely the kind of shady drug-company data manipulation that you're so concerned about. If vertical space is at an absolute premium (not the case here IMO) and you have to cut the bars then draw little cut marks on them to let the reader know that the bottoms aren't at zero.

(Of course I'm talking about cases when there is a familiar concept of zero, like percentages or money or population, not temperature. And if the value goes negative then the bar grows down, cf. any recent chart of US economic indicators.)

If you don't base your bars at zero then your bar chart exaggerates differences between data because the reader looks at two bars and compares their relative sizes the same way he looks at the levels in two glasses of beer to compare their volumes.
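
To put a number on that exaggeration, here is a minimal sketch. The values 70 and 60 and the baseline of 50 are illustrative, and `apparent_ratio` is a made-up helper name, not anything from the article.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights when both bars start at `baseline`."""
    if b == baseline:
        raise ValueError("bar b has zero drawn height, so the ratio is undefined")
    return (a - baseline) / (b - baseline)


# Bars based at zero: drawn heights are proportional to the values.
true_ratio = apparent_ratio(70, 60)
# Bars based at 50: only the part above 50 is drawn.
drawn_ratio = apparent_ratio(70, 60, baseline=50)

print(f"ratio of values:      {true_ratio:.2f}")
print(f"ratio of drawn bars:  {drawn_ratio:.2f}")
```

With these numbers the values differ by about a sixth, while the drawn bars differ by a factor of two.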

You're contradicting yourself again. If there's no "natural" scale, how do we decide when something is "twice" as big as another? How do you decide which scale is "natural" anyway? And it's still pointless to spend most of the vertical space on redundant mundane information when it could be used to actually show the relationships between the groups being graphed.

Overall you're introducing ad hoc and nebulous special cases that are poorly justified, like the notion of a 'familiar zero'. The real solution is to learn to properly read graphs, and to ignore graphs that omit labels.

> like percentages or money or population

If you're plotting the net worth of the top 10 richest people on Earth, do you really want to use a miles-long graph or drown out all the actual information by just showing that the top 10 richest are within one pixel-worth-of-dollars of each other?

We decide X is twice as big as Y when X = 2Y. This is not up for debate.

And re. net worth: That's exactly how I would plot it if I wanted to make a point about income inequality. Carlos Slim is worth US$53 billion. The per capita income in Mexico is US$13,500 and in the US it's $46,000. (Yes I know this is comparing yearly income to lifetime accumulated wealth but ignore that for a moment.) If I give the average Mexican 1 pixel and the average American 3 or 4 pixels, then Mr. Slim gets almost 4 million pixels. At 96 dpi that's roughly 3400 feet. Your monitor would have to be (much) taller than the Burj Khalifa (2717 ft) to accurately show this bar chart. This is the picture I would draw if I wanted to make a point about income inequality, a person sitting next to the Burj Khalifa with a ridiculous laptop in his lap that towers over it.
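
For anyone who wants to check the pixel arithmetic, a quick sketch follows. All dollar figures are the ones quoted in the comment and are not independently verified; the constant and function names are invented for illustration.

```python
# Figures as quoted in the comment above.
SLIM_NET_WORTH_USD = 53e9
MEXICO_INCOME_USD = 13_500
US_INCOME_USD = 46_000
BURJ_KHALIFA_FT = 2717

# Scale chosen in the comment: one pixel per average Mexican income, 96 dpi.
USD_PER_PIXEL = MEXICO_INCOME_USD
PIXELS_PER_INCH = 96
INCHES_PER_FOOT = 12


def bar_height_px(value_usd: float) -> float:
    """Bar height in pixels for a dollar amount at the chosen scale."""
    return value_usd / USD_PER_PIXEL


def px_to_feet(pixels: float) -> float:
    """Physical height of a run of pixels on a 96 dpi display."""
    return pixels / PIXELS_PER_INCH / INCHES_PER_FOOT


slim_px = bar_height_px(SLIM_NET_WORTH_USD)
slim_ft = px_to_feet(slim_px)

print(f"average American: {bar_height_px(US_INCOME_USD):.1f} px")
print(f"Carlos Slim:      {slim_px:,.0f} px, about {slim_ft:,.0f} ft")
print(f"taller than the Burj Khalifa: {slim_ft > BURJ_KHALIFA_FT}")
```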

If I just wanted to rank the top 10, I would just list them in order like they do in Forbes.

> We decide X is twice as big as Y when X = 2Y. This is not up for debate.

Yes, but doubled relative to what? e.g. what's twice as big as 50F? Is it 100F or 559.67F? The first one is relative to 0F and the second is relative to absolute zero (50F is 283.15K).

Your point about how to illustrate income inequality is fine, but that's not at all what I was talking about: I was talking about a graph that depicts the distribution of net worth amongst the top 10 richest people. No, a rank order doesn't fully convey that. A table of numbers is hard to understand in a holistic way.

There's a popular, if a little contested, notion of scales that I won't source right now that divides scales into ordered, interval, ratio and absolute depending on things like whether there is such a thing as "twice as big". Fahrenheit is an interval scale, which is why we have Rankine (ratio).

Twice as hot would, in a physics sense, be 1019 °R (560 °F) because SI temperature uses the ratio scale. Fahrenheit is then just syntactic sugar for people who don't care about the physical temperature as much as they care about the range of their comfort zone. When measuring deviation from 0 °F, Fahrenheit also becomes a ratio scale, but not a very useful one (deviation from 50 °F would be better, or you could just use Celsius).
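
The two readings of "twice 50 °F" can be worked out in a few lines. This is only a sketch of the arithmetic; the function names are made up, and 459.67 is the standard offset between Fahrenheit and Rankine.

```python
RANKINE_OFFSET = 459.67  # degrees Rankine = degrees Fahrenheit + 459.67


def double_relative_to_zero_f(temp_f: float) -> float:
    """Double the reading itself, treating 0 °F as the zero point."""
    return 2 * temp_f


def double_relative_to_absolute_zero(temp_f: float) -> float:
    """Double the absolute (Rankine) temperature, then convert back to °F."""
    rankine = temp_f + RANKINE_OFFSET
    return 2 * rankine - RANKINE_OFFSET


temp_f = 50.0
print(f"doubled relative to 0 °F:          {double_relative_to_zero_f(temp_f):.2f} °F")
print(f"doubled relative to absolute zero: {double_relative_to_absolute_zero(temp_f):.2f} °F")
print(f"same thing in Rankine:             {2 * (temp_f + RANKINE_OFFSET):.2f} °R")
```

The answer depends entirely on where zero sits, which is the interval-versus-ratio distinction.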

So the real question is, what do you mean "twice as big"? Twice as much warmer than "pretty cold", or twice as energetic?

As for the 10 richest people, I'm not sure they're that near each other, but assuming they are, that's going to be the important bit. If you don't want to focus on that, you have a few options. You could draw the cutmarks. You could make it a zoom lens picture. Or you could establish a baseline, such as the highest income tax bracket or the median top 10 income, offset all incomes appropriately, and draw the bar graph from that. In the latter case, you should draw some bars as negative, and label the baseline appropriately (not 0). I don't think it would make for a good graph.

However, for a distribution, absolutely no offsets allowed.

No, you would use a logarithmic graph for income.

You always chart money starting from zero.

In the case of billionaires you are looking at an estimate of their net worth so while you can chart Carlos Slim Helu & family at 53.5B and William Gates III at 53.0B if you start the chart at 50 billion your error bars would take up 50% of the chart's vertical space.

And that's the major reason why trying to create a chart that exaggerates differences is such a bad idea. If your error bars are 5% and the gap is 1% then pretending there is a significant gap is stupid.
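
A rough sketch of that check, using the two net worth estimates quoted above. Treating the 5% figure as a symmetric relative error on each estimate is an assumption made here for illustration, and the helper names are invented.

```python
def interval(value: float, rel_error: float) -> tuple[float, float]:
    """Lower and upper bounds of an estimate with a symmetric relative error."""
    return (value * (1 - rel_error), value * (1 + rel_error))


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Net worth estimates in billions of USD, as quoted in the comment.
slim = interval(53.5, 0.05)
gates = interval(53.0, 0.05)

gap_pct = (53.5 - 53.0) / 53.5 * 100
print(f"gap between the estimates: {gap_pct:.1f}%")
print(f"Slim range:  {slim[0]:.2f} to {slim[1]:.2f}")
print(f"Gates range: {gates[0]:.2f} to {gates[1]:.2f}")
print(f"ranges overlap: {overlaps(slim, gates)}")
```

When the ranges overlap this heavily, a chart that zooms in on the gap is mostly zooming in on noise.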

Ok, let's plot uptime on a scale of 0-100%. 99% uptime, 99.9% uptime, 99.9999% uptime, it's all just short of 100%.

Similarly, I'll go to my boss and plot success rates of a certain operation (can't say what due to confidentiality) which usually fails, but occasionally gives us a big win (success rate 0-3%). I'll be sure to plot on a range of 0-100% so that he knows we almost always fail. I certainly don't want to cheat and make him think my improvements of 1% -> 1.5% are a big deal.

In the case of uptime I'd use an inverse logarithmic plot of the amount of downtime.
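
One way to read "inverse logarithmic plot of downtime" is to plot the negative base-10 logarithm of the downtime fraction, which is the familiar "number of nines". That reading is an assumption, and `nines` is a made-up helper name.

```python
import math


def nines(uptime_percent: float) -> float:
    """Negative log10 of the downtime fraction, i.e. the 'number of nines'."""
    downtime_fraction = 1.0 - uptime_percent / 100.0
    if downtime_fraction <= 0:
        raise ValueError("100% uptime has no finite value on a log scale")
    return -math.log10(downtime_fraction)


for uptime in (99.0, 99.9, 99.9999):
    print(f"{uptime}% uptime -> {nines(uptime):.2f} nines")
```

On this scale the three uptimes from the parent comment land near 2, 3 and 6 instead of being crammed together just under 100%.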

You chart downtime as a percentage, because you don't actually care about uptime.

Ok, I'll chart downtime as a percentage on a scale of 0-100%. Same problem.

Don't play dumb. Nobody is saying you should plot ROI on a 0% to 100% scale when you have a 730% ROI.

Also, if you are charting rates that go from 11/1000 to 15/1000 then building a chart that goes from 0% to 2% is reasonable, but 1% to 1.6% is not.

> Another good point, but if bar A is twice as tall as bar B then variable A's value had better be twice that of B's.

That's ridiculous. What if the things that you're comparing have an exponential relationship? You should probably use a logarithmic scale---that's what it's designed for, after all.

So then A's logarithm is twice as big as B's logarithm. I don't see the problem here, as long as you put your zero at zero and label your axes.

A grid would help too, because logarithms are rarely perceived and the information consumer needs the warning.

The issue is not at all readability. The issue is that the way the data is presented changes your interpretation of it.

If you try to puzzle out the meaning from these graphs you can. But even if you do you're going to have to fight with the visuals to do so. Is 70 twice 60? No, but it looks like it is when the x-axis starts at 50.

You shouldn't have to try. You shouldn't have to carefully scrutinize all of the numbers there. That's the entire point of presenting the data visually.

> Is 70 twice 60? No, but it looks like it is when the x-axis starts at 50.

However, when you start the axis at 0, the differences between the groups seem very small, while they are what is interesting. That is why this graph starts at a different value: the author isn't interested in the absolute numbers; he's interested in the differences from a baseline. Now he could have shown a graph where '0' was the average and plotted all the differences, but then many more people would have become confused. Instead he chose '50' as a baseline, arbitrarily.

Both you and the grandparent are completely wrong in acting all indignant that someone would display a graph like this, when it is a well accepted way to draw attention to the aspects that matter. If I started such a graph at 0, my thesis advisor would rhetorically ask me "What are you trying to convey here?".

From any graph, you can draw right and wrong conclusions. You insist that this graph is wrong because the simplest conclusion you can draw from it is wrong. Well, that's not the fault of the graph: that is the fault of you trying to interpret the graph in an overly naive way, without considering the point of the graph.

The difference between 70 and 60 is not that small (~15%).

There are many ways you can use charts to try and mislead people, but when you need to do that it's a sign that you are trying to suggest something that's not true.

PS: The first words out of a competent thesis advisor when looking at those charts would be "where are the error bars?"

But the data we're dealing with here isn't on the order of a few percentage points. The values range from about 45-85 on most of them, which is plenty large enough to see given a full range from 0-100. And there's absolutely no reason to use three different scales for four different graphs when the data on all of them is in roughly the same range.

I'm not insisting that the graphs are wrong. I'm saying that they're done badly because they're confusing and make it very easy to draw incorrect conclusions about the data.

In Australia, people of all classes drink like they have just crossed the Mojave desert, barefoot, after escaping Alcatraz.

According to this listing, Australia is not even in the top 20: Luxembourg, Ireland and Hungary seem to be the top three.


I'm wondering if the article in Wikipedia is really based on "alcohol consumption" and not on alcohol reported to be sold in the country?

There is a clear reason why Luxembourg is on top. The taxes on alcohol are (very) low in Luxembourg and a lot of people from the countries close to Luxembourg (Belgium, France and Germany) are buying their alcohol there. Let's not forget the travellers (Luxembourg is a crossroads in the EU) making a stop in Luxembourg...

That explanation could work for Luxembourg. But Ireland?

I hang out a couple times a year in Europe with a group of primarily Irish, Germans, and Danes and the Danes are the only ones I ever see drinking at breakfast. They're also the only ones who've ever commented on the fact that I wasn't having a beer at lunch...

> According to this listing, Australia is not even in the top 20:

That was back in 2003.

These days Australians take their drinking far more seriously:


> SOURCE: OECD Health Data 2005

That data set is not very recent either.

That list doesn't account for the type of alcohol. A liter of whiskey would do more than a liter of beer. Point: Australia could be drunker if they drink relatively more hard alcohol than those higher on the list, giving mahmud the win.

I'm pretty sure they count only the quantity of ethanol consumed.

Yes. It lists 12l of alcohol for Germany where the average person drinks more than a hundred liters of beer.
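
A back-of-the-envelope sketch of why those two numbers are compatible. The 5% strength for beer is an assumed round figure, not something from the listing, and the helper name is invented.

```python
ASSUMED_BEER_ABV = 0.05  # assumption: typical beer is about 5% alcohol by volume


def ethanol_liters(beverage_liters: float, abv: float) -> float:
    """Liters of pure ethanol contained in a given volume of a beverage."""
    return beverage_liters * abv


beer_ethanol = ethanol_liters(100, ASSUMED_BEER_ABV)
print(f"100 l of beer contains about {beer_ethanol:.1f} l of ethanol")
```

Under that assumption, a hundred liters of beer accounts for less than half of the 12 l listed, so the figure only makes sense as pure alcohol.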

That answers it. Half of my social circle in .au are Irishmen ;-)

Um, have you visited the U.K. recently?! Based on their ethanol consumption, I think embalmers have a pretty easy go of it.

This article seems to be claiming that people with a larger vocabulary (as tested by a vocab test) are more likely to drink, not 'people of class'.


Summary of that comment: Attitude towards scripture is the primary predictor, though WORDSUM and education still have significant effects.

Interesting that this correlates with my experience as well. I moved from New England down to North Carolina and the difference in alcohol intake (and religion for that matter) is truly startling.

This article seems to be claiming that intelligence = class.

Actually, it's implicitly claiming that large vocabulary is the same as class.

Which may sound shocking, even offensive at first. However, the correlation between vocabulary and socio-economic class is a very concrete result of many other studies, such that it is a reasonable thing to leave implicit in this case.

Classy people use small words.

Yes, and they tend not to utilize them.

Nah, they just score higher on a vocabulary test because they're articulating extra crisply.

Working in a public library is a good way to dispel illusions about drinking being classy, by the way. We could always tell when some of the veterans got their monthly checks...

Also: Bike thieves do suck, and these graphs would send Tufte on a crying jag.

Ironically, I learned a new word: 'confound' (http://en.wikipedia.org/wiki/Confounding ). And then I realized that the last chart has confounds. Intelligence and vocabulary are correlated, but worth differentiating in this case.

A higher vocabulary is pretty strongly correlated with being well-read. I think we can trust that. And being well-read implies being exposed to different things (though it is by no means the only way to be exposed to different things -- i.e., those who are well-read [subset operator] those who are exposed to a wide variety of things).

But those who are exposed to a wide variety of things are highly correlated with those who wish to take risks to be exposed to new things. And the latter group seems pretty well correlated with those who drink.

Note, intelligence only 'lurks' near vocabulary.

> being well-read implies being exposed to different things [...] But those who are exposed to a wide variety of things are highly correlated with those who wish to take risks to be exposed to new things

I don't follow this reasoning. Being well-read implies being exposed to a lot of different books, not a lot of different activities. In my experience, people that read a lot are not people who take risks to be exposed to new things, precisely because they have rather intellectual hobbies and they have less free time due to reading a lot.

Good point. And I don't know how the vocabulary test was made. I think the monotonicity can be explained by, in general, people who read marginally more are marginally exposed to more things (even after discounting for the important fact that there is less free time and they're more intellectual, etc.). But it's a very good point.

It seems like there are a lot of different factors.

From another article: "In the 1998 to 2004 data, each point higher on the Wordsum test causes a $1,200 decrease in income." [http://www.halfsigma.com/2006/07/higher_intellig.html]

So where does "smartness" fit in? I hate to imply causation, but maybe:

"Wordsat smartness" -> annoying person who uses big words other people don't understand all the time -> social isolation -> less money -> alcohol!


I don't know about the word test scores but the rest of the demographic info doesn't look right. For instance, the second chart. The data suggests that statistically, the odds of a girl drinking are higher than those of an African-American drinking.

By the way, very very misleading graphs. I was totally shocked by going through them. I had to read the comments here to realize what was being shown.

Apparently, here is the vocabulary test used at the GSS and referenced in the last graph:


At first I was like, “Oh shit how can anyone not get a 10, these words are easy. Society is doomed!”

Then I saw how the choices were actually really strange for some of them. For example, “Emanate” is meant to be matched to “come”, but that's a very poor match. Similarly “Space” matching to “room” is a tricky sense.

I was curious where the data came from, since the article didn't really explain it. Here are the details:


I thought this was a linkbait title, but it actually is the title of the original article.

One of many thoughts appropriate to this dubious claim: often, questions like, "Do you smoke pot?" or "Do you drink?" have more to do with identity than with the actual frequency of consumption.

I'd bet that people "of class"-- well-to-do, educated, liberal-- are more likely to say "yes" to the question, "Do you smoke pot?" However, a large number of such people are those who haven't smoked for 20 years-- those who wouldn't be averse to smoking pot if it came their way, but haven't looked for it or been near it in a long time. (I have no problem with people who use marijuana, but the fact is that to get regular access often requires association with "unclassy" characters, because most drug dealers are creepy and the process of asking for access is degrading.)

If you look at who is actually smoking pot, and weight by frequency of consumption, I'd guess that the correlation goes away. Same with alcohol.

People always shoot the messenger for pointing this out, so I'll expect about a million downvotes for it, but in case you didn't know it already GNXP is the cryptoracist camp along with Charles Murray and Steve Sailer. Seriously, check out some of the comments.

Okay, now you can downvote me.

In other words, we should dismiss a priori the possibility of any racial correlates?

> I knew that blacks were more likely to be teetotalers, and expected that women would be as well.

Is there anything scientific about this at all? The whole article sounds fishy.
