I think a floor set at a number like 0.5 and with a linear scale is pretty reasonable for readability, unless you're a lout and ignore those little letters that tell you what things actually are.
Looking at these graphs you'd think that: Almost nobody in the South drinks; The vast majority of blacks do not drink; Catholics are twice as likely to drink as Protestants.
And so on, when in fact these are all false conclusions. The real conclusion, which you would easily get from properly scaled graphs, is that the majority of people drink across almost all demographic groups.
If you blame me for drawing false conclusions from such graphs I blame the shitty fucking graphs for being misleading. I mean, they don't even all start at 50%. One starts at 40% and one at 0%. Seriously, these graphs are garbage.
Yes, a poorly made graph can make a tiny effect look huge, e.g. if it was the case that 99% of people drank, and the graph was just plotting variation within the remaining 1%, that could be pretty misleading. This is yet another reason that the consumers of information should pay attention to the labels and understand what the real message is from the data. Don't expect the creators of the graph to present the data in the "fairest" way (whatever that is); instead it's our responsibility to think critically and make sure we understand what's really going on. The creator of the graph will naturally act to advance his own interests.
tl;dr: It's a mistake to consider the labels to be "fine print".
"The creator of the graph will naturally act to advance his own interests" -> "The biased creator of a graph with too much riding on his hypothesis will surreptitiously act to advance his own interests". People who do this are not worth listening to.
Unfortunately I suspect that this approach is less common among the undereducated, because trying to create misleading charts is very common.
I'll reiterate my opinion that the purpose of a bar chart is not to give relative orderings (if you want that just list them in order) but to illustrate the magnitude of the differences between data points with respect to the natural range of the data. Percentages have a natural range of values: 0-100. Temperature does not have a natural fixed range, unless we're talking about the weather, which is incidentally why I like Fahrenheit.
There's hardly ever such a thing as a "natural" range. Even if there is, as in the case of percentages, what's the point of wasting half of the graph on variation that has a mundane explanation? Like if you're plotting popularity of religions in the USA, do you really want to spend about 90% of the vertical space of your graph on Christianity? The point of a graph is to shed light on the part of the variation that doesn't have a mundane explanation -- i.e., the interesting part.
What you should be complaining about are graphs that don't have labels, as used by some drug companies in advertisements.[1] Maybe that "graphs without labels are bad" meme has been misinterpreted here.
[1] In the USA, drug companies are permitted to advertise prescription medications.
Another good point, but if bar A is twice as tall as bar B then variable A's value had better be twice that of B's. Anything else is precisely the kind of shady drug-company data manipulation that you're so concerned about. If vertical space is at an absolute premium (not the case here IMO) and you have to cut the bars then draw little cut marks on them to let the reader know that the bottoms aren't at zero.
(Of course I'm talking about cases when there is a familiar concept of zero, like percentages or money or population, not temperature. And if the value goes negative then the bar grows down, cf. any recent chart of US economic indicators.)
If you don't base your bars at zero then your bar chart exaggerates differences between data because the reader looks at two bars and compares their relative sizes the same way he looks at the levels in two glasses of beer to compare their volumes.
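A minimal Python sketch of the beer-glass point, assuming readers judge each bar by its drawn height measured from wherever the axis starts. The function names are hypothetical, made up for illustration; the 70/60/50 numbers are the example used elsewhere in this thread.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights when both bars start at `baseline`.

    With baseline=0 this is the true ratio a / b.
    """
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (a - baseline) / (b - baseline)


def exaggeration(a: float, b: float, baseline: float) -> float:
    """How many times the truncated chart inflates the true ratio (1.0 = honest)."""
    return apparent_ratio(a, b, baseline) / apparent_ratio(a, b)


# With the axis starting at 50, a 70 bar is drawn twice as tall as a 60 bar,
# although 70 is only about 1.17 times 60.
print(apparent_ratio(70, 60, baseline=50))   # 2.0
print(round(apparent_ratio(70, 60), 3))      # 1.167
print(round(exaggeration(70, 60, 50), 3))    # 1.714
```

The exaggeration grows without bound as the baseline approaches the smaller value, which is why either a zero baseline or visible cut marks matter.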
Overall you're introducing ad hoc and nebulous special cases that are poorly justified, like the notion of a 'familiar zero'. The real solution is to learn to properly read graphs, and to ignore graphs that omit labels.
> like percentages or money or population
If you're plotting the net worth of the top 10 richest people on Earth, do you really want to use a miles-long graph or drown out all the actual information by just showing that the top 10 richest are within one pixel-worth-of-dollars of each other?
And re. net worth: That's exactly how I would plot it if I wanted to make a point about income inequality. Carlos Slim is worth US$53 billion. The per capita income in Mexico is US$13,500 and in the US it's $46,000. (Yes I know this is comparing yearly income to lifetime accumulated wealth but ignore that for a moment.) If I give the average Mexican 1 pixel and the average American 3 or 4 pixels, then Mr. Slim gets almost 4 million pixels. At 96 dpi that's roughly 3400 feet. Your monitor would have to be (much) taller than the Burj Khalifa (2717 ft) to accurately show this bar chart. This is the picture I would draw if I wanted to make a point about income inequality: a person sitting next to the Burj Khalifa with a ridiculous laptop in his lap that towers over it.
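The arithmetic above as a quick Python check. The dollar figures and the 2717 ft height are the ones quoted in the comment, 96 dpi is the comment's assumed screen density, and the helper names are hypothetical.

```python
# Figures quoted in the comment above; 96 dpi is its assumed screen density.
SLIM_NET_WORTH = 53e9       # US$
MEXICO_PER_CAPITA = 13_500  # US$ per year, drawn as 1 pixel
US_PER_CAPITA = 46_000      # US$ per year
DPI = 96
BURJ_KHALIFA_FT = 2717


def bar_pixels(value: float, dollars_per_pixel: float = MEXICO_PER_CAPITA) -> float:
    """Bar height in pixels when one pixel stands for `dollars_per_pixel`."""
    return value / dollars_per_pixel


def pixels_to_feet(pixels: float, dpi: int = DPI) -> float:
    return pixels / dpi / 12  # 12 inches per foot


us_px = bar_pixels(US_PER_CAPITA)
slim_px = bar_pixels(SLIM_NET_WORTH)
slim_ft = pixels_to_feet(slim_px)
print(f"US bar:   {us_px:.1f} px")        # 3.4 px
print(f"Slim bar: {slim_px:,.0f} px")     # 3,925,926 px
print(f"Slim bar: {slim_ft:,.0f} ft")     # 3,408 ft
print(slim_ft > BURJ_KHALIFA_FT)          # True
```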
If I just wanted to rank the top 10, I would just list them in order like they do in Forbes.
Yes, but doubled relative to what? e.g. what's twice as big as 50F? Is it 100F or 559.67F? The first one is relative to 0F and the second is relative to absolute zero (50F is 283.15K).
Your point about how to illustrate income inequality is fine, but that's not at all what I was talking about: I was talking about a graph that depicts the distribution of net worth amongst the top 10 richest people. No, a rank order doesn't fully convey that. A table of numbers is hard to understand in a holistic way.
Twice as hot would, in a physics sense, be 1019 °R (560 °F) because SI temperature uses the ratio scale. Fahrenheit is then just syntactic sugar for people who don't care about the physical temperature as much as they care about the range of their comfort zone. When measuring deviation from 0 °F, Fahrenheit also becomes a ratio scale, but not a very useful one (deviation from 50 °F would be better, or you could just use Celsius).
So the real question is, what do you mean "twice as big"? Twice as much warmer than "pretty cold", or twice as energetic?
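The two readings of "twice as hot" from this exchange, as a small Python sketch. The helper names are hypothetical; the conversion constants are the standard ones (°R = °F + 459.67, K = (°F - 32) × 5/9 + 273.15).

```python
RANKINE_OFFSET = 459.67  # °R = °F + 459.67, so 0 °R is absolute zero


def f_to_rankine(f: float) -> float:
    return f + RANKINE_OFFSET


def rankine_to_f(r: float) -> float:
    return r - RANKINE_OFFSET


def f_to_kelvin(f: float) -> float:
    return (f - 32) * 5 / 9 + 273.15


def double_from_zero_f(f: float) -> float:
    """'Twice as big' measured from 0 °F (interval-scale reading)."""
    return 2 * f


def double_from_absolute_zero(f: float) -> float:
    """'Twice as big' measured from absolute zero (ratio-scale reading)."""
    return rankine_to_f(2 * f_to_rankine(f))


print(double_from_zero_f(50))                     # 100
print(round(2 * f_to_rankine(50), 2))             # 1019.34
print(round(double_from_absolute_zero(50), 2))    # 559.67
print(round(f_to_kelvin(50), 2))                  # 283.15
```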
As for the 10 richest people, I'm not sure they're that near each other, but assuming they are, that's going to be the important bit. If you don't want to focus on that, you have a few options. You could draw the cutmarks. You could make it a zoom lens picture. Or you could establish a baseline, such as the highest income tax bracket or the median top 10 income, offset all incomes appropriately, and draw the bar graph from that. In the latter case, you should draw some bars as negative, and label the baseline appropriately (not 0). I don't think it would make for a good graph.
However, for a distribution, absolutely no offsets allowed.
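A minimal sketch of the baseline option described above, assuming the median is used as the baseline. The values are invented for illustration only (they are not real net worths), and the function name is hypothetical.

```python
from statistics import median


def offsets_from_baseline(values, baseline=None):
    """Return (baseline, offsets), where each offset is value - baseline.

    Defaults to the median of the values. Negative offsets become bars
    that grow downward from the baseline.
    """
    if baseline is None:
        baseline = median(values)
    return baseline, [v - baseline for v in values]


# Hypothetical values in US$ billions, invented for illustration only.
worths = [10, 8, 7, 5, 4]
base, offsets = offsets_from_baseline(worths)
print(base)      # 7
print(offsets)   # [3, 1, 0, -2, -3]
```

The axis would then be labeled with the baseline (e.g. "US$ billions relative to the median"), not 0.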
In the case of billionaires you are looking at an estimate of their net worth, so while you can chart Carlos Slim Helu & family at 53.5B and William Gates III at 53.0B, if you start the chart at 50 billion your error bars would take up 50% of the chart's vertical space.
And that's the major reason why trying to create a chart that exaggerates differences is such a bad idea. If your error bars are 5% and the gap is 1% then pretending there is a significant gap is stupid.
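Applying the 5% figure to the two estimates from the previous comment, as a Python sketch. It assumes "error bars are 5%" means ±5% of each estimate; the function names are hypothetical, and overlapping intervals are only a rough visual check, not a formal significance test.

```python
def interval(value: float, rel_err: float) -> tuple[float, float]:
    """Symmetric interval value +/- rel_err * value."""
    half = value * rel_err
    return value - half, value + half


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi


def relative_gap(x: float, y: float) -> float:
    """Gap between two estimates as a fraction of the smaller one."""
    return abs(x - y) / min(x, y)


slim = interval(53.5, 0.05)   # US$ billions, assumed +/-5%
gates = interval(53.0, 0.05)
print(intervals_overlap(slim, gates))        # True
print(f"{relative_gap(53.5, 53.0):.1%}")     # 0.9%
```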
Similarly, I'll go to my boss and plot success rates of a certain operation (can't say what due to confidentiality) which usually fails, but occasionally gives us a big win (success rate 0-3%). I'll be sure to plot on a range of 0-100% so that he knows we almost always fail. I certainly don't want to cheat and make him think my improvements of 1% -> 1.5% are a big deal.
Also, if you are charting rates that go from 11/1000 to 15/1000 then building a chart that goes from 0% to 2% is reasonable, but 1% to 1.6% is not.
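One possible heuristic for picking a zero-based axis like the 0% to 2% one (an assumption here, not something the comment specifies): round the largest value up to the next 1, 2, 2.5, 5 or 10 times a power of ten. The function name is hypothetical.

```python
import math


def nice_upper_limit(max_value: float) -> float:
    """Smallest 1, 2, 2.5, 5 or 10 times a power of ten that is >= max_value."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    exponent = math.floor(math.log10(max_value))
    for step in (1, 2, 2.5, 5, 10):
        candidate = step * 10 ** exponent
        if candidate >= max_value:
            return candidate
    return 10 ** (exponent + 1)  # unreachable in exact arithmetic; guards rounding


# Rates of 11/1000 to 15/1000 -> an axis from 0 to 0.02 (0% to 2%).
print(nice_upper_limit(15 / 1000))   # 0.02
```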
That's ridiculous. What if the things that you're comparing have an exponential relationship? You should probably use a logarithmic scale---that's what it's designed for, after all.
A grid would help too, because a logarithmic scale is rarely noticed at a glance and the information consumer needs the warning.
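A minimal Python sketch of what a log scale does to bar positions, assuming base-10 and a 300 px axis (both arbitrary choices; the function name is hypothetical). Every tenfold step gets the same vertical distance, so gridlines at each power of ten are what tell the reader the scale isn't linear.

```python
import math


def log_position(value: float, lo: float, hi: float, height: float) -> float:
    """Distance from the bottom of a log10-scaled axis spanning [lo, hi]."""
    if not (0 < lo < hi) or value <= 0:
        raise ValueError("log axes need positive values and lo < hi")
    span = math.log10(hi) - math.log10(lo)
    return height * (math.log10(value) - math.log10(lo)) / span


# Decade gridlines on a 300 px axis land at equal spacing.
decades = [1, 10, 100, 1000]
print([round(log_position(v, 1, 1000, 300), 6) for v in decades])
# [0.0, 100.0, 200.0, 300.0]
```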
If you try to puzzle out the meaning from these graphs you can. But even if you do you're going to have to fight with the visuals to do so. Is 70 twice 60? No, but it looks like it is when the x-axis starts at 50.
You shouldn't have to try. You shouldn't have to carefully scrutinize all of the numbers there. That's the entire point of presenting the data visually.
> Is 70 twice 60? No, but it looks like it is when the x-axis starts at 50.
Both you and the grandparent are completely wrong in acting all indignant that someone would display a graph like this, when it is a well accepted way to draw attention to the aspects that matter. If I started such a graph at 0, my thesis advisor would rhetorically ask me "What are you trying to convey here?".
From any graph, you can draw right and wrong conclusions. You insist that this graph is wrong because the simplest conclusion you can draw from it is wrong. Well, that's not the fault of the graph: that is the fault of you trying to interpret the graph in an overly naive way, without considering the point of the graph.
There are many ways you can use charts to try and mislead people, but when you need to do that it's a sign that you are trying to suggest something that's not true.
PS: The first words out of a competent thesis advisor when looking at those charts would be "where are the error bars?"
I'm not insisting that the graphs are wrong. I'm saying that they're done badly because they're confusing and make it very easy to draw incorrect conclusions about the data.
There is a clear reason why Luxembourg is on top. The taxes on alcohol are (very) low in Luxembourg, and a lot of people from the countries close to Luxembourg (Belgium, France and Germany) buy their alcohol there. And don't forget the travellers (Luxembourg is a crossroads in the EU) making a stop in Luxembourg...
That was back in 2003.
These days Australians take their drinking far more seriously:
SOURCE: OECD Health Data 2005
Summary of that comment: Attitudes towards scripture are the primary predictor, though WORDSUM and education still have significant effects.
Working in a public library is a good way to dispel illusions about drinking being classy, by the way. We could always tell when some of the veterans got their monthly checks...
Also: Bike thieves do suck, and these graphs would send Tufte on a crying jag.
A larger vocabulary is pretty strongly correlated with being well-read. I think we can trust that. And being well-read implies being exposed to different things (though it is by no means the only way to be exposed to different things; i.e., those who are well-read are a subset of those who are exposed to a wide variety of things).
But those who are exposed to a wide variety of things are highly correlated with those who wish to take risks to be exposed to new things. And the latter group seems pretty well correlated with those who drink.
Note, intelligence only 'lurks' near vocabulary.
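The chain above links vocabulary to drinking through a series of correlations. A general caveat, sketched in Python with a tiny hand-made dataset (invented, not survey data): correlation isn't transitive, so A can correlate about 0.7 with B, and B about 0.7 with C, while A and C are completely uncorrelated. The `pearson` helper is written out here rather than taken from a library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hand-made toy data: b is built from a and c, which are unrelated.
a = [1, -1, 1, -1]
c = [1, 1, -1, -1]
b = [x + y for x, y in zip(a, c)]

print(round(pearson(a, b), 3))  # 0.707
print(round(pearson(b, c), 3))  # 0.707
print(pearson(a, c))            # 0.0
```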
I don't follow this reasoning. Being well-read implies being exposed to a lot of different books, not a lot of different activities. In my experience, people that read a lot are not people who take risks to be exposed to new things, precisely because they have rather intellectual hobbies and they have less free time due to reading a lot.
It seems like there are a lot of different factors.
So where does "smartness" fit in? I hate to imply causation, but maybe:
"Wordsat smartness" -> annoying person who uses big words other people don't understand all the time -> social isolation -> less money -> alcohol!
By the way, very very misleading graphs. I was totally shocked by going through them. I had to read the comments here to realize what was being shown.
Then I saw how the choices were actually really strange for some of them. For example, “Emanate” is meant to be matched to “come”, but that's a very poor match. Similarly, matching “Space” to “room” relies on a tricky sense of the word.
One of many thoughts appropriate to this dubious claim: often, questions like, "Do you smoke pot?" or "Do you drink?" have more to do with identity than with the actual frequency of consumption.
I'd bet that people "of class"-- well-to-do, educated, liberal-- are more likely to say "yes" to the question, "Do you smoke pot?" However, a large number of such people are those who haven't smoked for 20 years-- those who wouldn't be averse to smoking pot if it came their way, but haven't looked for it or been near it in a long time. (I have no problem with people who use marijuana, but the fact is that to get regular access often requires association with "unclassy" characters, because most drug dealers are creepy and the process of asking for access is degrading.)
If you look at who is actually smoking pot, and weight by frequency of consumption, I'd guess that the correlation goes away. Same with alcohol.
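A toy illustration of the weighting idea, with made-up rows (not real survey data): group A answers "yes" more often, but group B consumes more once each answer is weighted by frequency. The row layout and function names are hypothetical.

```python
# Hypothetical survey rows: (group, answered_yes, uses_per_month).
# Invented for illustration; not real survey data.
ROWS = [
    ("A", True, 0), ("A", True, 0), ("A", True, 8), ("A", False, 0),
    ("B", True, 10), ("B", False, 0), ("B", False, 0), ("B", True, 6),
]


def yes_rate(rows, group):
    """Share of the group answering 'yes' to 'Do you ...?'."""
    answers = [yes for g, yes, _ in rows if g == group]
    return sum(answers) / len(answers)


def mean_uses(rows, group):
    """Average monthly uses per person: the frequency-weighted view."""
    uses = [n for g, _, n in rows if g == group]
    return sum(uses) / len(uses)


for group in ("A", "B"):
    print(group, yes_rate(ROWS, group), mean_uses(ROWS, group))
# A 0.75 2.0
# B 0.5 4.0
```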
Okay, now you can downvote me.
Is there anything scientific about this at all? The whole article sounds fishy.