Why Nobody Understands Your Visualization (petewarden.typepad.com)
35 points by alanthonyc on May 23, 2010 | 22 comments



Back in 2003 I tried introducing many of these techniques to our customers so they could look at their data in new ways. No one, not even the PhDs among them, had a clue how to use the visualizations.

We investigated and found that very few people even understood our basic charts (time series, etc.). An even bigger issue was that only about 1 in 100 customers could take the numbers from a chart and use them to change the way they did things. In other words, few people could understand the charts, and even fewer could turn that understanding into actionable information. To reiterate, these were basic time series charts.

We ended up writing a system to interpret the charts for them in plain language (making action easy), and heavily annotating the charts with colors and indicators so they could be interpreted without any cognitive load.
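The comment does not say how that system worked, so the following is only a hypothetical sketch of the general idea: turn a time series into one plain-language sentence. The function name `describe_series`, the 5% "flat" cutoff, and the sample numbers are all assumptions made for illustration.

```python
def describe_series(name, values, flat_threshold=0.05):
    """Return a one-sentence, plain-language summary of a time series.

    `flat_threshold` is an arbitrary cutoff (5% by default) below which
    the overall change is reported as "roughly flat".
    """
    first, last = values[0], values[-1]
    if first == 0:
        # A relative change is undefined when the series starts at zero.
        return f"{name} started at 0 and ended at {last}."
    change = (last - first) / abs(first)
    if abs(change) < flat_threshold:
        trend = "stayed roughly flat"
    elif change > 0:
        trend = f"rose by {change:.0%}"
    else:
        trend = f"fell by {-change:.0%}"
    return f"{name} {trend} over the period, from {first} to {last}."


# Made-up sample data.
summary = describe_series("Weekly signups", [120, 135, 150, 180])
print(summary)
```

A real system would presumably also need to handle seasonality, outliers, and what action to recommend; this only compares the first and last points.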

Given that experience, I've moved away from advanced visualization tools entirely, and before introducing them again I would need strong evidence that they actually work for our customers.


This reminds me of abstract reasoning questions from IQ tests. You know the kind of thing: you're often given the first three diagrams in a sequence, and you have to infer the logic behind the sequence and choose the correct fourth diagram from a multiple choice; or perhaps you need to find the odd one out.

But such questions are usually not really abstract at all; rather, they rely on learned cultural ontologies for classifying shapes and transformations, a language for describing circles, ovals, squares, rectangles, rotations through right angles, mirroring and symmetry. To my mind, these tests largely measure conformance to a cultural perspective on the world.


Phrased another way: IQ tests are for testing how well you do at IQ tests :)


Ultimately, a visualization is an attempt to communicate some pattern(s) in the data. Visualizations can suffer from the same issues as other forms of communication:

- inadequate information. Readers can't draw conclusions from information you don't include. Commonly missing information: labels, scale, definitions, context for the data.

- information overload. Readers can't figure out which specific conclusions you're trying to communicate if there are many possible conclusions. Highlight the things about the data that you want to bring out using words, colors, arrows, or visualizations of subsets of the data.

- muddled presentation. A blurry graphic is as difficult to understand as writing filled with misspellings and poor grammar. A busy graphic is as difficult to understand as a long, babbling diatribe filled with unnecessarily complex vocabulary. An ambiguously labeled graphic is as difficult to understand as writing filled with ambiguous pronouns.

- inability to dig deeper. Readers can't figure out if your conclusions are sound if they can't look at the details. Interactive visualizations (based on the complete data set) allow users to see how each bit of data fits into the overall picture, and to test your conclusions on subsets of the data. If all you present is a static graphic based on a restricted data set, it's harder to check if you've cherry-picked data.

- unfamiliarity with the language or conventions. Someone unfamiliar with programming will have a hard time understanding certain HN posts, even if they're well written. Someone unfamiliar with bar graphs will have a hard time understanding a visualization based around them, even if it's a good visualization.

- boring subject matter. You can make the greatest visualization in the world, but if the audience doesn't care enough to look at it long enough to understand, your point won't get communicated.


The author misses a large point, and it sort of confounds his other arguments. While form and representation are absolutely important, so is context. People cared about his facebook visualization because the subject matter was of immediate interest to us, given the current news cycle. A visualization has to be part of a compelling story.

Not all visualizations are great, and not all well-made visualizations express a compelling narrative. And not all visualizations, regardless of how well-made and compelling, are right for a target audience. Relevance is different than form; let's not confuse them.


I agree with you that your underlying data has to be on something people care about. It's necessary but not sufficient, since the same visualization without labels and coloring sank without a trace when I released it a week before.


That car data plot is an example of a multidimensional parallel graph. Too bad he didn't go into more depth on this, since he showcased this visual representation in his blog post. Some more info to get your feet wet is here: http://filer.case.edu/~dbh10/eecs466/report.html

Parallel graphs were a starting point for multidimensional visualization. They are useful only up to a certain point of complexity; beyond that they become counterproductive, as it takes more time to analyze the graph than to view a number of simpler graphs side by side or in series. Newer techniques involve radial visualization, such as

http://www.infovis-wiki.net/index.php?title=Radial_Hierarchi...

Something to consider with your animations is the cognitive load of animations of complex visualizations. See the following paper:

http://geoanalytics.net/GeoVisualAnalytics08/a15.pdf


Its proper name is "Parallel Coordinates", popularized in the 90s.

Pro:

- Space saving. Cartesian space is expansive, as the X/Y axes can quickly take up the entire 2D plane. Turning the axes parallel to each other saves a lot of space, hence the possibility of multi-dimensionality.

Con:

- Messy. A normal dot in an X/Y scatter plot is "stretched" into a line, so there are a lot of crossings/overlaps.

- Order of the axes matters.

Interactivity can help reduce the visual complexity, through axis reordering or highlighting (also called "brushing"). Also, after a little bit of training, people can easily develop the ability to recognize patterns in Parallel Coordinates. Just think of how you learned to read the regression lines in scatter plots : )
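For anyone who wants to see the mechanics, here is a minimal sketch in plain Python. The rows below are made up for illustration and are not the data set from the article; the helper names (`axis_ranges`, `to_polyline`, `brush`) are hypothetical too. Each row becomes a polyline with one vertex per axis, every axis is rescaled to [0, 1] on its own, and brushing is just a range filter on one axis.

```python
# Hypothetical sample rows; column names and values are made up for illustration.
cars = [
    {"cylinders": 4, "horsepower": 70, "weight": 2100, "mpg": 34},
    {"cylinders": 6, "horsepower": 110, "weight": 3000, "mpg": 22},
    {"cylinders": 8, "horsepower": 165, "weight": 4200, "mpg": 14},
]
axes = ["cylinders", "horsepower", "weight", "mpg"]


def axis_ranges(rows, axes):
    """Return {axis: (min, max)} so each axis can be scaled independently."""
    return {a: (min(r[a] for r in rows), max(r[a] for r in rows)) for a in axes}


def to_polyline(row, axes, ranges):
    """Map one row to a list of (axis_index, height) vertices, height in [0, 1]."""
    points = []
    for i, a in enumerate(axes):
        lo, hi = ranges[a]
        # Put constant-valued axes in the middle to avoid dividing by zero.
        height = 0.5 if hi == lo else (row[a] - lo) / (hi - lo)
        points.append((i, height))
    return points


def brush(rows, axis, low, high):
    """Keep only the rows whose value on `axis` falls inside [low, high]."""
    return [r for r in rows if low <= r[axis] <= high]


ranges = axis_ranges(cars, axes)
polylines = [to_polyline(r, axes, ranges) for r in cars]
selected = brush(cars, "mpg", 20, 40)
```

Reordering the `axes` list changes which pairs of variables sit next to each other, which is why axis order affects what patterns are visible.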


The actual plot in question is interactive. You can select regions on each axis.


I've been on the lookout for a blog chronicling bad visualizations. Something like Regretsy for charts and infographics, but perhaps with more constructive critiques. Surely something like that must exist?


There's a world to steal from the cartography and typography traditions, not only because these are already established "visual languages", but also because of their subtlety and respect for the viewer.

It is a shame that today's infographic designs are mostly going after 3D-rainbow-eye-candies, screaming too much of the designer him/herself.


A great book to help you come up with the appropriate visual representation for what you're trying to say to a given audience is The Back of the Napkin by Dan Roam.

It ends with a somewhat unfortunately chosen example, but the actual advice is quite good.


I love that automobile visualization featured at the top, because it seems so promising, only to become increasingly disappointing as you try harder and harder to make sense of it. The low performance only makes matters worse.


The biggest problem is it's best read right to left.


There's also an odd reversal near the middle; number of cylinders is roughly correlated w/ displacement, which is roughly correlated w/ weight, which is roughly correlated w/ horsepower, which is roughly... oops, inversely correlated w/ 0-60 acceleration time. Hence the lines looking like they got twisted in a bunch between the horsepower axis and the acceleration axis.

I think interaction is a very under-utilized aspect of visualization; if that car visualization hadn't had such abysmal performance, being able to play with it might have been very informative. I think interaction is more powerful than time (but probably harder to design effectively).

As it was, I noticed several interesting new (to me) points:

- Quantization of number of cylinders is obvious, but I didn't expect displacement to turn out to be so quantized

- The highest acceleration cars didn't have the worst fuel efficiency

- The oldest cars (1970) appeared to include the ones with the worst fuel efficiency.

I've been trying to figure out a better way of laying out the weight/horsepower/acceleration axes, given that weight*acceleration=horsepower (with the addition of some noise due to different measurement techniques, etc.)


Well, you're gonna have that twist anyway, unless you flip a couple columns. I sort of prefer having it in the middle. But then I didn't really mind the visualization, so maybe I'm the wrong person to ask.
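To make the "twist" concrete: between two adjacent axes, two lines cross exactly when the two rows are ranked in opposite order on those axes. The sketch below counts crossings for a few made-up (horsepower, 0-60 time) pairs, not the article's data, and shows the effect of flipping one axis; `count_crossings` is a hypothetical helper.

```python
from itertools import combinations

# Hypothetical (horsepower, seconds for 0-60) pairs, made up for illustration.
cars = [(70, 19.0), (95, 16.5), (110, 15.0), (150, 12.0), (200, 9.5)]


def count_crossings(pairs):
    """Count line crossings between two adjacent parallel axes.

    Two lines cross when one row is lower on the left axis
    but higher on the right axis (or vice versa).
    """
    crossings = 0
    for (a_left, a_right), (b_left, b_right) in combinations(pairs, 2):
        if (a_left - b_left) * (a_right - b_right) < 0:
            crossings += 1
    return crossings


# Flipping the right-hand axis is equivalent to negating its values.
flipped = [(hp, -seconds) for hp, seconds in cars]

twisted = count_crossings(cars)
untwisted = count_crossings(flipped)
print(twisted, untwisted)
```

With these made-up rows every pair of lines crosses before the flip and none after. Flipping an axis also changes how it relates to its other neighbor, so in a real plot the twist may simply move rather than disappear.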


Another problem is that the "acceleration" slot actually describes the opposite of what one would think.


So far it seems that every state I click on has "Megan Fox" in the interests list. I wonder if he cherry-picked his data. :P


She apparently is that popular, at least on Facebook. She has about 6.7 million likes.

For comparison, the Bible is at 2.2 million, Kobe Bryant is at 2.4 million, Starbucks is at 7.2 million, and Obama is at 8.4 million.


According to the Facebook statistics page [0], there are over 400 million users on the site. The same page states that about 70% of those users are from outside the US. That suggests that 30% of Facebook users (about 120 million) reside in the US. Let's assume that the entirety of her fan base is within the US. That means her fans represent (at the _most_) roughly 5.6% of US Facebook users. I wish I had data per state so I could pin this down more thoroughly (with respect to ratios of Megan Fox fans to state populations), but going on just what I have seen so far, I am admittedly suspicious of the gentleman's data.
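The back-of-the-envelope estimate can be reproduced in a few lines. All inputs are the approximate figures quoted in this thread, and the variable names are just for illustration.

```python
# Figures quoted in the thread (May 2010); all are approximate.
total_users = 400_000_000      # "over 400 million users"
share_outside_us = 0.70        # "about 70% ... from outside the US"
megan_fox_likes = 6_700_000    # "about 6.7 million likes"

us_users = total_users * (1 - share_outside_us)

# Upper bound: assume every single fan is a US user.
max_us_share = megan_fox_likes / us_users

print(f"US users: {us_users / 1e6:.0f} million")
print(f"Upper bound on US fan share: {max_us_share:.2%}")
```

Since the user count is a lower bound ("over 400 million") and not every fan is in the US, the true share would be smaller than this figure of roughly 5.6%.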

[0] http://www.facebook.com/press/info.php?statistics


The gentleman himself here!

You're right to be suspicious; there is a bias towards the more popular pages, since they show up more frequently on people's crawlable public profile pages. Only 20 liked pages are shown for each person, and FB apparently picks the most popular.


This can be summed up as being because "you want to make infographics only because they're trendy, but have no idea what you're trying to say, or why, or how to say it effectively."

Fixing labels, simplicity, interactivity, animation, etc., won't solve the problem with the whole conception being wrong, useless, or boring.



