
Ask HN: Any good books on graphing/charting/visualization? - lukev
My day job has me working on a project that has vast amounts of data available in tabular form, but no way to analyze the data except to search it and display it in more tables. Pages and pages of tables.

I'd love to build a way to query the data and display the results visually, and I'm looking for books that demonstrate various techniques for visualizing data that (in many cases) is quite complex. Right now, my experience doesn't really extend beyond basic pie/bar/scatter graphs.

I've heard amazing things about Tufte, but judging from the previews of his books on Amazon, they seem mostly focused on artistic presentations of information - something a marketer or analyst would create manually, not dynamic charts generated from terabytes of data. Is that the case? Do they still have useful information for the sort of thing I'm doing, or can anyone recommend something more suitable?
======
alilja
Edward Tufte's book The Visual Display of Quantitative Information is a
monumental book. He writes not about how to make your graphs look pretty, but
how to display vast quantities of data and distill them down into useful
graphics that communicate effectively.

He provides examples of good and bad graphs, but more importantly, explains
what exactly it is that makes those examples good and bad, and further
generalizes it so you understand how to make good visualizations. If you don't
want to shell out the money for it, it's probably at your library (remember
those?).

Additionally, if I were you, I'd stay away from statistical approaches to
displaying information unless you have some background or are willing to learn
about it -- it tends to be highly technical and is probably too complex for
what you're trying to do. Basic stats might help you, but not as much as Tufte
will.

~~~
dca
> Edward Tufte's book The Visual Display of Quantitative Information is a
> monumental book.

Agreed, it's absolutely excellent. Thanks to Y Combinator for listing it in the
book list.

> Additionally, if I were you, I'd stay away from statistical approaches to
> displaying information...

Not agreed. In my opinion you might have missed what I felt was a main point
of that book: Always learn the appropriate statistics required to understand
the data, choose a correct visualization method to communicate those
statistics effectively, and once you've understood it fully, confirmed the
results, and removed all the cruft, then publish it.

~~~
alilja
What I meant was to shy away from approaches that are PURELY based on
statistics if you have no background in it, because it can get overwhelming
quickly.

Of course, if it's worth it to invest the time required to have a fundamental
understanding of statistics, by all means do so -- but if this is a one-time
or a short-term project, I'm not sure the time commitment is worth it.

~~~
lukev
Hm... In my experience, that's the best part of a project: being able to
learn something new while doing it.

------
Flemlord
I do a lot of charting for financial services software. The best practical
book that I've found is _The Wall Street Journal Guide to Information
Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures_. Simple
but practical guidelines for displaying pie/line/area graphs.

But for your situation, check out some of these sites which focus on more
complicated graphing techniques:

<http://www.perceptualedge.com/examples.php>

<http://blogof.francescomugnai.com/2009/04/50-great-examples-of-infographics/>

<http://interface.fh-potsdam.de/infodesignpatterns/news.php>

<http://patternbrowser.org/>

<http://webdesignledger.com/inspiration/15-stunning-examples-of-data-visualization>

<http://www.tableausoftware.com/public/>

------
revorad
I would highly recommend learning R (<http://www.r-project.org/>). It is very
easy to directly query databases and R has many visualisation packages,
including the awesome ggplot2 (<http://had.co.nz/ggplot2/>) based on the
grammar of graphics. I'm writing an R graphs cookbook and my startup's
visualisation product is also built on R (see profile and feel free to email
me if you need any help).

Also, look at Ben Fry's Processing books (<http://benfry.com/>). Here's an
introductory tutorial:
<http://blog.blprnt.com/blog/blprnt/your-random-numbers-getting-started-with-processing-and-data-visualization>

If you're familiar with Python, check out Matplotlib
(<https://www.packtpub.com/matplotlib-python-development/book>).

~~~
lukev
Any experience with Incanter? Supposedly it's a port of R to Clojure, and
since I love Clojure and everything else is already running on the JVM I'd
lean towards using it as opposed to another standalone program if it's any
good.

~~~
phren0logy
I have used it, but as a warning I'm not an expert user of R, Clojure, or
Incanter. Incanter is very pleasant to use, because I personally much prefer
coding in Clojure to R. The R language is powerful, but I don't find it
obvious at all. You can also use Processing from Clojure if you need to roll
your own charts, in addition to the very capable charting library that's
already there to handle most routine visualization.

That said, Incanter is immature compared to R. If Incanter does what you need,
it might be a great fit, but R has a huge community and list of libraries
right now. There's an R to Clojure bridge, but if you don't yet know R I'm not
sure it's very helpful.

Finally, Incanter is developing at a breakneck pace. Even if it doesn't do
what you want today, it might tomorrow. Literally. I'd love to see the user
base grow, because Clojure seems like a perfect fit for statistical computing.

------
wdewind
Tufte is great, but he's extremely heavy and a bit dated. He is about 80%
brilliant and 20% completely missing the point. It's very strange.

If you are looking for a smaller book, I've found the WSJ Guide to Information
Graphics by Dona Wong to be pretty decent and pretty straightforward, and
it's about 100 pages. It's not too focused on finance either, although that's
what I got it for (I do front-end development for a financial analysis company -
lots of charting).

<http://www.amazon.com/Street-Journal-Guide-Information-Graphics/dp/0393072959>

~~~
matrix
I second this. I found his books interesting, but a sizable portion of the
content is opinion rather than facts that are demonstrated via studies and the
like. They're fun to look at, and there are a few important principles in them,
but they're more of a coffee table book than a real reference.

------
duck
I would check out Visualizing Data -
<http://oreilly.com/catalog/9780596514556>

Also, I enjoy this site <http://flowingdata.com>.

~~~
evgen
Visualizing Data is a good choice, but I would also suggest you note the
recent string of critiques of content-free visualization on flowingdata and
start out with Tufte if you are new to the field.

------
mcantor
I've read The Visual Display of Quantitative Information by Tufte, and I think
it would benefit you even though you are not talking about manually generating
charts. For example, he talks about how it's easy to be misleading with a
chart based on how you calibrate the axes, which is something you'd still need
to do even with dynamically generated visualizations.
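One way to keep dynamically generated axes honest is to compute the tick marks in code instead of accepting whatever range the data happens to span. A minimal Python sketch, assuming the common "nice numbers" heuristic (ticks at 1, 2, or 5 times a power of ten, in the spirit of Paul Heckbert's labeling routine from Graphics Gems) plus a hypothetical `include_zero` flag that forces a zero baseline, which matters for bar charts where bar length encodes the value:

```python
import math


def nice_number(x, round_result):
    """Return a 'nice' number (1, 2, 5, or 10 times a power of ten) close to x > 0."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent
    if round_result:
        # Round to the nearest nice value.
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        # Take the smallest nice value that is >= fraction.
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def axis_ticks(data_min, data_max, max_ticks=6, include_zero=True):
    """Evenly spaced tick values covering [data_min, data_max].

    include_zero (hypothetical flag) stretches the axis to 0 so that bar
    lengths stay proportional to the values they encode.
    """
    if include_zero:
        data_min, data_max = min(data_min, 0), max(data_max, 0)
    if data_min == data_max:  # degenerate range: widen it so log10 is defined
        data_max = data_min + 1
    span = nice_number(data_max - data_min, round_result=False)
    step = nice_number(span / (max_ticks - 1), round_result=True)
    low = math.floor(data_min / step) * step
    high = math.ceil(data_max / step) * step
    count = round((high - low) / step)
    # round() trims floating-point noise such as 0.30000000000000004
    return [round(low + i * step, 10) for i in range(count + 1)]
```

With data between 37 and 92, `axis_ticks(37, 92)` gives `[0, 20, 40, 60, 80, 100]`, while `axis_ticks(37, 92, include_zero=False)` starts at 20: acceptable for a line chart, but it would exaggerate the differences between bars.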

------
j-g-faustus
The Tufte books are brilliant. For dynamic charts, his first book (The Visual
Display of Quantitative Information) is the most relevant: it covers the
theory (how to tell a good representation from a bad one) and the basics.

Readings in Information Visualization
(<http://www.amazon.co.uk/Readings-Information-Visualization-Interactive-Technologies/dp/1558605339/ref=sr_1_1?ie=UTF8&s=books&qid=1271259347&sr=8-1>)
is a collection of papers covering a wide range of techniques for a wide range
of tasks.

Apart from that, it's mostly a matter of picking up interesting ideas wherever
you find them. flowingdata.com is nice, same with
<http://www.informationisbeautiful.net/>

~~~
wdewind
Information is Beautiful is great for fun stuff, but there are a ton of bad
practices on that site; it's not for serious infographics. Be careful copying
from it unless you are really able to tell where he's having fun and where his
charts are serious (because he's very capable of both).

~~~
j-g-faustus
Agreed. I was assuming that you have read your Tufte :) and can tell
informative visuals from those that are primarily pretty or entertaining.

------
Anon84
Leland Wilkinson's "The Grammar of Graphics"
(<http://www.amazon.com/Grammar-Graphics-Leland-Wilkinson/dp/0387987746>) is
also excellent, and its ideas are implemented for R in the ggplot2 package
(<http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403/>).
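The core idea of the grammar (separate the data, the mapping from columns to aesthetics, the scales, and the geometric marks) fits in a few lines. This is a toy Python sketch for illustration only, not ggplot2's API; `LinearScale` and `point_layer` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class LinearScale:
    """Scale: maps a data interval (domain) linearly onto a pixel interval."""
    domain: tuple
    pixels: tuple

    def __call__(self, value):
        d0, d1 = self.domain
        p0, p1 = self.pixels
        if d0 == d1:  # constant data: put every mark in the middle
            return (p0 + p1) / 2
        return p0 + (value - d0) * (p1 - p0) / (d1 - d0)


def point_layer(rows, mapping, width=400, height=300):
    """Geometry: one SVG <circle> per row.

    mapping assigns data columns to aesthetics, e.g. {"x": "price", "y": "volume"}.
    """
    xs = [row[mapping["x"]] for row in rows]
    ys = [row[mapping["y"]] for row in rows]
    x_scale = LinearScale((min(xs), max(xs)), (0, width))
    # SVG's y axis points down, so larger values map to smaller pixel offsets.
    y_scale = LinearScale((min(ys), max(ys)), (height, 0))
    return [
        f'<circle cx="{x_scale(x):.1f}" cy="{y_scale(y):.1f}" r="3"/>'
        for x, y in zip(xs, ys)
    ]
```

Changing `mapping` or swapping in a different geometry function changes the chart without touching the data, which is the kind of separation the grammar argues for.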

------
ziadbc
If you're big on visualization, check out Harvard's www.CS171.org. I'm enrolled
in the class right now and it's been very enriching. I think it is also
available as opencourseware.

Books: <http://www.cs171.org/syllabus.html>

Resources <http://www.cs171.org/resources.html>

------
physcab
Most of these (rather good) suggestions revolve around learning the theory of
representing data. But how does one practically accomplish these visualization
tasks?

I have been delving in this area for the past couple months, and even though I
am still learning, I will give my practical suggestions to the programmer:

1) First accept that there is no silver bullet to data visualization. You pick
the tool that makes the most sense. Sometimes you have to write a Java
program, sometimes a Python program, and yes, even sometimes an Excel
spreadsheet. Don't be picky--just get it done.

2) Programmatically speaking, there are ways to represent truly massive
terabyte datasets.

\- You can learn Processing (used by Ben Fry in Visualizing Data), which is
based on Java and pretty simple to learn. My caveat is that you can't run
these scripts server-side; that is, Processing won't generate JPGs or PNGs on
demand because of headless-mode constraints.

\- You can use Beautiful Soup in Python to easily modify XML data for SVG
graphics. Check out this tutorial:
<http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/>

\- You can learn Java's image library (I haven't done this so I can't really
give any advice, but this is what Processing simplifies I think)

\- You can use Excel to easily pump out bar/pie/line graphs

\- You can use the Google Chart API

\- You can use Flash. Check out AmCharts for that Mint-y goodness.

3) Learn statistics. Browse the Netflix Prize forums. Struggle with MATLAB or
R or Octave. You need to learn how to efficiently handle large datasets in
memory to better sift through the essential information you need. For very
very large sets that absolutely cannot be handled in memory, you'll want to
check out Hadoop + MapReduce. Check out Cloudera's distribution for Hadoop.
Handling data is every bit as important as visualizing it.
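The SVG map approach from point 2 boils down to: parse the map, look up each region's value, and set its fill color. The linked tutorial uses Beautiful Soup; the sketch below swaps in Python's standard-library `xml.etree.ElementTree` so it has no dependencies. It assumes each region is a `<path>` whose `id` matches a key in your data (as with county FIPS codes); the palette and bin edges are made-up examples:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without "ns0:" prefixes

# Hypothetical light-to-dark sequential palette and bin edges.
PALETTE = ["#f1eef6", "#bdc9e1", "#74a9cf", "#2b8cbe", "#045a8d"]
BIN_EDGES = [2, 4, 6, 8]  # value < 2 -> PALETTE[0], value >= 8 -> PALETTE[4]


def color_for(value):
    """Return the palette entry for the first bin whose upper edge exceeds value."""
    for i, edge in enumerate(BIN_EDGES):
        if value < edge:
            return PALETTE[i]
    return PALETTE[-1]


def color_map(svg_text, values_by_id):
    """Fill every <path> whose id is in values_by_id; other paths are left alone.

    Note: a style="fill:..." attribute on a path takes precedence over the
    fill attribute set here, so strip or rewrite it if your map has one.
    """
    root = ET.fromstring(svg_text)
    for path in root.iter(f"{{{SVG_NS}}}path"):
        region_id = path.get("id")
        if region_id in values_by_id:
            path.set("fill", color_for(values_by_id[region_id]))
    return ET.tostring(root, encoding="unicode")
```

Because only a lookup table and a color function are involved, regenerating the map for new data is a single call per dataset, which suits the dynamic, server-side use case better than editing graphics by hand.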

------
samratjp
I had a similar situation except that I wasn't as smart as you to consider
books in the first place.

But I did use some really good tools. I highly recommend Prefuse (yes,
it's Java, but it ships with great examples and it's open source). If you like
prefuse, then try flare (ActionScript-based). As far as I know, prefuse supports
querying from tables (my data backend was Postgres). Here's prefuse:
<http://prefuse.org/> and here's flare: <http://flare.prefuse.org/>

And for a dash of inspiration and more ideas:
<http://www.visualcomplexity.com/vc/>

------
aohtsab
Fun fact - Tufte's in Arlington giving a talk and we're taking a 15 minute
break right now. Compelling speaker and thrilling read (he's giving away four
of his books to every attendee).

~~~
gruseom
I once attended one of Tufte's one-day seminars and agree that it was pretty
good. But he was by no means "giving away" his books; they were built into the
(considerable) price of the event. I also recall being amazed at how many
people showed up. This was in a large hotel ballroom in SF and we were _jam
packed_ in there.

Another fun fact: when he couldn't get his first book (VDQI) published the way
he wanted, he mortgaged his house and published it himself. Respect.

~~~
ganley
FWIW, compared to similar seminars I thought the price of Tufte's was very
reasonable - IIRC, shy of $400, and includes $160 worth of books. I also
thought the class was quite good, BTW.

------
timwiseman
If you are considering using Python, Beginning Python Visualization
(<http://www.amazon.com/Beginning-Python-Visualization-Transformation-Professionals/dp/1430218436>)
seems quite good to me. It is, of course, a niche book targeted at those who
intend to use Python. If you are looking for a broader grounding in
visualization, it is probably not your best choice.

------
ajdecon
To me it sounds like you want to be using a tool like Matlab or matplotlib in
python to automatically generate various types of plots from your data. There
are a wide variety of books about Matlab, and I don't really know one better
than the rest. For python, there's "Beginning Python Visualization" by
Vaingast. It's pretty introductory, but provides good starting points. The
matplotlib web site also provides a gallery of example plots with code.
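To see the moving parts before picking up matplotlib or Matlab: let the database do the aggregation, then turn the small result set into a chart. A dependency-free Python sketch, assuming a hypothetical `orders` table in SQLite and hand-rolled SVG output; in practice matplotlib (e.g. its `barh` bar-chart function) would do the drawing:

```python
import sqlite3
from xml.sax.saxutils import escape


def bar_chart_svg(rows, width=400, label_width=100, bar_height=20, gap=5):
    """Render (label, value) pairs as a horizontal SVG bar chart.

    Bars start at zero so their lengths are proportional to the values.
    Assumes at least one row and non-negative values.
    """
    max_value = max(value for _, value in rows) or 1  # avoid dividing by zero
    parts = []
    for i, (label, value) in enumerate(rows):
        y = i * (bar_height + gap)
        length = (width - label_width) * value / max_value
        parts.append(f'<text x="0" y="{y + bar_height - 5}">{escape(str(label))}</text>')
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{length:.1f}" height="{bar_height}"/>'
        )
    height = len(rows) * (bar_height + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(parts)
        + "</svg>"
    )


# Hypothetical table and data; in practice this would be your existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)],
)
# Aggregate in SQL so only one row per region reaches the charting code.
summary = conn.execute(
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
svg = bar_chart_svg(summary)
```

Pushing the `GROUP BY` into the database is the design choice that matters for large tables: the charting code only ever sees one row per category, no matter how many raw records there are.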

------
tedshroyer
Please keep in mind color blind people. I'm red/green blind and about 1/3 of
the charts I run across are meaningless to me. Here are a couple of sites with
info: <http://wearecolorblind.com/> <http://www.vischeck.com/>
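Two cheap safeguards when generating charts in code: take series colors from a palette designed with color-vision deficiencies in mind (the Okabe-Ito palette is a common choice), and check that colors which must be told apart also differ in lightness, not only in hue. A rough Python sketch; the luminance and contrast formulas follow the WCAG 2.x definitions, while the 1.5 threshold is an arbitrary assumption, and a simulator such as Vischeck remains the better test:

```python
# Okabe-Ito palette, designed to stay distinguishable for common color-vision deficiencies.
OKABE_ITO = {
    "orange": "#E69F00",
    "sky_blue": "#56B4E9",
    "bluish_green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish_purple": "#CC79A7",
    "black": "#000000",
}


def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color written as '#RRGGBB'."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def contrast_ratio(a, b):
    """WCAG contrast ratio: 1 for identical luminance, 21 for black vs white."""
    la, lb = relative_luminance(a), relative_luminance(b)
    return (max(la, lb) + 0.05) / (min(la, lb) + 0.05)


def low_contrast_pairs(colors, threshold=1.5):
    """Flag pairs separated mostly by hue, which red/green-blind readers may not see.

    threshold is a hypothetical cut-off, not a standard value.
    """
    flagged = []
    for i, a in enumerate(colors):
        for b in colors[i + 1:]:
            if contrast_ratio(a, b) < threshold:
                flagged.append((a, b))
    return flagged
```

A lightness check like this is only a heuristic; pairing color with a second cue (marker shape, line dash, or direct labels) avoids depending on color at all.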

------
dmlorenzetti
A colleague says great things about Cleveland's "The Elements of Graphing
Data." It talks about how to leverage the way people perceive graphs in order
to convey information. I've flipped through it, and it's on my to-read list.
Sorry I can't say more about it.

------
gourneau
Has anyone had a chance to check out "Beautiful Visualization: Looking at Data
through the Eyes of Experts"

(<http://j.mp/9SxXza>)?

It was just published today according to Amazon.

