How to Generate FiveThirtyEight Graphs in Python (dataquest.io)
267 points by mircealex on Sept 9, 2017 | 52 comments

It's worth noting as an interesting background note that FiveThirtyEight uses ggplot2/R for their data visualization workflows instead of matplotlib/Python. Specifically, they export ggplot2 visualizations as a PDF/SVG vector image, which can then be precisely annotated in Illustrator.

More info in this video: https://channel9.msdn.com/Events/useR-international-R-User-c...

This isn't strictly true. They use an internal tool built using Victory[1]

1: https://github.com/FormidableLabs/victory

The problem I have with most graph plotting libraries: the examples always look great, but they tend to break down on real world data in unexpected ways. Think of overlapping labels, inappropriately placed legends, inappropriately chosen axis ranges, et cetera.

True, some details often require fine-tuning to get them perfectly right. Then again, making plots is a bit of an art, so I'm okay with that.

I didn't know that there was a fivethirtyeight style in matplotlib. Tres cool!
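
For anyone who wants to try it, here is a minimal sketch of switching the style on. The data is made up for illustration; the style name "fivethirtyeight" is the one bundled with matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up numbers, purely for illustration.
years = [2010, 2011, 2012, 2013, 2014]
values = [3, 5, 4, 7, 6]

# The context manager limits the style to figures created inside it.
with plt.style.context("fivethirtyeight"):
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(years, values, label="series A")
    ax.legend()
```

Calling plt.style.use("fivethirtyeight") instead applies the style to everything that follows, and fig.savefig("plot.png") writes the result to disk.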

Matplotlib also allows xkcd style plots [1] if that is something you fancy :)

[1] -- http://jakevdp.github.io/blog/2013/07/10/XKCD-plots-in-matpl...
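
A minimal sketch of the xkcd mode, assuming a reasonably recent matplotlib where plt.xkcd() can be used as a context manager. The data and title are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Everything drawn inside the block gets the hand-drawn look.
with plt.xkcd():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_title("made-up data, hand-drawn look")
```

If the comic-style fonts are not installed, matplotlib falls back to a default font and prints a warning, but the wobbly lines still work.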

The XKCD graphing is really easy to do; maybe FTE could be refined and incorporated too.

I would abbreviate "FiveThirtyEight" as just 538. It's far clearer.

That was the abbreviation used in TFA.

What is FTE?

Oh duh. Thanks.

Better abbreviated "538" imo

For the attribution bar at the bottom of the figure, why not use annotate() with xycoords="figure points" or "figure pixels"? That both enforces the semantic distinction between the axes (where the data goes) and the figure (where other stuff goes), and avoids having to guess and check the coordinates.
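
A minimal sketch of this suggestion, using annotate(), which is where matplotlib exposes xycoords. The bar text, offsets, and colors are made-up placeholders, not the tutorial's values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([0, 1, 2], [1, 3, 2])  # made-up data

# Leave some room below the axes for the attribution bar.
fig.subplots_adjust(bottom=0.18)

# xy is measured in points from the lower-left corner of the figure,
# so it does not depend on the data range of the axes.
bar = ax.annotate(
    "  (c) EXAMPLE" + " " * 60 + "Source: hypothetical  ",
    xy=(10, 10),
    xycoords="figure points",
    fontsize=12,
    color="#f0f0f0",
    backgroundcolor="grey",
)

# fig.text() is another option: it uses figure-fraction coordinates by default.
note = fig.text(0.02, 0.95, "placed with fig.text")
```

Either way, the position stays put when the axis limits change, which is the point of the suggestion.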

Pretty cool, but I actually prefer the standard style in matplotlib.

It really depends on what you're generating the graphs for. If you needed a visualization for a storytelling article, I really doubt you'd prefer matplotlib's bland standard style.

If you just want to visualize some data fast for yourself, then, yeah, the standard style is really great - it's readable and saves you time.

> I really doubt you'd prefer matplotlib's bland standard style (MBSS)

To be fair, MBSS got much better in the recent version 2.x

Before (version 1.x), I always had to use seaborn and/or modify parameters, but now the default style is good enough for most of my use cases.

Agree, the default style looks much better in 2.x. It's good enough for most use cases, but it's still quite far from publication level, IMO.

Yes, I agree! 1.x was pretty ugly

As someone who's moderately colorblind the color change is really helpful, so even for bland data visualizations it's actually better.

I agree – it's interesting how one is initially drawn to, yet eventually tires of these sorts of plot styles. While Seaborn strikes a good balance, clear and simple monochrome plots never go out of fashion. This is just as much the case when building fancy interactive plots using Shiny/Plotly etc.

Monochrome is problematic for plots with multiple series though, because people's eyes aren't good at distinguishing shades of gray. You can use dashed lines for stuff like line plots, but those don't work well for complex trends and you can only use one or two.
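
To make the dashed-line approach concrete, here is a rough sketch that cycles through matplotlib's four built-in dash patterns in a single color. The series are made up, and the cycler package is the one matplotlib already depends on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from cycler import cycler

# One color combined with each of the four built-in dash patterns.
mono = cycler(color=["black"]) * cycler(linestyle=["-", "--", ":", "-."])

x = list(range(10))
fig, ax = plt.subplots()
ax.set_prop_cycle(mono)
for k in range(4):
    # Made-up series: parallel lines offset by k.
    ax.plot(x, [k + 0.3 * xi for xi in x], label=f"series {k}")
ax.legend()
```

Four series is about where this runs out: a fifth line would wrap around to the solid pattern again, which illustrates the limit described above.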

I think monochrome is great for some uses, but at least for the area I work in the majority of figures use color. I'd say the main reason to use monochrome is for print, but most scientific journals are online nowadays and even conference proceedings are handed out in color. Now that doesn't mean everybody uses color well but that's a different discussion...

Interesting, @bede @v3gas I would love to hear your thoughts on my plots -- http://abhirag.in/articles/price_data_present.html

Too fancy? Too plain? Kinda ok? Kinda new to matplotlib, so your feedback would help :)

Nice charts.

Just one idea, on your vegetable chart, you have onions as a red line and tomatoes as yellow. I would swap them around, as I automatically associate tomatoes with red.

Valid point :) I'll change the order of colors in my palette.

The plots are pretty nice. The serif font for the axes captions and the monospaced font for the axis labels are jarring.

Would you prefer I use serif for both or sans serif for both?

Sans-serif for both captions and labels. The serif font that is there right now is looking kinda out of place. Also the {braces} around the captions. Finally, the labels in the legend do not need to be all italicized.

I think they're nice! I like the monospaced font for the axes. Fwiw, I would remove the legend in the case of there being only one curve, especially since you have good titles.

Thanks, I'll fix that, I was so engrossed in making sure that legends are in the right place that I totally overlooked the fact that they are redundant for single curve plots.

I like them. :)

Thanks :)

It would be interesting to see a built-in plotting style that doesn't give patent lawyers the fits. The patent office still requires monochrome plots.

Interesting article, and very helpful. Over the years, whenever I've needed to generate a graph, I've known I'd have to play around with the axes.

Would be more interesting if someone could actually replicate the Irma one. But I figure it's probably a curve fitted around the data points (similar idea to a best-fit line, but as a curve).

So my tl;dr for generating any interesting graph is a lot of coding... a lot of playing with attributes and modeling.

I am not familiar with FiveThirtyEight but these graphs look quite nice.

Or learn JavaScript and just use D3. It does this out of the box given the dataset.

D3 has an extremely steep learning curve, especially for people who are not familiar with JavaScript.

There are easier ways to make these kinds of plots.

D3 looks really nice, but half the time I try to use it for something that doesn't closely match a Bostock example, I end up giving up and just coding it myself with SVG primitives. I don't understand the motivation behind D3's complex API.

D3 is a functional paradigm for graphics. From the D3 intro: "...styles, attributes, and other properties can be specified as functions of data in D3, not just simple constants."

This sounds trite, but it's enormously powerful and it's what makes D3 worth learning.


In the weeks leading up to the election, they were putting Trump's odds at 1/4, and pretty roundly mocking/cautioning against others who were putting him at 1%. On mobile, so I can't check by hovering, but I don't think they ever put his chances below 1/10: https://projects.fivethirtyeight.com/2016-election-forecast/

> they were putting Trump's odds at 1/4, and pretty roundly mocking/cautioning against others who were putting him at 1%.

Yeah, of all the groups making predictions they gave Trump the highest odds of winning [0].

[0] https://www.buzzfeed.com/jsvine/2016-election-forecast-grade...

Haha, you're hilarious and exceedingly smart. That being said, you're lying. 538 gave 28% chance of a Trump win: https://projects.fivethirtyeight.com/2016-election-forecast/

Why go through all this trouble? You can accomplish the same plot in less than 10 lines of code in R, which includes a 538 theme.

Simply because we're interested to see if we can do in Python whatever we can do in other programming languages / software. We don't really want to learn a new programming language for every thing we can't do in Python yet.

Also, it took 17 lines of code to generate the graph in the tutorial, among which 6 lines were to add labels (excluding from the total the lines of code for reading in the data or importing modules). The teaching approach makes it look that long.

You could also write some functions if you coded these kinds of graphs regularly, and make the whole process a breeze.
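
As a rough sketch of what such a function might look like: add_fte_furniture is a hypothetical helper name, and the positions, font sizes, and colors are guesses for illustration rather than the tutorial's exact values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def add_fte_furniture(fig, ax, title, subtitle, source):
    """Hypothetical helper: add a bold y=0 line, title, subtitle and source bar."""
    fig.subplots_adjust(top=0.80, bottom=0.15)
    # Slightly bolder horizontal line at y = 0.
    ax.axhline(y=0, color="black", linewidth=1.3, alpha=0.7)
    # fig.text() positions are fractions of the figure, not data coordinates.
    fig.text(0.02, 0.93, title, fontsize=20, weight="bold")
    fig.text(0.02, 0.86, subtitle, fontsize=14)
    fig.text(0.02, 0.03, source, fontsize=11,
             color="#f0f0f0", backgroundcolor="grey")


with plt.style.context("fivethirtyeight"):
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot([0, 1, 2, 3], [-1, 2, 1, 3])  # made-up data
    add_fte_furniture(fig, ax, "Made-up title", "Made-up subtitle",
                      "Source: hypothetical")
```

After that, each new chart costs one plot call plus one helper call, so most of the per-graph label code goes away.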

Because configuring plot styles is significantly less work than switching your entire development environment?

I'd love to see those 10 lines!

I have a R/ggplot2 data visualization tutorial which gets a similar style in about 10 lines: http://minimaxir.com/2017/08/ggplot2-web/

You're not adding a title, a subtitle, a signature bar. You're not bolding the line at y = 0, you're using block-style legends instead of adding customized labels. All these will require extra code.

Your example is potentially misleading in this discussion for anyone who won't bother to go through that article you linked to.

I'm not saying you can't do the graphs in under 10 lines of code, I'm just saying that your example totally misses the point.

The key word is "similar." (And yes, I do add a title, subtitle, and caption.)

Apologies, I only examined the first couple of graphs after the first "FiveThirtyEight" keyword, for which you don't have titles, subtitles etc.

The graphs on your article look really nice, but they are quite far from resembling FTE's, IMO.

R doesn't include a 538 theme, even though ggplot2's theme_minimal() is close.

You are correct. There are third-party plugins that use 538's theme, like this one: https://github.com/jrnold/ggthemes
