
The fivethirtyeight R package - michaelsbradley
http://blog.revolutionanalytics.com/2017/01/the-fivethirtyeight-r-package.html
======
ivraatiems
In my opinion, 538 did some of the best analytics work in the 2016 election.
They were the only ones who came anywhere close to properly estimating the
chances of the outcome that actually occurred, and theirs was one of only a
couple of models that took sufficiently seriously how unusual things were -
particularly, the massive unpopularity of both candidates.

I'm getting really irritated with people (in this thread and others) attacking
538 along partisan or personal lines for not making the same predictions they
did personally. 538 made a model, not a poll; they are only ever as good as
the polls they represent, and even then, they still managed to be better than
quite a lot of public polling and commentary. If Hillary Clinton had won, the
same kinds of people on the other side would be attacking 538 for being _too_
bullish on Trump.

The worst part is that a lot of this is predicated on the idea that giving
something a 1 in 3 chance is the same as saying it'll never happen, when it's
almost the opposite. There are some critical misunderstandings of statistics
going on here, on all sides.
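
To make the "1 in 3 is not never" point concrete, here is a minimal sketch in Python. It assumes nothing about 538's actual model: the only input is the 1-in-3 figure from the comment above, and the function name, seed, and trial count are made up for illustration.

```python
import random

def simulate_upsets(p_underdog=1/3, n_elections=100_000, seed=538):
    """Fraction of simulated elections won by a candidate given p_underdog."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = sum(rng.random() < p_underdog for _ in range(n_elections))
    return upsets / n_elections

freq = simulate_upsets()
print(f"Underdog won {freq:.1%} of simulated elections")

# Chance that a 1-in-3 event happens at least once in three independent tries:
# 1 - (2/3)^3 = 19/27, roughly 70%.
at_least_once = 1 - (1 - 1/3) ** 3
print(f"At least one upset in three such elections: {at_least_once:.1%}")
```

A well-calibrated forecaster who says "1 in 3" should be "wrong" about a third of the time; one who is never wrong at that level is miscalibrated in the other direction.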

~~~
Houshalter
After the election, Silver said something to this effect: given
the polling data available, there is no way anyone could have favored Trump.
Clinton was leading by a large amount in a large percentage of polls. You
would have to apply some _seriously_ creative statistics to make a model that
favored Trump.

Systemic polling error is a thing that happens. It's happened in previous
elections. The polls are much more accurate than chance, but they aren't
infallible. Particularly in this election, it's difficult to predict voter
turnout from polls.

538 was one of the only models that took that into account. And as far as I
know, they gave better odds to Trump than everyone else that tried to predict
the election with statistical methods. Certainly they have a better track
record than political pundits, who have never been better than chance.
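
A rough sketch of why accounting for systemic polling error matters, in Python. This is not 538's model. Every number here is an assumption chosen for illustration: a 3-point polling lead in each of 5 states, a 4-point total polling error, and the split between shared and state-level error. The only thing it shows is that when errors are shared across states, the trailing candidate's chances go up compared with treating each state's error as independent.

```python
import math
import random

def underdog_win_rate(shared_sd, total_sd=4.0, margin=3.0,
                      n_states=5, n_trials=20_000, seed=2016):
    """Share of trials in which the trailing candidate flips a majority of states.

    Each state's polling error is a shared (national) part plus a local part;
    total_sd is held fixed so only the correlation between states changes.
    """
    local_sd = math.sqrt(total_sd ** 2 - shared_sd ** 2)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        shared = rng.gauss(0.0, shared_sd)  # the same miss in every state
        flipped = sum(
            margin + shared + rng.gauss(0.0, local_sd) < 0
            for _ in range(n_states)
        )
        if flipped > n_states // 2:
            wins += 1
    return wins / n_trials

independent = underdog_win_rate(shared_sd=0.0)  # state errors unrelated
correlated = underdog_win_rate(shared_sd=3.5)   # errors mostly shared
print(f"independent errors: {independent:.1%}")
print(f"correlated errors:  {correlated:.1%}")
```

Under these made-up settings the per-state uncertainty is identical in both runs; only the assumption about whether the polls miss together differs.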

~~~
ghaff
Yep. And it's also worth remembering that FiveThirtyEight's thing is that they
do analysis based on data as opposed to punditry. Yes, they apply corrections
and otherwise filter the data sources. But some people seem to think that
Silver should have gotten up a couple of days before the election, stated that
he had a bad feeling about things, dismissed the best forecast he could make
as based on some shaky polling data, and gone with Trump based on gut feel.
Sure, he would have ended up being correct. That would also run counter to the
fundamental philosophy of the site.

~~~
lmg643
People forget that Silver's approach is basically weighting polls to figure
out what the election will be, and he's the best "poll weigher" out there. But
if the polls are systematically wrong, being the best at it is an unimpressive
subtlety to the layperson.

To the extent 2018 brings increasingly contentious elections, we will likely
see polling remain inaccurate due to voter unwillingness to state unpopular
opinions. In that case, multi-input "big data" approaches may prevail.

(Of course, it's also possible that 2016 is a "top" in terms of divisive
rhetoric...only time can tell.)

~~~
remarkEon
>In that case, multi-input "big data" approaches may prevail.

What do you think this would look like? Inferring voter preferences based on
proxies or instruments?

------
minimaxir
It's worth noting that per the linked talks, 538's data visualizations are
indeed created with R and ggplot2; the fancy annotations are done in
Illustrator after exporting from ggplot2 in SVG/PDF.

The vignettes
([https://mran.microsoft.com/web/packages/fivethirtyeight/vign...](https://mran.microsoft.com/web/packages/fivethirtyeight/vignettes/bechdel.html))
are a good tutorial in dplyr/ggplot2 too.

~~~
KurtMueller
Side question: Can I create interactive visualizations with R? Something akin
to D3 in javascript?

~~~
minimaxir
I strongly recommend using Plotly, specifically ggplotly
([https://plot.ly/ggplot2/](https://plot.ly/ggplot2/)), which converts ggplot2
charts to interactive d3 charts with good parity. Plotly also has WebGL and 3D
data visualization support. One of the Plotly developers has a very good book
on using Plotly in R:
[https://cpsievert.github.io/plotly_book/](https://cpsievert.github.io/plotly_book/)

Here's a higher-level demo of ggplotly in one of my R Notebooks:
[http://minimaxir.com/notebooks/breach-network/](http://minimaxir.com/notebooks/breach-network/)

Shiny requires an external server and is often overkill for static data.

~~~
RockyMcNuts
You can build Shiny apps in RStudio and publish them with one click to
shinyapps.io. Works very similarly to publishing with Plotly, even though
Shiny is a server-side package vs. client-side for Plotly.

(You can build a Plotly chart in ggplot, embed it in a web page and then
script it with JS to get a visualization. Frankly, that's a PITA. You can
write R/shiny code that will generate a visualization, with HTML controls etc,
and you end up writing a web app in R and that's a different PITA. I would
like to see Plotly generate a web wrapper that automates wiring up a plotly
graph and scripting it from an app so you can update data on the fly to make
it a visualization. Supposedly they are working on it with their dash
framework, but it's not released yet.)

~~~
minimaxir
The pricing on shinyapps.io is prohibitive if the post gets any more than a
trickle of page views.

The embed-directly-into-HTML is the approach I use for my Jekyll blog and it
does not require much effort. (I just have to set a YAML flag to load the
Plotly library.)

~~~
RockyMcNuts
Yes, for embedding a chart and getting some basic active functionality Plotly
is very easy.

The part that requires some custom Plotly coding is when you want to script
the chart ... have HTML buttons, sliders, to filter, recompute, revisualize
the data.

It's beautiful that it even works, and you can programmatically update the
JSON of an embedded chart. But it would be nice if you could give Plotly a
dataset, plot the data, and say, give me a button to filter rows using these
criteria for this column (or run some other code on the data) and re-render
the chart. Right now you have to code that manually and update the JSON model.
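
For the simplest version of this (buttons that show or hide groups of rows), one possible approach is to generate the chart JSON yourself and lean on Plotly's declarative `updatemenus` layout attribute with `restyle` buttons. The sketch below is in Python using only the standard library. The helper names, the sample rows, and the CDN script URL are assumptions for illustration, and it does not cover recomputing data on the fly, which still needs custom scripting.

```python
import json

def filter_button_figure(rows, x_key, y_key, group_key):
    """Build a Plotly figure dict: one trace per group, plus show/hide buttons."""
    groups = sorted({row[group_key] for row in rows})
    traces = [
        {
            "type": "scatter",
            "mode": "markers",
            "name": str(group),
            "x": [r[x_key] for r in rows if r[group_key] == group],
            "y": [r[y_key] for r in rows if r[group_key] == group],
        }
        for group in groups
    ]
    # With "restyle", a list value is applied trace by trace.
    buttons = [{"label": "All", "method": "restyle",
                "args": [{"visible": [True] * len(groups)}]}]
    for group in groups:
        buttons.append({
            "label": str(group),
            "method": "restyle",
            "args": [{"visible": [g == group for g in groups]}],
        })
    layout = {"updatemenus": [{"type": "buttons", "buttons": buttons}]}
    return {"data": traces, "layout": layout}

def to_html(figure, div_id="chart"):
    """Wrap the figure JSON in an HTML snippet that loads plotly.js (assumed URL)."""
    page = (
        '<div id="DIV_ID"></div>\n'
        '<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>\n'
        '<script>\n'
        '  var fig = FIGURE_JSON;\n'
        '  Plotly.newPlot("DIV_ID", fig.data, fig.layout);\n'
        '</script>\n'
    )
    return page.replace("DIV_ID", div_id).replace("FIGURE_JSON", json.dumps(figure))

# Hypothetical rows, not real data.
ROWS = [
    {"budget": 10, "gross": 25, "result": "PASS"},
    {"budget": 40, "gross": 30, "result": "FAIL"},
    {"budget": 15, "gross": 60, "result": "PASS"},
]
figure = filter_button_figure(ROWS, "budget", "gross", "result")
snippet = to_html(figure)
```

The resulting `snippet` can be pasted into a static page, which keeps the no-server property of the embed approach.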

Visualization used to mean charts that were animated and/or would dynamically
update but now it means any old chart LOL.

------
nl
I find it interesting that there isn't a single comment here about the package
itself.

There are some interesting datasets in here[1]! Everything from "How Americans
Like Their Steak"[2] to "The Most Common Unisex Names In America: Is Yours One
Of Them?"[3]

As a (mostly) Python person I'd like to see these in Python!
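
In the meantime, since the datasets are plain CSV tables, a small sketch with Python's standard library gets surprisingly far. The column names `year` and `binary` (with values `PASS`/`FAIL`) are assumptions about how a Bechdel-style table might be laid out, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

def pass_rate_by_decade(csv_text, year_col="year", result_col="binary"):
    """Share of rows marked PASS in each decade of a Bechdel-style CSV."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        decade = int(row[year_col]) // 10 * 10
        totals[decade] += 1
        passes[decade] += row[result_col] == "PASS"  # True counts as 1
    return {d: passes[d] / totals[d] for d in sorted(totals)}

# Invented sample rows, not real data.
SAMPLE = """year,title,binary
1994,Example A,FAIL
1997,Example B,PASS
2004,Example C,PASS
2008,Example D,PASS
"""

rates = pass_rate_by_decade(SAMPLE)
print(rates)  # {1990: 0.5, 2000: 1.0}
```

To run it on real data you would read the CSV text from wherever the dataset is published (for example with `urllib.request`) and adjust the column names to match the actual file.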

[1]
[https://mran.microsoft.com/web/packages/fivethirtyeight/five...](https://mran.microsoft.com/web/packages/fivethirtyeight/fivethirtyeight.pdf)

[2] [https://fivethirtyeight.com/datalab/how-americans-like-their...](https://fivethirtyeight.com/datalab/how-americans-like-their-steak/)

[3] [https://fivethirtyeight.com/features/there-are-922-unisex-na...](https://fivethirtyeight.com/features/there-are-922-unisex-names-in-america-is-yours-one-of-them/)

------
Tempest1981
I remember HuffingtonPost showing a 98% chance of Hillary winning. Seemed way
too optimistic. I wonder how many voters that influenced to "safely" vote for
a non-mainstream candidate...

Here is their methodology -- apparently quite flawed?

[http://www.huffingtonpost.com/entry/high-probability-clinton...](http://www.huffingtonpost.com/entry/high-probability-clinton-winning_us_581d0399e4b0e80b02ca2498)
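
One hedged way to compare a 98% forecast with a more cautious one is a logarithmic scoring rule, which penalizes a forecaster by the negative log of the probability they gave to what actually happened. The sketch below is in Python and uses only two figures from this thread: the 98% Clinton number above (so 2% for Trump) and the "1 in 3" Trump chance mentioned in an earlier comment. The labels and function name are made up, and a single election is far too little data to rank models.

```python
import math

def log_penalty(p_outcome):
    """Log-score penalty for assigning probability p_outcome to what happened."""
    return -math.log(p_outcome)

# Probability each forecast gave to a Trump win (figures quoted in the thread).
trump_prob = {"98% Clinton forecast": 0.02, "1-in-3 Trump forecast": 1 / 3}

for name, p in trump_prob.items():
    if_trump = log_penalty(p)        # penalty given the actual result
    if_clinton = log_penalty(1 - p)  # penalty had Clinton won instead
    print(f"{name}: Trump wins {if_trump:.2f}, Clinton wins {if_clinton:.2f}")
```

The confident forecast gains very little when it is right (0.02 vs 0.41) and loses a lot when it is wrong (3.91 vs 1.10), which is the asymmetry that makes near-certain predictions risky.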

------
downrightmike
My eyes, the small text.

