Hacker News

It's clearly great work by Nate and the rest of the 538 team. However, as has been pointed out, it's not that hard to build a model that gets close to predicting everything correctly.

I built a model in a couple of hours on Sunday afternoon, which simply takes all the most recent polling data, takes an average, does a quick fudge to adjust for the number of polls, and then runs 10,000 simulations to get a probability for each state. The source is on Github:

and the predictions are in this gist:

The result? My model gets 50/51 correct if Florida eventually goes DEM (which looks likely) or 51/51 correct if Florida goes REP.


Edit: full disclosure - with all data up to 6 Nov 2012 it predicts Colorado to be a toss-up, and I manually broke the tie in favour of Democrats, based on earlier models favouring them in that state.
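For readers who want a concrete picture, the approach described above (average the recent polls, shrink the uncertainty by the number of polls, then simulate 10,000 times) can be sketched roughly as follows. This is not the code from the repository: the function names, the 4% margin of error, the 1/sqrt(n) shrinkage standing in for the "fudge", the independence of states, and the toy inputs are all assumptions made for illustration.

```python
import random
import statistics


def state_win_probability(dem_shares, poll_moe=0.04, n_sims=10_000, seed=0):
    """Estimate P(DEM carries a state) from recent two-party DEM poll shares.

    Assumptions (not from the original model): every poll has a 95% margin
    of error of `poll_moe`, and the uncertainty of the average shrinks as
    1 / sqrt(number of polls).
    """
    rng = random.Random(seed)
    mean = statistics.fmean(dem_shares)
    sigma = (poll_moe / 1.96) / len(dem_shares) ** 0.5
    # Draw a plausible "true" DEM share per simulation and count the wins.
    wins = sum(rng.gauss(mean, sigma) > 0.5 for _ in range(n_sims))
    return wins / n_sims


def dem_electoral_win_probability(states, n_sims=10_000, seed=0):
    """states maps name -> (P(DEM wins the state), electoral votes).

    States are simulated independently, which is a simplification.
    A majority of the electoral votes in `states` counts as a win.
    """
    rng = random.Random(seed)
    needed = sum(ev for _, ev in states.values()) // 2 + 1
    wins = 0
    for _ in range(n_sims):
        dem_ev = sum(ev for p, ev in states.values() if rng.random() < p)
        wins += dem_ev >= needed
    return wins / n_sims


if __name__ == "__main__":
    # Hypothetical three-state map, purely for illustration.
    toy = {
        "safe_dem": (state_win_probability([0.55, 0.56]), 18),
        "toss_up": (state_win_probability([0.50, 0.50]), 29),
        "safe_rep": (state_win_probability([0.40]), 9),
    }
    print(dem_electoral_win_probability(toy))
```

With the full map of 51 contests the majority threshold works out to the familiar 270 of 538 electoral votes.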

Nice work.

The success, though, isn't his specific model. It's that Silver was able to market the idea that using statistical models is better than a table full of talking heads at predicting an outcome. And that a "dead heat" doesn't mean it's a coin flip. In hindsight, it seems almost ludicrous that it hasn't gained greater traction before.

He deserves a ton of credit for championing analytics into pop/political culture, but I'm sure we'll see many more models and a lot more statisticians during the next election, some of which will be better than Silver's models (if he's still doing them). I'm all for people asking to see the data instead of the media sound bites.

To keep this HN relevant: it was the marketing that drove his startup success. The product (the model) isn't quite perfect and an excellent substitute product was created by someone else in very short order (you). But he put together a great blog, grew out his brand, and eventually saw hockey stick growth.

Yep, I agree with this. I really built the model for fun, just to see how close it was possible to get with a few hours work.

A lot of forecasting problems have a structure like this one did:

- There is a lot of publicly available data, and simple statistical models based on that data give pretty good predictions with very little effort.

- If you're an expert with a lot of time on your hands, you can put a lot of effort into improving the forecasts by 1-2%.

- In the end, it comes down to the effective business use of your model. Nate Silver had an excellent brand around his model.

In fact, I'd wager that he knows his model is more complex than it needs to be, but part of his brand is "rogue genius with a complex model that's far too difficult for mere mortals to understand".

A slightly less cynical take on the last point is that it also lets him rebut many objections with, "nope, the model already takes that into account". By having a piece of the model devoted to most possible talking points: convention bounces, whether undecideds break for/against the incumbent, effect of the economy, etc., he can claim he's addressed those critics' points, even if the net effect of addressing them is close to nil.

And an addendum, that 1%-2% accuracy is crucial in close elections like this one, so it's more than worth putting the extra time and effort into getting.

While your project got the right result, I don't think it's mathematically justifiable to take an arithmetic mean of the published survey means without accounting for the standard deviation of each sample distribution.

Doing that would make your model more mathematically complicated, but it's no doubt one of the things that Nate Silver is doing.
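One common way to account for each poll's sampling spread is inverse-variance weighting, sketched below. This is an assumption about what "accounting for the standard deviation" could look like, not a description of Silver's method; the function name and the (share, sample_size) input format are hypothetical, and each poll is treated as an independent simple random sample.

```python
import math


def pooled_share(polls):
    """Inverse-variance weighted mean of poll shares.

    polls: iterable of (share, sample_size), with 0 < share < 1.
    Returns (pooled_share, standard_error).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for share, n in polls:
        # Binomial sampling variance of a single poll's reported share.
        variance = share * (1.0 - share) / n
        weight = 1.0 / variance
        total_weight += weight
        weighted_sum += weight * share
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


if __name__ == "__main__":
    # Hypothetical polls: a large one at 52% and a small one at 48%.
    print(pooled_share([(0.52, 2000), (0.48, 500)]))
```

Larger polls get more weight, so the pooled value sits closer to the 2000-person poll than a plain arithmetic mean would.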

But besides that, Silver isn't just trying to call the election the Monday before the election, he's trying to build a model that predicts the outcome as early as possible.

A lot of the stuff he's been throwing into it, like the 'economic factors' or 'convention bounce' adjustments are about keeping the model from going off into the weeds due to predictable fluctuations in public sentiment caused by ephemeral events.

If you compare the history on the right column of the site to the "now cast" tab, which excludes those adjustments, you'll notice that the former contains fewer and more modest fluctuations than the "now cast".

You're right, it's not mathematically justifiable at all. The focus was on quick and dirty results over mathematical sophistication. Note that I described the adjustment for the number of polls as "a fudge"!

> Silver isn't just trying to call the election the Monday before the election, he's trying to build a model that predicts the outcome as early as possible.

That's true, but all the praise he's now getting is for correctly predicting the outcomes in every state - a prediction that he made less than 24 hours before the election took place (in fact, he changed his prediction for Florida from REP to DEM after I'd made my predictions, using up-to-the-minute polling data).

> In fact, I'd wager that he knows his model is more complex than it needs to be, but part of his brand is "rogue genius with a complex model that's far too difficult for mere mortals to understand".

Not what he said on The Colbert Report. "It's not hard math. You take averages then you count to 270."

> part of his brand is "rogue genius with a complex model that's far too difficult for mere mortals to understand".

On the contrary, I think his brand is 'anyone who takes some time to study statistics & probability can work this out for themselves.' His key skill is demystifying the modeling and explaining that it's not the result of some secret sauce.

You hit it on the head. It wasn't the idea itself, it's his marketing of it. This is how Levitt broke out of the pack with Freakonomics. It wasn't just being novel or counterintuitive, it was getting up on the roof and shouting about it.

So basically, Nate Silver is to politics what Billy Beane is to baseball?

Well, Nate Silver is kind of to politics what Nate Silver is to baseball. There had been forecasting systems prior to PECOTA, but it upped the bar.

More the Bill James of politics. (Bill James created Sabermetrics, which is the source of many of the models that Billy Beane utilizes.)

Billy Beane is more someone who utilizes new types of stats in player evaluation than the person who creates the stats. I'm not sure who the "user" of Nate Silver's metrics is.

It is interesting to note that much of Nate's work stems from his work in Baseball.


I think Beane probably had a lot more institutional resistance to his methods. The pundit class is much stronger in sports, since commentators are the mediators of entire sports for most fans (i.e. the majority don't attend games).

And this is different from US politics, how?

Citizens are not subject to the rules of the sports they follow.

Commentators are mediators of politics for most people in the US.

You're making a category error.

No, I'm making a wry comment about the political apathy of most US citizens, and you are either not getting the joke or trolling by not getting the joke.

The risk of sarcasm is that you are taken seriously. You probably should have just deleted your comment rather than drag it out.

I agree to an extent. His product was clearly much better than the existing alternatives - but he also made a brilliant move by choosing the NYT as his exclusive distribution outlet.

It's more than just marketing. It's largely about distribution.

Silver's model isn't particularly advanced on election day. The further you get from election day, the more advanced his forecast is, accounting for various factors and ebbs and flows in polling data -- convention bounces, economic news, etc.

A big part of Silver's success has not been his math, but rather his excellent writing. He offers easy to consume analysis of the polls and makes a boring (to the masses) subject interesting. He can be portrayed as a stats geek when presented to the public, but really he's a great communicator who also knows his data.

Is there an element in this of the fact that if you looked at it with any degree of sophistication it wasn't actually as close as some made out?

Pundits and the media gain from the narrative that it was on a knife edge because it keeps people coming back for more, but in reality Romney never really picked up, let alone sustained, the sort of position in the swing states that would have led to a win. It was never comfortable for Obama, and Romney wasn't out of it, but the idea that it was neck and neck was fairly dramatically overstated.

Not undermining what Silver did, but it seems to me that the same pundits who spoke out against him made what he did look more impressive than it was by claiming the race was closer than it was.

A lot of the work Silver does isn't just aggregating the polls. He tries to take into account the probability that the polls are wrong.

According to a couple of his blog posts, going just by the most recent polls (and their margins of error) Obama had a 100% chance of winning -- but in the past, polls have sometimes been systematically misleading.

Great. Now you can go on https://intrade.com/ , make some bets and clean up.

I already did.

With great statistical modelling comes great cash windfalls. You're right though, it's pretty straightforward to model it out so it's been baffling to see how many people have out and out claimed that the models are wrong.

Something I don't quite understand, but maybe someone here can help me with.

As I understand it, the poll averaging works because of the central limit theorem. But with so few data points, maybe a dozen polls at most in each state, and often half that, why does it still seem to work? I thought you'd need a few dozen data points at least.

What am I missing?

Let's take two examples - Ohio (a contested state with a lot of polls) and Alabama (only one poll since 1st August, but not contested).

In Ohio there were 44 polls in the month preceding the election, with a mean of 51.4% Obama, 48.6% Romney. If the margin of error of each poll is 4%, then the margin of error of the average of all 44 polls is 4% / sqrt(44) ≈ 0.6%, which is easily enough to make a confident forecast of an Obama victory in Ohio (my model was > 98% confident in its Ohio forecast).

In Alabama, the last poll was on Aug 16th, and it was 40% Obama, 60% Romney. Even with a margin of error of 4%, this state was clearly going to go to Romney (my model had 99.9% confidence in this forecast).

This pattern is repeated for almost every state - the states with few polls are not contested, and the hotly contested states are a focus for pollsters, so there are a lot of polls there. The only really difficult states were Florida, Colorado and North Carolina, where the candidates polled so similarly that even with 30-40 polls you didn't have sufficient sample size to make strong forecasts.
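The Ohio and Alabama arithmetic above can be checked with a few lines. This is a sketch under stated assumptions rather than the model itself: the 4% figure is read as a 95% margin of error, polls are treated as independent, the error of the average is taken to be normal, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def averaged_margin(poll_moe, n_polls):
    """Margin of error of the mean of n independent polls."""
    return poll_moe / math.sqrt(n_polls)


def lead_probability(mean_share, poll_moe, n_polls):
    """P(true share > 50%) given the average share across n polls."""
    # Convert the 95% margin of error into a standard deviation.
    sd = averaged_margin(poll_moe, n_polls) / 1.96
    return 1.0 - NormalDist(mu=mean_share, sigma=sd).cdf(0.5)


if __name__ == "__main__":
    print(averaged_margin(0.04, 44))         # Ohio: 44 polls
    print(lead_probability(0.514, 0.04, 44)) # Ohio: Obama at 51.4%
    print(lead_probability(0.40, 0.04, 1))   # Alabama: one poll, Obama at 40%
```

Because this only captures sampling variation, it gives more extreme probabilities than a model that also allows for the polls being collectively off.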

It doesn't work like that. Polls are not independent. If poll A is out by a large amount, then it's much more likely that poll B is also out. As well, the confidence interval assumes that the sample was a fair sample, which is highly unlikely. It's very very difficult to get a true random sample of the voting populace. It's hard enough to get a random sample of the populace, and a random sample of those who vote is even harder.

That's why Nate Silver said that Romney had about a 10% chance of winning. Statistically it was much lower, but there was a very real chance of systematic polling errors.
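The effect of a shared polling error can be illustrated with a small sketch. Everything here is an assumption for illustration: the error is modelled as a single normally distributed bias common to all polls, the 2-point bias standard deviation is hypothetical, and none of this is taken from Silver's actual model. The key property is that averaging more polls shrinks the sampling term but not the shared term.

```python
import math
from statistics import NormalDist


def win_probability(lead, n_polls, poll_moe=0.04, bias_sd=0.0):
    """P(candidate truly leads) given an averaged polling lead over 50%.

    lead:     average share minus 0.5 (e.g. 0.014 for 51.4%).
    bias_sd:  standard deviation of an error shared by every poll.
    """
    sampling_sd = (poll_moe / 1.96) / math.sqrt(n_polls)
    # Independent sampling noise and shared bias add in quadrature.
    total_sd = math.hypot(sampling_sd, bias_sd)
    return NormalDist(mu=0.0, sigma=total_sd).cdf(lead)


if __name__ == "__main__":
    print(win_probability(0.014, 44))                # sampling error only
    print(win_probability(0.014, 44, bias_sd=0.02))  # plus a shared bias
    print(win_probability(0.014, 4400, bias_sd=0.02))  # more polls barely help
```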

> Polls are not independent. If poll A is out by a large amount, then it's much more likely that poll B is also out.

That's a bias issue - I was only addressing sampling variation. If you want your model to address bias as well, of course you can build that in, given some priors on what the distribution of the bias is likely to be. I didn't bother in my model (which is why I was forecasting 98-99% chance of an Obama victory).

We could discuss potential poll bias as well, but I thought that was a bit too much for this short comment.

Each poll's reported result is also an average.

I was keen to bet on Intrade, but the ToS and such are long and confusing, and it wasn't clear how easy it is to get one's money out after a win. What was your experience like?

I actually made more taking directional bets on BetFair, which generally had narrower spreads and more liquidity.

I made some money buying Obama on InTrade and selling on BetFair, at a time when Obama was at 65% on one and 75% on the other, but I didn't start doing it until Monday because it was a lot of faff to set up the InTrade account.

I haven't tried to get my money out yet. I'll let you know how that goes.
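The buy-on-one, sell-on-the-other trade can be sketched as simple arithmetic. This treats both exchanges as quoting the same contract that pays a fixed amount if Obama wins, and it ignores fees, commissions, and the cost of moving money; BetFair actually quotes odds rather than contract prices, so the function below is a simplified, hypothetical model.

```python
def hedged_profit(buy_price, sell_price, payout=1.0):
    """Profit per contract in each outcome for a buy/sell pair.

    Buy one contract at buy_price on one exchange and sell one at
    sell_price on the other. Returns (profit_if_win, profit_if_lose).
    """
    # Long leg gains payout - buy_price on a win; short leg loses payout - sell_price.
    if_win = (payout - buy_price) + (sell_price - payout)
    # On a loss the long leg loses its cost and the short leg keeps its premium.
    if_lose = -buy_price + sell_price
    return if_win, if_lose


if __name__ == "__main__":
    # Prices from the comment above: 65% on one exchange, 75% on the other.
    print(hedged_profit(0.65, 0.75))
```

Under these assumptions the profit is the price gap in either outcome, which is why the trade does not depend on who wins.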


Putting just over $2000 on Obama several days ago would've gotten you $800.

I don't think that his writing gets enough credit -- explaining what is happening to generate the predictions is really useful. Pretty much everyone that builds models from the state level polls seems to have made more or less the same predictions, but I've found his discussion to be the most interesting and informative (I may have missed some other good sources though).

The hard work that Silver did wasn't in being able to predict the election 2 days in advance; any middle schooler with a calculator could have done that this time around. The work that he did was in being able to forecast the results months in advance, adjusting for a broader variety of inputs, including historical research.

That's fair, but I'd point out two things:

1. Even though any middle schooler with a calculator could have predicted the election 2 days ago, it seems that not many of them did - at least, if you judge based on the odds for a Democrat win you could get on InTrade and BetFair 2 days ago (65% and 75% respectively).

2. One neat thing about my model is that, in principle you can run it in "historical" mode and see what its predictions would have been at any given point in the past. So I can see what probability of a DEM win I would have assigned 2 months ago, and see if that's significantly different from the forecast from a more complex model.
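A "historical" mode of this kind mostly comes down to only using polls that had been published by the chosen date. A minimal sketch, assuming polls are stored as (date, DEM share) pairs and a 30-day lookback window; the data layout, function names, and window length are hypothetical, not taken from the actual code:

```python
from datetime import date, timedelta


def polls_as_of(polls, as_of, window_days=30):
    """Return the shares of polls dated within window_days up to as_of."""
    start = as_of - timedelta(days=window_days)
    return [share for poll_date, share in polls if start <= poll_date <= as_of]


def average_as_of(polls, as_of, window_days=30):
    """Average DEM share using only polls available on as_of, or None."""
    shares = polls_as_of(polls, as_of, window_days)
    return sum(shares) / len(shares) if shares else None


if __name__ == "__main__":
    # Made-up polls for illustration only.
    polls = [
        (date(2012, 8, 20), 0.49),
        (date(2012, 9, 5), 0.50),
        (date(2012, 10, 30), 0.52),
    ]
    print(average_as_of(polls, date(2012, 9, 6)))
    print(average_as_of(polls, date(2012, 11, 6)))
```

Feeding the filtered polls into the same simulation for each past date would give a forecast history to compare against a more complex model.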


But what people are saying is not "Nate Silver's predictions 1 month before are spot on". They are saying "his predictions ON THE DAY OF ELECTION were spot on".

Which is not that hard to do. Does not mean Nate Silver is a hack. Means people are not measuring his success appropriately.


It's one thing to hold the bottle, quite another to aim the funnel from 10ft away.

Was he really forecasting the result weeks/months ago or just showing what the result would be if the vote happened on that day, according to polling data?

Both. There was a Nov 6 forecast and a "now-cast."

He has both on his site.

Yes it's quite easy to write off the simplicity after the fact. Kudos.

I'd be interested to see how your model works for Senate races. That seems to be the bigger challenge.

House races seem to be far too data-sparse to be reliable.

Maybe I'll get around to adding a forecast for the senate/house as well. Although any forecasts I make would be somewhat less compelling now that the results are out...
