
A mathematical formula that predicts US elections with 87.5% accuracy - bnveg
https://turingbotsoftware.com/posts/us-election-prediction.html
======
raincom
Here is a quote from Segre's new biography of Fermi: "When Dyson met with him
in 1953, Fermi welcomed him politely, but he quickly put aside the graphs he
was being shown indicating agreement between theory and experiment. His
verdict, as Dyson remembered, was 'There are two ways of doing calculations in
theoretical physics. One way, and this is the way I prefer, is to have a clear
physical picture of the process you are calculating. The other way is to have
a precise and self-consistent mathematical formalism. You have neither.' When
a stunned Dyson tried to counter by emphasizing the agreement between
experiment and the calculations, Fermi asked him how many free parameters he
had used to obtain the fit. Smiling after being told 'Four,' Fermi remarked,
'I remember my old friend Johnny von Neumann used to say, with four parameters
I can fit an elephant, and with five I can make him wiggle his trunk.' There
was little to add." (p. 273)

One can fit an elephant with four free parameters. That's the story of these
curve fitting exercises.

~~~
Grakel
I ask this in all seriousness- what?

~~~
yeezyseezy
In regression and curve fitting, you can add additional, often spurious
parameters, and get a closer fit. A better fit does not mean you have a better
model, it usually means the opposite, that you've overfitted the model, and
that it's completely useless for forecasting and predictive purposes. This is
why in machine learning, researchers are careful to separate training and
testing data. This is just physicists roasting each other over the same issue.
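A minimal numpy sketch of that point, using toy linear data rather than the article's election series: adding parameters always improves the fit on the training points, while the fit on held-out points gets worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple linear process: y = x + noise.
x = np.linspace(0.0, 1.0, 40)
y = x + rng.normal(scale=0.2, size=x.size)

# Hold out the last 10 points as a test set.
x_train, x_test = x[:30], x[30:]
y_train, y_test = y[:30], y[30:]

results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (mse_train, mse_test)
    print(f"degree {degree}: train MSE {mse_train:.4f}, test MSE {mse_test:.4f}")
```

The degree-9 fit always has lower training error than the straight line (its basis contains the line as a special case), but its extrapolation onto the held-out points is far worse.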

~~~
bnveg
The model shown in the article is cross-validated.

------
Imnimo
I would expect to be able to get 100% (on the training data) on this task.
Shouldn't any sequence of election victories be easily representable with
multiple sine functions?

I feel like the message I'm supposed to take from this demo is "Look how good
TuringBot is! They can automatically find a function to match this data!" But
the actual message I'm getting is "Symbolic regression is too hard for
TuringBot!"

~~~
frabert
Indeed. Just perform an FFT on the data and encode the resulting coefficients
in a closed formula.
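A numpy sketch of that idea, on a made-up win/loss sequence: the inverse DFT, written out by hand, is literally a closed-form sum of sinusoids that reproduces every training point exactly.

```python
import numpy as np

# Hypothetical win/loss sequence (1 = party A wins); any bits would work.
wins = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1], dtype=float)
N = wins.size

# DFT coefficients of the sequence.
coeffs = np.fft.fft(wins)

# "Closed formula": the inverse DFT as an explicit sum of complex sinusoids.
def f(n):
    k = np.arange(N)
    return np.real(np.sum(coeffs * np.exp(2j * np.pi * k * n / N))) / N

reconstructed = np.array([int(round(f(n))) for n in range(N)])
print(reconstructed)  # identical to `wins` at every integer n
```

This "model" is 100% accurate on its own data and carries zero predictive information, which is the point being made.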

------
kevinventullo
That's nothing, I can generate a degree 44 polynomial that predicts US
elections with 100% accuracy!
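For illustration, a numpy sketch with hypothetical outcomes: the unique degree-44 polynomial through 45 data points, built via Lagrange interpolation, scores 100% on the data it was constructed from, by construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# 45 hypothetical election outcomes (1 = incumbent party wins), 1844-2020.
years = np.arange(1844, 2024, 4).astype(float)
outcomes = rng.integers(0, 2, size=years.size).astype(float)

def lagrange_eval(x, xs, ys):
    """Evaluate the unique degree-(N-1) interpolating polynomial at x."""
    total = 0.0
    for i in range(xs.size):
        term = ys[i]
        for j in range(xs.size):
            if i != j:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

# 100% "accuracy" on the points the polynomial was built from.
preds = [int(round(lagrange_eval(x, years, outcomes))) for x in years]
```

At each node the Lagrange basis collapses to exactly 0 or 1, so the interpolant hits every data point with no error, regardless of how the outcomes were generated.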

~~~
bnveg
Now do that with cross-validation and get a better result than the one reported.

------
darksaints
This commits the same fallacy that exists all over the place for time series
data. This bites almost anybody who tries using time series methods for
predicting equities returns.

Since you have a literal dependency between one data point and the next, you
can't train your model using randomized data. So the default that people jump
to is to segment their data with earlier data being the training set, and
later data being the test set.

If your data is the result of a very well understood and controlled process,
this works fabulously (see ARIMA and variants). But the more that the data
relies on significant noise, whether that is pure randomness or Brownian
motion ( _very likely the case with elections_ ), that methodology breaks down
spectacularly. All you end up doing is overfitting to your test set instead of
your training set.
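A toy numpy illustration of overfitting to the test set (random data and arbitrary sinusoid "models", not the article's actual procedure): select the best of thousands of candidates by test-segment accuracy, and the winner looks great there while reverting to chance on genuinely new data.

```python
import numpy as np

rng = np.random.default_rng(2)

# A purely random binary "time series": past values carry no signal.
series = rng.integers(0, 2, size=40)
test = series[30:]

# Candidate "models": thresholded sinusoids with three free parameters.
def predict(params, n):
    a, b, c = params
    return (np.sin(a * np.arange(n) + b) > c).astype(int)

# Keep whichever candidate scores best on the held-out test segment.
best_acc, best_params = -1.0, None
for _ in range(5000):
    params = rng.uniform(-3, 3, size=3)
    acc = np.mean(predict(params, series.size)[30:] == test)
    if acc > best_acc:
        best_acc, best_params = acc, params

print(f"best test accuracy after model selection: {best_acc:.0%}")

# Fresh data from the same structureless process exposes the illusion.
fresh = rng.integers(0, 2, size=10)
fresh_acc = np.mean(predict(best_params, 50)[40:] == fresh)
print(f"accuracy on genuinely new data: {fresh_acc:.0%}")
```

Because the test segment was consulted thousands of times during selection, it stopped being a test set; the apparent accuracy is a property of the search, not of the data.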

~~~
bnveg
>Since you have a literal dependency between one data point and the next, you can't train your model using randomized data

This is why the train/test split was not randomized, but sequential.

------
mywittyname
Uhhhh...This is the dumbest ad article I've read in a while.

~~~
frabert
Talk about overfitting...

~~~
bnveg
This is an out-of-sample result. 4 formulas were compared. Sure, it may be an
overfit, as any machine learning model can be, but the procedure used to
generate this formula cannot be so easily dismissed.

------
cjohnson318
I thought this was going to be about Lichtman[0].

[0] https://en.wikipedia.org/wiki/The_Keys_to_the_White_House

------
cjohnson318
Except that the Democrats of 1940 were very different from the Democrats of
1990, and from the Democrats of 2020.

~~~
RhysU
Everything is a time series.

------
defertoreptar
Now repeat the same procedure on randomized data and see how often you can
match or surpass this accuracy.

~~~
bnveg
As a rule of thumb, you cannot. I have tried to fit the Mona Lisa with the
software (brightness as a function of x and y) and could not find anything. It
performs poorly when no structure exists.

------
woopwoop
Maybe the other commenters tearing into this are on a higher plane of irony
than me, but I want to point out that I'm 99% sure the article is meant as a
joke.

~~~
mcphage
Well, I really hope so. It doesn’t really read as a joke, but on the other
hand, they’re bringing their impressive software to bear to attempt to predict
25 bits of data, and can’t even do that very well. So I hope so.

------
purplezooey
Nicolas Cage movies are missing

