Here is a quote from Segrè's new biography of Fermi: "When Dyson met with him in 1953, Fermi welcomed him politely, but he quickly put aside the graphs he was being shown indicating agreement between theory and experiment. His verdict, as Dyson remembered, was 'There are two ways of doing calculations in theoretical physics. One way, and this is the way I prefer, is to have a clear physical picture of the process you are calculating. The other way is to have a precise and self-consistent mathematical formalism. You have neither.' When a stunned Dyson tried to counter by emphasizing the agreement between experiment and the calculations, Fermi asked him how many free parameters he had used to obtain the fit. Smiling after being told 'Four,' Fermi remarked, 'I remember my old friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.' There was little to add." (p. 273)
One can fit an elephant with four free parameters. That's the story of these curve fitting exercises.
This is out-of-sample performance; only four models were compared outside the training domain. Also, I don't see where you see 22 parameters, and in any case the complexity of a model is not defined by its number of parameters.
In regression and curve fitting, you can add extra, often spurious, parameters and get a closer fit. A better fit does not mean you have a better model; it usually means the opposite: you've overfitted the model, and it's useless for forecasting and predictive purposes. This is why in machine learning, researchers are careful to separate training and testing data. This is just physicists roasting each other over the same issue.
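To make the point concrete, here is a minimal sketch (plain NumPy, synthetic data, arbitrarily chosen polynomial degrees, nothing specific to the article) of how adding parameters keeps shrinking the training error while the held-out error eventually gets worse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying function.
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out every other point as a test set; fit only on the training points.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 3, 9, 15):
    # Each extra polynomial degree adds one more free parameter.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Typically the training error keeps falling while the test error rises.
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```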
I would expect to be able to get 100% (on the training data) on this task. Shouldn't any sequence of election victories be easily representable with a sum of sine functions?
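For what it's worth, that intuition checks out: any finite 0/1 sequence can be reproduced exactly by a sum of sines, because the discrete sine vectors form a basis. A quick sketch with a made-up sequence standing in for election results:

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up sequence of 25 binary "election outcomes".
outcomes = rng.integers(0, 2, size=25)
n = outcomes.size

# Sine features sin(pi * t * k / (n + 1)) for t, k = 1..n form an invertible
# (discrete sine transform) matrix, so the system can be solved exactly.
t = np.arange(1, n + 1)
k = np.arange(1, n + 1)
X = np.sin(np.pi * np.outer(t, k) / (n + 1))

# One coefficient per sine term reproduces the training data exactly.
coeffs = np.linalg.solve(X, outcomes.astype(float))
fitted = X @ coeffs

print("max abs error on training data:", np.max(np.abs(fitted - outcomes)))
print("training accuracy:", np.mean((fitted > 0.5) == outcomes))
```

Of course, hitting 100% on the training data this way says nothing about the next election, which is the point several other comments are making.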
I feel like the message I'm supposed to take from this demo is "Look how good TuringBot is! They can automatically find a function to match this data!" But the actual message I'm getting is "Symbolic regression is too hard for TuringBot!"
This is a Pareto optimization with a limit on the size of the formulas. Sure, with a formula of arbitrary size you can fit anything with 100% accuracy, but the task is much harder if what you are looking for is a short model.
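Roughly, that kind of selection keeps only formulas that are not beaten by a simpler-and-at-least-as-accurate alternative. A toy sketch of the rule (the candidate formulas, complexity scores, and errors below are all made up, not TuringBot's actual output):

```python
from typing import List, Tuple

# Hypothetical candidates: (formula, complexity score, training error).
candidates: List[Tuple[str, int, float]] = [
    ("c0", 1, 0.48),
    ("sin(c0*x)", 2, 0.31),
    ("c0*x + c1", 3, 0.35),
    ("c0 + c1*sin(c2*x)", 5, 0.19),
    ("some much longer formula", 22, 0.02),
]

def pareto_front(items: List[Tuple[str, int, float]]) -> List[Tuple[str, int, float]]:
    """Keep formulas not dominated by a simpler-or-equal, at-least-as-accurate one."""
    front = []
    for name, size, err in items:
        dominated = any(
            s <= size and e <= err and (s, e) != (size, err)
            for _, s, e in items
        )
        if not dominated:
            front.append((name, size, err))
    return front

for name, size, err in pareto_front(candidates):
    print(f"complexity {size:2d}  error {err:.2f}  {name}")
```

Here "c0*x + c1" gets dropped because a shorter formula fits at least as well; everything else survives on the front, trading size against error.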
This commits the same fallacy that shows up all over the place with time-series data. It bites almost anybody who tries to use time-series methods to predict equity returns.
Since you have a literal dependency between one data point and the next, you can't train your model using randomized data. So the default people jump to is to segment the data, with earlier data as the training set and later data as the test set.
If your data is the result of a very well-understood and controlled process, this works fabulously (see ARIMA and its variants). But the more the data is driven by significant noise, whether pure randomness or Brownian motion (very likely the case with elections), the more spectacularly that methodology breaks down. All you end up doing is overfitting to your test set instead of your training set.
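For reference, the sequential segmentation being described is what scikit-learn calls a time-series (walk-forward) split; a minimal sketch on synthetic data (a random walk, so mostly noise, as the comment assumes for elections):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)

# Synthetic series: a random walk, i.e. dominated by noise rather than structure.
y = np.cumsum(rng.normal(size=200))

# Toy autoregressive setup: predict each value from the previous one.
X = y[:-1].reshape(-1, 1)
target = y[1:]

# Walk-forward evaluation: each test fold lies strictly after its training data.
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(X)):
    model = LinearRegression().fit(X[train_idx], target[train_idx])
    mse = np.mean((model.predict(X[test_idx]) - target[test_idx]) ** 2)
    print(f"fold {fold}: train size {len(train_idx):3d}, test MSE {mse:.3f}")
```

Every test fold lies strictly after its training data, so nothing leaks from the future; the warning above is that if you keep iterating on model choices against those folds, you end up overfitting them anyway.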
>Since you have a literal dependency between one data point and the next, you can't train your model using randomized data
This is why the train/test split was not randomized, but sequential.
This is an out-of-sample result; four formulas were compared. Sure, it may be overfit, as any machine learning model can be, but the procedure used to generate this formula cannot be so easily dismissed.
As a rule of thumb, you cannot. I have tried to fit the Mona Lisa with the software (brightness as a function of x and y) and could not find anything. It performs poorly when there is no structure in the data.
Maybe the other commenters tearing into this are on a higher plane of irony than I am, but I want to point out that I'm 99% sure the article is meant as a joke.
Well, I really hope so. It doesn't really read as a joke, but on the other hand, they're bringing their impressive software to bear on predicting 25 bits of data, and it can't even do that very well. So I hope so.