Sure, but it was a better fit, and before that, heliocentric models were definitely the only way forward that didn't keep adding terms every time someone spotted a moon.
Occam's razor - do not multiply terms without necessity - is essentially a loss function.
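To make that concrete: penalized model selection (AIC-style) literally adds a "number of free parameters" term to the fit error, which is one way to read "do not multiply terms without necessity" as a loss function. A toy sketch (the model names, the candidate functions, and the penalty weight 0.5 are all made up for illustration):

```python
# Toy "Occam loss": mean squared error plus a penalty per free parameter.
data = [(x, 2 * x + 1) for x in range(10)]  # generated by a simple line

def occam_loss(f, k, lam=0.5):
    """Fit error plus lam * (number of free parameters k)."""
    mse = sum((f(x) - y) ** 2 for x, y in data) / len(data)
    return mse + lam * k

models = {
    # name: (prediction function, number of free parameters)
    "constant": (lambda x: 10.0, 1),               # too simple: fits badly
    "linear":   (lambda x: 2 * x + 1, 2),          # right complexity
    "cubic":    (lambda x: 2 * x + 1 + 0 * x**3, 4),  # fits equally well, more terms
}

best = min(models, key=lambda name: occam_loss(*models[name]))
```

The linear and cubic candidates fit the data identically, so the parameter penalty alone is what breaks the tie in favor of the simpler one.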
You're talking about Kepler's model here, not about the gravitational equation. The gravitational equation was not a better fit than Kepler at that time, especially since it used unknown constants.
So would you care to comment on how this relates to the original contention, which is the claim that a loss function could not discover Newton's law of gravitation?
Because what you're arguing, extensively, is that due to lack of fit, Newton's Law of Gravitation wasn't settled science until observational data was of sufficient fidelity to clearly distinguish it.
Formulate the loss function -- you'll find it's just
loss(the-right-answer(perfect-x) - perfect-y)
The most important aspect of "the-right-answer" is its ability to ignore almost all the data.
The existence of planets is "predictable" from the difference between the data and the theory -- if the theory is just a model of the data, it has no capacity to do this.
If you want to "do physics" by brute-force optimization, you'd need to have all possible measures, all possible data, and then a way of selecting relevant causal structures in that data -- and then be able to try every possible model.
loss(Model(all-data|relevant-causal-structures) - Filter(...|...)) forall Model
Of course, (1) this is trivially not computable (equivalent to computing the reals); (2) "all possible data with all possible measures" doesn't exist; and (3) selecting relevant causal structure requires having a primitive theory not derived from this very process.
Animals solve this in reverse order: (3) is provided by the body's causal structure; (2) is obtained by using the body to experiment; and (1) we imagine simulated ways-the-world-might-be to reduce the search space down to a finite size.
i.e., we DO NOT make theories out of data. We first make theories, then use the data to select between them.
This is necessary, since a model of the data (i.e., modern AI, i.e., automated statistics, etc.) doesn't decide between an infinite number of theories of how the data came to be.
> i.e., we DO NOT make theories out of data. We first make theories, then use the data to select between them.
No we don't, we make hypotheses and then test them. Hypotheses are based on data.
There are physics experiments being done right now where the exact hope is that existing theory has not predicted the result they produce, because then we'd have data from which to hypothesize something new.[1]
You are literally describing what deep learning techniques are designed to do while claiming they can't possibly do it.
I know this discussion is a bit old at this point, but I came across this[1] essay for the first time today, and this shows more of what I was trying to get across earlier in the thread. Hopefully you'll find it interesting. Essentially, they trained a GPT on predicting the next move in a game of Othello, and by analyzing the weights of the network, found that the weights encode an understanding of the game state. Specifically, given an input list of moves, it calculates the positions of its own pieces and that of the opponent (a tricky task for a NN given that Othello pieces can swap sides based on moves made on the other side of the board). Doing this allowed it to minimize loss. By analogy, it formed a theory about what makes moves legal in Othello (in this case, the positions of each player's pieces), and found out how to calculate those in order to better predict the next move.
Proving any given AI architecture can't do something doesn't prove all AI architectures forever will never be able to do something. Neural networks aren't all AI, and they're not even "neural networks", since the term wraps up a huge amount of architectural and design choices and algorithms.
Unless you believe in the soul, the human brain is just a very complicated learning architecture with a specific structure (which we know doesn't operate like existing systems... sort of; of course, we also don't know that it isn't just a convoluted biological path to emulating them for specific subsystems).
But even your original argument is focused on just playing with words to remove meaning: calling something data doesn't meaningfully make your point, because mathematical symbols are just "data" as well.
Mathematics has no requirement to follow any laws you think it does - 1 + 1 can mean whatever we want, and it's a topic of discussion as to why mathematics describes the physical world at all - which is to say, it's valid to say we designed mathematics to follow observed physics.
The whole point is that Newton came up with the law before there was observational data that could prove it, which is fundamentally different from regression. The data is used to reject the theory, not to form it, here.
I get the feeling that the OP is using "loss function" in the figurative sense, and not in the sense of an actual loss function that is fit to observations. We know nobody did that in Newton's time. In Newton's time they didn't even have the least squares method, let alone fit a model to observations by optimising a loss function.
Yes, I'm also using it in the figurative sense. It's not a regression model: the models are developed, and then the data is sought out to disprove them. It's the reverse for a regression technique. The model being generated before the data that can support it is a big part of how humans come up with these models, and it's fundamentally different in many ways.
What are you talking about? If scientific models aren't developed based on data, then what are they developed based on? Divine inspiration?
No. Very obviously no. The multi-post diversion about Kepler's laws is explicitly evidence to the contrary since Kepler's laws are a curve fitting exercise which matches astronomical data in a specific context but doesn't properly describe the underlying process - i.e. their predictive power vanishes once the context changes. But they do simplify down to Newton's Law once the context is understood.
New data is sought out for models to determine whether they are correct, because a correct model has to explain existing data and predict future data. The Bohr model of the atom was developed because it explained the emission spectra of hydrogen well. It's not correct, because it doesn't work for anything but hydrogen... but it's actually correct enough that if you're doing nuclear magnetic resonance (which is very hydrogen-centric for organic molecules) then it is in fact good enough to predict and understand spectra with (at least in 1D; 3D protein structure prediction is its own crazy thing).
This is the entire point of deep learning techniques. The whole idea of latent space representations is that they learn underlying structural content of the data which should include observations about reality.
That's not how the scientific process works. You use your intuition to make a theory, sometimes loosely based on data, and then you come up with an experiment to test it.
We both agree that Kepler was trying to fit curves. But that's not what Newton was trying to do. Newton was trying to explain. Newton's model did not fit the data better than Kepler's model until far after they both died.
Newton's model, to Newton, had more loss than Kepler's model.
But it turned out 70 years later that Newton's model was better, because it's only then that there was any data for which it was a better prediction.
You're similarly wrong about Bohr. If all you were interested in was finding the emission spectrum of hydrogen, there's absolutely no reason you'd try to come up with the Bohr model. Why? Because Rydberg had already made a formula that predicted the emission spectrum of hydrogen, 25 years earlier.
The entire point of Bohr's model and of Newton's model is that they weren't empirically better at predicting the phenomena. Indeed, simple curve fitting came up with equations that are far better in practice, earlier.
But they were better at explaining the phenomena.
And that only became relevant because, after we had these models, we came up with new experiments, informed by these models, which helped us understand them and eventually push them past the breaking point.
It's not a curve fitting experiment. We already had better curve fitting models far before either of those was invented. If your goal was to reduce the loss, they'd be useless and there would be no point coming up with them.
That's the difference between the scientific method and mere regression.
(Not the OP) We don't know how the human mind works, or how "intuitions" or "inspiration" come about, but that's no reason to call them "metaphysics". Clearly, they are physical processes that somehow take place in the human brain.
The questions you ask in this comment are good questions, for which we have no good answers. That doesn't mean there's anything supernatural going on, or that anyone is assuming something supernatural is happening. We just don't know how human scientists come up with new hypotheses, that's all there is to it.
But it's not like there's some kind of principled way to do it. There are no formulae, no laws, where we can plug in some data and out pops a hypothesis ready for testing. Maybe we will find how to define such laws or formulae at some point, but for now, all we've got is some scientist waking up one day going "holy cow, that's it!". And then spending the next ten years trying to show that's what it really is.
To clarify, the OP is pointing out that it wasn't Newton's law of universal gravitation that defeated the epicyclical model of the cosmos.
It was Kepler's laws of planetary motion that did for epicycles; and that happened 70-ish years before Newton stated his laws of motion and pointed out that they basically subsume Kepler's laws of planetary motion.
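Tangentially, the "subsume" claim is checkable numerically: integrating Newton's inverse-square law reproduces Kepler's third law (T² ∝ a³). A minimal sketch in toy units (GM = 1; `orbital_period` is a made-up helper, not from any library):

```python
import math

def orbital_period(r, GM=1.0, dt=1e-4):
    """Integrate Newton's inverse-square law for a circular orbit of
    radius r and measure the period numerically."""
    # Start on a circular orbit: position (r, 0), tangential speed sqrt(GM/r).
    x, y = r, 0.0
    vx, vy = 0.0, math.sqrt(GM / r)
    t, swept = 0.0, 0.0
    prev = math.atan2(y, x)
    while swept < 2 * math.pi:                 # integrate one full revolution
        d3 = (x * x + y * y) ** 1.5
        ax, ay = -GM * x / d3, -GM * y / d3    # inverse-square acceleration
        vx += ax * dt
        vy += ay * dt
        x += vx * dt                           # semi-implicit Euler step
        y += vy * dt
        t += dt
        cur = math.atan2(y, x)
        d = cur - prev
        if d < -math.pi:                       # unwrap the angle at the +/-pi seam
            d += 2 * math.pi
        swept += d
        prev = cur
    return t

# Kepler's third law: T^2 / r^3 should be the same constant (4*pi^2/GM)
# for any orbit, even though the integrator never "knows" Kepler's laws.
k1 = orbital_period(1.0) ** 2 / 1.0 ** 3
k2 = orbital_period(2.0) ** 2 / 2.0 ** 3
```

With a small enough step, both ratios come out near 4π², i.e. Kepler's third law falls out of Newton's law rather than being assumed by it.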