The fact that HN weebs gobble this horse shit up as if it were pâté de foie gras is also depressing.
TLDR: physics nerds do an unimpressive thing with neural nets. If you want to look at something impressive and still mysterious involving physics and neural doohickeys, ask why echo state networks (reservoir computers that are effectively projections onto a random hyperplane) reproduce chaotic time series, most famously Mackey-Glass, so well.
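For anyone who wants to poke at this themselves, here's a minimal ESN sketch in numpy. Everything in it is illustrative and untuned: the Mackey-Glass series comes from crude Euler integration, and the reservoir size, spectral radius, and ridge parameter are arbitrary, so treat it as a toy, not a benchmark.

    # Minimal echo state network sketch (numpy only); all hyperparameters untuned.
    import numpy as np

    rng = np.random.default_rng(0)

    def mackey_glass(n, tau=17, dt=1.0, beta=0.2, gamma=0.1, p=10):
        # Crude Euler integration of the Mackey-Glass delay equation
        hist = [1.2] * (tau + 1)
        out = []
        for _ in range(n):
            x, x_tau = hist[-1], hist[0]
            hist = hist[1:] + [x + dt * (beta * x_tau / (1 + x_tau ** p) - gamma * x)]
            out.append(hist[-1])
        return np.array(out)

    u = mackey_glass(3000)

    N = 400                                    # reservoir size
    W_in = rng.uniform(-0.5, 0.5, N)           # input weights: random, never trained
    W = rng.uniform(-0.5, 0.5, (N, N))         # recurrent weights: random, never trained
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

    # Drive the reservoir with the true series and collect states
    X = np.zeros((len(u), N))
    for t in range(len(u) - 1):
        X[t + 1] = np.tanh(W_in * u[t] + W @ X[t])

    # Only the linear readout is trained (ridge regression)
    washout, T = 100, 2000
    A, y = X[washout:T], u[washout:T]
    W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(N), A.T @ y)

    # Free run: feed the network's own predictions back in
    x, preds = X[T], []
    for t in range(T, len(u)):
        y_hat = x @ W_out
        preds.append(y_hat)
        x = np.tanh(W_in * y_hat + W @ x)

    print("free-run RMSE:", np.sqrt(np.mean((np.array(preds) - u[T:]) ** 2)))

The striking part is that only the readout is ever trained; the random reservoir is fixed, and the free-run output still tracks the chaotic series for a surprisingly long stretch before diverging.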
Let's see: 3 input parameters, reliance on a ton of precomputed data, no ability to extrapolate beyond the precomputed numbers... they've invented the world's shittiest lookup table.
If your integrator isn't built to respect the symplectic structure of phase space, it is going to give qualitatively wrong answers as t -> infinity. A non-symplectic integrator might do a better job if you only want to track the Earth around the Sun for 1000 years, but who knows where it puts the Earth in a billion years.
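Toy demo of the point (not the article's setup, just the standard harmonic-oscillator example): explicit Euler vs. semi-implicit "symplectic" Euler. Both are first-order, but only one keeps the energy bounded.

    # Unit harmonic oscillator, H = (p^2 + q^2) / 2, conserved on the true orbit.
    def explicit_euler(q, p, dt):        # not symplectic
        return q + dt * p, p - dt * q

    def symplectic_euler(q, p, dt):      # symplectic (semi-implicit)
        p = p - dt * q
        return q + dt * p, p

    def energy(q, p):
        return 0.5 * (p * p + q * q)

    dt, steps = 0.01, 200_000
    qe, pe = 1.0, 0.0
    qs, ps = 1.0, 0.0
    for _ in range(steps):
        qe, pe = explicit_euler(qe, pe, dt)
        qs, ps = symplectic_euler(qs, ps, dt)

    print("explicit Euler energy:  ", energy(qe, pe))   # has blown up
    print("symplectic Euler energy:", energy(qs, ps))   # still hovering near 0.5

The explicit scheme multiplies the energy by roughly (1 + dt^2) every step, so it spirals outward; the symplectic one conserves a slightly perturbed Hamiltonian, which is exactly the "qualitatively right as t -> infinity" property.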
"Deep networks" are the new hypnotic induction phase to induce willing suspensions of disbelief. Maybe there is some "emperor's new clothes" going on. Usually if somebody makes a bogus claim people will think "the person making the claim is stupid", but if a person makes a bogus claim today involving a deep network, the reader tends to assume that they are too stupid to understand it.
Neural doohickeys find real uses in computational chemistry. Accurate ab initio solvers (many-body Schrödinger equation) are very slow, and everything else is heuristic or quasi-approximation mixed with semi-empirical tricks.
When someone makes even a marginally better heuristic, they're more likely to sell it or start a company than to post a paper on arXiv. There is serious money to be made.
Looking through the comments here, that doesn't seem to be the case.
I don't think the comments will reflect "gobbling up".
For the small minority who are going to say "I'm really smart and I've taken some physics classes" ... please consider how far "I'm really smart and I've taken some programming classes" would get you on HN.
So the tech companies hired a bunch of us
FTFY. You would be entirely correct.
In most other cases of machine learning there is no "objective" solution and hence no "target function" to approximate.
But it's still an approximation, with e.g. backpropagation 'simply' (in the abstract mathematical sense) tweaking weights against the derivatives of the loss to get closer to the expected values.
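In toy form (one weight, squared error, hand-computed derivative; the data is made up so the "target function" is y = 2x):

    def loss_grad(w, xs, ys):
        # d/dw of mean((w*x - y)^2) = mean(2*x*(w*x - y))
        return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    w, lr = 0.0, 0.05
    for _ in range(200):
        w -= lr * loss_grad(w, xs, ys)    # step against the derivative
    print(w)                              # converges to ~2.0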
The vast majority of machine learning just builds on that by going deep (more layers), automatically generating inputs (e.g. in game AIs playing against themselves), etc.
One might argue that's even worse than function optimisation, as you can only vaguely guess at the target; thus all your validation is suspect, and you have to prove it using humans by, for instance, beating them at Starcraft.
A crucial step of any AI/ML project is to define this objective solution/target function. For example, a task like "classify photos into cats and dogs" cannot be solved by an ML system: it's too ambiguous and ill-defined. We can define a specific, unambiguous task, which we feel is somehow similar to "classify photos into cats and dogs", but it wouldn't actually be the same task.
For example, "minimise the average L2 loss across these million example inputs" is a specific task, which we can hence use ML approaches to solve. This task has an objective solution: return 100% cat for all the inputs labelled cat, and 100% dog for those labelled dog. Interestingly, a perfect approximation of this target function would probably be considered a poor solution to the original, fuzzy problem (i.e. it will over-fit); although again that would be an ambiguous, ill-defined statement to make.
There are many ML problems which aren't of the 'fit these examples' type, but they still have some explicit or implicit target function; e.g. genetic algorithms have an explicit fitness function to maximise/minimise.
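E.g. a bare-bones GA with an explicit (and arbitrary) fitness function, here just counting 1-bits; population size and mutation rate are pulled out of thin air:

    import random

    def fitness(bits):                          # explicit target to maximise
        return sum(bits)

    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    for _ in range(100):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:10]                    # selection
        children = []
        for _ in range(20):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(20)
            child = a[:cut] + b[cut:]           # crossover
            if random.random() < 0.1:
                child[random.randrange(20)] ^= 1   # mutation
            children.append(child)
        pop = survivors + children
    print(max(fitness(ind) for ind in pop))     # climbs toward 20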
Even attempts to 'avoid' this "blindly optimise" approach (e.g. regularisation, optimistic exploration, intrinsic rewards, etc.) are usually presented as an augmented target function, e.g. "minimise the average L2 loss ... plus the regularisation term XYZ"
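Sketch of that augmentation (lam is an assumed hyperparameter, and plain weight decay stands in for the "XYZ" term):

    def regularised_loss(weights, avg_l2, lam=0.01):
        # data term from before, plus the regularisation add-on
        return avg_l2 + lam * sum(w * w for w in weights)

    print(regularised_loss([0.5, -1.2, 3.0], avg_l2=0.08))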
Isn't the key concept of chaotic problems essentially that they aren't predictable, so there is no real "pattern" to train on, so to speak?
You can look at it from the point of view of, if we watch the system evolve, can we tell whether the rules were violated at some point, by some arbitrarily small amount? As the chaotic systems evolve, it becomes harder and harder to tell if that is the case. There isn't a discrete transition from knowing to not knowing; our level of knowing goes down over time.
In information theory, we can see that as a loss of bits of precision on the system, requiring more and more bits of initial precision to make up for it. Since we can't compute with real numbers, but only with approximations given increasingly more bits over time, even in the pure mathematical case where everything is perfect and fully specified, we still lose this knowledge as the simulation progresses. It's that much worse in the real world, where we don't even start with all that many bits of precision.
It's not quite the question you asked, but... it's like the shadow of the question you asked, and it's a bit easier to explain. (And reasonably mathematically valid. You can characterize chaotic systems by how many bits they lose per time unit.)
It's related to reading a Lyapunov exponent as a measure of bit loss. The Lyapunov exponent is easy to Google, and if you understand it along with information theory, it's not a difficult leap to make, but I can't find a nice explanation for people who don't already have those things.
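Here's the leap in about ten lines for the logistic map x -> r*x*(1-x): averaging log2|f'(x)| along the orbit estimates the Lyapunov exponent in bits per iteration, and r = 4.0 (the fully chaotic case) comes out to ~1 bit of initial-condition precision burned per step.

    import math

    r, x, n = 4.0, 0.3, 100_000
    acc = 0.0
    for _ in range(n):
        acc += math.log2(abs(r * (1 - 2 * x)))   # log2 |d/dx of r*x*(1-x)|
        x = r * x * (1 - x)
    print(acc / n)   # ~= 1.0 bit lost per iteration

So if you start with 53 bits of mantissa, after ~53 iterations your simulation has, in this information-theoretic sense, used up everything you knew about the initial condition.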
Example 1: Conway's Game of Life.
Example 2: the Collatz conjecture.
If the number is even, divide it by two.
If the number is odd, triple it and add one.
The rules are deterministic, but you can't make any predictions other than by running the simulation.
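And "running the simulation" is about five lines; 27 is the classic small starting number that wanders surprisingly far:

    def collatz_steps(n):
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    print(collatz_steps(27))   # 111 steps before it finally hits 1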
If the chaos produces a new kind of behavior, the result of the ANN may be totally wrong. In other words: it works well, often.
Is my simplistic thinking right?