>where we all know that mathematics and/or CS deserve the honor
Or semiconductor manufacturers.
All the math and CS needed for AI can fit on a napkin, and has been known for 200+ years. It's the extreme scaling enabled by semiconductor science that really makes the difference.
That's absurd. The computer science needed for AI has not been known for 200 years. For example, transformers were only invented in 2017, and diffusion models in 2015.
(When the required math was invented is a different question, but I doubt all of it was known 200 years ago.)
TBF, backpropagation was only introduced in the 1970s, although in hindsight it's a fairly straightforward application of the chain rule.
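To see the "just the chain rule" point concretely, here's a minimal sketch of backprop for a toy two-weight network in pure Python. The network, loss, and all names are illustrative, not from any particular paper:

```python
import math

def forward_backward(x, t, w1, w2):
    # Tiny "network": y = w2 * tanh(w1 * x), squared-error loss L = (y - t)^2.
    # Forward pass
    h = w1 * x          # pre-activation
    a = math.tanh(h)    # hidden activation
    y = w2 * a          # output
    L = (y - t) ** 2    # loss

    # Backward pass: nothing but the chain rule, applied layer by layer
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * a            # dL/dw2 = dL/dy * dy/dw2
    dL_da = dL_dy * w2            # propagate gradient to hidden activation
    dL_dh = dL_da * (1 - a ** 2)  # tanh'(h) = 1 - tanh(h)^2
    dL_dw1 = dL_dh * x            # dL/dw1 = dL/dh * dh/dw1
    return L, dL_dw1, dL_dw2
```

Each backward line is one factor of the chain rule; stacking more layers just adds more such lines.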
There were also plenty of "hacks" involved to make the networks scale, such as dropout regularization, batch normalization, piecewise-linear activation functions (e.g. ReLU), and adaptive stochastic gradient descent methods.
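Two of those "hacks" fit in a few lines each. Here's an illustrative NumPy sketch of ReLU and inverted dropout (the common variant that rescales at train time so inference needs no change); the function names and signatures are mine, not from any library:

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x) elementwise; cheap and avoids saturating gradients.
    return np.maximum(0.0, x)

def dropout(x, p, rng, training=True):
    # Inverted dropout: zero each unit with probability p during training,
    # and scale survivors by 1/(1-p) so the expected activation is unchanged.
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```

The point the parent makes holds: the math here is trivial, but finding out that these tricks make deep networks train well took years of messy practice.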
The maths for basic NNs is really simple but the practice of them is really messy.
Residual connections are also worth mentioning as an extremely ubiquitous adaptation: you will be hard-pressed to find a modern architecture that doesn't use them at least to some extent, to the point where the original ResNet paper sits at over 200k citations according to Google Scholar [1].
> All the math and CS needed for AI can fit on a napkin, and had been known for 200+ years.
This isn't really true. If you read a physics textbook from the early 1900s, multivariate calculus and linear algebra weren't expressed as concisely as they are now; it would take several napkins. Plus, statistical mechanics was quite rudimentary, and it matters for probability theory.