Everyone wants to use less compute to fit more in, but (obviously?) the solution will be to use more compute and fit less. Attention isn't (topologically) attentive enough. All these RNN-lite approaches are doomed: beyond saving costs, they're going to get cooked by some other arch, one even more expensive than transformers.
Would you mind expanding upon your thesis? If that compute and all those parameters aren't "fitting" the training examples, what is it that the model is learning, and how should that be analyzed?
I think there are two distinct areas. One is the building of the representations, which is achieved by fitting. The other is loosely defined as "computing", which is some kind of search for a path through representation space. All of that is wrapped in a translation layer that can turn those representations into stuff we humans can understand and interact with. Current transformer architectures achieve all of this to some extent, but I guess some believe they are not quite as effective at the "computation/search" stage.
But how does it get good at "computing"? The way I see it, we either program them to do so manually, or we use ML, in which case the model "fits" the computation based on training examples or environmental feedback, no? What am I missing?
Is this a popular view? I think mathematicians can be odd, but usually they communicate quite well. As far as popularization goes, I think mathematics is probably doing the best out of the lot: Numberphile, 3Blue1Brown, etc.
The examples you list are not known as mathematicians; they’re popularisers who (sometimes) happen to have qualifications and a history of studying the subject. 3B1B is absolutely brilliant but Grant Sanderson is not a ‘mathematician’ in the sense of someone who does research in mathematics.
Ironically, the fact that mathematics popularisation is so visible is itself a sign of how much it is needed, and therefore of how unpopular and misunderstood the subject is. Branches of science like, say, astrophysics don’t need popularisation; people already think they’re cool.
The view of ‘people who are good at mathematics’ being bad at English is a relatively common one, in my experience. At least at the level of university students. People think there’s some sort of conservation of ability or equilibrium in the universe that means that if you have a ‘maths brain’ then you’re no good at much else, and vice versa. If anything, I think there’s a positive correlation between mathematical and communication ability — after all, mathematics is basically just the science of clever notation and clear-headed thinking.
The term "scientific method" is itself a philosophical term (as is "method" in this context). Read, or even skim, this and notice how many of the important figures listed were philosophers:
In that case, let me go a step further: although I wouldn't respond the way some other folks have, I get why they would. Many of my most memorable and most intellectually stimulating classes were those that weren't related to my engineering degree. The philosophy classes, though, never even approached "intellectually stimulating" status. I wrote a good 80-100 pages of pseudointellectual drivel about half-baked analogies like the "answering machine paradox," and accrued thousands in debt in the process.
Another thing: the great thing about philosophy is that there are no wrong answers. But the bad thing about philosophy classes is that there are wrong answers. Open-endedness and free thinking don't scale to 150-seat lecture halls, indifferent TAs, and PhD-candidate "professors" doing the bare minimum to get a diploma.
If there is one space where it shines, surely it’s mathematics. But even there, the most notable mathematicians rely heavily on intuition long before they manage to prove anything, as well as while selecting or creating the conceptual tools they use to build the proof. They rarely go as far as formalizing their arguments in Coq or Isabelle, or even with meticulous paper craft à la Principia Mathematica by Russell and Whitehead.