
> BERT was nothing short of a revolution in the field when it happened. You could cleanly draw a line pre-BERT and post-BERT. After it came out, something absurd like 95% of papers used it. It was so good, nobody could ignore it.

> Naively, you’d think such a revolutionary paper would be met with open arms. But when it was given the best paper award (at NAACL 2019), the postdocs I talked to universally grumbled about it. Why? It wasn’t interesting, they bemoaned. “It just scaled some stuff up.”

I still hear people making this complaint, despite the extraordinary success of scaling over the last few years.

It's pretty clear that scaling is the winning method (although exactly how to make best use of scale is an open question), but many researchers find it repulsive.



Ha, I remember when the scaling stuff started taking over in computer vision and how it annoyed us grad students. Suddenly companies (obviously, gross places where research and intellectual curiosity go to die) were able to produce results that were much better than universities, and they did it in a way that was inaccessible to us — not to mention in such a boring way. We didn’t have nearly the compute resources or any way of getting them. Now it’s slightly better, I think.


> It's pretty clear that scaling is the winning method (although exactly how to make best use of scale is an open question), but many researchers find it repulsive.

"Winning" in the sense of "perhaps suitable to build practical applications that make a lot of money" - perhaps (even though I'd claim that whether this is true is still an open question).

On the other hand, "winning" in the sense of "getting a deeper understanding of why the method works, and having a model that can be analyzed in depth" - there I would clearly say that scaling is not the winning method.

Scientific research is about truth-seeking, so many researchers are in particular interested in the second interpretation.


Limiting yourself to methods that are easy to understand is like looking for your keys under the streetlight. Small-scale methods may be easy to analyze, but they lack the richness and complexity that makes intelligence interesting.


> Small-scale methods may be easy to analyze, but they lack the richness and complexity that makes intelligence interesting.

I don't disagree.

But what a scientist would do, after finding strong evidence that massive scaling helps, is attempt to understand what part of the much larger complexity leads to this qualitative change.



