
> BERT was nothing short of a revolution in the field when it happened. You could cleanly draw a line pre-BERT and post-BERT. After it came out, something absurd like 95% of papers used it. It was so good, nobody could ignore it.

> Naively, you’d think such a revolutionary paper would be met with open arms. But when it was given the best paper award (at NAACL 2019), the postdocs I talked to universally grumbled about it. Why? It wasn’t interesting, they bemoaned. “It just scaled some stuff up.”

I still hear people making this complaint, despite the extraordinary success of scaling over the last few years.

It's pretty clear that scaling is the winning method (although exactly how to make best use of scale is an open question), but many researchers find it repulsive.



Ha, I remember when the scaling stuff started taking over in computer vision and how it annoyed us grad students. Suddenly companies (obviously, gross places where research and intellectual curiosity go to die) were able to produce results that were much better than universities, and they did it in a way that was inaccessible to us — not to mention in such a boring way. We didn’t have nearly the compute resources or any way of getting them. Now it’s slightly better, I think.


> It's pretty clear that scaling is the winning method (although exactly how to make best use of scale is an open question), but many researchers find it repulsive.

"Winning" in the sense of "perhaps suitable to build practical applications that make a lot of money" - perhaps (even though I'd claim that whether this is true is still an open question).

On the other hand, "winning" in the sense of "getting a deeper understanding of why the method works, and having a model that can be analyzed in depth" - there I would clearly say that scaling is not the winning method.

Scientific research is about truth-seeking, so many researchers are in particular interested in the second interpretation.


Limiting yourself to methods that are easy to understand is like looking for your keys under the streetlight. Small-scale methods may be easy to analyze, but they lack the richness and complexity that makes intelligence interesting.


> Small-scale methods may be easy to analyze, but they lack the richness and complexity that makes intelligence interesting.

I don't disagree.

But what a scientist would do, after finding strong evidence that massive scaling helps, is attempt to understand what part of the much larger complexity leads to this qualitative change.



