Hacker News

Grokking is certainly an interesting phenomenon, but have practical applications of it been discovered yet?

I remember seeing grokking demonstrated on MNIST (are there any other non-synthetic datasets for which it has been shown?), but the authors of that paper had to shrink the training set and ended up with test accuracy far below state of the art.

I'm very interested in this research, just curious about how practically relevant it is (yet).



My gut instinct from reading about the phenomenon says that a “grokked” model of X parameters on Y tokens is not going to outperform an “ungrokked” model with 2X parameters on 2Y tokens - since “grokking” uses the same resources as parameter and token scaling, it’s simply not a competitive scaling mechanism at the moment. It might make sense in some applications where some other hard limit (e.g. memory capacity at inference time) occurs before your resource limit AND you would still see good returns on improvements in quality, but I suspect those are still fairly narrow and/or rare applications.


According to https://arxiv.org/abs/2405.15071, their grokked model outperformed GPT-4 and Gemini 1.5 on the reasoning task. We can then argue about whether the task makes sense and whether the conclusion holds for other use cases, but I think grokking can be useful.


Wouldn't it be super useful in cases where data is limited?


Nobody is really looking for practical applications for it, and you shouldn't necessarily expect them from this kind of academic research.


That doesn't sound right at all.

Improving generalization in deep learning is a big deal. The phenomenon is academically interesting either way, but e.g. making SOTA nets more data-efficient to train seems like a practical result that might be entirely within reach.


I think y'all are both right. Grokking is a phenomenon that by definition applies to severely overfit neural networks, which is a very different regime from modern ML, but we might learn something from it that we can use to improve regularization.
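For concreteness, the "severely overfit" regime being discussed is usually studied on tiny algorithmic datasets, as in the original grokking paper. Here is a minimal sketch of that setup; the specific values (p=97, train_frac=0.3) are illustrative choices, not taken from any comment above:

```python
# Sketch of the classic grokking setup: learn modular addition,
# label = (a + b) % p, training on only a small fraction of all
# (a, b) pairs. Grokking shows up when a small network with weight
# decay is trained far past the point of memorizing the training set:
# train accuracy hits 100% early, test accuracy jumps much later.
import random

def modular_addition_dataset(p=97, train_frac=0.3, seed=0):
    # Enumerate every pair once, shuffle, and hold most of them out.
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    return pairs[:n_train], pairs[n_train:]

train, test = modular_addition_dataset()
# With p=97 there are 97*97 = 9409 examples total; at train_frac=0.3
# the network sees only 2822 of them, which it can memorize quickly.
```

The small training fraction is the key ingredient: with enough data the network generalizes immediately and there is no delayed "grok" to observe.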


Looks like grokking could give LLMs better reasoning and generalization, but I'm not sure how practical it would be to overfit a larger LLM.

See: https://arxiv.org/abs/2405.15071


grokking doesn't and will not have practical uses, imo - it is just an experiment that revealed cool things that we mostly already suspected about implicit regularization

however, techniques we learn from grokking about implicit regularization might be helpful for the training regimes we actually use


> grokking doesn't and will not have practical uses, imo

I'm not so sure. Reasoning is the next big hurdle, and grokking and parametric memory seem very effective here.

[1] https://arxiv.org/abs/2405.15071



