Hacker News

Grokking is certainly an interesting phenomenon, but have practical applications of it been discovered yet?

I remember seeing grokking demonstrated on MNIST (are there any other non-synthetic datasets for which it has been shown?), but the authors of that paper had to shrink the training set and ended up with test accuracy far below state of the art.

I'm very interested in this research, just curious about how practically relevant it is (yet).



My gut instinct from reading about the phenomenon says that a “grokked” model of X parameters on Y tokens is not going to outperform an “ungrokked” model with 2X parameters on 2Y tokens - since “grokking” uses the same resources as parameter and token scaling, it’s simply not a competitive scaling mechanism at the moment. It might make sense in some applications where some other hard limit (e.g. memory capacity at inference time) occurs before your resource limit AND you would still see good returns on improvements in quality, but I suspect those are still fairly narrow and/or rare applications.


According to https://arxiv.org/abs/2405.15071, their grokked model outperformed GPT-4 and Gemini 1.5 on the reasoning task. We can then argue about whether the task makes sense and whether the conclusion holds for other use cases, but I think grokking can be useful.


Wouldn't it be super useful in cases where data is limited?


Nobody is really looking for practical applications for it, and you shouldn't necessarily expect them from this kind of academic research.


That doesn't sound right at all.

Improving generalization in deep learning is a big deal. The phenomenon is academically interesting either way, but e.g. making SOTA nets more data-efficient to train seems like a practical result that might be entirely within reach.


I think y'all are both right. Grokking is a phenomenon that by definition applies to severely overfit neural networks, which is a very different regime from modern ML, but we might learn something from it that we can use to improve regularization.
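For concreteness, the "severely overfit" regime being discussed is usually studied on tiny algorithmic datasets, as in the original grokking paper. Here is a minimal sketch of that setup; the specific values (p=97, train_frac=0.3) are illustrative choices, not taken from any comment above:

```python
# Sketch of the classic grokking setup: learn modular addition,
# label = (a + b) % p, training on only a small fraction of all
# (a, b) pairs. Grokking shows up when a small network with weight
# decay is trained far past the point of memorizing the training set:
# train accuracy hits 100% early, test accuracy jumps much later.
import random

def modular_addition_dataset(p=97, train_frac=0.3, seed=0):
    # Enumerate every pair once, shuffle, and hold most of them out.
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    return pairs[:n_train], pairs[n_train:]

train, test = modular_addition_dataset()
# With p=97 there are 97*97 = 9409 examples total; at train_frac=0.3
# the network sees only 2822 of them, which it can memorize quickly.
```

The small training fraction is the key ingredient: with enough data the network generalizes immediately and there is no delayed "grok" to observe.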


Looks like grokking could give LLMs better reasoning and generalization, but I'm not sure how practical it would be to overfit a larger LLM.

See: https://arxiv.org/abs/2405.15071


grokking doesn't and will not have practical uses, imo - it is just an experiment that revealed cool things that we mostly already suspected about implicit regularization

however, techniques we learn from grokking about implicit regularization might be helpful for the training regimes we actually use


> grokking doesn't and will not have practical uses, imo

I'm not so sure. Reasoning is the next big hurdle, and grokking and parametric memory seem very effective here.

[1] https://arxiv.org/abs/2405.15071



