Hacker News

This paper is from a small group at an academic institution. They are trying to innovate in the idea space and are probably quite compute constrained. But for proving ideas smaller problems can make easier analysis even leaving aside compute resources. Not all research can jump straight to SOTA applications. It looks quite interesting, and I wouldn't be surprised to see it applied soon to larger problems.



> They are trying to innovate in the idea space and are probably quite compute constrained.

Training a GPT-2 sized model costs ~$20 in compute nowadays: https://github.com/karpathy/llm.c/discussions/481


Baseline time to grok something looks to be around 1000x normal training time, so make that $20k per attempt. It probably takes a while too. Their headline number (50x faster than baseline, so roughly $400) looks pretty doable if you can make grokking happen reliably at that speedup.
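The back-of-envelope above can be written out explicitly. Note these are the thread's figures, not verified numbers: ~$20 per GPT-2-sized run (from the llm.c link), ~1000x baseline training time to grok, and the paper's claimed ~50x speedup.

```python
# Back-of-envelope cost estimate using the thread's assumed numbers.
base_run_usd = 20        # ~cost of one GPT-2-sized training run (llm.c figure)
grok_multiplier = 1000   # grokking takes ~1000x normal training time
claimed_speedup = 50     # the paper's headline acceleration

grok_cost = base_run_usd * grok_multiplier   # cost per grokking attempt at baseline
grokfast_cost = grok_cost / claimed_speedup  # cost per attempt with the claimed speedup

print(f"baseline grokking attempt: ${grok_cost:,}")
print(f"with claimed speedup:      ${grokfast_cost:,.0f}")
```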


$20 per attempt, and a paper typically comes after trying hundreds of things. That said, the final version of the idea could certainly be tried at that scale.


I’ve been in a small group at an academic institution. With our meager resources we trained larger models than this on many different vision problems. I also personally train larger LLMs than this on OpenWebText using a few 4090s (not work related). Is that too much for a small group?

MNIST is solvable using two pixels. It shouldn’t be one of only two benchmarks in a paper; again, just my opinion. It’s useful for debugging only.


Again, a small group at an academic institution may not have the experience or know-how for these things.


I thought so at first, but the repo's[0] owner, the first author listed on the paper, has Seoul National University on their GitHub profile. That's far from a small academic institution.

[0]: https://github.com/ironjr/grokfast


It's a free world. Nothing stops you from applying their findings to bigger datasets. It would be a valuable contribution.


How can MNIST be solved using just two binary pixels when there are 10 classes, 0-9?


I'm also curious, but my understanding was that MNIST pixels are not binary due to some postprocessing artifacts.


Oh hm, so they are. I thought they were binary because they were created with a digital pen, IIRC, and logistic regression is always the baseline; but checking, they are technically grayscale and people don't always binarize them. So information-theoretically, if the pixels are 0-255 valued, two of them carry up to 16 bits, which could potentially let you classify pretty well if the pixel choice is sufficiently pathological.
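The "two pixels" claim is easy to poke at empirically. A minimal sketch, using scikit-learn's 8x8 digits dataset as a stand-in for MNIST (an assumption; the thread is about 28x28 MNIST, which would need a separate download) and mutual information to pick the two most individually informative pixels:

```python
# Sketch: how far do two well-chosen pixels get you on a digit task?
# Assumption: sklearn's 8x8 digits set stands in for MNIST here.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Score each pixel by mutual information with the label, keep the top two.
mi = mutual_info_classif(X_tr, y_tr, random_state=0)
top2 = np.argsort(mi)[-2:]

# Logistic regression on just those two (grayscale, not binarized) pixels.
clf = LogisticRegression(max_iter=1000).fit(X_tr[:, top2], y_tr)
acc = clf.score(X_te[:, top2], y_te)
print(f"two-pixel accuracy: {acc:.2f} (chance is 0.10)")
```

Two grayscale pixels comfortably beat the 10% chance level, though "solvable" in the sense of high accuracy would presumably need a stronger claim than this sketch supports.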


> MNIST is solvable using two pixels.

Really? Do you have any details?

Agreed, it has no business being in a modern paper.



