This paper is from a small group at an academic institution. They are trying to innovate in the idea space and are probably quite compute constrained. But even leaving compute aside, smaller problems make for easier analysis when you're proving out ideas. Not all research can jump straight to SOTA applications. It looks quite interesting, and I wouldn't be surprised to see it applied to larger problems soon.
Baseline time to grok something looks to be around 1000x normal training time, so call that $20k per attempt. It probably takes a while, too. Their headline number (50x faster than baseline, so $400) looks pretty doable if you can make grokking happen reliably at that speed.
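Just to spell out the arithmetic behind those figures (the $20-per-normal-run number is my inference from the $20k and 1000x figures above, not something stated in the paper):

```python
# Back-of-envelope check of the cost figures above.
# Assumption: a normal (non-grokking) training run costs ~$20,
# inferred from the $20k-per-attempt and 1000x figures.
baseline_grok_cost = 20_000            # $ per grokking attempt
normal_run_cost = baseline_grok_cost / 1000  # ≈ $20 per normal run
speedup = 50                           # claimed headline speedup

print(normal_run_cost)                 # → 20.0
print(baseline_grok_cost / speedup)    # → 400.0
```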
I’ve been in a small group at an academic institution. With our meager resources we trained larger models than this on many different vision problems. I personally train larger LLMs than this on OpenWebText using a few 4090s (not work related). Is that too much for a small group?
MNIST is solvable using two pixels. It shouldn't be one of only two benchmarks in a paper, again just in my opinion; it's useful for debugging only.
I thought so at first, but the repo's[0] owner, who is also the first author listed in the article, has Seoul National University on their GitHub profile.
That's far from a small academic institution.
Oh hm, so they are. I thought the images were binary because, IIRC, they were created with a digital pen, and logistic regression is always the baseline; but checking, they technically are grayscale, and people don't always binarize them. So I guess information-theoretically, if the pixels are 0-255 valued, then 2 of them could potentially let you classify pretty well, if the data were sufficiently pathological.
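For what it's worth, the capacity argument checks out on paper: two 8-bit grayscale pixels carry up to 16 bits, comfortably more than the ~3.32 bits needed to name one of MNIST's 10 classes. Whether two *fixed* pixel locations actually separate the classes on real MNIST is an empirical question this doesn't settle.

```python
import math

# Two 8-bit pixels can take 256^2 = 65536 joint values → 16 bits.
bits_in_two_pixels = math.log2(256 ** 2)

# Distinguishing 10 digit classes needs log2(10) ≈ 3.32 bits.
bits_needed = math.log2(10)

print(bits_in_two_pixels)        # → 16.0
print(round(bits_needed, 2))     # → 3.32
```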