> that's one preprint on arxiv, that makes a wild claim about a new concept that they acronymise as "RASP". It's not any kind of established terminology, nor is it anything but a claim.
I think you would enjoy learning about RASP, rather than taking such a hardline skeptical position.
> a function is a mapping between the elements of two sets, whereas an algorithm is a sequence of operations that calculates the result of a function and is guaranteed to terminate
I'm aware. Transformers (and RASP programs) are guaranteed to terminate; that's one of their nice properties.
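For anyone curious what a RASP program actually looks like, here's a toy Python sketch of its two core primitives, select and aggregate (my own simplified rendering, not the paper's exact syntax):

```python
# Toy sketch of RASP-style select/aggregate primitives
# (my own simplified rendering in Python, not the paper's exact syntax).

def select(keys, queries, predicate):
    # Build a boolean selection matrix: one row per query position,
    # one column per key position.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selection, values):
    # At each position, average the values picked out by the selection row
    # (0 if the row selects nothing).
    out = []
    for row in selection:
        chosen = [v for v, s in zip(values, row) if s]
        out.append(sum(chosen) / len(chosen) if chosen else 0)
    return out

# Example program: reverse a sequence by selecting position n-1-q for each q.
tokens = [10, 20, 30, 40]
n = len(tokens)
positions = list(range(n))
sel = select(positions, positions, lambda k, q: k == n - 1 - q)
print(aggregate(sel, tokens))  # [40.0, 30.0, 20.0, 10.0]
```

The termination guarantee falls out of the structure: a program is a fixed composition of such steps, each a bounded pass over the input, with no unbounded loops or recursion anywhere.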
> Is there a good reason not to do that?
Balanced against the value of my unpaid time, a probability of 10^-70 is low enough for the purposes of a quick and fun test.
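For anyone wondering where a number like 10^-70 comes from, here's a back-of-the-envelope sketch (illustrative numbers of my own, not necessarily the exact test in question):

```python
from math import log10

# If a test's output were, say, a string of 70 digits, and a model with no
# grasp of the task had to guess each digit blindly with probability 1/10,
# the chance of passing by pure fluke would be 10^-70.
# (Hypothetical numbers for illustration, not the actual test.)
digits = 70
p_fluke = (1 / 10) ** digits
print(log10(p_fluke))  # about -70
```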
Speaking of which, I'm going to enjoy my weekend now. I hope you enjoy yours!
EDIT: I realise I was mistaken about the OP. He is not an undergraduate student, as I initially thought. His Substack profile says he is a professional engineer and consultant. So his complete cluelessness about computer science fundamentals is not the result of inexperience, and his article is nothing more than an attempt to jump on the current bandwagon of LLM hype rather than an attempt to make sense of things. I thought I was helping a CS grad learn something! What an idiot I am! Fuck. Goddammit.
[Earlier text of my comment left in the interest of something or other]
It is, but it's published in the proceedings of ICML, which means it's been peer-reviewed.
The OP has checked out (I guess all this computer science-y stuff is boring on a weekend), but even a peer-reviewed article is not enough to make us let go of good, old-fashioned computer science. The article basically invents its own language and then maps transformers onto it, in order to claim that transformers can learn various kinds of programs. It's not convincing.
In any case, learning to sort lists with neural nets is not something new, or unique to transformers, and there's a pretty clear understanding of how it works. I explain why it doesn't constitute learning an algorithm in my comment above, and the RASP paper doesn't change that. I mean, Recurrent Neural Nets have a known equivalence to FSMs, and even they cannot learn algorithms, only approximate them.

The OP wrote his article in an obvious effort to understand why GPT is "not an n-gram" even though it behaves like an n-gram model (well, it's a language model; it doesn't matter what it's trained on), so I'm guessing he can appreciate the need for clarity in explaining empirical results, and he'll probably want to think further about what, exactly, his experiment has shown. I hope my little comment above helps him do that.
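To make the function-versus-algorithm distinction concrete, here's a deliberately silly Python sketch (my own toy, not a claim about what any particular net does): a "sorter" that computes the sorting function perfectly on every input it has memorised, while implementing no sorting algorithm at all.

```python
# Memorise input -> sorted output over a finite "training set".
train = [(3, 1, 2), (2, 1, 3), (1, 3, 2)]
lookup = {inp: tuple(sorted(inp)) for inp in train}

def memorised_sort(xs):
    # Computes the sorting *function* on the memorised domain, but there is
    # no sequence of compare/swap operations here -- no *algorithm*.
    return lookup[tuple(xs)]

print(memorised_sort([3, 1, 2]))   # (1, 2, 3): looks like it "learned to sort"
# memorised_sort([5, 4, 6])        # KeyError: nothing generalises off-domain
```

A trained net interpolates between cases more smoothly than a lookup table, but the distinction stands: approximating a function's graph is not the same thing as executing a terminating procedure that computes it.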
Would you change your mind for a different link, like this one? http://proceedings.mlr.press/v139/weiss21a.html