Thinking Like Transformers (arxiv.org)
161 points by ArtWomb 44 days ago | 42 comments

This is nice; however, simple fully connected or convolutional networks can encode computer programs too. In fact, the hope for RNNs was that they would learn computer programs, but due to the quirks of neural network training that never happened to a satisfying degree in practice. So finding a programming model for Transformers is not necessarily as practical as it first seems, since there's no guarantee that they will learn these programs from data.

I think the point here is that all architectures can execute a certain type of program. An RNN maps to a finite state automaton, and this is an attempt to show the natural class of programs a transformer can implement.
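
To make one direction of that correspondence concrete, here is a minimal Python sketch of a recurrent network with hard-threshold (step) units simulating a finite state automaton, using one hidden unit per (state, symbol) pair in the spirit of the classic construction. The parity automaton, the function names and the 1.5 threshold are illustrative assumptions, not anything taken from the paper.

```python
from itertools import product


def step(v):
    """Hard-threshold (Heaviside) activation: 1 if v > 0 else 0."""
    return 1 if v > 0 else 0


def current_state_scores(h, units, states, delta):
    """Recurrent term W @ h: active unit (p, a) votes for state delta[(p, a)]."""
    return {q: sum(h[(p, a)] for (p, a) in units if delta[(p, a)] == q) for q in states}


def run_fsa_as_rnn(states, alphabet, delta, start, word):
    """Simulate a deterministic automaton with a step-activation RNN."""
    # Hidden unit (q, b) fires when the automaton is in state q and reads symbol b.
    units = list(product(states, alphabet))
    h = None
    for sym in word:
        x = {a: 1 if a == sym else 0 for a in alphabet}  # one-hot input
        if h is None:
            r = {q: 1 if q == start else 0 for q in states}  # start state as a bias
        else:
            r = current_state_scores(h, units, states, delta)
        # Threshold 1.5 makes each unit an AND of "in state q" and "reading b".
        h = {(q, b): step(r[q] + x[b] - 1.5) for (q, b) in units}
    if h is None:
        return start
    r = current_state_scores(h, units, states, delta)
    return max(states, key=lambda q: r[q])


# Hypothetical example: parity of the number of 1s in a bit string.
STATES = ["even", "odd"]
ALPHABET = ["0", "1"]
DELTA = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}

print(run_fsa_as_rnn(STATES, ALPHABET, DELTA, "even", "1101"))  # prints "odd"
```

The network has exactly |states| × |alphabet| hidden units no matter how long the input is, which is the sense in which a fixed-size recurrent state tracks only finitely many configurations.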

Actually, I think it is conventional neural networks which can only approximate finite state machines. RNNs are (in theory, not so much in practice) Turing complete.

> RNNs are Turing complete.

Turing completeness requires access to unlimited read/write memory. RNNs only have a fixed-dimensional state. I guess in theory that state is continuous, but it would have to be a pretty optimistic model that assumes we can store unbounded data like that.

Intuitively I think as long as your activation function is sufficiently expressive (e.g. not a step function or something), you should be good to go in theory, since that's what you're feeding back. Might take a while.

The word I was looking for is "robust". Any realistic model must be able to tolerate tiny perturbations (like Gaussian noise) of the vectors, since that's how floating-point arithmetic works. An RNN can't be robustly Turing complete.
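
A toy Python illustration of the robustness point, assuming a single real-valued state coordinate that stores an unbounded counter as x = 2^-n, in the spirit of the unbounded-precision encodings used in theoretical Turing-completeness arguments. The function names and the perturbation size `delta = 1e-12` are hypothetical choices for illustration. Telling n apart from n + 1 requires resolving a difference of 2^-(n+1), so any fixed perturbation wipes out counts beyond roughly log2(1/delta).

```python
import math


def encode(n):
    """Store a count n >= 0 in one real number as x = 2**-n."""
    return 0.5 ** n


def decode(x):
    """Recover n by rounding -log2(x) to the nearest integer."""
    return round(-math.log2(x))


def survives_noise(n, delta=1e-12):
    """True if the count n is still recoverable after adding a fixed perturbation delta."""
    return decode(encode(n) + delta) == n


for n in (5, 20, 30, 40, 50):
    # Columns: true count, noiseless decode, decode after a 1e-12 perturbation.
    print(n, decode(encode(n)), decode(encode(n) + 1e-12))
```

Even with no added noise, float64 eventually bottoms out once 2^-n drops below the smallest subnormal (around n = 1074), so the "unbounded" counter is bounded in practice either way.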

I love this paper. I always had trouble visualizing transformers from reading ML papers; with this I can just play with it. These are simple higher-order functions that can be implemented in any language, though, so porting them to Python and playing with them in a Jupyter notebook is trivial.
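
Taking up the porting suggestion, here is a minimal Python sketch of RASP's three core operations as plain higher-order functions over lists: `select` builds a boolean attention pattern from a predicate, `aggregate` combines the selected values, and `selector_width` counts selected positions. The semantics are simplified assumptions (one selected value is returned unchanged, several numeric values are averaged, an empty selection falls back to 0), and this is not the API of the tech-srl/RASP interpreter. `reverse` and `histogram` are modeled on the reverse and histogram examples discussed in the paper.

```python
from typing import Callable, List, Sequence

# A selector is a boolean attention pattern: sel[q][k] says whether
# query position q attends to key position k.
Selector = List[List[bool]]


def select(keys: Sequence, queries: Sequence, predicate: Callable) -> Selector:
    """Build an attention pattern by comparing every key against every query."""
    return [[bool(predicate(k, q)) for k in keys] for q in queries]


def aggregate(sel: Selector, values: Sequence, default=0) -> list:
    """For each query, combine the values at the selected key positions.

    Simplified semantics (an assumption): one selected value is returned as-is,
    several numeric values are averaged, and no selection yields `default`.
    """
    out = []
    for row in sel:
        chosen = [v for v, s in zip(values, row) if s]
        if not chosen:
            out.append(default)
        elif len(chosen) == 1:
            out.append(chosen[0])
        else:
            out.append(sum(chosen) / len(chosen))
    return out


def selector_width(sel: Selector) -> List[int]:
    """Count how many key positions each query position selects."""
    return [sum(row) for row in sel]


def reverse(tokens: Sequence) -> list:
    indices = list(range(len(tokens)))
    # length: every position selects every position, then counts them.
    length = selector_width(select(tokens, tokens, lambda k, q: True))
    opp_index = [n - i - 1 for n, i in zip(length, indices)]
    # Position i reads the token at position (length - i - 1).
    flip = select(indices, opp_index, lambda k, q: k == q)
    return aggregate(flip, tokens)


def histogram(tokens: Sequence) -> List[int]:
    # Each position counts the positions holding the same token.
    return selector_width(select(tokens, tokens, lambda k, q: k == q))


print("".join(reverse("hello")))  # prints "olleh"
print(histogram("hello"))         # prints [1, 1, 2, 2, 1]
```

A selector such as `select(indices, indices, lambda k, q: k <= q)` followed by `aggregate` gives a running prefix mean, which is roughly what a uniform causal attention head computes; that select/aggregate-to-head correspondence is what lets RASP programs be read as rough layer and head budgets.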

I'm not sure if it's from the authors of the paper, but this appears to be that: https://github.com/tech-srl/RASP

The paper's lead author is the sole committer on that repo.

To go a bit meta, I think this link contributes to Hacker News because it sparks curiosity. While I was among those wondering whether this would look into some kind of computational model that did what Transformers do (at least in the modern movies), namely scan an object in order to reconstruct it or reconfigure themselves into it, I quickly realized this is about something else. But it is something interesting and worth learning more about[0]. So just get over your prime instinct to be disappointed that the subject doesn't match your expectations, and change your attitude towards shared information that feels alien.

[0] https://en.wikipedia.org/wiki/Transformer_(machine_learning_...

They also did this in the original cartoon… actually, their spaceship scanned Earth & programmed them with Earth-like vehicle forms.

(DID NOT READ THE ENTIRE PAPER, only the abstract, the definition of the language and some of the experiments)

Not sure how useful this is in the larger context of transformers. Transformers (and deep networks in general) are often used when the logic needed to solve a problem is largely unknown. For example, how would you write a RASP program that identifies names in a document?

The paper does include some simple RASP examples of things a transformer model can accomplish (Symbolic Reasoning in Transformers), but, again, this is usually something the model can do as a result of the task it was originally trained for, not a task in and of itself.

The point isn't that you can write programs in RASP. It's that you can use RASP as a tool to reason about what tasks Transformers will be good and bad at and how their architecture influences that.

Oh, for a moment I thought the article was about Autobots …

"As an Autobot, should I buy life insurance or car insurance?"


Same here! I was expecting a speculative implementation, or a comparison of what their setup would look like and what kinds of algorithms there could be.

Less than meets the eye

Wage your-battle to-destroy the-evil forces-of...

It's actually about Decepticons?

I am... Megatron!

Inferior fleshlings lay before me... as barely breathing hunks of meat

With the blood of Unicron in my veins, I reign like a god

A god amongst insects

I have existed from the morning of the universe

And I shall exist until the last star falls from the night

My ultimate peace would be granted by the destruction of all life, stars and nebulae

Leaving only nothingness and void

Although I have taken the form of this machine

I am all men as I am no man

And therefore

I am a god

Even in death there is no command but mine

Your race is of no consequence


Kill them all

Autobots, transform and roll out!

(I share your disappointment fwiw)

I believe [1] is the work they're mainly referencing for their RNN equivalent/inspiration.

[1] https://nlp.stanford.edu/~johnhew/rnns-hierarchy.html

As someone working in EE on power transformers, I am deeply disappointed that this article is about neural networks....

Dang, please consider saying something in the guidelines about these kinds of comments?

I found I started enjoying social media a lot more when I changed my mindset from:

This topic isn't relevant to me, thus it shouldn't be here.

to:

This topic isn't relevant to me, thus I'll simply ignore it.

I would rather participate in communities that are semi-filtered and rely on me providing a second filter for my own taste. If instead the community tries to filter down entirely to my taste, I find it ends up overfitting and I lose almost all of the serendipitous "I didn't know I was interested in this but wow." articles that I love.

In other words, stuff I don't care about isn't a bug, it's a feature—a side effect of allowing a greater variety of content some of which is interesting but which can't be predicted.

This is a very important point.

The issue with social media is that it is essentially unsolicited. With TV, you tune to "The Discovery Channel", and if you don't like it, you tune to another.

With social media you are invited to react to things as if they were for you. This is the origin of, I'd say, 90% of the instigating nonsense that causes trouble.

Social media arguments are often just between not-the-audience and the-audience talking past each other, with the former basically saying, "I don't understand this, and it's wasting my time", and the latter saying, "I understand this and it's really important".

Sure, some HN comments and their ensuing discussions may be completely unrelated to the posted topic. I've learned a lot from these over the years, and I'm happy to ignore the ones that I don't care about.

But just as there is now a guideline against making irrelevant and unsolicited nitpicky website design complaints, it would be useful to have a guideline against "I thought the article would be about X" types of comments as well. These are similarly pervasive, and of similarly low value. It might be different if they started a discussion about X (power transformers in this case), but they almost never do.

As a counterpoint, I prefer a world where comments are not overly policed. I believe it stifles creativity, and I think comments like yours harm not help the community by making people less comfortable sharing.

While I generally agree with your stance on over-policing, a third of the comments under this article are off-topic. HN is unique in that the community somewhat agrees that discussion should be on-topic and has a lot less tolerance for single-line jokes or sarcasm.

> I think comments like yours harm not help the community by making people less comfortable sharing.

Maybe I am unique in that, but if your contribution to a thread on a deep learning paper is a joke about Decepticons or electrical transformers, it's okay to be less comfortable about sharing.

I'd be fine with a joke which somehow manages to pun NN transformers, electrical transformers, and the series about robots. That would be an interesting and sufficiently novel joke. It's high surprisal.

I have no problem policing the low-effort, un-novel, unsurprising, lame quips about expecting another use of the word transformer. They add nothing of value and dilute threads. I've gotten downvoted for doing it, most of us have; it's a rite of passage, and one that I appreciate, since it makes HN comment threads jam-packed with interesting info.

I interpret these puns as lighthearted feedback that the title is unclear and jargony. Apparently a transformer is some machine learning thing. I'm sure it is an appropriate name for a journal publication where everybody who sees it will be in the field, but to an outsider it is not really obvious what this title is about.

Exactly. The culture of "no" and control-freaks trying to shackle others into thinking down a linear path. This often leads to ideological homogeneity and pushes a large fraction of people away, amplifying homogeneity.

Instead of downvoting or flagging for merely disinterest or disagreement, perhaps there should be some sort of helpful "hide button" in the form of a plus-to-minus sign next to "parent"?

But then how will I attempt to gain feelings of control, and thus perceived power, during my life in the salient face of my inevitable mortal demise?



Isn't that why most people become police officers and other authority figures? You'll just have to torture living things, wet the bed, and set fires like the rest of us. /s /s /s /s /s

Don't forget moral panicking, outrage crybullying, serial scapegoats crucifixion, taking-out aggression, bikeshedding, and cyberdisinhibitionism are also part of this complete breakfast.

I just think that the term transformer is overloaded with several meanings: the electrical transformer, the cartoon, a transfer function in an NLP neural network, and I have heard it used in ETL applications for the transform function. These fields are not closely related, but the overlap might still cause semantic confusion. Similarly, the term translation has a completely different meaning in mathematics than in linguistics.

There's nothing wrong with somebody expressing confusion over the overly jargony title.

You must be the life of the party.

Story titles often are more than meets the eye. - Lord Dang (Decepticon)

A neural networks paper written as if computer science matters -- I love it!

First, you have to make the shape-changing noises with your mouth while you turn into a Ford Focus.

Then, you find some Energon.

Next, red lasers.
