The author quotes Feynman, and I think this is a great example of his concept of explaining complex subjects in simple terms.
I feel like a lot of modern transformers are just sort of cargo-cult imports of old designs because everyone who knew the salient parameters has retired and the current crew just kinda nudges things until they work. A from-scratch explanation, up to the current state of the art, would be invaluable to anyone who deals with them.
But nah. This is HN, where headlines are their own code.
Was posted here a while back. Fascinating guy.
I don't remember the winding part being that much fun. :)
It was something old, akin to this: https://www.youtube.com/watch?v=Y-GyMYZ8yTU
Note that the author has a Machine Learning course with video lectures on YouTube that he references throughout the article: http://www.peterbloem.nl/teaching/machine-learning
Honestly, his lectures were fun and easy to look forward to; I'm really glad his post is getting traction.
If you find his video lectures, they are a really graceful introduction to most ML concepts.
The author has a gift for explaining these concepts.
Plug: I just published this demo that uses gradient descent to find control points for Bézier curves: http://nhq.github.io/beezy/public/
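In case it helps to see the idea, here is a minimal NumPy sketch of what "gradient descent on Bézier control points" can look like. This is not the linked demo's code; the quarter-circle target, step count, and learning rate are just placeholder choices. The key point is that a cubic Bézier curve is linear in its control points, so the squared-error gradient has a simple closed form.

    # Minimal sketch (not the linked demo's code): fit cubic Bezier control
    # points to target points with plain gradient descent in NumPy.
    import numpy as np

    def bernstein(t):
        """Cubic Bernstein basis evaluated at t, shape (len(t), 4)."""
        t = t[:, None]
        return np.hstack([(1 - t) ** 3,
                          3 * (1 - t) ** 2 * t,
                          3 * (1 - t) * t ** 2,
                          t ** 3])

    # Placeholder target: points sampled along a quarter circle.
    ts = np.linspace(0.0, 1.0, 50)
    target = np.stack([np.cos(ts * np.pi / 2), np.sin(ts * np.pi / 2)], axis=1)

    P = np.random.randn(4, 2)                 # control points to optimize
    B = bernstein(ts)                         # (50, 4) basis matrix
    lr = 0.5

    for step in range(2000):
        curve = B @ P                         # points on the current curve, (50, 2)
        residual = curve - target
        grad = 2 * B.T @ residual / len(ts)   # gradient of mean squared error w.r.t. P
        P -= lr * grad

    print("fitted control points:\n", P)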
Consider looking at your errors and judging whether they stem from things your current model doesn't do well but that Transformers do, i.e., correlating two steps in a sequence across a large number of time steps. Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.
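To make the "memory module" framing concrete, here is a rough NumPy sketch of scaled dot-product self-attention (not tied to any particular library; the shapes and random weights are purely illustrative): each position builds a query, scores it against every other position's key, and reads out a weighted sum of values, so step 0 can pull information from step 99 as easily as from step 1.

    # Rough sketch of scaled dot-product self-attention, to illustrate the
    # "memory lookup" view: every position can attend to every other position,
    # regardless of how far apart they are in the sequence.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        """X: (seq_len, d_model) -> (seq_len, d_v)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) pairwise similarities
        weights = softmax(scores, axis=-1)       # how strongly each step "reads" every other step
        return weights @ V                       # content-addressed weighted sum of values

    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 100, 16, 16
    X = rng.normal(size=(seq_len, d_model))      # toy input sequence
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) * 0.1 for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)
    print(out.shape)  # (100, 16): step 0 draws on step 99 as easily as on step 1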
> Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.
But aren't CNNs also like a memory module (i.e., they memorize what leopard skin looks like)? I guess attention is a more sophisticated kind of memory, "more dynamic" so to speak.
Anyway, I'm glad to hear that a transformer architecture isn't totally stupid for my task. I will look up the literature; there seems to be a bit on this topic.
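One way to see the "more dynamic" distinction (just a toy contrast, not from the article): a trained convolution applies the same fixed kernel to every input, while attention recomputes its mixing weights from the input itself, so what gets "remembered" changes per example.

    # Toy contrast: a conv kernel is a fixed pattern matcher once trained,
    # while attention weights are recomputed from each input.
    import numpy as np

    rng = np.random.default_rng(1)
    kernel = rng.normal(size=3)                  # stands in for a trained conv filter

    def conv1d(x, k):
        return np.convolve(x, k, mode="valid")   # same kernel no matter what x is

    def attention_weights(X, Wq, Wk):
        Q, K = X @ Wq, X @ Wk
        s = Q @ K.T / np.sqrt(Wq.shape[-1])
        e = np.exp(s - s.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True) # depends on X: new input, new weights

    Wq = rng.normal(size=(4, 4))
    Wk = rng.normal(size=(4, 4))
    X1 = rng.normal(size=(8, 4))                 # two different toy input sequences
    X2 = rng.normal(size=(8, 4))

    print(attention_weights(X1, Wq, Wk)[0, :3])  # attention pattern for input 1
    print(attention_weights(X2, Wq, Wk)[0, :3])  # different pattern for input 2
    print(conv1d(X1[:, 0], kernel)[:3])          # conv output changes, but the kernel never does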
Because that would seriously be awesome.