Off topic: can anyone tell me what the author has used to convert their paper into this beautiful webpage? It's apparently a trend with ML papers to have their own website.
Tangent, the screen reader recording is cool. The main issue for me is that it doesn’t know where I am on the page, so it just starts from the beginning and I need to guess the position of the relevant audio based on how far down on the page I am. It would be cool to have some sort of bookmarks that match both the text and the audio, so that I can skip both to the same section at the same time
Okay so what stood out to me here is the math work: namely, if we can identify the parts of the model which are doing the math... Can that be "augmented" to generalise it? Could you take over the math neurons and either optimise them by hand or plug in a software "implant" to do the work properly?
It will be an area of active research, but it remains to be seen if it is overall beneficial.
Custom architecture components break the homogeneity that enables the massive parallelism.
There may be an approach similar to some FPGAs where there are a few things that are commonly needed that might be implemented specifically to avoid constructing them out of generalisable parts. The calculation is quite different though because hardware has more fixed limits and being fixed once manufactured means a great deal more forethought has to take place. Software models, being more flexible can avoid the extra thinking by adding a few million parameters.
One possibility is KANs which can also provide a calculable expression. I could see a fancy convolution KAN being converted into an expression and then applied across an entire input.
I feel like maybe in the future there may be a compiler that can make custom GPU code for specific pathways.
In the short term it might be useful if there is some found formula that requires many layers that can be calculated earlier with a custom expression and fed directly into a higher layer enabling the information to be more broadly used.