> but transformers are still so new and so effective that they will probably dominate for a while.
They're mostly easy grant money, and entire research groups worldwide are gaming them to look effective in published papers. State of academia...
We (humans) are following the last thing that worked (imagine if we could do true gradient descent over the algorithm space).
Good question, and I'm interested to hear the other responses.