"We didn't pick up on xLSTM because our data was small and not discriminative enough. So xLSTM is bad on larger data as well (even though we likely can't / couldn't pick out self-attention either)."
Regardless, the conclusion itself is easily invalidated, since the xLSTM paper includes benchmarks against GPT-3 on a 15B-token dataset.
Digression: I'm increasingly noticing just how bad the scientific community, across all STEM fields, is when it comes to epistemology.
It's infinitely worse in the humanities, since they can't even test their bs theories unless they become violent "revolutionaries".
Hopefully, amid all the AI hype-bubble, good philosophy may finally get funded (and "Ph.D" might stop sounding like an ironic joke).
The claim that transformers are the optimal architecture is extremely strong, almost to the point of absurdity. As far as we know, human minds are not transformer networks, yet they outperform them in many respects. And it would be equally absurd to say the human mind has an optimal architecture just because we haven't seen a better one in billions of years.