Clearly it’s not an exact-attention transformer - perhaps some sort of sparse/approximate attention, or a recurrent-transformer-ish thing like RWKV?
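For what it’s worth, here’s a toy sketch of the distinction being speculated about (it assumes nothing about what Magic actually built): exact softmax attention materializes an n x n score matrix, while an RWKV-style linear-attention recurrence carries a fixed-size state forward token by token, which is one way to get “long term memory” without quadratic cost.

```python
# Illustrative only: contrasts exact attention (quadratic in sequence length)
# with a recurrent linear-attention update (constant-size state), the rough
# idea behind RWKV-style models. This is NOT Magic's actual architecture.
import numpy as np

def exact_attention(Q, K, V):
    # Standard softmax attention: the (n x n) score matrix is what makes
    # very long contexts expensive.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def recurrent_linear_attention(Q, K, V):
    # Linear attention rewritten as a recurrence: a fixed-size state (S, z)
    # is updated per token, so memory doesn't grow with context length.
    d = Q.shape[-1]
    phi = lambda x: np.maximum(x, 0) + 1e-6   # simple positive feature map
    S = np.zeros((d, d))
    z = np.zeros(d)
    out = []
    for q, k, v in zip(phi(Q), phi(K), V):
        S += np.outer(k, v)   # accumulate key-value associations
        z += k                # running normalizer
        out.append((q @ S) / (q @ z))
    return np.array(out)

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
print(exact_attention(Q, K, V).shape)             # (8, 4), O(n^2) compute
print(recurrent_linear_attention(Q, K, V).shape)  # (8, 4), O(n) with fixed state
```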
Their twitter announcement[0] does say it’s a novel architecture they’re calling a “Long Term Memory Network”. But who knows what that actually means.
[0] https://twitter.com/magicailabs/status/1666116949560967168