
That's a very nice blog post, thanks!

I'm not sure why you think HTM layers are bigger than modern DL layers. The HTM layer configuration used in the paper (B=128, M=32, N=2048, K=40) comes to about 335M parameters. Compare that to GPT-3, where each of the 96 layers has roughly 1.8B parameters. Models much larger than GPT-3 have already appeared, with no end in sight as to how much further they can scale.
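
For anyone checking the arithmetic, here's a rough back-of-the-envelope sketch in Python (assuming the 335M figure is simply the product B*M*N*K, and that GPT-3's ~175B parameters are split roughly evenly across its 96 layers):

    # Rough parameter counts; assumes HTM layer params = B * M * N * K
    B, M, N, K = 128, 32, 2048, 40
    htm_layer_params = B * M * N * K          # 335,544,320 ~= 335M
    gpt3_params_per_layer = 175e9 / 96        # ~= 1.8B per layer
    print(f"HTM layer:   {htm_layer_params:,} params")
    print(f"GPT-3 layer: {gpt3_params_per_layer:,.0f} params")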

The point is, if HTM worked, people would throw compute resources at it, just like they do with DL models. But it doesn't.



