I’m not GP, but I don’t think their position is necessarily in tension with leveraging computation. Not all FLOPs are equal, and FLOPs != Watts. In fact, a much more efficient architecture might leverage computation far more effectively than just burning a bigger pile of GPUs on the current transformer stack.