Scaling laws operate in the limit, but eventually practical considerations dominate. There's a lot we haven't yet fully appreciated about biological vision and cognition (and indeed about common sense as regards sensible video generation and processing) that has not made its way into this kind of model. NeRFs are interesting, and I hope to see more from that side of things in the coming months and years.
Yes, and in that time we've learned some important lessons that it would be unwise to ignore, e.g. comprehension of 3D geometry despite 2D visual input data.