Some[1] think that things are trending in the opposite direction: away from clever manipulations and hard-coded domain knowledge, and towards large-scale general models.
Yeah, I was surprised to see the architecture diagram is so complex. It's been a while since I saw a design that wasn't just "stack more transformer layers".
[1]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html