I never understood the confusion over how X11 uses "server" and "client". You hit a key in xterm, the client, and it asks the server to light up certain pixels so that they represent a character. Clients make requests; servers carry them out. The confusion is even more puzzling when you consider that, today, few people actually run X clients remotely for everyday purposes: the server and the client are right in front of you on the same machine, so any confusion over the "server" being a "remote" resource is unfounded. On top of that, I cannot comprehend what kinds of requests the video hardware, and the software that drives it, could make that would turn it into the client and the X application into the server.
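To make the request direction concrete, here is a toy sketch in Python. It is not the X11 wire protocol or any real Xlib API; `DisplayServer`, `TerminalClient`, `draw_text` and `key_press` are names made up for illustration, and everything runs in one process rather than over a socket.

```python
# Toy model only: not the X11 wire protocol and not a real Xlib API.
# All class and method names here are hypothetical.

class DisplayServer:
    """Owns the screen and keyboard; carries out requests from clients."""

    def __init__(self, width, height):
        self.screen = [[" "] * width for _ in range(height)]
        self.focused_client = None

    def draw_text(self, x, y, text):
        # A client request: put these characters on the screen at (x, y).
        for offset, ch in enumerate(text):
            self.screen[y][x + offset] = ch

    def key_press(self, key):
        # The hardware belongs to the server, so input arrives here first
        # and is forwarded to the focused client as an event.
        if self.focused_client is not None:
            self.focused_client.on_key(key)

    def row(self, y):
        return "".join(self.screen[y])


class TerminalClient:
    """Stands in for xterm: reacts to events by sending draw requests."""

    def __init__(self, server):
        self.server = server
        self.cursor = 0
        server.focused_client = self

    def on_key(self, key):
        # The client decides what should appear and asks the server to draw it.
        self.server.draw_text(self.cursor, 0, key)
        self.cursor += len(key)


server = DisplayServer(width=10, height=2)
xterm = TerminalClient(server)
for key in "ls":
    server.key_press(key)
print(repr(server.row(0)))
```

The only thing the server ever sends the client is an event; every change to the screen starts as a request from the client, which is why the program in front of the display is the one called the server.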
Fortunately, there is some API refinement going on throughout the whole stack: Gallium3D, kernel modesetting and, of course, companies finally releasing their chip specs (except Nvidia).
* Rest of industry: apps are requested by little machines, and provided by big machines
* X: display is requested by big machines, and provided by little machines
Both the 'app serving' and 'display serving' models are valid; X just happens to have used display serving when every other piece of remote display software went for the other model.