The original design of Windows NT had the graphics stack in user space. It stayed that way through the 3.x releases (3.1, 3.5, 3.51), but performance wasn't competitive with desktop contemporaries like OS/2 and Win95, which ran graphics in the kernel. So Windows NT 4.0 adopted that design and moved the graphics subsystem into the kernel.
I guess part of the reason for the poor performance was that video card drivers of the era were tightly coupled with the OS, effectively replacing GDI functions with their own implementations. The NT 3.x design made it difficult to support these existing graphics accelerators.
A large part of the graphics driver was moved into user mode in Windows Vista (the WDDM driver model). If you've ever seen the screen blink and a balloon pop up saying that the display driver was restarted -- that would've been a bluescreen on XP.
I wonder why they didn't move the rest of GDI out of the kernel at the same time? A compositing window manager was also introduced in Windows Vista, which really cuts down on the number of window repaints. Surely they can afford to take the performance hit now?
If I were writing a microkernel OS today and relying entirely on the GPU for rendering, could I get away with putting the graphics stack entirely in user space, or would it still be noticeably slower?
On Linux a good chunk of the graphics stack already lives in user space. The kernel provides only modesetting, GPU memory management, access multiplexing, and isolation.
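To make that concrete, here's a minimal sketch (mine, not from any kernel docs) of a user-space program poking that kernel interface through libdrm; it just asks the kernel which display connectors exist and what their preferred modes are. The device path is an assumption; build with -ldrm.

    /* Minimal KMS sketch: enumerate connectors via the kernel's DRM interface. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);    /* the kernel's GPU device node */
        if (fd < 0) { perror("open"); return 1; }

        drmModeRes *res = drmModeGetResources(fd);  /* ask the kernel what hardware exists */
        if (!res) { fprintf(stderr, "not a modesetting-capable device\n"); return 1; }

        for (int i = 0; i < res->count_connectors; i++) {
            drmModeConnector *c = drmModeGetConnector(fd, res->connectors[i]);
            if (c && c->connection == DRM_MODE_CONNECTED && c->count_modes > 0)
                printf("connector %u: %dx%d (%s)\n", c->connector_id,
                       c->modes[0].hdisplay, c->modes[0].vdisplay, c->modes[0].name);
            if (c)
                drmModeFreeConnector(c);
        }
        drmModeFreeResources(res);
        /* Actually putting a buffer on screen additionally needs
         * drmModeAddFB() + drmModeSetCrtc(), which is what the display
         * server (or a modesetting daemon) does. */
        return 0;
    }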
Applications render by calling an OpenGL library, which of course executes in the calling process. The OpenGL library asks the kernel to allocate the required memory buffers, prepares the command buffers and compiled shader code for the GPU, and submits them to the kernel for execution. The kernel collects the jobs submitted by different processes, runs them on the GPU, and notifies the processes on completion.
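For illustration, a rough sketch of what that looks like from the application side, assuming Mesa's GBM/EGL stack with the surfaceless-context extension (the render-node path and buffer sizes are my assumptions, not anything from this thread). Build with -lgbm -lEGL -lGLESv2.

    /* Everything below runs inside this process; the kernel is only entered
     * when Mesa allocates buffers or submits the command stream. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <gbm.h>
    #include <EGL/egl.h>
    #include <GLES2/gl2.h>

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR);   /* render node: no modesetting rights needed */
        struct gbm_device *gbm = gbm_create_device(fd);

        EGLDisplay dpy = eglGetDisplay((EGLNativeDisplayType)gbm);
        eglInitialize(dpy, NULL, NULL);
        eglBindAPI(EGL_OPENGL_ES_API);

        EGLConfig cfg; EGLint n;
        EGLint cfg_attrs[] = { EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE };
        eglChooseConfig(dpy, cfg_attrs, &cfg, 1, &n);

        EGLint ctx_attrs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
        EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attrs);
        eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, ctx);   /* surfaceless context */

        /* Render into an off-screen framebuffer; Mesa turns these calls
         * into GPU commands entirely in user space. */
        GLuint fbo, rb;
        glGenFramebuffers(1, &fbo);
        glGenRenderbuffers(1, &rb);
        glBindRenderbuffer(GL_RENDERBUFFER, rb);
        glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA4, 256, 256);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, rb);

        glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);
        glFinish();   /* roughly where the accumulated job gets handed to the kernel */

        printf("rendered with: %s\n", (const char *)glGetString(GL_RENDERER));
        return 0;
    }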
When an application wants to display something on the screen, it shares its buffer with the X server (or equivalent) and asks it to redraw the window from that buffer. The display server in turn uses its own special kernel interface to get at the actual screen output buffer.
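These days the "shares its buffer" step is typically done with dma-buf file descriptors. A hedged sketch of just the export side (the Unix-socket plumbing to the display server is omitted, and the device path is an assumption); build with -lgbm:

    #include <fcntl.h>
    #include <stdio.h>
    #include <gbm.h>

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR);
        struct gbm_device *gbm = gbm_create_device(fd);
        if (!gbm) { fprintf(stderr, "no GBM device\n"); return 1; }

        /* Allocate a 1920x1080 ARGB buffer the GPU can render into. */
        struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080,
                                          GBM_FORMAT_ARGB8888,
                                          GBM_BO_USE_RENDERING);
        if (!bo) { fprintf(stderr, "allocation failed\n"); return 1; }

        int dmabuf_fd = gbm_bo_get_fd(bo);   /* kernel hands back a dma-buf fd */
        printf("exported buffer as fd %d, stride %u bytes\n",
               dmabuf_fd, gbm_bo_get_stride(bo));

        /* That fd is what a real client would send to the compositor over a
         * Unix socket (SCM_RIGHTS); the compositor imports it and samples
         * from the same memory without any copies. */
        gbm_bo_destroy(bo);
        return 0;
    }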
Remember the news that L4D2 on Linux outperformed the Windows version? As I understand it, the L4D2 code base uses Direct3D, and on Linux it runs through a D3D emulator with OpenGL as the backend, and yet even after going through these user-space hoops it still gets a better framerate. The best theory I've heard is that context switching is cheaper on Linux.
"D3D emulator with OpenGL as the backend" sounds an awful lot like Wine. There've been reports of Wine apps running faster than their Windows equivalent too, although usually some wine bug or another gets in the way of performance.
This wouldn't surprise me. Based on the console output, Counter-Strike: Source does the same thing: lots of messages about Direct3D that look exactly like Wine output. I suspect they did this to speed up the port.
Usually the user-space overhead is tiny; I can't see why you'd need to put it in the kernel. Linux keeps the high-level graphics stack in user space and it works just fine.
This (plus some low-level register poking) is the stuff you would need to implement in your GPU daemon. The actual generation of drawing commands, compilation of shaders, etc. is done by the applications themselves.
The overhead should be on the order of a few context switches to the GPU server for each frame an application renders (on Linux it's a few syscalls instead of context switches). Probably not terrible.
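Rough numbers, assuming a context switch costs somewhere in the low microseconds on current hardware (my ballpark, not a measurement):

    frame budget at 60 fps:          ~16.7 ms
    ~3 extra switches x ~5 us each:  ~15 us
    15 us / 16.7 ms:                 ~0.1% of the frame budget

So even a pessimistic estimate puts the IPC cost well below the noise floor of the actual rendering work.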