On Linux systems a good chunk of the graphics stack already resides in userspace. The kernel provides only modesetting, GPU memory management, access multiplexing and isolation.
Applications render by calling the OpenGL library, which of course executes inside the calling process. The OpenGL library asks the kernel to allocate the required memory buffers, prepares code for the GPU and submits it to the kernel for execution. The kernel collects the jobs submitted by processes, executes them on the GPU and notifies the processes on completion.
When an application wants to display something on the screen, it shares its buffer with the X server (or equivalent) and instructs it to redraw its window from this buffer. The display server uses the kernel's modesetting interface to obtain access to the screen output buffer.
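To make the allocate / submit / notify flow above a bit more concrete, here is a toy model in Python. It is only a sketch: `ToyKernel`, `Job`, `alloc_buffer`, `submit` and `run_gpu` are names I made up for illustration and do not correspond to the real kernel interface, and the "GPU" here just marks jobs as done in submission order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    pid: int            # submitting process
    commands: str       # stand-in for a GPU command buffer
    done: bool = False


class ToyKernel:
    """Hypothetical stand-in for the kernel's GPU driver; not a real API."""

    def __init__(self):
        self.next_handle = 1
        self.buffers = {}          # handle -> (owner pid, memory)
        self.queue = deque()       # jobs waiting for the GPU
        self.notifications = []    # (pid, commands) completion events

    def alloc_buffer(self, pid, size):
        # Memory management: hand out a handle and remember the owner.
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = (pid, bytearray(size))
        return handle

    def submit(self, pid, commands):
        # Access multiplexing: jobs from all processes share one queue.
        job = Job(pid, commands)
        self.queue.append(job)
        return job

    def run_gpu(self):
        # Execute queued jobs in submission order, then notify the owners.
        while self.queue:
            job = self.queue.popleft()
            job.done = True
            self.notifications.append((job.pid, job.commands))


kernel = ToyKernel()
buf_a = kernel.alloc_buffer(pid=100, size=16)
buf_b = kernel.alloc_buffer(pid=200, size=16)
job_a = kernel.submit(100, "draw triangle")
job_b = kernel.submit(200, "clear screen")
kernel.run_gpu()
```

The point of the model is that the userspace side (the part playing the OpenGL library) only ever holds handles and job objects, while the queue and the buffer table stay on the kernel side.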
Remember the news that L4D2 on Linux outperformed the Windows version? As I understand it, the L4D2 code base uses Direct3D, and on Linux it goes through a D3D emulator with OpenGL as the backend, and yet after jumping through these user-space hoops it still gets a better framerate. The best theory I've heard is that context switching is cheaper on Linux.
"D3D emulator with OpenGL as the backend" sounds an awful lot like Wine. There have been reports of Wine apps running faster than their Windows equivalents too, although usually some Wine bug or another gets in the way of performance.
This wouldn't surprise me. Based on the console output, Counter-Strike: Source does the same thing. There is lots of output about Direct3D that looks exactly like messages from Wine. I suspect they did this to speed up porting.